Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
New York University, NY, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
James C. Lester, Rosa Maria Vicari, Fábio Paraguaçu (Eds.)
Intelligent Tutoring Systems
Springer
eBook ISBN: 3-540-30139-9
Print ISBN: 3-540-22948-5
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher
The papers also reflect an increased interest in affect and a growing emphasis
on evaluation. In addition to paper and poster presentations, ITS 2004 featured
a full two-day workshop program with eight workshops, an exciting collection of
panels, an exhibition program, and a student track. We were honored to have
an especially strong group of keynote speakers: Stefano A. Cerri (University of
Montpellier II, France), Bill Clancey (NASA, USA), Cristina Conati (University
of British Columbia, Canada), Riichiro Mizoguchi (Osaka University, Japan),
Cathleen Norris (University of North Texas, USA), Elliot Soloway (University
of Michigan, USA), and Liane Tarouco (Federal University of Rio Grande do
Sul, Brazil).
We are very grateful to the many individuals and organizations that made
ITS 2004 possible. Thanks to the members of the Program Committee, the ex-
ternal reviewers, and the Poster Chairs for their thorough reviewing. We thank
the Brazilian organizing committee for their considerable effort in planning the
conference and making it a reality. We appreciate the sagacious advice of the
ITS Steering Committee. We extend our thanks to the Workshop, Panel, Poster,
Student Track, and Exhibition Chairs for assembling such a strong program. We
thank the General Information & Registration Chairs for making the conference
run smoothly, and the Press & Web Site Art Development Chair and the Press
Art Development Chair for their work with publicity. Special thanks to Thomas
Preuß of ConfMaster for his assistance with the paper review management sys-
tem, to Bradford Mott for his invaluable assistance in the monumental task
of collating the proceedings, and the editorial staff of Springer-Verlag for their
assistance in getting the manuscript to press. We gratefully acknowledge the
sponsoring institutions and corporate sponsors (CNPq, CAPES, FAPEAL, FINEP,
FAL, and PETROBRAS) for their generous support of the conference, and AAAI
and the AI in Education Society for their “in cooperation” sponsorship.
Finally, we extend a heartfelt thanks to Claude Frasson, the conference’s
founder. Claude continues to be the guiding force of the conference after all
of these years. Even with his extraordinarily busy schedule, he made himself
available for consultation on matters ranging from the mundane to the critical
and everything in between. He has been a constant source of encouragement.
The conference is a tribute to his generous spirit.
Program Committee
Esma Aïmeur (University of Montréal, Canada)
Vincent Aleven (Carnegie Mellon University, USA)
Elisabeth André (University of Augsburg, Germany)
Guy Boy (Eurisco, France)
Karl Branting (North Carolina State University, USA)
Joost Breuker (University of Amsterdam, Netherlands)
Paul Brna (Northumbria University, UK)
Peter Brusilovsky (University of Pittsburgh, USA)
Stefano Cerri (University of Montpellier II, France)
Tak-Wai Chan (National Central University, Taiwan)
Cristina Conati (University of British Columbia, Canada)
Ricardo Conejo (University of Malaga, Spain)
Evandro Barros Costa (Federal University of Alagoas, Brazil)
Ben du Boulay (University of Sussex, UK)
Isabel Fernandez de Castro (University of the Basque Country, Spain)
Claude Frasson (University of Montréal, Canada)
Gilles Gauthier (University of Québec at Montréal, Canada)
Khaled Ghedira (ISG, Tunisia)
Guy Gouardères (University of Pau, France)
Art Graesser (University of Memphis, USA)
Jim Greer (University of Saskatchewan, Canada)
Mitsuru Ikeda (Japan Advanced Institute of Science and Technology)
Lewis Johnson (USC/ISI, USA)
Judith Kay (University of Sydney, Australia)
Ken Koedinger (Carnegie Mellon University, USA)
Fong Lok Lee (Chinese University of Hong Kong)
Chee-Kit Looi (Nanyang Technological University, Singapore)
Rose Luckin (University of Sussex, UK)
Stacy Marsella (USC/ICT, USA)
Gordon McCalla (University of Saskatchewan, Canada)
Riichiro Mizoguchi (Osaka University, Japan)
Jack Mostow (Carnegie Mellon University, USA)
Tom Murray (Hampshire College, USA)
Germana Nobrega (Catholic University of Brazil)
Toshio Okamoto (Electro-Communications University, Japan)
Organizing Committee
Evandro de Barros Costa (Federal University of Alagoas, Brazil)
Cleide Jane Costa (Seune University of Alagoas, Maceió, Brazil)
Clovis Torres Fernandes (Technological Institute of Aeronautics, Brazil)
Lucia Giraffa (Pontifical Catholic University of Rio Grande do Sul, Brazil)
Leide Jane Meneses (Federal University of Rondônia, Brazil)
Germana da Nobrega (Catholic University of Brasília, Brazil)
David Nadler Prata (FAL University of Alagoas, Maceió, Brazil)
Patricia Tedesco (Federal University of Pernambuco, Brazil)
Panels Chairs
Vincent Aleven (Carnegie Mellon University, USA)
Lucia Giraffa (Pontifical Catholic University of Rio Grande do Sul, Brazil)
Poster Chairs
Mitsuru Ikeda (JAIST, Japan)
Marco Aurélio Carvalho (Federal University of Brasília, Brazil)
Exhibition Chair
Clovis Torres Fernandes (Technological Institute of Aeronautics, Brazil)
External Reviewers
C. Brooks C. Eliot T. Tang
A. Bunt H. McLaren M. Winter
B. Daniel K. Muldner
Table of Contents
Adaptive Testing
A Learning Environment for English for Academic Purposes
Based on Adaptive Tests and Task-Based Systems 1
J.P. Gonçalves, S.M. Aluisio, L.H.M. de Oliveira, O.N. Oliveira, Jr.
A Model for Student Knowledge Diagnosis
Through Adaptive Testing 12
E. Guzmán, R. Conejo
A Computer-Adaptive Test That Facilitates the Modification
of Previously Entered Responses: An Empirical Study 22
M. Lilley, T. Barker
Affect
An Autonomy-Oriented System Design for Enhancement
of Learner’s Motivation in E-learning 34
E. Blanchard, C. Frasson
Inducing Optimal Emotional State for Learning
in Intelligent Tutoring Systems 45
S. Chaffar, C. Frasson
Evaluating a Probabilistic Model of Student Affect 55
C. Conati, H. Maclare
Politeness in Tutoring Dialogs:
“Run the Factory, That’s What I’d Do” 67
W.L. Johnson, P. Rizzo
Providing Cognitive and Affective Scaffolding Through Teaching
Strategies: Applying Linguistic Politeness to the Educational Context 77
K. Porayska-Pomsta, H. Pain
Authoring Systems
EASE: Evolutional Authoring Support Environment 140
L. Aroyo, A. Inaba, L. Soldatova, R. Mizoguchi
Cognitive Modeling
Toward Tutoring Help Seeking
(Applying Cognitive Modeling to Meta-cognitive Skills) 227
V. Aleven, B. McLaren, I. Roll, K. Koedinger
Collaborative Learning
Analyzing Discourse Structure to Coordinate Educational Forums 262
M.A. Gerosa, M.G. Pimentel, H. Fuks, C. Lucena
Evaluation
Evaluating the Effectiveness of a Tutorial Dialogue System
for Self-Explanation 443
V. Aleven, A. Ogan, O. Popescu, C. Torrey, K. Koedinger
Pedagogical Agents
Pedagogical Agent Design: The Impact of Agent Realism, Gender,
Ethnicity, and Instructional Role 592
A.L. Baylor, Y. Kim
Student Modeling
Using Knowledge Tracing to Measure Student Reading Proficiencies 624
J.E. Beck, J. Sison
The Massive User Modelling System (MUMS) 635
C. Brooks, M. Winter, J. Greer, G. McCalla
An Open Learner Model for Children and Teachers:
Inspecting Knowledge Level of Individuals and Peers 646
S. Bull, M. McKay
Scaffolding Self-Explanation to Improve Learning
in Exploratory Learning Environments. 656
A. Bunt, C. Conati, K. Muldner
Metacognition in Interactive Learning Environments:
The Reflection Assistant Model 668
C. Gama
Predicting Learning Characteristics
in a Multiple Intelligence Based Tutoring System 678
D. Kelly, B. Tangney
Alternative Views on Knowledge:
Presentation of Open Learner Models 689
A. Mabbott, S. Bull
Modeling Students’ Reasoning About Qualitative Physics:
Heuristics for Abductive Proof Search 699
M. Makatchev, P. W. Jordan, K. VanLehn
From Errors to Conceptions – An Approach to Student Diagnosis 710
C. Webber
Discovering Intelligent Agent:
A Tool for Helping Students Searching a Library 720
K. Yammine, M.A. Razek, E. Aïmeur, C. Frasson
Poster Papers
Inferring Unobservable Learning Variables
from Students’ Help Seeking Behavior 782
I. Arroyo, T. Murray, B.P. Woolf, C. Beal
The Social Role of Technical Personnel in the Deployment
of Intelligent Tutoring Systems 785
R.S. Baker, A.Z. Wagner, A.T. Corbett, K.R. Koedinger
Intelligent Tools for Cooperative Learning in the Internet 788
F. de Almeida Barros, F. Paraguaçu, A. Neves, C.J. Costa
A Plug-in Based Adaptive System: SAAW 791
L. de Oliveira Brandaõ, S. Isotani, J.G. Moura
Helps and Hints for Learning with Web Based Learning Systems:
The Role of Instructions 794
A. Brunstein, J.F. Krems
Intelligent Learning Environment for Film Reading
in Screening Mammography 797
J. Campos, P. Taylor, J. Soutter, R. Procter
Reuse of Collaborative Knowledge in Discussion Forums 800
W. Chen
A Module-Based Software Framework for E-learning
over Internet Environment 803
S.-J. Cho, S. Lee
Invited Presentations
Opportunities for Model-Based Learning Systems
in the Human Exploration of Space 901
B. Clancey
Toward Comprehensive Student Models:
Modeling Meta-cognitive Skills and Affective States in ITS 902
C. Conati
Having a Genuine Impact on Teaching and Learning –
Today and Tomorrow 903
E. Soloway, C. Norris
Interactively Building a Knowledge Base for a Virtual Tutor 904
L. Tarouco
Ontological Engineering and ITS Research 905
R. Mizoguchi
Agents Serving Human Learning 906
S.A. Cerri
Panels
Affect and Motivation 907
W.L. Johnson, C. Conati, B. du Boulay, C. Frasson,
H. Pain, K. Porayska-Pomsta
Workshops
Workshop on Modeling Human Teaching Tactics and Strategies 908
F. Akhras, B. du Boulay
Workshop on Analyzing Student-Tutor Interaction Logs
to Improve Educational Outcomes 909
J. Beck
Workshop on Grid Learning Services 910
G. Gouardères, R. Nkambou
Workshop on Distance Learning Environments
for Digital Graphic Representation 911
R. Azambuja Silveira, A.B. Almeida da Silva
Workshop on Applications of Semantic Web Technologies
for E-learning 912
L. Aroyo, D. Dicheva
Workshop on Social and Emotional Intelligence
in Learning Environments 913
C. Frasson, K. Porayska-Pomsta
Workshop on Dialog-Based Intelligent Tutoring Systems:
State of the Art and New Research Directions 914
N. Heffernan, P. Wiemer-Hastings
Workshop on Designing Computational Models
of Collaborative Learning Interaction 915
A. Soller, P. Jermann, M. Muehlenbrock, A. Martínez Monés
1 Introduction
There is a growing need for students from non-English speaking countries to learn
and employ English in their research and even in school tasks. Only then can these
students take full advantage of the enormous amount of teaching material and scien-
tific information in the WWW, which is mostly in English. For graduate students, in
particular, a minimum level of instrumental English is required, and indeed universi-
ties tend to require the students to undertake proficiency exams. There are various
paradigms for both the teaching and the exams which may be adopted. In the Institute
for Mathematics and Computer Science (ICMC) of University of São Paulo, USP, we
have decided to emphasize the mastering of English for Academic Purposes. Building
upon previous experience in developing writing tools for academic works [1, 2, 3],
we conceived a test that checks whether the students are prepared to understand and
make use of the most important conventions of scientific texts in English [4]. This
fully-automated test, called CAPTEAP (http://www.nilc.icmc.usp.br/capteap/), consists of objective questions in which the
user is asked to choose or provide a response to a question whose correct answer is
predetermined. CAPTEAP comprises four modules, explained in Section 2. In order
to get ready for the test – which is considered as an official proficiency test required
for the MSc. at ICMC – students may undertake training tests that are offered in the
CAPTEAP system. However, until recently there was no module that assisted stu-
dents in the learning process or that could assess their performance in their early stage
of learning. This paper describes the Computer-Aided Learning of English for Aca-
demic Purposes (CALEAP-Web) system that fills in this gap, by providing students
with adaptive tests integrated into a computational environment with a variety of
learning tasks.
CALEAP-Web employs a computer-based adaptive test (CAT) named Adaptive
English Proficiency Test for Web (ADEPT), with questions selected on the basis of
the estimated knowledge of a given student, being therefore a fully customized sys-
tem. This is integrated into the Computer-Aided Task Environment for Scientific
English (CATESE) [5] to train the students about conventions of the scientific texts,
in the approach known as learning by doing [6].
The main idea behind adaptive tests is to select the items of a test according to the
ability of the examinee. That is to say, the questions proposed should be appropriate
for each person. An examinee is given a test that adjusts to the responses given previ-
ously. If the examinee provides the correct answer for a given item, then the next one
is harder. If the examinee does not answer correctly, the next question can be easier.
This allows a more precise assessment of the competences of the examinees than
traditional multiple-choice tests because it reduces fatigue, a factor that can signifi-
cantly affect an examinee’s test results [7]. Other advantages are an immediate feed-
back, the challenge posed as the examinees are not discouraged or annoyed by items
that are far above or below their ability level, and reduction in the time required to
take the tests.
According to Conejo et al. [8], Adaptive Testing based on Item Response Theory
(IRT) comprises the following basic components: a) an IRT model describing how
the examinee answers a given question, according to his/her level of knowledge.
When the level of knowledge is assessed, one expects that the result should not be
affected by the instrument used to assess, i.e. computer or pen and paper; b) a bank of
items containing questions that may cover part or the whole knowledge of the do-
main. c) the level of initial knowledge of the examinee, which should be chosen ap-
propriately to reduce the time of testing. d) a method to select the items, which is
based on the estimated knowledge of the examinee, depending obviously on the per-
formance in previous questions. e) stopping criteria that are adopted to discontinue
the test once the pre-determined level of capability is achieved or when the maximum
number of items have been applied, or if the maximum time for the test is exceeded.
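These five components can be read as the skeleton of a testing loop. The following Python sketch is not part of the original paper; the function names, parameters and the pluggable stopping check are illustrative assumptions about how the components fit together.

    def run_adaptive_test(item_bank, select_item, respond, update_estimate,
                          should_stop, theta=0.0):
        """Generic CAT loop built from the components listed above.
        theta is the initial knowledge estimate (component c)."""
        administered = []
        while not should_stop(theta, administered):             # e) stopping criteria
            item = select_item(item_bank, theta, administered)  # d) item selection
            correct = respond(item)                             # pose the item, get the answer
            theta = update_estimate(theta, item, correct)       # a) IRT-based update
            administered.append(item)                           # b) items come from the bank
        return theta, administered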
2.2 ADEPT
ADEPT provides a customized test capable of assessing the students with only a few
questions. It differs from the traditional tests that employ a fixed number of questions
for all examinees and do not take into account the previous knowledge of each exami-
nee.
2.2.1 Item Response Theory. This theory assumes some relationship between the
level of the examinee and his/her ability to get the answers right for the questions,
based on statistical models. ADEPT employs the 3-parameter logistic model [9] given
by the expression

$P(\theta) = c + \dfrac{1 - c}{1 + e^{-a(\theta - b)}}$
where a (discrimination) denotes how well one item is able to discriminate between
examinees of slightly different ability, b (difficulty) is the level of difficulty of one
item and c (guessing) is the probability that an examinee will get the answer right
simply by guessing.
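For concreteness, the expression above can be computed directly. The sketch below is illustrative only and not code from the paper.

    import math

    def p_correct(theta, a, b, c):
        """3PL model: probability that a student of ability theta answers an item
        with discrimination a, difficulty b and guessing parameter c correctly."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    # An average student (theta = 0) facing a medium-difficulty item (b = 0)
    # with no guessing answers correctly with probability 0.5.
    print(p_correct(0.0, a=1.0, b=0.0, c=0.0))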
2 According to Weissberg and Buker [11], the main components of an Introduction are Set-
ting, Review of the Literature, Gap, Purpose, Methodology, Main Results, Value of the
Work and Layout of the Article.
Module 3 - comprehension, aimed to check whether the student recognizes the relationships between the
ideas conveyed in a given section of the paper. Module 4 - strategies of scientific
writing. It checks whether the student can distinguish between rhetorical strategies
such as definitions, descriptions, classifications and argumentations. Today this mod-
ule covers two components of Introductions, namely Setting and Review of the Literature.
The questions for Modules 1 and 4 are simple, independent from each other. How-
ever, the questions for Modules 2 and 3 are testlets, which are a group of items related
to a given topic to be assessed. Testlets are thus considered as “units of test”; for
instance, in a test there may be four questions about a particular item [12]. Calibration
of the items is carried out with the algorithm of Huang [10], viz. the Content Bal-
anced Adaptive Testing (CBAT-2), a self-adaptive testing which calibrates the pa-
rameters of the items during the test, according to the performance of the students. In
the ADEPT, there are three options for the answers (choices a, b, or c). Depending on
the answer (correct or incorrect), the parameter b is calibrated and there is the updating of the parameters R (number of times that the question was answered correctly in the past), W (number of times the question was answered incorrectly in the past) and the difficulty accumulator [10]. Even though the bank of items in ADEPT covers
only Instrumental English, several subjects may be present. Therefore, the contents of
the items had to be balanced [13], with the items being classified according to several
components grouped in modules. In ADEPT, the contents are split into the Modules 1
through 4 with 15%, 30%, 30% and 25%, respectively. As for the weight of each
component and Module in the curriculum hierarchy [14], 1 was adopted for all levels.
In ADEPT, the student is the agent of calibration in real time of the test, with his/her
success (failure) in the questions governing the calibration of the items in the bank.
2.2.3 Estimate of the Student Ability. In order to estimate the ability of a given
student, ADEPT uses the modified iterative Newton-Raphson method [9], using the
following formulas:

$\hat\theta_{n+1} = \hat\theta_n + \dfrac{\sum_i a_i\,[u_i - P_i(\hat\theta_n)]}{\sum_i a_i^2\, P_i(\hat\theta_n)\,[1 - P_i(\hat\theta_n)]}$

where $\hat\theta_n$ is the estimated ability after the nth question, $u_i = 1$ if the ith answer was correct and $u_i = 0$ if the answer was wrong, and $P_i$ is the probability of a correct answer to item i given by the 3PL model. A default value was adopted for the initial ability.
The Newton-Raphson model was chosen due to the ease with which it is imple-
mented.
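The excerpt does not spell out the "modified" variant, so the sketch below follows the standard Newton-Raphson iteration for ability estimation as presented by Baker [15]; keeping the estimate inside [-3.0, 3.0] is an assumption borrowed from the stopping criteria described next.

    import math

    def p_correct(theta, a, b, c):
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def estimate_ability(items, answers, theta=0.0, iterations=20):
        """Newton-Raphson ability estimation.
        items   : list of (a, b, c) tuples for the administered items
        answers : list of 1 (correct) / 0 (incorrect) responses"""
        for _ in range(iterations):
            num = 0.0   # first derivative of the log-likelihood
            den = 0.0   # information (negative second derivative)
            for (a, b, c), u in zip(items, answers):
                p = p_correct(theta, a, b, c)
                num += a * (u - p)
                den += a * a * p * (1.0 - p)
            if den == 0.0:
                break
            theta += num / den
            theta = max(-3.0, min(3.0, theta))   # keep the estimate in [-3, 3]
        return theta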
2.2.4 Stopping Criteria. The criteria for stopping an automated test are crucial. In
ADEPT two criteria were adopted: i) The number of questions per module of the test
is between 3 (minimum) and 6 (maximum), because we did not want the test to be too
long. In case deficiencies were detected, the student would be recommended to per-
form tasks in the corresponding learning module. ii) The estimated ability should lie between -3.0 and
3.0 [15].
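One possible reading of these two criteria as a stopping check is sketched below; the exact interplay of the criteria is not detailed in the paper, so the function and its defaults are illustrative.

    def module_finished(theta, items_posed, min_items=3, max_items=6):
        """One reading of the ADEPT stopping rule for a module: stop after at
        most 6 questions, or after at least 3 questions once the ability
        estimate lies inside the allowed range [-3.0, 3.0]."""
        if items_posed >= max_items:
            return True
        return items_posed >= min_items and -3.0 <= theta <= 3.0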
3 Task-Based Environments
A task-based environment provides the student with tasks for a specific domain. The
rationale of this type of learning environment is that the student will learn by doing,
in a real-world task related to the domain being taught. There is no assessment of the
performance from the students while carrying out the tasks, but in some cases expla-
nations on the tasks are provided.
3.1 CATESE
The CALEAP-Web integrates two systems associated with assessing and learning
tasks, as follows [5]: Module 1 (Mod1) – assessment of the student with ADEPT to
determine his/her level of knowledge of Instrumental English and Module 2 (Mod2)
– tasks are suggested to the student using CATESE, according to his/her estimated
knowledge, particularly to address difficulties detected in the assessment stage.
Mod1 and Mod2 are integrated as illustrated in Fig. 1.
The sequence suggested by CALEAP-Web involves activities for Modules 1, 2, 4
and 3 of the EPI, presented below. In all tasks, chunks of text from well-written sci-
entific papers are retrieved. The cases may be retrieved as many times as the student
needs, and the selection is random.
Fig. 1. Integration Scheme in CALEAP-Web. Information for modeling the user performance (L1) includes the EPI module in which the student is deficient, the normalized score of the student in the test, the number of correct and incorrect answers, and the time taken for the test in the EPI module being assessed. At the end of the test of each module of the EPI, the student will be directed to CATESE if his/her performance was below a certain level (if 2 or more answers are wrong in a given module). This criterion is being used on an experimental basis. In the future, other criteria will be employed to improve the assessment of the users’ abilities, which may include: final abilities, number of questions answered, time of testing, etc. An example of the interaction between ADEPT and CATESE is the following: if the student
does not do well in Module 1 (involving Gap and Purpose) for questions associated with the
component Gap, he/she will be asked to perform a task related to Gap (see Task 1 in Section
3.1), but not Purpose. If the two wrong answers refer to Gap and Purpose, then two tasks will
be offered, one for each component. The information about the student (L2) includes the tasks
recommended to the student and monitoring of how these tasks were performed. It is provided
by CATESE to ADEPT, so that the student can take another EPI test in the module where
deficiencies were noted. If the performance is now satisfactory, the student will be taken to the
next test module.
Task 1 deals with the components Gap and Purpose of Module 1 from EPI, with
the texts retrieved belonging to two classes for the Gap component: Class A: special
words are commonly used to indicate the beginning of the Gap. Connectors such as
“however” and “but” are used for this purpose. The connector is followed immedi-
ately by a gap statement in the present or present perfect tense, which often contains
modifiers such as “few”, “little”, or “no”: Signal word + Gap (present or present per-
fect) + Research topic; Class B: subordinating conjunctions like “while”, “although”
and “though” can also be used to signal the gap. When such signals are used, the
sentence will typically include modifiers such as “some”, “many”, or “much” in the
first clause, with modifiers such as “little”, “few”, or “no” in the second clause: Signal
word + Previous work (present or present perfect) + Gap + topic.
In this classification two chunks of text are retrieved, where the task consists in the
identification and classification of markers in the examples, two of which are shown
below.
Class A: However, in spite of this rapid progress, many of the basic physics issues of x-
ray lasers remain poorly understood.
Class B: Although the origin of the solitons has been established, some of their physical
properties remained unexplained.
The texts retrieved for the Purpose component are classified as: Class A: the ori-
entation of the statement of purpose may be towards the report itself. If you choose
the report orientation you should use the present or future tense: Report orientation +
Main Verb (present or future) + Research question; Class B: the orientation of the
statement of purpose may be towards the research activity. If you choose the research
orientation you should use the past tense, because the research activity has already
been completed: Research orientation + Main Verb (past) + Research question.
The task consists in identifying and classifying the markers in the examples for
each class, illustrated below.
Class A: In this paper we report a novel resonant-like behavior in the latter case of diffu-
sion over a fluctuating barrier.
Class B: The present study used both methods to produce monolayers of C16MV on
silver electrode surfaces.
For the Review of the Literature, there are also three classes: Class A: Citations
grouped by approaches: better suited for reviews of the literature which encompass
different approaches; Class B: Citations ordered from general to specific: citations are
organized in order from those most distantly related to the study to those most closely
related; Class C: Citations ordered chronologically: used, for example, when de-
scribing the history of research in an area.
The last Task is related to Comprehension of Module 3 of EPI. Here a sequence of
discourse markers is presented to the student, organized according to their function
in the clause (or sentence). Also shown is an example of well-written text in English
with annotated discourse markers. Task 3 therefore consists in reading and verifying
examples of markers for each discourse function. The nine functions considered are:
contrast/opposition, signaling of further information/addition, similarity, exemplifi-
cation, reformulation, consequence/result, conclusion, explanation, deduc-
tion/inference. The student may navigate through the cases and after finishing, he/she
will be assessed by the CAT. It is believed that after being successful in the four
stages described above in the CALEAP-Web system, the student is prepared to un-
dertake the official test at ICMC-USP.
5 Evaluating CALEAP-Web
CALEAP-Web has been assessed according to two main criteria: item exposure of the
CAT module and robustness of the whole computational environment. With regard to
robustness, we ensured that the environment works as specified in all stages, with no
crash or error, by simulating students using the 4 tasks presented in Section 4. The
data from four students who evaluated ADEPT, graded as having an intermediate level of proficiency, were selected as a starting point of the simulation. All four tasks were performed and the environment proved robust enough to be used by prospective students in preparation for the official exam in 2004
at ICMC-USP. The analysis of item exposure is crucial to ensure a quality assess-
ment. Indeed, item exposure is critical because adaptive algorithms are designed to
select optimal items, thus tending to choose those with high discriminating power
(parameter a). As a result, these items are selected far more often than other ones,
leading to both over-exposure of some parts of the item pool and under-utilization of
others. The risk is that over-used items are often compromised as they create a secu-
rity problem that could jeopardize a test, especially if it’s a summative one. In our
CAT parameters a and c were constant for all the items, and therefore item exposure
depends solely on parameter b. To measure item exposure rate of the two types of
item from our EPI (simple and testlet) we performed two experiments, the first with
12 students who failed the 2003 EPI and another with 9 students that passed it. From
the 140 items only 66 were accessed and re-calibrated after both experiments, where 30 of them were from testlets. (The second author performed a pre-calibration of the parameter b of all 140 items in the bank, using a four-value table with the categories difficult, medium, easy and very easy, assigned the values 2.5, 1.0, -1.0 and -2.5, respectively.) Testlets are problematic because they impose application of questions as soon as they are selected. The 21 testlets of CAT involve 78 questions,
with 48 remaining non re-calibrated. As for the EPI modules, most calibrated ques-
tions were from modules 1 and 4 because they include simple questions, allowing
more variability in items choice. In experiment 1 questions 147 and 148 were ac-
cessed 9 times, with 16 questions being accessed only once and 89 were not accessed
at all. In experiment 2, the most accessed questions were 138, 139 and 51 with 9
accesses each. On the other hand, 16 questions had only one access and 83 were not
accessed at all. Taken together these results show the need to extend the studies with a
larger number of students in order to achieve a more precise item calibration.
6 Related Work
Particularly with the rapid expansion of open and distance-learning programs, fully-
automated tests are being increasingly used to measure student performance as an
important component in educational or training processes. This is illustrated by a
computer-based large-scale evaluation using specifically adaptive testing to assess
several knowledge types, viz. the Test of English as a Foreign Language
(http://www.toefl.org/). Other examples of learning environments with an assessment
module are the Project entitled Training of European Environmental trainers and
technicians in order to disseminate multinational skills between European countries
(TREE) [16, 17, 8] and the Intelligent System for Personalized Instruction in a Re-
mote Environment (INSPIRE) [18]. TREE is aimed at developing an Intelligent Tu-
toring System (ITS) for classification and identification of European vegetations. It
comprises three main subsystems, namely, an Expert System, a Tutoring System and
a Test Generation System. The latter, referred to as Intelligent Evaluation System
using Tests for Teleducation (SIETTE), assesses the student with a CAT implemented
with the CBAT-2 algorithm, the same we have used in this work. The task module is
the ITS. INSPIRE monitors the students’ activities, adapting itself in real time to
select lessons that are adequate to the level of knowledge of the student. It differs
from CALEAP-Web, which is based in the learn by doing paradigm. In INSPIRE
there is a module to assess the student with adaptive testing [19], also using the
CBAT-2 algorithm.
The tasks implemented in CALEAP-Web were all associated with English for academic purposes, but the rationale and the tools developed can be extended to other domains. ADEPT is readily portable because it only requires a change in the
bank of items. CATESE, on the other hand, needs to be rebuilt because the tasks are
domain specific. One major present limitation of CALEAP-Web is the small size of
the bank of items; furthermore, increasing this size is costly in terms of manpower
due to the time-consuming corpus analysis to annotate the scientific papers used in
both the adaptive testing and the task-based environment. With a reduced bank of
items, at the moment we recommend the use of the adaptive test of CALEAP-Web
only in formative tests and not in summative tests as we still have items with over-
exposure and a number of them under-utilized.
References
1. Aluisio, S.M., Oliveira Jr. O.N.: A case-based approach for developing writing tools
aimed at non-native English users. Lecture Notes in Artificial Intelligence, Vol. 1010.
Springer-Verlag, Berlin Heidelberg New York (1995) 121-132
2. Aluísio, S.M., Gantenbein, R.E.: Towards the application of systemic functional linguis-
tics in writing tools. Proceedings of International Conference on Computers and their Ap-
plications (1997) 181-185
3. Aluísio, S.M., Barcelos, I. Sampaio, J., Oliveira Jr., O N.: How to learn the many unwrit-
ten “Rules of the Game” of the Academic Discourse: A hybrid Approach based on Cri-
tiques and Cases. Proceedings of the IEEE International Conference on Advanced Learn-
ing Technologies, Madison/Wisconsin (2001) 257-260
4. Aluísio, S. M., Aquino, V. T., Pizzirani, R., Oliveira JR, O. N.: High Order Skills with
Partial Knowledge Evaluation: Lessons learned from using a Computer-based Proficiency
Test of English for Academic Purposes. Journal of Information Technology Education,
Califórnia, USA, Vol. 2, N. 1 (2003)185-201
5. Gonçalves, J. P.: A integração de Testes Adaptativos Informatizados e Ambientes
Computacionais de Tarefas para o aprendizado do inglês instrumental. (Portuguese).
Dissertação de mestrado, ICMC-USP, São Carlos, Brasil (2004)
6. Schank, R.: Engines for Education (Hyperbook ed.). Chicago, USA: ILS, Northwestern
University (2002). URL http://www.engines4ed.org/hyperbook/index.html
7. Olea, J., Ponsoda V., Prieto, G.: Tests Informatizados Fundamentos y Aplicaciones.
Ediciones Pirámede (1999)
8. Conejo, R., Millán, E., Cruz, J.L.P., Trella, M.: Modelado del alumno: um enfoque
bayesiano. Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial N. 12
(2001) 50–58. URL http://tornado.dia.fi.upm.es/caepia/numeros/12/Conejo.pdf
9. Lord, F. M.: Application of Item Response Theory to Practical Testing Problems.
Hillsdale, New Jersey, USA: Lawrence Erlbaum Associates (1980)
10. Huang, S.X.: A Content-Balanced Adaptive Testing Algorithm for Computer-Based
Training Systems. Intelligent Tutoring Systems (1996) 306-314
11. Weissberg, R., Buker, S.: Writing Up Research - Experimental Research Report Writing
for Students of English. Prentice Hall Regents (1990)
12. Oliveira, L. H. M.: Testes adaptativos sensíveis ao conteúdo do banco de itens: uma
aplicação em exames de proficiência em inglês para programas de pós-graduação.
(Portuguese). Dissertação de mestrado, ICMC-USP, São Carlos, Brasil (2002)
13. Huang, S.X.: On Content-Balanced Adaptive Testing. CALISCE (1996) 60-68
14. Collins, J.A., Geer, J.E., Huang, S.X.: Adaptive Assessment Using Granularity Hierarchies
and Bayesian Nets. Intelligent Tutoring Systems (1996) 569-577
15. Baker, F.: The Basics of Item Response. College Park, MD: ERIC Clearinghouse, Univer-
sity of Maryland (2001)
16. Conejo, R.; Rios, A., Millán, M.T.E., Cruz, J.L.P.: Internet based evaluation system.
AIED-International Conference Artificial Intelligence in Education, IOS Press (1999).
URL http://www.lcc.uma.es/~eva/investigacion/papers/aied99a.ps.
17. Conejo, R., Millán, M.T.E., Cruz, J.L.P., Trella,M.: An empirical approach to online
learning in Siette. Intelligent Tutorial Systems (2000) 604–615
18. Papanikolaou, K., Grigoriadou, M., Kornilakis, H., Magoulas, G.D.: Inspire: An intelli-
gent system for personalized instruction in a remote environment. Third Workshop on
Adaptive Hypertext and Hypermedia (2001) URL
http://wwwis.win.tue.nl/ah2001/papers/papanikolaou.pdf.
19. Gouli, E, Kornilakis, H.; Papanikolaou, K.; Grigoriadou. M.: Adaptive assessment im-
proving interaction in an educational hypermedia system. PC-HCI Conference (2001).
URL http://hermes.di.uoa.gr/lab/CVs/papers/gouli/F51.pdf
A Model for Student Knowledge Diagnosis Through
Adaptive Testing*
Abstract. This work presents a model for student knowledge diagnosis that can
be used in ITSs for student model update. The diagnosis is accomplished
through Computerized Adaptive Testing (CAT). CATs are assessment tools
with theoretical background. They use an underlying psychometric theory, the
Item Response Theory (IRT), for question selection, student knowledge
estimation and test finalization. In principle, CATs are only able to assess one
topic for each test. IRT models used in CATs are dichotomous, that is,
questions are only scored as correct or incorrect. However, our model can be
used to simultaneously assess multiple topics through content-balanced tests. In
addition, we have included a polytomous IRT model, where answers can be
given partial credit. Therefore, this polytomous model is able to obtain more
information from student answers than the dichotomous ones. Our model has
been evaluated through a study carried out with simulated students, showing
that it provides accurate estimations with a reduced number of questions.
1 Introduction
One of the most important features of Intelligent Tutoring Systems (ITSs) is the
capability of adapting instruction to student needs. To accomplish this task, the ITS
must know the student’s knowledge state accurately. One of the most common
solutions for student diagnosis is testing. The main advantages of testing are that it
can be used in quite a few domains and it is easy to implement. Generally, test-based
diagnosis systems use heuristic solutions to infer student knowledge. In contrast,
Computerized Adaptive Testing (CAT) is a well-founded technique, which uses a
psychometric theory called Item Response Theory (IRT). The CAT theory is not used
only with conventional paper-and-pencil test questions, that is, questions comprising a
stem and a set of possible answers. CAT can also include a wide range of exercises
[5]. On the contrary, CATs are only able to assess a single atomic topic [6]. This
restricts its applicability to structured domain models, since when in a test more than
one content area is being assessed, the test is only able to provide one student
* This work has been partially financed by LEActiveMath project, funded under FP6 (Contr.
N° 507826). The author is solely responsible for its content, it does not represent the opinion
of the EC, and the EC is not responsible for any use that might be made of data appearing
therein.
knowledge estimation for all content areas. In addition, in these multiple topic tests,
the content balance cannot be guaranteed.
In general, systems that implement CATs use dichotomous IRT based models. This
means that student answers to a question can only be evaluated as correct or incorrect,
i.e. no partial credit can be given. IRT has defined other kinds of response models
called polytomous. These models allow giving partial credit to item answers. They are
more powerful, since they make better use of the responses provided by students, and
as a result, student knowledge estimations can be obtained faster and more accurately.
Although in the literature there are many polytomous models, they are not usually
applied to CATs [3], because they are difficult to implement.
In this paper, a student diagnosis model is presented. This model is based on a
technique [4] of assessing multiple topics using content-balanced CATs. It can be
applied to declarative domain models structured in granularity hierarchies [8], and it
uses a discrete polytomous IRT inference engine. It could be applied in ITS as a
student knowledge diagnosis engine. For instance, at the beginning of instruction, to
initialize the student model by pretesting; during instruction, to update the student
model; and/or at the end of instruction, providing a global snapshot of the state of
knowledge.
The next section is devoted to showing the modus operandi of adaptive testing.
Section 3 supplies the basis of IRT. Section 4 is an extension of Section 3, introducing
polytomous IRT. In Section 5 our student knowledge diagnosis model is explained.
Here, the diagnosis procedure of this model is described in detail. Section 6 checks
the reliability and accuracy of the assessment procedure through a study with
simulated students. Finally, Section 7 discusses the results obtained.
2 Adaptive Testing
3) The item selection method: Adaptive tests select the next item to be posed depending on the
student’s estimated knowledge level (obtained from the answers to items previously
administered). 4) The termination criterion: Different criteria can be used to decide
when the test should finish, in terms of the purpose of the test.
The set of advantages provided by CATs is often addressed in the literature [11].
The main advantage is that it reduces the number of questions needed to estimate
student knowledge level, and as a result, the time devoted to that task. This entails an
improvement in student motivation. However, CATs contain some drawbacks. They
require the availability of huge item pools, techniques to control item exposure and to
detect compromised items. In addition, item parameters must be calibrated. To
accomplish this task, a large number of student performances are required, and this is
not always available.
IRT [7] has been successfully applied to CATs as a response model, item selection
and finalization criteria. It is based on two principles: a) Student performance in a test
can be explained by means of the knowledge level, which can be measured as an
unknown numeric value. b) The performance of a student with an estimated
knowledge level answering an item i can be probabilistically predicted and modeled
by means of a function called Item Characteristic Curve (ICC). It expresses the
probability that a student with certain knowledge level has to answer the item
correctly. Each item must define an ICC, which must be previously calibrated. There
are several functions to characterize ICCs. One of the most extended is the logistic
function of three parameters (3PL) [1] defined as follows:

$P(u_i = 1 \mid \theta) = c_i + \dfrac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}$   (1)

where $u_i = 1$ represents that the student has successfully answered item i. If the student answers incorrectly, $u_i = 0$. The three parameters that determine the shape of this curve are:
Discrimination factor $a_i$: it is proportional to the slope of the curve. High values indicate that the probability of success for students with a knowledge level higher than the item difficulty is high.
Difficulty $b_i$: it corresponds to the knowledge level at which the probability of answering correctly is the same as the probability of answering incorrectly. The range of values allowed for this parameter is the same as the one allowed for the knowledge levels.
Guessing factor $c_i$: it is the probability that a student with no knowledge at all will answer the item correctly by randomly selecting a response.
In our proposal, and therefore throughout this paper, the knowledge level is measured
using a discrete IRT model. Instead of taking real values, the knowledge level takes K
values (or latent classes) from 0 to K-1. Teachers decide the value of K in terms of the
assessment granularity desired. Likewise, each ICC is turned into a probability vector of K values.
IRT supplies several methods to estimate student knowledge. All of them calculate a
probability distribution curve $P(\theta \mid \mathbf{u})$, where $\mathbf{u}$ is the vector of items
administered to students. When applied to adaptive testing, knowledge estimation is
accomplished every time the student answers each item posed, obtaining a temporal
estimation. The distribution obtained after posing the last item of the test becomes the
final student knowledge estimation. One of the most popular estimation methods is
the Bayesian method [9]. It applies the Bayes theorem to calculate the student knowledge distribution after posing an item i:

$P(\theta \mid u_1,\ldots,u_i) \propto P(u_i \mid \theta)\; P(\theta \mid u_1,\ldots,u_{i-1})$   (2)
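With a discrete knowledge scale of K latent classes, this update reduces to an element-wise product followed by normalization. A minimal sketch, assuming ICCs are stored as vectors of K success probabilities; names are illustrative and not from the paper.

    def bayes_update(prior, icc, correct):
        """Discrete Bayesian update of the knowledge distribution (Equation 2).
        prior : K probabilities, one per knowledge level (latent class)
        icc   : discretized ICC, P(correct answer | level k) for each level k"""
        likelihood = icc if correct else [1.0 - p for p in icc]
        posterior = [pr * li for pr, li in zip(prior, likelihood)]
        total = sum(posterior)
        return [p / total for p in posterior] if total > 0 else list(prior)

    # Example with K = 5 levels, a uniform prior and a correct answer
    print(bayes_update([0.2] * 5, [0.1, 0.3, 0.5, 0.7, 0.9], correct=True))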
One of the most popular methods for selecting items is the Bayesian method [9]. It
selects the item that minimizes the expectation of a posteriori student knowledge
distribution variance. That is, taking the current estimation, it calculates the posterior
expectation for every non-administered item, and selects the one with the smallest
expectation value. Expectation is calculated as follows:

$E_i = \sum_{r \in \{0,1\}} P(u_i = r)\;\mathrm{Var}\!\left[ P(\theta \mid u_1,\ldots,u_{i-1}, u_i = r) \right]$   (3)

where r can take the value 0 or 1: r = 1 if the response is correct, and r = 0 otherwise. $P(u_i = r)$ is the scalar product between the ICC (or its inverse) of item i and the current estimated knowledge distribution.
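A sketch of this selection rule, reusing bayes_update from the previous sketch; helper names are illustrative assumptions.

    def variance(dist):
        """Variance of a discrete knowledge distribution over levels 0..K-1."""
        mean = sum(k * p for k, p in enumerate(dist))
        return sum(p * (k - mean) ** 2 for k, p in enumerate(dist))

    def expected_posterior_variance(prior, icc):
        """Expected posterior variance if this item were administered (Equation 3)."""
        p_right = sum(p * q for p, q in zip(prior, icc))   # scalar product with the ICC
        p_wrong = 1.0 - p_right                            # scalar product with its inverse
        return (p_right * variance(bayes_update(prior, icc, True)) +
                p_wrong * variance(bayes_update(prior, icc, False)))

    def pick_item(prior, candidate_iccs):
        """Select the non-administered item minimizing the expected variance."""
        return min(range(len(candidate_iccs)),
                   key=lambda i: expected_posterior_variance(prior, candidate_iccs[i]))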
4 Polytomous IRT
In dichotomous IRT models, items are only scored as correct or incorrect. In contrast,
polytomous models try to obtain as much information as possible from the student’s
response. They take into account the answer selected by students in the estimation of
knowledge level and in the item selection. For this purpose, these models add a new
type of characteristic curve associated to each answer, in the style of ICC. In the
literature these curves are called trace lines (TC) [3], and they represent the
probability that certain student will select an answer given his knowledge level.
To understand the advantages of this kind of model, let us look at the item
represented in Fig. 1 (a). A similar item was used in a study carried out in 1992 [10].
Student performances in this test were used to calibrate the test items. The calibrated
TCs for the item of Fig. 1 (a) are represented in Fig. 1 (b). Analyzing these curves, we
see that the correct answer is B, since students with the highest knowledge levels have
high probabilities of selecting this answer. Options A and D are clearly wrong,
because students with the lowest knowledge levels are more likely to select these
answers. However, option C shows that a considerable number of students with
medium knowledge levels tends to select this option. If the item is analyzed, it is
evident that for option C, although incorrect, the knowledge of students selecting it is
higher than the knowledge of students selecting A or D. Selecting A or D may be
assessed more negatively than selecting B. Answers like C are called distractors,
since, even though these answers are not correct, they are very similar to the correct
answers. In addition, polytomous models distinguish between selecting an option and leaving the item blank. Those students who do not select any option are modeled with the TC of the DK option. This answer is considered as an additional possible option and is known as the don’t know option.
Fig. 1. (a) A multiple-choice item, and (b) its trace lines (adapted from [10])
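To make the idea concrete, the following hypothetical trace lines mimic the pattern of Fig. 1(b): a correct answer B, a distractor C that attracts students of medium knowledge, two clearly wrong options, and the don't know (DK) answer. The numbers are invented for illustration only.

    # Hypothetical trace lines (TCs) for a 4-option item plus "don't know",
    # discretized over K = 5 knowledge levels (0 = lowest, 4 = highest).
    # Each list gives P(answer | level); the five values per level sum to 1.
    trace_lines = {
        "A":  [0.30, 0.20, 0.10, 0.05, 0.02],   # clearly wrong option
        "B":  [0.10, 0.20, 0.40, 0.70, 0.90],   # correct answer
        "C":  [0.15, 0.30, 0.35, 0.20, 0.06],   # distractor, attracts medium levels
        "D":  [0.25, 0.15, 0.05, 0.03, 0.01],   # clearly wrong option
        "DK": [0.20, 0.15, 0.10, 0.02, 0.01],   # blank / "don't know" answer
    }

    # Check that, at every knowledge level, the probabilities sum to 1.
    for level in range(5):
        assert abs(sum(tc[level] for tc in trace_lines.values()) - 1.0) < 1e-9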
Domain models can be structured on the basis of subjects. Subjects may be divided
into different topics. A topic can be defined as a concept regarding which student
knowledge can be assessed. They can also be decomposed into other topics and so on,
forming a hierarchy with a degree of granularity decided by the teacher. In this
hierarchy, leaf nodes represent a unique concept or a set of concepts that are
indivisible from the assessment point of view. Topics and their subtopics are related
by means of aggregation relations, and no precedence relations are considered. For
diagnosis purposes, this domain model could be extended by adding a new layer to
include two kinds of components: items and test specifications. This extended model
has been represented in Fig. 2. The main features of these new components are the
following:
the product of the number of answers of i, with the number of topics assessed using i.
In this section, the elements required for diagnosis have been depicted. The next
subsection will focus on how the diagnosis procedure is accomplished.
This testing algorithm follows the steps described in Section 2, although item
selection and knowledge estimation procedures differ because of the addition of a
discrete polytomous response model. Student knowledge estimation uses a variation
of the Bayesian method described in Equation 2. After administering item i, the new
estimated knowledge level in topic j is calculated using Equation 4:

$P(\theta_j \mid u_1,\ldots,u_i) \propto T_{i,r}(\theta_j)\; P(\theta_j \mid u_1,\ldots,u_{i-1})$   (4)

Note that the TC corresponding to the student answer, $T_{i,r}$, has replaced the ICC term. Being r the answer selected by the student, it can take values from 1 to the number of answers R. When r is zero, it represents the don’t know answer.
Once the student has answered an item, this response is used to update student
knowledge in all topics that are descendents of topic j. Let us suppose test (Fig.
1(b)) is being administered. If item has just been administered, student knowledge
estimation in topic is updated according to Equation 4. In addition, item
provides information about student knowledge in topics and Consequently,
the student knowledge estimation in these topics is also updated using the same
equation.
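A minimal sketch of this propagation, assuming each administered item carries, for every topic it gives evidence about, the trace line of the answer the student selected; the data layout is an illustrative assumption.

    def update_topics(knowledge, evidence):
        """Propagate the answer to every topic the item informs about.
        knowledge : dict topic -> discrete distribution over K levels
        evidence  : dict topic -> trace line of the selected answer for that topic
        Each topic is updated with the same Bayesian rule (Equation 4)."""
        for topic, tc in evidence.items():
            prior = knowledge[topic]
            posterior = [p * t for p, t in zip(prior, tc)]
            total = sum(posterior)
            if total > 0:
                knowledge[topic] = [p / total for p in posterior]
        return knowledge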
The item selection mechanism modifies the dichotomous Bayesian one (Equation
3). In this modification, expectation is calculated from the TCs, instead of the ICC (or
its inverse), in the following way:

$E_i = \sum_{r=0}^{R} P(u_i = r)\;\mathrm{Var}\!\left[ P(\theta_j \mid u_1,\ldots,u_{i-1}, u_i = r) \right]$   (5)

where $\theta_j$ represents student knowledge in topic j. Topic j is one of the test topics. Let us take the test again. Expectation is calculated for all (non-administered) items that assess the test topics or any of their descendants. Note that Equation 5 must always be applied to knowledge distributions in the test topics, since the main goal of the test is to estimate student knowledge in these topics. The remaining estimations can be
considered as a collateral effect. Additionally, this model guarantees content-balanced
tests. The adaptive selection engine itself tends to select the item that makes the
estimation more accurate [4]. If several topics are assessed, the selection mechanism
is separated in two phases. In the first one, it will select the topic whose student
knowledge distribution is the least accurate. The second one selects, from items of
this topic, the one that contributes the most to increase accuracy.
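The two-phase selection can be sketched as follows; names are illustrative, and items are assumed to store one trace line per possible answer (including don't know).

    def select_next_item(knowledge, items_by_topic, administered):
        """Two-phase selection: 1) pick the test topic whose estimate is least
        accurate (largest variance); 2) pick, among its non-administered items,
        the one minimizing the expected posterior variance (Equation 5)."""
        def variance(dist):
            mean = sum(k * p for k, p in enumerate(dist))
            return sum(p * (k - mean) ** 2 for k, p in enumerate(dist))

        def posterior(prior, tc):
            post = [p * t for p, t in zip(prior, tc)]
            total = sum(post)
            return [p / total for p in post] if total else prior

        def expected_variance(prior, trace_lines):
            # Sum over every possible answer r (Equation 5)
            return sum(sum(p * t for p, t in zip(prior, tc))   # P(u_i = r)
                       * variance(posterior(prior, tc))
                       for tc in trace_lines)

        topic = max(knowledge, key=lambda t: variance(knowledge[t]))   # phase 1
        candidates = [i for i in items_by_topic[topic] if i not in administered]
        prior = knowledge[topic]                                       # phase 2
        best = min(candidates, key=lambda i: expected_variance(prior, i["trace_lines"]))
        return topic, best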
6 Evaluation
Some authors have pointed out the advantages of using simulated students for
evaluation purposes [12], since this kind of student allows having a controlled
environment, and contributes to ensuring that the results obtained in the evaluation are
correct. This study consists of a comparison of two CAT-based assessment methods:
the polytomous versus the dichotomous one. It uses a test of a single topic, which
contains an item pool of 500 items. These items are multiple-choice items with four
answers, where the don’t know answer is included. The test stops when the knowledge
estimation distribution has a variance that is less than The test has been
administered to a population of 150 simulated students. These students have been
generated with a real knowledge level that is used to determine their behavior during
the test. Let us assume that the knowledge level of the student John is When an
item i is posed, John’s response is calculated by generating a random probability
value v . The answer r selected by John is the one that fulfils,
Using the same population and the same item pool, two adaptive tests have been
administered for each simulation. The former uses polytomous item selection and
knowledge estimation, and the latter dichotomous item selection and knowledge
estimation. Different simulations of test execution have been accomplished changing
the parameters of the item curves. ICCs have been generated (and are assumed to be
well calibrated), before each simulation, according to these conditions. The correct
answer TC corresponds to the ICC, and the incorrect response TCs are calculated in
such a way that their sum is equal to 1-ICC. Simulation results are shown in Table 1.
In Table 1 each row represents a simulation of the students taking a test with the
features specified in the columns. Discrimination factor and difficulty of all items of
the pool are assigned the value indicated in the corresponding column, and the
guessing factor is always zero. When the value is “uniform”, item parameter values
have been generated uniformly along the allowed range. The last three columns
represent the results of simulations. “Item number average” is the average of items
posed to students in the test; “estimation variance average” is the average of the final
knowledge estimation variances. Finally, “success rate” is the percentage of students
assessed correctly. This last value has been obtained by comparing real student
knowledge with the student knowledge inferred by the test. As can be seen, the best
improvements have been obtained for a pool of items with a low discrimination
factor. In this case, the number of items has been reduced drastically. The polytomous
version requires less than half of the dichotomous one, and the estimation accuracy is
only a bit lower. The worst performance of the polytomous version takes place when
items have a high discrimination factor. This can be explained because high
discrimination ICCs get the best performance in dichotomous assessment. In contrast,
for the polytomous test, TCs have been generated with random discriminations, and
as a result, TCs are not able to discriminate as much as dichotomous ICCs. In the
most realistic case, i.e. the last two simulations, item parameters have been calculated
uniformly. In this case, test results for the polytomous version are better than the
dichotomous one, since the higher the accuracy, the lower the number of items
required. In addition, the evaluation results obtained in [4] showed that the simultaneous assessment of multiple topics is able to produce a content-balanced item selection.
Teachers do not have to specify, for instance, the percentage of items that must be
administered for each topic involved in the test.
7 Discussion
This work proposes a well-founded student diagnosis model, based on adaptive
testing. It introduces some improvements in traditional CATs. It allows simultaneous
assessment of multiple topics through content-balanced tests. Other approaches have
presented content-balanced adaptive testing, like the CBAT-2 algorithm [6]. It is able
to generate content-balanced tests, but in order to do so, teachers must manually
introduce the weight of topics in the global test for the item selection. However, in our
model, item selection is carried out adaptively by the model itself. It selects the next
item to be posed from the topic whose knowledge estimation is the least accurate.
Additionally, we have defined a discrete, IRT-based polytomous response model.
The evaluation results (where accuracy has been overstated to demonstrate the
strength of the model) have shown that, in general, our polytomous model makes
more accurate estimations and requires fewer items.
The model presented has been implemented and is currently used in the SIETTE
system [2]. SIETTE is a web-based CAT delivery and elicitation tool
(http://www.lcc.uma.es/siette) that can be used as a diagnosis tool in ITSs. Currently,
we are working on TC calibration techniques. The goal is to obtain a calibration
mechanism that minimizes the number of prior student performances required to
calibrate the TCs.
References
1. Birnbaum, A. Some Latent Trait Models and Their Use in Inferring an Examinee’s Mental
Ability. In : Lord, F. M. and Novick, M. R, eds. Statistical Theories of Mental Test Scores.
Reading, MA: Addison-Wesley; 1968.
2. Conejo, R.; Guzmán, E.; Millán, E.; Pérez-de-la-Cruz, J. L., and Trella, M. SIETTE: A
web-based tool for adaptive testing. International Journal of Artificial Intelligence in
Education (forthcoming).
3. Dodd, B. G.; DeAyala, R. J., and Koch, W. R. Computerized Adaptive Testing with
Polytomous Items. Applied Psychological Measurement. 1995; 19(1):pp. 5-22.
4. Guzmán, E. and Conejo, R. Simultaneous evaluation of multiple topics in SIETTE. LNCS,
2363. ITS 2002. Springer Verlag; 2002: 739-748.
5. Guzmán, E. and Conejo, R. A library of templates for exercise construction in an adaptive
assessment system. Technology, Instruction, Cognition and Learning (TICL)
(forthcoming).
6. Huang, S. X. A Content-Balanced Adaptive Testing Algorithm for Computer-Based
Training Systems. LNCS, 1086. ITS 1996. Springer Verlag; 1996: pp. 306-314.
7. Lord, F. M. Applications of item response theory to practical testing problems. Hillsdale,
NJ: Lawrence Erlbaum Associates; 1980.
8. McCalla, G. I. and Greer, J. E. Granularity-Based Reasoning and Belief Revision in
Student Models. In: Greer, J. E. and McCalla, G., eds. Student Modeling: The Key to
Individualized Knowledge-Based Instruction. Springer Verlag; 1994; 125 pp. 39-62.
9. Owen, R. J. A Bayesian Sequential Procedure for Quantal Response in the Context of
Adaptive Mental Testing. Journal of the American Statistical Association. 1975 Jun;
70(350):351-371.
10. Thissen, D. and Steinberg, L. A Response Model for Multiple Choice Items. In: Van der
Linden, W. J. and Hambleton, R. K., (eds.). Handbook of Modern Item Response Theory.
New York: Springer-Verlag; 1997; pp. 51-65.
11. van der Linden, W. J. and Glas, C. A. W. Computerized Adaptive Testing: Theory and
Practice. Netherlands: Kluwer Academic Publishers; 2000.
12. VanLehn, K.; Ohlsson, S., and Nason, R. Applications of Simulated Students: An
Exploration. Journal of Artificial Intelligence in Education. 1995; 5(2):135-175.
A Computer-Adaptive Test That Facilitates the
Modification of Previously Entered Responses:
An Empirical Study
1 Introduction
The use of computer-adaptive tests (CATs) has been increasing, and they are indeed replacing traditional computer-based tests (CBTs) in various areas of education and training.
Projects such as SIETTE [4] and the replacement of CBTs with CATs in large scale
examinations such as the Graduate Management Admission Test (GMAT) [6], Test of
English as a Foreign Language (TOEFL) [20], Graduate Record Examination (GRE) [20], Armed Services Vocational Aptitude Battery (ASVAB) [20] and Microsoft
Certified Professional [13] are evidence of this trend.
CATs differ from the conventional CBTs primarily in the approach used to select
the set of questions to be administered during a given assessment session. In a CBT,
the same set of questions is administered to all students. Because of individual differ-
ences in knowledge levels within the subject domain being tested, this static approach
often poses problems for certain students. For instance, what might seem a difficult
and therefore bewildering question to one student could seem too easy and thus un-
interesting to another.
By dynamically selecting questions to match the estimated ability level of each in-
dividual student, the CAT approach offers higher levels of individualisation and in-
teraction than those offered by traditional CBTs. By tailoring the level of difficulty of
the questions presented to each individual student according to his or her previous
responses, it is intended that a CAT would mimic aspects of an oral examination [5,
19]. Similar to a real oral exam, the first question to be administered within a CAT is
typically one of medium difficulty. In the event of the student providing a correct
response, a more difficult question will be administered next. Conversely, an incor-
rect response will cause an easier question to follow.
The underlying concept within CATs is that questions that are too difficult or too
easy provide little or no information regarding a student’s knowledge within the sub-
ject domain. Only those few questions exactly at the boundary of the student’s
knowledge provide tutors with valuable information about the level of a student’s
ability. The questions administered during a given CAT session are intended to be at this level of difficulty, which is therefore continually re-evaluated in order to establish the boundary of the learner's knowledge.
Adaptive algorithms within CATs are usually based on Item Response Theory
(IRT), which is a family of mathematical functions used to predict the probability of a
student answering a given question correctly [12]. The CAT prototype used in this study is based on the Three-Parameter Logistic Model, and the mathematical function shown in Equation 1 [12] is used to evaluate the probability P of a student with an unknown ability θ correctly answering a question of difficulty b, discrimination a and pseudo-chance c.
In order to evaluate the probability Q of a student with an unknown ability θ incorrectly answering such a question, the complementary function Q(θ) = 1 − P(θ) is used [12]. Within a CAT, the question to be administered next, as well as the final score obtained by any given student, is computed from the set of previous responses using the mathematical function shown in Equation 2 [12], which yields the ability estimate. The discussion of IRT here is necessarily brief; the interested reader is referred to the work of Lord [12] and Wainer [20].
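To make the preceding description concrete, the following sketch assumes the standard three-parameter logistic function with the usual 1.7 scaling constant and a deliberately simple selection rule (choose the unanswered item whose difficulty is closest to the current ability estimate); the function names, the item bank and the selection rule are illustrative and not taken from the prototype described in this paper.

```python
import math

def p_correct(theta, a, b, c):
    """Standard 3PL model: probability that a student of ability theta answers
    an item with discrimination a, difficulty b and pseudo-chance c correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def next_item(theta, items, answered):
    """Pick the unanswered item whose difficulty is closest to the current
    ability estimate (a simple stand-in for information-based selection)."""
    candidates = [i for i in items if i["id"] not in answered]
    return min(candidates, key=lambda i: abs(i["b"] - theta))

# Toy item bank and a single selection step.
bank = [
    {"id": 1, "a": 1.0, "b": -1.0, "c": 0.2},
    {"id": 2, "a": 1.2, "b":  0.0, "c": 0.2},
    {"id": 3, "a": 0.8, "b":  1.5, "c": 0.2},
]
theta_hat = 0.3
item = next_item(theta_hat, bank, answered={2})
print(item["id"], round(p_correct(theta_hat, item["a"], item["b"], item["c"]), 3))
```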
This paper marks further progress in research previously done at the University of
Hertfordshire on the use of computerised adaptive testing in Higher Education. In the
next section of this paper, we present a summary of two empirical studies concerning
the use of CATs in Higher Education, followed by the main findings of our most
recent study.
from this statistical analysis corroborate the findings from the first empirical study in
that the CAT approach was a fair method of assessment and also potentially capable
of offering a more consistent and accurate measurement of student ability than that
offered by conventional CBTs. The latter student performance analysis also indicated
that a score obtained by a student in one of the four assessments was a fair and satis-
factory predictor of performance in any other. A further finding from this second
empirical study was that learners with different cognitive styles were not disadvan-
taged by the CAT approach. This is a brief account of the findings from our second
empirical study, which are described in full by Barker & Lilley [2] and Lilley &
Barker [8].
In both studies, student feedback regarding the CAT approach was mainly positive.
Although students were receptive to the idea of computerised adaptive testing, some
students expressed their concern about not being able to review and modify previ-
ously entered responses. This aspect of computerised adaptive testing is discussed in
the next section of this paper.
The underlying idea within a CAT is that the ability of a test-taker can be estimated
based on his or her responses to a set of items by using the mathematical functions
provided by Item Response Theory. There is a common assumption that, within a
CAT, test-takers should not be allowed to review and modify previously entered
responses [17, 20], as this might compromise the legitimacy of the test and the appro-
priateness of the set of questions selected for each individual participant. For instance,
it is often assumed that returning to previously answered questions might provide
students with an opportunity to obtain correct responses by intelligent guessing.
Based on the premise that a student has an understanding of how the adaptive algo-
rithm works, if this student answers a question and the following question is an easier
one, he or she can deduce that the previous response was incorrect. This would, in
turn, allow the student to keep modifying his/her responses until the following ques-
tion was a more difficult one.
Nevertheless, previous work by the authors [8, 10] suggests that the inability to review and modify previously entered responses could lead to an increase in student anxiety levels and perceived loss of control over the application. Test-takers from a
study by Rafacz & Hetter [17] expressed a similar concern. Olea et al. have also re-
ported student preference towards CATs in which the review and modification of
previously entered responses is permitted [15]. In summary, students seem to favour
a computer-assisted assessment in which they have more control over the application
and the review and modification of previously entered responses is permitted. Fur-
thermore, the inability to return to and modify responses seemed to be contrary to
previous experiences of most students who have taken tests either in the traditional
CBT or paper-and-pencil formats.
In order to provide students with more control over the application, we first considered allowing students to review and modify previously entered responses at any point during the test.
Our CAT prototype was modified to allow students to revise previously entered re-
sponses immediately after all questions have been administered and answered. In
order to investigate the feasibility of the approach, we performed an empirical study
using the modified version of the prototype. This empirical study is described in the
next section of this paper.
4 The Study
Within this most recent version of the prototype, students were expected to answer 30
multiple-choice questions within a 40-minute time limit. The 30 questions were divided into two groups: a set of 10 non-adaptive questions (i.e. CBT) followed by 20 adaptive questions (i.e. CAT). The set of CBT questions was identical for all participating students. Not only would this be a useful addition for comparative
purposes, but it would also help ensure that the test was fair and that no student would
be disadvantaged by taking part in the study. Students were allowed to review and
modify CBT and/or CAT responses only after all 30 questions had been answered.
The empirical study described here had two groups of participants. The first group
(CD2) comprised second year students from a Higher National Diploma (HND) pro-
gramme in Computer Science. The second group (CS2) consisted of second year
students from a Bachelor of Science (BSc) programme in Computer Science. Both
groups of participants took the same tests as part of their normal assessment for a
programming module. The first assessment took place after 6 weeks of the course
and the second after 9 weeks.
The CAT was based on a database of 215 questions that were independently
ranked according to their difficulty by experts and assigned a value for the b parameter. Values for the b parameter were assigned according to Bloom's taxonomy of cognitive skills [1, 16], as shown in Table 1.
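Purely for illustration, since Table 1 is not reproduced here, the fragment below shows one plausible shape such a mapping could take; the numeric b values assigned to each Bloom level are hypothetical.

```python
# Hypothetical assignment of difficulty (b) values to Bloom levels;
# the actual values used in the study are those given in Table 1.
BLOOM_TO_B = {
    "knowledge":     -2.0,
    "comprehension": -1.0,
    "application":    0.0,
    "analysis":       1.0,
    "synthesis":      1.5,
    "evaluation":     2.0,
}

def difficulty_for(bloom_level):
    return BLOOM_TO_B[bloom_level.lower()]

print(difficulty_for("Application"))  # -> 0.0
```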
Questions for the CBT component of the tests were also drawn from this database
across the range of difficulty levels. Participants were given a brief introduction to the
use of the software, but were unaware of the full purpose of the study and of the CAT component of the tests until after both assessments had been completed. Each assess-
ment was conducted under supervision in computer laboratories. We present the main
findings from this study in the next section of this paper.
5 Results
The mean scores obtained by both groups of students are shown in Table 2. In Table
2, the mean value of the estimated ability for the adaptive component of the as-
sessment session is presented in the column named “CAT Level”. The estimated
ability ranged from –2 to +2, in intervals of 0.01.
Table 3 shows the results obtained by those students who made use of the option to
modify their previous answers. It can be seen from Table 3 that, for both groups,
most students who used the review facility increased rather than lowered their final
scores. Further analysis was performed on the data from only one test (Test 2), as the
data from this test was the most complete.
Table 4 shows the number of students who made use of the review option on Test
2. Olea et al. [15] have reported that 60% of the participants in their study changed at
least one answer. Similarly, approximately 92% of CD2 participants in this study
used the review function on Test 2 and 60% of the participants changed at least one
answer. As for the CS2 group, it can be seen from Tables 3 and 4 that 92% of this
group used the review functions, but only 46% of the students changed at least one
answer.
Table 5 shows the mean changes in scores obtained on Test 2 for students from
CS2 and CD2 groups. In addition, it shows the results of an Analysis of Variance
(ANOVA) on the data summarised in the columns “Mean score before review”,
“Mean score after review” and “Mean change”.
Table 6 shows the mean scores obtained by CS2 students who took Test 2, ac-
cording to their usage of the review option. An Analysis of Variance (ANOVA) was
performed on the data summarised in Table 6 to examine any significance of differ-
ences in the mean scores obtained for the two groups.
Mean standard error is presented in Figure 1 for a random sample of 45 CS2 stu-
dents who took Test 2. The subject sample was subdivided into three groups: 15 stu-
dents who performed well, 15 students who performed in the middle range and 15
students who performed poorly. Figure 2 shows the standard error for a random sam-
ple of 45 CD2 students who took Test 2. Similarly to Figure 1, the CD2 subject sam-
ple is subdivided into three groups: 15 students who performed well (i.e. “high per-
forming participants”), 15 students who performed in the middle range (i.e. “mid-
range performing participants”) and 15 students who performed poorly (i.e. “low
performing participants”).
It can be seen from Figures 1 and 2 that, irrespective of group or performance, the
standard error tends to a constant value of approximately 0.1.
An important finding from our earlier work [8, 9, 10, 11] was that students valued the
ability to review and change answers in paper-based and traditional CBT assessments.
Similarly, in focus group studies, participants reported that they would like the ability
to review and change responses to CAT questions before they were submitted. A
basic assumption of the CAT method is that the next question to be administered to a
test-taker is determined by his or her set of previous answers. In this study, the CAT
software was modified to allow review and change of selection at the end of the test.
This method presented itself as the simplest from a limited range of options. The
solution implemented allowed students the flexibility to modify their responses to
questions prior to submission, without the need for the programmers to change the
adaptive algorithm upon which the CAT was based. At the end of the test, the ability
of each individual student was recalculated using his/her latest set of responses. It was
important to test the effect of this modification to the CAT on the performance of
students. Overall, the data presented in Table 3 suggested that most learners were
able to improve their scores on the CAT and CBT components of the tests by re-
viewing their answers prior to submission.
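A minimal sketch of this recalculation step, assuming the three-parameter logistic model and a simple grid-search maximum-likelihood estimate over the ability range mentioned earlier; the item parameters and the response vector are invented, and the actual prototype may use a different estimation routine.

```python
import math

def p_correct(theta, a, b, c):
    # Standard 3PL response probability.
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def estimate_ability(items, responses, grid=None):
    """Grid-search maximum-likelihood ability estimate from the latest
    (possibly revised) responses; responses[i] is 1 if item i was correct."""
    grid = grid or [g / 100.0 for g in range(-200, 201)]  # -2.00 .. +2.00
    def log_lik(theta):
        ll = 0.0
        for (a, b, c), u in zip(items, responses):
            p = p_correct(theta, a, b, c)
            ll += math.log(p if u else (1.0 - p))
        return ll
    return max(grid, key=log_lik)

items = [(1.0, -0.5, 0.2), (1.2, 0.3, 0.2), (0.9, 1.0, 0.2)]
print(estimate_ability(items, [1, 1, 0]))
```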
Differences in the performance of the two groups of learners were interesting. Table 5 showed that, for the CS2 (BSc Computer Science) group, the option to review led to a significant increase in performance in terms of the percentage of
correct responses in the CBT (p<0.01), the percentage of correct responses in the
CAT (p<0.001) and the CAT level obtained (p<0.001). This was not the case for the
CD2 group. The CD2 (HND Computer Science) group had performed less well than
the CS2 group on both tests. Analysis of Variance of the data presented in Table 2
showed that for Tests 1 and 2, the CS2 group performed significantly better than the
CD2 group (p<0.001). The option to review, although used by most students, did not
lead to significantly better performance in the CBT sections of the course (p=0.38) or
in the final CAT level achieved (p=0.75). There was a significant improvement in the
percentage of CAT responses answered correctly (p<0.001), although this did not
lead to a significant increase in CAT level.
The reasons for this difference are possibly related to the CAT method and to the
ability of the students in each group. Only by getting the difficult questions correct
during the review process will the CAT level be affected significantly.
This seemed to be harder to do for the CD2 students. The CS2 learners performed significantly better on the CAT than the CD2 group. CS2 learners were more likely to correct the more difficult questions they got wrong prior to review. CD2 learners were more likely to correct the simpler questions they got wrong the first time, which has little effect on the CAT level, but has an effect on the CAT % score. It is as if
there is a barrier above which the CD2 learners were not able to go. This is supported
by the fact that there was no significant change in the CBT % after review for the
CD2 group, showing that when the questions are set above their CAT level (as many
of the CBT questions were) then they did not improve their score by changing their
answers to them. When the answers to the more difficult questions were reviewed and
modified, they were less likely to get them right. Changing only the easier questions
has little effect up or down on the CAT level. CS2 students are able to perform at a
higher level and the barrier was not evident for them. It is interesting to note that they
were able to significantly change their performance on both the CBT and CAT sections of the test after review.
Of further interest is a comparison of the standard error curves for CS2 and CD2
groups of students. The standard error for both groups and for all levels of perform-
ance was very similar and relatively constant. The adaptive nature of the CAT test
will ensure that the final CAT level achieved by learners on the test is fairly constant
after relatively few questions. The approach of allowing students to modify their
responses at the end of the CAT is not likely to change the final level of the test-taker
significantly, unless they have performed slightly below their optimum level first time
round. It is possible that CS2 students adopt a different strategy when answering the
CAT from CD2 students. Perhaps CS2 students enter their answers more quickly and
rely on the review process to modify them, whereas CD2 students enter them at their
best level first time. This would explain the difference in performance after the re-
view for both groups. It would be of interest to investigate the individual strategies
adopted by learners on CATs in future work.
In summary, all students valued the option to review, even though in many cases
this had little effect on the final levels achieved. Certainly less able learners did not
significantly improve performance by reviewing their answers, though most were
able to improve their scores slightly. Some learners performed less well after review,
though slightly more gained than lost by reviewing.
It is likely that the attitude of learners to the review process was an important fea-
ture. The effect on motivation was reported in earlier studies and for this reason alone
it is probably worth allowing review in CATs. Reflection is an important study skill
that should be fostered. A mature approach to examination requires reflection and it
is still the best advice to students to read over their answers before finishing a test.
Standard error (SE) was shown to be a reliable stopping condition for a CAT, since
for both groups of students at three levels of performance the SE was approximately
the same.
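The sketch below illustrates such a stopping rule under standard three-parameter logistic assumptions: the standard error is computed from the accumulated item information and the test stops once it falls below a threshold. The information formula is the usual one for the 3PL model, while the threshold value and item parameters are assumptions for illustration.

```python
import math

def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):
    # Standard 3PL item information function.
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (1.7 * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def should_stop(theta, administered, se_threshold=0.3):
    """Stop once the standard error of the ability estimate drops below the threshold."""
    info = sum(item_information(theta, a, b, c) for a, b, c in administered)
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    return se <= se_threshold, se

stop, se = should_stop(0.2, [(1.0, 0.0, 0.2), (1.1, 0.4, 0.2), (0.9, -0.3, 0.2)])
print(stop, round(se, 2))
```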
References
1. Anderson, L.W. & Krathwohl, D.R. (Eds.) (2001). A Taxonomy for Learning, Teaching,
and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York:
Longman.
2. Barker, T. & Lilley, M. (2003). Are Individual Learners Disadvantaged by the Use of
Computer-Adaptive Testing? In Proceedings of the 8th Learning Styles Conference. Uni-
versity of Hull, United Kingdom, European Learning Styles Information Network
(ELSIN), pp. 30-39.
3. Carlson, R. D. (1994). Computer-Adaptive Testing: a Shift in the Evaluation Paradigm.
Journal of Educational Technology Systems, 22(3), pp 213-224.
4. Conejo, R., Millán, E., Pérez-de-la-Cruz, J. L. & Trella, M. (2000). An Empirical Ap-
proach to On-Line Learning in SIETTE. Lecture Notes in Computer Science (2000) 1839,
pp. 605-614.
5. Freedle, R. O. & Duran, R. P. (1987). Cognitive and Linguistic Analyses of test perform-
ance. New Jersey: Ablex.
1 Introduction
autonomy). But other researchers have also underlined the necessity of monitoring and sometimes guiding and helping (i.e. coaching) students in order to keep them focused on learning activities [5, 6, 11]. In this paper, we propose a system design that gives learners autonomy and, at the same time, monitors/coaches them in an e-Learning system.
Contrary to O'Regan's outcomes, non-controlling virtual environments (for example Virtual Harlem [13]) receive positive feedback from learners and appear to be very motivating, but those systems do not adapt to learners' specificities. ITSs, for their part, provide organized knowledge and personalized help to learners but are very controlling systems. Thus, we propose a hybrid e-Learning system design that combines a non-controlling virtual environment with agents inspired by pedagogical agents [8], and we show how this system can enhance learner motivation.
In section two, we give an overview of some of the main AM theories and present some of the motivation factors that have been defined in the AM field. In section three, we emphasize the importance of learner autonomy for maintaining a high level of motivation, and we focus on the necessity of finding a balance between coaching and autonomy. In section four, we propose a virtual learning environment for maintaining and enhancing motivation in an e-Learning activity. This environment enhances the learner's motivation by giving him more control over his learning activity, allowing him to explore solutions, make hypotheses, interact and play different roles.
Petri [14] defines motivation as “the concept we use when we describe the forces
acting on or within an organism to initiate and direct behavior”. He also notices that
motivation is used “to explain differences in the intensity of behavior” and “to
indicate the direction of behavior. When we are hungry, we direct our behavior in
ways to get food”.
The study of motivation is a huge domain in psychology. AM is a sub-part of this domain in which "theorists attempt to explain people's choice of achievement tasks, persistence on those tasks, vigor in carrying them out, and quality of task engagement" [5]. There are many different theories dealing with AM. Current theories mostly refer to the social-cognitive model of motivation proposed by Bandura [1], described by Eccles and her colleagues [5] as postulating that "human achievement depends on interactions between one's behavior, personal factors (e.g. thoughts, beliefs) and environmental conditions". In the next part, we review some of the main current AM theories.
Many elements can affect AM. In the AM literature, those which appear most frequently are individual goals, social environment, emotions, intrinsic interest in an activity and self-beliefs. Relations exist between all these factors, so they must not be seen as independent of one another.
Individual Goals. As explained by Eccles et al [6], research shows that a learner can develop ego-involved goals (if he wants to maximize the probability of a good evaluation of his competences and create a positive image of himself), task-involved goals (if he wants to master tasks and improve his competences) or work-avoidant goals (if he wants to minimize effort). In fact, goals are generally said to be oriented towards performance (ego-involved) or mastery (task-involved).
children who believe they control their achievement outcomes should feel more
competent. They also made a link between control beliefs and competence needs and
hypothesized that the fulfillment of needs was influenced by social environment
characteristics (such as autonomy provided to the learner). In her AM model, Skinner
[16] described a control belief as an expectation a person has to be able to produce
desired events.
In the next part, we show the value, for e-Learning, of giving the learner the belief that he has control (i.e. autonomy) over his activities and achievements.
Eccles et al [6] reported that many psychologists “have argued that intrinsic
motivation is good for learning” and that “classroom environments that are overly
controlling and do not provide adequate autonomy, undermine intrinsic motivation,
mastery orientation, ability self-concepts and expectation and self-direction, and
induce a learner helplessness response to difficult tasks”. Flink et al [7] conducted an experiment along these lines. They created homogeneous groups of learners and asked different teachers to teach either with a controlling methodology or by giving autonomy to the learner. All the sessions were videotaped. They then showed the tapes to a group of observers and asked them who the best teachers were. The observers answered that the teachers with the controlling style were better (maybe because they seemed more active, directive and better organized [6]). In fact, the learners given more autonomy obtained significantly better results. Other researchers have found similar results.
In Deci and Ryan’s Self-Determination Theory [15], a process called
internalization is presented. As Eccles et al mentioned [6], “Internalization is the
process of transferring the regulation of behavior from outside to inside the
individual”. Ryan and Deci also proposed different regulatory styles which correspond to different levels of autonomy. Figure 1 represents these different levels,
their corresponding locus of control (i.e. “perceived locus of control”) and the
relevant regulatory processes.
In this figure, we can see that the more internal the locus of control is, the better the regulatory processes. That means that, if a learner has intrinsic motivation for an activity, his regulation style will be intrinsic, his locus of control will be internal and he will feel inherent satisfaction, enjoyment, and interest. Intrinsic motivation will only occur if the learner is highly interested in the activity. In many e-Learning activities, interest in the activity will be lower and motivation will be somewhat extrinsic. So, in e-Learning, we have to focus on enhancing a learner’s internal perception of locus of control.
Fig. 1. A taxonomy of Human Motivation (as proposed by Ryan and Deci [15])
Some e-Learning systems already provide autonomy without being focused on it. Virtual Harlem [13], for example, is a reconstruction of Harlem during the 1920s. The aim of this system is to provide a distance learning classroom concerning the African-American literature of that period. Some didactic elements like sound or text can be inserted in the virtual world and the learner is able to retrieve those didactic elements. Virtual Harlem is also a collaborative environment, and learners can interact with other learners or teachers in order to share the experience they acquired in the virtual world. An interesting element of Virtual Harlem is that learners can add content to the world, expressing what they felt and making the virtual world richer (a kind of asynchronous help for future students). Virtual Harlem provides autonomy not because it is virtual reality but because it is an “open world”. By “open environment”, we mean that constraints in terms of movements and actions are limited. Contrary to O’Regan’s study, Virtual Harlem received positive feedback from learners, who said there should be more exposure to such technologies in classrooms.
But Virtual Harlem also has problems. The system itself has few pedagogical capabilities. There is no adaptation to learner specificities, which limits learning strategies. Asynchronous learning remains difficult because a large part of the learning depends on interaction with other human beings connected to the system.
We have seen that autonomy is positive for learning. Many ITSs are used to support e-Learning activities. They can be described as extremely controlling because they adopt a state-transition scheme (i.e. the learner does an activity, the ITS assesses the learner and, given the results, asks the learner to do another activity). The locus of control is mainly external. Virtual reality pedagogical agents like STEVE [8] are also controlling. If STEVE decides to perform an activity, the learner will also have to do the activity if he wants to learn. He has limited ways of learning by himself.
But ITSs, of course, have positive aspects for e-Learning. The student model is an important module in an ITS architecture. A student model allows the system to adapt its teaching style and strategy to a learner, and it can provide many different kinds of help. STEVE can be used asynchronously because each STEVE can be either human-controlled or AI-controlled. This means that if you are logged into the system, there can be ten STEVEs interacting with you while you are the only human being.
We have seen that current e-Learning systems that do not provide autonomy provoke more negative than positive emotions in learners and lower interest in the learning activity [12]. We have shown that an “open world” can address learners’ autonomy needs. But current systems like Virtual Harlem lack pedagogical capabilities and adaptation to the learner. ITSs provide that adaptation but are very controlling systems.
In the next part, we propose to define a motivation-oriented hybrid system. The aim of this system is to provide an environment where learning initiatives (i.e. autonomy) are encouraged in order to increase the learner’s intrinsic interest in the learning activity. This system is a multi-learner online system using role-playing practices and thus takes a constructivist learning approach [18]. As we pointed out before, if used sparingly, help and guidance are useful for the learner’s motivation and success. Our system monitors the learner’s behavior and takes the initiative to propose help to the learner only when a problem is detected.
To go further than Virtual Harlem and other systems, we propose to define motivational e-Learning systems.
that there is a fire next to the hospital and that many injured persons (new patients) are coming. The doctor will then ask the learner to make a diagnosis of those new patients and determine which patients have to be treated first.
There are three types of entities that can communicate together in the system: the
learner application, Motivational Pedagogical Agents (MPAs) and the Motivational
Strategy Controller (MSC). Two other elements complete MeLS design: the Open
Environment and the Curriculum Module. Figure 2 represents the global architecture
of MeLS.
The Open Environment is a 3D environment. Some interactive objects can be added to the environment in order to support constructivist learning.
The Learner Application contains the student model of the learner and sensors to
analyze the activity of the learner. If the learner is passive, MeLS will deduce that he
needs to be motivated. Those sensors also help to maintain the student model of the
learner. In the open environment, each learner is represented by an avatar.
An MPA is an extension of the pedagogical agent concept proposed by Johnson and his colleagues [8]. Each MPA is represented by an avatar and has a particular behavior, personality and knowledge given its assigned role in the virtual environment (doctors and nurses do not know the same things about medicine). Given the behavior of the learner, an MPA can also decide to contact a learner (when he is passive) in order to propose an activity to him. In comparison with an ITS, and depending on the learning strategy used, MPAs can be seen as companions or tutors.
The MSC has no physical representation in the open environment. Its purpose is to define a more global strategy for enhancing learners’ motivation. The MSC is also in charge of proposing scenarios (adapted from a bank of scenario templates) and, for this purpose, it can generate new MSCs. As we said before, there can be many learners on the system. The MSC can organize collaborative activities (collaboration enhances motivation [6]) within a scenario. The MSC is in some way the equivalent of the planner in an ITS.
The Curriculum Module has the same meaning as in an ITS. When we say that an MPA has certain knowledge, we mean that it has the right to access the corresponding knowledge resources in the Curriculum Module. We decide to
There are two kinds of learner needs that the system can try to fulfill: academic help and motivation enhancement. In the first case (academic help), the Discreet Monitoring Process (DMP) detects that a learner has academic problems (for example, he repeatedly fails to complete an activity). In the second case (motivation enhancement), the DMP detects that a learner is passive (some work on motivation diagnosis in ITSs was done by De Vicente and Pain [4]) and deduces that this learner needs to be motivated. Once a problem is detected, a strategy (e.g. a scenario) to resolve it is elaborated by the MSC.
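One possible way to express this detection logic is sketched below; the two triggers (a run of failed activities for the academic case, a period of inactivity for the motivational case) follow the description above, but the threshold values and field names are invented for the example and are not part of MeLS.

```python
from dataclasses import dataclass

@dataclass
class LearnerState:
    consecutive_failures: int   # failed attempts at the current activity
    seconds_idle: float         # time since the learner's last action

def dmp_diagnose(state, max_failures=3, max_idle_seconds=120):
    """Return the kind of need detected, if any, so that the MSC can
    elaborate a strategy (e.g. propose a new scenario)."""
    if state.consecutive_failures >= max_failures:
        return "academic_help"
    if state.seconds_idle >= max_idle_seconds:
        return "motivation_enhancement"
    return None

print(dmp_diagnose(LearnerState(consecutive_failures=1, seconds_idle=300)))
```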
interest in e-Learning and, in turn, learners’ success. But coaching can also be positive for the learner’s success.
In order to combine coaching and learner autonomy in e-Learning systems, we defined a hybrid system design between open environments and ITSs, called
Motivational e-Learning System (MeLS). This system resolves the problems of learner autonomy that we described for ITSs: it gives learners possibilities for self-learning by interacting with the environment, which can be seen as a constructivist learning approach. The Discreet Monitoring Process (DMP) was proposed to foster learner motivation. The DMP can deal with academic problems or with a learner’s passive behavior, and it is able to generate strategies (such as scenarios) to correct the learner’s problems. Motivational Pedagogical Agents (MPAs), inspired by pedagogical agents like STEVE [8] and represented in the virtual world by avatars, are in charge of executing those strategies.
The next step will be to create a whole course using the MeLS design. The motivational student model and the strategies (local or global) have to be clarified. For this purpose, further reading on volition, intrinsic motivation and academic help seeking is planned.
References
[1] Bandura, A. (1986). Social foundations of thought and action: a social-cognitive theory.
Englewood Cliffs, NJ: Prentice Hall.
[2] Corno, L. (1993). The best-laid plans: modern conceptions of volition and educational
research. Educational Researcher, 22. pp 14-22.
[3] Connell, J. P. & Wellborn J. G. (1991). Competence, autonomy and relatedness: a
motivational analysis of self-system processes. R Gunnar & L. A. Sroufe (Eds),
Minnesota Symposia on child psychology, 23. Hillsdale, NJ: Erlbaum. pp 43-77.
[4] De Vicente, A. and Pain, H. (2002). Informing the detection of the students’
motivational state: An empirical study. In S.A. Cerri, G. Gouardères, & F. Paraguaçu
(Eds.), Proceedings of the 6th International Conference on Intelligent Tutoring Systems.
Berlin: Springer-Verlag. pp 933-943.
[5] Eccles, J. S. & Wigfield A. (2002). Development of achievement motivation. San Diego,
CA: Academic Press.
[6] Eccles, J. S., Wigfield A. & Schiefele U. (1998). Motivation to succeed. N. Eisenberg
(Eds), Handbook of child psychology, 3. Social, emotional, and personality development
(5th ed.), New York: Wiley. pp 1017-1095.
[7] Flink, C., Boggiano A. K., Barrett M. (1990). Controlling teaching strategies:
undermining children’s self-determination and performance. Journal of Personality and
Social Psychology, 59. pp 916-924.
[8] Johnson, W.L., Rickel J.W. & Lester J.C. (2000) Animated pedagogical agents: face-to-
face interaction in interactive learning environments. International Journal of Artificial
Intelligence in Education, 1. pp 47-78.
[9] Kuhl, J. (1987). Action control: The maintenance of Motivational states. F. Halisch & J.
Kuhl (Eds), Motivation, Intention and Volition. Berlin: Springer-Verlag. pp 279-307.
Inducing Optimal Emotional State for Learning in Intelligent Tutoring Systems
1 Introduction
Research in neuroscience and psychology has shown that emotions influence various behavioral and cognitive processes, such as attention, long-term memorizing and decision-making [5, 18]. Moreover, positive affects are fundamental to cognitive organization and thought processes; they also play an important role in improving creativity and flexibility in problem solving [11]. However, negative affects can block thought processes: people who are anxious show deficits in inductive reasoning [15], slower decision latency [20] and reduced memory capacity [10]. This is not new to teachers involved in traditional learning; students who are bored or anxious cannot retain knowledge and think efficiently.
Intelligent Tutoring Systems (ITSs) are used to support and improve the process of learning in any field of knowledge [17]. Thus, new ITSs should deal with the student’s emotional states, such as sadness or joy, by identifying his current emotional state and attempting to address it. Some ITS architectures integrate learner emotion in the student model. For instance, Conati [4] used a probabilistic model based on Dynamic Decision Networks to assess the emotional state of the user of educational games. To the best of our knowledge, no ITS has dealt with the optimal emotional state.
So, we define the optimal emotional state as the affective state which maximizes
learner’s performance such as memorization, comprehension, etc. To achieve this
goal, we address here the following fundamental questions: how can we detect the
current emotional state of the learner? How can we recognize his optimal emotional
state for learning? How can we induce this optimal emotional state in the learner?
In the present work we have developed and implemented a system called ESTEL (Emotional State Towards Efficient Learning), which is able to predict the optimal emotional state of the learner and to induce it, that is, to trigger actions so that the learner reaches his optimal emotional state.
After reviewing some previous work, we present ESTEL, the architecture of a system intended to generate emotions that improve learning. We detail all its components and show how various elements of these modules were obtained from an experiment, which we also present.
2 Previous Work
This section surveys some of the previous work on inducing emotion in the psychology and computer science domains.
Researchers in psychology have developed a variety of experimental techniques for inducing emotional states, aiming to find a relationship between emotions and thought tasks. One of them is the Velten procedure, which consists of randomly assigning participants to read a graded set of self-referential statements, for example, “I am physically feeling very good today” [19]. A variety of other techniques exists, including guided imagery [2], which consists of asking participants to imagine themselves in a series of described situations, for example: “You are sitting in a restaurant with a friend and the conversation becomes hilariously funny and you can’t stop from laughing”. Some other existing techniques are based upon exposing participants to films, music or odors. Gross and Levenson (1995) found that, of the 78 films shown to 494 subjects, 16 film clips could reliably induce one of the following emotions: amusement, anger, contentment, disgust, fear, neutrality, sadness, and surprise [9].
Researchers in psychology have also developed hybrid techniques which combine two or more procedures; Mayer et al. (1995) used the guided imagery procedure together with the music procedure to induce four types of emotions: joy, anger, fear and sadness. They used guided imagery to occupy the foreground attention and music to emphasize the background.
However, few works in computer science have attempted to induce emotions. For instance, at the MIT Media Lab, Picard et al. (2001) used pictures to induce a set of emotions including happiness, sadness, anger, fear, disgust, surprise, neutrality, platonic love and romantic love [14]. Moreover, at the Affective Social Computing Laboratory, Nasoz et al. used the results of Gross and Levenson (1995) to induce sadness, anger, surprise, fear, frustration, and amusement [13].
As mentioned previously, emotions play a fundamental role in thought processes;
Estrada et al. have found that positive emotions may increase intrinsic motivation [6].
In addition, two recent studies, trying to check the influence of positive emotions on
motivation, have also found that positive affects can enhance performance on the task
at hand [11]. For these reasons, our present work aims to induce optimal emotional
state which is a positive emotion that maximizes learner’s performance.
In the next section, we present the architecture of ESTEL.
3 ESTEL Architecture
The different modules of this architecture intervene according to the following se-
quence; we detail further the functionalities of each module:
the learner first accesses the system through a user interface and his actions are intercepted by the Emotion Manager,
the Emotion Manager module launches the Emotion Identifier module
which identifies the current emotion of the learner (2),
the Learning Appraiser module receives instruction (3) from the Emotion
Manager to submit the learner to a pre-test in order to evaluate his perform-
ance in the current emotional state,
the Emotion Manager module triggers the Personality Identifier module (4)
which identifies the personality of the learner,
in the same way, the Optimal Emotion Extractor (5) is started to predict the
optimal emotional state of the learner according to his personality,
the next module launched is the Emotion Inducer (6), which will induce the optimal emotional state in the learner,
finally, the Learning Appraiser (7) module will submit the learner to a post-
test to evaluate his performance under the optimal emotional state.
The different modules mentioned previously are described below:
The role of this module is to monitor the entire emotional process of ESTEL, to dis-
tribute and synchronize tasks, and to coordinate between the other modules. In fact
the emotion manager is a part of the student model in an ITS. It receives various pa-
rameters from the other modules. As we can see in Fig. 1, the ESTEL architecture is centralized: all the information passes through the Emotion Manager module, which successively triggers the other modules.
The Emotion Identifier module recognizes the current emotional state of the learner; it is based on the Emotion Recognition Agent (ERA). ERA is an agent that was developed in our lab to identify a user’s emotion given a sequence of colors. To achieve this goal, we conducted an experiment in which 322 participants had to associate color sequences with their emotions. Based on the results obtained in the experiment, the agent uses the ID3 algorithm to build a decision tree which relates color sequences to the corresponding emotions. This decision tree allows us to predict the current emotional state of a new learner, according to his choice of a color sequence, with 57.6% accuracy.
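A rough sketch of this classification idea is given below, using scikit-learn's DecisionTreeClassifier with an entropy criterion as a stand-in for ID3; the colour encoding, sequence length and training examples are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

COLORS = {"red": 0, "blue": 1, "green": 2, "yellow": 3, "black": 4}

def encode(sequence):
    # Fixed-length colour sequence -> numeric feature vector.
    return [COLORS[c] for c in sequence]

# Invented training examples: colour sequences labelled with reported emotions.
X = [encode(s) for s in (["yellow", "green", "blue"],
                         ["black", "red", "blue"],
                         ["green", "yellow", "red"])]
y = ["joy", "sadness", "joy"]

model = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(model.predict([encode(["yellow", "blue", "green"])])[0])
```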
The Optimal Emotion Extractor module uses a set of rules that we have obtained from
the experiment which will be described later. Those rules allow us to determine the
learner’s optimal emotional state according to his personality. Let us take an example
to show how the Optimal Emotion Extractor works; we suppose that the learner’s
personality is extraversion. To predict his optimal emotional state, the Optimal Emo-
tion Extractor browses the rules to find a case corresponding to the personality of the
learner; the rules are represented as:
If (personality = Extraversion) then optimal-emotional-state = joy.
By applying the rule above, the Optimal Emotion Extractor module will identify
the learner’s optimal emotional state as joy.
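In practice such a rule base can be viewed as a lookup from personality dimension to emotional state. The sketch below encodes the associations reported later in the paper (extraversion and psychoticism with joy, neuroticism with pride, a high lie-scale score with confident); representing them as a flat dictionary is an implementation assumption.

```python
# Personality dimension -> optimal emotional state, following the associations
# reported for the experiment (the extraversion case matches the rule above).
OPTIMAL_EMOTION_RULES = {
    "extraversion": "joy",
    "neuroticism":  "pride",
    "psychoticism": "joy",
    "lie":          "confident",
}

def optimal_emotion(personality):
    return OPTIMAL_EMOTION_RULES.get(personality.lower())

print(optimal_emotion("Extraversion"))  # -> "joy"
```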
After identifying the optimal emotional state of the learner, ESTEL will induce
it via the Emotion Inducer module.
The Emotion Inducer module attempts to induce the optimal emotional state, which
represents a positive state of mind that maximizes learner’s performance, found by
the Optimal Emotion Extractor. For example, when a new learner accesses ESTEL, the Personality Identifier determines his personality as extraversion, and then the
Optimal Emotion Extractor retrieves joy as the optimal emotional state for this per-
sonality. Emotion Inducer will elicit joy in this learner by using the hybrid technique
which consists of displaying different interfaces. These interfaces include guided
imagery vignettes, music and images. The Emotion Inducer is inspired by the study of
Mayer et al. [12] that has been done to induce four specific emotions (joy, anger, fear,
and sadness). After inducing emotion, the Emotion Manager module will restart the
Learning Appraiser module for evaluating learning efficiency.
This module allows us to assess the performance of the learner in his current emo-
tional state and then in his optimal one. The Learning Appraiser module uses, firstly,
a pre-test for measuring the knowledge retention of the learner in the current emo-
tional state. Secondly, it uses a post-test to evaluate the learner in the optimal emo-
tional state. The results obtained will be transferred to the Emotion Manager to find
out which of the two emotional states really enhances learning. If the results of the
learner obtained in the pre-test (current emotional state) are better than those obtained
in the post-test (optimal emotional state), ESTEL will take into consideration the
current emotional state of this learner to eventually update the set of possible optimal
emotional states for new learners.
In what follows, we present the results of the experiment conducted to predict the learner’s optimal emotional state and to induce it.
Since different people have different optimal emotional states for learning, we conducted an experiment to predict the optimal emotional state according to the learner’s personality. The sample included 137 participants of different genders and ages. First, participants chose the optimal emotional state that maximizes their learning from a set of sixteen emotions (as shown in Fig. 2). After selecting their optimal emotional state, subjects answered the 24 items of the EPQR-A [8]. The data collected were used to establish a relationship between optimal emotional state and personality.
As shown in the table above, of the initial set of sixteen emotions given to the 137 participants, just thirteen were selected. More than 28% of the participants whose personality is extraversion selected joy as the optimal emotional state. About 36% of the participants with the highest scores on the lie scale chose confident to represent their optimal emotional state. Nearly 29% of the neurotic participants found that their optimal emotional state is pride. Moreover, of the 137 participants, we found just six psychotic participants, 50% of whom selected joy as the optimal emotional state.
where:
n = the number of users satisfying the conditioning event,
n_c = the number of those users whose optimal emotional state is the emotion under consideration,
p = the a priori estimate of the probability, and
m = the size of the sample.
Looking at P(Joy | Extraversion), we have 53 cases where the personality is extraversion, and in 15 of those cases the optimal emotional state is joy. Thus, n = 53 and n_c = 15; since we have just one attribute value and p = 1/(number of attribute values), p = 1 for all attributes. The size of the sample is m = 137; therefore, from formula (2), we obtain the estimate for joy.
Suppose that we have just two attributes: Anxious and Joy. By applying the same steps for Anxious, we obtain the corresponding estimate. Using formula (1), since 0.011 < 0.021, the optimal emotional state predicted according to extraversion is joy.
By applying the Naïve Bayes classifier to all attributes, we have obtained the fol-
lowing tree which allows us to predict the optimal emotional state for a new learner
according to his personality (see Fig. 4).
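A sketch of the classification step under an m-estimate reading of the formulas above: for each candidate emotion, the conditional probability given the personality group is smoothed with a prior p and an equivalent sample size m, and the emotion with the largest value is predicted. The counts and the value of p are illustrative, since formulas (1) and (2) are not reproduced here.

```python
def m_estimate(n_c, n, p, m):
    # Smoothed probability estimate: n_c favourable cases out of n,
    # combined with a prior p weighted by an equivalent sample size m.
    return (n_c + m * p) / (n + m)

def predict_optimal_emotion(group_counts, group_size, p, m):
    """group_counts[emotion] = number of participants in the personality
    group who chose that emotion; return the emotion with the largest
    smoothed estimate."""
    return max(group_counts,
               key=lambda e: m_estimate(group_counts[e], group_size, p, m))

# Illustrative numbers loosely based on the worked example in the text:
# 53 extraverted participants, 15 of whom chose joy.
extraversion_counts = {"joy": 15, "anxious": 6, "pride": 8}
print(predict_optimal_emotion(extraversion_counts, group_size=53, p=1 / 13, m=137))
```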
imagery, we also add music to enrich the background. For example, we ask the learner to imagine that “It’s your birthday and friends throw you a terrific surprise party” [12], we show him an image that reflects this situation to help his imagination, and in the background we play music expressing joy, such as the Brandenburg Concerto No. 2 by Bach [3]. We use the same principle to induce the two other optimal emotional states.
In this paper, we have presented the architecture of our system ESTEL, with which we proposed a way to predict the optimal emotional state for learning and to induce it. We know that it is hard to detect the optimal emotional state for learning. For this reason, we used the Naïve Bayes classifier, which helps us find the optimal emotional state for each personality. Moreover, we are also aware of the fact that inducing emotions in humans is not easy. That is why we used the hybrid technique, including guided imagery, music and images, in an attempt to change the learner’s emotion.
It remains for future research to study the effect of emotion intensity on thought processes. On the one hand, as mentioned previously, positive affects play an important role in enhancing learning; on the other hand, an excess of the sensed emotion could have the opposite effect: the learner may be overwhelmed by this emotion and unable to carry out the learning tasks properly. For this reason, future studies will concentrate on emotion intensities in order to regulate the emotion induced by ESTEL. We are therefore considering adding a new module, called the Emotion Regulator, which will be able to control and regulate the intensity of the optimal emotional state in order to further improve the learner’s performance.
References
1. Abou-Jaoude, S., Frasson, C. Charra, O., Troncy, R.: On the Application of a Believable
Layer in ITS. Workshop on Synthetic Agents, 9th International Conference on Artificial
Intelligence in Education, Le Mans (1999)
2. Ahsen, A.: Guided imagery: the quest for a science. Part I: Imagery origins. Education,
Vol. 110, (1997) 2-16
3. Bach, J. S.: Brandenburg Concerto No.2. In Music from Ravinia series, New York, RCA
Victor Gold Seal, (1721) 60378-2-RG
4. Conati C.: Probabilistic Assessment of User’s Emotions in Educational Games. Journal of
Applied Artificial Intelligence, Vol. 16, (2002) 555-575
5. Damasio, A.: Descartes’ Error: Emotion, Reason and the Human Brain, Putnam Press, New
York (1994)
6. Estrada, C.A., Isen, A.M., Young, M. J.: Positive affect influences creative problem solv-
ing and reported source of practice satisfaction in physicians. Motivation and Emotion,
Vol. 18, (1994) 285-299
7. Eysenck, H. J., Eysenck, M. W.: Personality and individual differences. A natural science
approach, New York: Plenum press (1985)
8. Francis, L., Brown, L., Philipchalk, R.: The development of an Abbreviated form of the
Revised Eysenck Personality Questionnaire (EPQR-A). Personality and Individual Differ-
ences, Vol. 13, (1992) 443-449
9. Gross, J.J., Levenson, R.W.: Emotion elicitation using films. Cognition and Emotion, Vol.
9, (1995) 87-108
10. Idzihowski, C., Baddeley, A.: Fear and performance in novice parachutists. Ergonomics,
Vol. 30, (1987) 1463-1474
11. Isen, A. M.: Positive Affect and Decision Making. Handbook of Emotions, New York:
Guilford (1993) 261-277
12. Mayer, J., Allen, J., Beauregard, K.: Mood Inductions for Four Specific Moods. Journal of
Mental imagery, Vol. 19, (1995) 133-150
13. Nasoz, F., Lisetti, C.L., Avarez, K., Finkelstein, N.: Emotion Recognition from Physio-
logical Signals for User Modeling of Affect. The 3rd Workshop on Affective and Attitude
User Modeling, USA (2003)
14. Picard, R. W., Healey, J., Vyzas, E.: Toward Machine Emotional Intelligence Analysis of
Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, Vol. 23 (2001) 1175-1191
15. Reed, G. F.: Obsessional cognition: performance on two numerical tasks. British Journal
of Psychiatry, Vol. 130 (1977) 184-185
16. Rish, I.: An empirical study of the naive Bayes classifier. Workshop on Empirical Meth-
ods in AI (2001)
17. Rosic, M., Stankov, S. Glavinic, V.: Intelligent tutoring systems for asynchronous distance
education. 10th Mediterranean Electrotechnical Conference (2000) 111-114
Evaluating a Probabilistic Model of Student Affect
1 Introduction
Electronic games for education are learning environments that try to increase student
motivation by embedding pedagogical activities in highly engaging, game-like inter-
actions. Several studies have shown that these games are usually successful at in-
creasing the level of student engagement, but they often fail to trigger learning [10]
because students play the game without actively reasoning about the underlying in-
structional domain. To overcome this limitation, we are designing pedagogical agents
that generate tailored interactions to improve student learning during game playing.
In order not to interfere with the student’s level of engagement, these agents should
take into account the student’s affective state (as well as their cognitive state) when
determining when and how to intervene. However, understanding someone’s emo-
tions is hard, even for human beings. The difficulty is largely due to the high level of
ambiguity in the mapping between emotional states, their causes and their effects
[12].
One possible approach to tackling the challenge of recognizing user affect is to re-
duce the ambiguity in the modeling task, either by focusing on a specific emotion in a
fairly constraining interaction (e.g. [9]) or by only recognizing emotion intensity and
valence (e.g. [1]). In contrast, our goal is to devise a framework for affective model-
ing that pedagogical agents can use to detect multiple specific emotions in interac-
tions in which this information can improve the effectiveness of the adaptive support
provided. To handle the high level of uncertainty in this modeling task, the frame-
work integrates in a Dynamic Bayesian Network (DBN [8]) information on both the
causes of a student’s emotional reactions and their effects on the student’s bodily
expressions. Model construction is done as much as possible from data, integrated
with relevant psychological theories of emotion and personality.
While the model structure and construction are described in previous publications [3,13], in this paper we focus on model evaluation. In particular, we focus on evalu-
ating the causal part of the model. To our knowledge, whilst there have been user
studies to evaluate sources of affective data (e.g., [2]), this is the first empirical
evaluation of an affective user model, embedded in a real system and tested with real
users.
We start by describing our general framework for affective modeling. We then
summarize how we built the causal part of the model for Prime Climb, an educational
game for number factorization. Finally we describe the user study, its results and the
insights that it generated on how to improve the model’s accuracy.
Fig. 1 shows two time slices of our DBN for affective modeling. The nodes represent
classes of variables in the actual DBN, which combines evidence on both causes and
effects of emotional reactions, to compensate for the fact that often evidence on
causes or effects alone is insufficient to accurately assess the student’s emotional
state.
The part of the network above the nodes Emotional States represents the relations
between possible causes and emotional states, as they are described in the OCC the-
ory of emotions [11]. In this theory, emotions arise as a result of one’s appraisal of
the current situation in relation to one’s goals. Thus, our DBN includes variables for
Goals that a student may have during interaction with the game. Situations consist of
the outcome of any event caused by either a student’s or an agent’s action (nodes
Student Action Outcome and Agent Action Outcome in Fig. 1). Agent actions are
represented as decision variables, indicating points where the agent must decide how
to intervene in the interaction. The desirability of an event in relation to the student’s
goals is represented by the node class Goals Satisfied, which in turn influences the
student’s Emotional States.
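As a deliberately simplified reading of this appraisal chain, the sketch below propagates the probability that the latest event satisfied one of the student's goals into a probability of joy (versus distress), using invented conditional probability values; the actual model is a full Dynamic Bayesian Network with many more variables and emotions.

```python
def p_goal_satisfied(p_has_goal, p_event_helps_goal):
    # Probability that the latest event satisfied a goal the student holds.
    return p_has_goal * p_event_helps_goal

def p_joy(p_satisfied, p_joy_given_satisfied=0.9, p_joy_given_not=0.2):
    # Appraisal step: the emotion-for-event node conditioned on goal satisfaction.
    return (p_satisfied * p_joy_given_satisfied
            + (1.0 - p_satisfied) * p_joy_given_not)

p_sat = p_goal_satisfied(p_has_goal=0.7, p_event_helps_goal=0.8)
print(round(p_joy(p_sat), 3))  # probability of joy; distress is the complement
```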
Assessing student goals is non-trivial, especially when asking the student directly
is not an option (as is the case in educational games). Thus, our DBN includes nodes
to infer student goals from both User Traits that are known to influence goals (such as
personality [7]) and Interaction Patterns.
The part of the network below Emotional States represents the interaction between
emotional states, their observable effects on student behavior (Bodily Expressions)
and sensors that can detect them. It is designed to modularly combine any available
sensor information, to compensate for the fact that a single sensor can seldom reliably
identify a specific emotional state.
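The topology just described can be laid out as a minimal sketch (Python-style pseudocode, not the authors' implementation; node names follow the classes in Fig. 1, the inter-slice links are assumed, and the actual variables, arities and CPTs are not specified here):

# Sketch of the two-slice DBN topology described above (node names from Fig. 1).
CAUSAL_EDGES = [
    ("User Traits", "Goals"),
    ("Interaction Patterns", "Goals"),
    ("Student Action Outcome", "Goals Satisfied"),
    ("Agent Action Outcome", "Goals Satisfied"),
    ("Goals", "Goals Satisfied"),
    ("Goals Satisfied", "Emotional States"),
]
DIAGNOSTIC_EDGES = [
    ("Emotional States", "Bodily Expressions"),
    ("Bodily Expressions", "Sensors"),
]
# Assumed temporal links: belief about slowly changing variables is carried
# from slice t to slice t+1.
TEMPORAL_EDGES = [("Goals", "Goals"), ("Emotional States", "Emotional States")]

def unroll(n_slices):
    """Return the edge list of the DBN unrolled over n_slices time slices."""
    edges = []
    for t in range(n_slices):
        for u, v in CAUSAL_EDGES + DIAGNOSTIC_EDGES:
            edges.append(((u, t), (v, t)))
        if t + 1 < n_slices:
            for u, v in TEMPORAL_EDGES:
                edges.append(((u, t), (v, t + 1)))
    return edges

print(len(unroll(2)), "edges in a two-slice network")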
In the next section, we show how we instantiated the causal part of the model to
assess students’ emotions during the interaction with the Prime Climb educational
game. For details on the diagnostic part see [5].
Fig. 2 shows a screenshot of Prime Climb, a game designed to teach number factori-
zation to 6th and 7th grade students. Two players must cooperate to climb a series of
mountains that are divided into numbered sectors. Each player should move to a num-
ber that does not share any factors with her partner’s number, otherwise she falls.
Prime Climb provides two tools to help students: a magnifying glass to see a num-
ber’s factorization, and a help box to communicate with the pedagogical agent we are
building for the game. In addition to providing help when a student is playing with a
partner, the agent engages its player in a “Practice Climb” during which it climbs with
the student as a climbing instructor. The affective user model described here assesses
the player’s emotions during these practice climbs, and will eventually be integrated
with a model of student learning [6] to inform the agent’s pedagogical decisions.
We start by summarizing how we defined the sub-network that assesses students’
goals. For more details on the process see [13]. Because all the variables in this sub-
network are observable, we identified the variables and built the corresponding con-
ditional probability tables (CPTs) using data collected through a Wizard of Oz study
where students interacted with the game whilst an experimenter guided the pedagogi-
cal agent. The students took a pretest on factorization knowledge, a personality test
based on the Five Factor personality theory [7], and a post-game questionnaire to
express what goals they had during the interaction. The probabilistic dependencies
among goals, personalities, interaction patterns and student actions were established
through correlation analysis between the test results, the questionnaire results and
student actions logged during the interactions.
Fig. 3 shows the resulting sub-network, incorporating both positive and negative
correlations. The bottom level specifies how interaction patterns are recognized from
the relative frequency of individual actions [13]. We intended to represent different
degrees of personality type and goal priority by using multiple values in the corre-
sponding nodes. However, we did not have enough data to populate the larger CPTs
and resorted to binary nodes. Let us now consider the part of the network that repre-
sents the appraisal mechanism (i.e. how the mapping between student goals and game
states influences student emotions). We currently represent in our DBN only 6 of the
22 emotions defined in the OCC model. They are joy/distress for the current state of
the game, pride/shame of the student toward herself, and admiration/reproach toward
the agent, modeled in the network by three two-valued nodes: emotion for event,
emotion for self and emotion for agent (see Fig. 4).
The links and CPTs between Goal nodes, the outcome of student or agent actions
and Goal Satisfied nodes, are currently based on subjective judgment. For some of
these links, the connections are quite obvious. For instance, if the student has the goal
Avoid Falling, a move resulting in a fall will lower the probability that the goal is
achieved. Other links (e.g., those modeling which student actions cause a student to
have fun or learn math) are less obvious, and could be built only through explicit
student interviews that we had no way to conduct during our studies. When we did
not have good heuristics to create these links, we did not include them in the model.
The links between Goal Satisfied nodes and the emotion nodes are defined as follows.
We assume that the outcome of every agent or student action is subject to student
appraisal. Thus, each Goal Satisfied node influences emotion-for-event in every slice.
Whether a Goal Satisfied node influences emotion-for-self or emotion-for-agent in a
given slice depends upon whether the slice was generated, respectively, by a student
action (the slice shown in Fig. 4) or agent’s action (not shown due to lack of space). The CPTs
for emotion nodes are defined so that the probability of each positive emotion is pro-
portional to the number of true Goal Satisfied nodes.
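As a rough illustration, one plausible reading of this CPT rule is the linear scheme below (a sketch only; the exact parameterisation used in the model is not given in the text):

def p_positive_emotion(goal_satisfied, relevant_goals):
    """One plausible reading of the rule above: the probability of the positive
    emotion in a pair (e.g. joy vs. distress) grows linearly with the fraction
    of the student's relevant goals that were satisfied."""
    if not relevant_goals:
        return 0.5  # no relevant goals: stay at the uninformed prior
    satisfied = sum(1 for g in relevant_goals if goal_satisfied.get(g, False))
    return satisfied / len(relevant_goals)

# Example: two of three relevant goals satisfied after a successful move.
print(p_positive_emotion({"Succeed by Myself": True, "Avoid Falling": True,
                          "Have Fun": False},
                         ["Succeed by Myself", "Avoid Falling", "Have Fun"]))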
4 Evaluation
In order to gauge how the approximations due to lack of data affected the causal
affective model, we ran a study to empirically evaluate its accuracy.
However, evaluating an affective user model directly is difficult. It requires assessing
the students’ actual emotions, which are ephemeral and can change multiple times
during the interaction. Therefore it is not feasible to ask the students to describe them
after game playing. Asking the students to describe them during the interaction, if not
done properly, can significantly interfere with the very emotional states that we want
to assess. Pilot testing various ways to try this second option showed that the least
intrusive solution consisted of using two identical dialogue boxes [4]. One dialogue
box (Fig. 5) is always available next to the game window for students to input their
emotional states spontaneously. A similar dialogue box pops up if a student does not
do this frequently enough, or if the model assesses that the student’s emotional state
has likely changed. Students were asked to report feelings toward the game and the
agent only, as it was felt that our 11-year-old subjects would be too confused if asked
to describe three separate feelings.
Twenty 6th grade students participated in the study, run in a local school. They were told
that they would be playing a game with a computer-based agent that was trying to
understand their needs and help them play the game better. Therefore, the students
were encouraged to provide their feelings whenever their emotions changed so that
the agent could adapt its behavior. In reality, the agent was directed by an experi-
menter who was instructed to provide help if the student showed difficulties with the
climbing task. Help was provided through a Wizard of Oz interface that allowed the
experimenter to generate hints at different levels of detail. All of the experimenter’s
and student’s actions were captured by the affective model, which was updated in real
time to direct the appearance of the additional dialogue box, as described earlier.
Students filled in the same personality test and goal questionnaire used in previous
studies. Log files of the interaction included the student’s reported emotions and
corresponding model assessment.
We start our data analysis by measuring how often the model’s assessment agreed
with the student’s reported emotion. We translated the students’ reports for each
emotion pair (e.g. joy/distress) and the model’s corresponding probabilistic assess-
ment into 3 values: ‘positive’ (any report higher than ‘neutral’ in the dialogue box),
‘negative’ (any report lower than ‘neutral’) and ‘neutral’ itself. If the model’s assess-
ment was above a simple threshold then it was predicting a positive emotion, if not
then it was predicting a negative emotion. We did not include a ‘neutral’ value in the
model’s emotion nodes because we did not have sufficient knowledge from previous
studies to populate the corresponding CPTs.
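The classification rule can be summarised as follows (a sketch, assuming reports are coded as signed offsets from ‘neutral’ and using the 0.65 threshold reported later in this section):

JOY_THRESHOLD = 0.65  # threshold value reported later in this section

def classify_report(report):
    """Map a dialogue-box report to 'positive', 'negative' or 'neutral'."""
    if report > 0:        # any report higher than 'neutral'
        return "positive"
    if report < 0:        # any report lower than 'neutral'
        return "negative"
    return "neutral"

def classify_model(p_positive, threshold=JOY_THRESHOLD):
    """Binary prediction from the model's probability for the positive emotion."""
    return "positive" if p_positive > threshold else "negative"

# A neutral report can never be matched by the binary model prediction.
print(classify_report(0), classify_model(0.7))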
Making a binary prediction from the model’s assessment is guaranteed to disagree
with any neutral reports given. However, we found that 25 student reports (53% and
35% of the neutral joy and admiration reports respectively) were neutral for both joy
and admiration. If, as these reports indicate, the student had a low level of emotional
arousal, then this is a state that can easily be picked up by biometric sensors in the diag-
nostic part of the model [5]. This is a clear example of a situation where the observed
evidence of a student’s emotional state can inform the causal assessment of the
model.
Table 1 shows how often the model’s assessment agreed with the student. To determine whether any students had significantly different
accuracy, we performed cross-validation to produce a measure of standard deviation.
This measure is quite high for reproach and distress because far fewer data points
were recorded for these negative emotions, but it is low for the other emotions,
showing that the model produced similar performances for each student.
Table 1 shows that the combined accuracy for admiration/reproach is much lower
than the combined accuracy for joy/distress. To determine to what extent these re-
sults are due to problems with the sub-network assessing student goals or with the
sub-network modeling the appraisal process, we analyzed how the accuracy changed
if we added evidence on student goals into the model, simulating a situation in which
the model assesses goals correctly.
Table 2 shows that, when we add evidence on student goals, the accuracy for ad-
miration improves, but the accuracy for joy is reduced. To understand why, we took a
closer look at the data for individual students. While the increase in accuracy for
admiration was a general improvement for all students who reported this emotion, the
decreases in accuracy for joy and distress were due to a small number of students for
whom the model no longer gave a good performance. We have identified 2 reasons
for this result:
Reason 1. As we mentioned in a previous section, we currently have no links con-
necting student actions to the satisfaction of the goals Have Fun and Learn Math
because we did not have sufficient knowledge to build these links. However, in this
study, 4 students reported that they only had goals of Have Fun or Learn Math (or
both). For these students, the model’s belief for joy only changed after agent actions.
Since the agent acted infrequently, the model’s joy belief changed very little from its
initial value of 0.5. Thus, because of the 0.65 threshold, all student reports for
joy/distress were classified as distress, and the model’s accuracy for this emotion pair
was reduced. Removing these 4 students from the data set improved the accuracy for
detecting joy when goal evidence was used from 50% to 74%. An obvious fix for this
problem is to add to the model the links that relate the goals Have Fun and Learn
Math to student actions. We plan to run a study explicitly designed to gather the rele-
vant information from student interviews after game playing.
Reason 2. Of the 7 distress reports collected, 4 were not classified correctly because
they occurred in a particular game situation. The section of the graph within the rec-
tangle in Fig. 6 shows the comparison between the model’s assessment and the stu-
dent’s reported emotions (normalized between 0 and 1 for the sake of comparison)
during one such occurrence. In this segment of the interaction, the student falls and
then makes a rapid series of successful climbs to get back to the position that she fell
from. She then falls again and repeats the process until eventually she solves the
problem. This student has declared the goals Have Fun, Learn Math, and Succeed by
Myself but, for reason 1 above, only the latter goal influences the student’s emotional
state after a student action. Thus, each fall reduces the model’s belief for joy because
the student is not succeeding. Each successful move without the agent’s help (i.e. in
most of her moves) increases the model’s belief for joy. However, apparently the
model overestimated how quickly the student’s level of joy recovered because of the
successful moves. This was the case for all students whose reports of distress were
misclassified. In order to fix this problem the model needs a long-term assessment of
the student’s overall mood that will influence the priorities of student goals. It also
needs an indication of whether moves represent actual progress in the game, adding
links that relate this to the satisfaction of the goal Have Fun. Finally, we can use per-
sonality information to distinguish between students who experience frustration in
such a situation and those who are merely ‘playing’ (some students enjoy falling and
do not care about succeeding).
The improvement in the accuracy of emotional assessment (after taking into ac-
count the problems just discussed) when goal evidence is included shows that the
model was not always accurate in predicting student goals. Why then was the accu-
racy for joy and distress so high when goal evidence was not included? Without this
information, the model’s belief for each goal tended to stay close to its initial value of
0.5, indicating that it did not know whether the student had the goal or not. Because
successful moves can satisfy three out of the five goals in the model (Succeed by
Myself, Avoid Falling and Beat Partner) and all students moved successfully more
often than they fell, the model’s assessment for joy tended to stay above the threshold
value of 0.65, leading to a high number of reports being classified as joy. Most of the
5 distress reports related to the frustrating situations described earlier were also classi-
fied correctly. This is because the model did not correctly assess the fact that all the
students involved in these situations had the goal Succeed by Myself and therefore
did not overestimate the rising of joy as it did in the presence of goal evidence. This
behavior may suggest that we don’t always need an accurate assessment of goals to
have an acceptable model of student affect. However, we argue that knowing the
exact causes of the student’s affective states can help an intelligent agent to react to
these states more effectively. Thus, the next stage of our analysis relates to under-
standing the model’s performance in assessing goals and how to improve it. In par-
ticular we explore whether having information on personality and interaction patterns
is enough to accurately determine a person’s goals.
Only 10 students completed the personality test in our study. Table 3 shows, for each
goal, the percentage of these students for whom the declaration of that goal was cor-
rectly identified, and how these percentages change when personality information is
used. A threshold of 0.6 was used to determine whether the model thought that a
student had a particular goal, because goals will begin to substantially affect the as-
sessment of student emotions at this level of belief. The results show that personality
information improves the accuracy for only two of the five goals, Have Fun and Beat
Partner. For the other goals, the results appear to indicate that the model’s belief
about these goals did not change. However, what actually happened is that in these
cases the belief simply did not change enough to alter the model’s predictions using
the threshold.
The model’s belief about a student’s goals is constructed from causal knowledge
(personality traits) and evidence (student actions). Fig. 3 showed the actions identified
as evidence for particular goals. When personality traits are used, they produce an
initial bias towards a particular set of goals. Evidence collected during the game
should then refine this bias, because personality traits alone cannot always accurately
assess which goals a student has. However, currently the bias produced by personality
information is stronger than the evidence coming from game actions. There are two
reasons for this strong bias:
Reason 1. Unfortunately, some of the actions collected as evidence (e.g. asking the
agent for advice) did not occur very frequently, even when the student declared the
particular goal that the action was evidence for. One possible solution is to add to the
model a goal prior for each of the covered goals. The priors would be produced by a
short test before the game and only act as an initial influence since the model’s goal
assessments will be dynamically refined by evidence. Integration of the prior infor-
mation with the information on personality and interaction patterns will require ficti-
tious root goal nodes to be added to the model.
Reason 2. Two of the personality traits that affect the three goals Learn Math, Avoid
Falling, and Succeed by Myself (see Fig. 3) are Neuroticism and Extraversion. How-
ever, the significant correlations that are represented by the links connecting these
goals and personality traits were based on very few data points. This has probably led
to stronger correlations than would be found in the general population. Because evi-
dence coming from interaction patterns is often not strong enough (see Reason 1
above), then the model is not able to recover from the bias that evidence on these two
personality traits brings to the model assessment. An obvious fix to these problems is
to collect more data to refine the links between personality and goals.
References
1. Ball, G. and Breese, J. 1999. Modeling the Emotional State of Computer Users. Work-
shop on ‘Attitude, Personality and Emotions in User-Adapted Interaction’, UM’99, Can-
ada.
2. Bosma, W. and André, E. 2004. Recognizing Emotions to Disambiguate Dialogue Acts.
International Conference on Intelligent User Interfaces 2004. Madeira, Portugal.
Politeness in Tutoring Dialogs: “Run the Factory, That’s What I’d Do”
W.L. Johnson and P. Rizzo
Abstract. Intelligent Tutoring Systems usually take into account only the cog-
nitive aspects of the student: they may suggest the right actions to perform, cor-
rect mistakes, and provide explanations. However, besides cognition, educa-
tional researchers increasingly recognize the importance of factors such as self-
confidence and interest that contribute to learner intrinsic motivation. We be-
lieve that the student’s affective goals can be taken into account by implementing
a model of politeness in the tutoring system. This paper aims at providing an
overall account of politeness in tutoring interactions (in particular, natural lan-
guage dialogs), and describes the way in which politeness has been imple-
mented in an intelligent tutoring system based on an animated pedagogical
agent. The work is part of a larger project building a socially intelligent peda-
gogical agent able to monitor learner performance and provide socially sensi-
tive coaching and feedback at appropriate times. The project builds on the ex-
perience gained in realizing several other pedagogical agents.
1 Introduction
Intelligent Tutoring Systems usually take into account only the cognitive aspects of
the student: they may suggest the right actions to perform, correct mistakes, and pro-
vide explanations. However, besides cognition, educational researchers increasingly
recognize the importance of factors such as self-confidence and interest that contrib-
ute to learner intrinsic motivation [21]. ITSs not only usually ignore the motivational
states of the student, but might even undermine them, for instance when the system
says “Your answer is wrong” (affecting learner self-confidence), or “Now execute
this action” (affecting learner initiative).
We believe that the student’s affective goals can be taken into account by imple-
menting a model of politeness in a tutoring system. A polite tutor would respect the
student’s need to be in control, by suggesting rather than imposing actions; it would
reinforce the student’s self-confidence, by emphasizing his successful performances, or
by leveraging on the assumption that he and the tutor are solving the problems to-
gether; it would make the student more comfortable and motivated towards the
learning task, by trying to build up a positive relationship, or “rapport”, with him; and
it would stimulate the student’s interest, by unobtrusively highlighting open and unre-
solved issues.
This paper aims at providing an overall account of politeness in tutoring interac-
tions (in particular, natural language dialogs), and describes the way in which polite-
ness has been implemented in an intelligent tutoring system incorporating an ani-
mated pedagogical agent. The work is part of a larger project building a socially in-
telligent pedagogical agent able to monitor learner performance and provide socially
sensitive coaching and feedback at appropriate times [11]. Animated pedagogical
agents can produce a positive affective response on the part of the learner, sometimes
referred to as the persona effect [16]. This is attributed to the natural tendency for
people to relate to computers as social actors [20], a tendency that animated agents
exploit. Regarding politeness, the social actor hypothesis leads us to expect that hu-
mans not only respond to social cues, but also behave politely toward the
agents.
3. Negative politeness: the speaker redresses the hearer’s negative face by sug-
gesting that the hearer is free to decide whether to comply with the FTA,
e.g.: “Now you might want to set the planning methodology of the factory.”
4. Off record: the speaker provides some sort of hint to what he means, without
committing to a specific attributable intention, for example: “What about the
planning methodology of the factory?”
5. Don’t do the FTA: when the weightiness of the FTA is considered too high,
the speaker might simply avoid performing the FTA.
In other words, the Socratic hint is a case in which politeness is used not only for
mitigating face threats, but also for indirectly influencing the student’s motivation.
Although politeness theory and motivation theory come out of distinct literatures,
their predictions regarding the choice of tutorial interaction tactics are broadly con-
sistent. This is not surprising, since the wants described by politeness theory have a
clear motivational aspect; negative face corresponds to control, and positive face
corresponds somewhat to confidence in educational settings. Therefore, we are led to
think that tutors may use politeness strategies not only for minimizing the weightiness
of face threatening acts, but also for indirectly supporting the student’s motivation.
For instance, the tutor may use positive politeness for promoting the student positive
face (e.g. his desire for successful learning), and negative politeness for supporting
the student negative face (e.g. his desire for autonomous learning).
Here Wx+ and Wx- are the amounts of positive and negative face threat redress, re-
spectively; T represents the tutor and S represents the student. Rx+ is the inherent
positive face threat of the communicative act (e.g., advising, critiquing, etc.), Rx- is
the inherent negative face threat of the act, D+ is the amount of augmentation of
positive face desired by the tutor, and D- is the augmentation of learner negative
face.
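The displayed formulas themselves are not reproduced above. Assuming the computation follows Brown and Levinson’s additive weightiness estimate, one plausible form of the split into positive and negative redress is, in LaTeX notation:

\[ W_x^{+} = D(T,S) + P(S,T) + R_x^{+} + D^{+}, \qquad W_x^{-} = D(T,S) + P(S,T) + R_x^{-} + D^{-} \]

where D(T,S) is the social distance between tutor and student and P(S,T) the student’s power over the tutor; the exact equations used by the authors may differ.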
As a final modification of Brown and Levinson’s theory, we have grouped polite-
ness strategies into a more fine-grained categorization (see Table 1) that takes into
account the types of speech acts observed in the transcripts of real tutoring dialogs.
For each utterance type, a set of politeness strategies is available, ordered by
the amount of face threat mitigation they offer. Each strategy is in turn described as a
set of dialog moves, similar to those shown in Figure 1. These are passed to the natu-
ral language generator, which selects a dialog move. The combined dialog generator
takes as input the desired utterance type, language elements, and a set of parameters
governing face threat mitigation (social distance, social power, and motivational sup-
port) and generates an utterance with the appropriate degree of face threat redress.
Using this generation framework, it is possible to present the same tutorial comment
with different degrees of politeness. For example, a suggestion to save the current
factory description, can be stated either bald on record (e.g., “Save the factory now”),
as a hint, (“Do you want to save the factory now?”), as a suggestion of what the tutor
would do (“I would save the factory now”), as a suggestion of a joint action (“Why
don’t we save our factory now?”), etc.
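A minimal sketch of this idea (Python, with hypothetical names; the ordering of redress levels below is illustrative rather than the authors’ actual parameters) uses the quoted surface forms for the “save the factory” suggestion:

# Illustrative only: one tutorial comment realised at increasing levels of
# face-threat redress, using the surface forms quoted above.
SAVE_FACTORY_REALISATIONS = [
    (0.0, "Save the factory now."),                 # bald on record
    (0.4, "I would save the factory now."),         # what the tutor would do
    (0.6, "Why don't we save our factory now?"),    # suggestion of a joint action
    (0.8, "Do you want to save the factory now?"),  # hint
]

def realise(required_redress):
    """Pick the least indirect surface form whose redress level is sufficient."""
    for level, text in SAVE_FACTORY_REALISATIONS:
        if level >= required_redress:
            return text
    return SAVE_FACTORY_REALISATIONS[-1][1]

print(realise(0.5))  # -> "Why don't we save our factory now?"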
how to evaluate positive and negative politeness were not clear enough to the sub-
jects. We revised the wording of the questionnaire, based on feedback from this set of
subjects, and collected data from 47 subjects from University of California Santa
Barbara, with much more consistent results.
9 Related Work
Affect and motivation in learning environments are attracting increasing interest, e.g.,
the work of del Soldato et al. [8] and de Vicente [7]. Heylen et al. [10] highlight the
importance of these factors in tutors, and examine the interpersonal factors that
should be taken into account when creating socially intelligent computer tutors.
Cooper [6] has shown that profound empathy in teaching relationships is important
because it stimulates positive emotions and interactions that favor learning. Baylor
[3] has conducted experiments in which learners interact with multiple pedagogical
agents, one of which seeks to motivate the learner. User interface and agent research-
ers are also beginning to apply the Brown & Levinson model to human-computer
interaction in other contexts [5; 17]; see also André’s work in this area [2].
Porayska-Pomsta [19] has also been using the Brown & Levinson model to ana-
lyze teacher communications in classroom settings. Although there are similarities
between her approach and the approach described here, her model makes relatively
less use of face threat mitigating strategies. This may be due to the differences in the
social contexts being modeled: one-on-one coaching and advice giving is likely to
result in a greater degree of attention to face work.
Other researchers such as Kort et al. [1, 13], and Zhou and Conati [22] have been
addressing the problem of detecting learner affect and motivation, and influencing it.
Comparisons with this work are complicated by differences in terminology regarding
affect and emotion. We adhere to the terminological usage of Lazarus [14], who
considers all emotions to be appraisal-based, and distinguishes emotions from other
states and attitudes that may engender emotions in specific contexts. In this sense our
focus is not on emotions per se, but on states (i.e., motivation, face wants) that can
engender emotions in particular contexts (e.g., frustration, embarrassment). Although
nonverbal emotional displays were not prominent in the tutorial dialogs described in
this paper, they do arise in tutorial dialogs that we have studied in other domains, and
we plan in our future work to incorporate them into our model.
10 Conclusion
This paper has presented a model of politeness in tutorial dialog, based on transcripts
of student-tutor interaction. We have shown how politeness theory, extended to ad-
dress the specifics of tutorial dialog, can provide a common account for tutorial ad-
vice giving, motivational tactics, and Socratic dialog. We believe that this model
could be applied broadly to intelligent tutoring systems to engender a more positive
learner attitude, both toward the subject matter and toward the tutoring system.
Once we complete our experimental evaluations of the model, we plan to extend it
to other domains, such as foreign language learning. Future work will then investigate
how to integrate nonverbal gesture and affective displays into the model, in order to
control the behavior of an animated pedagogical agent.
References
1. Aist G., Kort B., Reilly R., Mostow J., Picard R.W: Adding Human-Provided Emotional
Scaffolding to an Automated Reading Tutor that Listens Increases Student Persistence. In
S. A. Cerri, G. Gouardères, F. Paraguaçu (Eds.): ITS 2002. Springer, Berlin (2002)
2. André, E., Rehm, M., Minker, W., Bühner, D.: Endowing spoken language dialogue sys-
tems with emotional intelligence. In: Proceedings ADS04. Springer, Berlin (2004)
3. Baylor, A.L., Ebbers, S.: Evidence that Multiple Agents Facilitate Greater Learning. Inter-
national Artificial Intelligence in Education (AI-ED) Conference. Sydney (2003)
4. Brown, P., Levinson, S.C.: Politeness: Some universals in language use. Cambridge
University Press, New York (1987)
5. Cassell, J., Bickmore, T.: Negotiated Collusion: Modeling Social Language and its Rela-
tionship Effects in Intelligent Agents. User Modeling and User-Adapted Interaction, 13, 1-
2 (2003) 89–132
6. Cooper B.: Care – Making the affective leap: More than a concerned interest in a learner’s
cognitive abilities. International Journal of Artificial Intelligence in Education, 13, 1
(2003)
7. De Vicente, A., Pain, H.: Informing the detection of the students’ motivatonal state: An
empirical study. In: S.A. Cerri, G. Gouardères, F. Paraguaçu (Eds.): Intelligent Tutoring
Systems. Springer, Berlin (2002) 933-943
8. Del Soldato, T., du Boulay, B.: Implementation of motivational tactics in tutoring systems.
Journal of Artificial Intelligence in Education, 6, 4 (1995) 337-378
9. Dessouky, M.M., Verma, S., Bailey, D., Rickel, J.: A methodology for developing a Web-
based factory simulator for manufacturing education. IEE Transactions 33 (2001) 167-
180
10. Heylen, D., Nijholt, A., op den Akker, R., Vissers, M.: Socially intelligent tutor agents.
Social Intelligence Design Workshop (2003)
11. Johnson, W.L.: Interaction tactics for socially intelligent pedagogical agents. Int’l Conf.
on Intelligent User Interfaces. ACM Press, New York (2003) 251-253
12. Johnson, W.L., Rizzo, P., Bosma W., Kole S., Ghijsen M., Welbergen H.: Generating
Socially Appropriate Tutorial Dialog. In: Proceedings of the Workshop on Affective Dia-
logue Systems (ADS04). Springer, Berlin (2004)
13. Kort B., Reilly R., Picard R.W.: An Affective Model of Interplay between Emotions and
Learning: Reengineering Educational Pedagogy – Building a Learning Companion. In:
ICALT(2001)
14. Lazarus, R.S.: Emotion and adaptation. Oxford University Press, New York (1991)
15. Lepper, M.R., Woolverton, M., Mumme, D., Gurtner, J.: Motivational techniques of ex-
pert human tutors: Lessons for the design of computer-based tutors. In: S.P. Lajoie, S.J.
Derry (Eds.): Computers as cognitive tools. LEA, Hillsdale, NJ (1993) 75-105
16. Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., Bhogal, R. S.:
The persona effect: Affective impact of animated pedagogical agents. In: CHI ’97. (1997)
359-366
17. Miller C. (ed.): Etiquette for Human-Computer Work. Papers from the AAAI Fall Sympo-
sium. AAAI Technical Report FS-02-02 (2002)
18. Pilkington, R.M.: Analysing educational discourse: The DISCOUNT scheme. Technical
report 99/2, Computer-Based Learning Unit, University of Leeds (1999)
19. Porayska-Pomsta, K.: Influence of Situational Context on Language Production. Ph.D.
thesis. University of Edinburgh (2004)
20. Reeves, B., Nass, C.: The media equation. Cambridge University Press, New York (1996)
21. Sansone, C., Harackiewicz, J.M.: Intrinsic and extrinsic motivation: The search for opti-
mal motivation and performance. Academic Press, San Diego (2000)
22. Zhou X., Conati C.: Inferring User Goals from Personality and Behavior in a Causal
Model of User Affect. In: Proceedings of IUI 2003 (2003).
Providing Cognitive and Affective Scaffolding Through
Teaching Strategies: Applying Linguistic Politeness to the
Educational Context
K. Porayska-Pomsta and H. Pain
ICCS/HCRC, Edinburgh University
2 Buccleuch Place, Edinburgh EH8 9LW, United Kingdom
{kaska, helen}@inf.ed.ac.uk
1 Introduction
Teaching strategies are teachers’ primary tool for controlling the flow of a lesson and
the flow of the student’s progress. Traditionally a teaching strategy is associated with
a method for teaching a particular topic, with its nature being dictated either by the
content taught, the student’s cognitive needs and abilities, or both. For example, the
content may dictate that the strategy of presenting a particular problem by analogy is
better than a strategy which simply describes it. On the other hand, a student’s cur-
rent cognitive demands may indicate that a problem decomposition strategy may be
more advantageous to him than prompting. The relevant literature (e.g., [2]; [7]) re-
veals that depending on the task and the student, on average, a teacher may have to
choose between at least eight different high level strategies, each of which may pro-
vide her with as many more sub-strategies. A teacher needs to discriminate between
the available strategies and, for each feedback move, to choose the one that brings the
most significant educational benefits. Unfortunately, as McArthur et al. [6] point out,
teaching strategies constitute the aspect of teaching that is the least developed and
understood to date. This may be due to the general lack of understanding of the con-
ditions under which particular strategies may be used, and of the effects that their use
has on student’s learning.
Most of the catalogued strategies are aimed at remedying students’ misconceptions
through corrective feedback of a sort which structures the content appropriately or
gives at least part of the answer away (cognitive scaffolding). However, it is recog-
nised in education that the success of cognitive development of students also depends
on the support that the teacher provides with respect to their emotional needs (e.g.
[4]). Several attempts to define a list of affective scaffolding strategies have been
made to date. Malone and Lepper [5] propose that as well as having purely content-
oriented pedagogical goals, teachers also have motivational goals such as to challenge
the student, to arouse his curiosity, and to support his sense of self-control or self-
confidence. Clearly, certain of these motivational goals (challenge/curiosity support)
are strongly related to both providing the student with cognitive scaffolding (e.g.,
appropriate level of difficulty, suitable representation of a problem to be solved by
the student, goals of the task, etc.), and with affective scaffolding (e.g., suitable level
of challenge should allow the student to solve the problem independently of the
teacher, resulting in the student’s sense of accomplishment and a raised level of self-
esteem).
Despite useful progress being made with respect to defining what constitutes a
good teaching strategy and despite there being a number of catalogues of teaching
strategies, there are still no systematic accounts of: (1) the relationship between cog-
nitive and affective types of support, (2) the conditions under which given strategies
may be used in terms of providing the student with both cognitive and affective scaf-
folding most effectively, or (3) the way in which the two types of support are mani-
fested in teachers’ concrete actions. Natural language is widely recognised as a pow-
erful means for delivering the appropriate guidance and affective support to the stu-
dent: in this paper we take it as the basis for explaining the nature of and the relation-
ship between the cognitive and the affective scaffolding. Based on dialogue analysis,
we relate these two types of support to concrete strategies that tutors tend to use in
student corrective situations, and we specify the contextual conditions under which
these strategies may be used successfully. We present a model of how teachers select
corrective feedback and show how the cognitive and the affective nature of instruc-
tion can be consolidated in terms of language general communicative strategies as
accounted for in research on linguistic politeness.
In student-corrective situations the tutor is often unable to provide fully positive feedback such as “Well
done!” or “Good!” without being untrue to her assessment of the student’s action;
such positive feedback is typically reserved for praising the student for what he did
correctly. Instead, as Fox [3] observes, tutors use indirect language (e.g., “Why don’t
you try again?” or “Okay” said in a hesitating manner) to convey to the student in as
motivating a way as possible that his answer was problematic, while leading him to
the desired cognitive goals. Through appropriate use of indirect language experienced
tutors maintain a necessary balance between allowing students as much learning ini-
tiative as possible while giving them enough guidance and encouragement to prevent
their frustration. Tutors adjust their language according to what they think are the
current cognitive and psychological needs of their students, in order to achieve spe-
cific communicative and pedagogical goals, i.e. they choose language in a highly
strategic way based on the current socio-situational settings.
Strategic language use based on social interactions is a primary notion in research
on linguistic politeness. In particular, Brown and Levinson’s theory [1], henceforth
B&L, provides many valuable insights as to the way in which the social and the emo-
tional aspects of participants in an interaction affect communication in general. In
this theory the cognitive and the affective states of the participants, and their ability to
recognise those states accurately, are inherently linked to the success of communica-
tion. According to B&L every social interaction involves face – a psychological
dimension that applies to all members of society. Face is a person’s self-image, which
can be characterised by two dimensions:
1. Negative Face: a need for freedom of action and freedom from imposition,
i.e., a desire for autonomy
2. Positive Face: a need to be approved of by others, i.e., the need for approval.
In addition to face, all members of society are equipped with an ability to behave
in a rational way. The public self-image regulates all speakers’ linguistic actions at all
times. Speakers choose their language to minimise the threat to their own and to oth-
ers’ face, i.e., they engage in facework. The ability of speakers to behave rationally
enables them to assess the extent of the potential threat of their intended actions and
to accommodate (in various degrees) for others’ face while achieving their own goals
and face needs. Every community provides its members with a set of rules (conven-
tions) which define means for achieving their goals in a socially and culturally ac-
ceptable, i.e., polite, manner. In terms of language, these conventions are manifested
in concrete communicative strategies that are available to speakers.
B&L propose four main strategies (Fig. 1) which represent the social conventions
that speakers use to make appropriate linguistic choices: the On-record, bald (e.g. to a
perfect stranger: “Give me your money!”), the On-record, redressive (e.g. to a
friend: “Look, I know you’re broke right now, but could you lend me some money,
please?”), the Off-record (e.g. to a friend: “I can’t believe it! I forgot my wallet at
home”) and the Don’t do face threatening action (FTA) strategies. Each strategy
leads to a number of different sub-strategies and their prototypical surface form reali-
sations.
The appropriateness of a communicative strategy for achieving a particular speaker
goal is determined along the two dimensions of face. According to B&L, speakers
tend to choose the strategies and consequently their language based on three vari-
ables: (1) the social distance between them and the hearer, (2) the power that the
hearer has over them, and (3) a ranking of imposition for the act that they want to
commit. Speakers establish the values of those variables based on the current situation
and the cultural conventions under which a given social interaction takes place. For
example, power may depend on the interlocutors’ status or their access to goods or
information; distance depends on the degree of familiarity between the parties in-
volved, while rank of imposition typically reflects social and cultural conventions of a
given speech community, which ranks different acts with respect to how much they
interfere with people’s need for autonomy and approval. A speaker’s ability to assess
the situation with respect to a hearer’s social, cultural and emotional needs constitutes
a crucial facet of his social and linguistic competence.
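These three variables enter Brown and Levinson’s well-known additive estimate of the weightiness of a face-threatening act x (the formula below is B&L’s [1], quoted here for reference rather than stated in the surrounding text):

\[ W_x = D(S,H) + P(H,S) + R_x \]

where S is the speaker, H the hearer, D the social distance between them, P the power of the hearer over the speaker, and R_x the ranking of imposition of the act.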
Intuitively, in education, the two dimensions of face seem to play an important role in
guiding teachers in their selection of strategies. In particular, in situations which re-
quire the teacher to correct her student, i.e., to perform a potentially face threatening
action, the teacher’s awareness and her will to find the means to accommodate for the
student’s desire for autonomy and approval seem essential to the pedagogical success
of her actions. A teacher’s obligation vis à vis the student is to promote his cognitive
progress. As many researchers currently accept, such progress is achieved best by
having the student recognise that he made an error and by allowing him the initiative
to find the correct solution (e.g., [2]). This means that teachers should avoid giving
the answers to students. Thus, the need to provide the student with autonomy of
action seems to be a well recognised aspect of good teaching. However, cognitive
progress is said to be facilitated also by avoiding any form of demotivation (e.g., [4]).
This means that teachers should avoid criticising (or disapproving of) students’ ac-
tions in a point blank manner, i.e. in B&L’s terms they ought to use Off-record strate-
gies. As with autonomy, the notion of approval seems to constitute an integral part of
good teaching. This suggests that, in line with the communicative strategies referred to
by B&L in the language-general context, teaching strategies can be defined along the
two dimensions of face: teaching strategies may be viewed as a specialised form of
communicative strategies.
To date, there has been relatively little effort made towards relating the theory of
linguistic politeness to teachers’ strategic language use. The most prominent attempt
is that by Person et al. [7] in which they analyse tutorial dialogues to assess whether
or not facework impacts the effectiveness of students’ learning. They confirm that,
just as speakers in normal conversations, tutors also engage in facework during tu-
toring. The resulting language varies in terms of the degree of indirectness of the
communicated messages, with a given degree of indirectness being dependent on the
level of politeness that the tutor deems necessary in a particular situation and with
respect to a particular student. For example, students who are not very confident may
need to be informed about the problems in their answers more indirectly than students
who are fairly self-satisfied. However, the overall conclusions of Person et al.’s analysis
do not bode well for the role of politeness in tutoring which, they claim, may inhibit a
tutor’s ability to give adequately informative feedback to students as a way of avoid-
ing face threat. In turn, vagueness of the tutor’s feedback may lead to the student’s
confusion and lack of progress.
Although valuable in many respects, Person et al.’s analysis is problematic in that
it assumes that tutorial interactions belong to the genre of normal conversation for
which B&L’s model was developed. However, B&L’s theory is not entirely applica-
ble to teaching in that language produced in tutoring circumstances is governed by
different conventions than that of normal conversations [8]. These differences impact
both the type of contextual information that is relevant to making linguistic choices
and the nature of the strategies. With respect to the strategies, teachers do not tend to
offer gifts or information as a way of fulfilling their students’ face needs, nor do they
tend to apologise for requesting information from them; their questions are typically
asked not to gain information in the conventional sense, but to test the students’
knowledge, to highlight problematic aspects of their reasoning, to prompt, or to hint.
Similarly, instructions and commands are not typically perceived by teachers as out
of the ordinary in the teaching circumstances. While some of B&L’s strategies simply
do not apply to educational contexts, others require a more detailed specification or
complete redefinition. With respect to the contextual information used to guide the
selection of the strategies, power and distance seem relatively constant in the educa-
tional, student-corrective genre, rendering the rank of imposition the only contextual
variable immediately relevant to teachers’ corrective actions [13].
In order to explore the relationship between the Positive and the Negative face di-
mensions, and the cognitive and the affective aspects of instruction, in terms of a
formal model, and given the observation that the language of education is governed
by different conventions than that of normal conversation, it is necessary to (1) define
face and facework for an educational context; (2) determine a system of strategies
representative of the linguistic domain under investigation; (3) define the strategies
included in our model in terms of face. Furthermore it is necessary to identify the
contextual variables which affect teachers’ linguistic choices and to relate them to the
notion of face.
We analysed two sets of human-human tutorial and classroom dialogues: one in the
domain of basic electricity and electronics (BEE) and one in the domain of literary
analysis. In line with Person et al.’s analysis, we observed that facework plays a
crucial role in education: teachers tend to employ linguistic indirectness so as not to
threaten the student’s face. However, we found B&L’s definitions of the face dimen-
sions not to be precise enough to explain the nature of face and facework in educa-
tional circumstances. Our dialogue analysis confirms other researchers’ suggestions
that indirect use of language by teachers results from their attempt to allow their stu-
dents as much freedom of initiative as possible (pedagogical/cognitive considerations)
while making sure that they do not flounder and become demotivated (motivational
concerns) [4]. Specifically, we found that all of teachers’ corrective feedback can be
interpreted in terms of both differing amounts of content specificity, that is, how spe-
cific and how structured the tutor’s feedback is with respect to the answer sought
from the student (compare: “No, that’s incorrect” with “Well, if you put the light
bulb in the oven then it will get a lot of heat, but will it light up?”), and illocutionary
specificity, that is, how explicitly accepting or rejecting the tutor’s feedback is (com-
pare: “No, that’s incorrect” with “Well, why don’t you try again?”).
Based on these observations we define the Negative and the Positive face directly
in terms of:
Autonomy: letting the student do as much of the work as possible (determi-
nation of the appropriate level of content specificity and accommodation for
the student’s cognitive needs)
Approval: providing the student with as positive feedback as possible (de-
termination of the appropriate level of illocutionary specificity and accom-
modation for the student’s affective needs).
The less information the teacher gives to the student, the more autonomy she gives
him and vice versa. The more explicit the references to the student’s good traits, his
prior or current achievements or the correctness of his answer, the more approval the
teacher gives to the student. However, if the teacher supports the student’s reasoning
without giving away too much of the answer, she can be said also to approve of the
student to an extent. Thus the level of approval given by the tutor can be affected by
the amount of autonomy given and vice versa, which suggests that the two dimen-
sions are not fully independent from each other. It can be further inferred from this
that cognitive and affective support, as provided through teachers’ language, are also
dependent on each other.
The tightened definitions of the face dimensions allowed us to identify the student
corrective strategies used by tutors and teachers in our dialogues, and to characterise
them in terms of the degree to which each accommodates for the student’s need for
autonomy and approval (henceforth <Aut, App>). In defining the system of strate-
gies representative of our data, first we identified those of B&L’s strategies which seem
to apply to the educational settings. We then identified other strategies used in the
dialogues, whenever possible we related them to the teaching strategies proposed by
other researchers, and combined them with those proposed by B&L. The resulting
strategic system differs in a number of respects from that of B&L. Whilst B&L’s
system proposes a clear separation between those strategies which address Negative
and Positive face, in our model all strategies are characterised in terms of the two face
dimensions. In B&L’s model the selection of a strategy was based only on one nu-
meric value – the result of summing the three social variables. In our model two
values are used in such a selection: one referring to the way in which a given strategy
addresses a student’s need for autonomy and another to the way in which the strategy
addresses a student’s need for approval.
Although we retain B&L’s high-level distinction between On-record, Off-record
and Don’t do FTA strategies, the lower level strategies refer explicitly to both the
pedagogical goals of tutors’ corrective actions as encapsulated in our definition of
autonomy, and to the affective goals as expressed in our definition of approval. We
split the strategies into two types: the main strategies which are used to express the
main message of the corrective act, i.e., the teacher’s rejection of the student’s previ-
ous answer, and the auxiliary strategies which are used primarily to express redress.
Although both types of strategies affect both face dimensions, the auxiliary strategies
tend to increase the overall level of approval given to the student. For example one of
the main on-record strategies, give complete answer away (e.g. “The answer
is...”), which is characterised by no autonomy and lack of explicit approval, and thus
as being quite threatening to the student’s face, can combine with the auxiliary strat-
egy state FTA as a general rule (e.g. “We are running out of time, so I will
tell you the answer”) to reduce the overall face threat. Unlike in B&L’s model in
which the strategies are rigidly assigned to a particular type of facework, in our ap-
proach the split between the strategies provides for a more flexible generative model
which reflects the way in which teachers tend to provide corrective feedback: in a
single act a teacher often makes use of several different strategies simultaneously.
The assignment of the <Aut, App> values, each being anything between 0 and 1,
to the individual strategies is done relative to other strategies in our system. For ex-
ample when contrasting a strategy such as give complete answer away (e.g.
“The answer is...”) with a strategy such as use direct hinting (e.g. “That’s one
way, but there is a better way to do this”) we assessed the first strategy as giving less
autonomy and less approval to the student than the second strategy. On the other
hand, when compared with a third strategy such as request self-explanation
(e.g., “Why?”), the hinting strategy seems to give less autonomy, but more approval.
For each strategy we compiled a list of its possible surface realisations and we also
ordered them according to the degrees of <Aut, App> that they seem to express.
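A small illustrative sketch of such a strategy table (Python; the numeric <Aut, App> values are invented, preserving only the relative orderings discussed above):

# Illustrative only: the actual <Aut, App> values are relative judgements and
# are not listed in the text; these numbers merely respect the orderings above.
STRATEGIES = {
    "give complete answer away": {"aut": 0.1, "app": 0.2,
                                  "example": "The answer is..."},
    "use direct hinting":        {"aut": 0.5, "app": 0.6,
                                  "example": "That's one way, but there is a better way to do this"},
    "request self-explanation":  {"aut": 0.9, "app": 0.4,
                                  "example": "Why?"},
}

def closest(target_aut, target_app):
    """Return the strategy whose <Aut, App> pair is nearest the target values."""
    return min(STRATEGIES.items(),
               key=lambda kv: (kv[1]["aut"] - target_aut) ** 2 +
                              (kv[1]["app"] - target_app) ** 2)[0]

print(closest(0.6, 0.5))  # -> "use direct hinting"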
We implemented the model in a system, shown in figure 1. The surface forms coded
for <Aut, App> values are stored in a case base (CB2) which provides different feed-
back alternatives using a standard Case Based Reasoning technique. A Bayesian net-
work (BN) combines evidence from the factors to compute values for <Aut, App> for
every situational input. The structure of the network reflects the relationship of fac-
tors as determined by the study with teachers. The individual nodes in the network are
populated with the conditional probabilities calculated using the types of rules de-
scribed above. To generate feedback recommendations, the system expects an input
in the form of factor-values. The factor-values are interpreted by the Pre-processing
unit (PPU) as evidence required by the BN. The evidence consists of salience of each
factor-value in the input. It is either retrieved directly from the Case Base 1 (CB1)
which stores all the situations seen and ranked by the teachers in the study or, if there
is no situation in the CB1 that matches the input, it is calculated for each factor-value
from the mean salience of three existing nearest matching situations using the K-
nearest neighbour algorithm (KNN1). When evidence is set, the BN calculates <Aut,
App>. These are passed to the linguistic component. KNN2 finds N closest matching
pairs of <Aut, App> (N being specified by the user) which are associated with spe-
cific linguistic alternatives stored in CB2, and which constitute the output of the sys-
tem.
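The pipeline can be summarised with the following sketch (Python, with stand-ins for the Bayesian network and case bases; all names and data are hypothetical):

import math

def knn_mean_salience(cb1, factors, k=3):
    # KNN1 (sketch): mean salience over the k nearest previously ranked
    # situations; the distance here simply counts mismatching factor-values,
    # a placeholder for whatever metric the real system uses.
    def dist(case):
        return sum(1 for f, v in factors.items() if case["factors"].get(f) != v)
    nearest = sorted(cb1, key=dist)[:k]
    keys = nearest[0]["salience"].keys()
    return {key: sum(c["salience"][key] for c in nearest) / len(nearest)
            for key in keys}

def recommend_feedback(factors, cb1, bn_query, cb2, n=2):
    # Pipeline sketch: PPU -> (CB1 lookup | KNN1) -> BN -> KNN2 -> CB2.
    exact = next((c for c in cb1 if c["factors"] == factors), None)
    evidence = exact["salience"] if exact else knn_mean_salience(cb1, factors)
    aut, app = bn_query(evidence)                    # BN computes <Aut, App>
    ranked = sorted(cb2, key=lambda e: math.dist((aut, app), e["aut_app"]))
    return [e["text"] for e in ranked[:n]]           # KNN2 over surface forms

# Tiny worked example with made-up cases and a stand-in for the Bayesian network.
cb1 = [{"factors": {"answer": "partially correct"},
        "salience": {"correctness": 0.5}}]
cb2 = [{"aut_app": (0.9, 0.4), "text": "Why?"},
       {"aut_app": (0.5, 0.6), "text": "That's one way, but is there a better one?"}]
bn_query = lambda evidence: (0.6, 0.55)
print(recommend_feedback({"answer": "partially correct"}, cb1, bn_query, cb2))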
The model was evaluated by four experienced BEE tutors. Each tutor was presented
with twenty different situations in the form of short dialogues between a student and a
tutor. Each interaction ended with either an incorrect or a partially correct student answer.
For each situation, the participants were provided with three possible tutor responses
to the student’s answer and were asked to rate each of them on a scale from 1 to 5
according to how appropriate they thought the response was in a given situation.
They were asked to pay special attention to the manner in which each response at-
tempted to correct the student. The three types of responses rated included: a response
that a human gave, the system’s preferred response, and a response that the system
was less likely to recommend for the same situation (the less preferred response).
A t-test was performed to determine any significant differences between the three
types of responses. The analysis revealed a significant difference between human
responses and the system’s less preferred responses (t(19) = 4.40, p < 0.001), as well
as a significant difference between the system’s preferred and the system’s less pre-
ferred responses (t(19) = 2.72, p = 0.013). However, there was no significant differ-
ence between the ratings of the human responses and the system’s preferred responses
(t(19) = 1.99, p = 0.061). This preliminary evaluation suggests that the model’s choices
are in line with those made by a human tutor in identical situations.
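As a quick consistency check of the reported statistics (assuming standard two-tailed t-tests with 19 degrees of freedom), the corresponding p-values can be recomputed, for instance with SciPy:

from scipy import stats

# Two-tailed p-values implied by the reported t statistics with df = 19.
for t_value in (4.40, 2.72, 1.99):
    p_value = 2 * stats.t.sf(t_value, 19)
    print(f"t(19) = {t_value:.2f}  ->  p = {p_value:.4f}")
# t(19) = 1.99 corresponds to p of roughly 0.061, just above the 0.05 level.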
Based on the dialogue analysis we observed that cognitive and affective scaffolding is
present in all strategies used by teachers in corrective situations. The two types of
strategies can be related to the more general notion of face, considered by theories of
linguistic politeness to be central to successful communication. We have formalised
the model proposed by B&L, and adapted it to the educational domain. We show how
educational strategies can be viewed as specialised forms of communicative strate-
gies. We believe that viewing teaching strategies from this perspective extends our
understanding of the relationship between their cognitive and the affective dimen-
sions, clarifies the conditions under which such strategies may be used to provide
both cognitive and affective scaffolding, and demonstrates how these dimensions
might be manifested in teachers’ corrective actions. Whilst the current implementation
of the model is in the domain of BEE, we are extending the model to the domain of
Mathematics. In doing so we will be exploring further the conditions for selecting
strategies, the methods for assigning <Aut, App> values to strategies and the corre-
sponding surface forms, and we plan to evaluate the revised model within a dialogue
system.
References
1. Brown, P., and Levinson, S. (1987). Politeness: Some Universals in Language Use, CUP.
2. Chi, M.T.H., Siler, S.A., Jeong, H., Yamauchi, T., and Hausmann, R.G. (2001). Learning
from human tutoring. Cognitive Science, 25, 471-533.
3. Fox, B. (1991). Cognitive Interactional aspects of correction in tutoring. P Goodyear (eds.),
Teaching knowledge and intelligent tutoring., pp. 149-172, Ablex, Norwood, N.J.
4. Lepper, M.R., Woolverton, M., Mumme, D. L., and Gurtner, J. (1993). Motivational Tech-
niques of Expert Tutors: Lessons for the Design of Computer-Eased Tutors, chapter 3,
pages 75-107, LEA, NJ.
5. Malone, T. W. and Lepper, M. R. (1987). Making learning fun: a taxonomy of intrinsic
motivations for learning. In R.E. Snow and M.J. Farr eds, Aptitude, Learning and Instruc-
tion: Conative and Affective Process Analyses., pages 261-265, AAAI.
6. McArthur, D., Stasz, C., and Zmuidzinas, M. (1990). Tutoring techniques in algebra.
Cognition and Instruction, (7), 197-244.
7. Person, N. K., Kreuz, R. J., Zwaan, R. A., Graesser, A. C. (1995). Pragmatics and peda-
gogy: Conversational rules of politeness strategies may inhibit effective tutoring. Cognition
and Instruction, 2(13), 161-188. Lawrence Erlbaum Associates, Inc.
8. Porayska-Pomsta, K. (2003). Influence of situational context on language prduction: Mod-
elling teachers’ corrective responses. PhD thesis, Edinburgh University.
Knowledge Representation Requirements for Intelligent
Tutoring Systems
1 Introduction
Like other knowledge-based systems, we distinguish three main phases in the life-
cycle of an ITS, the construction phase, the operation phase and the maintenance
phase. The main difference is that an ITS requires a great deal of feedback from its users and iteration between phases. Three types of users are involved in those phases:
domain experts, knowledge engineers (both mainly involved in the construction and
maintenance phases) and learners (mainly involved in the operation phase). Each
type of user has different requirements from the KR formalism(s) to be used.
On the other hand, the system itself imposes a number of requirements on the KR
formalism. An ITS consists of three main modules: (a) the domain knowledge, which
contains the teaching content and information about the subject to be taught, (b) the
user model, which records information concerning the user, and (c) the pedagogical
model, which encompasses knowledge regarding various pedagogical decisions. Each
component imposes different KR requirements.
their usability. They consider that the effectiveness of the theories in assisting
students to learn the teaching subject is of extreme importance. Tutors are highly
involved in the construction and maintenance stages. However, in most cases, their relation to AI is rather superficial. Sometimes their experience with computers is also limited. This may make them hesitant in their interactions with the knowledge engineer. Furthermore, the teaching theories they want to incorporate within the system can be rather difficult to express.
So, it is evident that one main requirement that tutors impose on the knowledge
representation formalism is naturalness of representation. Naturalness facilitates
interaction with the knowledge engineer and helps the tutor overcome his/her possible reservations about AI and computers in general. In addition, it assists the tutor in proposing updates to the existing knowledge. The more natural the knowledge representation formalism, the better the tutor understands the existing knowledge and the easier the communication with the knowledge engineer.
Also, checking knowledge during the knowledge acquisition process is a tedious task, and the capability to provide explanations is quite helpful to the expert; so this is another requirement. Moreover, the knowledge base should allow existing items of acquired knowledge to be easily removed or updated and new items to be easily inserted. This demands ease of update.
If the KR formalism is efficient, the time spent by the knowledge engineer is reduced.
Also, the possibility of an explanation mechanism associated with the KR formalism
is important, because explanations justifying how conclusions were reached can be
produced. This feature can assist in the location of deficiencies in the knowledge
base. Hence, two other requirements are efficient inferences and explanation facility.
2.1.3 End-User
An end-user (learner) is the one who uses the system in its operation stage. He/she
imposes constraints regarding the user-interface and the time performance of the
system. The basic requirement for KR, from the point of view of end-users, concerns
time efficiency. ITSs are highly interactive knowledge-based systems requiring time-
efficient responses to the users’ actions. The decisions an ITS makes during a training
session are based on the conclusions reached by the inference engine associated with
the knowledge representation formalism. The faster the conclusions can be reached, the faster the system can interact with the user. Therefore, the time performance of an
ITS significantly depends on the time-efficiency of the inference engine. In case of
Web-based ITSs, time performance is even more crucial since the Web imposes
additional time constraints. The server hosting the ITS may be accessed by a
significant number of users. Some of them may even possess a low communication
bandwidth. The server must respond as fast as possible. Besides efficiency, the
inference engine should also be able to reach conclusions from partially known
inputs. It is very common that, during a learning session, certain parameters may be unknown. However, the system should be able to make inferences and reach conclusions, whether all or only some of the inputs are known.
because it is much like the medical task of inferring a hidden physiological state from
observable signs. There are many possible user characteristics that can be recorded in
the user model. One of them is the knowledge that he/she has learned. In this case,
diagnosis refers to evaluation of learner’s knowledge level. Other characteristics may
be ‘learning ability’ and ‘concentration’. Diagnosis in those cases means estimation
of the learning ability and the concentration of the learner, based on his/her behavior
while interacting with the system. Measurement and interpretation of such user
behavior is quite uncertain.
There is no clear process for evaluating a learner's characteristics. Also, there is no clear-cut boundary between the various levels (values) of the characteristics (e.g. between ‘low’ and ‘medium’ concentration). It is quite clear that a representation and
reasoning formalism for the user model should be able to deal with uncertain and
vague knowledge. Also, heuristic (rule of thumb) knowledge is required to make
evaluations.
Semantic nets and their descendants (frames or schemata) represent knowledge in the
form of a graph (or a hierarchy). Nodes in the graph represent concepts and edges
represent relations between concepts. Nodes in a hierarchy also represent concepts,
but they have internal structure describing concepts via sets of attributes. They are
very natural and well suited for representing structural and relational knowledge.
They can also make efficient inferences for small to medium graphs (hierarchies).
However, it is difficult to represent heuristic or uncertain knowledge and to make inferences from partial inputs. Also, explanations and knowledge updates are difficult.
Symbolic rules (of propositional type) represent knowledge in the form of if-then
rules. They satisfy a number of the requirements. Symbolic rules are natural since one
can easily comprehend the encompassed knowledge and follow the inference steps.
Due to their modularity, updates such as removing existing rules or inserting new
rules are easy to make. Explanations of conclusions are straightforward and of
various types. Heuristic knowledge representation is feasible and procedural
knowledge can be represented in their conclusions too. The inference process may not be very efficient when there is a large number of rules and multiple paths have to be followed. Knowledge acquisition is one of their major drawbacks. Also, conclusions
cannot be reached if some of the inputs are unknown. Finally, they cannot represent
uncertain knowledge and are not suitable for representing structural and relational
knowledge.
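As a minimal illustration of this formalism (not tied to any particular ITS), propositional if-then rules and a naive forward-chaining loop can be sketched as follows; note how a conclusion simply cannot be reached when one of its conditions is unknown, matching the partial-input limitation noted above. Rule and fact names are invented.

```python
# A tiny propositional rule base with naive forward chaining.
# Rules and facts are illustrative placeholders.
rules = [
    ({"low_knowledge_level", "high_concentration"}, "offer_detailed_hint"),
    ({"low_knowledge_level", "low_concentration"},  "offer_short_example"),
    ({"offer_detailed_hint"},                        "log_pedagogical_action"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                       # fire rules until no new conclusions
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)    # modular: rules can be added/removed freely
                changed = True
    return facts

print(forward_chain({"low_knowledge_level", "high_concentration"}, rules))
```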
Fuzzy logic is used to represent imprecise and fuzzy terms. Sets of fuzzy rules are
used to infer conclusions based on input data. Fuzzy rules outperform symbolic rules
and other formalisms in representing uncertainty. However, fuzzy rules are not as
natural as symbolic rules, because the concepts contained in them are associated with
membership functions. Furthermore, for the same reason, compared to symbolic rules they present greater difficulties in making updates, providing explanations and acquiring knowledge (e.g. for specifying membership functions). Inference is more complicated and less natural than symbolic rule-based reasoning, but its overall performance is not worse, because a fuzzy rule can replace more than one symbolic rule. Explanations are feasible, but not all reasoning steps can be explained.
Finally, fuzzy rules are much like symbolic rules as to structural, heuristic and
relational knowledge as well as the ability to perform partial input inferences.
Case-based representations store a large set of previous cases with their solutions
and use them whenever a similar new case has to be dealt with. Case-based
representation satisfies several requirements. Cases are usually easy to obtain in most domains and, unlike with other formalisms, case acquisition can also take place during the system's operation, further enhancing the knowledge base. Cases are natural since their knowledge is quite comprehensible to humans. Explanations cannot be easily provided in most situations, due to the complicated numeric similarity functions. Conclusions can be reached even if some of the inputs are not known, through similarity to stored cases. Updates can be made more easily than in other formalisms, since no changes need to be made to preexisting knowledge. However, inference efficiency is not always satisfactory when the case library becomes very large.
Finally, cases are not suitable for representing structural, uncertain and heuristic
knowledge.
Neural networks represent a totally different approach to AI, known as
connectionism. Neural networks can easily obtain knowledge from training examples,
which are usually available in abundance for most application domains. Neural
networks are very efficient in producing conclusions and can reach conclusions based
on partially known inputs due to their generalization ability. On the other hand, neural
networks lack naturalness. The encompassed knowledge is in most cases
incomprehensible and explanations for the reached conclusions cannot be provided. It
is also difficult to make updates to specific parts of the network. The neural network
is not decomposable and any changes affect the whole network. Neural networks do
not possess inherent mechanisms for representing structural, relational and uncertain
knowledge. Heuristic knowledge can be represented to some degree since it can be
implicitly incorporated into a trained neural network.
Belief networks (or probabilistic nets) are graphs, where nodes represent statistical
concepts and links represent mainly causal relations between them. Each link is assigned a probability, which represents how certain it is that the concept from which the link departs causes (leads to) the concept at which it arrives. Belief nets are good at representing causal relations between concepts. Also, they can represent heuristic knowledge to some extent. Furthermore, they can represent uncertain knowledge through the probabilities and make relatively efficient inferences (via probability propagation computations). However, estimation of the probabilities is a difficult task, which poses great problems for the knowledge acquisition process. For
difficult task, which gives great problems to the knowledge acquisition process. For
the same reason, it is difficult to make updates. Also, explanations are difficult to
produce, since the inference steps cannot be easily followed by humans. Furthermore,
given that belief networks representation and reasoning are based on numerical
computation, their naturalness is reduced.
is natural, but not as much as that of symbolic rules. Inferences in DLs may have
efficiency problems. Explanations cannot be easily provided.
Neurules are a type of hybrid rules integrating symbolic rules with
neurocomputing, introduced by us [4]. The most attractive features of neurules are
that they improve the performance of symbolic rules while retaining their modularity and, to a large degree, their naturalness, in contrast to other hybrid approaches. So, neurules offer a number of benefits for knowledge representation in an ITS. Apart from the above, updating a neurule base (adding or removing neurules) is easy, due to the modularity of neurules [5]. The explanation mechanism
produces natural explanations. Neurule-based inference is more efficient than
symbolic rule-based reasoning and inference in other hybrid neuro-symbolic
approaches. Neurules can be constructed either from symbolic rules or from empirical
data enabling the exploitation of various knowledge sources [5]. In contrast to
symbolic rules, neurule-based reasoning can derive conclusions from partially known
inputs, due to its connectionist part.
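The cited papers [4], [5] define neurules precisely; the sketch below is only a schematic illustration of the general idea of a rule whose conditions carry significance factors and whose firing is decided by a threshold over their weighted sum. The factors and condition names are invented for the example, not taken from the authors' work.

```python
# Schematic illustration of a weighted, adaline-like rule: each condition has a
# significance factor, a bias factor is added, and the rule fires if the weighted
# sum is positive. Factors and condition names are invented placeholders.
def weighted_rule_fires(bias, weighted_conditions, truth):
    """truth maps condition name -> 1 (true), -1 (false) or 0 (unknown)."""
    total = bias + sum(w * truth.get(cond, 0) for cond, w in weighted_conditions)
    return total > 0

rule = (-2.0, [("low_knowledge_level", 3.5), ("asked_for_help", 1.5),
               ("high_error_rate", 2.0)])

# Unknown inputs simply contribute 0, so a conclusion may still be reached.
print(weighted_rule_fires(*rule, {"low_knowledge_level": 1, "high_error_rate": 1}))
```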
4 Discussion
5 Conclusions
In this paper, we make a first effort to define requirements for KR in an ITS. The
requirements concern all stages of an ITS’s life cycle (construction, operation and maintenance), all types of users (experts, engineers, learners) and all its modules
(domain knowledge, user model, pedagogical model). According to our knowledge,
such requirements have not been defined yet in the ITS literature. However, we
consider them of great importance as they can assist in choosing the KR formalisms
for representing knowledge in the components of an ITS.
From our analysis, it appears that various hybrid approaches to knowledge representation can satisfy the requirements to a greater degree than single representations. So, we believe that use of hybrid KR approaches in ITSs can become
a popular research trend, although, till now, only a few efforts exist. Another finding
is that there is not a hybrid formalism that can satisfy the requirements of all of the
modules of an ITS. So, a multi-paradigm representation could provide a solution.
We feel that our research needs to be further completed by becoming more detailed and more specific to the nature of ITSs. What is further needed is a more in-depth analysis
of the three modules of an ITS. Also, a more fine-grained comparison of the KR
formalisms may be required. These are the main concerns of our future work.
References
1. Gallant, S.I.: Neural Network Learning and Expert Systems. MIT Press (1993).
2. Golding, A.R., Rosenbloom, P.S.: Improving accuracy by combining rule-based and case-
based reasoning. Artificial Intelligence 87 (1996) 215-254.
3. Guin-Duclosson, N., Jean-Daubias, S., Nogry, S.: The AMBRE ILE: How to Use Case-
Based Reasoning to Teach Methods. In Cerri, S.A., Gouarderes, G., Paraguacu, F. (eds.):
Sixth International Conference on Intelligent Tutoring Systems. Lecture Notes in
Computer Science, Vol. 2363. Springer-Verlag, Berlin (2002) 782-791.
4. Hatzilygeroudis, I., Prentzas, J.: Neurules: Improving the Performance of Symbolic Rules.
International Journal on Artificial Intelligence Tools 9 (2000) 113-130.
1 Introduction
The use of TV (and radio) in education has a long history — longer than the use of
computers in education. But the traditions within which TV operates, such as the
strong focus on narrative and the emphasis on viewer engagement, are rather different
from those within which computers in education, and more particularly ITS & AIED
systems operate. We can characterise ITS & AIED systems as being fundamentally
concerned with individualising the experience of learners and groups of learners and
supporting a range of representations and reifications of either the domain being
explored or the learning process. The traditional division of the subject into student
modelling, domain modelling, modelling teaching and interface issues reflects this
concern with producing systems that react intelligently to the learner or group of
learners using the system. Even where the system is simply a tool or a vehicle to
promote collaboration (say), there will be a concern to monitor and perhaps adjust the
parameters within which that collaboration takes place, if the system is to be regarded
as of interest to the ITS & AIED community. One novel aspect of the HomeWork
system is its concern with modelling and managing the narrative flow of the learners’
experience both at the micro level within sessions and at the macro level between
sessions and over extended use. This project is building an exemplar system for
children aged 6-7 years, their parents, teachers and classmates at school to tackle both
numeracy and literacy at Key Stage 1. In the classroom the child will be able to work
alone or as part of a group and interact both with a large interactive whiteboard and
with a handheld digital slate, as directed by the teacher. When the handheld slate is taken home, further activities can be completed using a home TV and the slate, with the child either working alone, with their family, or with other classmates who may be co-located or at a distance in their own homes.
This paper concentrates on the narrative aspects of the HomeWork project and on
the Coherence Compiler that ensures narrative coherence. We start by outlining the
HomeWork project. We then give the theoretical background to the narrative work.
Finally we discuss how the coherence compiler is being designed to maintain
narrative coherence across different technologies in different locations despite the
interactive interventions of the learners.
4 An Example Scenario
The scenario presented in Table 1 below describes the desired learner experience and
the proposed system behaviour.
5 Coherence Compilation
The Coherence compiler is an attempt to operationalise guidelines drawn from the
Non-linear Interactive Narrative Framework. The original Non-linear Interactive
Narrative Framework (NINF) was the product of the MENO research project [6]. This
framework was subsequently adapted and used in the design of the IETV pilot system
developed at Sussex [2] and is now being further expanded for use in the HomeWork
project. In this section of the document we discuss the relevant theoretical grounding
for the NINF, and the influence of previous work, in particular that of the MENO project, on the NINF. We then present the current version of the NINF for use in the HomeWork project.
It is this need to support learner creativity that provides us with a third theoretical
position to explore. Creativity can be considered as a process through which
individuals, groups and even entire societies are able to transcend an accepted model
of reality. It has been differentiated by [9] into three broad categories: combinatorial, exploratory and transformational, all of which require the manipulation of an accepted familiarity, pattern or structure in order to produce a novel outcome. The perceptions
of reality that are the backdrop for creativity vary not only from individual to
individual, but also from culture to culture. Communities explore and transform these
realities in many ways, through art, drama and narrative for example. In developing
the coherence compiler we are particularly interested in the relationship between
creativity and narrative as applied to education. Narrative offers us a way to play with
the constraints of reality: to help learners to be creative. Used appropriately it also
allows us to engage learners.
The narrative context of a learning episode has both cognitive and affective
consequences. Incoherent or unclear narrative requires extra cognitive effort on the
listener’s part to disentangle the ambiguities. As a consequence the learner may be
distracted from the main message of the learning episode, which may in turn detract
from her ability to understand the concepts to be communicated. It may also
disengage her altogether. On the other hand engaging narrative may motivate her to
expend cognitive effort in understanding concepts to which she would not otherwise
be inclined to attend. The Non-linear Interactive Narrative Framework identifies ways
in which narrative might be exploited in interactive learning environments. The NINF
distinguishes two key aspects of narrative:
Narrative guidance (NG): the design elements that teachers and/or software need to
provide in order to help learners interpret the resources and experiences they are
offered, and
Narrative construction (NC): the process through which learners discern and
impose a structure on their learning experiences, making links and connections in a
personally meaningful way.
5.3 How Does the Coherence Compiler Interact with Other System
Components?
In order to provide the kind of services suggested above the Coherence Compiler
needs information about: the available content and its relation to other content; the
learner’s characteristics; the learner’s history of activity; the learner’s personal goals
and curriculum goals; the tools available to help learners relate content, learning
objectives and past learning; the tools available to help teachers build routes through
content; and existing coherent routes through content (lesson plans, schemes of work,
ways of identifying content that is part of a series, and popular associations between
content).
Much of this information might be provided by the content management system or
other system components: the content metadata, including relationship data; the
Learner Model; logs of learner activity; a Curriculum or Pedagogic Model; a
collection of suitable user interfaces (teacher / child / helper) for visualising content
search results, learner activity and learning / curriculum objectives; a database of
coherent trail information (e.g. lesson plans, other authored routes, popular routes, i.e.
sequences of content that many similar learners have followed).
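One way to picture the information listed above is as a simple record handed to the Coherence Compiler by the rest of the system. The sketch below is illustrative only; the field names mirror the list above and are not a real API.

```python
# Illustrative only: a container for the inputs the Coherence Compiler draws on.
# Field names are hypothetical; they mirror the list above, not a real interface.
from dataclasses import dataclass, field

@dataclass
class CoherenceCompilerInputs:
    content_metadata: dict = field(default_factory=dict)   # content and its relations
    learner_model: dict = field(default_factory=dict)      # learner characteristics
    activity_log: list = field(default_factory=list)       # history of activity
    curriculum_model: dict = field(default_factory=dict)   # personal/curriculum goals
    authored_trails: list = field(default_factory=list)    # lesson plans, schemes of work
    popular_trails: list = field(default_factory=list)     # routes many learners followed
```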
So, while the content management system and other components are able to
successfully identify and retrieve content that is suited to a learner’s needs and to
present that content along with information about how it relates to other content
elements, the value of the Coherence Compiler is that it enables the teacher and/or
learner to create a coherent route through that suitable content. The Coherence
Compiler provides user interfaces appropriate to each of its user groups (teachers / learners / learner collaborators, parents, etc.) for those of its services that are visible to users, i.e. tools for narrative construction and explicit narrative guidance.
Primarily for Learners. The interface for learners should: (i) Provide access to the
data and information that learners will need to construct their personal narrative
understanding: i.e. learning history, available content, learning objectives, content
menus and search facilities, etc. (ii) Remind learners of (macro and micro) objectives
in a timely manner in order to focus their attention on a purposeful interpretation of
the content. (iii) Guide learners towards accessing content that delivers these learning
goals. Guidance may be more or less constraining depending on the learner’s
independence. (iv) Vary the degree of (system-user) narrative control over the
sequence of events and activities or route through content, to match the needs of
different learners. (v) Guide a child in choosing what to do next (for young children this guidance is likely to be very constraining – a linear progression of ‘next’ and ‘back’ buttons or a limited number of choices); for more independent learners the guidance (and interface) would become less constraining. (vi) Enable the learner to record and reflect on their activity and progress towards goals, possibly by annotating suitable representations of her activity log and objectives. Again, this needs to be done in a way that is intelligible and accessible to young children. (vii) Be able to suggest ‘coherent paths’ through content (to learners, parents, teachers) through analysis of content usage in authored paths and in other learners’ activity histories. For example, if I choose to use a certain piece of video, and learners with similar profiles have used this, perhaps what they chose to do next will also be suitable for me (somewhat like the way Amazon suggests purchases). Or, if a piece of content I choose to incorporate in a ‘lesson plan’ has been used in other lesson plans, the pieces of content that followed it in those plans may also be appropriate to the new plan (a minimal sketch of this kind of suggestion appears after this list). This feature will obviously become more useful over time, as the system incorporates larger volumes of content and usage, but care will be needed not to confuse users with divergent recommendations.
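As a minimal sketch of the co-occurrence style of suggestion described in (vii), the fragment below counts what comparable learners (or authored plans) did next after a given content item and proposes the most frequent follow-ons; the data layout and content identifiers are invented for the example.

```python
# Hedged sketch: suggest follow-on content from what other learners did next.
# Histories are invented; a real system would draw on activity logs and lesson plans.
from collections import Counter

def suggest_next(current_item, histories, top_n=3):
    """histories: sequences of content ids followed by comparable learners."""
    follow_ons = Counter()
    for history in histories:
        for prev, nxt in zip(history, history[1:]):
            if prev == current_item:
                follow_ons[nxt] += 1          # count what came after this item
    return [item for item, _ in follow_ons.most_common(top_n)]

histories = [["clip_rivers", "quiz_rivers", "clip_mountains"],
             ["clip_rivers", "quiz_rivers", "story_flood"],
             ["clip_rivers", "clip_mountains"]]
print(suggest_next("clip_rivers", histories))   # e.g. ['quiz_rivers', 'clip_mountains']
```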
Primarily for Learners with Collaborators. The interface for learners with
collaborators should allow learners (and their parents/guardian/teachers) to review
and annotate the learner’s history of interaction with the system. This could facilitate
a form of collaborative parent-child narrative construction. This interface might be a bit like a browser history: learners would be able to revisit past interactions. If asked ‘what did you do at school today?’, a child might be able to show as well as tell through the use of this feature. There are many challenging issues to address here, including separating out individual and group learner models as well as the assignment of credit.
Not Visible to Users. Although not directly visible to users, the system should: (i)
Have access to a record of a child’s activity with the system. (ii) Have access to
authored ‘coherent journeys’ through available content: coherent journeys are linked
sequences of guidance comments, activities and content that make sense (e.g. existing
lesson plans and schemes of work authored by material producers and/or users of the
system, other sensible sequences of interaction and guidance possibly obtained
through analysis of content usage by all learners). (iii) Be able to identify suitable
content for a child’s next interaction based on the record of her activity and the
‘coherent journeys’ described above. Decisions about suitable content will also
involve consideration of the learner’s individual identity and needs described in the
learner model and pedagogic objectives (possibly described by the curriculum). (iv)
Be able to choose/suggest ‘paths’ through content that are interesting/motivating to
individual learners; i.e. if there are several paths through content/plans for learning at an appropriate level for this learner, choose the one that is most likely to be interesting/motivating to this learner.
6 Conclusions
In this paper we have described the initial design of the Coherence Compiler for the
HomeWork project. The HomeWork project is making existing content materials,
including TV programs, available to learners. The original programs may not be used
in their original entirety, but parts selected, re-ordered or repeated and interspersed
with other materials and activities according to the needs of individual or groups of
children. The Coherence Compiler is responsible for maintaining narrative coherence
across these materials and across devices so that the learner experiences a well-
ordered sequence that supports her learning effectively. Such support may be
provided both through narrative guidance and tools to support the learner’s own
personal narrative construction. Narrative guidance should be adaptive to the needs of the learner: it initially offers a strong ‘storyline’ explicitly linking new and old learning, and then fades as the learner becomes more accomplished at making these links for herself.
References
1. Luckin, R., Connolly, D., Plowman, L., and Airey, (2002) The Young Ones: the
Implications of Media Convergence for Mobile Learning with Infants, in S. Anastopolou,
M. Sharples & G. Vavoula (Eds.) Proceedings of the European Workshop on Mobile and
Contextual Learning, University of Birmingham, 7-11.
2. Luckin, R. and du Boulay, B. (2001) Imbedding AIED in ie-TV through Broadband User
Modelling (BbUM). In Moore, J.D., Redfield, C.L. and Johnson, W.L. (Eds.) Artificial
Intelligence in Education: AI-ED in the Wired and Wireless Future, Amsterdam: IOS
Press, 322--333.
3. Luckin, R. and du Boulay, B. (1999) Capability, potential and collaborative assistance, in
J. Kay (Ed) UM99 User Modelling: International conference on user modeling, Banff,
Alberta, Canada, CISM Courses and Lectures, No. 407, Springer-Verlag, Wien, 139–148.
4. Luckin, R. and Hammerton, L. (2002) Getting to Know Me: Helping Learners Understand their Own Learning Needs through Metacognitive Scaffolding, in S.A. Cerri, G. Gouarderes & F. Paraguaçu (Eds), Intelligent Tutoring Systems, Berlin: Springer-Verlag,
759-771.
5. Luckin, R., Plowman, L., Laurillard, D., Stratfold, M. and Taylor, J. (1998) Scaffolding Learners' Constructions of Narrative, in A. Bruckman, M. Guzdial, J. Kolodner and A. Ram (Eds) International Conference of the Learning Sciences, Atlanta: AACE, 181-187.
6. Plowman, L., Luckin, R., Laurillard, D., Stratfold, M., & Taylor, J. (1999). Designing
Multimedia for Learning: Narrative Guidance and Narrative Construction, in the
proceedings of CHI 99 (pp. 310-317). May 15-20, 1999, Pittsburgh, PA USA.: ACM.
7. Bruner, J. (1996). The Culture of Education. Harvard University Press, Cambridge MA.
8. Vygotsky, L. S. (1986). Thought and Language. Cambridge, Mass: The MIT Press.
9. Boden, M. A. (2003) The Creative Mind: Myths and Mechanisms. London, Weidenfeld
and Nicolson.
10. AFAIDL Distance Learning Initiative: www.cbd-net.com/index.php/search/show/536227
The Knowledge Like the Object of Interaction in an
Orthopaedic Surgery-Learning Environment
1 Introduction
The work we present in this paper is motivated by the conjunction of two categories
of problems in surgery. First, there are some well-known instructional difficulties. In
the traditional approach, the student interacts with an experienced surgeon to learn
operative procedures, learning materials being patient cases and cadavers. This prin-
cipally presents the following problems: it requires one surgeon for one learner, it is
unsafe for patients, cadavers must be available and there is no way to quantify the
learning curve. The introduction of computers in medical education is seen by several
authors as something to develop to face these issues in medical education [7], but on
the condition that real underlying educational principles are integrated [2], [9]. In
particular, the importance of individual feedback is stressed [13]; from our point of
view, it is the backbone of the relevance of computer based systems for learning.
As pointed out by Eraut and du Boulay [7], we can consider Information Technology in medicine as divided into “tools” and “training systems”. Tools support surgeons in their practice, while training systems are dedicated to apprenticeship. Our own aim is to use the same tools developed in the framework of computer-assisted surgical techniques to also create training systems for conceptual notions useful in both computer-assisted and classical surgery.
We want to take the issue of feedback explicitly into account by embedding a model of knowledge in our system. We want to provide feedback linked to the user's current knowledge, which is diagnosed from the user's actions on the system. This article presents the design of an environment for the learning of screw
Based on this analysis of surgical knowledge, we have developed over the last two years, in the framework of VOEU, the Virtual European Orthopedics University project
(European Union IST-1999-13079 [4]), different multimedia educational modules
related to these different knowledge types. Declarative knowledge is well adapted to
multimedia courses, both classical and case-based. Operational knowledge is obvi-
ously adapted to simulators, including those with haptic feedback. Our objective is
now to create an environment for the learning of procedural knowledge, which is
more complex.
3 Tool Presentation
Fig. 1. Re-sliced CT images along the screw axis and sacrum 3D model
In the work we present here, we focus on the planning step of this surgical tool; the principal reason is that it is in this step that procedural knowledge comes into play.
4 Methodology
In our learning environment, we separate the simulation component from the system
component dealing with didactical and pedagogical intentions [5], [8]. The simulation
is not intended for learning: it is designed to be used by an expert who wants to define
a screw placement trajectory.
From the software point of view, we would like to respect the simulation architec-
ture. The system part concerned with didactical and pedagogical intentions is to be
plugged only in learning situations; we call this complete configuration the learning
level. The learning level must also allow the construction of learning situations.
Concerning interactions, we chose the architecture described in the next figure (Fig. 2):
Fig. 2. Architecture.
We chose this architecture because we would like to observe the student’s activity
while he/she uses the simulation. The feedback produced by the simulation is not
necessarily in terms of knowledge: for example, the system can send feedback about
the segmentation of the images or about the gestural process. Our system must inter-
vene when it detects a didactical or pedagogical reason, and then generate an interac-
tion. We do not want to constrain “a priori” the student in his/her activity with the
simulation. On the other hand, the didactical and pedagogical system has to determine
the feedback in relation to the knowledge that the user manipulates.
In this case, the simulation will produce traces about the user’s activity. We want
these traces to give information about the piece of knowledge that the system has
detected [11]. In this work, we try to determine this information from the actions on
the interface and to deduce the knowledge that the user manipulates. We determined
how the simulation system transmits this information to the learning level. The first
version that we produced is based on a DTD specification; the XML file describes all
test trajectories that the user proposes in the planning software.
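The DTD itself is not reproduced here; as an illustration only, a trace of proposed test trajectories might look something like the fragment below, with element and attribute names that are hypothetical rather than the project's actual schema.

```python
# Illustration only: a hypothetical trace of test trajectories, parsed with the
# standard library. Element and attribute names are invented, not the VOEU DTD.
import xml.etree.ElementTree as ET

trace = """
<planning_session learner="student01" case="sacrum_fracture_12">
  <trajectory id="1" timestamp="00:02:13">
    <entry_point x="41.2" y="17.8" z="-5.3"/>
    <target_point x="63.0" y="22.4" z="-4.1"/>
    <screw length="45" diameter="7.3"/>
  </trajectory>
</planning_session>
"""

root = ET.fromstring(trace)
for traj in root.findall("trajectory"):
    print(traj.get("id"), traj.find("screw").get("length"))
```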
We differentiate two kinds of feedback: feedback related to the validity of the
knowledge, and feedback related to the control activity.
We define the first kind of feedback as a function of the knowledge object.
A control feedback is defined according to the knowledge of the expert and to the manner in which the expert wants to transmit his/her expertise to the novice. The idea is to
reproduce the interaction between expert and novice in a learning situation. In this
case, the expert uses his/her own controls to validate or invalidate the novice action
and consequently he/she determines the feedback to the novice.
In our methodology, we take into account both didactical and computational considerations to produce the learning level.
We use the framework of the didactical situations theory [3]. This implies that the
system has to allow interactions for actions, formulations and validations. In this case,
the system will be a set of properties [10].
In this paper, our objective is to specify a methodology for designing the validation
interactions.
The aim of our research in this paper is to allow the acquisition of procedural
knowledge in surgery. The adopted methodology is based on two linked phases. In
the first phase, we must identify some procedural components of the surgeon’s
knowledge. This is done by observing expert and learner interactions during surgical interventions, and by interviews with surgeons. In this part, we focus on the control component of knowledge, because we assume that control is the main role of procedural knowledge during problem solving. This hypothesis is related to the theoretical framework of knowledge modeling, which we present below. During
the second phase, we must implement this knowledge model in the system, in order to
link the provided feedback to the user’s actions.
We adopt the point of view described by Balacheff to define the notion of concep-
tion, which “has been used for years in educational research, but most often as com-
mon sense, rather than being explicitly defined” [1]. To shorten the presentation of
the model, we will just describe its structure and specificity.
A first aspect of this model is rather classical: it defines a conception as a set of
related problems (P), a set of operators to act on these problems (R), and an associ-
ated representation system (L). It also takes into account a control structure, denoted Σ, so that a conception is characterized by the quadruplet (P, R, L, Σ). Schoenfeld [14] has already pointed out the crucial role of control in problem solving. In the problem-solving process, the control elements allow the subject to decide whether an action is relevant or not, or to decide that a problem is solved. In the chosen model, a problem-solving process can thus be formally described as a succession of solving steps, in which operators from R are applied to problems from P under the control of Σ. In an apprenticeship perspective, we will focus on differences between the novice's and the expert's conceptions.
Below is an example of formalization, to illustrate the way we use the model.
Let us consider the problem P2: “define a correct trajectory for a second screw in
the vertebra”. Indeed, the surgeon often has two screws to introduce, one on each side of the vertebra, through the pedicles (see Fig. 3).
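To make the structure of the quadruplet concrete, the sketch below encodes a conception as (P, R, L, Σ) and instantiates only the problem P2 quoted above; the operator, language and control labels are hypothetical placeholders, not the authors' formalization.

```python
# Illustrative encoding of a conception as the quadruplet (P, R, L, Sigma).
# Only the problem statement comes from the text; the other elements are
# invented placeholders for the sake of the example.
from dataclasses import dataclass

@dataclass
class Conception:
    problems: set        # P: problems the conception applies to
    operators: set       # R: operators acting on those problems
    language: set        # L: representation system used
    controls: set        # Sigma: controls deciding relevance / completion

p2 = "define a correct trajectory for a second screw in the vertebra"
novice_conception = Conception(
    problems={p2},
    operators={"reuse_first_screw_axis", "adjust_entry_point"},
    language={"CT_slices", "3D_model"},
    controls={"screw_stays_inside_bone"},
)
```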
The didactical analysis of the knowledge objects will be the key to the success of our model implementation. The choices made in relation to the knowledge will determine the main characteristics of the design. For the design of judgment interactions, we identified a set of pedagogical constraints: no blocking system response, no mere true/false feedback, and feedback after every step. Regarding the expert model, we should not simply compare this model to the student's activity. Our ob-
jective is to follow the student’s work. Thus, if there are automatic deduction tools,
they should not provide an expected solution because it would constrain the student’s
work [11], but they should rather be used to facilitate the system-student interaction.
We can use this kind of tool to give the system the capacity to argue or to refute through counter-examples.
For our computer learning level, this implies that we have to link a judgment inter-
action with declarative knowledge. For example, if the student chooses a trajectory that can touch a nerve, the interaction can refer to the anatomy knowledge in order to explain (to show) that a nerve can be present in that part of the body.
In other words, one kind of judgment interaction is the explanation of an error. For
this, we will identify the declarative knowledge in relation to the procedural knowl-
edge in order to produce an explanation related to the error.
For the generation of validation interactions, we identify the knowledge that intervenes in the planning activity. Four kinds of knowledge are necessary to validate the planning of the screw trajectory:
Pathology: declarative knowledge concerning the illness type;
Morphology: declarative knowledge concerning the patient’s state;
Anatomy: declarative knowledge concerning the anatomy of the body part;
Planning: procedural knowledge concerning the screw and its position in the bone.
An example is the vertebra classification knowledge [12]. We can see that procedural knowledge has a relationship with declarative knowledge. Procedural knowledge is built on declarative knowledge; consequently, in order to validate procedural knowledge, the system needs to know the declarative knowledge that intervenes in building it.
In the case of the learning situation about the planning of the screw trajectory, we also identified, for the validation, hierarchical deduction relationships between these kinds of knowledge (Fig. 4).
From the computer point of view, the learning environment contains a learning
component. This component has to represent the surgical knowledge and to produce a diagnosis of the student's knowledge. Our approach is based on the representation and knowledge diagnosis system “Emergent Diagnosis via coalition formation” [16].
The Webber approach [16] represents knowledge in the form of a MAS (Multi-Agent System). This representation uses the model of conceptions [1] (explained above). Conceptions are characterized by sets of agents. The society of agents is composed of four categories: problems, operators, language and control. Each element of the quadruplet C = (P, R, L, Σ) is the core of one reactive agent. This approach [16] considers diagnosis as the emergent result of the collective actions of reactive agents.
The general role of any agent is to check whether the element it represents is pres-
ent in the environment. If the element is found, the agent becomes satisfied. Once
satisfied, the agent is able to influence the satisfaction of other agents by voting. The diagnosis result is the identification of a set of conceptions, which the system deduces in the form of a vector of votes.
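A minimal sketch of this satisfied-agents-vote-for-conceptions idea is given below; the agent behaviour, names and vote weights are invented for illustration and do not reproduce the mechanism of [16].

```python
# Illustration only: reactive "element" agents become satisfied when the element
# they represent is observed, then vote for the conceptions they belong to.
# Names and weights are invented; see [16] for the actual mechanism.
from collections import defaultdict

observations = {"reuse_first_screw_axis", "screw_stays_inside_bone"}

# element -> conceptions it supports (with an illustrative vote weight)
agents = {
    "reuse_first_screw_axis":   [("symmetry_conception", 1.0)],
    "adjust_entry_point":       [("expert_conception", 1.0)],
    "screw_stays_inside_bone":  [("symmetry_conception", 0.5), ("expert_conception", 0.5)],
}

votes = defaultdict(float)
for element, supported in agents.items():
    if element in observations:                 # the agent is "satisfied"
        for conception, weight in supported:
            votes[conception] += weight         # it votes for its conceptions

print(dict(votes))   # the vector of votes, e.g. {'symmetry_conception': 1.5, ...}
```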
This approach has been created for a geometry proof system. We identified a set of
differences in the nature of knowledge between the geometry and surgical domain. In
particular, for the diagnosis of the geometry students’ knowledge, the representation
of knowledge is only declarative and the results of the diagnosis are the identification
of conceptions related also to the declarative knowledge. However, in the surgical
domain, we showed how the declarative and procedural knowledge can intervene in
the student's activity. Furthermore, for the validation of the procedural knowledge, it is not sufficient to identify the conceptions related to procedural knowledge (planning); it is also necessary to identify the conceptions related to the learning situation, in other words the declarative knowledge (pathology, morphology, anatomy). For
example, if the system deduces that the screw is in the bone and there is no lesion,
that is not sufficient to validate the screw trajectory. The system has to deduce whether this trajectory is a solution of the clinical case or not.
In our system, we distinguish two diagnosis levels according to the type of
knowledge. The first diagnosis level allows the deduction of the student’s errors re-
lated to declarative knowledge. For example, the system may deduce that the student has applied an incorrect screw trajectory theory for this type of illness. In this case, the feedback is a link to a semantic web that we are building.
If the system deduces that there are no errors at this level, that means the student knows the declarative surgical knowledge. At the second diagnosis level, the system evaluates his/her procedural surgical knowledge. Consequently, we adapt the representation and diagnosis system “Emergent Diagnosis via coalition formation” to our knowledge representation. We choose to use a “computer mask” that the system applies to the vector of votes resulting from the diagnosis. This mask filters the set of conceptions in the vote vector that are related to the declarative knowledge. It allows the system to “see” the piece of knowledge that we try to identify at the first diagnosis level.
The system generates the mask by an “a priori” analysis of the expected vector.
This analysis is applied to the declarative knowledge (learning situation) before the
diagnosis phase. After this phase, the system applies the mask and then starts the first
diagnosis level (the declarative knowledge). If the system deduces that there is an error at this level, it generates an interaction with the student in order to explain which knowledge he/she has to revise. If there are no errors at the first level, it starts the second diagnosis level to validate the screw trajectory. Finally, the system generates the validation interaction for the student.
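One way to read the “computer mask” is as a filter that keeps, in the vote vector, only the entries tied to the declarative knowledge of the current learning situation; the sketch below illustrates that reading with invented conception names and thresholds.

```python
# Hedged illustration of masking a vote vector: only the conceptions expected to
# matter for the declarative knowledge of this learning situation are kept for
# the first diagnosis level. Names and values are invented.
votes = {"pathology_A": 0.9, "anatomy_ok": 0.7, "planning_symmetry": 1.5}

# mask produced "a priori" from the learning situation (declarative knowledge)
declarative_mask = {"pathology_A", "anatomy_ok"}

first_level_view = {c: v for c, v in votes.items() if c in declarative_mask}
declarative_ok = all(v > 0.5 for v in first_level_view.values())

# only if the first level reports no error does the second level examine the
# procedural (planning) conceptions
if declarative_ok:
    second_level_view = {c: v for c, v in votes.items() if c not in declarative_mask}
```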
The research involved in the work presented here comes from two domains, the didactic and computer science fields. By its nature, this project consists of two interrelated parts. The first part is the modeling of surgical knowledge and is related to didactic research; the second is the design of a computer system implementing this model and the definition of the system's feedback.
We have sought to design a computer system for a surgical learning environment. This environment should provide the student with feedback related to his/her knowledge during the problem-solving activity. In other words, the knowledge at stake in the learning situation is the object of feedback. To realize this idea, we based the design of our system on a didactical methodology.
The design of the computer system for a learning environment depends on the learning domain. Consequently, we analyzed the domain knowledge (didactic work) before designing its computer representation. This allows the constraints of the knowledge domain to be identified for the representation model.
To validate our work, we will involve some junior surgeons in the task of defining good screw trajectories for a learning situation in the simulator. The feedback provided and the students' reactions will be analyzed in terms of apprenticeship (that is, with regard to the knowledge at stake).
We will also validate the model of knowledge and the chosen representation. By analyzing the generality of the model, we will try to distinguish its differences from other representations and their implementations in computer systems.
In addition, we will analyze our computer system with the objective of evaluating
our diagnostic system and the relationships between the diagnostic system and the
feedback system. We have started to work on the feedback system and have decided to use a Bayesian network for the representation of the didactical decisions. The idea is to represent, whenever a procedural conception is detected by our diagnostic system, the possible problem situations that can destabilize this conception.
In this paper, we studied learning at the planning level of the simulator. At this level, there are two types of surgical knowledge: declarative and procedural. In our future work, we want to complete the research by including operational knowledge, the third type of surgical knowledge [15].
Our final objective is the implementation of a complete surgical learning environ-
ment with declarative, procedural and operational surgical knowledge. This environment will also contain a component for medical diagnosis and another component for the construction of learning situations by the teacher in surgery.
References
1. Balacheff N. (2000), Les connaissances, pluralité de conceptions (le cas des mathéma-
tiques). In: Tchounikine P. (ed.) Actes de la conférence Ingénierie de la connaissance (IC
2000, pp.83-90), Toulouse.
2. Benyon D., Stone D., Woodroffe M. (1997), Experience with developing multimedia
courseware for the world wide web: the need for better tools and clear pedagogy, Interna-
tional Journal of Human Computer Studies, n° 47, 197-218.
3. Brousseau G., (1997). Theory of Didactical Situations. Dordrecht : Kluwer Academic
Publishers edition and translation by Balacheff N., Cooper M., Sutherland R. and Warfield
V.
4. Conole G., Wills G., Carr L., Hall W., Vadcard L., Grange S. (2003), Building a virtual
university for orthopaedics, in Ed-Media 2003 World conference on educational multime-
dia, hypermedia & telecommunications, 23-28 June 2003, Honolulu, Hawaii, USA.
5. De Jong T. (1991), Learning and instruction with computer simulations, in Education &
Computing 6, 217-229.
6. De Oliveira K., Ximenes A., Matwin S., Travassos G., Rocha A.R. (2000), A generic
architecture for knowledge acquisition tools in cardiology, proceedings of IDAMAP 2000,
Fifth international workshop on Intelligent Data Analysis in Medicine and Pharmacology,
at the 14th European conference on Artificial Intelligence, Berlin.
7. Eraut M., du Boulay B. (2000), Developing the attributes of medical professional judge-
ment and competence, IN Cognitive Sciences Research Paper 518, University of Sussex,
http://www.cogs.susx.ac.uk/users/bend/doh.
8. Guéraud V., Pernin J.P. et al. (1999), Environnements d’apprentissage basés sur la simula-
tion : outils auteur et expérimentations, in Sciences et Techniques Educatives, special is-
sue “Simulation et formation professionnelle dans l'industrie”, vol. 6, n° 1, 95-141.
9. Lillehaug S.I., Lajoie S. (1998), AI in medical education – another grand challenge for
medical informatics, In Artificial Intelligence in Medicine 12, 197-225.
10. Luengo V. (1997). Cabri-Euclide : un micromonde de preuve intégrant la réfutation. Prin-
cipes didactiques et informatiques. Réalisation. Thèse. Grenoble : Université Joseph Fou-
rier.
11. Luengo V. (1999), Analyse et prise en compte des contraintes didactiques et informatiques
dans la conception et le développement du micromonde de preuve Cabri-Euclide, In Sci-
ences et Techniques Educatives, Vol. 6, n° 1.
12. Mufti-Alchawafa, D. (2003), Outil pour l’apprentissage de la chirurgie orthopédique à
l’aide de simulateur, Mémoire DEA Informatique, Systèmes et Communications, Univer-
sité Joseph Fourier.
13. Rogers D., Regehr G., Yeh K., Howdieshell T. (1998), Computer-assisted learning versus
a lecture and feedback seminar for teaching a basic surgical technical skill, The American
Journal of Surgery, 175, 508-510.
14. Schoenfeld A. (1985). Mathematical Problem Solving. New York: Academic Press.
15. Vadcard L., First version of the VOEU pedagogical strategy, Intermediate deliverable
(n°34.07), VOEU IST 1999 – 13079. 2002.
16. Webber, C., Pesty, S. Emergent diagnosis via coalition formation. In: IBERAMIA 2002 -
Proceedings of the 8th Iberoamerican Conference on Artificial Intelligence. Garijo,F.
(ed.), Springer Verlag, 2002.
Towards Qualitative Accreditation with Cognitive Agents
1 Introduction
In the world of aeronautical training, more and more training tasks are performed in simulators. Aeronautical simulators are very powerful training tools which allow a very high degree of realism to be reached (the trainee perceives the simulator as a real aircraft). However, several problems may appear. One of the most critical is taking the behaviour of the trainee into account, which remains relatively limited because of the lack of online feedback on the user's behaviour.
Our research is centred on the description and the qualification of various types of behaviours in critical situations (resolution of a problem under risk constraints) depending on the errors committed. We articulated these two issues by describing two ma-
jor sources of errors that come from the trainee’s behaviour, using an adapted version
of the ACT-R/PM model [3]. The first, fairly general source of errors in ACT-R mod-
els, is the failure of retrieval or mis-retrieval of various pieces of knowledge (in CBT,
Computer-Based Training, systems – checked Flow-Charts, or in PFC – Procedures
Follow-Up Component – in terms of ASIMIL, see end of paragraph). The second and
more systematic error source is the time/accuracy trade-off in decision-making.
There are also other secondary sources of error, such as the trainee failing to see a
necessary sign/indicator in the time provided in order to perform the needed opera-
tion. These sources of error are mainly due to ergonomic or psychological effects.
problem of centralised control (Amygdala) or not (sensory effectors). The basic level
is the activation of a concept, which characterises the state of a concept at rest. That
level is more significant for experts than for non-experts.
Expert knowledge acquisition can be described as the sequential application of independent rules, which are compiled and reinforced through the exercise of automation, thus allowing the acquisition of procedural knowledge. Moreover, in ASIMIL, we have needed to handle several channels of parallel dialogue and exchange (messages – texts/word, orders – mouse/stick/caps/instructions, alarms – visual/sound...). In this model, one can also specify the role of cognitive resources in high-level cognitive tasks and take into account the proposals exchanged during a conversation.
We have handled interaction in the manner of the ACT-R/PM model, which provides an integrated model (a cognition module connected to a perception-motor module) and a strong psychological theory of how interaction occurs. Furthermore, the ACT-R/PM model allows diagnostics to be produced in real time, which is very important in the context of aeronautical training exercises, which are often time-critical.
1 The ACT-R/PM architecture is presented in the left part of the figure, and the ASIMIL interface in the right part (system of procedures follow-up on the left, flight simulator on the right, and an animated agent (Baldi)).
According to the gravity of the errors, the graph of overall performance is built online. The teacher's intervention is triggered in different cases, detected according to changes in the score and its derivatives.
Moreover, the notions of surprising error and expected error are introduced in order to be able to calculate the rate of error expectation by the ITS. This coefficient is used in the decision-making process – is this particular error expected by the ITS or not? The higher the coefficients K, the lower the error expectation (and the higher the surprise of the error). Thus, K determines the character of the teacher's assistance provided to the learner.
As the learner evolves in a three-dimensional space (knowledge, ergonomics, psychology), we have the possibility of following his/her overall progression, by measuring the instantaneous length of the error vector as well as his/her performance on each of the criteria c, e and p (see also the results presented in Section 5).
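A minimal sketch of this three-dimensional reading of the learner's errors is given below; the way the coefficients weight each criterion, and all numeric values, are invented for illustration and are not the project's actual formula.

```python
# Illustration only: instantaneous length of the error vector over the three
# criteria (c = knowledge, e = ergonomics, p = psychology). The weights Kc, Ke,
# Kp and the error scores are invented placeholders.
from math import sqrt

Kc, Ke, Kp = 1.0, 0.6, 0.8                 # performance coefficients (illustrative)
errors = {"c": 2.0, "e": 0.5, "p": 1.5}    # current error scores per criterion

length = sqrt((Kc * errors["c"])**2 + (Ke * errors["e"])**2 + (Kp * errors["p"])**2)
print(f"error vector length: {length:.2f}")
```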
in the architecture of intelligent agents such as Actors [4]. This architecture is presented in Fig. 2, and the interface of the whole system in Fig. 3.
The experimental framework for the ASIMIL training system is a simulation-based intelligent peer-to-peer review process performed by autonomous agents (Knowledge, Ergonomic, Psychologic). Each agent separately scans a common stream of messages coming from other actors (humans, intelligent agents, physical devices). The agents form coalitions in order to supply a given community of users (instructors, learners, moderators, ...) with diagnoses and advice, as well as to allow actors to help one another.
A dedicated architecture called ASITS [5] was directly adapted from Actors [4] by including a cognitive architecture based on ACT-R/PM. Within the ASITS agent framework, ACT (“Adaptive Control of Thought”) is preferred to “Atomic Components of Thought”, R stands for “rationale accepted as Revision or Reviewing”, and PM stands for “perceptual and motor” monitoring of the task [3].
Fig. 3. System of procedures follow-up on the left, flight simulator on the right and an ani-
mated agent (Baldi)
Two undeniable advantages are that the agents do not let any deviation/error pass unnoticed, and that they allow the instructor to supervise several trainees simultaneously.
2 The Joint Aviation Authorities (JAA), an associated body of the European Civil Aviation Conference (ECAC), publishes the Joint Aviation Requirements (JARs), whereas the Federal Aviation Administration publishes the Federal Aviation Regulations (FARs).
Fig. 4. Instructor’s «Dashboard» cases – “disturbed” trainee (above) and “normal” trainee
(below)
The following details are presented in the instructor's window (see Fig. 4). The abscissa represents time from the beginning of the exercise. The ordinate represents the variation of the objective of the exercise (also called the user's "qualitative score"). A number of general options are taken into account, such as the learner's level, the training mode, tolerances, the performance coefficients Kc, Ke, Kp, etc. The monitoring table (in the middle of each panel in Fig. 4) holds the chronology of the session. One can see the moment when an error appeared (column "Temps"), the qualification of the error ("Diagnostic"), its gravity ("Gravité", a number of points to be removed, associated with the gravity of the error – slight, serious or critical), the degree of the error's expectation ("Attente"), and the proposed help ("Aide").
The analysis of the curves shows that:
on the panel above, the session is performed by a learner with a high level of knowledge ("confirmé") but a rather weak Kp, which seems to be confirmed by the count of errors of type P (psychology); this trainee was lost when facing an error but, after some hesitation, found the correct solution to the exercise;
on the panel below, the session is performed by a regular trainee, who made two errors but quickly found ways to correct them.
The instructor's analysis of the performance curves not only makes it possible to evaluate learners, but also to re-adjust the error rating system by modifying the weights of the various errors. As an expected outcome, the qualitative accreditation of differentiated users can be done by reflexive comparison of the local deviation during the dynamic construction of the user profile. The analysis of the red and black curves allows similar patterns to be matched (or not) and detected (manually in the current version), and the green curve gives alarms to start the qualitative accreditation process.
References
1. E. Aïmeur, C. Frasson. Reference Model For Evaluating Intelligent Tutoring Systems.
Université de Montréal, TICE 2000 Troyes – Technologie de l’Information et de la Com-
munication dans les Enseignements d’ingénieurs et dans l’industrie.
2. P. Brusilovsky. Intelligent tutor, environment, and manual for introductory programming.
Educational and Training Technology International 29: pp.26-34.
3. M.D. Byrne, J.R. Anderson. Serial modules in parallel: The psychological refractory period and perfect time-sharing. Psychological Review, 108, 847–869, 2001.
4. C. Frasson, T. Mengelle, E. Aïmeur, G. Gouardères. «An actor-based architecture for
intelligent tutoring systems», Intl Conference on ITS, Montréal–96.
5. G. Gouardères, A. Minko, L. Richard. «Simulation et Systèmes Multi-Agents pour la
formation professionnelle dans le domaine aéronautique», Dans Simulation et formation
professionelle dans l’industrie, Coordonnateurs M. Joab et G. Gouardères, Hermès Sci-
ence, Vol.6, No.1, pp.143-188, 1999.
6. F. Jambon. “Erreurs et interruptions du point de vue de l’ingénierie de l’interaction
homme-machine”. Thèse de doctorat de l’Université Joseph Fourier (Grenoble 1).
Soutenue le 05 décembre 1996.
7. A.A. Krassovski. Bases of simulators’ theory in aviation. Moscow, Machinostroenie,
1995, 304p. (in Russian)
8. K. Van Lehn, S. Ohlsson, R. Nason. Application of Simulated Students: an exploration.
Journal of Artificial Intelligence in Education, vol.5, n.2, 1994; p.135-175.
9. P. Mendelsohn, P. Dillenbourg. Le développement de l’enseignement intelligemment
assisté par ordinateur. Conférence donnée à l’Association de Psychologie Scientifique de
Langue Française Symposium Intelligence Naturelle et Intelligence Artificielle, Rome, 23-
25 septembre 1991.
10. K.L. Norman. «The psychology of menu selection: designing cognitive control at the
human/computer interface», Ables Publishing, Norwood NJ, 1991.
11. O. Popov, R. Lalanne, G. Gouardères, A. Minko, A. Tretyakov. Some Tasks of Intelligent
Tutoring Systems Design for Civil Aviation Pilots. Advanced Computer Systems. The
Kluwer International Series in Engineering and Computer Science. Kluwer Academic
Publishers. Boston/Dordrecht/London, 2002
12. J. Rasmussen. «Information processing and human-machine interaction: an approach to
cognitive engineering», North-Holland, 1986.
13. J. Reason. Human error. Cambridge University Press. Cambridge, 1990.
14. P. Salles, B. Bredeweg. «A case study of collaborative modelling: building qualitative
models in ecology». ITS-2002, Workshop on Model-Based Educational Systems and
Qualitative Reasoning, San-Sebastian, Spain, June 2002.
15. J. Self. The Role of Student Models in Learning Environments. AAI/AI-ED Technical
Report No.94. In Transactions of the Institute of Electronics, Information and Communi-
cation Engineers, E77-D(1), 3-8, 1994.
16. F.E. Ritter, D. Van Rooy, F. St Amant. A user modeling design tool based on a cognitive
architecture for comparing interfaces. Proceedings of the Fourth International Conference
on Computer-Aided Design of User Interfaces (CADUI), 2002.
17. W. Lewis Johnson: Interaction tactics for socially intelligent pedagogical agents. Intelli-
gent User Interfaces 2003: 251-253.
Integrating Intelligent Agents, User Models, and
Automatic Content Categorization in a Virtual
Environment
1 Introduction
Virtual Reality (VR) has become an attractive alternative for the development of more interesting user interfaces. Environments that make use of VR techniques are referred to as Virtual Environments (VEs). In VEs, according to [2], the user is part of the system, an autonomous presence in the environment, able to navigate, to interact with objects and to examine the environment from different points of view. As indicated in [11], the 3D paradigm is useful mainly because it offers the possibility of representing information in a realistic way while organizing content spatially. In this way, a more intuitive visualization of the information is obtained, allowing the user to explore it interactively, in a way that is more natural to humans.
Nowadays, the use of intelligent agents in VEs has been explored. According to [3], agents inserted in virtual environments are called Intelligent Virtual Agents (IVAs). They act as the user's assistants, helping to explore the environment and to locate information [8,15,16,18], and they are able to establish verbal communication (e.g., using natural language) or non-verbal communication (through body movement, gestures and facial expressions) with the user. The use of these agents has many advantages: it enriches the interaction with the virtual environment [25]; it makes the environment less intimidating, more natural and attractive to the user [8]; and it prevents users from feeling lost in the environment [24].
At the same time, systems capable of adapting their structure based on a user model have received special attention from the research community, especially Intelligent Tutoring Systems and Adaptive Hypermedia. According to [13], a user model is a collection of information and suppositions about individual users or user groups, necessary for the system to adapt several aspects of its functionality and interface. The adoption of a user model has shown great impact in the development of filtering and information retrieval systems [4,14], electronic commerce [1], learning systems [29] and adaptive interfaces [5,21]. These systems have already proven to be more effective and/or usable than non-adaptive ones [10]. However, the research effort in adaptive systems has been focused on the adaptation of traditional 2D/textual environments. Adaptation of 3D VEs is still little explored, but it is considered promising [6,7].
Moreover, with respect to the organization of content in VEs, grouping the contents according to some semantic criterion is interesting and sometimes necessary. One approach to content organization is the automatic content categorization process. This process is based on machine learning techniques (see, e.g., [28]) and has been applied in general contexts, such as web page classification [9,20]. However, it can also be adopted for the organization of content in the VE context.
In this paper we present an approach that aims to integrate intelligent agents, user models and automatic content categorization in a virtual environment. In this environment, called AdapTIVE (Adaptive Three-dimensional Intelligent and Virtual Environment), an intelligent virtual agent assists users during navigation and retrieval of relevant information. The users' interests and preferences, represented in a user model, are used in the adaptation of the environment structure. An automatic content categorization process is used in the spatial organization of the contents in the environment. In order to validate our approach, a case study of a distance-learning environment, used to make educational contents available, is presented.
The paper is organized as follows. In Section 2, the AdapTIVE architecture is presented and its main components are detailed. In Section 3, the case study is presented. Finally, Section 4 presents the final considerations and future work.
2 AdapTIVE Architecture
matic content categorization process, acts in the definition of this model. From the content model, the spatial position of each content item in the environment is defined. Contents are represented in the environment by three-dimensional objects and by links to the data (e.g., a text document or web page). The environment generator module is responsible for generating the different three-dimensional structures that form the environment and for arranging the information in it, according to the user and content models. The environment adaptation involves its reorganization, with respect to the arrangement of the contents and to aspects of its layout (e.g., the use of different textures and colors, according to the user's preferences). The following sections detail the main components of the environment: the user model manager, the content manager and the intelligent agent.
This module is responsible for the initialization and updating of user models. The user model contains information about the user's interests, preferences and behavior. In order to collect the data used in the composition of the model, both explicit and implicit approaches [19,20] are used. The explicit approach is adopted to acquire the user's preferences, composing an initial user model, and the implicit one is applied to update this model. In the explicit approach, a form is used to collect factual data (e.g., name, gender, areas of interest and color preferences). In the implicit approach, the user's navigation in the environment and his interactions with the agent are monitored. Through this approach, the places in the environment visited by the user, and the contents requested (through the search mechanism) and accessed (clicked), are monitored. These data are used to update the initial user model.
The process of updating the user model is based on rules and certainty factors (CFs) [12,17]. The rules allow conclusions (hypotheses) to be inferred from antecedents (evidences). To each conclusion it is possible to associate a CF, which represents the degree of belief associated with the corresponding hypothesis. Thus, the rules can be described in the following format: IF Evidence(s) THEN Hypothesis with CF = x. The CFs associate measures of belief (MB) and disbelief (MD) with a hypothesis (H), given an evidence (E). A CF = 1 indicates total belief in a hypothesis, while CF = -1 corresponds to total disbelief. The calculation of the CF is accomplished by formulas (1), (2) and (3), where P(H) represents the prior probability of the hypothesis (i.e., the interest in some area) and P(H|E) is the probability of the hypothesis (H) given that some evidence (E) exists. In the environment, the user's initial interest in a given area (initial value of P(H)) is determined by the explicit data collection, and it may vary during the process of updating the model (based on thresholds of increasing and decreasing belief), where P(H|E) is obtained from the implicit approach.
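Formulas (1)–(3) are not reproduced in this copy; the standard certainty-factor definitions from the expert-systems literature cited above ([12,17]) are assumed here to be the intended ones and are restated for reference.

```latex
\begin{align}
MB(H,E) &=
  \begin{cases}
    1 & \text{if } P(H)=1\\
    \dfrac{\max\{P(H\mid E),\,P(H)\}-P(H)}{1-P(H)} & \text{otherwise}
  \end{cases}
  \tag{1}\\
MD(H,E) &=
  \begin{cases}
    1 & \text{if } P(H)=0\\
    \dfrac{P(H)-\min\{P(H\mid E),\,P(H)\}}{P(H)} & \text{otherwise}
  \end{cases}
  \tag{2}\\
CF(H,E) &= MB(H,E)-MD(H,E) \tag{3}
\end{align}
```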
The evidences are related to the environment areas visited and to the contents requested and accessed by the user. They are used to infer the hypothesis of the user's interest in each area of knowledge, from the rules and the corresponding CFs. To update the model, rules (4), (5), (6) and (7) were defined. Rules (4), (5) and (6) are used when evidences of request, navigation and/or access exist. In this case, the rules are combined and the resulting CF is calculated by formula (8), in which two rules with CF1 and CF2 are combined. Rule (7) is used when no evidence exists, indicating a total lack of user interest in the corresponding area.
Every n sessions (an adjustable time window), for each area, the evidences (navigation, request and access) are verified, the inferences with the rules are made, and the CFs corresponding to the hypotheses of interest are updated. By sorting the resulting CFs, it is possible to establish a ranking of the user's areas of interest. Therefore, it is possible to verify the alterations in the initial model (obtained from the explicit data collection)
and, thus, to update the user model. From this update, the reorganization of the environment is made - contents that correspond to the areas of greater user interest are placed, in visualization order, before the contents that are less interesting (easier access). It must be stressed that each modification in the environment is always suggested to the user and accomplished only under the user's acceptance.
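Formula (8) is likewise not shown in this copy; the sketch below (Python) uses the standard parallel-combination rule for certainty factors as a stand-in and shows how sorting the per-area CFs yields the interest ranking. The area names echo the case study later in the paper; the combination formula and the evidence values are assumptions.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Standard combination of two certainty factors for the same
    hypothesis (assumed form of formula (8))."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Hypothetical CFs inferred from navigation, request and access rules
# for one time window, per area of knowledge.
evidence_cfs = {"AI": [0.6, 0.5], "CN": [-1.0], "CG": [0.3, 0.15], "SE": [0.1, 0.1]}
area_cf = {}
for area, cfs in evidence_cfs.items():
    acc = cfs[0]
    for cf in cfs[1:]:
        acc = combine_cf(acc, cf)
    area_cf[area] = acc

ranking = sorted(area_cf, key=area_cf.get, reverse=True)
print(area_cf, ranking)   # e.g. ['AI', 'CG', 'SE', 'CN']
```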
Our motivation for adopting rules and CFs is based on the following main ideas. First, it is a formalism that allows hypotheses about the user's interest in the areas to be inferred from a set of evidences (e.g., navigation, request and access), while also considering a degree of uncertainty about the hypotheses. Second, it can be an alternative to Bayesian networks, another common approach used in user modeling, considering that it does not require a full a priori set of probabilities and conditional probability tables to be known. Third, it does not require the pre-definition of user categories, as in techniques based on stereotypes. Moreover, it has low computational cost, and it is intuitive, robust and extensible (considering that it was extended here, allowing a new type of rule to be created). In this way, this formalism can be considered an alternative technique for user modeling.
This module is responsible for the insertion and removal of contents and for the management of their models. The content models contain the following data: category (among a pre-defined set), title, description, keywords, type of media and corresponding file. From the content model, the spatial position that the content will occupy in the environment is defined. The contents are also grouped into virtual rooms by main areas (categories). For textual contents, an automatic categorization process is available, through which the category and the keywords of the content are obtained. For non-textual contents (for instance, images and videos), textual descriptions of the contents can be used in the automatic categorization process.
The automatic categorization process is formed by a sequence of stages: (a) document base collection; (b) pre-processing; and (c) categorization. The document base collection consists of obtaining the examples to be used for training and testing the learning algorithm. The pre-processing involves, for each example, the elimination of irrelevant words (e.g., articles, prepositions, pronouns), the removal of word affixes (stemming) and the selection of the most important words (e.g., considering word frequency), used to characterize the document. In the categorization stage, the learning technique is determined, the examples are coded, and the classifier is learned. After these stages, the classifier can be used in the categorization of new documents. In a set of preliminary experiments (details in [26]), decision trees [23] showed themselves to be more robust and were selected for use in the categorization process proposed for the environment. In these experiments, the pre-processing stage was supported by an application, extended from a framelet (see [22]), whose kernel implements the basic flow of data among the pre-processing activities and the generation of scripts submitted to the learning algorithms. Afterwards, the "learned model" - the rules extracted from the decision tree - is connected to the content manager module, in order to use it in the categorization of new documents. Thus,
when a new document is inserted in the environment, it is pre-processed, has its keywords extracted and is automatically categorized and positioned in the environment.
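As a concrete illustration of these three stages, the sketch below uses scikit-learn as a stand-in for the authors' framelet-based application and C4.5; the documents, categories and parameters are hypothetical, and stemming is omitted for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

# (a) document base collection: hypothetical training examples and categories.
train_docs = ["neural networks learn weights", "tcp ip routing protocols"]
train_categories = ["Artificial Intelligence", "Computer Networks"]

# (b) pre-processing: stop-word removal and frequency-based term selection;
# (c) categorization: a decision-tree classifier learned from the examples.
classifier = make_pipeline(
    CountVectorizer(stop_words="english", max_features=1000),
    DecisionTreeClassifier(),
)
classifier.fit(train_docs, train_categories)

# A newly inserted document is pre-processed and categorized the same way.
print(classifier.predict(["routing table of a network switch"]))
```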
The intelligent virtual agent assists users during navigation and retrieval of relevant information. The agent's architecture comprises the following modules: knowledge base, perception, decision and action. The agent's knowledge base stores the information that it holds about the user and the environment. This knowledge is built from two sources of information: an external source and the perception of the interaction with the user. The external source is the information about the environment and the user, originating from the environment generator module. A perception module observes the interaction with the user, and the information obtained from this observation is used to update the agent's knowledge. It is through the perception module that the agent detects requests from the user and observes the user's actions in the environment. Based on its perception and on the knowledge that it holds, the agent decides how to act in the environment. A decision module is responsible for this activity. The decisions are passed to an action module, responsible for executing the actions (e.g., animation of the graphical representation and speech synthesis).
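The following schematic (Python) mirrors the module names used above — perception, decision and action around a knowledge base — purely as an illustration of the cycle; the method bodies are placeholders, not the AdapTIVE implementation.

```python
class VirtualAgent:
    """Schematic perceive-decide-act cycle of the assistant agent."""
    def __init__(self, knowledge_base: dict):
        self.kb = knowledge_base                     # info about environment and user

    def perceive(self, interaction: dict) -> dict:
        # detect requests and observe user actions; feed the knowledge base
        self.kb.setdefault("observations", []).append(interaction)
        return interaction

    def decide(self, observation: dict) -> dict:
        # choose how to act, from the perception and the stored knowledge
        return {"move_to": observation.get("requested_content")}

    def act(self, decision: dict) -> None:
        # e.g. animate the graphical representation and synthesize speech
        print("agent moving towards", decision["move_to"])

    def step(self, interaction: dict) -> None:
        self.act(self.decide(self.perceive(interaction)))

VirtualAgent({"environment": {}, "user": {}}).step({"requested_content": "Self Organizing Maps"})
```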
The communication between the agent and the users can happen in three ways: verbally, through a pseudo-natural language and through speech synthesis1, and non-verbally, through the agent's actions in the environment. The dialogue in pseudo-natural language consists of a certain group of questions, answers and short sentences, formed by a verb that corresponds to the type of user request and a complement regarding the object of user interest. When requesting help to locate information, for instance, the user can indicate (in the textual interface) Locate <content>. The agent's answers are suggested by its own movement through the environment, by indications through short sentences, and by text-to-speech synthesis. In the interaction with the provider, during the insertion of content, he can indicate Insert <content>, and the agent presents the data entry interface for the specification, identification and automatic categorization of the content model.
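A minimal sketch of the verb + complement requests mentioned above (Locate <content>, Insert <content>); the parsing and handling functions, and the example contents, are hypothetical.

```python
def parse_request(utterance: str) -> tuple:
    """Split a pseudo-natural-language request into verb and complement."""
    verb, _, complement = utterance.strip().partition(" ")
    return verb.lower(), complement.strip()

def handle_request(utterance: str) -> str:
    verb, complement = parse_request(utterance)
    if verb == "locate":
        return f"guiding user to '{complement}'"               # agent moves and speaks
    if verb == "insert":
        return f"opening data-entry interface for '{complement}'"
    return "sorry, I did not understand the request"

print(handle_request("Locate Self Organizing Maps"))
print(handle_request("Insert lecture-notes.pdf"))
```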
Moreover, a topological map of the environment is kept in the agent's knowledge base. In this map, a set of routes to key positions of the environment is stored. In accordance with the information that the agent has about the environment and with the map, it defines the set of routes to be used to locate a given content or to navigate to a given environment area. Considering that the agent updates its knowledge at each modification of the environment, it is always able to verify the set of routes that leads to the new position of a specific content.
1 JSAPI (Java Speech API)
According to the user model, the reorganization of this environment is made: the rooms that correspond to the areas of greater user interest are placed, in visualization order, before the rooms whose contents are less interesting. The initial user model, based on the explicit approach, is used to structure the initial organization of the environment. This also involves the use of avatars according to the user's gender and the consideration of the user's color preferences. As the user interacts with the environment, his model is updated and changes in the environment are made. Every n sessions (time window), for each area, the evidences of interest (navigation, request and access) are verified in order to update the user model. For instance, for a user who is interested in Artificial Intelligence (AI), is indifferent to contents related to the areas of Computer Networks (CN) and Computer Graphics (CG), and does not show initial interest in Software Engineering (SE), the initial values of the CFs at the beginning of the first session of interaction (without evidences) would be, respectively, 1, 0, 0 and -1. After some navigations (N), requests (R) and accesses (A),
presented in the graph of Fig. 3, the CFs can be re-evaluated. According to Fig. 3, the CN area was not navigated, requested or accessed while, on the other hand, the user started to navigate, to request and to access contents in the SE area. As presented in the graph of Fig. 4, an increase of the CFs related to the SE area was identified. In that way, at the end of the seventh session, the resulting CFs would be 1, -1, 0.4 and 0.2 (AI, CN, CG, SE, respectively). By sorting the resulting CFs, it would be possible to detect an alteration in the user model, whose new ranking of the interest areas would be AI, CG, SE, CN.
Fig. 3. Number of navigations (N), requests (R) and accesses (A) in each area, per session.
Fig. 5 (a) and (b) represent an example of the organization of the environment (2D view) before and after a modification in the user model, respectively, as shown in the example above.
Fig. 5. (a) Organization of the environment according to initial user model; (b) Organization of
the environment after the user model changes.
On the other hand, with respect to the contents in the environment, several media types are supported. The types that correspond to 2D and 3D images and videos are represented directly in the 3D space. The other types are represented through 3D objects and links to the content details (visualized using the corresponding application/plug-in). Moreover, sounds are activated when the user navigates or clicks on some object. Fig. 6 (a) shows a simplified 3D representation of a neural network and a 2D image of a type of neural network (Self-Organizing Maps); Fig. 6 (b) presents a 3D object and the visualization of the corresponding content details; Fig. 6 (c) shows the representation of computers in the room of Protocols.
Fig. 6. (a) 3D and 2D contents; (b) 3D object and link to content details; (c) 3D content.
Fig. 7 presents: the request of the user for the localization of a given area and the movement of the agent, together with a 2D environment map used as an additional navigation resource; the localization of a sub-area by the agent; and the user's visualization of a content and of its details, after selecting and clicking on a specific content description.
Fig. 7. (a) Request of the user; (b) Localization of a sub-area; (c) Visualization of contents.
4 Final Remarks
This paper presented an approach that integrates intelligent agents, user models and automatic content categorization in a virtual environment. The main objective was to explore the resources of Virtual Reality, seeking to increase the degree of interactivity between the users and the environment. A large number of distance-learning environments make content available through 2D environments, usually working with HTML interfaces and offering poor interaction with the user. The possibilities of spatial reorganization and environment customization, according to the modifications in the available contents and in the user models, were presented. Besides, an automatic content categorization process that aims to help the domain specialist (provider) to organize the information in this environment was also shown. An intelligent agent that knows the environment and the user, and that assists the user in the navigation and location of information in this environment, was described. A standout of this work is that it deals with the acquisition of users' characteristics in a three-dimensional environment. Most of the works related to user model acquisition and environment adaptation use 2D interfaces. Moreover, a great portion of the efforts in the construction of Intelligent Virtual Environments does not provide the combination of user models, assisted navigation and information retrieval, and, mainly, does not have the capability to reorganize the environment and display the contents in a 3D space. Usually, only a subset of these problems is considered. This work extends and improves these capabilities in 3D environments.
References
1. Abbattista, F.; Degemmis, M; Fanizzi, N.; Licchelli, O. Lops, P.; Semeraro, G.; Zambetta,
F.: Learning User Profile for Content-Based Filtering in e-Commerce. Workshop Ap-
prendimento Automatico: Metodi e Applicazioni, Siena, Settembre, 2002.
2. Avradinis, N.; Vosinakis, S.; Panayiotopoulos, T.: Using Virtual Reality Techniques for
the Simulation of Physics Experiments. 4th Systemics, Cybernetics and Informatics Inter-
national Conference, Orlando, Florida, USA, July, 2000.
3. Aylett, R. and Cavazza, M.: Intelligent Virtual Environments - A state of the art report.
Eurographics Conference, Manchester, UK, 2001.
4. Billsus, D. and Pazzani, M.: A Hybrid User Model for News Story Classification. Pro-
ceedings of the 7th International Conference on User Modeling, Banff, Canada, 99-108,
1999.
5. Brusilovsky, P.: Adaptive Hypermedia. User Modeling and User-Adapted Interaction, 11,
87-110, Kluwer Academic Publishers, 2001.
6. Chittaro L. and Ranon R.: Adding Adaptive Features to Virtual Reality Interfaces for E-
Commerce. Proceedings of the International Conference on Adaptive Hypermedia and
Adaptive Web-based Systems, Lecture Notes in Computer Science 1892, Springer-Verlag,
Berlin, August, 2000.
7. Chittaro, L. and Ranon, R.: Dynamic Generation of Personalized VRML Content: A Gen-
eral Approach and its Application to 3D E-Commerce. Proceedings of 7th Conference on
3D Web Technology, USA, February, 2002.
8. Chittaro, R.; Ranon, R.; Ieronutti, L.: Guiding Visitors of Web3D Worlds through Auto-
matically Generated Tours. Proceedings of the 8th Conference on 3D Web Technology,
ACM Press, New York, March, 2003.
9. Duarte, E.; Braga, A.; Braga, J.: Agente Neural para Coleta e Classificação de Informações
Disponíveis na Internet. Proceeding of the 16th Brazilian Symposium on Neural Net-
works, PE, Brazil, 2002.
10. Fink, J. and Kobsa, A.: A Review and Analysis of Commercial User Modeling Servers for
Personalization on the World Wide Web. User Modeling and User Adapted Interaction,
10(3-4), 209-249, 2000.
11. Frery, A.; Kelner, J.; Moreira, J., Teichrieb, V.: Satisfaction through Empathy and Orien-
tation in 3D Worlds. CyberPsychology and Behavior, 5(5), 451-459, 2002.
12. Giarratano, J. and Riley, G.: Expert Systems - Principles and Programming. 3 ed., PWS,
Boston, 1998.
13. Kobsa, A.: Supporting User Interfaces for All Through User Modeling. Proceedings of
HCI International, Japan, 1995.
14. Lieberman, H.: Letizia: An Agent That Assists Web Browsing. International Joint Conference on Artificial Intelligence, Montreal, 924-929, 1995.
15. Milde, J.: The instructable Agent Lokutor. Workshop on Communicative Agents in Intel-
ligent Virtual Environments, Spain, 2000.
16. Nijholt, A. and Hulstijn, J.: Multimodal Interactions with Agents in Virtual Worlds. In:
Kasabov, N. (ed.): Future Directions for Intelligent Information Systems and Information
Science, Physica-Verlag: Studies in Fuzziness and Soft Computing, 2000.
17. Nikolopoulos, C.: Expert Systems - Introduction to First and Second Generation and
Hybrid Knowledge Based Systems. Eds: Marcel Dekker, New York, 1997.
18. Panayiotopoulos, T.; Zacharis, N.; Vosinakis, S.: Intelligent Guidance in a Virtual Univer-
sity. Advances in Intelligent Systems - Concepts, Tools and Applications, 33-42, Kluwer
Academic Press, 1999.
19. Papatheodorou, C.: Machine Learning in User Modeling. Machine Learning and Applica-
tions. Lecture Notes in Artificial Intelligence. Springer Verlag, 2001.
20. Pazzani, M. and Billsus, D.: Learning and Revising User Profiles: The identification on
Interesting Web Sites. Machine Learning, 27(3), 313-331, 1997.
21. Perkowitz, M. and Etzioni, O.: Adaptive Web Sites: Automatically synthesizing Web
pages. Fifteenth National Conference on Artificial Intelligence, Wisconsin, 1998.
22. Pree, W. and Koskimies, K.: Framelets-Small Is Beautiful, A Chapter in Building Appli-
cation Frameworks: Object Oriented Foundations of Framework Design. Eds: M.E. Fayad,
D.C. Schmidt, R.E. Johnson, Wiley & Sons, 1999.
23. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.
24. Rickel, J. and Johnson, W.: Task-Oriented Collaboration with Embodied Agents in Virtual
Worlds. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill (Eds.), Embodied Conver-
sational Agents, 95-122. Boston: MIT Press, 2000.
25. Rickel, J.; Marsella, S.; Gratch, J.; Hill, R.; Traum, D.; Swartout W.: Toward a New Gen-
eration of Virtual Humans for Interactive Experiences. IEEE Intelligent Systems, 17(4),
2002.
26. Santos, C. and Osorio, F.: Técnicas de Aprendizado de Máquina no Processo de
Categorização de Textos. Internal Research Report
(http://www.inf.unisinos.br/~cassiats/mestrado), 2003.
27. Santos, C. and Osório, F.: An Intelligent and Adaptive Virtual Environment and its Appli-
cation in Distance Learning. Advanced Visual Interfaces, Italy, May, ACM Press, 2004.
28. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Sur-
veys, 34(1), 1-47, 2002.
29. Self, J.: The defining characteristics of intelligent tutoring systems research: ITSs care,
precisely. International Journal of Artificial Intelligence in Education, 10, 350-364,1999.
EASE: Evolutional Authoring Support Environment
Abstract. How smart should we be in order to cope with the complex authoring
process of smart courseware? Lately this question has gained more attention, with attempts to simplify the process and efforts to define authoring systems and tools to support it. The goal of this paper is to specify an evolutional perspective on Intelligent Educational Systems (IES) authoring and, in this context, to
define the authoring framework EASE: powerful in its functionality, generic in
its support of instructional strategies and user-friendly in its interaction with the
author. The evolutional authoring support is enabled by an authoring task
ontology that at a meta-level defines and controls the configuration and tuning
of an authoring tool for a specific authoring process. In this way we achieve
more control over the evolution of the intelligence in IES and reach a
computational formalization of IES engineering.
For many years now, various types of Intelligent Educational Systems (IES) have proven to be well accepted and have gained a prominent place in the field of courseware [15]. IES have also proven [8, 14] to be rather difficult to build and maintain, which became, and still is, a prime obstacle to their widespread adoption. The dynamic user demands in many aspects of software production are influencing research in the field of intelligent educational software as well [1]. Problems are related to keeping up with the constant requirements for flexibility and adaptability of content and for reusability and sharing of learning objects [10].
Thus, IES engineering is a complex process, which could benefit from a systematic approach based on common models and a specification framework. This will offer a common framework to identify general design and development phases, to modularize the system components, to separate the modeling of various types of knowledge, to define interoperability points with other applications, to reuse subject domains and tutoring- and application-independent knowledge structures, and finally to achieve more flexibility and consistency within the entire authoring process. Beyond the point of creation of an IES, such a common engineering framework will allow for structured analysis and comparison of IESs and for their easy maintainability.
Currently, a lot of effort is focused on improving IES authoring tools to simplify the process and allow time-efficient creation of IESs [14, 17, 21]. Despite this massive
The approach we take follows up on the efforts to elicit requirements for IES authoring, to define a reference model and to modularize the architecture of IES authoring tools. We describe a model-driven design and specification framework that provides functionality to bridge the gap between the author and the authoring system by managing the increased intelligence. It accentuates the separation of concerns between the subject domain, user aspects, the application and the final presentation of the educational content. It makes it possible to overcome inconsistencies and to automate authoring tasks. We show how the scheme from [14] can be filled with the 'entire intelligence of IES', split into collaborative knowledge components.
First, we look at the increased intelligence. Authoring of IES is a process of exponentially growing complexity, and it requires many different types of knowledge and the consideration of various constraints, requirements and educational strategies [16]. Aiming at (semi-)automated IES authoring, we need explicit representations of the strategic knowledge (rules, requirements, constraints) in order to be able to reason
Characteristically, ITSs [14] maintain and work with knowledge of the expert, the learner and tutoring strategies, in order to capture the student's understanding of the domain and to tailor instructional strategies to the concrete student's needs. Adaptive Hypermedia reference architectures [8] define a domain, a user and an adaptation (teaching) model used to achieve the content adaptation.
Analogously, Web-based Educational Systems [2] distinguish domain, user and application models, connecting the domain and user models to give a personalized view of the learning resources. A task model specifies the concrete sequence of tasks in an adaptive way. As a consequence, [4] distinguishes three IES design stages: (1) conceptual modeling of the domain and resources, (2) the modeling of application aspects, and (3) simulated use of the user model. Thus, the provision of user-oriented (adapted) instruction and adequate guidance in IES depends on:
maintaining a model of the domain, describing the structure of the
information content within IES (based on concepts and their relationships);
maintaining a personalized portal to a large collection of well organized and
structured learning/teaching material resources.
maintaining a model of the user to reflect the user’s preferences, knowledge,
goals, and other relevant instructional aspects;
maintaining the application intelligence in instructional design, testing,
adaptation and sequencing models;
a specific engine to execute the prepared educational structure or sequences.
We organize the common aspects of IES in a model-driven reference approach to allow for a modularization of authoring concerns and interoperability of IES components. In line with the IES model defined in the previous section, we structure the complexity of the entire authoring process by grouping the various authoring activities to:
model the domain as a representation of the domain knowledge;
annotate, maintain, update and create learning objects;
define the learning goals;
select and apply instructional strategies for individual and group learning;
select and apply assessment strategies for individual and group learning;
specify a learner model with learner characteristics;
specify learning sequence(s) out of learning and assessment activities.
To support these authoring tasks we employ knowledge models and capture all the
processes related to those tasks in corresponding authoring modules as shown in
Figure 2. It defines three levels of abstraction for building an IES. At the product level
we see the final IES. At the authoring instance level the actual IES authoring takes
place by instantiation of the meta-schema with the actual IES authoring concepts,
models and behavior. At the meta-authoring level we exploit the generic authoring task
ontology (ATO) [3, 4] as a main knowledge component in a meta-authoring system
and as a conceptual structure of the entire authoring process. A repository of domain-
independent authoring components is defined at this level.
At the instance level we exploit ontologies as a way to conceptualize the authoring
knowledge in IES. Corresponding ontologies (e.g. for Domain Model, Instructional
Strategies, Learning Goal, Test Generation, Resource Management, User Model) are
defined to represent the knowledge and important concepts in each of those authoring
modules.
Our final goal with this three-layer approach is to realize an evolutional (self-
evolving) authoring system, which will be able to reason over its own behavior and
based on statistical and other intelligent computations will be able to add new rules or
change existing ones in the different parts of the authoring process.
5.1 Communication
The core of the intelligence in the EASE architecture comes from the communication
or interactions between the components. There are two “central” components here, the
Sequencing Strategies Authoring (SS) and the Authoring Interface (AI). The AI is the
access point for the author to interact with the underlying concepts, models and
content. The SS interacts with the other components in order to achieve the most
appropriate learning sequence for the targeted learner. In this section we illustrate the
communication exchange among EASE components, which will further result in the
authoring support guidance provided by an EASE-based authoring system.
An authoring support rule in the CLS’s knowledge base on the other hand produces
recommendations and can be triggered by either the author or the system. For
example:
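The concrete rule given at this point in the original text is not reproduced in this copy; the following is a purely hypothetical illustration (Python) of what such a recommendation-producing authoring support rule could look like, with invented attribute and course names.

```python
def missing_assessment_rule(course: dict) -> list:
    """IF a learning goal has instructional activities but no assessment
    THEN recommend adding an assessment activity (hypothetical rule,
    triggered either by the author or by the system)."""
    recommendations = []
    for goal in course["learning_goals"]:
        if goal["activities"] and not goal["assessments"]:
            recommendations.append(
                f"Learning goal '{goal['name']}': consider adding an assessment activity."
            )
    return recommendations

course = {"learning_goals": [
    {"name": "introduce concept X", "activities": ["reading", "demo"], "assessments": []}]}
print(missing_assessment_rule(course))
```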
6 Conclusion
Our aim in this research is to specify a general authoring framework for content and
knowledge engineering for Intelligent Educational Systems (IES). The main added
value of this approach is that on the one hand the ontologies in it make the authoring
knowledge explicit, which improves the basis for sharing and reusing. On the other
hand, it is configurable through an evolutional approach. Finally, this knowledge is
implementable, since all higher-level (meta-level) constructs are expressed with a
limited class of generic primitives out of lower-level constructs. Thus, we set the
ground for a new generation of evolutional authoring systems, which meet the high
requirements for flexibility, user-friendliness and efficiency in maintainability.
We have described a reference model for IES and, in connection with it, a three-level model for IES authoring. For this EASE framework we have identified the main intelligence components and have illustrated their interaction. Characteristic of EASE is the use of ontologies to provide a common vocabulary and a common
understanding of the entire IES authoring processes. This allows for interoperation
between different applications and authors.
References
1. Ainsworth, S., Major, N., Grimshaw, S., Hayes, M., Underwood, J., Williams, B., &
Wood, D. (2003). REDEEM: Simple Intelligent Tutoring Systems from Usable Tools, In
Murray, Ainsworth, & Blessing (eds.), Authoring Tools for Adv. Tech. Learning Env.,
205-232.
2. Aroyo, L., Dicheva, D., & Cristea, A. (2002). Ontological Support for Web Courseware
Authoring. In Proceedings of ITS 2002 Conference, 270-280.
3. Aroyo, L, & Mizoguchi, R. (2003). Authoring Support Framework for Intelligent
Educational Systems. In Proceedings of AIED 2003 Conference.
4. Aroyo, L. & Mizoguchi, R. (2004). Towards Evolutional Authoring Support. Journal for
Interactive Learning Research. (in print)
5. Anderson, J., Corbett, A. Koedinger, K., & Pelletier, R. (1995). Cognitive tutors: Lessons
learned. The Journal of the Learning Sciences, 4(2), 167-207.
Selecting Theories in an Ontology-Based ITS Authoring Environment
Abstract. This paper introduces the rationale for concrete situations in the authoring process that can exploit a theory-aware Authoring Environment. It illustrates how Ontological Engineering (OE) can be instrumental in representing the declarative knowledge needed, and how an added value in terms of intelligence can be expected for both authoring and learning environments.
1 Introduction
Exploring the power of ontologies for ITS and for the authoring process raises the
following question: what is the full power of an ontology-based system when
adequately deployed? A successful experiment was conducted by Mizoguchi [6] in deploying an ontology-based systematization of functional knowledge in a production division of a large company. Although the domain is different from educational knowledge, we believe that the approach is applicable to the knowledge systematization of the learning and instructional sciences. One of the key claims of this
knowledge systematization is that the concept of function should be defined
independently of an object that can possess it and of its realization method. This in
effect releases the function for re-use in multiple domains.
Consider: If functions are defined depending on objects and their realization, few
functions are reused in different domains. In the systematization reported in
[6], a six-layer ontology and knowledge base was built, using functional knowledge representation frameworks, to capture, store and share functional knowledge among engineers and to enable them to reuse that functional knowledge in their daily work with the help of a functional knowledge server. It was successfully deployed inside the Production Systems Division of Sumitomo Electric Industries, Ltd., with the following results: 1) the same document can be used for redesign, design review, patent writing and troubleshooting; 2) the patent-writing process is reduced by one third; 3) design reviews go much better than before; 4) troubleshooting is much easier than before; 5) it enables collaborative work among several kinds of engineers. This demonstrates that operational knowledge systems based on developed ontologies can work effectively in a real-world situation.
What is the similarity of situations in the manufacturing and the educational
domains? Both have rich concepts and experiential knowledge. However, neither a
common conceptual infrastructure nor shareable knowledge bases are available in
those domains. Rather, each is characterized by multiple viewpoints and a variety of
concepts. The success reported in [6] leads us to believe that similar results can be
obtained in the field of ITS, and that efforts should be made towards achieving this
goal of building ITS frameworks capable of sharing educational knowledge.
The power of the intelligent behaviour of an ITS Learning Environment relies on the
knowledge stored in it. This knowledge deals with domain expertise, pedagogy,
interaction and tutoring strategy. Each of those dimensions is usually implemented as
an agent-based system. A simplified view of an ITS is that of a multi-agent system in
which domain expert, pedagogical and tutoring agents cooperate to deliver an optimal
learning environment with respect to the learning goals. In order to achieve this, ITS
agents need to share common interpretations during their interactions. How does
ontology engineering contribute to this?
Several views or theories of domain knowledge taxonomy can be found in the
literature as well as discussions of how this knowledge can be represented in a
knowledge-based system for learning/teaching purposes. For instance, Gagné et al.
suggested five categories of knowledge that are responsible for most human activity;
some of these categories include several types of knowledge. Merrill suggested a
different view of possible learning outcomes or domain knowledge. Even if there are
some intersections between existing taxonomies, it is very difficult to implement a
system that can integrate these different views without a prior agreement on the
semantics of what the student should learn.
We believe that ontological engineering can help the domain expert agent to deal with these different views in two ways: 1) by defining the "things" associated with each taxonomy and their semantics, so that the domain knowledge expert can inform the other agents of the system in the course of their interaction; 2) by creating an ontology for a meta-taxonomy which can include the different views. We are experimenting with each of these approaches.
Ontological engineering can also be instrumental for including different instructional theories in the same pedagogical agent: for example, Gagné's theory of instructional events, or Merrill's component-based theory. This could lead to the development of ITSs based on multiple instructional theories, which could exploit principles from one instructional theory or another with respect to the current instructional goal.
Furthermore, ontological engineering is essential for the development of ITSs in which several agents (tutor, instructional planner, expert, profiler) need to agree on the definition and the semantics of the things they share during a learning session. Even if pure multi-agent platforms based on standards such as FIPA offer an ontology-free ACL (Agent Communication Language), ontological engineering is still necessary, because the ontology defines the shared concepts, which are in turn sent to the other party during the communication. It is possible to implement this using FIPA-ACL standards, in which "performatives" (communication acts between agents) can take a given ontology as a parameter, making it possible for the other party to understand and interpret the concepts or things included in the content of the message.
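For illustration, the fields of such a message are sketched below as a Python dictionary; the performative and parameter names follow the FIPA-ACL message structure, while the agent names, the ontology name and the content expression are hypothetical.

```python
# Hedged sketch of a FIPA-ACL "request" whose ontology parameter tells the
# receiving agent how to interpret the content (names are invented).
acl_request = {
    "performative": "request",
    "sender": "instructional-planner",
    "receiver": "domain-expert-agent",
    "language": "fipa-sl",
    "ontology": "gagne-briggs-instructional-events",
    "content": "((action (select-strategy :objective concept-learning)))",
}
print(acl_request["ontology"])
```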
By adding intelligence to ITS authoring environments in the form of theory-awareness, we could provide not only curriculum knowledge and instructional strategies, but also the foundations, the rationale upon which the tutoring
system relies and acts. As a result of having ontology-based ITS authoring
environments, we can ensure that: 1) the ITS generated can be more coherent, well-
founded, scrutable, and expandable; 2) the ITS can explain and justify to learners the
rationale behind an instructional strategy (based on learning theories), and therefore
support metacognition; 3) the ITS can even offer some choice to learners in terms of
instructional strategies, with pros and cons for each option, thus supporting the
development of autonomy and responsibility.
Having access to multiple theories (instead of one) in an ITS authoring
environment such as CREAM-tools [13] would offer added value through: 1) the
possible diversity offered to authors and learners, through the integration of multiple
theories into a common knowledge base, 2) a curriculum planning module that would
then be challenged to select among theories, and would therefore have to be more
intelligent, and 3) an opportunity for collaboration between authors (even with
different views of instructional or learning theory) in the development of ITS
modules.
The dependence between instructional theory and strategy is best illustrated in the book edited by Reigeluth: 'Instructional Theories in Action: Lessons Illustrating Selected Theories and Models' [14]. Reigeluth asked several authors to design a lesson based on a specific theory, having in common the subject matter, the objectives, and the test items. The objectives included both concept learning and skill development. The lesson is an introduction to the concepts of lens, focus and magnitude in optics. The book offers eight variations of the lesson, each one being an implementation of one of eight existing and recognized theories. Reigeluth offers the following caveats about this exercise: 1) despite the fact that each theory uses its own terminology, the theories have much in common; 2) each theory has its limitations, none of them covers the full complexity of the design problem, and none of them takes into account the large number of variables that play a role; 3) the main variation factor is how appropriate the strategy is to the situation; 4) authors would benefit from knowing and having access to all existing theories. In the same book, Schnellbecker, in his effort to compare and contrast the different approaches, underlines that there is no such thing as a 'truth character' in the selection of a model. Variations among the lessons are of two kinds: intra-theory
and inter-theory. Each implementation of one theory is specific, and could give room
to a range of other strategies, all of them referring to the same principles. Inter-theory
variations represent fundamental differences in how one views learning and
instruction, in terms of paradigm.
Since we are mainly interested in the examination of variations among theories, we concentrated on inter-theory variations. We selected three theories: Gagné-Briggs, Merrill, and Collins, and we selected one objective of concept learning (skill development is to be examined in a separate study). The Gagné-Briggs theory of instruction was the first one to rely directly and explicitly on a learning theory. It covers cognitive, affective and psycho-motor knowledge; the goal of learning is performance, and the goal of instruction is effectiveness. Merrill's Component Display Theory shares the same paradigm as Gagné-Briggs', but suggests a different classification of objectives and provides more detailed guidelines for the organization of learning and of learning material. The lesson drawn by Collins refers to principles extracted from good practices and to scientific inquiry as a metaphor for learning; its goals are oriented towards critical thinking rather than performance.
This section documents the methodology used for building the ontology and the models, and presents the results. The view of an ontology and of ontological engineering is the one developed and applied at Mizlab [15]. Of the three steps proposed by Mizoguchi [15], the first one, called Level 1, has been carried out; it consists of term extraction and definition, hierarchy building, relation setting and model building. A use case was built to obtain the ontological commitments needed [16], and competency questions were sketched as suggested by Gruninger and Fox [17]. Concept extraction was performed based on the assumptions expressed by Noy and McGuinness [18]. The definition of a 'role' refers to Kozaki [19].
The ontology environment used for the development of the ontology is Hozo [19], an environment composed of a graphical interface, an ontology editor, and an ontology and model server in a client-server architecture.
Use case. The definition of the domain of instruction was done based on the ideas developed in [8]. A set of competency questions [17] was also sketched, as well as preliminary queries that our authoring environment prototype should be able to answer, such as: What is the most appropriate instructional theory? Which kind of learning activity or material do we need, based on the chosen instructional theory? At this stage we made ontological commitments [16] about which domain we wanted to model, how we wanted to do it, under which goals, in which order, and in which environment. The use case was done from the point of view of an author (human or software) having to select an instructional strategy to design a lesson. The selection is usually done based on the learning conditions that have been previously identified. The result generated by the authoring environment is an instructional scenario based on the instructional strategy which best satisfies the learning conditions. Building the use cases was done by analyzing the expectations for
According to these use cases, the author is informed of: the prerequisites necessary to reach the lesson objective, the learning content, the teaching strategy, the teaching material, the assessment, and the order and type of activities. The activities proposed are based on Gagné's instructional events, Merrill's performance/content matrix and Collins's instructional techniques.
Term extraction and definition. This operation was conducted based on the assumptions [18] that: 1) there is no single correct way to model a domain, 2) ontology development is necessarily an iterative process, 3) concepts in the ontology should be
Create instances (models). The models were built by instantiating the ontology concepts and then connecting the instances to each other. The consistency checking of the model is done using the axioms defined in the ontology. The model is then ready to be used by other agents (human, software or both).
Three models have been built that rely on the main ontology and relate to the use cases. These models of scenarios focus on the teaching/learning interaction based on each respective instructional theory. Figure 2 presents the model for the Gagné-Briggs theory.
Six out of Gagné’s nine events of instruction, which are needed to achieve the
lesson scenario, are presented in Figure 2. The activities involved in the achievement
of the lesson objective are represented according to “Remember-Generality-Concept”
from Merrill’s performance/content matrix. In the same way, six of Collins’s ten
techniques of instruction, which are needed to achieve the lesson scenario according
to this Theory, are represented in their event order. The most interesting part of these
three models is that they explicitly show the role of each participant during the
activities based on each theory.
5 Conclusion
themselves, and one should not expect from ontological engineering of theoretical
knowledge more than what can be expected from the theories themselves.
This paper illustrates the idea that an Ontology-based ITS Authoring Environment
can enrich the authoring process as well as curriculum planning. One example is
provided of how a theory-aware authoring environment allows for principled design,
provides explicit justification for selection, may stimulate reflection among authors,
and may pave the way to an integrated knowledge base of instructional theories. A
theory-aware Authoring Environment also allows for principled design when it comes
to assembling, aggregating and integrating learning objects by applying principles
from theories. Further work in this direction will lead us to develop the system’s
functionalities, to implement them in an ITS authoring environment, and to conduct
empirical evaluation.
References
1. Mizoguchi R. and Bourdeau J., Using Ontological Engineering to Overcome Common AI-
ED Problems. International Journal of Artificial Intelligence and Education, 2000.
vol.11 (Special Issue on AIED 2010): p. 107-121.
2. Mizoguchi R. and Bourdeau J. Theory-Aware Authoring Environment : Ontological
Engineering Approach. in Proc. of the ICCE Workshop on Concepts and Ontologies in
Web-based Educational Systems. 2002. Technische Universiteit Eindhoven.
3. Mizoguchi R. and Sinitsa K. Architectures and Methods for Designing Cost-Effective and
Reusable ITSs. in Proc. ITS’96. 1996. Montreal.
4. Chen W., et al. Ontological Issues in an Intelligent Authoring Tool. in ICCE’98. 1998.
5. Mizoguchi R., et al., Construction and Deployment of a Plant Ontology. The 12th
International Conference, EKAW 2000, 2000 (Lecture Notes in Artificial Intelligence
1937): p. 113-128.
6. Mizoguchi R. Ontology-based systematization of functional knowledge. in
TMCE2002:Tools and methods of competitive engineering. 2002. China.
7. Rubin D. L., et al., Representing genetic sequence data for pharmacogenomics: an
evolutionary approach using ontological and relational models. 2002. 18(1): p. 207-215.
8. Bourdeau J. and Mizoguchi R. Collaborative Ontological Engineering of Instructional
Design Knowledge for an ITS Authoring Environment. in ITS 2002. 2002: Springer,
Heidelberg.
9. Murray T., Authoring intelligent tutoring systems: an analysis of the state of the art.
IJAIED, 1999. 10: p. 98-129.
10. Kay J. and Holden S. Automatic Extraction of Ontologies from Teaching Document
Metadata. in ICCE Workshop on Concepts and Ontologies in Web-based Educational
Systems. 2002. Technische Universiteit Eindhoven.
11. Paquette G. and Rosca I., Organic Aggregation of Knowledge Objects in Educational
Systems. Canadian Journal of Learning and Technology, 2002. vol. 28(No. 3): p. 11-26.
12. Aroyo L. and Dicheva D. Authoring Framework for Concept-based Web Information
Systems. in ICCE Workshop on Concepts and Ontologies in Web-based Educational
Systems. 2002. Technische Universiteit Eindhoven.
13. Nkambou R., Frasson C., and Gauthier G., Cream-Tools: an authoring environment for
knowledge engineering in intelligent tutoring systems, in Authoring Tools for Advanced
Technology Learning Environments: Towards cost-effective adaptive, interactive, and
intelligent educational software, Murray T., Blessing S., and Ainsworth S., Editors. 2002, Kluwer Academic
Publishers.
14. Reigeluth C. M., ed. Instructional theories in action: lessons illustrating selected theories
and models. 1993, LEA.
15. Mizoguchi R. A Step Towards Ontological Engineering. in 12th National Conference on
AI of JSAI. 1998.
16. Davis R., Shrobe H., and Szolovits P., What Is a Knowledge Representation? AI
Magazine, 1993.
17. Gruninger M. and Fox M.S. Methodology for the Design and Evaluation of Ontologies. in
Workshop on Basic Ontological Issues in Knowledge Sharing, IJCAI-95. 1995. Montreal.
18. Noy N. F. and McGuinness D. L., Ontology Development 101: A Guide to Creating Your
First Ontology. 2000.
19. Kozaki K., et al., Development of an environment for building ontologies which is based
on a fundamental consideration of relationship and role. 2001.
Opening the Door to Non-programmers:
Authoring Intelligent Tutor Behavior by Demonstration
Abstract. Intelligent tutoring systems are quite difficult and time intensive to
develop. In this paper, we describe a method and set of software tools that ease
the process of cognitive task analysis and tutor development by allowing the
author to demonstrate, instead of programming, the behavior of an intelligent
tutor. We focus on the subset of our tools that allow authors to create “Pseudo
Tutors” that exhibit the behavior of intelligent tutors without requiring AI pro-
gramming. Authors build user interfaces by direct manipulation and then use a
Behavior Recorder tool to demonstrate alternative correct and incorrect actions.
The resulting behavior graph is annotated with instructional messages and
knowledge labels. We present some preliminary evidence of the effectiveness
of this approach, both in terms of reduced development time and learning out-
come. Pseudo Tutors have now been built for economics, analytic logic,
mathematics, and language learning. Our data supports an estimate of about
25:1 ratio of development time to instruction time for Pseudo Tutors, which
compares favorably to the 200:1 estimate for Intelligent Tutors, though we ac-
knowledge and discuss limitations of such estimates.
1 Introduction
Intelligent Tutoring Systems have been successful in raising student achievement and
have been disseminated widely. For instance, Cognitive Tutor Algebra is now in
more than 1700 middle and high schools in the US [1] (www.carnegielearning.com).
Despite this success, it is recognized that intelligent tutor development is costly and
better development environments can help [2, 3]. Furthermore, well-designed devel-
opment environments should not only ease implementation of tutors, but also im-
prove the kind of cognitive task analysis and exploration of pedagogical content
knowledge that has proven valuable in cognitively-based instructional design more
generally [cf., 4, 5]. We have started to create a set of Cognitive Tutor Authoring
Tools (CTAT) that support both objectives. In a previous paper, we discussed a num-
ber of stages of tutor development (e.g., production rule writing and debugging) and
presented some preliminary evidence that the tools potentially lead to substantial
savings in the time needed to construct executable cognitive models [6]. In the current
paper, we focus on the features of CTAT that allow developers to create intelligent
tutor behavior without programming. We describe how these features have been used
to create “Pseudo Tutors” for a variety of domains, including economics, LSAT
preparation, mathematics, and language learning, and present data consistent with the
hypothesis that these tools reduce the time to develop educational systems that pro-
vide intelligent tutor behavior.
A Pseudo Tutor is an educational system that emulates intelligent tutor behavior,
but does so without using AI code to produce that behavior. (It would be more accu-
rate, albeit more cumbersome, to call these “Pseudo Intelligent Tutors” to emphasize
that it is the lack of an internal AI engine that makes them “pseudo,” not any signifi-
cant lack of intelligent behavior.) Part of our investigation in exploring the possibili-
ties of Pseudo Tutors is to investigate the cost-benefit trade-offs in intelligent tutor
development, that is, in what ways can we achieve the greatest instructional “bang”
for the least development “buck.” Two key features of Cognitive Tutors, and many
intelligent tutoring systems more generally, are 1) helping students in constructing
knowledge by getting feedback and instruction in the context of doing and 2) pro-
viding students with flexibility to explore alternative solution strategies and paths
while learning by doing. Pseudo Tutors can provide these features, but with some
limitations and trade-offs in development time. We describe some of these limitations
and trade-offs. We also provide preliminary data on authoring of Pseudo Tutors, on
student learning outcomes from Pseudo Tutors, and development time estimates as
compared with estimates of full Intelligent Tutor development.
The first two productions illustrate alternative correct strategies for the same goal.
By representing alternative strategies, the cognitive tutor can follow different students
down different problem solving paths. The third “buggy” production represents a
common error students make when faced with this same goal. A Cognitive Tutor
makes use of the cognitive model to follow students through their individual ap-
proaches to a problem. A technique called “model tracing” allows the tutor to provide
individualized assistance in the context of problem solving. Such assistance comes in
the form of instructional message templates that are attached to the correct and buggy
production rules. The cognitive model is also used to estimate students’ knowledge
growth across problem-solving activities using a technique known as “knowledge
tracing” [9]. These estimates are used to adapt instruction to individual student needs.
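The knowledge-tracing estimates mentioned above are usually maintained per skill and revised with a Bayesian update after each observed step (cf. [9]). The following is a minimal sketch of that update; the guess, slip, and learning parameters are illustrative values, not those of any deployed tutor.

```python
# Minimal sketch of a Bayesian knowledge-tracing update (cf. [9]);
# the parameter values below are illustrative, not those of any real tutor.

def kt_update(p_known, correct, p_guess=0.2, p_slip=0.1, p_learn=0.3):
    """Return P(skill known) after observing one correct/incorrect step."""
    if correct:
        evidence = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        posterior = p_known * (1 - p_slip) / evidence
    else:
        evidence = p_known * p_slip + (1 - p_known) * (1 - p_guess)
        posterior = p_known * p_slip / evidence
    # Account for the chance of learning the skill at this opportunity.
    return posterior + (1 - posterior) * p_learn

p = 0.1                      # prior probability that the skill is already known
for obs in [False, True, True]:
    p = kt_update(p, obs)
print(round(p, 3))           # running estimate used to adapt problem selection
```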
The key behavioral features of Cognitive Tutors, as implemented by model tracing
and knowledge tracing, are what we are trying to capture in Pseudo Tutor authoring.
The Pseudo Tutor authoring process does not involve writing production rules, but
instead involves demonstration of student behavior.
ware programming environment. To create this interface, the author clicks on the text
field icon in the widget palette and uses the mouse to position text fields.
Typically in the tutor development process new ideas for interface design, par-
ticularly “scaffolding” techniques, may emerge (cf., [11]). The interface shown in
Figure 1 provides scaffolding for converting the given fractions into equivalent frac-
tions that have a common denominator. The GUI Builder tool can be used to create a
number of kinds of scaffolding strategies for the same class of problems. For in-
stance, story problems sometimes facilitate student performance and thus can serve as
a potential scaffold. Consider this story problem: “Sue has 1/4 of a candy bar and Joe
has 1/5 of a candy bar. How much of a candy bar do they have altogether?” Adding
such stories to the problem in Figure 1 is a minor interface change (simply add a text
area widget). Another possible scaffold early in instruction is to provide students with
the common denominator (e.g., 1/4 + 1/5 =__/20 +__/20). An author can create such
a subgoal scaffold simply by entering the 20’s before saving the problem start state.
Both of these scaffolds can be easily implemented and have been shown to reduce
student errors in learning fraction addition [12].
The interface widgets CTAT provides can be used to create interfaces that can
scaffold a wide variety of reasoning and problem solving processes. A number of
non-trivial widgets exist including a “Chooser” and “Composer” widget. The Chooser
widget allows students to enter hypotheses (e.g., [13]). The Composer widget allows
students to compose sentences by combining phrases from a series of menus (e.g.,
[14]).
Demonstrate Alternative Correct and Incorrect Solutions. Once an interface is created, the author can use it and the associated “Behavior Recorder” to author problems and demonstrate alternate solutions. Figure 1 shows the interface just after the author has entered 1, 4, 1, and 5 in the appropriate text fields. At this point, the author chooses “Create Start State” from the Author menu and begins interaction with the Behavior Recorder, shown on the left in Figure 2. After creating a problem start state, the author demonstrates alternate solutions as well as
Fig. 2. The Behavior Recorder records authors’ actions in any interface created with CTAT’s recordable GUI widgets. The author demonstrates alternative correct and incorrect paths. Coming out of the start state (labeled “prob-1-fourth-1-fifth”) are two correct paths (“20, F21den” and “20, F22den”) and one incorrect path (“2, F13num”). Since state8 is selected in the Behavior Recorder, the Tutor Interface displays that state, namely with the 20 entered in the second converted fraction.
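A behavior graph of the kind recorded in Figure 2 can be pictured as states linked by demonstrated actions, each marked correct or incorrect and later annotated with a hint or error message. The sketch below is hypothetical: the state and widget names loosely echo the figure, and nothing in it is actual CTAT code.

```python
# Hypothetical sketch of a behavior graph: states connected by demonstrated
# actions labeled correct/incorrect and annotated with tutoring messages.
# State/widget names loosely follow Figure 2; this is not CTAT code.

behavior_graph = {
    "prob-1-fourth-1-fifth": [                       # the problem start state
        {"action": ("F21den", "20"), "correct": True,
         "hint": "Find a common denominator for 1/4 and 1/5.", "to": "state8"},
        {"action": ("F22den", "20"), "correct": True,
         "hint": "Find a common denominator for 1/4 and 1/5.", "to": "state9"},
        {"action": ("F13num", "2"),  "correct": False,
         "bug_message": "You cannot add the numerators until the denominators match.",
         "to": "prob-1-fourth-1-fifth"},
    ],
}

def trace(state, widget, value):
    """Model-tracing analogue: match a student action against the recorded edges."""
    for edge in behavior_graph.get(state, []):
        if edge["action"] == (widget, value):
            return edge
    return None               # unrecognized action: treated as incorrect by default

print(trace("prob-1-fourth-1-fifth", "F21den", "20")["correct"])   # True
```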
In order to estimate the development time to instructional time ratio, we asked the
authors on each project, after they had completed a set of Pseudo Tutors, to estimate
the time spent on design and development tasks and the expected instructional time of
the resulting Pseudo Tutors (see Table 1). Design time is the amount of time spent
selecting and researching problems, and structuring those problems on paper. Devel-
opment time is the amount of time spent with the tools, including creating a GUI, the
behavior diagrams, hints, and error messages. Instructional time is the time it would
likely take a student, on average, to work through the resulting set of Pseudo Tutors.
The final column is a ratio of the design and development time to instructional time
for each project’s Pseudo Tutors. The average Design/Development Time to Instruc-
tional Time ratio of about 23:1, though preliminary, compares favorably to the corre-
sponding estimates for Cognitive Tutors (200:1) and other types of instructional tech-
nology given above. If this ratio stands up in a more formal evaluation, we can claim
significant development savings using the Pseudo Tutor technology.
Aside from the specific data collected in this experiment, this study also demon-
strates how we are working with a variety of projects to deploy and test Pseudo Tu-
tors. In addition to the projects mentioned above, the Pseudo Tutor authoring tools
have been used in an annual summer school on Intelligent Tutoring Systems at CMU
and courses at CMU and WPI. The study also illustrates the lower skill threshold
needed to develop Pseudo-Tutors, compared to typical intelligent tutoring systems:
None of the Pseudo Tutors mentioned were developed by experienced AI program-
mers. In the Language Learning Classroom Project, for instance, the students learned
to build Pseudo Tutors quickly enough to make it worthwhile for a single homework
assignment.
Preliminary empirical evidence for the instructional effectiveness of the Pseudo-
Tutor technology comes from a small evaluation study with the LSAT Analytic Logic
Tutor, involving 30 (mostly) pre-law students. A control group of 15 students was
given 1 hour to work through a selection of sample problems in paper form. After 40
minutes, the correct answers were provided. The experimental group used the LSAT
Analytic Logic Tutor for the same period of time. Both conditions presented the stu-
dents with the same three “logic games.” After their respective practice sessions, both
groups were given a post-test comprised of an additional three logic games. The re-
sults indicate that students perform significantly better after using the LSAT Analytic
Logic Tutor (12.1 ± 2.4 v. 10.3 ± 2.3, t(28) = 2.06, p < .05). Additionally, pre-
questionnaire results indicate that the two groups did not differ significantly in relevant
areas of background that influence LSAT test results. Thus, the study
provides preliminary evidence that Pseudo Tutors are able to support student learning
in complex tasks like analytic logic games.
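As a rough check on the reported statistic, the two-sample t value can be recomputed from the published means and standard deviations, assuming 15 students per group and a pooled-variance test; the result comes out close to the reported t(28) = 2.06, with the small difference attributable to rounding of the summary statistics.

```python
# Recomputing the reported two-sample t statistic from the published summaries.
# Assumes 15 students per group and a pooled-variance t test; small deviations
# from the published t(28) = 2.06 are due to rounding of the means/SDs.
from math import sqrt

m1, s1, n1 = 12.1, 2.4, 15     # experimental group (LSAT Analytic Logic Tutor)
m2, s2, n2 = 10.3, 2.3, 15     # control group (paper problems)

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(n1 + n2 - 2, round(t, 2))   # 28 degrees of freedom, t ≈ 2.1
```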
swer-reason pair before moving on to the next answer-reason pair (i.e., if you give a
numeric answer, the next thing you need to do is provide the corresponding reason -
and vice versa) and (2) require students to complete a step only if the pre-requisites
for that step have been completed (i.e., the quantities from which the step is derived).
To implement these requirements with current Pseudo Tutor technology would re-
quire a huge behavior diagram. In practice, Pseudo Tutors often compromise on ex-
pressing such subtle constraints on the ordering of steps. Most of the Pseudo Tutors
developed so far have used a “commutative mode”, in which the student can carry out
the steps in any order. We are planning on implementing a “partial commutativity”
feature, which would allow authors to express that certain groups of steps can be done
in any order, whereas others need to be done in the order specified in the behavior
graph.
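One way to picture the planned partial-commutativity feature: steps are partitioned into groups that must be completed in the order given by the behavior graph, while steps inside a group may be done in any order. The sketch below only illustrates that idea and is not CTAT's actual design; the step names are hypothetical.

```python
# Illustrative sketch of "partial commutativity": ordered groups of steps,
# with free ordering inside each group. Not CTAT's actual implementation.

def valid_order(step_sequence, groups):
    """Check that steps respect group order while being free within a group."""
    group_of = {step: i for i, grp in enumerate(groups) for step in grp}
    done = [set() for _ in groups]
    for step in step_sequence:
        g = group_of[step]
        # All earlier groups must already be complete before starting group g.
        if any(done[i] != set(groups[i]) for i in range(g)):
            return False
        done[g].add(step)
    return True

groups = [["num1", "den1", "num2", "den2"],   # convert both fractions (any order)
          ["sum_num", "sum_den"]]             # then add them (any order)
print(valid_order(["den1", "num1", "den2", "num2", "sum_den", "sum_num"], groups))  # True
print(valid_order(["num1", "sum_num", "den1", "num2", "den2", "sum_den"], groups))  # False
```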
Despite some limitations, Pseudo Tutors do seem capable of implementing useful
interactions with students. As we build more Pseudo Tutors, we become
more aware of their strengths and limitations. One might have thought that it would
be an inconvenient limitation of Pseudo Tutors that the author must demonstrate all
reasonable alternative paths through a problem. However, in practice, this has not
been a problem. Still, such questions would best be answered by re-implementing a
Cognitive Tutor unit as a Pseudo Tutor. We plan to do so in the future.
6 Conclusions
We have described a method for authoring tutoring systems that exhibit intelligent
behavior, but can be created without AI programming. Pseudo Tutor authoring opens
the door to new developers who have limited programming skills. While the Pseudo
Tutor development time estimates in Table 1 compare favorably to past estimates for
intelligent tutor development, they must be considered with caution. Not only are these
estimates rough, there are also differences in the quality of the tutors produced: most
Pseudo Tutors to date have been ready for initial lab testing (alpha versions), whereas
past Cognitive Tutors have been ready for extensive classroom use (beta+ versions).
On the other hand, our Pseudo Tutor authoring capabilities are still improving.
In addition to the goal of Pseudo Tutor authoring contributing to faster and easier
creation of working tutoring systems, we also intend to encourage good design prac-
tices, like cognitive task analysis [5] and to facilitate fast prototyping of tutor design
ideas that can be quickly tested in iterative development. If desired, full Intelligent
Tutors can be created and it is a key goal that Pseudo Tutor creation is substantially
“on path” to doing so. In other words, CTAT has been designed so that almost all of
the work done in creating a Pseudo Tutor is on path to creating a Cognitive Tutor.
Pseudo Tutors can provide support for learning by doing and can also be flexible
to alternative solutions. CTAT’s approach to Pseudo-Tutor authoring has advantages
over other authoring systems, like RIDES [3], that only allow a single solution path.
Nevertheless, there are practical limits to this flexibility. Whether such limits have a
significant effect on student learning or engagement is an open question. In future
References
1. Corbett, A. T., Koedinger, K. R., & Hadley, W. H. (2001). Cognitive Tutors: From the
research classroom to all classrooms. In Goodman, P. S. (Ed.) Technology Enhanced
Learning: Opportunities for Change, (pp. 235-263). Mahwah, NJ: Lawrence Erlbaum.
2. Murray, T. (1999). Authoring intelligent tutoring systems: An analysis of the state of the
art. International Journal of Artificial Intelligence in Education, 10, pp. 98-129.
3. Murray, T., Blessing, S., & Ainsworth, S. (Eds.) (2003). Authoring Tools for Advanced
Technology Learning Environments: Towards cost-effective adaptive, interactive and in-
telligent educational software. Dordrecht, The Netherlands: Kluwer.
4. Lovett, M. C. (1998). Cognitive task analysis in service of intelligent tutoring system
design: A case study in statistics. In Goettl, B. P., Halff, H. M., Redfield, C. L., & Shute,
V. J. (Eds.) Intelligent Tutoring Systems, Proceedings of the Fourth Int’l Conference. (pp.
234-243). Lecture Notes in Comp. Science, 1452. Springer-Verlag.
5. Schraagen, J. M., Chipman, S. F., Shalin, V. L. (2000). Cognitive Task Analysis. Mahwah,
NJ: Lawrence Erlbaum Associates.
6. Koedinger, K. R., Aleven, V., & Heffernan, N. (2003). Toward a rapid development
environment for Cognitive Tutors. In U. Hoppe, F. Verdejo, & J. Kay (Eds.), Artificial
Intelligence in Education, Proc. of AI-ED 2003 (pp. 455-457). Amsterdam, IOS Press.
7. Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tu-
tors: Lessons learned. The Journal of the Learning Sciences, 4 (2), 167-207.
8. Anderson, J. R. (1993). Rules of the Mind. Mahwah, NJ: Lawrence Erlbaum.
9. Corbett, A.T. & Anderson, J.R. (1995). Knowledge tracing: Modeling the acquisition of
procedural knowledge. User modeling and user-adapted interaction, 4, 253-278.
10. Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ:
Prentice-Hall.
11. Reiser, B. J., Tabak, I., Sandoval, W. A., Smith, B. K., Steinmuller, F., & Leone, A. J.
(2001). BGuILE: Strategic and conceptual scaffolds for scientific inquiry in biology class-
rooms. In S. M. Carver & D. Klahr (Eds.), Cognition and instruction: Twenty-five years of
progress (pp. 263-305). Mahwah, NJ: Erlbaum.
12. Rittle-Johnson, B. & Koedinger, K. R. (submitted). Context, concepts, and procedures:
Contrasting the effects of different types of knowledge on mathematics problem solving.
Submitted for peer review.
13. Lajoie, S. P., Azevedo, R., & Fleiszer, D. M. (1998). Cognitive tools for assessment and
learning in a high information flow environment. Journal of Educational Computing Re-
search, 18, 205-235.
14. Shute, V.J. & Glaser, R. (1990). A large-scale evaluation of an intelligent discovery
world. Interactive Learning Environments, 1: p. 51-76.
15. Eberts, R. E. (1997). Computer-based instruction. In Helander, M. G., Landauer, T. K.,
& Prabhu, P. V. (Eds.) Handbook of Human-Computer Interaction, (pp. 825-847). Am-
sterdam, The Netherlands: Elsevier Science B. V.
16. Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent
tutoring goes to school in the big city. International Journal of Artificial Intelligence in
Education, 8, 30-43.
17. Aleven, V.A.W.M.M., & Koedinger, K. R. (2002). An effective metacognitive strategy:
Learning by doing and explaining with a computer-based Cognitive Tutor. Cognitive Sci-
ence, 26(2).
Acquisition of the Domain Structure from Document
Indexes Using Heuristic Reasoning
1 Introduction
The rapid advance of Educational Technology in recent years has made it
possible for education to evolve at different levels: from personal interaction with a
teacher in a classroom to computer assisted learning, from written textbooks to elec-
tronic documents. Different kinds of approaches (Intelligent Tutoring Systems, e-
learning systems, collaborative learning systems...) profit from new technologies in
order to educate different kinds of students. These Technology Supported Learning
Systems (TSLSs) have proved to be very useful in many learning situations such as
distance learning and training. TSLSs require the representation of the domain to be
learnt. However, the development of the domain module is not easy because of the
amount of data that must be represented. Murray [10] pointed out the need for tools
that facilitate the construction of the domain module in a semi-automatic way.
Electronic documents constitute a source of information that can be used in TSLSs
for this purpose. However, electronic documents require a transformation process
before they can be incorporated into a TSLS, due to their varied features. Vereoustre and
McLean [14] present a survey of current approaches in the area of technologies for
electronic documents that are used for finding, reusing and adapting documents for
learning purposes. They describe how research in structured documents, document
representation and retrieval, semantic representation of document content and rela-
tionships, learning objects and ontologies, could be used to provide solutions to the
problem of reusing education material for teaching and learning.
In fact, in the past 5-7 years there have been considerable efforts in the computer-
mediated learning field towards standardization of metadata elements to facilitate a
common method for identifying, searching and retrieving learning objects [11].
Learning objects are reusable pieces of educational material intended to be strung
together to form larger educational units such as activities, lessons or whole courses
[4]. A Learning Object (LO) has been defined as any entity, digital or non-digital,
which can be used, re-used or referenced during technology supported learning [7]. In
2002, LOM (Learning Object Metadata), the first standard for Learning Technology
was accredited. Learning Object Metadata (LOM) is defined as the attributes required
to fully/adequately describe a Learning Object [7]. The standard focuses on the
minimal set of attributes needed to allow these LOs to be managed, located, and
evaluated, but lacks the instructional design information for the decision-making proc-
ess [15]. Recently, a number of efforts have been initiated with the aim of adding
didactical information in the LO description [5][15][13][12]. So far, there has not
been any significant work in automating the discovery and packaging of LOs based
on variables such as learning objectives and learning outcomes [9]. As a conclusion,
it is clear that some pedagogical knowledge has to guide the sequence of the LOs
presented to the student for both open learning environments and more classical ITSs.
The final aim of the project presented here is to extract the domain knowledge of a
TSLS from existing documents in order to reduce its development cost. It uses Artifi-
cial Intelligence methods and techniques like Natural Language Processing (NLP)
and heuristic reasoning to achieve this goal. However, the acquisition of this knowl-
edge still requires the collaboration of instructional designers in order to get an ap-
propriate representation of the Domain Module. The system presented here is aimed
at identifying the topics included in documents, establishing the pedagogical relation-
ships among them, and cutting the whole document into LOs, categorizing them ac-
cording to their pedagogical purpose and, thus, tagging them with the corresponding
metadata. Three basic features are essential in representing the domain module in
TSLSs: 1) Learning units that represent the teaching/learning topics, 2) relationships
among contents, and 3) learning objects or didactic resources.
Once the electronic document that is going to be the source of the domain knowl-
edge has been selected, the process of preparing the learning material to be included
in the domain module of a TSLS involves the following three steps [6]:
Identifying the relevant topics included in the document.
Establishing the pedagogical and sequential relationships between the contents.
Identifying the Learning Objects.
This paper focuses on the identification of the relevant topics of the document and
the discovery of pedagogical and sequential relationships between them. Concretely,
the structure of the domain is extracted just by analysing the index of a document. In
a similar direction, Mittal et al. [8] present a system where the input is the set of slides
for a course in PowerPoint and the output are a concept tree and a relational tree.
Their approach is based on rules for the identification of relationships (class of, ap-
plied to, prerequisite) between concepts and the identification of objects (definition,
example, figure, equation...) in a concept. However, rules are specific to computer
science and mathematics-like courses. The solution presented here is domain inde-
pendent and has been validated with a wide set of subject matters.
This paper starts with a description of the analysis of the indexes of the documents.
Next, the heuristics to identify the pedagogical relationships among topics are pre-
sented and also the results of their application in a wide set of subject matters. Finally,
some conclusions and future work are pointed out.
2 Index Analysis
Indexes are useful sources of information for acquiring the domain module in a semi-
automatic way because they are usually well-structured and contain the main topics of
the domain. Besides, they are quite short so a lot of useful information can be ex-
tracted in a low cost process. The documents’ authors have previously analysed the
domain and decided how to organise the content according to pedagogical principles.
They use the indexes as the basis for structuring the subject matter. Therefore, the
main implicit pedagogical relations can be inferred from the index by using NLP
techniques and a collection of heuristics.
Fig. 1 illustrates the index analysis process that is described next:
The indexes are usually human-made text files and, therefore, they may contain differ-
ent numbering formats and some inconsistencies such as typographic errors, format
errors, etcetera. In order to run an automatic analysis process, the indexes must be
error-free, so they have to be corrected and homogenized before the analysis.
In the pre-processing step, performed automatically, the numbering of the index items
is filtered and replaced by tabulations so that all indexes share the same structure.
However, the correction of inconsistencies can hardly be performed automatically.
Hence, this task is performed manually by the users. The result of this step is a
text file in which each section title is on one line (index item) and the level of nest-
ing of the title is defined by the number of tabulations.
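A minimal sketch of this pre-processing step is given below; the numbering pattern it strips (e.g. "2.1.3 Title") is an assumption about the input format, since real indexes vary and are partly corrected by hand.

```python
# Minimal sketch of the pre-processing step: replace section numbering by
# tabulations so every index shares the same structure. The numbering pattern
# handled here (e.g. "2.1.3 Title") is an assumption about the input format.
import re

def preprocess(index_lines):
    out = []
    for line in index_lines:
        m = re.match(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*)$", line)
        if not m:
            out.append(line.strip())          # unnumbered item: left for manual correction
            continue
        depth = m.group(1).count(".")         # "2.1.3" -> nesting level 2
        out.append("\t" * depth + m.group(2).strip())
    return out

index = ["1 Agenteak", "1.1 Agente mugikorrak", "1.2 Agente adimendunak", "2 Sare protokoloak"]
print(preprocess(index))
# ['Agenteak', '\tAgente mugikorrak', '\tAgente adimendunak', 'Sare protokoloak']
```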
In this process the index is analysed using NLP tools. Due to the differences between
languages, specific tools are needed. The work here presented has been performed
with documents written in the Basque language. Basque is an agglutinative language, i.e.,
to form a word the dictionary entry takes each of the elements needed for
the different functions (syntactic case included). More specifically, the affixes corre-
sponding to the determiner, number and declension case are taken in this order inde-
pendently of each other. As prepositional functions are realised by case suffixes in-
side word-forms, Basque has a relatively high capacity to generate inflected word-
forms. This characteristic is particularly important because the words in Basque con-
tain much more part-of-speech information than words in other languages. These
characteristics make morphosyntactic analysis very important for Basque. Thus, for
the index analysis, the lemmas of the words must be extracted so as to gather the
correct information. This process is carried out using EUSLEM [2], a lemma-
tizer/tagger for Basque. Noun phrases, verb phrases and multiword terms are
detected by ZATIAK [1]. The result of this step is the list of lemmas and the chunks
of the index items. These lemmas and chunks constitute the basis of the domain on-
tology that will be completed in the analysis of the whole document.
The morphosyntactic analysis is performed by EUSLEM, which annotates each
word with the lemma and morphosyntactic information. Later, entities and postpositions
are extracted. ZATIAK extracts the noun and verb phrases.
Despite the small size of the indexes, there is useful information for the TSLSs that
can be extracted from them. Concretely, it is possible to identify the main topics of
the domain (BLUs) and pedagogical relationships among them. In this step the system
makes use of a set of heuristics in order to establish structural relationships between
topics (Is-a and Part-of) and sequential relationships (Prerequisite and Next). The
next section goes deeper into this analysis.
3 Heuristic Analysis
As mentioned above, the indexes contain both the main topics of the domain as well
as the implicit pedagogical relations among them. In this task the structure of the
domain is gathered using a set of heuristics from the homogenized index. This analy-
sis is domain independent.
The process starts assuming an initial structure that later on is refined. In this ap-
proach, each index item is considered as a domain topic (BLU). Regarding the rela-
tionships, two kinds of pedagogical relationships are detected: structural and sequen-
tial. Structural relations are inferred between an item and its sub-items (nested items).
A sub-item of a general topic is used to explain a part of that issue or a particular case
of it. Sequential relations are inferred among concepts of the same nesting level. The
order of the items establishes the sequence of the contents in the learning process. The
obtained initial domain structure is refined using a set of heuristics.
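The initial structure described above can be sketched as follows: every tab-indented item becomes a BLU, each item is linked to its sub-items by a Part-of relation, and consecutive items at the same nesting level are chained by Next relations, to be refined later by the heuristics. The function below is an illustration of that default reasoning, not the authors' implementation.

```python
# Sketch of the initial, pre-heuristic structure: Part-of between an item and
# its sub-items, Next between consecutive items at the same nesting level.

def initial_structure(tabbed_items):
    relations = []                 # (relation, source_item, target_item)
    stack = []                     # current ancestor item per nesting level
    last_at_level = {}             # last item seen at each level, for Next links
    for item in tabbed_items:
        level = len(item) - len(item.lstrip("\t"))
        title = item.strip()
        stack = stack[:level]
        if stack:
            relations.append(("Part-of", title, stack[-1]))
        if level in last_at_level:
            relations.append(("Next", last_at_level[level], title))
        # Forget deeper levels so siblings are only linked under the same parent.
        last_at_level = {l: t for l, t in last_at_level.items() if l <= level}
        last_at_level[level] = title
        stack.append(title)
    return relations

items = ["Agenteak", "\tAgente mugikorrak", "\tAgente adimendunak", "Sare protokoloak"]
for rel in initial_structure(items):
    print(rel)
```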
The following procedure was carried out to define the heuristics that are described
in this section:
1. A small set of indexes related to Computer Science has been analysed in order to
detect some patterns that may help in the classification of relationships.
2. These heuristics have been tested on a wide set of indexes related to different do-
mains. As a result of this experiment, the relationships implicit in the indexes have
been inferred.
3. The results of the heuristics have been contrasted with the real relationships (iden-
tified manually).
4. After analysing the results, paying special attention to the detected shortcomings of the
heuristics, some new heuristics have been defined.
5. The performance of the improved set of heuristics has also been measured.
Next the sets of heuristics are described and the results of the experiments pre-
sented and discussed.
Even though the work has been conducted with the Basque language, the examples will
be presented in both Basque and English for better understanding (in some examples,
some information may be lost in the English translations).
Heuristics for Structural relationships. The first analysis of the document indexes
(step 1 in the above procedure) has proved that the most common structural relation is
the Part-of relation. Therefore, by default, the structural relations will be classified
as Part-of. In addition, some heuristics have been identified to detect the Is-a rela-
tion or to reinforce the hypothesis that the structural relation describes a Part-of
relation. These heuristics are applied to determine the structural relationship between an
index item and the sub-items included in it. However, the empirical analysis showed
that index items do not always share the same linguistic structure. Therefore, different
heuristics may apply in the same set of index sub-items. The system combines the
information provided by the heuristics that can be applied in order to refine the peda-
gogical relationships. If the percentage of sub-items that satisfy the heuristics’ pre-
conditions goes beyond a predefined threshold, the relations are classified as the
corresponding relationship. In addition, this percentage is taken as the level of certainty.
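This combination rule can be pictured as a simple voting scheme: if the fraction of sub-items satisfying a heuristic's precondition passes a threshold, the relation is reclassified and that fraction is kept as the certainty. The sketch below only illustrates the scheme; its crude precondition is a stand-in for the NLP-based tests (MWH, ENH, AH) described next.

```python
# Illustration of the combination scheme: a relation is reclassified when the
# fraction of sub-items satisfying a heuristic's precondition passes a threshold,
# and that fraction is kept as the level of certainty. The precondition used here
# (sub-item contains the super-item's noun) is only a stand-in for the real
# NLP-based tests (MWH, ENH, AH, ...).

def classify_structural(super_item, sub_items, threshold=0.6):
    def mwh_like(sub):                      # crude stand-in for the MWH precondition
        return super_item.lower().rstrip("s") in sub.lower()

    hits = sum(1 for s in sub_items if mwh_like(s))
    certainty = hits / len(sub_items)
    if certainty >= threshold:
        return ("Is-a", certainty)
    return ("Part-of", 1.0)                 # default classification

print(classify_structural("Agents", ["Mobile agents", "Intelligent agents", "History"]))
# ('Is-a', 0.666...) -> two of three sub-items look like kinds of "agent"
```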
MultiWord Heuristic (MWH): MultiWord Terms may contain information to
infer the is-a relation. This relation can be inferred in sub-items with the following
patterns: noun + adjective, noun + noun phrase, etcetera. If the noun that appears
in these patterns (agente or agent) is the same as that of the general item (agenteak or
agents), the Is-a relationship is more plausible (Table 1).
Entity Name Heuristic (ENH): Entity names are used to identify examples of a
particular entity. When the sub-items contain entity names, the relation between
the item and the sub-items can be considered as the is-a relation. In Table 2, Palm
Os, Windows CE (Pocket PC) and Linux Familiar 0.5.2 distribuzioa correspond to
entity names.
Acronyms Heuristic (AH): When the sub-items contain just acronyms, the struc-
tural relations may be the is-a relation. In Table 3, the XUL and jXUL acronyms
represent the names of some examples of languages for designing graphical inter-
faces.
The above-described heuristics have been tested with 150 indexes related to a wide
variety of domains such as Economy, Philosophy, Pedagogy, and so on. These in-
dexes have been analysed manually to know the real relationships. As a result of this
process, 7231 relationships have been identified in these indexes (3935 structural
relationships and 3296 sequential relationships). As Table 7 illustrates, the most fre-
quent relationships are Part-of (93.29% of structural relationships) and Next (85.83%
of sequential relationships). Therefore, it has been confirmed that the
initial hypothesis which establishes the Part-of as the default structural relationship
and the Next as the default sequential relationship is sound.
The same set of indexes has been analysed using the heuristics set 1. Table 8 de-
scribes the heuristics’ precision (i.e. success rate when they are triggered) obtained in
this empirical study. The first three columns show the performance of the heuristics
that refine the Is-a relationship, whereas the fourth column describes the results of the
heuristic that confirms the Part-of relationship, and the last two refer to the Prerequisite
refinement. The first row measures how many times a heuristic has triggered correctly
and the second one counts the wrong activations. The third row presents the percent-
age of correct activations. As mentioned above, the ENH heuristic triggers when the
sub-item contains an entity name, which usually represents a particular case or an
example. The AH is triggered when the sub-item entails just an acronym, which also
refers to an example. As it can be observed in Table 8, the precision of these heuris-
tics is 100%. Sub-items that form multiword terms based on the super-item activate
MWH. Multiword terms are very usual in Basque language, and they usually repre-
sent a particular case of the original term. This heuristic has a tested precision of
92.59%. The heuristics that classify the Prerequisite relationship, i.e. RH and PGH2, also
have a high precision (93.333% for RH and 96.15% for PGH2).
Table 9 shows a comparison between the real relationships and those identified
using the heuristic set 1. It illustrates the recall of each heuristic (first row), i.e. the
number of relationships correctly identified compared with the numbers obtained in
the manual analysis, as well as the recall of the heuristics all together (second row). In
order to better illustrate the performance of the method, the data for Part-of and Next
relationships, corresponding to default reasoning, are also included in the table. Part-
of relationships are classified by default (94.5%) and reinforced by the PGH1 heuris-
tic, which is fired in 0.735% of Part-of relationships. Although the outcome of this
heuristic is not good, it may help the system to determine Part-of relationships when
Is-a is also plausible. The combination of the PGH1 and by-default classification of
Part-of results in 95.27% correctly classified Part-of relationships. The by-default
classification of Next relationships also provides a high success rate (97.85%). In
Even though 80% of prerequisites and almost 29% of the Is-a relations are detected,
the results have not been as satisfying as expected. The indexes have been manually
analysed again in order to identify the reasons for the low recall of the Is-a refining
heuristics. The study showed that, on the one hand, the Is-a relationship is used to
express examples and particular cases of a topic and it is difficult to infer whether a
BLU entails an example or just a part of the upper BLU without Domain Knowledge.
On the other hand, indexes related to Computer Sciences (the initial set of indexes)
are quite schematic, whereas other domains use implicit or contextual knowledge as
well as synonyms, metaphors and so on. Considering that the aim of this work is to
infer the relations in a domain independent way, specific knowledge cannot be used
in order to improve the results. This second study was carried out in order to detect
other domain independent methods that may improve the results. Firstly, a set of
keywords that usually introduce example sub-items has been identified. These key-
words facilitate Is-a relationship recognition. Table 10 shows an example of the
Keywords Based Heuristic (KBH).
In addition, the problem of the implicit or contextual knowledge has to be overcome.
Therefore, new heuristics have been defined in order to infer Is-a and Prerequisite rela-
tionships. The heuristics in set 1 are triggered when the whole super-item is included in the
sub-item, e.g. the MWH triggers when it finds a super-item such as agenteak (agents)
and a set of sub-items like agente mugikorrak (mobile agents). However, in many
cases contextual knowledge is used to refer to the topic presented in the super-item or in
a previous item. The heuristic set 2 is an improved version of the initial set that takes
contextual knowledge into account. Two main ways of using contextual knowledge have
been identified in the analysis. The first one entails the use of the head of the phrase of a
particular item to refer to that item, e.g. authors may use the term karga (charge)
to refer to the karga elektrikoa (electric charge) topic. The second one entails the use of
acronyms to refer to the original item. In some index sub-items, the acronym corre-
sponding to an item is added at the end of the item between brackets and later that acro-
nym is used to reference the whole topic represented by the index item.
Regarding the structural relationships the heuristics to detect Is-a relationships
have been improved. Head of the phrase + MultiWord Heuristic (He+MWH) is
fired when the head of the phrase of an item is used to form multiword terms in the
sub-items. Acronyms + MultiWord Heuristic (A+MWH) is triggered when the
acronyms corresponding to an item is used by the sub-items to form multiword terms.
Common Head + MultiWord Heuristic (CHe+MWH) is activated when a set of
sub-items share a common head of phrase and form multiword terms based on it; this
heuristic does not look at the super-item.
Concerning the sequential relationships, the initial set of heuristics uses the refer-
ences to the whole item and possessive genitives with the whole item to detect pre-
requisites. In order to deal with the contextual knowledge problem the new heuristics
work as follows. Head of the phrase + Reference Heuristic (He+RH) is activated
by references to the head of a previous index item while Acronym + Reference Heu-
ristic (A+RH) is triggered when the acronym corresponding to a previous index item
is referenced. The possessive genitive is also used by the new heuristics to detect
prerequisites. Head of the phrase + Possessive Genitive Heuristic (He+PGH2) is
activated by items entailing possessive genitives based on the head of a previous
index item whereas possessive genitives using the acronym of a previous index item
trigger Acronym + Possessive Genitive Heuristic (A+PGH2).
gered in some cases because of the use of synonyms and other related terms. Adapt-
ing the heuristics to deal with synonyms may improve the performance even more.
Fig. 2. Comparison of the performance of the initial and the enhanced heuristic set
The aim of this work is to facilitate the building process of Technology Supported
Learning Systems (TSLS) by acquiring the Domain Module from textbooks and other
existing documents. The semi-automatic acquisition of the domain knowledge will
significantly reduce the instructional designers’ workload when building the TSLSs.
This paper has presented a domain independent system for generating the domain
module structure from the analysis of indexes of textbooks. The domain module
structure includes the topics of the domain and the pedagogical relationships among
them. The system performs the analysis using NLP tools and Heuristic Reasoning.
Some heuristics have been implemented in order to identify pedagogical relations
between topics. These heuristics provide additional information about the type of the
pedagogical relations. The performance of the heuristics has been measured and after
analysing the results an improved set of heuristics has been designed and tested.
Next phases of this work will include the analysis of the whole documents in order
to extract the Didactic Resources to be used in the TSLS and also to create the ontol-
ogy of the domain. In addition, the system will profit from linguistic ontologies with
the aim of enriching both the domain ontology and the domain module structure (second-level
topics, related topics from other domains, new pedagogical relations, etc.).
References
1. Aduriz I., Aranzabe M. J., Arriola J.M., Ezeiza N., Gojenola K., Oronoz M., Soroa A.,
Urizar R. (2003). Methodology and steps towards the construction of a Corpus of written
Basque tagged in morphological, syntactic, and semantic levels for the automatic proc-
essing (IXA Corpus of Basque, ICB). In Proceedings of Corpus Linguistics 2003. Lan-
caster. United Kingdom, 10-11.
2. Aduriz I., Aldezabal I., Alegria I., Artola X., Ezeiza N., Urizar R. (1996). EUSLEM: A
Lemmatiser / Tagger for Basque. In Proceedings of the EURALEX’96, Part 1. Gothenburg
(Sweden), 17-26.
3. Arruarte, A., Elorriaga, J. A., Rueda, U. (2001). A template Based Concept Mapping tool for
Computer-Aided Learning. Okamoto, T., Hartley, R., Kinshuk, Klus, J. P. (Eds), IEEE Inter-
national Conference on Advance Learning Technologies 2001, IEEE Computer society, 309-
312.
4. Brooks, C., Cooke, J., Vassileva, J. (2003). “Evaluating the Learning Object Metadata for K-
12 Educational Resources”. In Proceedings of ICALT2003, Devedzic, V., Spector, J.M.,
Sampson, D.G., Kinshuk (Eds.), pp. 296-297.
5. CANDLE. www.candle.eu.org
6. Larrañaga, M. (2002). Enhancing ITS building process with semi-automatic domain acquisi-
tion using ontologies and NLP techniques. In Proceedings of the Young Researches Track of
the Intelligent Tutoring Systems (ITS 2002). Biarritz (France).
7. LTSC. (2001). IEEE P1484.12 Learning Object Metadata Working Group homepage [On-
line]. http://ltsc.ieee.org/wg12/
8. Mittal, A., Dixit, S., Maheshwari, L.K. (2003). “Enhanced Understanding and Retrieval of E-
learning Documents through Relational and Conceptual Graphs”. In Supplementary Pro-
ceedings of AIED2003, Aleven, V., Hoppe, U., Kay, J., Mizoguchi, R., Pain, H., Verdejo, F.
and Yacef, K. (Eds.), pp. 645-652.
9. Mohan, P. and Brooks, C. (2003). “Learning Objects on the Semantic Web”. In Proceedings
of ICALT2003, Devedzic, V., Spector, J.M., Sampson, D.G., Kinshuk (Eds.), pp. 195-199.
10. Murray, T. (1999). Authoring Intelligent Tutoring Systems: an Analysis of the State of the
Art. International Journal of Artificial Intelligence in Education, 10, 98-129.
11. Polsani, PR. (2003). “Use and Abuse of Reusable Learning Objects”. Journal of Digital
Information, Vol. 3, Issue 4.
12. Redeker, G.H.J. (2003). “An Educational Taxonomy for Learning Objects”. In Proceed-
ings of ICALT2003, Devedzic, V., Spector, J.M., Sampson, D.G., Kinshuk (Eds.), pp. 250-
251.
13. Sampson, D. and Karagiannidis, C. (2002). “From Content Objects to Learning Objects:
Adding Instructional Information to Educational Meta-Data”. In Proceedings of 2nd IEEE
Computer Society International Conference on Advanced Learning Technologies (ICALT
02), pp. 513-517.
14. Vereoustre, A. and McLean, A. (2003). “Reusing Educational Material for Teaching and
Learning: Current Approaches and Directions”. In Supplementary Proceedings of
AIED2003, Aleven, V., Hoppe, U., Kay, J., Mizoguchi, R., Pain, H., Verdejo, F. and
Yacef, K. (Eds.), pp. 621-630.
15. Wiley, D.A. (2002). “Connecting Learning Objects to Instructional Design Theory: A
Definition, a Metaphor, and a Taxonomy”. Wiley, D.A. (Eds.), The Instructional Use of
Learning Objects, pp. 3-23
Role-Based Specification of the Behaviour of an Agent for
the Interactive Resolution of Mathematical Problems
1 Introduction
There are nowadays many computer systems that can be used when learning
Mathematics, such as Computer Algebra Systems (CASs) [3], [11], [12], which can serve
as cognitive tools in a learning environment but lack the interactivity necessary
for a more direct participation of teachers and students in the learning process. Some
learning environments, like [2] [6] [1], present a variety of learning materials that
include motivations, concepts, elaborations, etc., and offer a higher level of
interactivity. Additionally, demonstrating alternative solution paths to problems, e.g.
the behavior recorder mechanism used in [6], provides tutors with specification
methods for some kind of interactivity. MathEdu, [5], provides a rich interaction
capacity built on a CAS like Mathematica. Still, there is a need for more intelligent
systems with a greater capacity for interaction with the student.
The final interactivity of many teaching applications consists of dialogs between
learners and applications in which the learner has to answer questions posed by the application.
In this context, the design of user interfaces is a complex process that
usually requires writing code. However, teachers are not usually prepared for
this. Authoring tools are therefore very appropriate in order to simplify this process
for tutors. Besides, it would be desirable to have WYSIWYG authoring and execution
tools where students and teachers use similar environments.
Authoring tools for building learning applications allow tutors to get involved in
the generation of the software to be delivered to students. For instance, it is usual to
find a scenario where teachers are able to add their own controls (buttons, list boxes,
etc) that will form the final application, and even to specify the behavior of such
controls when students interact with them. Nonetheless, those authoring tools do not
usually give support to specify the feedback to be given to students depending on
their actions.
As a consequence, authoring tool models that allow the design of
tutoring applications which interact more fully with the students, holding a
dialog with them, are desirable. A dialog between a student and a tutoring application
involves the existence of moments where the student has to make choices or give
information to the system. It can be modeled by means of a tree structure that
represents the different paths the dialog can follow, where the nodes represent abstract
decision points, which can be used to control the dialogs that take place when solving
different problems. This structure is simple enough as to allow teachers to create it
interactively, without the need to use any programming language, and it is still
powerful enough to represent interesting dialogs from the didactic point of view.
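Such a dialog tree can be pictured as nested decision points, each holding what the system says and the alternative continuations keyed by the student's possible answers. The sketch below is only an illustration of that idea, not ConsMath's internal representation; the prompts and the fallback branch are hypothetical.

```python
# Illustrative sketch of a dialog tree: each node carries what the system says
# and the alternative continuations keyed by the student's possible answers.
# This is an idealized picture, not ConsMath's internal representation.

dialog = {
    "prompt": "Write the general normalized second-degree equation.",
    "branches": {
        "y = a*(x - x0)**2 + y0": {            # expected (correct) answer
            "prompt": "Which coefficient will you compute first?",
            "branches": {"a": None, "x0": None, "y0": None},
        },
        "*": {                                  # any other answer: remediation branch
            "prompt": "Recall that the equation must be written in vertex form.",
            "branches": {},
        },
    },
}

def step(node, answer):
    """Follow the branch matching the student's answer (or the fallback branch)."""
    return node["branches"].get(answer, node["branches"].get("*"))

print(step(dialog, "y = a*x + b")["prompt"])    # remediation feedback
```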
In this paper we present a role-based mechanism of specification of the model for
the interaction with the student that is part of the ConsMath computer system, [7], [8],
which allows the construction of interactive applications with which students can
learn how to solve problems of Mathematics that involve symbolic computations.
ConsMath includes both a WYSIWYG authoring tool and an execution tool
written in Java. Teachers design interactive tutoring applications for the resolution of
problems using the authoring tool in an intuitive and simple way, since the
development environment looks exactly like the students’ working environment,
except for the availability of some additional functionality, and at each instant the
designer has in front of him the same contents and dialog components the student will
have at a specific point during the resolution of the problem with the execution tool.
The design process is possible in this simple setting thanks to the use of techniques of
programming by demonstration, [4], an active research area within the field of
Human-Computer Interaction.
ConsMath supports a methodology by which an interactive application for the
resolution of sets of problems can be built in a simple way starting from a static
document that shows a resolution of a particular problem, and adding to it different
layers of interactivity. The initial document can be created by the designer using an
editor of scientific texts or it can come from a different source, like Internet.
ConsMath includes a tracking agent that deals with the application being executed
by students and matches their operations with the model created by the teacher. Thus,
the agent owns all the information necessary for determining the exact state of the
interaction. ConsMath has been built using a collaborative framework, [9], so it can
also be used in a collaborative setting. For example, students can learn
collaboratively, and the tutor can interactively monitor their progress, on the basis of
the dialog model previously created.
The rest of the paper is structured as follows: in the next section we shall describe
ConsMath from the perspective of a user. After that, we shall describe the
mechanisms related to the tracking agent, together with the recursive uses of the
model in case the resolution of a problem is reduced to the resolution of one or more
simpler subproblems. Finally, we will explain the main conclusions of our work.
Figs. 1, 2 and 3 show three steps during the design of a problem. The designer
starts from a document, like in Fig. 1, which shows an editor of mathematical
documents that contains a resolution of the problem in the way it would be described
in a standard textbook. The document can be imported or it can be built using the
ConsMath editor. In this case, the problem asks the student to normalize the equation
of a parabola, putting it under the form (1),
in terms of its degree of aperture and the coordinates of its vertex (x0 , y0). After
this, in a first step, the designer generalizes the problem statement and its resolution
by introducing generic variables in the formulae that appear in the statement, and
defining constraints among the formulae that appear in the resolution of the problem,
in a spreadsheet style. For example, the designer can give names A, B and C to the
coefficients in the equation of the parabola, and he can also specify the different
formulae that appear in the resolution of the problem in terms of these coefficients.
These steps give rise to an interactive document that allows a student to change the
equation to be normalized, and the document is updated automatically.
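The spreadsheet-style generalization can be pictured as named coefficients plus derived formulae that are re-evaluated whenever a coefficient changes. The sketch below illustrates this with the vertex and aperture of y = A x^2 + B x + C; these particular formulae are a plausible choice for the parabola example, not necessarily the exact constraints the designer would define.

```python
# Sketch of the spreadsheet-style generalization: named coefficients and
# derived formulae that are re-evaluated whenever a coefficient changes.
# The specific formulae (vertex of y = A x^2 + B x + C) are an illustrative
# choice, not necessarily the exact constraints of the authors' scenario.

coefficients = {"A": 1.0, "B": -4.0, "C": 3.0}

constraints = {
    "x0": lambda c: -c["B"] / (2 * c["A"]),
    "y0": lambda c: c["C"] - c["B"] ** 2 / (4 * c["A"]),
    "aperture": lambda c: c["A"],
}

def update(name, value):
    """Change one coefficient and recompute every derived formula."""
    coefficients[name] = value
    return {k: f(coefficients) for k, f in constraints.items()}

print(update("B", -6.0))     # {'x0': 3.0, 'y0': -6.0, 'aperture': 1.0}
```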
Once a generalized interactive document has been specified, the designer describes
the dialog between the system and the student. During this dialog, the student makes
choices and gives information to the system, and the system gives the student
feedback and asks for new decisions to be taken for the resolution of the problem. The
teacher specifies the dialog by means of the ConsMath editor of mathematical
documents, using some additional functionality that will be described next. At some
points the designer switches the role he is playing between system and student. The
change of role is done behind the scenes by ConsMath when needed, as we shall
explain in the next section. During the steps described previously the designer has
played the role of the system. Before the instant considered in Fig. 2, he also plays the
role of the system, hiding first the resolution of the problem and typing a question to
be posed to the student, where he is asked for the general normalized second degree
equation. After this, he enters a field where the student is supposed to type his answer.
At the instant considered in Fig. 2 the designer is playing the role of the student when
answering the question. He does it by typing the correct formula. After that the
designer starts playing again the role of the system. First he gives a name to the
formula introduced by the student, then he erases the part of the editing panel where
the last question and the answer appear, and finally he poses a new question to the
student asking which of the coefficients in the general normalized second degree
equation will be calculated first. This is shown in Fig. 3.
In order to create the interactive dialogs, the designer can write text using the
WYSIWYG editor, and can insert ConsMath components, like text fields, equation
fields, simple equations, buttons, etc. Also, other Java components can be used in the
dialogs, importing these components to the ConsMath palette. Each component has a
name and a set of properties. The designer can specify the value of a property using a
mathematical expression that can contain mathematical and ConsMath functions.
These functions allow us to define variables, to evaluate expressions and to create
constraints between variables or components. It is important to notice that when the
designer erases parts of the document, although they disappear from the editor pane,
they are not deleted, since formulae can still reference variables defined in them.
At any time the designer can return to any of the previous states where he is
playing the role of the student, and start working again as before. This can be done by
means of buttons included in the user interface that allow him to go back and forth.
When he is back at one of these points, the designer can continue working as before,
and ConsMath automatically interprets this as the specification of a different path to be
followed in case the student's answer doesn't fit the one specified before. In this
way, a tree of alternatives that depends on the student's actions is specified. The rest of
the design process follows a similar pattern.
Once the design is finished, it can be saved. After this, the execution process can
start at any moment. There are two ways in which a student can start solving a
problem: either the statement is completely specified or the system is supposed to
generate a random version of a given class of problem to be solved. The first situation
can arise either because the tutor or the tutoring system that controls the knowledge of
the student decides the specific problem the student has to solve, or because the
student is asked to introduce the specific formulae that appear in the statement. There
is a third possibility that takes place when a problem is solved as part of the resolution
of another one. During the resolution of a problem, the parts of the movie created by
the designer where he has played the role of the system are played automatically,
while the ones where the designer plays the role of the student are used as patterns to
be matched against the student's actions during the interactive resolution of the problem. Each
decision of the student directs the performance of the movie towards one of the
alternative continuations. Hence, for example, if the general normalized equation
typed by the student in the first step is incorrect, the system will show him a
description of the type of equation that is needed.
The above paragraphs are a description of the way ConsMath interacts with
designers and students. In order to achieve this interactivity, the design and resolution
tools must be able to represent interaction activities in a persistent way, with the
possibility of executing and undoing them at any moment from scratch. Moreover, the
following operations are possible: activities can bifurcate, in the sense that at any
point of any activity an alternative can be available, and the actions accomplished by
the users determine which of the existing alternatives is executed at each moment.
Besides these functional requirements, an editor of mathematical documents is required
that stores in the documents the information needed to discriminate the different situations that can be
produced by the student’s actions. More specifically, each descendant node represents
one of these alternatives and is linked to one performance zone, forming a decision
rule. A decision rule is fired when it is enabled and its head, which is the descendant
node of the corresponding decision zone, matches an action from the student.
Students’ actions give rise to events produced by the user interface. These events
form the descendant nodes in decision zones.
The designer specifies these events interactively, by emulating the students' actions.
When these events are produced within a performance zone, the tracking agent
automatically starts a decision tree. The specification of these events is accomplished
as follows (a sketch of the three event types appears after the list):
Button and multiple-choice events are generated directly by clicking on these
components after they are created.
Conditional events are produced by the evaluation of a condition that corresponds
to a formula, like a comparison between two mathematical expressions. When the
designer creates a condition, he types its corresponding formula in a field,
including dependencies with respect to previous formulae that appear in the
document. The designer simulates the student action that generates the
event by entering a value in one of the input controls on which the condition
depends. The tracking agent enters this elaborated event in the decision tree.
Matching events are produced by the evaluation of a pattern matching condition
between a formula typed by the student and a pattern specified by the designer, like
a pattern of trigonometric polynomials. As in the previous case, the designer
has to create a pattern by entering the expression that specifies it, and he must
simulate the student action that generates the corresponding event by
entering a value in the input control associated with this pattern.
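As a purely illustrative sketch, the three kinds of events could be represented as shown below; the class and field names are assumptions made for this sketch, not ConsMath identifiers.

```python
# Illustrative sketch of the three event types used in decision zones.
# Names and structure are hypothetical; ConsMath's internal representation may differ.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ButtonEvent:              # generated directly by clicking the component
    component_name: str

@dataclass
class ConditionalEvent:         # fired when a designer-specified condition holds
    condition: Callable[[dict], bool]      # e.g. a comparison between two expressions
    def occurs(self, document_values: dict) -> bool:
        return self.condition(document_values)

@dataclass
class MatchingEvent:            # fired when the student's formula matches a pattern
    pattern: Callable[[str], bool]         # e.g. "is a trigonometric polynomial"
    def occurs(self, student_formula: str) -> bool:
        return self.pattern(student_formula)

# Example: a designer condition "the typed coefficient equals 1"
answer_ok = ConditionalEvent(condition=lambda vals: vals.get("A_answer") == 1)
print(answer_ok.occurs({"A_answer": 1}))   # True -> the corresponding rule may fire
```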
Performance zones are sequences of execution steps, previously designed by the tutor
or designer, which dynamically manipulate the document and can pose a new question
to the student. The steps that form performance zones can be of the following types:
insert text, create random generator, insert or change formula, insert input component,
etc. The creation and modification of formulae also involves the creation and
modification of constraints among them, as described in the previous section. There
are also higher order steps that consist of the creation of subdocuments, which are
formed by several simpler components of the types described before. Performance
zones that pose questions to the student are followed by another decision zone,
forming the tree of decision-performance zones.
The design tree starts with a performance zone that contains a problem pattern.
Problem patterns are generalized problem statements whose formulae are not
completely specified, like a problem that asks for the normalization of the equation of
an arbitrary parabola. Mathematical formulae appearing in problem patterns are
formula patterns. Each part of a formula in a problem pattern that is not completely
specified has an associated random generator, which is a program that is called when
a student starts solving a problem that must be generated randomly.
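A minimal sketch of a problem pattern whose unspecified parts carry random generators follows; the dictionary layout and names are ours, chosen only to illustrate the mechanism described above.

```python
# Sketch of a problem pattern with one random generator per unspecified part,
# so that a fresh problem instance can be produced on demand.
# All identifiers here are illustrative assumptions, not ConsMath names.
import random

problem_pattern = {
    "statement": "Normalize the equation y = {A}x^2 + {B}x + {C}.",
    "generators": {                      # one generator per unspecified part
        "A": lambda: random.choice([1, 2, 3]),
        "B": lambda: random.randint(-6, 6),
        "C": lambda: random.randint(-5, 5),
    },
}

def instantiate(pattern):
    values = {name: gen() for name, gen in pattern["generators"].items()}
    return pattern["statement"].format(**values), values

statement, values = instantiate(problem_pattern)
print(statement)   # e.g. "Normalize the equation y = 2x^2 + -3x + 4."
```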
As the student progresses in the resolution of the problem, the tracking agent keeps
a pointer to a point in the tree model that represents the current stage in the resolution
process. If the pointer is in a performance zone, then all the actions previously stored
by the designer are reproduced by ConsMath to recreate the designed dialog, stopping
when the agent finds the beginning of a decision tree. As the resolution goes ahead,
new subdocuments are created dynamically that include constraints that depend on
formulae that are already present, and they are updated on the fly.
When the tracking agent is in a decision tree, it enables the corresponding decision
rules, and waits until the user generates an action that fires one of them. Then, it
enters the corresponding performance zone. This iterative process ends when the
agent arrives at the end of a branch in the tree model. When this happens, in case a
subproblem is being solved, the resolution of the original problem continues as we
will see in the next subsection.
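The traversal just described can be sketched roughly as follows; the node and rule structures are invented for illustration and do not reflect ConsMath's internal representation.

```python
# Sketch of the tracking agent's traversal of the tree model: performance zones
# are replayed, decision zones wait until a student action fires one of the
# enabled decision rules. Names and structures are hypothetical.

def run(node, get_student_action):
    while node is not None:
        if node["kind"] == "performance":
            for step in node["steps"]:          # replay the designer's actions
                step()
            node = node.get("next")             # may be a decision zone or None
        else:                                   # decision zone: wait for a match
            action = get_student_action()
            fired = next((rule for rule in node["rules"]
                          if rule["matches"](action)), None)
            if fired is None:
                continue                        # no rule fired; keep waiting
            node = fired["performance_zone"]    # enter the chosen branch
    # reaching the end of a branch ends the (sub)problem resolution
```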
where “f”, “c” and “g” are the input variables of the problem pattern. In our example
we can create a first dialog that shows the student the problem to solve and asks
him which method he is going to use to solve it. For example, we can ask
him to choose among several methods for the computation of limits, including
L'Hôpital's rule and the direct computation of the limit.
Each time the student chooses one of the options, the system has to check that his
decision is correct. If it is not, the designer must have specified how the system
will respond. Each time the student chooses L'Hôpital's rule, the system makes a
recursive call to the same subproblem with new values for the initial variables “f” and
“g”. Finally, when the student chooses to give the solution directly, the recursion ends.
5 Evaluation
We have tested how ConsMath can be used for the design of interactive sets of
problems. These tests have been performed by two math teachers. A collection of
problems from the textbook [10] on ordinary differential equations has been designed.
The teachers found that the resulting interactive problems are useful from the didactic
point of view, the use of the tool is intuitive and simple, and they could not have
developed anything similar without ConsMath. The teachers have also warned us that
before using the system on a larger scale with less advanced users like students, the
behaviour of the editor of math formulae should be refined. Since this editor is a
third-party component, we are planning to replace it with our own equation editor in a
future release.
We have also done an initial evaluation of previously designed problems in a
collaborative setting, where two experts collaborate to solve a problem and a third,
playing the teacher role, supervises and collaborates with them. In these tests the
experts in the role of students collaborated synchronously, while the teacher was
mainly in an asynchronous collaborative session, joining the synchronous session just
to help the students. The first results helped us identify some minor usability problems
that we plan to fix in the coming months, so that we can soon carry out tests with
students enrolled in a course.
6 Conclusions
Acknowledgements. The work described in this paper is part of the Ensenada and
Arcadia projects, funded by the National Plan of Research and Development of Spain,
projects TIC 2001-0685-C02-01 and TIC2002-01948 respectively.
References
1. Beeson, M.: “Mathpert, a computerized learning environment for Algebra, Trigonometry
and Calculus”, Journal of Artificial Intelligence in Education, pp. 65-76, 1990.
2. Büdenbender, J., Frischauf, A., Goguadze, G., Melis, E., Libbrecht, P., Ullrich, C.: “Using
Computer Algebra Systems as Cognitive Tools”, pp. 802-810, 6th International
Conference, ITS 2002, LNCS 2363, Springer 2002, ISBN 3-540-43750-9.
3. Char, B.W., Fee, G.J., Geddes, K.O., Gonnet, G.H., Monagan, M.B.: “A tutorial
introduction to MAPLE”. Journal of Symbolic Computation, 2(2):179–200, 1986.
4. Cypher, A.: “Watch what I do. Programming by Demonstration”, ed. MIT Press
(Cambridge, MA), 1993.
5. Diez, F., Moriyon, R.: “Solving Mathematical Exercises that Involve Symbolic
Computations”, Computing in Science and Engineering, vol. 6, no. 1, pp. 81-84, 2004.
6. Koedinger, K.R., Anderson, J.R., Hadley, W.H., Mark, M. A.: “Intelligent tutoring goes to
school in the big city”. Int. Journal of Artificial Intelligence in Education, 8, 1997.
7. Mora, M.A., Moriyón, R., Saiz, F.: “Mathematics Problem-based Learning through
Spreadsheet-like Documents”, Proc. International Conference on the Teaching of
Mathematics, Crete, Greece, 2002, http://www.math.uoc.gr/~ictm2/
8. Mora, M.A., Moriyón, R., Saiz, F.: “Building Mathematics Learning Applications by
Means of ConsMath”, in Proceedings of the IEEE Frontiers in Education Conference, pp.
F3F1-F3F6, November 2003, Boulder, CO.
9. Mora, M.A., Moriyón, R., Saiz, F.: “Developing applications with a framework for the
analysis of the learning process and collaborative tutoring”. International Journal of
Continuing Engineering Education and Lifelong Learning, Vol. 13, Nos. 3/4, pp. 268-279, 2003.
10. Simmons, G. F.: “Differential equations: with applications and historical notes”, ed.
McGraw-Hill, 1981.
11. Sorgatz, A., Hillebrand, R.: “MuPAD”. Linux Magazin, (12/95), 1995.
12. Wolfram, S.: “The Mathematica Book”, ed. Cambridge University Press (fourth edition),
1999.
Lessons Learned from Authoring for Inquiry Learning:
A Tale of Authoring Tool Evolution
1 Introduction
Despite many years of research and development, intelligent tutoring systems and
other advanced adaptive learning environments have seen relatively little use in
schools and training classrooms. This can be attributed to several factors that most of
these systems have in common: high cost of production, lack of widespread convinc-
ing evidence of the benefits, limited subject matter coverage, and lack of buy-in from
educational and training professionals. Authoring tools are being developed for these
learning environments (LEs) because they address all of these areas of concern [1].
Authoring tools can reduce the development time, effort, and cost; they can enable
reuse and customization of content; and they can lower the skill barrier and allow
more people to participate in development and customization ([2], [3]). And finally,
they impact LE evaluation and evolution by allowing alternative versions of a system
to be created more easily, and by allowing greater participation by teachers and sub-
ject matter experts.
Most papers on LE authoring tools focus on how the features of an authoring tool
facilitate building a tutor. Of the many research publications involving authoring
tools, extremely few document the use of these tools by subject matter experts
(SMEs, which includes teachers in our discussion) not intimately connected with the
research group to build tutors that are then used by students in realistic settings (ex-
ceptions include work described in [2] and [3]). A look at over 20 authoring systems
(see [1]) shows them to be quite complex, and it is hard to imagine SMEs using them
without significant ongoing support. Indeed, tutoring systems are complex, and
designing them is a formidable task even with the burden of writing computer code
removed. (We gratefully acknowledge support for this work from the U.S. Department
of Education, FPISE program #P116B010483, and NSF CCLI #0127183.) More research
is needed to determine how to match the skills of the target SME user to the design of
authoring tools so that, as a field, we can calibrate our expectations about the realistic
benefits of these tools. Some might say that the role of
SMEs can be kept to a minimum--we disagree. Principles from human-computer
interaction and participatory design theory are unequivocal in their advocacy for
continuous, iterative design cycles using authentic users ([4], [5]). This leads us to
two conclusions. First, LE usability requires the participation of SMEs (with exper-
tise in the domain and with teaching). LE evaluations by non-SMEs may be able to
determine that a given feature is not usable, that learners are overwhelmed or not
focusing on the right concepts, or that a particular skill is not being learned; but reli-
able insights about why things are not working and how to improve the system can
only come from those with experience teaching in the domain. The second conclu-
sion is that, since authoring tools do indeed need to be usable by SMEs, then SMEs
need to be highly involved in the formative stages of designing the authoring tools
themselves, in order to ensure that these systems can in fact be used by an “average”
(or even highly skilled) SME.
This paper provides case study and strong anecdotal evidence for the need for
SME participation in LE design and in LE authoring tool design. We describe the
Rashi inquiry learning environment, and our efforts to build authoring tools for Rashi.
In addition to illustrating how the design of the authoring tool evolved as we worked
with SMEs (college professors), we argue for the importance of SME involvement
and describe some lessons learned about authoring tools design. First we will de-
scribe the Rashi LE.
The author uses the Rashi authoring tools to enter the following into the knowl-
edge base:
Propositions and hypotheses such as “has a fever”, “has diabetes”
Inferential relationships between the propositions such as “high fever
supports diabetes”
Cases with case-specific values, e.g. the “Janet Stone Case” has values
including “temperature is 99.1” and “white blood cell count is 5.0 x 10^3”
For the several cases we have authored so far there are many hundreds of proposi-
tions, relationships, and case values. Each of these content objects has several attrib-
utes to author. The authoring complexity comes in large part from the sheer volume
of information and interrelationships to maintain and proof-check. The authoring
tools assist with this task but cannot automate it, as too much heuristic judgment is
involved.
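To make the kinds of content objects concrete, here is a rough sketch of propositions, inferential relationships, and case values in Python; the field names are assumptions made only for illustration, not Rashi's actual schema.

```python
# Rough sketch of the content objects a Rashi author enters.
# Field names are illustrative assumptions, not the real Rashi data model.

propositions = {
    "p1": {"text": "has a fever"},
    "h1": {"text": "has diabetes", "is_hypothesis": True},
}

relationships = [
    # inferential links between propositions, e.g. "high fever supports diabetes"
    {"from": "p1", "to": "h1", "type": "supports"},
]

cases = {
    "Janet Stone": {            # case-specific values attached to propositions
        "temperature": 99.1,
        "white blood cell count": 5.0e3,
    },
}

# A simple consistency check of the sort the authoring tools help with:
# every relationship must refer to propositions that actually exist.
for rel in relationships:
    assert rel["from"] in propositions and rel["to"] in propositions
```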
The above gives evidence for the amount of participation that can be required of a
domain expert when building novel LEs. Also, it should be clear that deep and
ongoing participation is needed from the SME. We believe this to be the case for
almost all adaptive LE design. Since our goal is not to produce one tutor for one do-
main, but tutors for multiple domains and multiple cases, and to enable experts to
continue to create new cases and customize existing cases in the future, we see the
issues of authoring tool usability as critical and perennial. The greater the complexity
of the LE, the greater the need for authoring tools. In designing an authoring tool
there are tradeoffs involved in how much of the complexity can be exposed to the
author and made a) inspectable, and b) authorable or customizable [4].
The original funding for Rashi did not include funds for authoring tool construction,
and the importance of authoring tools was only gradually appreciated. Because of
this, initial attempts to support SMEs were focused on developing tools of low com-
plexity and cost. In the next section we describe a succession of three systems built to
support authors in managing the propositions and evidential relationships in Rashi.
Each tool is very different as we learned more in each iteration about how to sche-
matically and visually represent the content. In one respect, the three tools illustrate
the pros and cons of three representational formalisms for authoring the network of
evidential relationships comprising the domain expertise (network, table-based, and
form-based). In addition, each successive version added new functionality as the
need for it was realized.
style model seemed to fit well with the mental model of the argument structure that
we wanted the expert to have. However, in working with both the biology professor
and the environmental engineering professor (for a Rashi tutor in another domain), as
the size of the networks began to grow, the network became spaghetti-like and the
interface became unwieldy. The auto-layout feature was not sufficient and the author
needed to constantly reposition nodes manually to make way for new nodes and links.
The benefits of the visualization were overcome by the cognitive load of having to
deal with a huge network, and more and more the tool was used exclusively by the
programming and knowledge engineering team, and not by the domain ex-
perts/teachers. We realized that the expert only needed to focus on the local area of
nodes connected to the node being focused on, and that in this situation the expert did
not benefit much from the big picture view of the entire network (or a region of it)
that the tool provided. We concluded that it would require less cognitive load if the
authors just focused on each individual relationship (X supports/refutes Y), and we
moved to an authoring tool which portrayed this in a tabular format.
A table-based representation. The second tool was built using macros and other
features available in Microsoft Excel (see Figure 3). The central piece of the tool was
a table allowing the author to create Data->RelationshipType->Inference triplets (e.g.
“high-temperature supports mono”) (Figure 3A). For ease of authoring it was essen-
tial that the author choose from pop-up menus in creating relationships (which can be
easily accomplished in Excel). In order to flexibly support the pop-ups, data tables
were created with all of the options for each item in the triplet (Figure 3B). The same
item of data (proposition) or inference (hypothesis) can be used many times, i.e. the
relationship is a many-to-many mapping. Authors could add new items to the tables in
Figure 3B and to the list of relationships in Figure 3A (A and B are different work-
sheets in the Excel data file). Using the Excel features the author can sort by any of
the columns to see, for example, all of the hypotheses connected to an observation; or
all of the observations connected to a hypothesis; or all of the “refutes” relationships
together. This method worked well for a while. But as the list of items grew in length
the pop-up menus became unwieldy. Our solution to this was to segment them into
parts where the author chooses one from list A, B, C, or D and one from list X, Y, or
Z (this modified interface is not shown). The complexity increased as we began to
deal with intermediate inferences which can participate in both the antecedent and the
consequent of relationships, so these items needed to show up in both right hand and
left hand pop up menus. As we began to add authoring of case-values to the tool, the
need to maintain unique identifiers for all domain “objects” was apparent, and the
system became even more unwieldy.
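Reduced to plain Python for illustration, the table-based representation amounts to a triplet table plus the option lists that fed the pop-up menus, with sorting standing in for Excel's column sort; the example values are invented.

```python
# Sketch of the table-based representation: Data -> RelationshipType -> Inference
# triplets plus the option lists that fed the pop-up menus. Values are invented.

data_items = ["high temperature", "normal white cell count"]
inferences = ["mono", "diabetes"]
relationship_types = ["supports", "refutes"]

# The relationship table itself (Figure 3A in the paper): a many-to-many mapping.
triplets = [
    ("high temperature", "supports", "mono"),
    ("normal white cell count", "refutes", "mono"),
]

# Sorting by any column, as the authors did in Excel, groups related rows:
by_inference = sorted(triplets, key=lambda t: t[2])
for data, rel, inference in by_inference:
    print(f"{data} {rel} {inference}")
```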
related to the focal object. Figure 4 shows that while editing propositions the author
can edit and manage relationships and case values as well. Thus the author can get by
using only the propositions screens in Figure 4 and a similar but much simpler screen
for cases. Creating fully functioning tools has allowed the expert to creatively author
and analytically correct almost all aspects of the Human Biology cases, and partici-
pate with much more autonomy and depth (we are using the tool for the other do-
mains as well). It has freed up the software design team from having to understand
and keep a close eye on every aspect of the domain knowledge, and saves much
of the time it took to maintain constant communication between the design team and
the domain expert on the details of the content.
5 Discussion
Why did we bother to describe three versions of authoring tools when it was only the
final one that was satisfactory? Stories of lessons learned from software development
are rare, but the trial and error process can illustrate important issues. In our case this
process has illustrated the importance of having SMEs involved in authoring tool
design, and the importance of finding the right external representation for the subject
matter content.
Comparison with other authoring tool projects. The Rashi authoring tools are
relatively unique in that there is only one other project that deals with authoring tools
References
[1] Murray, T. (2003). An Overview of Intelligent Tutoring System Authoring Tools: Up-
dated analysis of the state of the art. Chapter 17 in Murray, T., Blessing, S. & Ainsworth,
S. (Eds.). Authoring Tools for Advanced Technology Learning Environments. Kluwer
Academic Publishers, Dordrecht.
[2] Ainsworth, S., Major, N., Grimshaw, S., Hayes, M., Underwood, J., Williams, B. &
Wood, D. (2003). REDEEM: Simple Intelligent Tutoring Systems from Usable Tools.
Chapter 8 in Murray, T., Blessing, S. & Ainsworth, S. (Eds.). Authoring Tools for Ad-
vanced Technology Learning Environments. Kluwer Academic Publishers, Dordrecht.
[3] Halff, H., Hsieh, P., Wenzel, B., Chudanov, T., Dirnberger, M., Gibson, E. & Redfield,
C. (2003). Requiem for a Development System: Reflections on Knowledge-Based, Gen-
erative Instruction, Chapter 2 in Murray, T., Blessing, S. & Ainsworth, S. (Eds.).
Authoring Tools for Advanced Technology Learning Environments. Kluwer Academic
Publishers, Dordrecht.
[4] Shneiderman, B. (1998). Designing the User Interface (Third Edition). Addison-Wesley,
Reading, MA, USA.
[5] Norman, D. (1988). The Design of Everyday Things. Doubleday, NY.
[6] Mayer, R. (1998). Cognitive, metacognitive, and motivational aspects of problem
solving. Instructional Science vol. 26, pp. 49-63.
[7] Duell, O.K. & Schommer-Atkins, M. (2001). Measures of people’s belief about knowl-
edge and learning. Educational psychology review 13(4) 419-449.
[8] Lajoie, S. (Ed), 2000. Computers as Cognitive Tools Volume II. Lawrence Erlbaum Inc.:
New Jersey
[9] White, B., Shimoda, T., Frederiksen, J. (1999). Enabling students to construct theories of
collaborative inquiry and reflective learning: computer support for metacognitive devel-
opment. International J. of Artificial Intelligence in Education, Vol. 10, 151-182.
[10] van Joolingen, W., & de Jong, T. (1996). Design and Implementation of Simulation
Based Discovery Environments: The SMILSE Solution. Jl. of Artificial Intelligence in
Education 7(3/4) p 253-276.
[11] Lajoie, S., Greer, J., Munsie, S., Wikkie, T., Guerrera, C., Aleong, P. (1995). Establish-
ing an argumentation environment to foster scientific reasoning with Bio-World. Pro-
ceedings of the International Conference on Computers in Education, pp. 89-96. Char-
lottesville, VA: AACE.
[12] Suthers, D. & Weiner, A. (1995). Groupware for developing critical discussion skills.
Proceedings of CSCL ’95, Computer Supported Collaborative Learning, Bloomington,
Indiana, October 1995.
[13] Scardamalia, M. & Bereiter, C. (1994). Computer Support for Knowledge-
Building Communities. The Journal of the Learning Sciences, 3(3), 265-284.
[14] Woolf, B.P., Marshall, D., Mattingly, M., Lewis, J. Wright, S. , Jellison. M., Murray, T.
(2003). Tracking Student Propositions in an Inquiry System. Proceedings of Artificial
Intelligence in Education, July, 2003, Sydney, pp. 21-28.
[15] Murray, T., Bruno, M., Woolf, B., Marshall, D., Mattingly, M., Wright, S. & Jellison,
M. (2003). A Coached Learning Environment for Case-Based Inquiry Learning in Hu-
man Biology. Proceedings of E-Learn 2003. Phoenix, Arizona, November 2003,
pp. 654-657. AACE Digital Library, www.AACE.org.
[16] Bruno, M.S. & Jarvis, C.D. (2001). It’s Fun, But is it Science? Goals and Strategies in a
Problem-Based Learning Course. J. of Mathematics and Science: Collaborative Explora-
tions.
[17] Kolodner, J.L, Camp, P.J., D., Fasse, B. Gray, J., Holbrook, J., Puntambekar, S., Ryan,
M. (2003). Problem-Based Learning Meets Case-Based Reasoning in the Middle-School
Science Classroom: Putting Learning by Design(tm) Into Practice. Journal of the
Learning Sciences, October 2003, Vol. 12: 495-547.
[18] Murray, T., Blessing, S. & Ainsworth, S. (Eds) (2003). Authoring Tools for Advanced
Technology Learning Environments: Toward cost-effective adaptive, interactive, and
intelligent educational software. Kluwer Academic Publishers, Dordrecht
[19] Suthers, D. & Hundhausen, C. (2003). An empirical study of the effects of representa-
tional guidance on collaborative learning. J. of the Learning Sciences 12(2), 183-219.
[20] Ainsworth, S. (1999). The functions of multiple representations. Computers & Education
vol. 33 pp. 131-152.
The Role of Domain Ontology in Knowledge Acquisition
for ITSs
Pramuditha Suraweera, Antonija Mitrovic, and Brent Martin
1 Introduction
Intelligent Tutoring Systems (ITS) are educational programs that assist students in
their learning by adaptively providing pedagogical support. Although highly regarded
in the research community as effective teaching tools, developing an ITS is a labour
intensive and time consuming process. The main cause behind the extreme time and
effort requirements is the knowledge acquisition bottleneck [9].
Constraint based modelling (CBM) [10] is a student modelling approach that
somewhat eases the knowledge acquisition bottleneck by using a more abstract repre-
sentation of the domain compared to other common approaches [7]. However, build-
ing constraint sets still remains a major challenge. In this paper, we propose an ap-
proach to automatic acquisition of domain models for constraint-based tutors. We
believe that the domain ontology can be used as a starting point for automatic acquisi-
tion of constraints. Furthermore, building an ontology is a reflective task that focuses
the author on the important concepts of the domain. Therefore, our hypothesis is that
ontologies are also important for developing constraints manually.
To test this hypothesis we conducted an experiment with graduate students en-
rolled in an ITS course. They were given the task of composing the knowledge base
for an ITS for adjectives in the English language. We present an overview of our goals
and the results of our evaluation in this paper.
The remainder of the paper is arranged into five sections. The next section presents
related work on automatic knowledge acquisition for ITSs, while Section 3 gives an
overview of the proposed project. Details of enhancing the authoring shell WETAS
are given in Section 4. Section 5 presents the experiment and its results. Conclusions
and future work are presented in the final section.
2 Related Work
Research attempts at automatically acquiring knowledge for ITSs have met with lim-
ited success. Several authoring systems have been developed so far, such as KnoMic
(Knowledge Mimic)[15], Disciple [13, 14] and Demonstr8 [1]. These have focussed
on acquiring procedural knowledge only.
KnoMic is a learning-by-observation system for acquiring procedural knowledge
in a simulated environment. The system represents domain knowledge as a generic
hierarchy, which can be formatted into a number of specific representations, including
production rules and decision trees. KnoMic observes the domain expert carrying out
tasks within the simulated environment, resulting in a set of observation traces. The
expert annotates the points where he/she changed a goal because it was either
achieved or abandoned. The system then uses a generalization algorithm to learn the
conditions of actions, goals and operators. An evaluation conducted to test the accu-
racy of the procedural knowledge learnt by KnoMic in an air combat simulator re-
vealed that out of the 140 productions that were created, 101 were fully correct and 29
of the remainder were functionally correct [15]. Although the results are encouraging,
KnoMic’s applicability is restricted to simulated environments.
Disciple is a shell for developing personal agents. It relies on a semantic network
that describes the domain, which can be created by the author or imported from a
repository. Initially the shell has to be customised by building a domain-specific inter-
face, which gives the domain expert a natural way of solving problems. Disciple also
requires a problem solver to be developed. The knowledge elicitation process is initi-
ated by a problem-solving example provided by the expert. The agent generalises the
given example with the assistance of the expert and refines it by learning from ex-
perimentation and examples. The learned rules are added to the knowledge base.
Disciple falls short of providing the ability for teachers to build ITSs. The cus-
tomisation of Disciple requires multiple facets of expertise including knowledge engi-
neering and programming that cannot be expected from a typical domain expert. Fur-
thermore, as Disciple depends on the problem solving instances provided by the do-
main expert, they should be selected carefully to reflect significant problem states.
Demonstr8 is an authoring tool for building model-tracing tutors for arithmetic. It
uses programming by demonstration to reduce the authoring effort. The system pro-
vides a drawing-tool-like interface for building the student interface of the ITS. The
system automatically defines each GUI element as a working memory element
(WME), while WMEs involving more than a single GUI element must be defined
manually. The system generates production rules by observing problems being solved
such as cardinality restrictions for relationships or domains for attributes. The second
stage involves learning from examples. The system learns constraints by generalising
the examples provided by the domain expert. If the system finds an anomaly between
the ontology and the examples, it alerts the user, who corrects the problem. The final
stage involves validating the generated constraints. The system generates examples to
be labelled as correct or incorrect by the domain expert. It may also present the con-
straints in a human readable form, for the domain expert to validate.
We propose that the initial authoring step be the development of a domain ontology,
which will later be used to generate constraints automatically. An ontology describes
the domain, by identifying all domain concepts and relationships between them. We
believe that it is highly beneficial for the author to develop a domain ontology even
when the constraint set is developed manually, because this helps the author to reflect
on the domain. Such an activity would enhance the author’s understanding of the do-
main and therefore be a helpful tool when identifying constraints. We also believe that
categorising constraints according to the ontology would assist the authoring process.
To test our hypothesis, we built a tool as a front-end for WETAS. Its main purpose
is to encourage the use of domain ontology as a means of visualising the domain and
organising the knowledge base. The tool supports drawing the ontology, and compos-
ing constraints and problems. The ontology front end for WETAS was developed as a
Java applet. The interface (Fig. 1a) consists of a workspace for developing a domain
ontology (ontology view) and editors for syntax constraints, semantic constraints,
macros and problems. As shown in Fig. 1a, concepts are represented as rectangles,
and sub-concepts are related to concepts by arrows. The concept details such as attrib-
utes and relationships can be specified in the bottom section of the ontology view. The
interface also allows the user to view the constraints related to a concept.
The ontology shown in Fig. 1a conceptualises the Entity Relationship (ER) data
model. Construct is the most general concept, which includes Relationship, Entity,
Attribute and Connector as sub-concepts. Relationship is specialized into Regular and
Identifying ones. Entity is also specialized, according to its types, into Regular and
Weak entities. Attribute is divided into two sub-concepts of Simple and Composite
attributes. The details of the Binary Identifying relationship concept are depicted in
Fig. 1. It has several attributes (such as Name and Identified-participation), and three
relationships (Fig. 1b): Attributes (which is inherited from Relationship), Owner, and
Identified-entity. The interface allows the specification of restrictions of these rela-
tionships in the form of cardinalities. The relationship between Identifying relation-
ship and Regular entity named Owner has a minimum cardinality of 1. The interface
also allows the author to display the constraints for each concept (Fig. 1c). The con-
straints can be either directly entered in the ontology view interface or in the syn-
tax/semantic constraints editor.
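A rough sketch of the ER ontology described above, written as a plain data structure; the layout is our own illustration and WETAS stores ontologies differently. The Owner cardinality of 1 is the one stated in the text; the remaining details are assumptions.

```python
# Illustrative sketch of the ER-modelling ontology: concepts, sub-concepts,
# attributes, relationships, and cardinality restrictions. Not WETAS's format.

ontology = {
    "Construct": {"sub": ["Relationship", "Entity", "Attribute", "Connector"]},
    "Relationship": {"parent": "Construct",
                     "sub": ["Regular relationship", "Identifying relationship"],
                     "relations": {"Attributes": {"target": "Attribute"}}},
    "Entity": {"parent": "Construct", "sub": ["Regular entity", "Weak entity"]},
    "Attribute": {"parent": "Construct",
                  "sub": ["Simple attribute", "Composite attribute"]},
    "Identifying relationship": {
        "parent": "Relationship",
        "attributes": ["Name", "Identified-participation"],
        "relations": {
            "Attributes": {"inherited_from": "Relationship"},
            "Owner": {"target": "Regular entity", "min": 1},   # minimum cardinality 1
            "Identified-entity": {"target": "Weak entity"},
        },
    },
}

# e.g. list the relations defined for a concept
print(sorted(ontology["Identifying relationship"]["relations"]))
```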
The constraint editors allow authors to view and edit the entire list of constraints
and problems. As shown in Fig. 2, the constraints are categorised according to the
concepts that they are related to by the use of comments. The ontology view extracts
constraints from the constraint editors and displays them under the corresponding
concept. Fig. 2 shows two constraints (Constraints 22 and 23) that belong to the Identifying
relationship concept.
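The grouping of constraints under ontology concepts by means of comments can be sketched as follows; the comment marker, the constraint syntax, and the constraint texts below are invented for illustration and are not WETAS syntax.

```python
# Sketch of grouping constraints under ontology concepts using comments, so the
# ontology view can list them per concept. Syntax and texts are hypothetical.

constraint_file = """
; Concept: Identifying relationship
(22 "An identifying relationship must connect a weak entity to its owner.")
(23 "The participation of the identified entity must be total.")
; Concept: Regular entity
(7 "Every regular entity needs a key attribute.")
"""

def constraints_by_concept(text):
    grouped, current = {}, None
    for line in text.strip().splitlines():
        if line.startswith("; Concept:"):
            current = line.split(":", 1)[1].strip()
            grouped.setdefault(current, [])
        elif current is not None and line.strip():
            grouped[current].append(line.strip())
    return grouped

print(constraints_by_concept(constraint_file)["Identifying relationship"])
```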
5 Experiment
We hypothesized that composing the ontology and organising the constraints accord-
ing to its concepts would assist in the task of building a constraint set manually. To
evaluate our hypothesis, we set 18 students enrolled in the 2003 graduate course on
Intelligent Tutoring Systems at the University of Canterbury the task of building a
tutor using WETAS for adjectives in the English language.
The students had attended 13 lectures on ITS, including five on CBM, before the
experiment. They also had a 50 minute presentation on WETAS, and were given a
description of the task, instructions on how to write constraints, and the section on
adjectives from a text book for English vocabulary [2]. The students had three weeks
to implement the tutor. A typical problem is to complete a sentence by providing the
correct form of a given adjective. An example sentence the students were given was:
“My sister is much _____ than me (wise).”
The students were also free to explore LBITS [3], a tutor developed in WETAS
that teaches simple vocabulary skills. The students were allowed to access the “last
two letters” puzzles, where the task involved determining a set of words that satisfied
the clues, with the first two letters of each word being the same as the last two letters
of the previous one. All domain specific components, including its ontology, the con-
straints and problems, were available.
Seventeen students completed the task satisfactorily. One student lost his entire
work due to a system bug, and this student’s data was not included in the analysis. The
same bug did not affect other students, since it was eliminated before others experi-
enced it. Table 1 gives some statistics about the remaining students, including their
interaction times, numbers of constraints and the marks for constraints and ontology.
The participants took on average 37 hours to complete the task, spending 12% of the time in
the ontology view. The time in the ontology view varied widely, with a minimum of
1.2 and maximum of 7.2 hours. This can be attributed to different styles of developing
the ontology. Some students may have developed the ontology on paper before using
the system, whereas others developed the whole ontology online. Furthermore, some
students also used the ontology view to add constraints. However, the logs showed
that this was not a popular option, as most students composed constraints in the con-
straint editors. One factor that contributed to this behaviour may be the restrictiveness
of the constraint interface, which displays only a single constraint at a time.
WETAS distinguishes between semantic and syntactic constraints. In the domain
of adjectives, it is not always clear to which category a constraint belongs. For example,
in order to determine whether a solution is correct, it is necessary to check whether the
correct rule has been applied (semantics) and whether the resulting word is spelt cor-
rectly (syntax). This is evident in the results for the total number of constraints for
each category. The averages of both categories are similar (9 semantic constraints and
11 syntax constraints). Some participants included most of their constraints as
semantic and others vice versa. Students on average composed 20 constraints in total.
We compared the participants’ solution to the “ideal” solution. The marks for
these two aspects are given under Coverage (the last two columns in Table 1). The
ideal knowledge base consists of 20 constraints. The Constraints column gives the
number of the ideal constraints that are accounted for in the participants’ constraint
sets. Note that the mapping between the ideal and participants’ constraints is not nec-
essarily 1:1. Two participants accounted for all 20 constraints. On average, the par-
ticipants covered 15 constraints. The quality of the constraints was generally high.
The ontologies produced by the participants were given a mark out of five (the
Ontology column in Table 1). All students scored highly, as expected, because the ontology
was straightforward. Almost every participant specified a separate concept for
each group of adjectives according to the given rules [2]. However, some students
constructed a flat ontology, which contained only the six groupings corresponding to
the rules (see Fig. 3a). Five students scored full marks for the ontology by including
the degree (comparative or superlative) and syntax such as spelling (see Fig. 3b).
Even though the participants were only given a brief description of ontologies and
the example ontology of LBITS, they created ontologies of a reasonable standard.
However, we cannot draw any general conclusions about the difficulty of constructing
ontologies since the domain of adjectives is very simple. Furthermore, the six rules for
determining the comparative and superlative degree of an adjective gave strong hints
on what concepts should be modelled.
6 Conclusions
Acknowledgements. The work reported here has been supported by the University of
Canterbury Grant U6532.
References
1. Blessing, S.B.: A Programming by Demonstration Authoring Tool for Model-Tracing
Tutors. Artificial Intelligence in Education, 8 (1997) 233-261
2. Clutterbuck, P.M.: The art of teaching spelling: a ready reference and classroom active
resource for Australian primary schools. Longman Australia Pty Ltd, Melbourne, 1990.
3. Martin, B., Mitrovic, A.: Authoring Web-Based Tutoring Systems with WETAS. In: Kin-
shuk, Lewis, R., Akahori, K., Kemp, R., Okamoto, T., Henderson, L. and Lee, C.-H. (eds.)
Proc. ICCE 2002 (2002) 183-187
4. Martin, B., Mitrovic, A.: WETAS: a Web-Based Authoring System for Constraint-Based
ITS. Proc. 2nd Int. Conf on Adaptive Hypermedia and Adaptive Web-based Systems AH
2002, Springer-Verlag, Berlin Heidelberg New York, pp. 543-546, 2002.
5. Mitrovic, A.: Experiences in Implementing Constraint-Based Modelling in SQL-Tutor. In:
Goettl, B.P., Halff, H.M., Redfield, C.L. and Shute, V.J. (eds.) Proc. 4th Int. Conf. on In-
telligent Tutoring Systems, San Antonio, (1998) 414-423
6. Mitrovic, A.: An intelligent SQL tutor on the Web. Artificial Intelligence in Education, 13,
(2003) 171-195
7. Mitrovic, A., Koedinger, K. Martin, B.: A comparative analysis of cognitive tutoring and
constraint-based modeling. In: Brusilovsky, P., Corbett, A. and Rosis, F.d. (eds.) Proc.
UM2003, Pittsburgh, USA, Springer-Verlag, Berlin Heidelberg New York (2003) 313-322
8. Mitrovic, A., Ohlsson, S.: Evaluation of a Constraint-based Tutor for a Database Language.
Artificial Intelligence in Education, 10(3-4) (1999) 238-256
9. Murray, T.: Expanding the Knowledge Acquisition Bottleneck for Intelligent Tutoring
Systems. Artificial Intelligence in Education, 8 (1997) 222-232
10. Ohlsson, S.: Constraint-based Student Modelling. Proc. Student Modelling: the Key to
Individualized Knowledge-based Instruction, Springer-Verlag (1994) 167-189
11. Ohlsson, S.: Learning from Performance Errors. Psychological Review, 103 (1996) 241-
262
12. Suraweera, P., Mitrovic, A.: KERMIT: a Constraint-based Tutor for Database Modeling.
In: Cerri, S., Gouarderes, G. and Paraguacu, F. (eds.) Proc. 6th Int. Conf on Intelligent Tu-
toring Systems ITS 2002, Biarritz, France, LNCS 2363 (2002) 377-387
13. Tecuci, G.: Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory,
Methodology, Tool and Case Studies. Academic press, 1998.
14. Tecuci, G., Keeling, H.: Developing an Intelligent Educational Agent with Disciple. Artifi-
cial Intelligence in Education, 10 (1999) 221-237
15. van Lent, M., Laird, J.E.: Learning Procedural Knowledge through Observation. Proc. Int.
Conf. on Knowledge Capture, (2001) 179-186
Combining Heuristics and Formal Methods in a Tool for
Supporting Simulation-Based Discovery Learning
Koen Veermans (1) and Wouter R. van Joolingen (2)
(1) Faculty of Behavioral Sciences, University of Twente, PO Box 217,
7500 AE Enschede, The Netherlands
[email protected]
(2) Graduate School of Teaching and Learning, University of Amsterdam, Wibautstraat 2-4,
1091 GM Amsterdam, The Netherlands
[email protected]
Abstract. This paper describes the design of a tool to support learners in simu-
lation-based discovery learning environments. The design revises and extends
a previous tool to overcome issues that came up in a classroom learning setting.
The tool focuses on supporting learners with experimentation to identify or test
hypotheses. The aim is not only to support learning domain knowledge, but
also learning discovery learning skills. For this purpose the tool uses heuristics
and formal methods to assess the learner's experimenting behavior, and trans-
lates this assessment into feedback directed at improving the quality of the
learner's discovery learning behavior. The tool is designed to be part of an
authoring environment for designing simulation-based learning environments,
which puts some constraints on the design, but also ensures that the tool can be
reused in different learning environments. After describing the design, a learn-
ing scenario is used to serve as an illustration of the tool, and finally some con-
cluding remarks, evaluation results, and potential extensions for the tool are
presented.
1 Introduction
Discovery learning or Inquiry Learning has a long history in education [1, 4] and has
regained popularity over the last decade as a result of changes in the field of educa-
tion that put more emphasis on the role of the learner in the learning process. Zachos,
Hick, Doane, and Sargent [19] define discovery learning as “the self-attained grasp of
a phenomenon through building and testing concepts as a result of inquiry of the
phenomenon” (p. 942). The definition emphasizes that it is the learner who builds
concepts, that the concepts need to be tested, and that building and testing of concepts
are part of the inquiry of the phenomenon. Computer simulations have rich potential
to provide learners with opportunities to build and test concepts, and learning with
these computer simulations is also referred to as simulation-based discovery learning.
Like in discovery learning, the idea of simulation-based discovery learning is that
the learner actively engages in a process. In an unguided simulation-based discovery
environment learners have to set their own learning goals. At the same time they have
to find and apply the methods that help to achieve these goals, which is not always
easy. Two main goals can be associated with simulation-based discovery learning:
development of knowledge about the domain of discovery, and development of skills
that facilitate development of knowledge about the domain (i.e., development of skills
related to the process of discovery).
This paper describes a tool that combines support for learning the domain knowl-
edge with specific attention for learning discovery learning skills. Two constraints
had to be taken into account in the design of the tool. The first constraint is related to
the exploratory nature of discovery learning. To maintain the exploratory nature of
the environment, the tool may not be directive, should try to be stimulating and must be
non-obligatory, leaving room for exploration for the learner. The second constraint is
related to the context in which the tool should be operating, SIMQUEST [5], an
authoring environment for the design and development of simulation-based learning
environments. Since SIMQUEST allows the designer to specify the model, the domain
will not be known in advance, and therefore, the support cannot rely on domain
knowledge.
2 Learning Environments
At the core of SIMQUEST learning environments are one or more simulation models,
visualized to learners through representations of the model (numerical, graphical,
animated, etc.) in simulation interfaces. SIMQUEST includes templates for assignments
(e.g. exercises that provide a learner with a subgoal), explanations (e.g. background
information or feedback on assignments) and several tools (e.g. an experiment storage
tool). These components can be used to design a learning environment that supports
learners. The control mechanism determines when the components present them-
selves to the learner and allows the designer to specify the balance between system
control and learner control in the interaction between learning environment and
learner.
This framework allows authors to design and develop simulation-based learning
environments, and to some extent support for learners working with these learning
environments. However, it does not provide a way of assessing, and providing
individualized support for, the learners' experimentation with a simulation. This was
the starting point for the design of a tool called the ‘monitoring tool’ [16]. It sup-
ported experimenting by learners based on a formal analysis of their experimentation
in relation to hypotheses (these hypotheses had to be specified by the designer in
assignments). A study [17] showed positive results, but also highlighted two impor-
tant problems with the monitoring tool.
The first problem is that one of the strengths of the monitoring tool is also one of
its weaknesses. The monitoring tool did not rely on domain knowledge for the analy-
sis of the learners’ experimentation. The strength of this approach is that it is domain
independent; the weakness is that it cannot use knowledge about the domain to correct
learners when this might be needed. This might lead to incorrect domain knowledge,
and incorrect self-assessment of the exploration process, because the outcome of the
exploration process serves as a benchmark for learners in assessing the exploration
process [2]. In the absence of external feedback, learners have to rely on their own
assessment of the outcome of the process. If this assessment is incorrect, the resulting
assessment of the exploration might also be incorrect.
The second problem is that the design of the tool was based primarily on formal
principles related to induction and deduction. This had the shortcoming that it could
only give detailed feedback about experimentation in combination with certain
categories of hypotheses, such as semi-quantitative hypotheses (e.g. “If the velocity
becomes twice as large then the kinetic energy becomes four times as large”). In more
common language this hypothesis might be expressed as: “There is a quadratic rela-
tion between velocity and kinetic energy”, but this phrasing has no condition part that
can be used to make a formal assessment of experiments.
As a solution for this second problem the tool is extended with less formal, i.e.
heuristic assessment of the experimentation. The heuristics that were used for this
purpose originate from an inventory [12] of literature [4, 7, 8, 9, 10, 11, 13, 14, 15]
about problem solving, discovery learning, simulation-based learning, and machine
discovery, in search of heuristics that could prove useful in simulation-based discov-
ery learning. A set of heuristics (Table 1) related to experimentation and hypothesis
testing was selected from this inventory for the present purpose.
Heuristic assessment of the experimentation will allow the tool to provide feedback
on experimentation without needing specific hypotheses as input for the process of
evaluating the learners’ experiments. Consequently, the hypotheses in the assign-
ments can now be stated in “normal” language, which makes it easier for the learners
not only to investigate, but also to conceptualize them. If the hypothesis in the as-
signment is no longer used as input for the analysis of the learners’ experimentation,
it is also no longer needed to connect the experimentation feedback to assignments.
This means that feedback on the correctness of the hypothesis can be given in the
assignment, thus, solving the first problem. The feedback on experimentation can be
moved to the tool in which the experiments are stored; a more logical place to provide
feedback on experimentation. Moving the feedback to this tool required it to be re-
designed, and this was the starting point for the redesign of the tool.
Fig. 1. The structure of control and information exchange between a learner and a SimQuest
learning environment with the new experiment storage tool with graphing and heuristic support
Drawing a graph is not a trivial task and has been the object of instruction in itself [6].
It was therefore decided to let the tool take care of drawing the graph, but to provide
feedback related to drawing and interpreting graphs to the learner, as well as feed-
back related to experimenting. All the learner has to do is select a variable for the x-
axis and a variable for the y-axis, which provides the tool with important information
that can be used for generating feedback. Through the choice of variables the learner
expresses interest in a certain relation.
Learners can ask the tool to fit a function on the experiments along with drawing a
graph. Basic qualitative functions (monotonic increase and monotonic decrease), and
quantitative functions (constant, linear, quadratic, and reciprocal) are provided to the
learners. More functions could of course be provided, but it was decided to restrict the
set to these functions at first, because too many possibilities might overwhelm
learners. Fitting a function is optional, but when a learner selects this option it pro-
vides the tool with valuable extra information for the analysis of the experimentation.
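The set of candidate functions named above can be fitted with ordinary least squares; the sketch below is purely illustrative and is not SimQuest's implementation.

```python
# Sketch of fitting the candidate functions mentioned above (constant, linear,
# quadratic, reciprocal) to a learner's experiments with least squares.
import numpy as np

candidates = {
    "constant":   lambda x: np.column_stack([np.ones_like(x)]),
    "linear":     lambda x: np.column_stack([x, np.ones_like(x)]),
    "quadratic":  lambda x: np.column_stack([x**2, x, np.ones_like(x)]),
    "reciprocal": lambda x: np.column_stack([1.0 / x, np.ones_like(x)]),
}

def best_fit(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    results = {}
    for name, design in candidates.items():
        coef, residual, *_ = np.linalg.lstsq(design(x), y, rcond=None)
        sse = residual[0] if residual.size else 0.0   # sum of squared errors
        results[name] = (sse, coef)
    return min(results.items(), key=lambda kv: kv[1][0])

# Kinetic energy example from the text: quadratic in velocity.
v = [1.0, 2.0, 3.0, 4.0]
ek = [0.5 * vi**2 for vi in v]
print(best_fit(v, ek)[0])   # "quadratic"
```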
Learners can also construct new variables based on existing variables. New vari-
ables can be constructed using the basic arithmetic functions add, subtract, divide,
and multiply. Whenever the learner creates a new variable, a new column will be
added to the experiment storage tool, and this column will also be updated for new
experiments. The learner can compare these values to other values in the table to see
how the newly constructed variable relates to the variables that were already listed in
the monitoring tool. The redesigned version of the monitoring tool with its new func-
tionality is shown in Figure 2.
The division between general and specific heuristics is reflected in the feedback
that is given to the learners when they draw a graph. General heuristics are always
used to assess the learner’s experiments, and can always generate feedback. Specific
heuristics are only used to assess the learner's experiments if the learner fits a
function on the experiments. Which of the specific heuristics are used depends on
the kind of function. For instance, the 'equal increments' heuristic will not be used if
the learner fits a qualitative function on the experiments.
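As an illustration of how a heuristic compares the learner's experiments with a pattern and produces feedback text, here is a sketch of the 'equal increments' heuristic; the heuristic is named in the paper, but the concrete check and the feedback wording below are our own assumptions.

```python
# Sketch of one heuristic: compare the experiments with a pattern and return
# feedback text. The check and messages are illustrative, not SimQuest's.

def equal_increments(experiments, variable):
    """Return feedback on whether the chosen input variable was varied in equal steps."""
    values = sorted({e[variable] for e in experiments})
    if len(values) < 3:
        return ("equal increments",
                "Try at least three different values for %s." % variable)
    steps = [b - a for a, b in zip(values, values[1:])]
    if max(steps) - min(steps) > 1e-9:
        return ("equal increments",
                "Varying %s in equal steps makes the pattern easier to see." % variable)
    return ("equal increments", "You varied %s in equal steps, well done." % variable)

experiments = [{"velocity": 1.0, "Ek": 0.5},
               {"velocity": 2.0, "Ek": 2.0},
               {"velocity": 4.0, "Ek": 8.0}]
print(equal_increments(experiments, "velocity")[1])
```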
The specific heuristic “identify hypothesis” can be said to represent the formal
analysis of the experiments that was used in the first version of the tool [16]. The
first version of the tool checked whether the hypothesis could be identified based on
the experimental evidence that was generated by the learner. It also checked whether
this identification was proper. It did not check if the experimental evidence could also
confirm the hypothesis. For instance, if the hypothesis is that two variables are line-
arly related, and only two experiments were done, at least one other experiment is
needed for confirming this hypothesis. This extra experiment could show that the
hypothesis that was identified is able to account for this additional experiment, but it
could also show that the additional experiment does not lie on the line corresponding to the hypothesis
that was identified based on the first two experiments. The “confirm hypothesis”
heuristic takes care of this in the new tool.
5 A Learning Scenario
A learner working with the simulation can do experiments and decide whether or not to store
each experiment in the tool. The tool keeps track of all these experiments and
keeps a tag that indicates whether the learner stored an experiment or not. At a certain
moment, the learner decides to draw a graph. The learner has to select a variable for
the x-axis and for the y-axis, and press the button to draw a graph for these variables.
At this point, the tool checks what ‘type’ of variables the learner is plotting, and based
on this check the tool can stop without drawing a graph and present feedback to the
learner, or proceed with drawing the graph. The first will happen if a learner tries to
draw a graph with two input variables, since this does not make sense. Input variables
are independent, and any relation that might show in a graph will therefore be the
result of the changes that were made by the learner, and not of a relation between the
variables. The tool will not draw a graph either when a student tries to draw a graph
with an input variable on the y-axis and an output variable on the x-axis. Unlike the
case with two input variables, this could make sense, but it is common practice to plot the
variables the other way around. In both cases the learner will receive feedback that
explains why no graph was drawn, and what they could change in order to draw a
graph that will provide more insight into relations in the domain.
If the learner selects an input variable on the x-axis, and an output variable on the
y-axis, or two output variables the tool will proceed with drawing a graph, and will
generate feedback based on the heuristics.
First, the general experimenting heuristics evaluate the experiments that the learner
has performed. Each of the heuristics will compare the learner’s experiments with the
pattern (for an example see Table 2) that was defined for the heuristic. If necessary
the heuristic can ask the tool to filter the experiments (e.g., only stored experiments).
The feedback text is generated based on the result of this comparison, and returned to
the tool. The tool temporarily stores the feedback until it will be presented to the
learner.
The next step will be that the tool analyses the experiments using the same princi-
ples that were described in Veermans & van Joolingen [16]. Based on these principles
the tool identifies sets of experiments that are informative for the relation between the
input variable on the x-axis and the variable on the y-axis. For this purpose the ex-
periments are grouped into sets in which all input variables other than the variable on
the x-axis are kept constant. This will result in one or more sets of experiments that
will be sent to the specific experiment heuristics, which will compare them with their
heuristic pattern, and, if necessary, generate feedback text.
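A rough sketch of this grouping step is given below. The representation of an experiment as a dictionary of variable values, and the function name, are assumptions made purely for illustration; the tool's actual data structures are not described here.

```python
from collections import defaultdict

def informative_sets(experiments, x_var, input_vars):
    """Group experiments into sets in which all input variables other than the
    variable on the x-axis are kept constant.

    experiments: list of dicts mapping variable names to values.
    x_var: the input variable plotted on the x-axis.
    input_vars: names of all input variables in the simulation.
    """
    groups = defaultdict(list)
    for exp in experiments:
        # The key is the combination of values of the *other* input variables.
        key = tuple((v, exp[v]) for v in sorted(input_vars) if v != x_var)
        groups[key].append(exp)
    # Only sets with more than one experiment are informative for the relation.
    return [group for group in groups.values() if len(group) > 1]


experiments = [
    {"mass": 1, "force": 10, "acceleration": 10},
    {"mass": 2, "force": 10, "acceleration": 5},
    {"mass": 2, "force": 20, "acceleration": 10},
]
# Only the two experiments with force held constant form an informative set.
print(informative_sets(experiments, x_var="mass", input_vars=["mass", "force"]))
```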
At this point the tool will draw the graph (see for example Figure 3). Together with
the plots the tool will now present the feedback that was generated by the general
experimenting heuristics. The feedback consists of the name of the heuristic, the out-
come of the comparison with the heuristic pattern, and an informal text saying that it
could be useful to set up experiments according to this heuristic. The tool will also provide
information on each of the experiment sets, consisting of the values of the input
variables in that set and the feedback from the specific experiment heuristics.
If the learner decides to plot two output variables, it is not possible to divide the
experiments formally into separate sets of informative experiments. Both output vari-
ables are dependent on one or more input variables, and it is not possible to say what
kind of values for the input variables make up a set that can be used to see how the
output variables are co-varying given the different values for the input variables.
Some input variables might influence both output variables, and some only one of
them. This makes it impossible to assess the experiments and the relation between the
outputs formally. This uncertainty is communicated to the learners, warning them that
they should be careful with drawing conclusions based on such a graph. It is accom-
panied by the suggestion to remove some experiments so as to obtain a set of experiments in
which only one input variable is varied; that variable is then the one that causes variation in the
output variables. This feedback is combined with the feedback that was generated by
the general experiment heuristics.
Learners can also decide to fit a function through their experiments, and if possi-
ble, a fit will be calculated for each of the experiment sets. These fits will be added to
the graph, and additional feedback will be generated and presented to the learner.
This additional feedback consists of a calculated estimation of the fit and more elabo-
rate feedback from the specific experiment heuristics. The estimation of the fit
Fig. 3. Example of a graph with heuristic feedback based on the experiments in Figure 2
learner-modeling tool in the sense that it keeps and updates a persistent model of the
learner's knowledge, but it is in the sense that it interprets the behavior of the learner
and uses this interpretation to provide individualized and contextualized feedback to
the learner. The fact that the tool uses both formal and heuristic methods makes it
broader in its scope than a purely formal tool.
In relation to the goal for the tool and the constraints it can be concluded that:
1. The tool can support testing hypotheses and drawing conclusions. Sorting the
experiments into sets that are informative for the relation in the graph, drawing
these sets as separate plots, generating feedback on experimentation, and generat-
ing feedback that can help the learner in the design and analysis of the experi-
ments, supports hypothesis testing. Drawing separate plots, and presenting an es-
timated fit for a fitted function supports drawing conclusions.
2. It leaves room for the learners to explore. The tool leaves learners free to set up
their own experiments, to draw graphs, and to fit relations through these graphs,
thus leaving room for the learners to explore the relation between variables in the
simulation.
3. It is able to operate within the context of the authoring environment. The tool is
designed as a self-standing tool, and can be used as such. It does not have depend-
encies other than a dependency on the central manager of the simulation model.
References
1. Bruner, J. S. (1961). The act of discovery. Harvard Educational Review, 31, 21-32.
2. Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical
synthesis. Review of Educational Research, 65, 245-281.
3. Dewey, J. (1938). Logic: the theory of inquiry. New York: Holt and Co.
4. Glaser, R., Schauble, L., Raghavan, K., & Zeitz, C. (1992). Scientific reasoning across
different domains. In E. De Corte, M. Linn, H. Mandl, & L. Verschaffel (Eds.),
Computer-based learning environments and problem solving (pp. 345-373). Berlin:
Springer-Verlag.
5. Joolingen, W. R. van, & Jong, T. de (2003). SimQuest: Authoring educational
simulations. In T. Murray, S. Blessing & S. Ainsworth (Eds.), Authoring Tools for
Advanced Technology Educational Software: Toward cost-effective production of
adaptive, interactive, and intelligent educational software. Lawrence Erlbaum
6. Karasavvidis, I. (1999). Learning to solve correlational problems. A study of the social
and material distribution of cognition. PhD Thesis. Enschede, The Netherlands:
University of Twente.
7. Klahr, D., & Dunbar, K. (1988). Dual space search during scientific reasoning. Cognitive
Science, 12, 1-48.
8. Klahr, D., Fay, A. L., & Dunbar, K. (1993), Heuristics for scientific experimentation: A
developmental study. Cognitive Psychology, 25, 111-146.
9. Kulkarni, D., & Simon, H. A. (1988). The processes of scientific discovery: The strategy
of experimentation. Cognitive Science, 12, 139-175.
10. Langley, P. (1981). Data-Driven discovery of physical laws. Cognitive Science, 5, 31-54.
11. Qin, Y., & Simon, H. A. (1990). Laboratory replication of scientific discovery processes.
Cognitive Science, 14, 281-312.
12. Sanders, I., Bouwmeester, M., & Blanken, M. van (2000). Heuristieken voor
experimenteren in ontdekkend leeromgevingen. Unpublished report.
13. Schoenfeld, A. (1979). Can heuristics be taught? In J. Lochhead & J. Clement (Eds.),
Cognitive process instruction (pp. 315-338 ). Philadelphia: Franklin Institute Press.
14. Schunn, C. D., & Anderson, J. R. (1999). The generality/specificity of expertise in
scientific reasoning. Cognitive Science, 23, 337-370.
15. Tsirgi, J. E. (1980). Sensible reasoning: A hypothesis about hypotheses. Child
Development, 51, 1-10.
16. Veermans, K., & Joolingen, W. R. van (1998). Using induction to generate feedback in
simulation-based discovery learning environments. In B. P. Goettl, H. M. Halff, C. L.
Redfield, & V. J. Shute (Eds.), Intelligent Tutoring Systems, 4th International Conference,
San Antonio, TX USA (pp. 196-205). Berlin: Springer-Verlag.
17. Veermans, K., Joolingen, W. R. van, & Jong, T. de (2000). Promoting self directed
learning in simulation based discovery learning environments through intelligent support.
Interactive Learning Environments 8, 229-255.
18. Veermans, K., Joolingen, W. R. van, & Jong, T. de (submitted). Using Heuristics to
Facilitate Discovery Learning in a Simulation Learning Environment in a Physics
Domain.
19. Zachos, P., Hick, L. T., Doane, W. E. J., & Sargent, S. (2000). Setting theoretical and
empirical foundations for assessing scientific inquiry and discovery in educational
programs. Journal of Research in Science Teaching. 37, 938-962.
Toward Tutoring Help Seeking
Applying Cognitive Modeling to Meta-cognitive Skills
Abstract. The goal of our research is to investigate whether a Cognitive Tutor can
be made more effective by extending it to help students acquire help-seeking skills.
We present a preliminary model of help-seeking behavior that will provide the
basis for a Help-Seeking Tutor Agent. The model, implemented by 57 production
rules, captures both productive and unproductive help-seeking behavior. As a first
test of the model’s efficacy, we used it off-line to evaluate students’ help-seeking
behavior in an existing data set of student-tutor interactions. We found that 72%
of all student actions represented unproductive help-seeking behavior. Consistent
with some of our earlier work (Aleven & Koedinger, 2000) we found a proliferation
of hint abuse (e.g., using hints to find answers rather than trying to understand).
We also found that students frequently avoided using help when it was likely to
be of benefit and often acted in a quick, possibly undeliberate manner. Students’
help-seeking behavior accounted for as much variance in their learning gains as
their performance at the cognitive level (i.e., the errors that they made with the
tutor). These findings indicate that the help-seeking model needs to be adjusted, but
they also underscore the importance of the educational need that the Help-Seeking
Tutor Agent aims to address.
1 Introduction
Meta-cognition is a critical skill for students to develop and an important area of focus
for learning researchers. This, in brief, was one of three broad recommendations in a
recent influential volume entitled “How People Learn,” in which leading researchers
survey state-of-the-art research on learning and education (Bransford, Brown, & Cock-
ing, 2000). A number of classroom studies have shown that instructional programs with
a strong focus on meta-cognition can improve students’ learning outcomes (Brown &
Campione, 1996; Palincsar & Brown, 1984; White & Frederiksen, 1998). An important
question therefore is whether instructional technology can be effective in supporting
meta-cognitive skills. A small number of studies have shown that indeed it can. For ex-
ample, it has been shown that self-explanation, an important metacognitive skill, can be
supported with a positive effect on the learning of domain-specific skills and knowledge
(Aleven & Koedinger, 2002; Conati & VanLehn, 2000; Renkl, 2002; Trafton & Trickett,
2001).
This paper focuses on a different meta-cognitive skill: help seeking. The ability to
solicit help when needed, from a teacher, peer, textbook, manual, on-line help system,
or the Internet may have a significant influence on learning outcomes. Help seeking has
been studied quite extensively in social contexts such as classrooms (Karabenick, 1998).
In that context, there is evidence that better help seekers have better learning outcomes,
and that those who need help the most are the least likely to ask for it (Ryan et al., 1998).
Help seeking has been studied to a lesser degree in interactive learning environments.
Given that many learning environments provide some form of on-demand help, it might
seem that proficient help use would be an important factor influencing the learning
results obtained with these systems. However, there is evidence that students tend not
to effectively use the help facilities offered by learning environments (for an overview,
see Aleven, Stahl, Schworm, Fischer & Wallace, 2003). On the other hand, there is also
evidence that when used appropriately, on-demand help can have a positive impact on
learning (Renkl, 2000; Schworm & Renkl, 2002; Wood, 2001; Wood & Wood, 1999)
and that different types of help (Dutke & Reimer, 2000) or feedback (McKendree, 1990;
Arroyo et al., 2001) affect learning differently.
Our project focuses on the question of whether instructional technology can help
students become better help seekers and, if so, whether they learn better as a result. Luckin
and Hammerton (2002) reported some interesting preliminary evidence with respect to
“meta-cognitive scaffolding.” We are experimenting with the effects of computer-based
help-seeking support in the context of Cognitive Tutors. This particular type of intelligent
tutor is designed to support “learning by doing” and features a cognitive model of the
targeted skills, expressed as production rules (Anderson, Corbett, Koedinger, & Pelletier,
1995). Cognitive Tutors for high-school mathematics have been highly successful in
raising students’ test scores and are being used in 1700 schools nationwide (Koedinger,
Anderson, Hadley, & Mark, 1997).
As a first step toward a Help-Seeking Tutor Agent, we are developing a model of the
help-seeking behavior that students would ideally exhibit as they work with the tutor.
The model is implemented as a set of production rules, just like the cognitive models of
Cognitive Tutors. The Help-Seeking Tutor Agent will use the model, applying its model-
tracing algorithm at the meta-cognitive level to provide feedback to students on the way
they use the tutor’s help facilities. In this paper, we present an initial implementation of
the model. We report results of an exploratory analysis, aimed primarily at empirically
validating the model, in which we investigated, using an existing data set, to what extent
students’ help-seeking behavior conforms to the model and whether model conformance
is predictive of learning.
The Geometry Cognitive Tutor offers two different types of help on demand. At the
student’s request, context-sensitive hints are provided at multiple levels of detail. This
help is tailored toward the student’s specific goal within the problem at hand, with each
hint providing increasingly specific advice. The Geometry Cognitive Tutor also provides
a less typical source of help in the form of a de-contextualized Glossary. Unlike hints,
the Glossary does not tailor its help to the user’s goals; rather, at the student’s request, it
displays information about a selected geometry rule (i.e., a theorem or definition). It is
up to the student to search for potentially relevant rules in the Glossary and to evaluate
which rule is applicable to the problem at hand.
Cognitive Tutors keep track of a student’s knowledge growth over time by means
of a Bayesian algorithm called knowledge tracing (Corbett & Anderson, 1995). At each
problem-solving step, the tutor updates its estimates of the probability that the student
knows the skills involved in that step, according to whether the student was able to
complete the step without errors and hints. A Cognitive Tutor uses the estimates of skill
mastery to select problems and make pacing decisions on an individual basis. These
estimates also play a role in the model of help seeking, presented below.
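As a concrete illustration, the knowledge-tracing update can be sketched as follows. This follows the standard Corbett & Anderson (1995) formulation; the guess, slip, and learning-rate values shown are placeholder assumptions, not the Geometry Cognitive Tutor's actual parameters.

```python
def knowledge_tracing_update(p_known, correct,
                             p_guess=0.2, p_slip=0.1, p_transit=0.15):
    """One knowledge-tracing step: update the probability that the student
    knows a skill, given whether the step was completed without errors or
    hints (correct=True) or not (correct=False)."""
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Allow for the chance that the skill was learned at this opportunity.
    return posterior + (1 - posterior) * p_transit


estimate = 0.3
for outcome in [True, False, True]:   # a sequence of step outcomes
    estimate = knowledge_tracing_update(estimate, outcome)
    print(round(estimate, 3))
```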
Fig. 2. A model of help-seeking behavior (The asterisks indicate examples of where violations
of the model can occur. To be discussed later in the paper.)
to address this issue, since it does not track the effect of learning over time. Instead, as a
starting point to address these questions, we use the estimates of an individual student’s
skill mastery derived by the Cognitive Tutor's knowledge-tracing algorithm. The tests
Familiar at all? and Sense of what to do? compare these estimates against pre-defined
thresholds. So, for instance, if a student’s current estimated level for the skill involved
in the given step is 0.4, our model assumes Familiar at all? = YES, since the threshold
for this question is 0.3. For Sense of what to do?, the threshold is 0.6. These values
are intuitively plausible but need to be validated empirically. One of the goals of our
experiments with the model, described below, is to evaluate and refine the thresholds.
The tests Clear how to fix? and Hint helpful? also had to be rendered more concrete.
For the Clear how to fix? test, the help-seeking model prescribes that a student with a
higher estimated skill level (for the particular skill involved in the step, at the particular
point in time that the step is tried) should re-try a step after missing it once, but that mid-
or low-skilled students should ask for a hint. In the future we plan to elaborate Clear
how to fix? by using heuristics that catch some of the common types of easy-to-fix slips
that students make. Our implementation of Hint Helpful? assumes that the amount of
help a student needs on a particular step depends on their skill level for that step. Thus,
a high-skill student, after requesting a first hint, is predicted to need 1/3 of the available
hint levels, a mid-skill student 2/3 of the hints, and a low-skill student all of the hints.
However, this is really a question of reading comprehension (or self-monitoring thereof).
In the future we will use basic results from the reading comprehension literature and
also explore the use of tutor data to estimate the difficulty of understanding the tutor’s
hints.
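One way the thresholds and hint fractions described above might translate into code is sketched below. The thresholds (0.3 and 0.6) and the 1/3, 2/3, and "all hints" fractions are taken from the text; the function names and the rounding choice are illustrative assumptions only.

```python
import math

def recommended_action(p_skill, low=0.3, high=0.6):
    """Map the tutor's mastery estimate onto the help-seeking model's advice."""
    if p_skill < low:          # not familiar at all: ask for a hint
        return "ask hint"
    elif p_skill < high:       # vaguely familiar: consult the Glossary
        return "use glossary"
    else:                      # has a sense of what to do: try the step
        return "try step"

def expected_hint_levels(p_skill, n_hint_levels, low=0.3, high=0.6):
    """Predicted number of hint levels needed after a first hint request."""
    if p_skill >= high:
        fraction = 1 / 3       # high-skill student: about a third of the hints
    elif p_skill >= low:
        fraction = 2 / 3       # mid-skill student: about two thirds
    else:
        fraction = 1.0         # low-skill student: all available hints
    return math.ceil(fraction * n_hint_levels)

print(recommended_action(0.4), expected_hint_levels(0.4, n_hint_levels=6))
```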
3.2 Implementation
The rule chain that matched the students' behavior is highlighted. This chain in-
cludes an initial rule that starts the meta-cognitive cycle (“start-new-metacog-cycle”),
a subsequent bug rule that identifies the student as having acted too quickly
(“bug1-think-about-step-quickly”), a second bug rule that indicates that the student
was not expected to try the step, given her low mastery of the skill at that point in time
(“bug1-try-step-low-skill”), and, finally, a rule that reflects the fact that the student
answered incorrectly (“bug-tutor-says-step-wrong”). The feedback message in this case,
compiled from the two bug rules identified in the chain, is: “Slow down, slow down.
No need to rush. Perhaps you should ask for a hint, as this step might be a bit difficult
for you.” The bug rules corresponding to the student acting too quickly and trying the
step when they should not have are shown in Figure 4.
Fig. 3. A chain of rules in the Meta-Cognitive Model
The fact that the student got the answer wrong is not in itself considered to be a
meta-cognitive error, even though it is captured in the model by a bug rule (“bug-tutor-
says-step-wrong”). This bug rule merely serves to confirm the presence of bugs captured
by other bug rules, when the student’s answer (at the cognitive level) is wrong. Further,
when the student's answer is correct (at the cognitive level), no feedback is given at the
meta-cognitive level, even if the student’s behavior was not ideal from the point of view
of the help-seeking model.
The help-seeking model uses information passed from the cognitive model to perform
its reasoning. For instance, the skill involved in a particular step, the estimated mastery
level of a particular student for that skill, the number of hints available for that step,
and whether or not the student got the step right, are passed from the cognitive to the
meta-cognitive model. Meta-cognitive model tracing takes place after cognitive model
tracing. In other words, when a student enters a value to the tutor, that value is first
evaluated at the cognitive level before it is evaluated at the meta-cognitive level. An
important consideration in the development of the Help-Seeking Tutor was to make it
modular and usable in conjunction with a variety of Cognitive Tutors. Basically, the
Help-Seeking Tutor Agent will be a plug-in agent applicable to a range of Cognitive
Tutors with limited customization. We have attempted to create rules that are applicable
to any Cognitive Tutor, not to a specific tutor. Certainly, there will be some need for
customization, as optional supporting tools (of which the Glossary is but one example)
will be available in some tutors and not others.
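As a purely schematic illustration of this hand-off, the sketch below evaluates one student attempt using values passed down from the cognitive model. It mirrors the bug rules named in the example above, but the thresholds and the procedural form are invented for illustration; the actual model consists of 57 production rules traced by the model-tracing algorithm.

```python
def trace_metacognitive_step(skill_mastery, thinking_time, correct,
                             min_think_time=5.0, low_skill=0.3):
    """Evaluate one student attempt at the meta-cognitive level, using values
    handed over from the cognitive model, and return the matched bug rules."""
    bugs = []
    if thinking_time < min_think_time:        # the student acted too quickly
        bugs.append("bug1-think-about-step-quickly")
    if skill_mastery < low_skill:             # should have asked for help instead
        bugs.append("bug1-try-step-low-skill")
    if not correct and bugs:
        # A wrong answer is not a meta-cognitive error in itself; this rule
        # merely confirms the bugs detected above.
        bugs.append("bug-tutor-says-step-wrong")
    return bugs

# The situation from the example above: a hasty, low-mastery, incorrect attempt.
print(trace_metacognitive_step(skill_mastery=0.2, thinking_time=1.5, correct=False))
```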
First, the category “Help Abuse” represents situations in which the student asks for a hint when skilled
enough to either try the step (“Ask Hint when Skilled Enough to Try Step”) or use the
Glossary (“Ask Hint when Skilled Enough to Use Glossary”), or when a student overuses
the Glossary (“Glossary Abuse”). Recall from the flow chart in Figure 2 that a student
with high mastery for the skill in question should first try the step, a student with medium
mastery should use the Glossary, and a student with low mastery should ask for a hint.
Second, the category “Try-Step Abuse” represents situations in which the student
attempts to hastily solve a step and gets it wrong, either when sufficiently skilled to try
the step (“Try Step Too Fast”) or when less skilled (“Guess Quickly when Help Use was
Appropriate”).
Third, situations in which the student could benefit from asking for a hint or inspecting
the Glossary, but chose to try the step instead, are categorized as “Help Avoidance”. There
are two bugs of this type – “Try Unfamiliar Step Without Hint Use” and “Try Vaguely
Familiar Step Without Glossary Use.”
Finally, the category of “Miscellaneous Bugs” covers situations not represented in the
other high-level categories. The “Read Problem Too Fast” error describes hasty reading
of the question when it is first encountered, followed by a rapid help request. “Ask for Help
Too Fast” describes a similar situation in which the student asks for help too quickly
after making an error. The “Used All Hints and Still Failing” bug represents situations in
which the student has seen all of the hints, yet cannot solve the step (i.e., the student has
failed more than a threshold number of times). In our implemented model, the student
is advised to talk to the teacher in this situation.
In general, if the student gets the step right at the cognitive level, we do not consider
a meta-cognitive bug to have occurred, regardless of whether the step was hasty or the
student’s skill level was inappropriate.
Fig. 5. A taxonomy of help-seeking bugs. The percentages indicate how often each bug occurred
in our experiment.
The data used in the analysis were collected during an earlier study in which we
compared the learning results of students using two tutor versions, one in which they
explained their problem-solving steps by selecting the name of the theorem that jus-
tifies it and one in which the students solved problems without explaining (Aleven &
Koedinger, 2002). For purposes of the current analysis, we group the data from both
conditions together. Students spent approximately 7 hours working on this unit of the
tutor. The protocols from interaction with the tutor include data from 49 students, 40
of whom completed both the Pre- and Post-Tests. These students performed a total of
approximately 47,500 actions related to skills tracked by the tutor.
The logs of the student-tutor interactions were replayed, with each student action
(either an attempt at answering, a request for a hint, or the inspection of a Glossary item)
checked against the predictions of the help-seeking model. Actions that matched the
model’s predictions were recorded as “correct” help-seeking behavior, actions that did
not match the model’s predictions as “buggy” help-seeking behavior. The latter actions
were classified automatically with respect to the bug taxonomy of Figure 5, based on the
bug rules that were matched. We computed the frequency of each bug category (shown
in Figure 5) and each category's correlation with learning gains. The learning gains (LG)
were computed from the pre- and post-test scores according to the formula
LG = (Post - Pre) / (1 - Pre) (mean 0.41, standard deviation 0.28).
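The learning-gain measure and the bug-frequency tallies are straightforward to express; the sketch below assumes each logged action has already been labelled with the bug category (if any) assigned by the model, which is an illustrative simplification of the replay procedure.

```python
from collections import Counter

def learning_gain(pre, post):
    """LG = (Post - Pre) / (1 - Pre), with test scores expressed as proportions."""
    return (post - pre) / (1 - pre)

def bug_frequencies(labelled_actions):
    """labelled_actions: one entry per student action, holding the bug-category
    name assigned by the model, or None for 'correct' help-seeking behavior."""
    counts = Counter(a for a in labelled_actions if a is not None)
    total = len(labelled_actions)
    return {category: n / total for category, n in counts.items()}

print(learning_gain(pre=0.30, post=0.65))                         # 0.5
print(bug_frequencies(["Help Abuse", None, "Help Abuse", None]))  # {'Help Abuse': 0.5}
```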
The overall ratio of help-seeking errors to all actions was 72%; that is, 72% of the
students’ actions did not conform to the help-seeking model. The most frequent errors
at the meta-cognitive level were Help Abuse (37%), with the majority of these being
“Clicking Through” hints (33%). The next most frequent category was Try Step Abuse
(18%), which represents quick attempts at answering steps. Help Avoidance – not using
help at moments when it was likely to be beneficial – was also quite frequent (11%),
especially if “Guess quickly when help was needed” (7%), arguably a form of Help
Avoidance as well as Try-Step Abuse, is included in both categories.
The frequency of help-seeking bugs was correlated strongly with the students’ overall
learning (r= –0.61 with p < .0001), as shown in Table 1. The model therefore is a good
predictor of learning gains – the more help-seeking bugs students make, the less likely
they are to learn. The correlation between students’ frequency of success at the cognitive
level (computed as the percentage of problem steps that the student completed without
errors or hints from the tutor) and learning gain is about the same (r = .58, p = .0001)
as the correlation between help-seeking bugs and learning. Success in help seeking
and success at the cognitive level were highly correlated (r = .78, p < .0001). In a
multiple regression, the combination of help-seeking errors and errors at the cognitive
level accounted only for marginally more variance than either one alone. We also looked
at how the bug categories correlated with learning (also shown in Table 1). Both Help
Abuse and Miscellaneous Bugs were negatively correlated with learning with p < 0.01.
These bug categories have in common that the students avoid trying to solve the step. On
the other hand, Try Step Abuse and Help Avoidance were not correlated with learning.
6 Discussion
Our analysis sheds light on the validity of the help-seeking model and the adjustments we
must make before we use it for “live” tutoring. The fact that some of the bug categories of
the model correlate negatively with learning provides some measure of confidence that
the model is on the right track. The correlation between Hint Abuse and Miscellaneous
Bugs and students’ learning gain supports our assumption that the help-seeking model
is valid in identifying these phenomena. On the other hand, the model must be more
lenient with respect to help-seeking errors. The current rate of 72% implies that the
Help-Seeking Tutor Agent would intervene (i.e., present a bug message) in 3 out of
every 4 actions taken by a student. In practical use, this is likely to be quite annoying and
distracting to the student. Another finding that may lead to a change in the model is the
fact that Try-Step Abuse did not correlate with learning. Intuitively, it seems plausible
that a high frequency of incorrect guesses would be negatively correlated with learning.
Perhaps the threshold we used for “thinking time” is too high; perhaps it should
depend on the student's skill level. This will require further investigation. Given that the
model is still preliminary and under development, the findings on students’ help seeking
should also be regarded as subject to further investigation.
The finding that students often abuse hints confirms earlier work (Aleven &
Koedinger, 2000; Aleven, McLaren, & Koedinger, to appear; Baker, Corbett, &
Koedinger, in press). The current analysis extends that finding by showing that help
abuse is frequent relative to other kinds of help-seeking bugs and that it correlates neg-
atively with learning. However, the particular rate that was observed (37%) may be
inflated somewhat because of the high frequency of “Clicking Through Hints” (33%).
Since typically 6 to 8 hint levels were available, a single “clicking-through” episode –
selecting hints until the “bottom out” or answer hint is seen – yields multiple actions
in the data. One would expect to see a different picture if the clicking episodes were
clustered into a single action.
Several new findings emerged from our empirical study. As mentioned, a high help-
seeking error rate was identified (72%). To the extent that the model is correct, this
suggests that students generally do not have good help-seeking skills. We also found a
relatively high Help Avoidance rate, especially if we categorize “Guess Quickly when
Help Use was Appropriate” as a form of Help Avoidance (18% combined). In addition,
since the frequency of the Help Abuse category appears to be inflated by the high preva-
lence of Clicking Through Hints, categories such as Help Avoidance are correspondingly
deflated. The significance of this finding is not yet clear, since Help Avoidance did not
correlate with learning. It may well be that the model does not yet successfully identify
instances in which the students should have asked for help but did not. On the other
hand, the gross abuse of help in the given data set is likely to have lessened the impact of
Help Avoidance. In other words, given that the Help Avoidance in this data set was really
Help Abuse avoidance, the lack of correlation with learning is not surprising and should
not be interpreted as meaning that help avoidance is not a problem or has no impact on
learning. Future experiments with the Help-Seeking Tutor Agent may cast some light on
the importance of help avoidance, in particular if the tutor turns out to reduce the Help
Avoidance rate.
It must be said that we are just beginning to analyze and interpret the data. For
instance, we are interested in obtaining a more detailed insight into and understanding
of Help Avoidance. Under what specific circumstances does this occur? We also intend
to investigate in greater detail how students so often get a step right even when they
answer too quickly, according to the model. Finally, how different would the results
look if clicking through hints is considered a single mental action?
7 Conclusion
We have presented a preliminary model of help seeking which will form the basis of
a Help-Seeking Tutor Agent, designed to be seamlessly added to existing Cognitive
Tutors. To validate the model, we have run it against pre-existing tutor data. This analysis
suggests that the model is on the right track, but is not quite ready for “live” tutoring, in
particular because it would lead to feedback on as much as three-fourths of the students’
actions, which is not likely to be productive. Although the model is still preliminary,
the analysis also sheds some light on students’ help-seeking behavior. It confirms earlier
findings that students’ help-seeking behavior is far from ideal and that help-seeking
errors correlate negatively with learning, underscoring the importance of addressing
help-seeking behavior by means of instruction.
The next step in our research will be to continue to refine the model, testing it
against the current and other data sets, and modifying it so that it will be more selective
in presenting feedback to students. In the process, we hope to gain a better understanding,
for example, of the circumstances under which quick answers are fine or under which
help avoidance is most likely to be harmful. Once the model gives satisfactory results
when run against existing data sets, we will use it for live tutoring, integrating the Help-
Seeking Tutor Agent with an existing Cognitive Tutor. We will evaluate whether students'
help-seeking skill improves when they receive feedback from the Help-Seeking Tutor
Agent and whether they obtain better learning outcomes. We will also evaluate whether
better help-seeking behavior persists beyond the tutor units in which the students are
exposed to the Help-Seeking Tutor Agent and whether students learn better in those units
as a result. A key hypothesis is that the Help-Seeking Tutor Agent will help students to
become better learners.
Acknowledgments. The research reported in this paper is supported by NSF Award No.
IIS-0308200.
References
Aleven V. & Koedinger, K. R. (2002). An effective meta-cognitive strategy: Learning by doing
and explaining with a computer-based Cognitive Tutor. Cognitive Science, 26(2), 147-179.
Aleven, V., & Koedinger, K. R. (2000). Limitations of Student Control: Do Students Know when
they need help? In G. Gauthier, C. Frasson, & K. VanLehn (Eds.), Proceedings of the 5th
International Conference on Intelligent Tutoring Systems, ITS 2000 (pp. 292-303). Berlin:
Springer Verlag.
Aleven, V., McLaren, B. M., & Koedinger, K. R. (to appear). Towards Computer-Based Tutoring
of Help-Seeking Skills. In S. Karabenick & R. Newman (Eds.), Help Seeking in Academic
Settings: Goals, Groups, and Contexts. Mahwah, NJ: Erlbaum.
Aleven, V., Stahl, E., Schworm, S., Fischer, F., & Wallace, R.M. (2003). Help Seeking in Interactive
Learning Environments. Review of Educational Research, 73(2), 277-320.
Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons
learned. The Journal of the Learning Sciences, 4, 167-207.
Arroyo, I., Beck, J. E., Beal, C. R., Wing, R., & Woolf, B. P. (2001). Analyzing students’ response to
help provision in an elementary mathematics intelligent tutoring system. In R. Luckin (Ed.),
Papers of the AIED-2001 Workshop on Help Provision and Help Seeking in Interactive
Learning Environments (pp. 34-46).
Baker, R. S., Corbett, A. T., & Koedinger, K. R. (in press). Detecting Student Misuse of Intelligent
Tutoring Systems. In Proceedings of the 7th International Conference on Intelligent Tutoring
Systems. ITS 2004.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.) (2000). How People Learn: Brain, Mind,
Experience, and School. Washington, DC: National Academy Press.
Brown, A. L., & Campione, J. C. (1996). Guided Discovery in a Community of Learners. In K.
McGilly (Ed.), Classroom Lessons: Integrating Cognitive Theory and Classroom Practice
(pp. 229-270). Cambridge, MA: The MIT Press.
Conati C. & VanLehn K. (2000). Toward computer-based support of meta-cognitive skills: A
computational framework to coach self-explanation. International Journal of Artificial In-
telligence in Education, 11, 398-415.
Corbett, A. T. & Anderson, J. R. (1995). Knowledge tracing: Modeling the acquisition of procedural
knowledge. User Modeling and User-Adapted Interaction, 4, 253-278.
Dutke, S., & Reimer, T. (2000). Evaluation of two types of online help information for application
software: Operative and function-oriented help. Journal of Computer-Assisted Learning, 16,
307-315.
Hambleton, R. K. & Swaminathan, H. (1985). Item Response Theory: Principles and Applications.
Boston: Kluwer.
Karabenick, S. A. (Ed.) (1998). Strategic help seeking. Implications for learning and teaching.
Mahwah: Erlbaum.
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring
goes to school in the big city. International Journal of Artificial Intelligence in Education,
8, 30-43.
Koedinger, K. R., Corbett, A. T., Ritter, S., & Shapiro, L. (2000). Carnegie Learning’s Cognitive
Tutor™: Summary Research Results. White paper. Available from Carnegie Learning Inc.,
1200 Penn Avenue, Suite 150, Pittsburgh, PA 15222, E-mail: [email protected],
Web: http://www.carnegielearning.com
Luckin, R., & Hammerton, L. (2002). Getting to know me: Helping learners understand their
own learning needs through meta-cognitive scaffolding. In S. A. Cerri, G. Gouardères, &
F. Paraguaçu (Eds.), Proceedings of Sixth International Conference on Intelligent Tutoring
Systems, ITS 2002 (pp. 759- 771). Berlin: Springer.
McKendree, J. (1990). Effective feedback content for tutoring complex skills. Human Computer
Interaction, 5, 381-413.
Nelson-LeGall, S. (1981). Help-seeking: An understudied problem-solving skill in children. De-
velopmental Review, 1, 224-246.
Newman, R. S. (1994). Adaptive help seeking: a strategy of self-regulated learning. In D. H.
Schunk & B. J. Zimmerman (Eds.), Self-regulation of learning and performance: Issues and
educational applications (pp. 283-301). Hillsdale, NJ: Erlbaum.
Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and
comprehension monitoring activities. Cognition and Instruction, 1, 117-175.
Renkl, A. (2002). Learning from worked-out examples: Instructional explanations supplement
self-explanations. Learning and Instruction, 12, 529-556.
Ryan, A. M., Gheen, M. H. & Midgley, C. (1998), Why do some students avoid asking for help? An
examination of the interplay among students’ academic efficacy, teachers’ social-emotional
role, and the classroom goal structure. Journal of Educational Psychology, 90(3), 528-535.
Schworm, S. & Renkl, A. (2002). Learning by solved example problems: Instructional explanations
reduce self-explanation activity. In W. D. Gray & C. D. Schunn (Eds.), Proceeding of the 24th
Annual Conference of the Cognitive Science Society (pp.816-821). Mahwah, NJ: Erlbaum.
Trafton, J.G., & Trickett, S.B. (2001). Note-taking for self-explanation and problem solving.
Human-Computer Interaction, 16, 1-38.
White, B., & Frederiksen, J. (1998). Inquiry, modeling, and metacognition: Making science ac-
cessible to all students. Cognition and Instruction, 16(1), 3-117.
Wood, D. (2001). Scaffolding, contingent tutoring, and computer-supported learning. International
Journal of Artificial Intelligence in Education, 12.
Wood, H., & Wood, D. (1999). Help seeking, learning and contingent tutoring. Computers and
Education, 33, 153-169.
Why Are Algebra Word Problems Difficult? Using
Tutorial Log Files and the Power Law of Learning to
Select the Best Fitting Cognitive Model
Abstract. Some researchers have argued that algebra word problems are diffi-
cult for students because they have difficulty in comprehending English. Oth-
ers have argued that because algebra is a generalization of arithmetic, and gen-
eralization is hard, it's the use of variables, per se, that causes difficulty for stu-
dents. Heffernan and Koedinger [9] [10] presented evidence against both of
these hypotheses. In this paper we present how to use tutorial log files from an
intelligent tutoring system to try to contribute to answering such questions. We
take advantage of the Power Law of Learning, which predicts that error rates
should fit a power function, to try to find the best fitting mathematical model
that predicts whether a student will get a question correct. We decompose the
question of “Why are Algebra Word Problems Difficult?” into two pieces.
First, is there evidence for the existence of this articulation skill that Heffernan
and Koedinger argued for? Secondly, is there evidence for the existence of the
skill of “composed articulation” as the best way to model the “composition ef-
fect” that Heffernan and Koedinger discovered?
1 Introduction
Many researchers had argued that students have difficulty with algebra word-problem
symbolization (writing algebra expressions) because they have trouble comprehend-
ing the words in an algebra word problem. For instance, Nathan, Kintsch, & Young
[14] “claim that [the] symbolization [process] is a highly reading-oriented one in
which poor comprehension and an inability to access relevant long term knowledge
leads to serious errors.” [emphasis added]. However, Heffernan & Koedinger [9] [10]
showed that many students can do compute tasks well, whereas they have great diffi-
culty with the symbolization tasks [See Table 1 for examples of compute and symboli-
zation types of questions]. They showed that many students could comprehend the
words in the problem, yet still could not do the symbolization. An alternative expla-
nation for “Why Are Algebra Word Problems Difficult?” is that the key is the use of
variables. Because algebra is a generalization of arithmetic, and it’s the variables that
allow for this generalization, it seems to make sense that it’s the variables that make
algebra symbolization hard.
However, Heffernan & Koedinger presented evidence that cast doubt on this as an
important explanation. They showed there is hardly any difference between students’
performance on articulation (see Table 1 for an example) versus symbolization tasks,
arguing against the idea that the hard part is the presence of the variable per se.
Instead, Heffernan & Koedinger hypothesized that a key difficulty for students was
in articulating arithmetic in the “foreign” language of algebra. They hypothesized the
existence of a skill for articulating one step in an algebra word problem. This articu-
lation step requires that a student be able to say (or “articulate”) how it is they would
do a computation, without having to actually do the arithmetic. Surprisingly, they found
that it was easier for a student to actually do the arithmetic than to articulate what
they did in an expression. To articulate successfully, a student has to be able to write
in the language of algebra. Question 1 for this paper is “Is there evidence from tuto-
rial log files that supports the conjecture that the articulation skill really exists?”
In addition to conjecturing the existence of the skill for articulating a single step,
Heffernan & Koedinger also reported what they called the “composition effect”
which we will also try to model. Heffernan & Koedinger took problems requiring two
mathematical steps and made two new questions, where each question assessed each
of the steps independently. They found that the difficulty of the one two-operator
problem was much more than the combined difficulty of the two one-operator prob-
lems taken together. They termed this the composition effect. This led them to
speculate as to what the “hidden” difficulty was for students that explained this dif-
ference in performance. They argued that the hidden difficulty included knowledge
of composition of articulation. Heffernan & Koedinger attempted to argue that the
composition effect was due to difficulties in articulating rather than in the task of
comprehending, or at the symbolization step when a variable is called for. In this
paper we will compare these hypotheses to try to determine where the composition
effect originates. We refer to this as Question 2.
Heffernan & Koedinger’s arguments were based upon two different samplings of
about 70 students. Students’ performances on different types of items were analyzed.
Students were not learning during the assessment so there was no need to model
learning. Heffernan & Koedinger went on to create an intelligent tutoring system,
“Ms. Lindquist”, to teach students how to do similar problems. In this paper we at-
tempt to use tutorial log file data collected from this tutor to shed light on this contro-
versy. The technique we present is useful for intelligent tutoring system designers as
it shows a way to use log file data to refine the mathematical models we use in pre-
dicting whether a student will get an item correct. For instance, Corbett and Ander-
son describe how to use “knowledge tracing” to track students' performance on items
related to a particular skill, but all such work is based upon the idea that you know
what skills are involved already. But in this case there is controversy [15] over what
are the important skills (or more generally, knowledge components). Because Ms
Lindquist selects problems in a curriculum section randomly, we can learn what the
knowledge components are that are being learned. Without problem randomization
we would have no hope of separating out the effect of problem ordering with the
difficulty of individual questions.
In the following sections of this paper we present the investigations we did to look
into the existence of both the skills of articulation as well as composition of articula-
tion. In particular, we present mathematically predictive models of a student’s chance
of getting a question correct. It should be noted that such predictive models have many
other uses for intelligent tutoring systems, so this methodology is broadly applicable.
degree that a student learns (i.e., receives practice at employing) the comprehending
one-step KC. We can turn this qualitative observation into a quantified prediction
method by treating each knowledge component as having a difficulty parameter and a
learning parameter. This is where we take advantage of the Power Law of Learning,
which is one of the most robust findings in cognitive psychology. The power law says
that the performance of cognitive skills improves approximately as a power function
of practice [16] [1]. This has been applied to error rates as well as to the time to com-
plete a task, but our use here will be with error rates. This can be stated mathematically
as follows:

error rate = b · x^(−d)
Where x represents the number of times the student has received feedback on the
task, b represents a difficulty parameter related to the error rate on the first trial of the
task, and d represents a learning parameter related to the learning rate for the task.
Tasks that have large b values are difficult for students the first
time they try them (which could be due to the newness of the task or the inherent complexity of
the task). Tasks that have a large d coefficient are tasks on which student learning
is fast. Conversely, small values of d correspond to tasks on which students are slow to
improve.¹
¹ All learning parameters are restricted to be positive; otherwise the parameters would be
modeling some sort of forgetting effect.
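Assuming the power-law form given above (error rate = b · x^(−d)), the prediction can be computed directly, as in the brief sketch below; the parameter values in the example are illustrative only.

```python
import numpy as np

def predicted_error_rate(x, b, d):
    """Power-law prediction of the error rate after x feedback opportunities,
    with difficulty parameter b and (positive) learning parameter d."""
    return b * np.power(x, -d)

# With b = 0.6 and d = 0.5 (illustrative values), the predicted error rate
# drops from 60% on the first opportunity to 30% on the fourth.
print(predicted_error_rate(np.array([1, 2, 4]), b=0.6, d=0.5))
```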
The approach taken here is a variation of “learning factors analysis”, a semi-
automated method for using learning curve data to refine cognitive models [12]. In
this work, we follow Junker, Koedinger, & Trottini [11] in using logistic regression to
try to predict whether a student will get a question correct, based upon both item
factors (like what knowledge components are used for a given question, which is
what we are calling difficulty parameters), student factors (like a students pretest
score) and factors that depend on both students and items (like how many times this
particular students has practiced their particular knowledge component, which is what
we are calling learning parameters.) Corbett & Anderson [3], Corbett, Anderson &
O’Brien [4] and Draney, Pirolli, & Wilson [5] report results using the same and/or
similar methods as described above. There is also a great deal of related work in the
psychometric literature related to item response theory [6], but most of it is focused
on analyzing tests (e.g., the SAT or GRE) rather than student learning.
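One way such a regression might be set up is sketched below using the statsmodels library. The data are simulated and the factor names are invented, so this only illustrates the structure of the design matrix (a student factor, a difficulty parameter, and a learning parameter), not the actual Ms. Lindquist analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

# Simulated design matrix, one row per first attempt at a question.
# Student factor: pretest score. Item factor: indicator for a knowledge
# component required by the question (difficulty parameter). Student-item
# factor: how often that component has already received feedback (learning
# parameter). All names here are invented for this illustration.
pretest = rng.uniform(0, 1, n)
kc_articulate = rng.integers(0, 2, n)
practice_articulate = rng.integers(0, 6, n)

X = sm.add_constant(np.column_stack([pretest, kc_articulate, practice_articulate]))
logit_p = -0.5 + 1.5 * pretest - 1.0 * kc_articulate + 0.3 * practice_articulate
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))   # simulated correct/incorrect

result = sm.Logit(y, X).fit(disp=False)
print(result.params)   # intercept, student, difficulty, and learning coefficients
```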
1.3 Using the Transfer Model to Predict Transfer in Tutorial Log Files
Heffernan [7] created Ms. Lindquist, an intelligent tutoring system, and put it online
(www.algebratutor.org) and collected tutorial log files for all the students learning to
symbolize. For this research we selected a data set for which Heffernan [8] had pre-
viously reported evidence that students were learning during the tutoring sessions.
Some 73 students were brought to a computer lab to work with Ms. Lindquist for two
class periods totaling an average of about 1 hour of time for each student. We present
data from students working only on the second curriculum section, since the first
curriculum was too easy for students and showed no learning. (An example of this
dialog is shown in Table 2 and will be discussed shortly). This resulted in a set of log
files from 43 students, comprising 777 rows where each row represents a student’s
first attempt to answer a given question.
Why Are Algebra Word Problems Difficult? Using Tutorial Log Files 245
Table 1 shows an example of the sort of dialog Ms. Lindquist carries on with stu-
dents (this is with “made-up” student responses). Table 1 starts by showing a student
working on scenario identifier #1 (Column 1) and only in the last row (Row 20) does
the scenario identifier switch. Each word-problem has a single top-level question
which is always a symbolize question. If the student fails to get the top level question
correct, Ms. Lindquist steps in to have a dialog (as shown in the column) with the
student, asking questions to help break the problem down into simpler questions. The
combination of the second and third column indicates the question type. The second
column is for the Task Direction factor, where S=Symbolize, C=Compute and
A=Articulate. By crossing task direction and steps, there are six different question
types. The column defines what we call the attempt at a question type. The num-
ber appearing in the attempt column is the number of times the problem type has been
presented during the scenario. For example, the first time one of the six question
types is asked, the attempt for that question will be “1”. Notice how on row 7, the
attempt is “2” because it’s the second time a one-step compute question has been
asked for that scenario identifier. For another example see rows 3 and 7. Also notice
that on line 20 the attempt column indicates a first attempt at a two-step symbolize
problem for the new scenario identifier.
Notice that on row 5 and 7, the same question is asked twice. If the student did not
get the problem correct at line 7, Ms. Lindquist would have given a further hint,
presenting six possible choices for the answer. For our modeling purposes, we will
ignore the exact number of attempts the student had to make at any given question.
Only the first attempt in a sequence will be included in the data set. For example, this
is indicated in Table 1, in the row of the column, where the “F” for false indi-
cates that row will be excluded from the data set.
The column has the exact dialog that the student and tutor had. The and
columns are grouped together because they are both outcomes that we will try to
predict.² Columns 9-16 show what statisticians call the design matrix, which maps
the possible observations onto the fixed effect (independent) coefficients. Each of
these columns will get a coefficient in the logistic regression. Columns 9-12 show the
difficulty parameters, while columns 13-16 show the learning parameters. We only
list the four knowledge components of the Base+ Model, and leave out the four dif-
ferent ways to deal with composition. The difficulty parameters are simply the knowl-
edge components identified in the transfer model. The learning parameter is calcu-
lated by counting the number of previous attempts on which a particular knowledge
component has been learned (we assume learning occurs each time the system gives feedback on
a correct answer). Notice that these learning parameters are strictly increasing as we
move down the table, indicating that students’ performance should be monotonically
increasing.
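A sketch of how these practice counts could be accumulated while building the data set is given below; the row representation and field names are assumptions made for illustration, not the paper's actual data format.

```python
from collections import defaultdict

def attach_learning_parameters(rows):
    """Annotate each attempt (rows in chronological order) with the number of
    times each of its knowledge components has already received feedback on a
    correct answer. Each row is a dict with keys 'student', 'kcs' (the
    knowledge components the question requires) and 'correct'."""
    practice = defaultdict(int)   # (student, kc) -> prior correct-answer feedback
    for row in rows:
        row["learning_params"] = {kc: practice[(row["student"], kc)]
                                  for kc in row["kcs"]}
        if row["correct"]:
            # Learning is assumed to occur each time the system gives feedback
            # on a correct answer, so the counts only grow after the attempt.
            for kc in row["kcs"]:
                practice[(row["student"], kc)] += 1
    return rows
```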
Notice that the question asked of the student on row 3 is the same as the one on
row 9, yet the problem is easier to answer after the system has given feedback on “the
distance rowed is 120”. Therefore the difficulty parameters are adjusted in row 9,
column 9 and 10, to reflect the fact that if the student had already received positive
feedback on those knowledge components. By using this technique we make the
credit-blame assignment problem easier for the logistic regression because the num-
ber of knowledge components that could be blamed for a wrong answer had been
reduced. Notice that because of this method with the difficulty parameters, we also
had to adjust the learning parameters, as shown by the crossed-out learning parameters.
Notice that the learning parameters are not reset on line 20 when a new scenario
is started, because the learning parameters extend across all the problems a student
does.
² Currently, we are only predicting whether the response was correct or not, but later we will
do a multivariate logistic regression to take into account the time required for the student to
respond.
With some minor changes, Table 1 shows a snippet of what the data set looked like
that we sent to the statistical package to perform the logistic regression. We per-
formed a logistic regressions predicting the dependent variable response (column 8)
based on the independent variables on the knowledge components (i.e., columns 9-
16). For some of the results we present, we also add a student specific column (we
used a student’s pretest score) to help control for the variability due to students dif-
fering incoming knowledge.
We created each testing set by randomly selecting one-tenth of the students not having appeared in a
prior testing set. This procedure was repeated ten times in order to have included
each student in a testing set exactly once.
A model was then constructed for each of the training sets using a logistic regres-
sion with the student response as the dependent variable. Each fitted model was used
to predict the student response on the corresponding testing set. The prediction for
each instance can be interpreted as the model’s fit probability that a student’s re-
sponse was correct (indicated by a “1”). To associate the classification with the bi-
variate class attribute, the prediction was rounded up or down depending on whether it was
greater or less than 0.5. The predictions were then compared to the actual responses,
and the total number of correctly classified instances was divided by the total num-
ber of instances to determine the overall classification accuracy for that particular
testing set.
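The student-level cross-validation procedure might be sketched as follows; the fit and predict callables stand in for the logistic regression described above, and the pandas DataFrame column names are assumptions for illustration.

```python
import numpy as np

def student_level_cv_accuracy(df, fit, predict, n_folds=10, seed=0):
    """Ten-fold cross-validation in which the folds are formed over students,
    so that each student appears in exactly one testing set. df is assumed to
    be a pandas DataFrame with a 'student' column and a binary 'response'
    column; fit(train) returns a model and predict(model, test) returns fitted
    probabilities of a correct response."""
    rng = np.random.default_rng(seed)
    students = df["student"].unique()
    rng.shuffle(students)
    folds = np.array_split(students, n_folds)

    accuracies = []
    for held_out in folds:
        test = df[df["student"].isin(held_out)]
        train = df[~df["student"].isin(held_out)]
        model = fit(train)
        # Round the fitted probability to 0 or 1 at the 0.5 boundary.
        predicted = (predict(model, test) >= 0.5).astype(int)
        accuracies.append(float((predicted == test["response"].to_numpy()).mean()))
    return float(np.mean(accuracies))
```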
3 Results
We summarize the results of our model construction, with Table 2 showing the results
of models we attempted to construct. To answer Question 1, we compared the Base
Model to the Base+ Model that added the articulate one-step KC. After applying our
criterion for eliminating non-statistically significant parameters we were left with just
two difficulty parameters for the Base Model (all models in Table 2 also included the
highly statistically significant pretest parameter).
It turned out that the Base+ Model did a statistically significantly better job than
the Base Model in terms of BIC (smaller BIC values are better; the difference was
greater than 10 BIC points, suggesting a statistically significant difference). The Base+
Model also did better when using the K-holdout strategy (59.6% vs. 64.3%). We see
from Table 2 that the Base+ Model eliminated the comprehending one-step KC and
added instead the articulating one-step and arithmetic KCs, suggesting that “articu-
lating” does a better job than comprehension as the way to model what is hard about
word problems.
So after concluding that there was good evidence for articulating one-step, we then
computed Models 2-4. We found that two of the four ways of trying to model com-
position resulted in models that were inferior in terms of BIC and not much different
in terms of the K-holdout strategies. We found that models 4 and 5 were reduced to
the Base+ Model by the step-wise elimination procedure. We also tried to calculate
the effect of combining any two of the four composition KCs but all such attempts
were reduced by the step-wise elimination procedure to already found models. This
suggests that for the set of tutorial log files we used, there was not sufficient evidence
to argue for the composition of articulation over other ways of modeling the compo-
sition effect.
It should be noted that while none of the learning parameters of any of the knowl-
edge components were in any of the final models (thus creating models that predict
no learning over time), in models 4 and 5 the last parameter to be eliminated was a
learning parameter, and these two parameters had t-test values that were within
a very small margin of being statistically significant (t=1.97 and t=1.84). It should
also be noted that in Heffernan [8] the learning within Experiment 3 was only close to
being statistically significant. That might explain why we do not find any statistically
significant learning parameters.
We feel that Question 1 (“Is there evidence from tutorial log files that support the
conjecture that the articulating one-step KC really exists?”) is answered in the af-
firmative, but Question 2 (“What is the best way to model the composition effect?”)
has not been answered definitively either way. All of the models that tried to explicitly
model a composition KC did not lead to significantly better models. So it is still an
open question of how to best model the composition effect.
4 Conclusions
This paper presented a methodology for evaluating models of transfer. Using this
methodology we have been able to compare different plausible models. We think that
this method of constructing transfer models and checking for parsimonious models
against student data is a powerful tool for building cognitive models.
A limitation of this technique is that the results depend on the curriculum (i.e., the problems presented to students and the order in which they were presented) used during the course of study. If students were presented with a different sequence of problems, there is no guarantee that the same conclusions could be drawn.
We think that transfer models could be an important tool in building and designing cognitive models, particularly where learning and transfer are of interest. This methodology makes only a few reasonable assumptions (the most important being the Power Law of Learning), and the results in this paper show that it could be used to answer interesting cognitive science questions.
References
1. Anderson, J. R., & Lebiere, C. (1998). The Atomic Components of Thought. Lawrence
Erlbaum Associates, Mahwah, NJ.
2. Baker, R.S., Corbett, A.T., Koedinger, K.R. (2003) Statistical Techniques for Comparing
ACT-R Models of Cognitive Performance. Presented at Annual ACT-R Workshop.
3. Corbett, A. T. and Anderson, J. R. (1992) Knowledge tracing in the ACT programming tutor. In: Proceedings of the 14th Annual Conference of the Cognitive Science Society.
4. Corbett, A. T., Anderson, J. R., & O’Brien, A. T. (1995) Student modeling in the ACT
programming tutor. Chapter 2 in P. Nichols, S. Chipman, & R. Brennan, Cognitively Di-
agnostic Assessment. Hillsdale, NJ: Erlbaum.
5. Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex
cognitive skill. In P. Nichols, S. Chipman, & R. Brennan, Cognitively Diagnostic Assess-
ment. Hillsdale, NJ: Erlbaum.
6. Embretson, S. E. & Reise, S. P. (2000) Item Response Theory for Psychologists. Lawrence Erlbaum Associates.
7. Heffernan, N. T. (2001). Intelligent Tutoring Systems have Forgotten the Tutor: Adding a
Cognitive Model of an Experienced Human Tutor. Dissertation & Technical Report.
Carnegie Mellon University, Computer Science, http://www.algebratutor.org/pubs.html.
8. Heffernan, N. T. (2003) Web-Based Evaluations Showing both Cognitive and Motivational Benefits of the Ms. Lindquist Tutor. 11th International Conference on Artificial Intelligence in Education, Sydney, Australia.
9. Heffernan, N. T., & Koedinger, K. R.(1997) The composition effect in symbolizing: the
role of symbol production versus text comprehension. In Proceeding of the Nineteenth
Annual Conference of the Cognitive Science Society (pp. 307-312). Hillsdale, NJ: Law-
rence Erlbaum Associates.
10. Heffernan, N. T., & Koedinger, K. R. (1998) A developmental model for algebra symboli-
zation: The results of a difficulty factors assessment. Proceedings of the Twentieth Annual
Conference of the Cognitive Science Society, (pp. 484-489) Hillsdale, NJ: Lawrence Erl-
baum Associates.
11. Junker, B., Koedinger, K. R., & Trottini, M. (2000). Finding improvements in student
models for intelligent tutoring systems via variable selection for a linear logistic test
model. Presented at the Annual North American Meeting of the Psychometric Society,
Vancouver, BC, Canada. http://lib.stat.cmu.edu/~brian/bjtrs.html
12. Koedinger, K. R. & Junker, B. (1999). Learning Factors Analysis: Mining student-tutor
interactions to optimize instruction. Presented at Social Science Data Infrastructure Con-
ference. New York University. November, 12-13, 1999.
13. Koedinger, K.R., & MacLaren, B. A. (2002). Developing a pedagogical domain theory of
early algebra problem solving. CMU-HCII Tech Report 02-100. Accessible via
http://reports-archive.adm.cs.cmu.edu/hcii.html.
14. Nathan, M. J., Kintsch, W. & Young, E. (1992). A theory of algebra-word-problem com-
prehension and its implications for the design of learning environments. Cognition & In-
struction 9(4): 329-389.
15. Nathan, M. J., & Koedinger, K. R. (2000). Teachers’ and researchers’ beliefs about the
development of algebraic reasoning. Journal for Research in Mathematics Education, 31,
168-190.
16. Newell, A., & Rosenbloom, P. (1981) Mechanisms of skill acquisition and the law of
practice. In Anderson (ed.), Cognitive Skills and Their Acquisition., Hillsdale, NJ: Erl-
baum.
17. Raftery, A.E. (1995) Bayesian model selection in social research. Sociological Methodology (Peter V. Marsden, ed.), Cambridge, Mass.: Blackwells, pp. 111-196.
Towards Shared Understanding of Metacognitive Skill
and Facilitating Its Development
1 Introduction
Recently, many researchers who are convinced that metacognition has relevance to intelligence [1,26] have been shifting their attention from theoretical to practical educational issues. As a result of this shift, researchers are designing a number of effective learning strategies [15,16,23,24,25] and computer-based learning systems [5,6,8,20] to facilitate the development of learners’ metacognition.
However, there is one critical problem encountered in these strategies and systems: the concept of metacognition is ambiguous and mysterious [2,4,18]. Several terms are currently used to describe the same basic phenomenon (e.g., self-regulation, executive control), and varied phenomena have been subsumed under the term metacognition. Cognitive and metacognitive functions are also often used interchangeably in the literature [2,4,7,15,16,17,18,19,22,27]. The ambiguity mainly arises for the following three reasons: (1) it is difficult to distinguish metacognition from cognition; (2) metacognition has been used to refer to two distinct areas of research, knowledge about cognition and regulation of cognition; and (3) there are four historical roots to the inquiry into metacognition [2].
With this ambiguous definition of “metacognition”, we cannot answer the crucial
questions concerning existing learning strategies or systems: what they have
supported, or not; what is difficult for them to support; why it is difficult; and
essentially what is the distinction between cognition and metacognition. In order to
answer these questions, we first should clarify how many concepts are subsumed
Observation as basic cognition is to take information from the outside world into
working memory (WM) at the cognitive layer. As a result, a state or a sequence of
states is generated in WM at the cognitive layer. Evaluation and Selection as cogni-
tive activity is to evaluate the sequence of states in WM, select actions from a knowl-
edge base, and create an action-list. Consequently, a state or a sequence of states in
WM at the cognitive layer is transformed. Output as cognitive activity is to output
actions in an action-list as behavior. Observation as basic metacognition is to take
information of cognitive activities and information in WM at the cognitive layer into
WM at the metacognitive layer. As a result, a state or a sequence of states in WM at
the metacognitive layer is transformed. Evaluation and Selection are to evaluate states
in WM at the metacognitive layer, select actions from a knowledge base, and form
actions to regulate cognitive activities at the cognitive layer as an action-list. In this
way, a state or a sequence of states in WM at the metacognitive layer is transformed.
Output as metacognitive activity is to perform actions in an action-list to regulate
cognitive activities at the cognitive layer. As a result, cognitive activities at the cog-
nitive layer are changed.
We clarify the target activities of learning strategies and systems by considering how the organized activities in Table 1 correspond to those target activities. Consider a learner’s activity with Error-Based Simulation (abbreviated EBS) [8] and the Reflection Assistant (abbreviated RA) [5, 6]. EBS is a behavior simulation generated from an erroneous equation for mechanics problems. The strange behavior in an EBS makes the error in the equation clear, gives the learner a motivation to reflect, and provides opportunities for the learner to monitor his/her previous cognitive activity objectively. RA consists of three phases that help learners carry out three reflective activities: understanding the goals and given facts of the problem; recalling previous knowledge and organizing the problem; and thinking about strategies to solve the problem. These reflective activities allow learners to identify knowledge about problem solving, strategically encode the nature of the problem and form a mental representation of its elements, and select appropriate strategies depending on that mental representation. Based on the organized activities in cognitive skill and metacognitive skill, RA facilitates learners’ basic cognition and cognitive activities while EBS facilitates metacognitive activities.
to THINK–TEL WHY (we abbreviate as AT) [15,16], reciprocal teaching (we abbre-
viate as RT) [23], RA [5,6], and EBS [8].
We identify goals for these learning strategies and systems in each of the four categories: I-goal, Y<=I-goal, W(A)-goal, and W(L)-goal. For I-goals, we adopt Inaba’s classification of I-goals for collaborative learning: acquisition of content-specific knowledge, development of cognitive skills, development of metacognitive skills, and development of skills for self-expression. Each I-goal has a developmental stage. The I-goal “acquisition of content-specific knowledge” has three phases of learning: accretion, tuning, and restructuring. Each I-goal of skill learning has three stages: the cognitive stage, the associative stage, and the autonomous stage.
that is, a type of Y<=I-goal. The S<=P-goal is the goal of the person who participates in the learning session in the Primary focus role in interacting with the learners who play the Secondary focus role, while the P<=S-goal is the goal of the person who plays the Secondary focus role in interacting with the learners who play the Primary focus role. A Y<=I-goal consists of three parts: “I-role”, “You-role” and “I-goal”. I-role is the role to attain the Y<=I-goal. A member who plays the I-role (I-member) is expected to attain his/her I-goal by attaining the Y<=I-goal. You-role is the role of a partner for the I-member. I-goal (I) is an I-goal which defines what the I-member attains. (For more details, please see [9,10].)
The AT has been used to comprehend science and social studies material. Its W(L)-goal is “Comprehension.” In the AT, learners who participate in the learning session take turns playing the roles of tutor and tutee, and they are trained in question-asking skills for the tutor role and explanation skills for the tutee role. Learners in the tutor role should not teach anything, but instead select an appropriate question from a template of questions and ask the other learners, while the learners playing the tutee role respond to the questions by explaining and elaborating their answers. So, the learner playing the tutor role is called the “Questioner” and the tutee is the “Explainer”. The questioner prompts the other learners to explain what they think and to elaborate upon it, and acquires knowledge about what questions to ask other learners, using a template of questions, in order to get them to explain and elaborate what they think. The “Primary focus” in this learning strategy is the “Questioner”, and the “Secondary focus” is the “Explainer”. The S<=P-goal is “Learning by Trial and Error”, and the P<=S-goal is “Learning by Self-Expression.” I-goal (Questioner) is “Other-regulation (Cognitive stage)”, and I-goal (Explainer) is “Acquisition of Content Specific Knowledge (Restructuring).”
Fig. 4 represents the W(A)-goal “Setting up the situation for RT” using the structure shown in Fig. 2. The RT has been used to understand expository texts. Its W(L)-goal is also “Comprehension.” In the RT, members of a group take turns leading a dialogue concerning sections of a text, generating summaries and predictions, and clarifying misleading or complex sections of the text. Initially, the teacher demonstrates the activities of a dialogue leader, and then provides each learner who plays the role of dialogue leader with guidance and feedback at the appropriate level. The learner who plays the role mimics the teacher’s activities; that is, a leader practices what he learned through observing the teacher’s demonstration. The other members of the group discuss the dialogue leader’s questions and the gist of what has been read. Through this discussion, members’ thinking is externalized. The discussion thus helps the dialogue leader monitor the other members’ comprehension and also encourages the members to elaborate their comprehension together. A member who leads a dialogue is therefore called the “Dialogue Leader” and the other members of the group are the “Discussants.” A dialogue leader promotes the others’ comprehension monitoring and regulation; the discussants promote their own comprehension. The “Primary focus” in this learning strategy is the “Dialogue Leader”, and the “Secondary focus” is the “Discussant”. The S<=P-goal is “Learning by Practice”, and the P<=S-goal is “Learning by Discussion”. I-goal (Dialogue Leader) is “Other-Regulation (Associative stage)” and I-goal (Discussant) is “Acquisition of Content
5 Conclusion
The ambiguity of the term metacognition raises issues for supporting the development of learners’ metacognitive skill. To resolve this ambiguity, we have organized the variety of activities pertaining to metacognitive skill. Based on these organized activities, we can clarify which activities learners master by using particular learning strategies and support systems. In this paper, we showed that the activity which some computer-based systems support, although it has been subsumed under the heading of metacognition, is actually cognitive activity. We also explained existing learning strategies and support systems which support the development of learners’ metacognitive skill in relation to the Learning Goal Ontology.
In the future, we would like to identify learning goals that are proposed in other
existing learning strategies and learning support systems using the organized activi-
ties in cognitive skill and metacognitive skill, and represent them with the Learning
Goal Ontology.
References
1. Borkowski, J., Carr, M., & Pressley, M.: “Spontaneous” Strategy Use: Perspectives from Metacognitive Theory. Intelligence, vol. 11. (1987) 61-75
2. Brown, A.: Metacognition, Executive Control, Self-Regulation, and Other More Mysteri-
ous Mechanisms. In: Weinert, F.E., Kluwe, R. H. (eds.): Metacognition, Motivation, and
Understanding. NJ: LEA. (1987) 65-116
3. Brown, A. L., Campione, J. C.: Psychological Theory and the Design of Innovative Learning Environments: on Procedures, Principles, and Systems. In: Schauble, L., Glaser, R. (eds.): Innovations in Learning: New Environments for Education. Mahwah, NJ: LEA. (1996) 289-325
4. Flavell, J. H.: Metacognitive Aspects of Problem-Solving. In: Resnick, L. B. (ed.): The
Nature of Intelligence. NJ: LEA. (1976) 231-235
5. Gama, C.: The Role of Metacognition in Interactive Learning Environments, Track Proc.
of ITS2000 – Young Researchers. (2000)
6. Gama, C.: Helping Students to Help Themselves: a Pilot Experiment on the Ways of
Increasing Metacognitive Awareness in Problem Solving. Proc. of New Technologies in
Science Education 2001. Aveiro, Portugal. (2001)
7. Hacker, D. J. (1998). Definitions and Empirical Foundations. In Hacker, D. J., Dunlosky, J. and Graesser, A. C. (Eds.) Metacognition in Educational Theory and Practice. NJ: LEA. 1-23.
8. Hirashima, T., Horiguchi, T.: What Pulls the Trigger of Reflection? Proc. of ICCE2001.
(2001)
9. Inaba, A., Supnithi, T., Ikeda, M., Mizoguchi, R., Toyoda, J.: How Can We Form Effec-
tive Collaborative Learning Groups? – Theoretical Justification of “Opportunistic Group
Formation” with Ontological Engineering. Proc. of ITS2000. (2000)
10. Inaba, A., Supnithi, T., Ikeda, M., Mizoguchi, R., Toyoda, J.: Is a Learning Theory Har-
monious With Others? Proc. of ICCE2000. (2000)
11. Kayashima, M., Inaba, A.: How Computers Help a Learner to Master Self-Regulation
Skill? Proc. of Computer Support for Collaborative Learning 2003. (2003)
12. Kayashima, M., Inaba, A.: Difficulties in Mastering Self-Regulation Skill and Supporting
Methodologies. Proc. of the International AIED Conference 2003. (2003)
13. Kayashima, M., Inaba, A.: Towards Helping Learners Master Self-Regulation Skills.
Supplementary Proc. of the International AIED Conference, 2003. (2003)
14. Kayashima, M., Inaba, A.: The Model of Metacognitive Skill and How to Facilitate De-
velopment of the Skill. Proc. of ICCE Conference 2003. (2003)
15. King, A.: ASK to THINK-TEL WHY: a Model of Transactive Peer Tutoring for Scaf-
folding Higher Level Complex Learning. Educational Psychologist. 32(4). (1997) 221-235
16. King, A.: Discourse Patterns for Mediating Peer Learning. In: O’Donnell A.M., King, A.
(eds.): Cognitive Perspectives on Peer Learning. NJ: LEA. (1999) 87-115
17. Kluwe, R. H.: Cognitive Knowledge and Executive Control: Metacognition. In: Griffin,
D. R. (ed.): Animal Mind - Human Mind. New York: Springer-Verlag. (1982) 201-224
18. Livingston, J. A.: Metacognition: an Overview.
http://www.gse.buffalo.edu/fas/shuell/cep564/Metacog.htm. (1997)
19. Lories, G., Dardenne, B., Yzerbyt, V. Y.: From Social Cognition to Metacognition. In:
Yzerbyt, V. Y., Lories, G., Dardenne, B. (eds.): Metacognition. SAGE Publications Ltd.
(1998) 1-15
20. Mathan, S. & Koedinger, K. R.: Recasting the Feedback Debate: Benefits of Tutoring
Error Detection and Correction Skills. Proc. of the International AIED Conference 2003.
(2003)
21. Mizoguchi, R., Bourdeau, J.: Using Ontological Engineering to Overcome Common AI-
ED Problems. IJAIED, vol. 11. (2000)
22. Nelson, T. O. & Narens, L.: Why Investigate Metacognition? In: Metcalfe, J., Shimamura,
A.P. (eds.): Metacognition. MIT Press. (1994). 1-25.
23. Palincsar, A. S., Brown, A.: Reciprocal Teaching of Comprehension-Fostering and Comprehension-Monitoring Activities. Cognition and Instruction. 1(2). (1984) 117-175
24. Palincsar, A.S., Herrenkohl, L.R.: Designing Collaborative Contexts: Lessons from Three
Research Programs. In: O’Donnell, A. M., King, A. (eds.): Cognitive Perspectives on Peer
Learning. Mahwah, NJ: LEA. (1999) 151-177
25. Schoenfeld, A. H.: What’s All the Fuss about Metacognition? In: Schoenfeld, A. H. (ed.): Cognitive Science and Mathematics Education. LEA. (1987) 189-215
26. Sternberg, R. J.: Inside Intelligence. American Scientist, 74. (1986) 137-143.
27. Yzerbyt, V. Y., Lories, G., Dardenne, B.: Metacognition: Cognitive and Social Dimen-
sion. London: SAGE. (1998)
Analyzing Discourse Structure to
Coordinate Educational Forums
Marco Aurélio Gerosa, Mariano Gomes Pimentel, Hugo Fuks, and Carlos Lucena
1 Introduction
As an asynchronous communication tool, a forum makes it possible for learners to
participate at their own pace while allowing them more time to think. However, edu-
cational environments still do not offer computational aids that are appropriate for
coordinating forums. The majority of the environments present a typical implementa-
tion that does not take into account educational aspects and it remains up to the
teacher (without specific computational support) to collect and analyze the informa-
tion that is necessary to coordinate group discussion.
Coordination is the effort needed to organize a group to enable it to work as a team
in a manner that channels communication and cooperation towards the group’s ob-
jective [8]. When coordinating a group discussion in a forum, among other factors the
teacher must be prepared to ensure that all of the learners are participating, that the
contributions add value to the discussion, that the conversation does not go off on
non-productive tangents and that good contributions are encouraged.
This article focuses on message chaining, categorization, and timestamps. These message attributes help in the coordination of educational forums without requiring the teacher to inspect the content of individual messages, and in a manner that lends itself to computational support.
In a forum, where messages are structured hierarchically (as a tree), it is possible to obtain indications about the depth of the discussion and the level of interaction by observing the form of this tree. Measurements such as the average depth level and the percentage of leaves provide indications about how a discussion is going. Message categorization can also help to identify the types of messages, making a separate analysis of each message type possible. By analyzing the dates on which messages were sent, it is possible, among other things, to identify the amount of time between the sending of messages, the day of the week, and the hour at which messages are expected to be sent. Comparing these data also makes it possible to obtain other information, such as the type of message expected per level, how fast the tree grows, which types of messages are answered more quickly, etc. Based upon these aspects, the course coordinator can evaluate how a discussion is evolving, giving him enough time to redirect the discussion and, for example, to check up on the effects of his interventions.
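The two tree measurements mentioned above can be computed directly from the reply structure. The sketch below assumes a thread represented as parent pointers; the representation and function names are assumptions made for this illustration and are not part of AulaNet.

```python
# Average depth and percentage of leaves for a thread given as {message_id: parent_id},
# where the root message has parent_id None.
def depth(msg, parents):
    d = 1
    while parents[msg] is not None:
        msg = parents[msg]
        d += 1
    return d

def tree_stats(parents):
    non_leaves = {p for p in parents.values() if p is not None}
    leaves = [m for m in parents if m not in non_leaves]   # messages without answers
    avg_depth = sum(depth(m, parents) for m in parents) / len(parents)
    leaf_pct = 100.0 * len(leaves) / len(parents)
    return avg_depth, leaf_pct

# A root with two answers, one of which was itself answered:
print(tree_stats({1: None, 2: 1, 3: 1, 4: 2}))  # -> (2.0, 50.0)
```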
The AulaNet environment supports the creation of educational forums, as presented in Section 2. The Information Technology Applied to Education (ITAE) course, which provided the data for the analyses presented in this article, is also discussed in that section. Section 3 presents the analyses of discourse structure. Section 4 concludes the article.
Fig. 2. Trees extracted from the Conferences of the five editions of the ITAE course
Visually, upon analyzing the trees in Figure 2, it can be seen that in ITAE 2001.2 and ITAE 2002.1 the trees became shallower over the period in which the course was taught. In ITAE 2002.2, the tree depth varied from one conference to another. In ITAE 2003.1 and ITAE 2003.2, the tree depth increased during the course, despite the fact that there were a number of shallow trees. It can also be observed in this figure that, in all editions, the tree corresponding to conference one is the shallowest. Although the depth of a tree does not in and of itself ensure that in-depth discussion took place, it is a good indication, and the teacher can then initiate a more detailed investigation of the discussion depth. Based on the visualization of the trees, it is possible to visually compare the depth of the conferences of a given edition with those of other editions. However, in order to conduct a more precise analysis, it is also necessary to have statistical information about these trees.
Fig. 3. Comparison of the Conferences of the ITAE 2002.1 and 2003.1 editions
It can be seen in Figure 3 that the average depth of the tree in the ITAE 2002.1 edition declined while the percentage of messages without answers (leaves) increased, which indicates that learners were interacting less and less as the course advanced. In this edition, in the first four Conferences the average level of the tree was 3.0 and the percentage of messages without answers was 51%; in the last four Conferences, the average tree level was 2.8 and the leaves were 61%. In ITAE 2003.1, by contrast, learners interacted more over the course of the conferences: the tree corresponding to the discussion was getting deeper while the percentage of messages without answers was decreasing. The average level was 2.2 in the first four Conferences, increasing to 3.0 in the last four Conferences, while the percentage of messages without answers went from 69% in the first four Conferences to 53% in the last four. Figure 3 also presents a comparison between a conference at the beginning and another at the end of each of these editions, emphasizing their difference. The trees shown in Figure 2 and the charts in Figure 3 indicate that the interaction in the ITAE 2002.1 edition declined over the course of the conferences, while the interaction in the ITAE 2003.1 edition increased.
All of these data were obtained without having to inspect the content of the messages. Comparing the evolution of the form of the trees and the statistics about them over the course allows teachers to intervene when they perceive that the level of interaction has fallen or that the Conference is not reaching the desired depth. Figure 4 shows the expected quantity of messages per level.
Fig. 4. Average quantity of messages per tree level corresponding to the conferences
The coordinating teacher—the one who plans the course—can adjust the category set
to the objectives and characteristics of the group and the tasks.
Upon viewing the messages of a Conference, participants immediately see the category to which each message belongs (between brackets) together with its title, author, and date. Thus, it is possible to estimate how the discussion is progressing and what the probable content of the messages is. The AulaNet also implements reports about the use of the categories per participant, in order to facilitate future refinement of the category set and to obtain indications about the characteristics of the participants and their compliance with tasks. Categorization also helps organize the discussion in a manner that favors decision making and the maintenance of communication memory [2].
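A per-participant category-usage report of the kind just described amounts to counting (author, category) pairs. The sketch below is illustrative; the message representation is an assumption, not AulaNet's actual data model.

```python
from collections import Counter, defaultdict

def category_usage(messages):
    """messages: iterable of (author, category) pairs."""
    report = defaultdict(Counter)
    for author, category in messages:
        report[author][category] += 1
    return report

usage = category_usage([("Ana", "Argumentation"),
                        ("Ana", "Counter-Argumentation"),
                        ("Bruno", "Clarification")])
print(usage["Ana"]["Argumentation"])  # -> 1
```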
The categories adopted in the ITAE Conferences reflect the course dynamics. They
are: Seminar, for the root message of the discussion, posted by the seminar leader at
the beginning of the week; Question, to propose discussion topics, also posted by the
seminar leader; Argumentation, to answer the questions, offering the author’s point of
view in the message subject line and the arguments for it in the body of the message;
Counter-Argumentation, to be used when the author states a position that is contrary
to an argument; and finally, Clarification, to request or clarify doubts about a specific
message.
Message size also has a different expected value for each of the categories, given that each category has its own objectives and semantics. Figure 7 presents the average number of characters for each category and the average deviations. In this figure one can see that the Seminar category is the one having the largest messages, followed by Argumentation and Counter-Argumentation. The shortest messages are those in the Question and Clarification categories.
At some point during the course, one of the ITAE learners said: “When we counter-argue we can be more succinct, since the subject matter is already known to all.” This statement is in keeping with the chart in Figure 7. If the subject is known to all (it was presented in the previous messages), the author can go directly to the point that interests him or her. This can also be seen in the chart in Figure 8, which presents a decline in the average number of characters per level in the Argumentation (correlation = -80%) and Counter-Argumentation (correlation = -93%) categories.
The category also helps to identify the direction that the discussion is taking. For example, in a tree or a branch containing only argumentation messages, there is probably no confrontation of ideas taking place. It is expected that the clashing of ideas helps to involve more participants in the discussion, thus bringing up opposing points of view. Similarly, excessive counter-argumentation should attract the mediator’s attention: the group might be getting too involved in a controversy or, even worse, there may be interpersonal conflicts taking place.
Fig. 10. Frequency of messages over the course of the conferences of the ITAE 2003.2 edition
4 Conclusion
Message chaining, categorization, and message timestamps are factors that help in the coordination of educational forums within ITAE. Based upon the form established by message chaining, it is possible to infer the level of interaction among course participants. Message categorization provides semantics to the way messages are connected, helping to identify the accomplishment of tasks, incorrectly nested messages, and the direction the discussion is taking. The analysis of message timestamps makes it possible to identify the Student Syndrome phenomenon, which gets in the way of the development of an in-depth discussion and of the orientation provided by an evaluation of the messages.
By analyzing the characteristics of the messages, teachers are able to better coordinate learners, knowing when to intervene in order to keep the discussion from moving in an unwanted direction. Furthermore, these analyses could be used to develop filters for intelligent coordination and mechanisms for error reduction. It should be emphasized that these quantitative analyses provide teachers with indications and alerts about situations where problems exist and where the discussion is going well; however, the final decision and judgment are still up to the teacher.
Finally, discourse structure and message categorization also help to organize the
recording of the dialogue, facilitating its subsequent recovery. Based upon the tree
form, with the help of the categories, it is possible to obtain visual information about
the structure of the discussion [6]. Teachers using collaborative learning environ-
ments to carry out their activities should take these factors into account for the better
coordination of educational forums.
References
1. Conklin, J. (1988) “Hypertext: an introduction and Survey”, Computer Supported Coopera-
tive Work: A Book of Readings, pp. 423-476
2. Fuks, H., Gerosa, M.A. & Lucena, C.J.P. (2002), “The Development and Application of
Distance Learning on the Internet”, Open Learning Journal, V.17, N.1, pp. 23-38.
3. Gerosa, M.A., Fuks, H. & Lucena, C.J.P. (2001), “Use of categorization and structuring of
messages in order to organize the discussion and reduce information overload in asynchro-
nous textual communication tools”, CRIWG 2001, Germany, pp 136-141.
4. Goldratt, E.M. (1997) “Critical Chain”, The North River Press Publishing Corporation,
Great Barrington.
5. Harasim, L., Hiltz, S. R., Teles, L., & Turoff, M. (1997) “Learning networks: A field guide
to teaching and online learning”, 3rd ed., MIT Press, 1997.
6. Kirschner, P.A., Shum, S.J.B. & Carr, C.S. (eds), Visualizing Argumentation: Software
Tools for Collaborative and Educational Sense-Making, Springer, 2003.
7. Pimentel, M. G., Sampaio, F. F. (2002) “Comunicografia”, Revista Brasileira de Infor-
mática na Educação - SBC, v. 10, n. 1. Porto Alegre, Brasil.
8. Raposo, A.B. & Fuks, H. (2002) “Defining Task Interdependencies and Coordination
Mechanisms for Collaborative Systems”, Cooperative Systems Design, IOS Press, 88-103.
9. Stahl, G. (2001) “WebGuide: Guiding collaborative learning on the Web with perspec-
tives”, Journal of Interactive Media in Education, 2001.
Intellectual Reputation to Find an Appropriate Person
for a Role in Creation and Inheritance of Organizational
Intellect
1 Introduction
finding an appropriate person for a given role in organizational intellect creation and
inheritance.
There is growing interest in IT support that helps community members share the context of collaborative activity and manage it successfully. Ogata et al. defined awareness of one’s own or another’s knowledge as “Knowledge awareness” and developed Sherlock II, which supports group formation for collaborative learning based on learners’ initiatives with knowledge awareness [11]. The ScholOnto project by Buckingham Shum et al. aims at supporting an academic community [1]. They clarify norms of academic exchange and have been developing an information system to raise mutual awareness of roles in academic activity. Such awareness information indicates others’ behaviors toward documents, such as document sharing or claim making.
This research is intended to provide more valuable awareness information based on an interpretation of the user’s behavior in terms of a model of the creation and inheritance of organizational intellect. This paper proposes “Intellectual Reputation” (IR), a recommendation to find an appropriate person for a given role in the creation and inheritance of organizational intellect.
This paper is organized as follows. Section 2 introduces the framework of
organizational memory as the basis of considering IR. Section 3 describes the concept
of IR. Section 4 presents a definition and mechanism of IR with an example. Section
5 summarizes this paper.
2 Organizational Memory
2.1 Models for Observing and Helping the Formative Process of Organizational
Intellect
We must consider models that satisfy the following requirements to achieve the goal
stated above.
1. Models must set a desirable process for each organization from abstract to concrete
activities. The desirable process can be a guideline for members to ascertain the
way they should behave in the organization and can form the basis of design for
information systems that are aware of the process.
2. Models must establish the basis for each organization member to understand intellect in the organization. The basis supports mutual understanding among the members, and between the members and the support system.
3. Models must memorize intellect in terms not only of meaning, but also in terms of
the formative process. The formative process is important information to
understand, manage, and use intellects appropriately in the organization.
4. Models must provide information for organization members to be aware of
organizational intellect. That helps members to make decisions about planning
activities.
This study has proposed models addressing the first three points so far.
“Dual loop model (DLM)” and “Organizational intellect ontology (OIO)” address
the first and second points, respectively [4]. Simply put, DLM describes an ideal
process of creation and inheritance of an organizational intellect from both viewpoints
of the ‘individual’ as the substantial actor in an organization, and the ‘organization’ as
the aggregation of individuals. This model is based on the SECI model and is well
We should describe our architecture for the organizational memory before proceeding
to a discussion of IR. The generation mechanism of an Intellectual Genealogy Graph
(IGG) is also important for Intellectual reputation. Figure 2 shows the architecture of
an information system that facilitates organizational memory. The architecture is
presumed to consist of a server and clients. The server manages information about
organizational intellect. The organization members get information needed for their
activity from the server through user interfaces. As an embodiment of this
architecture, we have developed a support environment for the creation and
inheritance of organizational intellect: Kfarm. For details of that support environment,
please see references [4, 9, 10]. Herein, we specifically address the generation and use
of the IGG and IR.
The middle of Fig. 2 presents the IGG. That graph has the following three levels:
Personal level (PL) describes personal activity and the status of intellect concerned with it.
Interaction level (IL) describes interaction among members and their roles in that
interaction using PL description.
Organizational level (OL) describes the activity and status of intellect in terms of
organization using PL and IL description.
The input for generation of IGG is a time-series of vehicle level activities tracked
through user interfaces. That time series is a series of actions, e.g., drawing up a document, discussing it, and then revising it. Vehicles used in the
activities are stored in the vehicle repository. The vehicle level data are transformed
to IGG by the reasoning engine, which is indicated on the right of Fig. 2. The
reasoning engine has three types of rule bases corresponding to the three levels of
IGG. The rule bases are developed based on DLM ontology, which is a
conceptualization of DLM. The rule base for the personal level (PLRB) is based on
Personal Loop in DLM. The rule bases for the interaction level and the organizational
level (ILRB and OLRB) are based on the organizational loop in DLM. Each model
level is generated by applying the rules to the lower level of a model or models. For
example, the organizational level model is generated from the personal level model and the interaction level model. The IGG is modeled based on these rule bases.
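The level-by-level generation just described can be summarized schematically as below. The rule bases are modeled only as opaque functions, since the actual contents of PLRB, ILRB, and OLRB are not given here; this is a sketch of the data flow, not the Kfarm implementation.

```python
# Schematic of IGG generation: each level is produced by applying a rule base
# to the lower-level model(s).
def generate_igg(vehicle_level, plrb, ilrb, olrb):
    personal_level = plrb(vehicle_level)          # PL from vehicle-level activity data
    interaction_level = ilrb(personal_level)      # IL from the PL description
    organizational_level = olrb(personal_level, interaction_level)  # OL from PL and IL
    return {
        "personal": personal_level,
        "interaction": interaction_level,
        "organizational": organizational_level,
    }
```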
The left of Fig. 2 represents IR as one way to use the IGG. The next chapter
contains a more complete discussion of that process along with the concept of IR.
3 Intellectual Reputation
This section discusses how to meet the fourth requirement mentioned in the previous section. The essential concept is IR, which is a recommendation made by the organizational memory. IR provides supportive evidence to identify a person who can play a role suited to the current context.
We next introduce the “Intellectual Role”, which is a conceptualization of actors
who carry out important activities in the formative process of organizational intellect.
Two reasons exist for considering Intellectual Role. One is to form a basis for
describing each member and vehicle’s contribution to the formative process of an
organizational intellect at the abstract level. The other is to establish criteria for
estimating which person can fill a role in an activity that will be carried out in an
organization. First, this section explains IGG in terms of the former significance and
then discusses the concept of IR in terms of the latter.
In this study, the Intellectual Roles each member played in the past are extracted from records of their activities based on DLM. One characteristic is that performance in the formative process of organizational intellect can be viewed as having two aspects: contents and activities. Contents indicate which field of work the person has contributed to, and activities indicate how the person has contributed to the formative process of the intellect. Regarding content, it may be assumed that an organization has its own conceptual system that serves as a basis for placing each intellect in the organization; in this study, it is called the OIO. Regarding process, on the other hand, the process model of DLM can be a basis for assessing a person’s competency to carry out activities in the process. Based on these two aspects, content and process, the formative processes of organizational intellect are interpreted as an IGG. Each member’s contribution to the formative process recorded in the IGG indicates their Intellectual Role.
An IGG represents chronological correlation among persons, activities, and
intellect in an organization as an interpretation of observed activities of organization
members based on DLM. Figure 3 shows an example of an IGG. It is composed of
vehicle level activities, intellect level activities and formative process of intellect.
Source data for modeling an IGG comprise a time series of vehicle-level activities
observed in the workplace, for example, vehicle handling operations in an IT tool.
The bottom of Fig. 3 shows those data. Typical observable activities are to write, edit,
and review a document. First, the IGG generator builds a vehicle-level model from
the data. Then, it abstracts intellect level activities and a formative process of intellect
from the vehicle level based on DLM.
IGG offers the following three types of interpretation for generating IR:
Interpretation of content and status of intellect: The content and status of intellect
are shown as formative process of intellect at the upper left of Fig. 3.
Organizational intellect ontology is a representation of an organization’s own
conceptual system which each organization possesses either implicitly or
did not actually fill the role. The member’s recorded activities and roles imply that the member can fill the role. This study defines relations among activities, competencies, and intellectual roles, and the expected Intellectual Roles are derived from the IGG based on these relations. An example is shown in the right panel of Fig. 4. The reason says that person A has not served in a reviewer role, but has served in a creator role. The creator role means that the person generates unique ideas and has been authorized to use the ideas as systemic intellect in the organization. Assuming that such a capacity related to creativity is necessary to serve in a reviewer role, the record of filling the creator role can be the basis of the IR derivation.
This chapter explains how IR is generated from IGG, taking the query “Who is
competent to review my proposal of ‘ontology-aware authoring tool’ from an
organizational viewpoint?” as an example.
In DLM terms, the query is interpreted as “Who can review my intellect as a systemic intellect in our organization?” The interpretation is done by the query interpreter
module. A ‘systemic’ intellect means that the organization accepts the value of the
intellect to be shared and inherited among organization members.
The context of a query is represented by the two elements shown below.
Type_of_activity represents a type of vehicle-level activity that the querist wants to
perform. Type_of_activity in the example query is ‘to review the intellect as a
systemic intellect.’
4.1 Intellects
4.2 Roles
Roles that one plays in the formative process of an intellect are extracted from IGG as
mentioned in the previous section. Table 1 shows some typical roles.
For example, the typical intellect-level activities of a person P who plays a role
originator(I,P) are pa_construct and pa_publish, which mean that the
person creates a personal intellect and publishes it to others.
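As a sketch of how roles might be read off from recorded intellect-level activities, the following maps activity sets to roles; only the originator entry (pa_construct, pa_publish) comes from the text, and the data layout and function names are assumptions made for illustration.

```python
# Illustrative extraction of Intellectual Roles from recorded activities.
ROLE_ACTIVITIES = {
    "originator": {"pa_construct", "pa_publish"},   # from the example above
}

def roles_played(person_activities, role_table=ROLE_ACTIVITIES):
    """Return the roles whose typical activities all appear in the person's record."""
    performed = set(person_activities)
    return [role for role, acts in role_table.items() if acts <= performed]

print(roles_played(["pa_construct", "pa_publish", "pa_revise"]))  # -> ['originator']
```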
4.3 Results
descendant intellects from an intellect are identified along the formative process.
Table 2 shows categories indicating the importance of an ancestral intellect based on the growth level of its descendant intellects. The levels of intellects correspond to the statuses
of intellect in Nonaka’s SECI model.
contents by double-clicking the vehicle icon to review the intellect represented in the
vehicle.
5 Conclusion
This paper discussed the role and importance of intellectual reputation as awareness information. Organization members should understand individuals’ roles in the formative process of organizational intellect in order to create and inherit organizational intellect. Intellectual reputation is helpful information for finding an appropriate person for the right role in the creation and inheritance of organizational intellect.
In the future, this study will expand the IR concept to vehicles. For example, it is useful to know which process or scene in the creation or inheritance of organizational intellect a given piece of learning content contributes to, and how. Grasping the situations to which each piece of learning content contributes will allow the management of learning contents to correspond more effectively to the organizational intellect formation process.
References
1. Buckingham Shum, S., Motta, E., Domingue, J.: ScholOnto: An Ontology-Based Digital Library Server for Research Documents and Discourse. Int. J. Digit. Libr., 3 (3) (2000) 237–248
2. Carter J., Bitting E., Ghorbani, A. A.: Reputation formalization for an information sharing
multiagent system, Comp. Intell., Vol. 18, No. 4 (2002) 515–534
3. Goffman, Erving. The Presentation of Self in Everyday Life. Doubleday: Garden City,
New York, (1959)
4. Hayashi, Y., Tsumoto, H., Ikeda, M., Mizoguchi, R.: “Toward an Ontology-aware Support
for Learning-Oriented Knowledge Management”, Proc. of the 9th Int. Conf. on Comp. in
Educ. (ICCE’2001), (2001) 1149–1152
5. Hayashi, Y., Tsumoto, H., Ikeda, M., Mizoguchi, R.: “An Intellectual Genealogy Graph -
Affording a Fine Prospect of Organizational Learning-”, Proc. of the 6th International
Conference on Intelligent Tutoring Systems (ITS 2002), (2002) 10–20
6. Hood, L., McDermott, R.P., Cole, M.: Let’s try to make it a good day: Some not so simple
ways, Disc. Proc., 3 (1980) 155–168
7. Nonaka, I., Takeuchi, H.: The Knowledge-Creating company: How Japanese Companies
Create the Dynamics of Innovation, Oxford University Press, (1995)
8. Mizoguchi R., Bourdeau J.: Using Ontological Engineering to Overcome AI-ED Problems,
Int. J. of Art. Intell. in Educ., Vol.11, No.2 (2000) 107–121
9. Takeuchi, M., Odawara, R., Hayashi, Y., Ikeda, M., Mizoguchi, R.: A Collaborative
Learning Design Environment to Harmonize Sense of Participation, Proc. of the 10th Int.
Conf. on Comp. in Education ICCE’03 (2003) 462–465
10. Tsumoto, H., Hayashi, Y., Ikeda, M., Mizoguchi, R.: “A Collaborative-learning Support
Function to Harness Organizational Intellectual Synergy” Proc. of the 10th Int. Conf. on
Comp. in Education ICCE’02 (2002) 297–301
11. Ogata H., Matsuura K., Yano Y.: “Active Knowledge Awareness Map: Visualizing
Learners’ Activities in a web Based CSCL Environment”, Proc. of NTCL2000 (2000) 89–
97
Learners’ Roles and Predictable Educational Benefits
in Collaborative Learning
An Ontological Approach to Support Design and Analysis of CSCL
1 Introduction
In the last decade, many researchers have contributed to the development of the research area of Computer Supported Collaborative Learning (CSCL) [e.g., 3, 8-15, 19, 24, 26], and the advantages of collaborative learning over individual learning are well known. Collaborative learning, however, is not always effective for every learner in a learning group. Educators sometimes argue that it is essential for collaborative learning and its advantages that learners take turns playing certain roles, for example, tutor, tutee, helper, assistant, and so on. Of course, in collaborative learning the learners not only learn passively but also interact with others actively, and they share their knowledge and develop their skills through this interaction. The educational benefits that a learner gets through the collaborative learning process depend mainly on the interaction among learners; that is, the educational benefits depend on what roles the learner plays in the collaborative learning. Moreover, the relationship between a role in a group and a
learners, that is, the educational benefits depend on what roles the learner plays in the
collaborative learning. Moreover, the relationship between a role in a group and a
learner’s knowledge and/or cognitive states when the learner begins to play the role is
critical. If the learner performs a role which is not appropriate for his/her knowledge
and/or cognitive state, his/her efforts would be in vain. So, designers and educators
should consider carefully the relationship among learners’ states, experiences, and
conditions for role assignment; and the synergistic and/or harmful effect of a
combination of more than one role, when they form learning groups and design learning processes. To realize this, we need to organize models and rules for role assignment that designers and educators can refer to, and to construct a system of concepts to facilitate shared understanding of them.
Our research objectives include constructing a collaborative learning support
system that detects appropriate situations for a learner to join a collaborative
learning session, forms a collaborative learning group appropriate for the situation,
and monitors and supports the learning processes dynamically. To fulfill these
objectives, we have to consider the following:
1. How to detect appropriate situations to start collaborative learning sessions and
to set up learning goals for the group and members of the group,
2. How to form an effective group which ensures educational benefits to each
members of the group, and
3. How to analyze interaction among learners and facilitate desired interaction in
the learning group.
We have discussed item 1 in our previous papers [8, 9], and have been constructing a support system for analyzing interaction for item 3 [13, 14]. We have also been discussing item 2, concentrating especially on extracting the educational benefits expected to be acquired through collaborative learning (i.e., learning goals) and constructing a system to support group formation represented as a combination of these goals [11, 26].
This paper focuses on learners’ behavior and roles, the conditions for assigning appropriate roles to learners, and the predictable educational benefits of the roles, referring to learning theories, as a remaining part of item 2. First, we overview our previous work, the system of concepts for representing collaborative learning sessions, which we call the “Collaborative Learning Ontology”; in particular, we describe the “Learning Goal Ontology”, which is a part of the Collaborative Learning Ontology. Next, we identify learners’ behavior and roles from learning theories. Then, we discuss the conditions for role assignment and the benefits predicted from playing the roles.
There are many theories supporting the advantages of collaborative learning: for instance, Observational learning [2], Constructivism [20], Self-regulated learning [21], Situated learning [16], Cognitive apprenticeship [5], Distributed cognition [23], Cognitive flexibility theory [25], Sociocultural theory [28], the Zone of proximal development [27, 28], and so on. If learners learn in compliance with strategies based on these theories, we can expect some educational benefits for the learners, with the strong support of the theory. So, we have been constructing models referring to these
theories. However, there is a lack of common vocabulary to describe the models.
Therefore, we have been constructing the “Collaborative Learning Ontology” which
is a system of concepts to represent collaborative learning sessions proposed by these
learning theories [10, 11, 26]. Here, we focus on the “Learning Goal Ontology”. The
concept “Learning Goal” is one of the most important concepts for forming a learning
group because each learner joins in a collaborative learning session in order to attain a
Each W(A)-goal provides a rationale justified by a specific learning theory. That is, the W(A)-goal specifies a rational arrangement of learning goals and a group formation. Fig. 2 shows a typical representation of the structure of a W(A)-goal. The W(A)-goal consists of five concepts: Common goal, Primary Focus, Secondary Focus, S<=P-goal, and P<=S-goal. The Common goal is a goal of the whole group, and its entity refers to the concepts defined in the W(L)-goal ontology. Both Primary Focus and Secondary Focus are learners’ roles in a learning group. A learning theory generally describes the process by which learners who play a specific role can obtain educational benefits through interaction with other learners who play other roles. The theories share the characteristic of arguing for the effectiveness of a learning process by focusing on a specific role of learners, so we represent this focus as the Primary Focus and Secondary Focus. The S<=P-goal and P<=S-goal are interaction goals between the Primary focused learner (P) and the Secondary focused learner (S), from P’s viewpoint and S’s viewpoint, respectively. The entities of these goals refer to the concepts defined as Y<=I-goals. The conditions proper to each W(A)-goal can be added to these concepts if necessary. Each of the Y<=I-goals referred to by the S<=P-goal and P<=S-goal consists of three concepts, as follows:
I-role: a role to attain the Y<=I-goal. A member who plays I-role (I-member) is
expected to attain his/her I-goal by attaining the Y<=I-goal.
You-role: a role as a partner for the I-member.
I-goal (I): an I-goal that means what the I-member attains.
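A minimal data-structure sketch of this goal structure is given below; it simply mirrors the concept names in the text and is not the authors' actual ontology representation.

```python
from dataclasses import dataclass

@dataclass
class YIGoal:                 # Y<=I-goal
    i_role: str               # role to attain the Y<=I-goal
    you_role: str             # partner role for the I-member
    i_goal: str               # the I-goal the I-member is expected to attain

@dataclass
class WAGoal:                 # W(A)-goal
    common_goal: str          # refers to a concept in the W(L)-goal ontology
    primary_focus: str        # learner role the theory focuses on
    secondary_focus: str      # partner role
    s_p_goal: YIGoal          # interaction goal from P's viewpoint
    p_s_goal: YIGoal          # interaction goal from S's viewpoint
```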
We have discussed these goals in detail in our previous papers [10, 11, 26]. In the remainder of this paper, we concentrate on identifying behavior and roles, clarifying the conditions for assigning a role to a learner, and connecting the roles with predictable educational benefits.
Table 1 shows learners’ behavior and roles in collaborative learning sessions inspired
by the learning theories. There are nine types of behavior and thirteen types of roles.
To design effective learning processes and form appropriate groups for learners, it is important to assign an appropriate role to each learner. As we described, educational benefits depend on how learners interact with each other: what roles they play in collaborative learning. For example, teaching something to other learners is effective for a learner who already knows it but does not have experience in using the knowledge. Since the learner has to explain it in his/her own words in order to teach it to others, he/she is expected to come to comprehend it more clearly. On the other hand, the same role is not effective for a learner who already understands it well, has used it many times, and has taught it to other learners again and again; in such a case, it is effective not for the learner who teaches it, but only for the learners who are taught. So, clarifying the conditions for role assignment is necessary to support the design of learning sessions.
Table 2 shows the roles which appear in collaborative learning sessions inspired by the learning theories we have referred to, the conditions for each role, and the educational benefits predicted from playing each role. This prediction is based on the theories. There are two types of conditions: necessary conditions and desired conditions. The necessary conditions are essential for the role: if a learner does not satisfy them, the learner cannot play the role. The desired conditions, on the other hand, should be satisfied to enable a learner to get the full benefits of the role: if a learner does not satisfy them, the learner can still play the role, but the educational benefits may not be ensured. In Table 2, the conditions marked with are the necessary conditions, and the conditions marked with ‘-’ are the desired conditions. For example, any learner can play the role ‘Peer tutor’ as long as the learner has the target knowledge to teach other learners. If the learner has misunderstood the knowledge and/or does not have experience in using it, it is a good opportunity for the learner to play the ‘Peer tutor’ role, because externalizing his/her knowledge in his/her own words facilitates re-thinking of the knowledge and gives an opportunity to notice the misunderstanding [6].
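The two-tier condition check described above can be sketched as follows. The condition sets shown for 'Peer tutor' are illustrative placeholders, not the actual contents of Table 2.

```python
# Necessary conditions gate whether a role can be assigned at all;
# desired conditions gate whether the full educational benefit is expected.
ROLE_CONDITIONS = {
    "Peer tutor": {
        "necessary": {"has_target_knowledge"},
        "desired": {"lacks_usage_experience"},
    },
}

def can_assign(role, learner_state, table=ROLE_CONDITIONS):
    conds = table[role]
    assignable = conds["necessary"] <= learner_state
    full_benefit = assignable and conds["desired"] <= learner_state
    return assignable, full_benefit

print(can_assign("Peer tutor", {"has_target_knowledge"}))  # -> (True, False)
```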
By clarifying the conditions for assigning roles to learners in this way, it becomes possible for designers who are not experts in learning theories, and even for computer systems, to assign appropriate roles to each learner, to form groups for effective collaborative learning, and to predict the educational benefits that each learner will get through the learning session in compliance with learning theories. This will be useful not only for supporting the design of collaborative learning sessions, but also for analyzing them.
5 Conclusion
of more than one role. Moreover, we plan to extract heuristics for assigning roles to learners. For example, according to the theory of ‘Peer tutoring’, a learner who has a misunderstanding is appropriate for the role ‘Peer tutor’. However, there is a risk: if a learner who plays ‘Peer tutee’ does not know the knowledge, that learner would believe what the peer tutor teaches, and the peer tutee would then also have the misunderstanding. This is caused by characteristics of the theory: in ‘Peer tutoring’, the primary focus is the ‘Peer tutor’ and his/her benefits, and the theory gives little attention to the benefits of the ‘Peer tutee’. We will also describe risks like this along with the theory-based conditions for role assignment. Then, we will consider the order of role recommendations, and implement the mechanism for recommending roles in a collaborative learning support system [14] and in a supporting environment for the instructional design process for CSCL [12]. At this stage, we have been collecting supportive theories for collaborative learning; that is, all the theories we referred to describe positive effects of collaborative learning, because we would like to collect effective models of collaborative learning as reference models for designing collaborative learning. Of course, collaborative learning also has negative effects, and negative models are useful for avoiding the design of such learning sessions. This will also be included in our future work.
References
1. Anderson, J. R. Acquisition of Cognitive Skill, Psychological Review, 89(4), 369-406
(1982)
2. Bandura, A. “Social Learning Theory”, New York: General Learning Press (1971)
3. Barros, B., & Verdejo, M.F. Analysing student interaction processes in order to improve
collaboration. The DEGREE approach, IJAIED, 11 (2000)
4. Cognition and Technology Group at Vanderbilt. Anchored instruction in science
education, In: R. Duschl & R. Hamilton (Eds.), “Philosophy of science, cognitive
psychology, and educational theory and practice.” Albany, NY: SUNY Press. 244-273
(1992)
5. Collins, A. Cognitive apprenticeship and instructional technology, In: Idol, L., & Jones, B.
F. (Eds.) “Educational values and cognitive instruction: Implications for reform.”
Hillsdale, N.J.: LEA (1991)
6. Endsley, W. R. “Peer tutorial instruction”, Englewood Cliffs, NJ: Educational Technology
(1980)
7. Fitts, P. M. Perceptual-Motor Skill Learning, In: Melton, A. W. (Ed.), “Categories of
Human Learning”, New York: Academic Press. 243-285 (1964)
8. Ikeda, M., Hoppe, U., & Mizoguchi, R. Ontological issue of CSCL Systems Design, Proc.
of AIED95, 234-249 (1995)
9. Ikeda, M., Go, S., & Mizoguchi, R. Opportunistic Group Formation, Proc. of AIED97,
166-174 (1997)
10. Inaba, A., Ikeda, M., Mizoguchi, R., & Toyoda, J.
http://www.ei.sanken.osaka-u.ac.jp/~ina/LGOntology/ (2000)
11. Inaba, A., Supnithi, T., Ikeda, M., Mizoguchi, R., & Toyoda, J. How Can We Form
Effective Collaborative Learning Groups? -Theoretical justification of “Opportunistic
Group Formation” with ontological engineering, Proc. of ITS2000, 282-291 (2000)
12. Inaba, A., Ohkubo, R., Ikeda, M., Mizoguchi, R., & Toyoda, J. An Instructional Design
Support Environment for CSCL - Fundamental Concepts and Design Patterns, Proc. of
AIED-2001, 130-141 (2001)
13. Inaba, A., Ohkubo, R., Ikeda, M., & Mizoguchi, R. Models and Vocabulary to Represent
Learner-to-Learner Interaction Process in Collaborative Learning, Proc. of ICCE2003,
1088-1096 (2003)
14. Inaba, A., Ohkubo, R., Ikeda, M., & Mizoguchi, R. An Interaction Analysis Support
System for CSCL - An Ontological Approach to Support Instructional Design Process,
Proc. of ICCE2002 (2002)
15. Katz, A., O’Donnell, G., & Kay, H. An Approach to Analyzing the Role and Structure of
Reflective Dialogue, IJAIED, 11 (2000)
16. Lave, J. & Wenger, E. “Situated Learning: Legitimate peripheral participation”,
Cambridge University Press (1991)
17. Mizoguchi, R., & Bourdeau, J. Using Ontological Engineering to Overcome Common AI-
ED Problems, IJAIED, 11 (2000)
18. Mizoguchi, R., Ikeda, M., & Sinitsa, K. Roles of Shared Ontology in AI-ED Research,
Proc. of AIED97, 537-544 (1997)
19. Muhlenbrock, M., & Hoppe, U. Computer Supported Interaction Analysis of Group
Problem Solving, Proc. of CSCL99, 398-405 (1999)
20. Piaget, J., & Inhelder, B. “The Psychology of the Child”, New York: Basic Books (1971)
21. Resnick, M. Distributed Constructionism, Proc. of the International Conference on the
Learning Science (1996)
22. Rumelhart, D.E., & Norman, D.A. Accretion, Tuning, and Restructuring: Modes of
Learning, In: Cotton, J.W., & Klatzky, R.L. (Eds.) “Semantic factors in cognition.”
Hillsdale, N.J.: LEA, 37-53 (1978)
23. Salomon, G. “Distributed cognitions”, Cambridge University Press (1993)
24. Soller, A. Supporting Social Interaction in an Intelligent Collaborative Learning System,
IJAIED, 12 (2001)
25. Spiro, R. J., Coulson, R. L., Feltovich, P. J., & Anderson, D. K. Cognitive flexibility:
Advanced knowledge acquisition in ill-structured domains, Proc. of the Tenth Annual
Conference of the Cognitive Science Society, Hillsdale, NJ: LEA, 375-383 (1988)
26. Supnithi, T., Inaba, A., Ikeda, M., Toyoda, J., & Mizoguchi, R. Learning Goal Ontology
Supported by Learning Theories for Opportunistic Group Formation, Proc. of AIED99
(1999)
27. Vygotsky, L.S. The problem of the cultural development of the child, Journal of Genetic
Psychology, 36, 414-434 (1929)
28. Vygotsky, L.S. “Mind in Society: The development of the higher psychological processes”,
Cambridge, MA: Harvard University Press (1930, Re-published 1978)
Redefining the Turn-Taking Notion in Mediated
Communication of Virtual Learning Communities
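P. Reyes and P. Tchounikine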
1 Introduction
This work takes place within a research project that aims to contribute to a better understanding of interactions within virtual learning communities in the context of computer-based tools. We design, implement and test innovative tools to provide data that will enable us to progress in our conceptualization of our research domain and phenomena.
Network technologies have enabled web-learning activities based on the emergence of virtual learning communities (VLC). In a VLC, collaborative learning activities are realized mainly through a conversational asynchronous environment, which we call forum-type tools. The expression forum-type tools (FTT) designates a mainly text-based, asynchronous electronic conferencing system that makes use of a hierarchical tree data structure of chained messages called threads. FTT are widely used for communication and learning throughout the Internet and e-learning platforms. These tools have opened the possibility of creating virtual learning communities in which students discuss a great variety of subjects, at different levels of depth, through a threaded conversation structure.
This paper proposes a change in the turn-taking notion in distributed and collaborative learning environments that use FTT. We address a set of issues described in the literature concerning turn-taking difficulties in virtual environments. Concretely, we propose a redefinition of the turn-taking concept in threaded conversations that take place in VLC. The new turn-taking concept in the FTT context is characterized by a new temporal structure that we call a session. The session structure is a mechanism for the turn-taking management of threaded conversations.
The new turn-taking notion stems from a quantitative study of temporal aspects of the behavior of users of FTT in their VLC, paying particular attention to the work practices of these users. This choice is inspired by ethnographic and Scandinavian participatory design approaches [1]. This method emphasizes building systems that represent the actual work practices of the communities for which the system is intended.
We suggest that this change can enhance and facilitate the emergence and development of the learning interactions that take place in FTT as written learning conversations. Learning conversations are the ones that “go further than just realizing information exchange; rather, they allow participants to make connections between previously unrelated ideas, or to see old ideas in a new way. They are conversations that lead to conceptual change” [2].
Our approach differs clearly from research on turn-taking issues that focuses, as a main factor, on the consequences of delay in communication tools such as forums and chats (e.g., [3,4]).
This paper is organized as follows. First comes an overview of turn-taking in virtual environments. We next describe a quantitative study of the temporal behavior of participants in a VLC. Then, we present the session structure and the implementation of a prototype that reifies these ideas. Finally, we present some preliminary results from an empirical study.
Turn-taking in spoken conversation deals with the alternation of communication turns between participants. This process takes place in an orderly fashion: in each turn one participant speaks and then another responds, and so forth. In this way, conversations are organized as a series of successive steps or “turns”, and turn-taking becomes the basic mechanism of conversation organization (e.g., [5]).
Nevertheless, applying the same turn-taking concept in the CMC context (principally written conversations) is not correct: “The turn-taking model does not adequately account for this mode of interaction” [6]. Actually, the nature of the communication medium changes the nature of the turn-taking concept. Consequently, the turn-taking system in CMC tools is substantially different from that of face-to-face interactions (e.g., [7,6] or [8]).
The communication that takes place in these tools follows a multidimensional sequential pattern (e.g., a conversation with parallel threads) rather than a linear sequential pattern, with complex interactions that result in “layered topics, multiple speech acts and interleaved turns” [6]. In synchronous communication tools, e.g. chats, turn-taking has a confused meaning: dyadic exchanges are generally interleaved with other dyadic exchanges [3]. In these tools, message exchanges are therefore highly overlapped. The same overlap problem can be observed in asynchronous tools [3] (generally FTT). In asynchronous communication (newsgroup-type communication through FTT) everybody holds the floor at any time, which breaks the traditional concept of turn-taking. In this way, all participants can produce messages independently and simultaneously. The development of face-to-face conversation is basically a linear process of alternating turns, but in FTT this linearity is destroyed by the generation of multiple threads in parallel.
In this way, the on-line conversation grows in a dispersed way with a blurred notion of turn-taking. This situation generates deterrents that have direct consequences for collaborative learning and for the learning conversations on which it is based: a dispersed conversation prevents participants from building their own “adequate mental representation of the virtual space and time within which they interact” [9]. Students thus have a weaker perception of the global discussion, since “in order to communicate and learn collaboratively in such an environment, participants need to engage in what they can perceive as a normal discussion” [9], which is not easy to obtain in current FTT.
The importance of turn-taking and of turn management for learning conversations is stated by several authors: turn management in distance groups can influence group performance [10]; in a collaborative problem-solving activity, students build solutions sequentially through alternating turns [11]; turn-taking represents the rhythm of a conversation, and making the rhythm of communication patterns explicit can help improve the coordination of interactions [12]; and turn-taking is essential for a good understanding of conversations [5].
In order to understand some temporal work practices in VLC, we studied a collection of Usenet newsgroups. Research has shown that some newsgroups can be considered communities [13].
The objective of this empirical study is to analyze participants’ actions in order to find recurrent sequences of work practices (work patterns) [14]. These work practices are the starting point for technical solutions that facilitate the work patterns found.
A quantitative approach to data collection was pursued to find the temporal work practices. We analyze the temporal behavior of participants, particularly their participation in threaded conversations and how their way of participating denotes a specific time-management pattern of FTT users.
For this research, we selected data from a collection of threaded conversations belonging to open-access newsgroups. The selection process has two steps: first, the detection of newsgroups with the interactivity characteristics of a VLC, taking into account in particular the length of threads and the activity of the groups; next, the detection and selection of threaded conversations in these newsgroups where there are in-depth exchanges. Newsgroups that play the role of a dynamic FAQ (a question and one or two answers) are far from the notion of learning community that we sustain, where there is a complex and rich exchange of interactions relating to the topics of interest of a particular community.
We selected eight newsgroups1 that are particularly active and whose threaded conversations largely exceed the average thread length. This analysis covers roughly 50,000 messages over a 4-month time span in these 8 newsgroups. The mean monthly volume of sent messages (1,628 messages) in the selected newsgroups is very high compared with the monthly mean for newsgroups noted by Butler (50 messages) [15].
With respect to thread length, the mean thread length in newsgroups is 2.036 messages [15]. The newsgroups selected in this work largely exceed this mean (13.25 messages). Quantitatively, these newsgroups have active and in-depth communication among their members. This fact indicates how engaged people are with the community. This selection ensures that we are looking at conversations in communities that have a very high amount of on-topic discussion. Thread length reveals more in-depth commentary on a specific topic; the length of threads is also recognized as a measure of interactivity [16].
The second stage of selection corresponds to selecting threads above a minimal length threshold in the selected newsgroups, in order to look at the work patterns in threads with high complexity and interactivity. This selection better focuses the quantitative study: a discussion with only three or four messages does not enable us to discover these work patterns. We therefore consider that this detection and filtering process does not decrease the validity of our results, but focuses the research on the field of our interest: providing new tools for better managing discussions with complex interactions (highly intertwined conversations) in VLC.
We set the minimal thread length to 10 messages, so that we analyse only threaded conversations with 10 or more messages. With this approach we discard only 20% of the messages (that is, the messages that belong to threads having fewer than 10 messages).
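As an illustration only (the class and method names are ours; the 10-message threshold is the one stated above), this filtering step amounts to the following:

import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only threads with at least a minimum number of messages
// and report which fraction of the messages is retained.
class ThreadFilter {
    static List<List<String>> filter(List<List<String>> threads, int minLength) {
        return threads.stream()
                .filter(t -> t.size() >= minLength)
                .collect(Collectors.toList());
    }

    static double retainedFraction(List<List<String>> all, List<List<String>> kept) {
        long total = all.stream().mapToLong(List::size).sum();
        long retained = kept.stream().mapToLong(List::size).sum();
        return total == 0 ? 0.0 : (double) retained / total;
    }
}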
The results of the quantitative study focus on the temporal behavior of users in the delivery of their messages. First, we pay attention to an interesting work pattern that is repeated throughout the threads analyzed in our study: a large percentage of participants answer messages in a buffered or digest way (they send several messages during a short time). The analysis shows that a fraction of the messages (25%) in the selected newsgroups is sent consecutively by a single participant answering different branches of different threads.
In a deeper analysis of these consecutive messages, we found that the mean period between them is 14 minutes. This period confirms the notion of these
1 comp.ai.philosophy, humanities.lit.authors.shakespeare, humanities.philosophy.objectivism, sci.anthropology.paleo, soc.culture.french, talk.origins, talk.philosophy.humanism, talk.politics.guns.
Fig. 1. Messages in a temporal order and thread order view in actual FTTs
We propose the creation of a structure called a session. This structure is intended to model the turn-taking behavior and make visible the particular rhythm of answers that is an existing work practice. A session is “a period of time (...) for carrying out a particular activity” [17]. In our context, a session corresponds to a group of messages sent consecutively, within a short period of time, by the same participant, that is, a new structure that groups together the messages sent at almost the same time.
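A minimal sketch of how sessions could be computed from a time-ordered message list is given below. The class and field names are ours, and the grouping window is an assumption loosely inspired by the 14-minute mean inter-message period reported above:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group messages sent consecutively by the same author
// within a short time window into "sessions" (the proposed turn unit).
class Message {
    String author;
    long sentAtMillis;
    Message(String author, long sentAtMillis) { this.author = author; this.sentAtMillis = sentAtMillis; }
}

class SessionBuilder {
    static final long GAP_MILLIS = 14 * 60 * 1000; // assumed window, inspired by the observed 14-minute mean

    // The input list is assumed to be ordered by sending time.
    static List<List<Message>> buildSessions(List<Message> ordered) {
        List<List<Message>> sessions = new ArrayList<>();
        List<Message> current = new ArrayList<>();
        for (Message m : ordered) {
            boolean sameAuthor = !current.isEmpty()
                    && current.get(current.size() - 1).author.equals(m.author);
            boolean closeInTime = !current.isEmpty()
                    && m.sentAtMillis - current.get(current.size() - 1).sentAtMillis <= GAP_MILLIS;
            if (sameAuthor && closeInTime) {
                current.add(m);               // extend the current session
            } else {
                if (!current.isEmpty()) sessions.add(current);
                current = new ArrayList<>();  // start a new session, i.e. a new "turn"
                current.add(m);
            }
        }
        if (!current.isEmpty()) sessions.add(current);
        return sessions;
    }
}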
The introduction of the session structure in FTT changes the concept of turn-taking in threaded conversations, given that communication turns are now visually realized from sessions (packaged messages) and not from individual messages. We implement this structure in an FTT as columns (Figure 2) that package the messages in a parallel and linear view.
We state that the session structure represents a turn-taking element in CMC environments. We propose that the turn in a threaded conversation is not the individual message; rather, the set of messages that a participant posts across the branches of the conversation is the equivalent, in threaded conversations, of a face-to-face turn. In FTT we must interpret turn-taking at another level of granularity, and no longer think of each intervention in a thread as a turn in the conversation: the group of messages (each one in a different thread) sent in one intervention is now the turn.
With the use of this session element, users obtain a clearer notion of the turn-taking that takes place in threaded conversations, the non-linearity of threaded conversations disappears, and turn-taking in CMC becomes the alternation of sessions. The session structure thus clarifies turn-taking in threaded conversations for better coordination, cooperation and collaboration, given that users can better situate and contextualize the interventions of participants. This structure also allows the management and parallel visualization of the different threads of a multithread conversation, which are normally visualized in a dispersed way (e.g., Figure 1): the session structure encompasses the messages that are normally dispersed across different threads.
In this way, this structure overcomes an observed dissociation in threaded conversations between the temporal order of messages and the thread order of messages [19,3]. This dissociation has important consequences for participants, who often fail to follow the development of a multithread debate [19].
An empirical study was designed in order to collect feedback on the current characteristics of our prototype from the users’ perspective. In this study, 15 participants were recruited. The participants were teachers who carried out, over a month and a half, a distance collaborative activity as part of a training course on information and communication technologies (ICT). During this study, the tool was used simply as a medium of communication and discussion, not as a point of concern in itself. The objective of the participants’ activity was, in fact, to carry out a collaborative analysis of the integration and use of ICT in the educational milieu.
The discussion contributions were studied in order to examine the impact of the new thread visualization on the threaded conversations of the participants in a learning context. Moreover, participants’ impressions of the use and value of the proposed tool and its potential for learning enhancement were collected with a questionnaire.
The experiment showed that the introduction of the session construct in a VLC does not generate a significant change in the temporal behavior of participants (between 20% and 30% of the messages are still consecutive messages). Nevertheless, we have eliminated these consecutive messages as serial events, converting them into parallel events through the construction of sessions that contain two or more messages.
The questionnaire findings confirm the benefits of using this visualization for threaded conversations. A high proportion of participants (75%) consider that the proposed visualization and organization of messages allows them to better follow the development of conversations.
Another remarkable result is that a high proportion of participants (75%) consider that Mailgroup permits an effective visualization of participants’ exchanges. This improves navigation through the contributions, which “is a key usability issue for online communities; particularly communities of practice which involve a large amount of information exchange” [20].
6 Conclusions
This paper attempts to connect the turn-taking issues found in the literature and empirical findings with some practical propositions.
In this paper we propose the introduction of a new element in forum-type tools that makes certain work practices of participants in a VLC explicit. Forum-type tools have become, together with email, the basic tools for realizing collaborative learning activities.
This work is framed by our objective of creating new artifacts that make collaborative learning environments more flexible and give new possibilities for communication, coordination and action. The introduction of a session structure redefines the concept of turn-taking in threaded conversations. With the session construct, these conversations become alternating turns, as in face-to-face conversations.
The introduction of the session construct makes salient a temporal behavior of VLC participants that the technology currently hides. We conjecture that making these behaviors explicit can improve the management of threaded learning conversations. In this way, the structure gives a more coherent visualization of turn-taking. Moreover, the session can be conceptualized as a kind of representational guidance [21] element for students or participants, that is, a representation (such as graphs or pictures) that helps to promote collaborative interactions among users through a direct perception of participants’ turns.
The origins of FTT lie mainly in the distribution of news; they were not designed as environments for interactive communication. In this context, we try to adapt these tools to obtain better, improvable environments for group communication, based on the tenet that making the structures of interaction more coherent with our communication patterns helps facilitate communication in the VLC.
References
1. Bødker, K., F. Kensing, and J. Simonsen, Changing Work Practices in Design, in Social
Thinking - Software Practice, Y.e.a. Dittrich, Editor. 2002, MIT Press.
2. Bellamy, R., Support for Learning Conversations. 1997.
3. Herring, S. Interactional Coherence in CMC. in 32nd Hawai’i International Conference on
System Sciences. 1999. Hawaii: IEEE Computer Society Press.
4. Pankoke-Babatz, U. and U. Petersen. Dealing with phenomena of time in the age of the
Internet. in IFIP WWC 2000. 2000.
5. Sacks, H., E. Schegloff, and G. Jefferson, A simplest systematics for the organisation of
turn-taking in conversation. Language, 1974. 50.
6. Murray, D.E., When the medium determines turns: Turn-taking in computer conversation,
in Working with language, H. Coleman, Editor. 1989, Mouton de Gruyter: Berlin - New
York. p. 319-337.
7. McElhearn, K., Writing conversation: an analysis of speech events in e-mail mailing lists.
Revue Française De Linguistique Appliquée, 2000. 5(1).
8. Warschauer, M., Computer-mediated collaborative learning: Theory and practice. Modern
Language Journal, 1997. 81(3): p. 470-481.
9. Pincas, A. E-learning by virtual replication of classroom methodology. in The Humanities
and Arts higher education Network, HAN. 2001.
10. McKinlay, A. and J. Arnott. A Study of Turn-taking in a Computer-Supported Group Task.
in People and Computers, HCI’93 Conference. 1993: Cambridge University Press.
11. Teasley, S. and J. Roschelle, Constructing a joint problem space, in Computers as
cognitive tools., S. Lajoie and S. Derry, Editors. 1993, Lawrence Erlbaum: Hillsdale, NJ.
12. Begole, J., et al. Work rhythms: Analyzing visualizations of awareness histories of
distributed groups. in Proceedings of CSCW 2002. 2002: ACM Press.
13. Roberts, T.L. Are newsgroups virtual communities? in CHI’98. 1998.
14. Singer, J. and T. Lethbridge. Studying work practices to assist tool design in software
engineering. in 6th International Workshop on Program Comprehension (WPC’98). 1998.
Ischia, Italy.
15. Butler, B.S., When is a Group not a Group : An Empirical Examination of Metaphors for
Online Social Structure. 1999, Graduate School of Business, University of Pittsburgh.
16. Rafaeli, S. and F. Sudweeks, Networked interactivity. Journal of Computer-Mediated
Communication, 1997. 2(4).
17. Cambridge Dictionary, Cambridge Dictionary Online. n.d., Cambridge University.
18. Reyes, P. and P. Tchounikine. Supporting Emergence Of Threaded Learning
Conversations Through Augmenting Interactional And Sequential Coherence. in CSCL
Conference. 2003.
19. Davis, M. and A. Rouzie, Cooperation vs. Deliberation: Computer Mediated Conferencing
and the Problem of Argument in International Distance Education. International Review
of Research in Open and Distance Learning, 2002. 3(1).
20. Preece, J., Sociability and usability: Twenty years of chatting online. Behavior and
Information Technology Journal, 2001. 20(5): p. 347-356.
21. Suthers, D., Towards a Systematic Study of Representational Guidance for Collaborative
Learning Discourse. Journal of Universal Computer Science, 2001. 7(3).
Harnessing P2P Power in the Classroom
Julita Vassileva
1 Introduction
Therefore, one of the class activities involves the students in the process of creating and maintaining such a repository. Students are required to find, on a weekly basis, web links to articles related to the issues discussed during the week and to post them on their personal websites dedicated to the class. The instructor reviews these websites and selects from them several links to post on the class website. The students then need to write a one-page summary and discussion for one of these selected articles.
The process described above is quite laborious both for the students and for the instructor. The students need to create and maintain personal class websites on which to post the links they find. The instructor needs to frequently review differently organized student websites to see which students have found links to new articles, to read and evaluate the articles, and to add selected good papers to the official class website from which the students can pick an article to summarize. This process takes time and usually can be done only at the end of the week; therefore the students can only write summaries for articles on the topic discussed during the previous week, which makes it impossible to focus all student activities on the currently discussed topic. Another disadvantage of this process is that the articles selected by the instructor and posted on the class website reflect the instructor’s subjective interests in the area; the students may prefer to summarize different articles than those selected by the instructor.
The process of sharing class-related articles, selecting articles and summarizing them can be supported much better by using peer-to-peer (P2P) file-sharing technology. We therefore decided to deploy in the “Ethics and IT” class Comtella, a P2P system developed at the MADMUC lab of the Computer Science Department for sharing academic papers among researchers in a group, lab or department. The next section briefly introduces the area of P2P file-sharing. Section 3 describes the Comtella system. Section 4 explains how Comtella was applied to support the Ethics and IT class. Section 5 presents the first encouraging evaluation results.
Peer-to-peer (P2P) file-sharing systems have been around for 5 years and have enjoyed enormous popularity as free tools for downloading music (.mp3) files and movies. They have also gained a lot of public attention due to the controversial lawsuit that the RIAA launched against Napster and the ensuing ongoing public debate about copyright protection. The RIAA initially claimed that P2P technologies are used mainly to violate copyright and argued unsuccessfully for banning them. It succeeded in closing Napster, which used a centralized index of the files shared by all participants to facilitate search. However, the widely publicized decision spurred a wave of new, entirely distributed and anonymous file-sharing applications relying on protocols such as Gnutella or FreeNet, which make it very hard to identify and prosecute file-sharers. Up to now, apart from P2P applications aimed at sharing CPU cycles (e.g. SETI@home, which harnesses the CPU power of the participating peers’ PCs to process data from telescopes in the search for signs of extraterrestrial intelligence, and several projects like the Intel Philanthropic Peer-to-Peer project, which uses P2P technology to harness computing power for medical research), instant messaging applications like Jabber and AVAKI, and collaboration applications like Groove, the most widely used P2P applications are used for illegal file-sharing (e.g. KaZaA, BearShare, E-Donkey) of copyrighted music, films, or pornographic materials. Most recently, there have been initiatives to put P2P file-sharing to better use, e.g. MS SharePoint or Nullsoft’s Waste, which serves a small private network of friends.
We see a huge potential for P2P file-sharing to tap the individual efforts of instructors, teaching assistants and learners in creating and sharing learning materials. These materials can be specially developed instructional modules or learning objects, as in EDUTELLA (Nejdl et al., 2002) or in Ternier, Duval & Neven’s (2001) proposal for a P2P-based learning object repository. However, any kind of file can be shared in a P2P way, including PowerPoint files presenting lecture notes, web-based notes or references, and research papers (used as teaching materials in graduate classes or during graduate student supervision). We propose a P2P system, called Comtella, that enables learners to bring in and share course-related materials. The system is described in the next section. Section 4 presents results of an ongoing experiment with the system in the Ethics and IT course and compares the amount of student contributions using Comtella with the contributions of students who took the same class in the previous year, using their own websites to post links to the resources they found.
The Comtella system (Vassileva, 2002) was developed at the MADMUC lab of the Computer Science Department to support graduate students in the laboratory in sharing research papers found on-line. Comtella uses an extension of the Gnutella protocol and is fully distributed. Each user needs to download a client application (called a “servent”) which allows sharing new papers with the community (typically pdf files, though it can easily be extended to all kinds of files) and searching for papers shared by oneself and by other users. The shared papers need to be annotated with respect to their content as belonging to a subject category (an adapted subset of the ACM subject index). The user searches by specifying a category and receives a list of all of his/her own papers and of the papers shared by others related to this category. From the list of results, the user can download the desired papers and view them in a browser.
Since the research papers shared by users are not necessarily their own but are written by other authors, there is a copyright issue. However, these papers are typically found on the Web anyway (Comtella supports the user in seamlessly saving and sharing pdf files that are viewed in the browser, as these are typically found on the Web using search tools such as Google or CiteSeer). Storing a local copy of a paper may be considered a violation of copyright. However, users typically store local copies of viewed papers for personal use anyway, since they cannot rely on finding the file if they search again later (the average lifetime of a document on the web is approximately three months). Saving a copy for backup purposes is generally considered fair use. The sharing of papers happens on a small scale, among people interested in the same area within a research group or department, typically 5-10 people.
Lending a book to a friend or colleague is normally considered fair use, and in an academic environment supervisors and graduate students typically share articles both electronically and in print. Therefore, we believe that this type of sharing cannot be considered copyright violation, since it has an educational use, stimulates the flow of ideas and research information, and assists the generation of new ideas.
In addition to facilitating the process of sharing papers, Comtella supports the development of a shared group repository of resources by synergizing the efforts of all participating users. It allows users to rate the papers they share and to add comments, which can yield a global ranking of the papers with respect to their quality and/or popularity within the group. Thus an additional source of information is generated automatically, which can be very useful for newcomers to the lab (e.g. new students) to get initial orientation in the assembled paper repository.
Comtella has been used on an experimental basis, with some interruptions and varying success, for nearly one year in the MADMUC lab and for about three months across the Computer Science Department. We identified a number of technical issues related to the instability of servents caused by Java-related memory leaks, and to communicating across firewalls (so that users could use the system from home), which have been mostly resolved. There were also logistics issues related to the fact that the system was fully distributed: a user who wanted to use the system both from home and from the office had to leave his/her servents running on both machines, so that s/he could access from work the papers shared by the servent at home and access from home the papers shared on the work computer. In fact, Comtella considers the user’s servents at home and at work as servents of two different users, with different ids, lists of shared papers, etc.
In order to access the papers shared by another user, that user has to be online. This proved to be a problem, because users typically switch off their home computers when they are at work. In addition, users tend to start their servents only when they want to search for papers and to quit them afterwards. This leads to very few servents being online simultaneously, and therefore there are very few (if any) results to a query. It is very important to ensure a critical mass of online servents to maintain an infrastructure that guarantees successful searches and attracts more users to keep their servents online. Various solutions have been deployed in popular file-sharing systems like KaZaA and LimeWire; for example, switching off the servent can be made particularly hard.
Finally, even when most of the users keep their servents running all the time, the system quickly reaches a “saturation” point, when all users have downloaded all the files in which they are interested from other users during their first successful searches. If no new resources are injected into the system (by users bringing in and sharing new papers), it very soon makes no sense for a user to search in his/her main area of interest, since there is nothing new. Ultimately, the system reaches an equilibrium where everyone has all the papers that everyone else has. In order to achieve a dynamic and useful system, the users have to share new papers regularly and thus contribute to the growing repository rather than behave as lurkers (Nonnecke & Preece, 2000). Motivating users to contribute is an important problem, and we have researched a motivational strategy based on rewards in terms of quality of service and community visualization of contributions (Bretzke & Vassileva, 2003).
Since Comtella provides exactly the infrastructure that allows users to bring in and share resources with each other, we decided to deploy it in the “Ethics in IT” course to support students in sharing and rating on-line papers related to the topics of the course. We expected a higher participation and contribution rate than in the case where Comtella is used to share research papers within a lab, since within a class the students are required to put in a concerted effort, scheduled by the class curriculum (weekly topics), to summarize papers and to contribute new papers in order to get a participation mark. We also wanted to see how the contribution level when students use Comtella differs from the level in the previous offering of the class, when students had to add the links to their own class websites. Finally, we wanted to experiment with some of our motivational strategies to see whether they actually lead to an increase in participation compared to a system with no motivational strategies.
For these reasons, we decided to modify the standard Gnutella servent functionality so that, instead of sharing the actual files, only their URLs are shared. To share a paper from the Web, the user needs to copy and paste the URL into the Comtella “Share” window, copy and paste the title of the article from the browser, and finally select the category (topic) of the article, which is indicated by the week of the class it relates to (see Figure 1). A shared paper thus consists of: title, URL, category, and, optionally, rating and comment, if the user decides to provide them.
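For concreteness, the metadata attached to a shared link could be represented as follows. This is a hypothetical sketch; the field names are ours and the actual Comtella data model is not described in this paper:

// Hypothetical sketch of the metadata attached to a shared link in the classroom
// version of Comtella: the file itself is not shared, only its description.
class SharedPaper {
    String title;      // pasted from the browser
    String url;        // pasted from the address bar
    String category;   // the week/topic of the course it relates to
    Integer rating;    // optional, entered by the sharing student
    String comment;    // optional, entered by the sharing student

    SharedPaper(String title, String url, String category, Integer rating, String comment) {
        this.title = title;
        this.url = url;
        this.category = category;
        this.rating = rating;
        this.comment = comment;
    }
}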
Users who decide to search for papers related to the topic of a given week have to specify the topic from the category list in the “Search” window. The servent sends the query to its neighbour servents residing on the server, and they forward the query to their neighbours. If any of the servents that receive the query share papers about this topic, the results are sent back to the querying peer using the standard Gnutella protocol. In other words, the search protocol is not changed; the only change is the physical location of the BE of the servents, which now reside on two server machines.
Students can view and read the papers returned as results of the search by clicking on the “Visit” button, without actually downloading the paper (see Figure 2). Clicking “Visit” starts the default browser with the URL of the paper, and the student can view the paper in the browser. The student can also view the comments of the user who shares the paper and his/her rating of the paper. If the student likes the paper and decides to share it him/herself, to comment on it or to rate it, s/he can download it by clicking on the “download” button. This initiates a download process between the servents (which again follows the standard Gnutella protocol). Rather than the actual paper, the title and URL are downloaded, while the comment and rating that the sharing user entered are not. In this way, each user who shares a paper has to provide his/her own comment and rating.
The ratings of the papers indicate the quality of a paper and the level of interest that the students who downloaded it have in its topic. The students were instructed to select for their weekly summary a paper that had been rated highly by two or more of the students who share it. The students could also enter their weekly summary through Comtella, by entering a summary for a selected paper from their shared papers.
If two students disagree in their rating of a paper, their relationship strength decreases. The relationship between the student who performs the search and each student who shares a paper is shown in the search results (see Figure 2). In this way, students can find other students who judge papers in a similar way, since the relationship value serves as a measure of the trust that the student has in the papers provided by the other student.
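The exact update rule for the relationship strength is not given here; purely as an illustration, a symmetric pairwise value could be adjusted on each rating event along these lines (the agreement margin, step size and class names are assumptions of ours):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: adjust a pairwise relationship strength when two students
// rate the same paper, decreasing it on disagreement and increasing it on agreement.
class RelationshipModel {
    private final Map<String, Double> strength = new HashMap<>();

    private String key(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    void onRatings(String studentA, int ratingA, String studentB, int ratingB) {
        double delta = Math.abs(ratingA - ratingB) <= 1 ? +0.1 : -0.1; // assumed agreement margin and step
        strength.merge(key(studentA, studentB), delta, Double::sum);
    }

    double get(String a, String b) {
        return strength.getOrDefault(key(a, b), 0.0);
    }
}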
Comtella became a focal point of all weekly course activities. The instructor did not need to find and share any new articles, since the students provided an abundance of materials, which were immediately accessible to anyone who wanted to search for papers on the same topic (category). It also became unnecessary for the instructor to review all contributed papers and select those appropriate to be summarized, since the ratings of the papers indicated the good papers and those in which the students were interested.
5 Evaluation
The deployment of Comtella in the Ethics and IT course is ongoing at the time of writing. The planned evaluation of the system includes data collected through system use (e.g. statistics on the numbers of contributed papers, numbers of downloaded papers, ratings and summaries written, average time spent on-line, frequency of logging in) and data from student questionnaires. However, even though the experiment is only halfway through, comparing the average levels of student contributions during the same period of time (the first six weeks) in the current offering and the previous year’s offering of the class shows evidence of the success of the system. In both cases the same instructor taught the class, and the curriculum, scheduling of weekly themes and grading scheme were the same. We compare the first six weeks of the 2002/2003 offering of the class with the first six weeks of the 2003/2004 offering.
Table 1 summarizes the student and participation data for each class. We can see that the average number of contributed new links per person in the 2003/2004 class, where students used Comtella, was nearly three times higher than in the 2002/2003 class. The bulk (nearly 80%) of the contributions in the 2002/2003 class was made by five students, while in the 2003/2004 class the top five students contributed approximately 40% of the links and the contributions were spread more equally (see also Figure 3, which compares the distribution of contributions among the students in the two class offerings). We can also see that 56% of the students in the 2002/2003 class did not contribute, versus only 17% in the 2003/2004 class. Figure 4 shows how regularly students contributed over the first six weeks of the experiment. As can be seen, more students contributed regularly in the 2003/2004 class than in the 2002/2003 class.
One reason for these encouraging results is that it is much easier for the students to add new links in Comtella than to maintain a website and add links there. Another reason is that searching for relevant links with Comtella is much more convenient than visiting the website of each student in the class, so the students tended to use the system much more often. They visited links shared by others in Comtella, and when viewing these articles they found new relevant articles (since Web-magazine articles often have a sidebar with “Related Stories” or “Related Links”) and shared them in Comtella “on the go”.
Fig. 3. Number of new contributions: comparing the first weeks of the two courses.
While in the beginning the instructor had to evaluate and rate the links added, from the third week on the students started giving ratings to the articles themselves and the system became self-organized. Of course, monitoring is still necessary, since currently nothing prevents students from sharing links that are unrelated to the contents of the course, or offensive materials. In our experiment, such an event has not happened to date, possibly because the students are senior students in their last year before graduation. Still, it would be good to incorporate tools that would allow the community of students to have a say on the postings of their colleagues and thus achieve automatic quality control by the community of users, similar to Slashdot.
In the remaining six weeks of the course, we will experiment with a three-level “membership” in the Comtella community based on the level of contribution: bronze, silver and gold, which will give certain privileges to members who have contributed, on a regular basis, papers that have been downloaded and rated highly by other students. This newer version also contains a visualization of the community showing the contribution level of each individual member and information about whether s/he is on-line at the moment, in line with the motivational visualization described in (Bretzke & Vassileva, 2003). The goal is to create a feeling of community (Smith & Kollock, 1999; De Souza & Preece, 2004) and a competition among the students to find more and better links.
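As a sketch only, since the actual criteria and thresholds for the bronze, silver and gold levels are not specified here, such a membership scheme might combine the number of contributed links with how well they were received:

// Hypothetical sketch of a three-level membership computed from contribution data.
class Membership {
    enum Level { NONE, BRONZE, SILVER, GOLD }

    // papersShared: links contributed on a regular basis;
    // highlyRated: how many of them were downloaded and rated highly by other students.
    static Level levelFor(int papersShared, int highlyRated) {
        if (papersShared >= 12 && highlyRated >= 6) return Level.GOLD;   // assumed thresholds
        if (papersShared >= 6  && highlyRated >= 3) return Level.SILVER;
        if (papersShared >= 3)                      return Level.BRONZE;
        return Level.NONE;
    }
}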
6 Conclusions
References
1. Bretzke H., Vassileva J.: Motivating Cooperation in Peer to Peer Networks, Proceedings
User Modeling UM03, Johnstown, PA, Lecture Notes in Computer Science, Vol. 2702.
Springer-Verlag, Berlin Heidelberg New York, 218-227 (2003).
2. Comtella © 2002-2004 available from http://bistrica.usask.ca/madmuc/peer-motivation.htm
3. De Souza, C., Preece, J. A framework for analyzing and understanding online communities.
Interacting with Computers. 16 (3), 579-610 (2004).
4. Nejdl, W., Wolf B. et al.: EDUTELLA: A P2P Networking Infrastructure Based on RDF,
WWW2002, May 7-11, Honolulu, Hawaii, USA (2002).
5. Nonnecke, B., Preece, J., Lurker Demographics: Counting the Silent. Proceedings of ACM
CHI’2000, Hague, The Netherlands, 73–80 (2000).
6. Smith, M.A. and Kollock, P. Communities in Cyberspace. Routledge, London (1999).
7. Ternier, S., Duval, E., and Vandepitte P. LOMster: Peer-to-Peer Learning Object Metadata.
Proceedings of EdMedia-2002, AACE: Blacksburg, 1942-1943 (2002).
8. Vassileva J.: Supporting Peer-to-Peer User Communities, in R. Meersman, Z. Tari et al.
(Eds.) Proc. CoopIS, DOA, and ODBASE, LNCS 2519, Springer: Berlin, 230-247 (2002).
Analyzing Online Collaborative Dialogues:
The OXEnTCHÊ–Chat
Ana Cláudia Vieira, Lamartine Teixeira, Aline Timóteo, Patrícia Tedesco, and
Flávia Barros
1 Introduction
Since the 1970s, research in the area of Computing in Education has been looking for ways to improve learning rates with the help of computers [1]. Until the mid-1990s, computational educational systems focused on offering individual assistance to students (e.g., Computer Assisted Instruction (CAI) and early Intelligent Tutoring Systems (ITS)). As a consequence, students could only work in isolation, frequently feeling unmotivated to spend long hours on this task.
Currently, the available information and communication technologies (ICTs) provide means for the development of virtual group work/learning systems [2] at considerably low cost. This scenario has favoured the emergence of virtual learning environments (VLE) on the Internet (e.g., WebCT [3]). One of the benefits of group work is that participants can refine their knowledge by interacting with others. In addition, it offers a way to escape from the isolation seen in CAI and ITS systems.
However, simply offering technology for interaction between VLE participants is not enough to eliminate the feeling of isolation. Students are not able to see their peers or to feel that they are part of a “community”. As a result, they tend to become unmotivated [4] and drop out of on-line courses fairly frequently.
Recent research in Computer Supported Collaborative Learning (CSCL) [5] has been investigating ways of helping users to: (1) feel more motivated; and (2) achieve better performance in collaborative learning environments. One way to tackle problem (1) is to provide the interface with an animated agent that interacts with the students. In fact, studies have shown that such software agents facilitate human-computer interaction and are able to influence users’ behavior [6]. Regarding issue (2), one possibility is to monitor the collaboration process, analyzing it and providing feedback to the users on how to participate better in the interaction. The system should also keep the instructor informed about the interaction (so that s/he can decide if, when and how to intervene or change pedagogical practices).
In this light, we developed OXEnTCHÊ–Chat, a tool that tackles the above problems by monitoring the interaction process and offering feedback to users. The system provides a chat tool coupled with an automatic dialogue classifier which analyses the on-line interaction and provides just-in-time feedback reports to both instructors and learners. Two different reports are available: (1) general information about the dialogue (e.g. chat duration, number of users); and (2) specific information about one user’s participation and how to improve it. The system also counts on a chatterbot [7], which plays the role of an automatic coordinator (helping to maintain the dialogue focus and trying to motivate students to engage in the interaction). The tool was evaluated with two groups, and the results obtained are very satisfactory.
The remainder of this paper is organised as follows. Section 2 presents a brief review
of the state of the art in systems that analyse collaboration. Section 3 describes the
OXEnTCHÊ–Chat tool, and section 4 discusses experiments and results. Finally, section
5 presents conclusions and suggestions for further work.
DEGREE (Distance Environment for GRoup ExperiencEs) [10] monitors the interaction of distant learners in a discussion forum in order to support its pedagogical decisions. The system sends messages to the students with the aim of helping them reflect on the solution-building process, as well as on the quality of their collaboration. It also provides feedback about the group performance.
COMET (A Collaborative Object Modelling Environment) [11] is a system developed so that teams can collaboratively solve object-oriented design problems using the Object Modelling Technique (OMT). The system uses sentence openers (e.g. I think, I agree) in order to analyse the ongoing interaction. The chat log stores information about the conversation, such as date, day of the week, time of intervention, user login and sentence openers used. COMET uses Hidden Markov Models to analyse the interaction and assess the quality of knowledge sharing.
MArCo (Artificial Conflict Mediator, in Portuguese) [12] counts on an artificial conflict mediator that monitors the dialogue and gives the participants tips on how to proceed when a conflict is detected.
Apart from DEGREE, current systems that monitor on-line collaboration tend to concentrate their feedback either on users’ specific actions or on the whole interaction. On the one hand, by concentrating only on particular actions, systems can miss opportunities for improving group performance. On the other hand, by concentrating on the whole interaction, systems can miss opportunities for engaging students in the collaborative process, and thus fail to motivate them properly.
3 The OXEnTCHÊ–Chat
The OXEnTCHÊ–Chat is a tool that tackles the problems of lack of motivation and low group performance by providing feedback to individual users as well as to the group. The system provides a chat tool coupled with an automatic dialogue classifier which analyses the on-line interaction and provides just-in-time feedback to instructors/teachers and learners. Teachers receive feedback reports on both the group and individual students (and can thus evaluate students and change pedagogical practices), whereas students can only check their individual performance. This combination of automated dialogue analysis and just-in-time feedback for teachers and students constitutes a novel approach. OXEnTCHÊ–Chat is an Internet-based tool, implemented in Java. Its architecture is explained in detail in section 3.1.
dialogue log; and Ontology, which stores the ontologies for various subject domains.
Package analysis also counts on the Bot Agent.
The Analysis Controller (AC) performs three functions: receiving users’ contributions to the dialogue, receiving requests for feedback, and sending relevant messages to the Bot. When the AC receives a contribution to the dialogue, it stores this contribution in the whole-dialogue log as well as in the corresponding user’s log. When the AC receives a student’s request for feedback, it retrieves the corresponding user’s log and sends it to the Subject Classifier (SC). If the request is from the teacher, the AC retrieves the whole-dialogue log as well as any individual logs requested. The retrieved logs are then sent to the SC. The AC forwards to the Bot all messages directed to it (e.g., a query about a concept definition).
The SC analyses the dialogue and identifies whether or not the participants have discussed the subject the teacher proposed for that chat. This analysis is done by querying the relevant domain ontology (stored in the Ontology database). Currently, six ontologies are available: Introduction to Artificial Intelligence, Intelligent Agents, Multi-Agent Systems, Knowledge Representation, Machine Learning and Project Management. When the SC verifies that the students are really discussing the proposed subject, it sends the dialogue log to the Feature Extractor (FE) for further analysis. If not, the SC sends a message to the Report Manager (RM), asking it to generate a Standard report. The SC also informs the Bot Agent about the subject under discussion, so that it can provide relevant web links to the participants.
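A rough sketch of this decision is given below. The ontology representation (a set of terms), the method name and the hit threshold are assumptions for illustration, not the system’s actual implementation:

import java.util.Set;

// Hypothetical sketch: decide whether a dialogue log is on-topic by counting
// occurrences of terms from the domain ontology chosen by the teacher.
class SubjectClassifierSketch {
    static boolean onTopic(String dialogueLog, Set<String> ontologyTerms, int minHits) {
        String text = dialogueLog.toLowerCase();
        long hits = ontologyTerms.stream()
                .filter(term -> text.contains(term.toLowerCase()))
                .count();
        return hits >= minHits; // if true, forward the log to the Feature Extractor;
                                // otherwise ask the Report Manager for a Standard report
    }
}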
1 Chatterbots are software agents that communicate with people in natural language.
2 A FAQ-bot is a chatterbot whose aim is to answer Frequently Asked Questions.
interaction. The facilities found in (3) allow participants to talk in private, change font colour and insert emoticons to express their feelings.
By clicking on button participants choose which sentence openers [13] they want to use. OXEnTCHÊ–Chat provides a list of collaborative sentence openers in Portuguese, compiled during the tool’s development. This list is based on available linguistics studies [16], as well as on an empirical study of our dialogue corpus (used to train the MLP and Decision Tree classifiers). We carefully analysed the corpus, labelling participants’ utterances according to the collaborative skills they indicated. The final list of sentence openers was based both on their frequency in the dialogue corpus and on our studies of linguistics and of collaborative dialogues (e.g. [14]).
Arrow points to the Agent Bot’s name in the logged-users window. We decided to show the Bot as a logged user (Billy) to encourage participants to interact with it. The Bot can answer users’ questions based on pre-stored concept definitions, send messages to users who are not actively contributing to the dialogue, or play the role of an automated dialogue coordinator.
Fig. 3 presents the window shown to the teacher when s/he requests feedback. In the teacher can see which individual dialogue logs are available. In the instructor can choose between analysing the complete dialogue or the individual performances. The teacher does so by clicking on the buttons labelled “Analisar diálogo completo” (Analyze the complete dialogue) and “Analisar conversa selecionada” (Analyze the selected dialogue), respectively. Item shows the area where feedback reports are presented. This particular example shows an Instructor Report. It contains the following information: total chat duration, number of user contributions, number of participants, number of collaborative skills used, SC analysis and final classification (effective, in this case).
We have also developed an add-in that allows the instructor to access the dialogue
analysis even if s/he is not online during the interaction. In order to get feedback reports,
the teacher should select the relevant dialogue logs, and click on the corresponding
interface buttons to obtain Instructor and/or Learner reports.
In order to assess the tool’s usability, we first tested the OXEnTCHÊ–Chat’s performance
with nine users. All participants commented that the system was fairly easy to use.
However, they suggested some interface improvements. All suggestions were considered,
and the resulting interface layout is shown in Fig. 2.
Next, we conducted a usability test at the Federal Rural University of Pernam-
buco (UFRPE) with a group of ten undergraduate Computer Science students. They used
the tool to discuss the proposal for an electronic magazine. The participants and
their lecturer, all in the same laboratory, interacted for 90 minutes while being observed
by three researchers.
At the end of the interaction, both the lecturer and the students were asked to fill in an
evaluation questionnaire with questions about the users’ identification and background
as well as about the chat usage (e.g., difficulties, suggestions for improvement). All
participants considered the system’s usability excellent. The reported difficulties were related
to reading messages and identifying users. This is due to the use of nicknames,
as well as to the pace of communication (and the several concurrent conversation threads) that is so
common in synchronous communication.
In order to assess the quality of the feedback provided, we carried out an evaluation
experiment with two groups of participants. The main objective was to validate the
feedback and the dialogue classification provided by the OXEnTCHÊ–Chat.
The first experiment was performed at UFRPE, with the same group that participated
in the usability test. This time, learners were asked to discuss a face-to-face class
given by the lecturer. Initially, the observers explained how participants could obtain
the tool’s feedback. Participants and the lecturer interacted for forty minutes. Participants
requested individual feedback during and after the interaction. The lecturer requested
the Instructor Report and also accessed several Learner Reports.
At the end of the test, the participants filled in a questionnaire, and remarked that
the chat was enjoyable, since the tool was easy to use and provided interesting just-in-
time feedback. They pointed out that more detailed feedback, including tips on how
to improve their participation, would be useful. Nine out of the ten students rated the
feedback as good, while one rated it as regular, stating that it was too general.
The second experiment was carried out at UFPE. Five undergraduate Computer
Science students (with previous background in Artificial Intelligence) participated in it.
The participants were asked to use OXEnTCHÊ–Chat to discuss Intelligent Agents.
Participants interacted for twenty-five minutes. The lecturer was not present during the
experiment, and thus used the off-line feedback add-in in order to obtain the Instructor
and Learner reports. She assessed the quality of the feedback provided by analysing the
dialogue logs and comparing them to the system’s reports.
At the end of their dialogue, participants filled in the same evaluation questionnaire
that was distributed at UFRPE. Out of the five participants, two rated the feedback as
excellent, two rated it as good, and one rated it as weak.
Recent research in CSCL has been investigating ways to mitigate the problems of stu-
dents’ feelings of isolation and lack of motivation, which are common in Virtual Learning Envi-
ronments. In order to tackle these issues, several Collaborative Learning Environments
monitor the interaction and provide feedback specific to users’ actions or to the whole
interaction.
In this paper, we presented the OXEnTCHÊ–Chat, a tool that tackles the above prob-
lems. It provides a chat tool coupled with an automatic dialogue classifier which analyses
on-line interaction and provides just-in-time feedback to both teachers and learners. The
system also includes a chatterbot to coordinate the interaction automatically. This
combination of techniques and functionalities is a novel one. OXEnTCHÊ–Chat has
been evaluated with two different groups, and the results obtained are very satisfactory,
indicating that this approach should be taken further.
At the time of writing, we are working on improving the Bot Agent by augmenting its
domain knowledge and skills, as well as on evaluating its performance. In the near future
we intend to improve OXEnTCHÊ–Chat in three different aspects: (1) to include other
automatic dialogue classifiers (e.g. other neural network models); (2) to improve the
feedback provided to teachers and learners, making it more specific; and (3) to improve
the Bot capabilities, so that it can contribute more effectively to the dialogue, by, for
example, playing a given role (e.g. tutor) in the interaction.
References
1. Wenger, E. Artificial Intelligence and Tutoring Systems: Computational and Cognitive Ap-
proaches to the Communication of Knowledge. Morgan Kaufmann (1987) 486 pp.
2. Wessner, M. and Pfister, H.: Group Formation in Computer Supported Collaborative Learning.
In Proceedings of Group’01, ACM Press, (2001) 24-31
3. Goldberg, M.W.: Using a Web-Based Course Authoring Tool to Develop Sophisticated Web-
based Course. Available at:
http://www.webct.com/service/ViewContent?contentID=11747. Accessed 15/09/2003
4. Issroff, K., and Del Soldato, T., Incorporating Motivation into Computer-Supported Coopera-
tive Learning. In Brna, P. Paiva, A. and Self, J. (eds.) Proceedings of the European Conference
on Artificial Intelligence in Education, Edições Colibri, (1996) 284-290
5. Dillenbourg P. Introduction: What do you mean by Collaborative Learning? In Dillenbourg,
P. (ed.) Collaborative Learning: Cognitive and Computational Approaches. Elsevier Science,
(1999) 1-19
6. Chou C. Y; Chan T. W.; Lin C. J.: Redefining the learning companion: the past, present, and
future of educational agents. Computers & Education 40, Elsevier Science (2003) 255-269
7. Galvão, A.; Neves, A.; Barros, F. “Persona-AIML: Uma Arquitetura para Desenvolver Chat-
terbots com Personalidade”. In.: IV Encontro Nacional de Inteligência Artificial. Anais do
XXIII Congresso SBC. v.7. Campinas, Brazil, (2003) 435- 444
8. Rosatelli, M. and Self, J. A Collaborative Case Study System for Distance Learning, Interna-
tional Journal of Artificial Intelligence in Education, 12, (2002) 1-25
9. González, M. A. C. and Suthers, D. D.: Coaching Collaboration in a Computer-Mediated Learn-
ing Environment. (2002) Available at http://citeseer.nj.nec.com/514195.html. Accessed
12/12/2003
10. Barros, B. and Verdejo, M. F.: Analysing student interaction processes in order to improve collab-
oration. The Degree Approach. International Journal of Artificial Intelligence in Education,
11, (2000) 221-241
11. Soller A.; Wiebe J.; Lesgold A.: A Machine Learning Approach to Assessing Knowledge
Sharing During Collaborative Learning Activities. Proceedings of Computer Support for
Collaborative Learning 2002, (2002) 128-137.
12. Tedesco, P. MArCo: Building an Artificial Conflict Mediator to Support Group Planning
Interactions, International Journal of Artificial Intelligence in Education, 13, (2003) 117-155
13. MacManus, M. M. and Aiken, R. M.: Monitoring Computer-Based Collaborative Problem Solv-
ing. Journal of Artificial Intelligence in Education, 6(4), (1995) 307-336
14. Marcuschi, L. A.: Análise da Conversação. Editora Ática, (2003)
A Tool for Supporting Progressive Refinement
of Wizard-of-Oz Experiments in Natural Language
1 Introduction
Natural language interaction is considered a major hope for increasing the effectiveness
of tutorial systems, since [7] has empirically demonstrated the necessity of natural lan-
guage dialogue capabilities for the success of tutorial sessions. Moreover, Wizard-of-Oz
(WOz) techniques proved to be an appropriate approach to collect data about dialogues
in complex domains [3]. In a WOz experiment subjects interact with a system that is
feigned by a human, the so-called wizard. Thus, WOz experiments generally allow one
to capture the idiosyncrasies of human-machine as opposed to human-human dialogues
[5,4]. Hence, these techniques are perfectly applicable for collecting data about the
behavior of students in tutorial dialogues with computers.
Carrying out WOz experiments in a systematic and motivated manner is expensive
and requires dedicated tools. However, existing tools have serious limitations for sup-
porting the development of systems with ambitious natural language capabilities. In
order to meet the demands of testing tutorial dialog systems in their development, we
have designed and implemented DiaWoZ, a tool that enables setting up and executing
WOz experiments to collect dialogue data. Its architecture is highly modular and al-
lows for the progressive refinement of the experiments by both modelling increasingly
sophisticated dialogues and successively replacing simulated components of the system
by actual implementations. Our investigations are part of the DIALOG project 1 [1].
Its goal is to (i) empirically investigate the use of flexible natural language dialogue
in tutoring mathematics, and (ii) develop an experimental prototype system gradually
embodying the empirical findings. The system will conduct dialogues in written natural
language to help a student understand and construct mathematical proofs. In contrast
to most existing tutorial systems, we envision a modular design, making use of the
powerful proof system [9]. This design enables detailed reasoning about the student’s
actions and elaborate system responses.

Fig. 1. Progressive Refinement Cycles.
In Section 2, we motivate our approach in more detail. Section 3 is devoted to the
architecture of DiaWoZ and Section 4 discusses the dialogue specification for a short
example dialogue. We conclude the paper by discussing experience gained from the first
experiments carried out with DiaWoZ and sketch future developments.
2 Motivation
In our approach, we first want to collect initial data on tutoring mathematics, as well as a
corpus of the associated dialogues, similar to what human tutors do when tutoring in the
domain. This is particularly important in our domain of application, due to the notorious
lack of empirical data about mathematical dialogues, as opposed to the vast host of
textbooks. In these “classical” WOz experiments, the tutor is free to enter utterances
without much restriction. Refinement at this stage merely means to define subdialogues
or topics the wizard has to address during the dialogue, but without committing him to
any predefined sequence of actions.
In addition, we plan to progressively refine consecutive WOz experiments as depicted
in Figure 1. This concerns two aspects:
We aim at setting up experiments where the dialogue specifications are spelled out
in increasing detail, thereby limiting the choices of the wizard. These experiments
will enable us to formulate increasingly finer-grained hypotheses about the tutoring
dialogue, and to test these hypotheses in the next series of experiments.
We want to evaluate already existing components of the dialogue system before other
components have been implemented. For example, if the dialogue manager and the
natural language generation component are functional, but natural language analysis
is not, the wizard has to take care of natural language understanding. Since we expect
that the inclusion of system components will have an effect on the dialogues that
1 The DIALOG project is part of the Collaborative Research Center on Resource-Adaptive Cog-
nitive Processes (SFB 378) at Saarland University.
The architecture of DiaWoZ (cf. Figure 2) and its dialogue specification language are
designed to support the progressive refinement of experiments as discussed in Section 2.
Fig. 2. The architecture of DiaWoZ.

We assume that the task of authoring a dialogue to be examined in a WOz experiment
is usually performed at a different time and place from the task of performing the
corresponding WOz experiment. To reflect this distinction, we decided to divide
DiaWoZ into two autonomous subcomponents,
which can run independently: the Dialogue Authoring and the Dialogue Execution
components. In order to handle communication, both the tutoring system and wizard
utterances are presented to the subject via the Subject Interface, which also allows
the subject to enter text. To enable subsequent examination by the experimenter, the
Logging Module structures and stores relevant information of the dialogue.
The Dialogue Authoring component is a tool for specifying the dialogues to be exam-
ined in a WOz experiment. Using the Graphical Dialogue Specification module, which
allows for drag-and-drop construction of the dialogue specification, the experimenter can
assemble a finite state automaton augmented with information states as the specification
of a dialogue. A Validator ensures that the dialogue specification meets certain criteria
(e.g., every state is reachable from the start state, and the end state can be reached from
every state). The complete dialogue specification is passed to the Dialogue Execution
component.
The Dialogue Execution component first parses the dialogue specification and con-
structs an internal representation of it. This representation is then used by the Executor
to execute the automaton. The Executor determines which state is the currently active
one and which transitions are available. Depending on the dialogue turn these transi-
tions are passed to a chooser. The Generation Chooser receives the possible transitions
that, in turn, generate the tutor’s next utterance. The Analysis Chooser receives possible
transitions that analyze the subject’s utterances. Both choosers may delegate the task
of choosing a transition to specialized modules, such as an intelligent tutoring system
to determine the next help message or a semantic analysis component that analyzes the
subject’s utterance. Moreover, both choosers may also inform the wizard of the available
options via the Wizard Interface and thus allow the wizard to pick a transition.
DiaWoZ is devised as a distributed system, such that the Dialogue Authoring and the
Dialogue Execution components, the Wizard and Subject Interfaces, and the Logging
Module each can be run on different machines. The components are implemented in Java
and the communication is via sockets using an XML interface language. Since XML
parsers are available for almost all programming languages, new modules can be programmed
in any programming language and added to the system. In the remainder of this section,
we discuss the main components of DiaWoZ in some more detail.
to their respective drawbacks. This allows us to devise non-trivial dialogues that can still
be handled appropriately by the wizard.
As an example consider the following task from algebra: An algebraic structure (S, ∘),
where S is a set and ∘ an operator on S, should be classified. (S, ∘) is a group if
(i) there is a neutral element in S with respect to ∘, (ii) each element in S has an inverse
element with respect to ∘, and (iii) ∘ is associative. In a tutorial dialogue, the tutor
must ensure that the student addresses all three subtasks to conclude that a structure is a
group. An appropriate dialogue specification is given in Figure 3. The initial information
state is displayed on the left side, while the finite-state automaton is shown on the right
side. State 1 is the start state. In State 2, there are three transitions, which
lead to parts of the automaton that represent subdialogues about the neutral element
(States 3 and 6), the inverse elements (States 4 and 7), and associativity (States 5 and 8),
respectively. The information state consists of three global variables NEUTRAL, INVERSE,
and ASSOCIATIVE capturing whether their corresponding subtasks have been solved. The
preconditions of the three transitions are, respectively, the following:
NEUTRAL = open
INVERSE = open
ASSOCIATIVE = open
The remaining transitions are always applicable.
The effects of these three transitions change the values of NEUTRAL, INVERSE,
and ASSOCIATIVE, respectively, to done. Moreover, each transition produces an utterance
in the dialogue. We will give more detail about the utterances in Section 4.
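To make the flavour of such a specification concrete, the following is a minimal sketch of a finite-state automaton augmented with an information state, loosely modelled on the group-classification example. It is illustrative Python only; DiaWoZ itself is implemented in Java, and the class names, effects, and utterances below are assumptions, not the actual specification of Figure 3.

```python
# Minimal, hypothetical sketch of a finite-state dialogue specification with an
# information state; the data structures do not reproduce DiaWoZ internals.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, str]  # information state: variable name -> value

@dataclass
class Transition:
    source: int
    target: int
    utterance: str
    precondition: Callable[[State], bool] = lambda s: True
    effect: Callable[[State], None] = lambda s: None

@dataclass
class DialogueSpec:
    transitions: List[Transition]
    info_state: State = field(default_factory=dict)
    current: int = 1

    def applicable(self) -> List[Transition]:
        return [t for t in self.transitions
                if t.source == self.current and t.precondition(self.info_state)]

    def take(self, t: Transition) -> str:
        t.effect(self.info_state)
        self.current = t.target
        return t.utterance

# Three subtask transitions out of State 2, simplified from the group example.
spec = DialogueSpec(transitions=[
    Transition(1, 2, "To show that (Z, +) is a group, we have to show ..."),
    Transition(2, 3, "What is the neutral element of Z with respect to +?",
               precondition=lambda s: s["NEUTRAL"] == "open",
               effect=lambda s: s.update(NEUTRAL="done")),
    Transition(2, 4, "What is the inverse of an element of Z?",
               precondition=lambda s: s["INVERSE"] == "open",
               effect=lambda s: s.update(INVERSE="done")),
    Transition(2, 5, "Is + associative on Z?",
               precondition=lambda s: s["ASSOCIATIVE"] == "open",
               effect=lambda s: s.update(ASSOCIATIVE="done")),
], info_state={"NEUTRAL": "open", "INVERSE": "open", "ASSOCIATIVE": "open"})

print(spec.take(spec.applicable()[0]))        # tutor's opening utterance
print([t.target for t in spec.applicable()])  # subtasks still open: [3, 4, 5]
```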
Analysis Chooser. Thus, the choosers allow for the progressive refinement of consecutive
experiments. In general, the transition picked by the chooser can be presented to the
wizard, who can confirm or overrule this choice.
4 An Example Dialogue
To show how DiaWoZ works, let us come back to the example dialogue specification
given in Figure 3. It covers the following example dialogue (where Z denotes the set of
integers):
(U1) Tutor: To show that (Z, +) is a group, we have to show that it has a neu-
tral element, that each element in Z has an inverse, and that + is
associative in Z.
(U2) Tutor: What is the neutral element of Z with respect to +?
(U3) Student: 0 is the neutral element, and for each x in Z, −x is the corresponding
inverse.
(U4) Tutor: That leaves us to show associativity.
Let us now examine the dialogue in detail. Starting in State 1, there is only one
transition that can be picked; it leads to State 2 and outputs utterance (U1).
In State 2, all three transitions can be picked, because their preconditions
are fulfilled. The wizard chooses the one that leads to State 3 and produces the tutor’s
utterance (U2). Now, the student enters utterance (U3). Note that the student not only
answers the tutor’s question, but also gives the solution for the second subtask about the
inverse elements. Since there is no natural language understanding component included
in the system in our example setting, the wizard has to analyze the student’s utterance.
To allow for that, DiaWoZ presents the window depicted in Figure 4 to the wizard,
where the field titled “Repeat” stands for one transition, while the field titled “Correct
Answer” denotes another. The wizard instantiates the parameters of the latter by choosing
the value done for the variables NEUTRAL and INVERSE of the information state to be set
by that transition’s effect. Note that this choice reflects the fact that the student overanswered
the tutor’s question. Moreover, note that due to the overanswering the tutor should
not choose the subtask about the inverse elements in the next dialogue turn, but instead
proceed with the remaining problem about associativity. By clicking OK in the “Correct
Answer” field, the corresponding transition is selected. Thus, the Executor updates the
information state by setting the values of NEUTRAL and INVERSE to done and brings
us back to State 2. This time, only one transition is applicable, which justifies the
production of utterance (U4) through extra-linguistic knowledge.

Fig. 5. The Subject Interface window.
5 DiaWoZ in Use
In the DIALOG project, we aim at a tutorial dialogue system for mathematics [1]. Via
a series of WOz experiments, we follow the progressive refinement approach described
in this paper. The first experiment has already been conducted and reported on in [2]. It
aimed primarily at collecting a corpus of tutorial dialogues on naive set theory. We shall
describe how we used DiaWoZ in this experiment and discuss the lessons learned.
ask corresponds to the system asking the subject for an answer. The transition answer
corresponds to the subject’s answer. Both transitions hand over the turn to the interlocutor.
The reflexive transitions, in contrast, allow the tutor and the subject, respectively, to utter
something and keep the turn. The transition end-dialogue, finally, results in the end state
and thus ends the dialogue. The wizard can scrutinize the dialogue model by clicking
on states and transitions, which are then described in more detail in the lower part of the
window.
In every state of the dialogue, the wizard had to choose the next transition to be
applied, both when he or the subject made a dialogue move by manipulating the Wizard
Interface window (cf. Figure 7). Moreover, he had to assign the subjects’ answers to
a category such as “correct”, “wrong”, “incomplete-partially-accurate”, or “unknown”,
by selecting the appropriate category from a pull-down list. Then, informed by a hinting
algorithm, he had to choose his next dialogue move (again by selecting it from a
pull-down list) and verbalize it (by typing in his verbalization). The lower part of the
interface window allowed the wizard to type in standard utterances he wanted to reuse
by copy and paste. These utterances could be stored in a file.

Fig. 6. The example dialogue model.
Both the subjects and the wizard could make use of mathematical symbols provided
as buttons in both interfaces. A resource file, which is accessible by the experimenter,
defines which symbols are provided, such that the buttons can be tailored to the domain
of the dialogue. The Logging Module logged information about selected transitions,
reached states, chosen answer categories and dialogue moves, and utterances typed in
by the subjects and the wizard along with time stamps of all actions. To analyze the data
collected during the experiment, we built a log file viewer that allows for searching the
log file for information, hiding and revealing of information, and printing of revealed
information.
Carrying out WOz experiments in a systematic and motivated manner is expensive
and requires dedicated tools. DiaWoZ is inspired by different existing dialogue building
and WOz systems. MDWOZ [8] and SUEDE [6] are two examples of systems for
designing and conducting WOz experiments. MDWOZ features a distributed client-
server architecture and includes modules for database access as well as visual graph
drawing and inspection. SUEDE provides a sophisticated GUI for drawing finite-state
diagrams, a browser-like environment for running experiments, and an “analysis mode”
in which the experimenter can easily access and review the collected data. The drawback
of these systems, however, is that they only allow for finite-state dialogue modeling,
which is restricted in its expressiveness. Conversely, development environments like
the CSLU toolkit [10], offer more powerful dialogue specifications (e.g., by attaching
program code to states or transitions), but do not support the WOz technique.
In the experiments, the students evaluated working with the simulated system rather
positively, which is some evidence for the good functionality of DiaWoZ. By and large,
the dialogue specifications were reasonable for the first experiment, except for one
problem: the need for a time limit had not been foreseen. Our initial dialogue
model did not have reflexive transitions, such that the turn was given to the subject
when the wizard had entered his utterance. If the subject did not answer, the wizard
could not take the initiative anymore. To remedy this problem, we introduced reflexive
transitions to allow the wizard to keep the turn for as long as the subject had not
typed in his answer. We are currently investigating how to solve this problem generally
in DiaWoZ by providing the wizard with
the means of seizing the turn at any point.
Altogether, we have gained experience
from this first series of experiments in
three major respects:
The quality of the hints. In order to determine the wizard’s reaction, we have made use
of an elaborate hinting algorithm. We have varied hinting strategies systematically, the
Socratic strategy being the most ambitious one. However, contrary to our expectations,
this strategy did not turn out to be superior to the others, which led us to analyze more
deeply the method by which the content of the hints was generated.
The flexibility of natural language interaction. In the experiments, it turned out that
the subjects used fragments of natural language text and mathematical formulas in a
freely intertwined way, much more than we had expected. Some of the utterances they
produced required very cooperative reasoning on behalf of the wizard to enable a proper
interpretation. In order to obtain a natural corpus, which was a main goal in the first
series of experiments, applying this high degree of cooperativity was beneficial, but it is
unrealistic for interpretation by a machine.
For the next series of experiments we will undertake modifications in the student
interface that incorporate the suggestions made by the experiment subjects. Moreover, we
have also enhanced our hinting algorithm to include abstractions and new perspectives,
thus extending the repertoire of that module according to the experimental results. We
plan to make this module accessible to DiaWoZ as a software component for the next
series of experiments. Finally, we have to restrict communication in natural language
intertwined with formulas, so that the degree of fragmentation is manageable by the
analysis component we are developing. In terms of DiaWoZ, this will lead to a more
detailed dialogue structure to be spelled out by the means the tool offers.
6 Conclusion
References
1 Introduction
The Tactical Language Training System helps learners acquire communicative com-
petence in spoken Arabic and other languages. An intelligent agent coaches the learn-
ers through lessons, using innovative speech recognition technology to assess their
mastery and provide tailored assistance. Learners then practice particular missions in
an interactive story environment, where they speak and choose appropriate gestures in
simulated social situations populated with autonomous, animated characters. We aim
to provide effective language training both to high-aptitude language learners and to
learners with low confidence in their language abilities. We hypothesize that such a
learning environment will be more engaging and motivating than traditional language
instruction and yield rapid skill acquisition and greater learner self-confidence.
2 Motivations
The Mission Practice Environment is built using computer game technology, and
exploits game design techniques, in order to promote learner engagement and moti-
vation. Although there is significant interest in the potential of game technology to
promote learning [6], there are some important outstanding questions about how to
exploit this potential. One is transfer – how does game play result in the acquisition
of skills that transfer outside of the game? Another is how best to exploit narrative
structure to promote learning. Narrative structure can make learning experiences
more engaging and meaningful, but can also discourage learners from engaging in
learning activities such as exploration, study, and practice that do not fit into the story
line. By combining learning experiences with varying amounts of narrative structure,
and by evaluating transfer to real-world communication, we hope to develop a deeper
understanding of these issues.
The TLTS builds on ideas developed in previous systems involving microworlds
(e.g., FLUENT, MILT) [7],[9], conversation games (e.g., Herr Kommissar) [3],
speech pronunciation analysis [23], learner modeling, and simulated encounters with
virtual characters (e.g., Subarashii, Virtual Conversations, MRE) [1], [8], [20]. It
extends this work by providing rich form feedback, by separating game interaction
from form feedback, and by supporting a wide range of spoken learner inputs, in an
implementation that is robust and efficient enough for ongoing testing and use on
commodity computers. The use of speech recognition for tutoring purposes is par-
ticularly challenging and innovative, since speech recognition algorithms tend not to
be very reliable on learner speech.
3 Example
The following scenario illustrates how the TLTS is used. To appreciate the learner’s
perspective, imagine that you are a member of an Army Special Forces unit assigned
to conduct a civil affairs mission in Lebanon.1 Your unit will need to enter a village,
establish rapport with the people, make contact with the local official in charge, and
help carry out post-war reconstruction. To prepare for your mission, you go into the
Mission Skill Builder and practice your communication skills, as shown in Figure 1.
Here, for example, you learn a common greeting in Lebanese Arabic, “marHaba.”
You practice saying “marHaba” into your headset microphone. Your speech is auto-
matically analyzed for errors, and your virtual tutor, Nadiim, gives you immediate
feedback. If you mispronounce the pharyngeal /H/ sound, as native English speakers
commonly do, you receive focused, supportive feedback. Meanwhile, a learner model
keeps track of the phrases and skills you have mastered. When you feel that you are
ready to give it a try, you enter the Mission Practice Environment. Your character in
the game, together with a non-player character acting as your aide, enters the village.
You enter a café, and start a conversation with a man in the café, as shown in Figure 2
1 Lebanon was initially chosen because Lebanese native speakers and speech corpora are
widely available. This scenario is typical of civil affairs operations worldwide, and does not
reflect actual or planned US military activities in Lebanon.
(left). You speak for your character into your microphone, while choosing appropri-
ate nonverbal gestures. In this case you choose a respectful gesture, and your inter-
locutor, Ahmed, responds in kind. If you encounter difficulties, your aide can help
you, as shown in Figure 2 (right). The aide has access to your learner model, and
therefore knows what Arabic phrases you have mastered. If you had not yet mastered
Arabic introductions, the aide would provide you with a specific phrase to try. You
can then go back to the Skill Builder and practice further.
developers. The system must also be flexible enough to support modular testing and
integration with the DARWARS architecture, which is intended to provide any-time,
individualized cognitive training to military personnel. Given these requirements, a
distributed architecture makes sense (see Figure 3). Modules interact using content-
based messaging, currently implemented using the Elvin messaging service.
The Pedagogical Agent monitors learner performance, and uses performance data
both to track the learner’s progress in mastering skills and to decide what type of
feedback to give to the learner. The learner’s skill profile is recorded in a Learner
Model, which is available as a common resource, and implemented as a set of infer-
ence rules and dynamically updated tables in an SQL database. The learner model
keeps a record of the number of successful and unsuccessful attempts for each action
over the series of sessions, as well as the type of error that occurred when the learner
is unsuccessful. This information is used to estimate the learner’s mastery of each
vocabulary item and communicative skill, and to determine what kind of feedback is
most appropriate to give to the learner in a given instance. When a learner logs into
either the Skill Builder or the Practice Environment, his/her session is immediately
associated with a particular profile in the learner model. Learners can review sum-
mary reports of their progress, and in the completed system instructors at remote
locations will be able to do so as well.
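As an illustration of how such counts of successful and unsuccessful attempts can be turned into a mastery estimate and a feedback decision, consider the sketch below. It is not the TLTS learner model (which is implemented as inference rules and tables in an SQL database); the smoothing and the 0.8 threshold are invented for the example.

```python
# Hypothetical sketch: estimating skill mastery from attempt counts.
# The Laplace-style smoothing and the thresholds are assumptions.

from collections import defaultdict

class LearnerModel:
    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # skill -> [successes, failures]
        self.last_error = {}                       # skill -> most recent error type

    def record(self, skill, success, error_type=None):
        self.counts[skill][0 if success else 1] += 1
        if not success:
            self.last_error[skill] = error_type

    def mastery(self, skill):
        s, f = self.counts[skill]
        return (s + 1) / (s + f + 2)   # smoothed success rate in (0, 1)

    def feedback_kind(self, skill):
        if self.mastery(skill) >= 0.8:
            return "confirmation"                      # skill appears mastered
        if self.last_error.get(skill) == "pronunciation":
            return "focused pronunciation hint"
        return "supportive corrective feedback"

model = LearnerModel()
model.record("greeting:marHaba", success=False, error_type="pronunciation")
model.record("greeting:marHaba", success=True)
print(model.mastery("greeting:marHaba"))       # 0.5
print(model.feedback_kind("greeting:marHaba"))
```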
To maintain consistency in the language material, such as models of pronunciation,
vocabulary and phrase construction, a single Language Model serves as an interface
to the language curriculum. The Language Model includes a speech recognizer that
both applications can use, a Natural Language Parser that can annotate phrases with
structural information and refer to relevant grammatical explanations, and an Error
Model which detects and analyzes syntactic and phonological mistakes.
While the Language Model can be thought of as a view of and a tool to work with
the language data, the data itself is stored in a separate Curriculum Materials data-
base. This database contains all missions, lessons and exercises that have been con-
structed, in a flexible Extensible Markup Language (XML) format, with links to me-
dia such as sound clips and video clips. It includes exercises that are organized in a
recommended sequence, and tutorial tactics that are employed opportunistically by
the pedagogical agent in response to learner actions. The database is the focus of the
authoring activity. Entries can be validated using the tools of the Language Model.
The Medina authoring tool (currently under development) consolidates this process
into a single interface where people with different authoring roles can view and edit
different views of the curriculum material while overall consistency is ensured.
Since speech is the primary input modality of the TLTS, robustness and reliability
of speech processing are of paramount concern. The variability of learner language
makes robustness difficult to achieve. Most commercial automated speech recogni-
tion (ASR) systems are not designed for learner language [13], and commercial com-
puter aided language learning (CALL) systems that employ speech tend to overesti-
mate the reliability of the speech recognition technology [22]. To support learner
speech recognition in the TLTS, our initial efforts focused on acoustic modeling for
robust speech recognition especially in light of limited domain data availability [19].
In this case, we bootstrapped data from English and modern standard Arabic and
adapted it to Levantine Arabic speech and lexicon. Dynamic switching of recognition
grammars was also implemented, as were recognition confidence estimates, used by
the pedagogical agent to decide how to give feedback. The structures of the recogni-
tion networks are distinct for the MSB and the MPE environments. In the MSB mode,
the recognition is based on limited vocabulary networks with pronunciation variants
and hypothesis rejection. In the MPE mode, the recognizer supports less constrained
user inputs, focusing on recognizing the learner’s intended meaning.
The Mission Skill Builder (MSB) is a one-on-one tutoring environment which helps
the learner to acquire mission-oriented vocabulary, pronunciation training and gesture
recognition knowledge. In this learning environment the learner develops the neces-
sary skills to accomplish specific missions. A virtual tutor provides personalized
feedback to improve and accelerate the learning process. In addition, a progress re-
port generator generates a summary of skills the learner has mastered, which is pre-
sented to the learner in the same environment.
The Mission Skill Builder user interface is implemented in SumTotal’s ToolBook,
augmented by the pedagogical agent and speech recognizer. The learner initiates
speech input by clicking on a microphone icon, which sends a “start” message to the
automated speech recognition (ASR) process. Clicking the microphone icon again
sends a “stop” message to the speech recognition process, which then analyzes the
speech and sends the recognized utterance back to the MSB. The recognized utter-
ance, together with the expected utterance, is passed to the Pedagogical Agent, which
in turn passes this information to the Error Model (part of the Language Model), to
analyze and detect types of mistakes. The results of the error detection are then passed
back to the Pedagogical Agent, which decides what kind of feedback to choose, de-
pending on the error type and the learner’s progress. The feedback is then passed to
the MSB and is provided to the learner via the virtual tutor persona, realized as a set
of video clips, sound clips, and still images. In addition the Mission Skill Builder
informs the learner model about several learner activities with the user interface,
which help to define and extend the individual learner profile.
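Read as a pipeline, the flow just described might be outlined as follows. This is a simplified, hypothetical sketch; the module methods named here (recognize, detect_errors, choose_feedback, record, present) are assumed names, and the real system exchanges content-based messages rather than making direct calls.

```python
# Simplified, hypothetical outline of the MSB feedback loop described above.
# Real TLTS modules communicate via a messaging service; here the flow is
# collapsed into direct function calls for readability.

def skill_builder_turn(audio, expected_utterance, asr, error_model,
                       pedagogical_agent, learner_model, tutor_persona):
    # 1. "start"/"stop" messages bracket the recording; the ASR returns a hypothesis.
    recognized = asr.recognize(audio)

    # 2. Recognized and expected utterances go to the Error Model, which
    #    detects the types of mistakes made.
    errors = error_model.detect_errors(recognized, expected_utterance)

    # 3. The Pedagogical Agent picks feedback based on the error types and the
    #    learner's progress so far.
    feedback = pedagogical_agent.choose_feedback(errors, learner_model)

    # 4. The learner model records the attempt; the tutor persona delivers the
    #    feedback as video clips, sound clips, and still images.
    learner_model.record(expected_utterance, success=not errors)
    tutor_persona.present(feedback)
    return feedback
```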
The Unreal World uses the Unreal Tournament 2003 game engine where each char-
acter, including the learner’s own avatar, is represented by an animated figure called
an Unreal Puppet. The motion of the learner’s puppet is for the most part driven by
input from the mouse and keyboard, while the other puppets receive action requests
from the Mission Engine through the Unreal World Server, which is an extended
version of the Game Bots server [12]. In addition to relaying action requests to pup-
pets, the Unreal World Server sends information about the state of the world back to
the Mission Engine. Events from the user interface, such as mouse button presses, are
first processed in the Input Manager, and then handed to the Mission Engine where a
proper reaction is generated. The Input Manager also invokes the Speech Recognizer,
when the learner presses the right mouse button, and sends the recognized utterance,
with information about the chosen gesture, to the Mission Engine.
The Mission Engine uses a multi-agent architecture where each character is repre-
sented as an agent with its own goals, relationships with other entities (including the
learner), private beliefs and mental models of other entities [16]. This allows the user
to engage in a number of interactions with one or more characters that each can have
their own, evolving attitude towards the learner. Once the character agents have cho-
sen an action, they pass their communicative intent to corresponding Social Puppets
that plan a series of verbal and nonverbal behavior that appropriately carry out that
intent in the virtual environment. We plan to incorporate a high-level Director Agent
that influences the character agents, to control how the story unfolds and to ensure
that pedagogical and dramatic goals are met. This agent exploits the learner model to
know what the learners can do and to predict what they might do. The director will
use this information as a means to control the direction of the story by manipulating
events and non-player characters as needed, and to regulate the challenges presented
to the student.
A special character that aids the learners during their missions uses an agent
model of the learner to suggest what to say next when the learner asks for help or
when the learner seems to be having trouble progressing. When such a hint is given,
the Mission Engine consults the Learner Model to see whether the learner has mas-
tered the skills involved in producing the phrase to be suggested. If the learner does
not have the required skill set, the aide spells out in transliterated Arabic exactly what
needs to be said, but if the learner should know the phrase in Arabic, the aide simply
provides a hint in English such as “You should introduce yourself.”
5 Evaluation
caused the agent in some cases to reject utterances that were pronounced correctly.
The algorithm for scoring learner pronunciation has since been modified, to give
higher scores to utterances that are pronounced correctly but slowly; this eliminated
most of the problems of correct speech being rejected. We have also adjusted the
feedback selection algorithm to avoid criticizing the learner when speech recognition
confidence is low. This revised feedback mechanism is scheduled to be evaluated in
further tests with soldiers in July 2004 at Ft. Bragg, North Carolina.
The Tactical Language Training System project has been active for a relatively brief
period, yet it has already made rapid progress in combining pedagogical agent, peda-
gogical drama, speech recognition, and game technologies in support of language
learning. Once the system design is updated based upon the results of the formative
evaluations, the project plans the following tasks:
- integrate the Medina authoring tool to facilitate content development,
- incorporate automated tracking of learner focus of attention, to detect learner
  difficulties and provide proactive help,
- construct additional content to cover a significant amount of spoken Arabic,
- perform summative evaluation of the effectiveness of the TLTS in promoting
  learning, and analysis of the contribution of TLTS component features to
  learning effectiveness, and
- support translingual authoring – adapting content from one language to an-
  other, in order to facilitate the creation of similar learning environments for a
  range of less commonly taught languages.
References
1. Bernstein, J., Najmi, A. & Ehsani, F.: Subarashii: Encounters in Japanese Spoken Lan-
guage Education. CALICO Journal 16 (3) (1999) 361-384
2. Brown, P. & Levinson: Politeness: Some universals in language use. Cambridge Univer-
sity Press, New York (1987)
3. DeSmedt, W.H.: Herr Kommissar: An ICALL conversation simulator for intermediate
German. In V.M. Holland, J.D. Kaplan, & M.R. Sams (Eds.), Intelligent language tutors:
Theory shaping technology, 153-174. Lawrence Erlbaum, Mahwah, NJ (1995)
4. Doughty, C.J. & Long, M.H.: Optimal psycholinguistic environments for distance foreign
language learning. Language Learning & Technology 7(3), (2003) 50-80
5. Gamper, G. & Knapp, J.: A review of CALL systems in foreign language instruction. In
J.D. Moore et al. (Eds.), Artificial Intelligence in Education, 377-388. IOS Press, Amster-
dam (2001)
6. Gee, P.: What video games have to teach us about learning and literacy. Palgrave Mac-
millan, New York (2003)
7. Hamberger, H.: Tutorial tools for language learning by two-medium dialogue. In V.M.
Holland, J.D. Kaplan, & M.R. Sams (Eds.), Intelligent language tutors: Theory shaping
technology, 183-199. Lawrence Erlbaum, Mahwah, NJ (1995)
8. Harless, W.G., Zier, M.A., and Duncan, R.C.: Virtual Dialogues with Native Speakers:
The Evaluation of an Interactive Multimedia Method. CALICO Journal 16 (3) (1999) 313-
337
9. Holland, V.M., Kaplan, J.D., & Sabol, M.A.: Preliminary Tests of Language Learning in a
Speech-Interactive Graphics Microworld. CALICO Journal 16 (3) (1999) 339-359
10. Johnson, W.L.: Interaction tactics for socially intelligent pedagogical agents. IUI 2003,
251-253. ACM Press, New York (2003)
11. Johnson, W.L.,& Rizzo, P.: Politeness in tutoring dialogs: “Run the factory, that’s what
I’d do.” ITS 2004, in press (2004)
12. Kaminka, G.A., Veloso, M.M., Schaffer, S., Sollitto, C., Adobbati, R., Marshall, A.N.,
Scholer, A. and Tejada, S.: GameBots: A Flexible Test Bed for Multiagent Team Re-
search. Communications of the ACM, 45 (1) (2002) 43-45
13. LaRocca, S.A., Morgan, J.J., & Bellinger, S.: On the path to 2X learning: Exploring the
possibilities of advanced speech recognition. CALICO Journal 16 (3) (1999) 295-310
14. Lightbown, P.J. & Spada, N.: How languages are learned. Oxford University Press, Ox-
ford (1999)
15. Marsella, S., Johnson, W.L. and LaBore, C.M.: An interactive pedagogical drama for
health interventions. In Hoppe, U. and Verdejo, F. eds., Artificial Intelligence in Educa-
tion. IOS Press, Amsterdam (2003)
16. Marsella, S.C. & Pynadath, D.V.: Agent-based interaction of social interactions and influ-
ence. Proceedings of the Sixth International Conference on Cognitive Modelling, Pitts-
burgh, PA (2004)
17. Muskus, J.: Language study increases. Yale Daily News, Nov. 21, 2003
18. NCOLCTL: National Council of Less Commonly Taught Languages.
http://www.councilnet.org (2003)
19. Srinivasamurthy, N. and Narayanan: “Language-adaptive Persian speech recognition”,
Proc. Eurospeech (Geneva, Switzerland) (2003)
20. Swartout, W., Gratch, J., Johnson, W.L., et al.: Towards the Holodeck: Integrating
graphics, sound, character and story. Proceedings of the Intl. Conf. on Autonomous
Agents, 409-416. ACM Press, New York (2001)
21. Swartout, W. & van Lent: Making a game of system design. CACM 46(7) (2003) 32-39
22. Wachowicz, A. and Scott, B.: Software That Listens: It’s Not a Question of Whether, It’s
a Question of How. CALICO Journal 16 (3), (1999) 253-276
23. Witt, S. & Young, S.: Computer-aided pronunciation teaching based on automatic speech
recognition. In S. Jager, J.A. Nerbonne, & A.J. van Essen (Eds.), Language teaching and
language technology, 25-35. Swets & Zeitlinger, Lisse (1998)
Combining Competing Language Understanding
Approaches in an Intelligent Tutoring System
1 Introduction
Implementing an intelligent tutoring system that attempts a deep understand-
ing of a student’s natural language (NL) explanation is a challenging and time
consuming undertaking even when making use of existing NL processing tools
and techniques [1,2,3]. A motivation for attempting a deep understanding of an
explanation is that it allows a tutoring system to reason about the domain knowl-
edge expressed in the student’s explanation, in order to diagnose errors that are
only implicitly expressed [4] and to provide substantive feedback that encourages
further self-explanation [5]. To accomplish these tutoring system tasks, the NL
technology must be able to map typical student language to an appropriate do-
main level representation language. While some NL mapping approaches require
relatively little domain knowledge preparation, there is currently still a trade-off
with the quality of the representation produced, especially as the complexity of
the representation language increases.
Although most NL mapping approaches have been rigorously evaluated, the
results may not scale up or generalize to the tutoring system domain. First, it
may not be practical to carefully prepare large amounts of domain knowledge in
the same manner as may have been done for the evaluation of an NL approach.
This is especially a problem for tutoring systems since they need to cover a large
tion, clarification and remediation tutoring goals. The details of the Why2-Atlas
system are described in [1] and only the mapping of an isolated NL sentence to
the Why2-Atlas representation language will be addressed in this paper. In this
section we give an overview of the rich domain representation language that the
system uses to support diagnosis and feedback.
The Why2-Atlas ontology is strongly influenced by previous qualitative
physics reasoning work, in particular [7], but makes appropriate simplifications
given the subset of physics the system is addressing. The Why2-Atlas ontology
comprises bodies, states, physical quantities, times and relations. The ontology
and representation language are described in detail in [4].
For the sake of simplicity, most bodies in the Why2-Atlas ontology have the
semantics of point-masses. Body constants are problem specific. For example, the
body constants for one problem covered by Why2-Atlas are pumpkin and man.
Individual bodies can be in states such as freefall. Being in a particular
state implies respective restrictions on the forces applied on the body. There is
also the special state of contact between two bodies where attached bodies
can exert mutual forces and the positions of the two bodies are equal, detached
bodies do not exert mutual forces, and moving-contact bodies can exert mutual
forces but there is no conclusion on their relative positions. The latter type of
contact is introduced to account for point-mass bodies that are capable of push-
ing/pulling each other for certain time intervals (a non-impact type of contact),
for example the man pushing a pumpkin up.
Physical quantities are represented as one- or two-body vectors or scalars. The one-body
vector quantities are position, displacement, velocity, acceleration, and
total-force, and the only two-body one in the Why2-Atlas ontology is force.
The single-body scalar quantities are duration, mass, and distance.
Every physical quantity has slots and respective restrictions on the sort of a
slot filler as shown in Table 1, where examples of slot filler constants of the proper
sorts are shown in parentheses. Note that the sorts Id, D-mag, and D-mag-num
do not have specific constants. These slots are used only for cross-referencing
between different propositions.
Time instants are basic primitives in the Why2-Atlas ontology and a time
interval is a pair of instants. This definition of time intervals is sufficient
for implementing the semantics of open time intervals in the context of the
mechanics domain.
Some of the multi-place relations in our domain are before, rel-position
and compare. The relation before relates time instants in the obvious way.
The relation rel-position provides the means to represent the relative posi-
tion of two bodies with respect to each other, independently of the choice of
a coordinate system—a common way to informally compare positions in NL.
The relation compare is used to represent the ratio and difference of two quanti-
ties’ magnitudes or, for quantities that change over time, the magnitudes of their
derivatives.
The domain propositions are represented using order-sorted first-order logic
(FOL) (see for example [8]). For example, “force of gravity acting on the pumpkin
is constant and nonzero” has the following representation in which the generated
identifier constants f1 and ph1 appear as arguments in the due-to relation
predicate (sort information is omitted):
strengths and weaknesses of each general type of approach as the basis for our
hand-coded selection heuristics.
the words in the class tagged corpus for training a classifier. This particular style
of classification is called a bag of words approach because the meaning that the
organization of a sentence imparts is not considered. The classes themselves are
generally expressed as text as well and are at the level of an exemplar of a text
that is a member of the class. With this approach, the text can be mapped to
its representation by looking up a hand-generated propositional representation
for the exemplar text of the class identified at run-time.
RAINBOW is one such bag-of-words text classifier; in particular, it is a Naive
Bayes text classifier. The classes of interest must first be decided and then a
training corpus developed where subtexts are annotated with the class to which
they belong. For the Why2-Atlas training, each sentence was annotated with one
class. During training RAINBOW computes an estimate of the probability of a
word in a particular class relative to the class labellings for the Why2-Atlas
training sentences. Then when a new sentence is to be analyzed at run-time,
RAINBOW calculates the posterior probabilities of each class relative to the words
in the sentence and selects the class with the highest probability [10].
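A bag-of-words Naive Bayes classifier of this kind can be sketched with standard off-the-shelf tools. The snippet below uses scikit-learn rather than RAINBOW, and the training sentences and class labels are toy examples, not the Why2-Atlas corpus.

```python
# Illustrative bag-of-words Naive Bayes classification in the style described
# for RAINBOW, using scikit-learn. The data below are invented toy examples.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_sentences = [
    "the velocity of the pumpkin stays the same",
    "its horizontal velocity does not change",
    "gravity pulls the pumpkin straight down",
    "the only force on the pumpkin is gravity",
]
train_classes = [
    "constant-horizontal-velocity",
    "constant-horizontal-velocity",
    "gravity-only-force",
    "gravity-only-force",
]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_sentences, train_classes)

# At run time, the class with the highest posterior probability is selected and
# its hand-generated propositional representation would then be looked up.
print(classifier.predict(["the pumpkin keeps the same horizontal velocity"]))
```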
Like most statistical approaches, the quality of RAINBOW’s analysis depends
on the quality of its training data. Although good annotator agreement is pos-
sible for the classes of interest for the Why2-Atlas domain [18], we found the
resulting training set for a class sometimes includes sentences that depend on a
particular context for the full meaning of that class to be licensed. In practice
the necessary context may not be present for the new sentence that is to be
analyzed. This suggests that the statistical approach will tend to overgenerate
representations. It is also possible for a student to express more than one key
part of an explanation in a single sentence so that multiple class assignments
would be more appropriate. This suggests that the statistical approach will also
sometimes undergenerate since only the best classification is used. However, we
expect the need for multiple class assignments to happen infrequently since the
Why2-Atlas system includes a sentence segmenter that attempts to break up
complex sentences before sentence understanding is attempted by any of the
approaches.
the type it recognizes is present and if so, which class it is. The class indicates
which slots are filled with which slot constants. There is then a one-to-one cor-
respondence between a class and a proposition in the representation language.
To arrive at the representation for a single sentence, RAPPEL applies all of the
trained classifiers and then combines their results during a post-processing stage.
For Why2-Atlas we trained separate classifiers for every physics quantity,
relation and state for a total of 27 different classifiers. For example, there is a
separate classifier for velocity and another for acceleration. Bodies are also
handled by separate classifiers; one for one body propositions and another for two
body propositions. The basic approach for the body classifiers is similar to that
used in statistical approaches to reference resolution (e.g. [20,21]). The number
of classes within each classifier depends on the number of slot constant filler com-
binations possible. For example, one class encodes the proposition (velocity
id1 horizontal ?body ...) and another class encodes the proposition
(velocity id2 horizontal ?body increase ?mag-zero ?mag-num pos ?t1 ?t2), where the
class names encode the predicate velocity, the slot constant horizontal,
the slot constant increase, and the constant pos.
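A schematic of this per-predicate classification and post-processing combination might look as follows; the classifier objects and the class-to-proposition table are placeholders, since the actual RAPPEL classifiers and class inventory are not reproduced here.

```python
# Hypothetical sketch of RAPPEL-style sentence analysis: one trained classifier
# per predicate, each mapping a sentence to a class that corresponds to a
# proposition template. The classifiers and the lookup table are placeholders.

def analyze_sentence(sentence, classifiers, class_to_proposition):
    """classifiers: predicate name -> callable(sentence) -> class label, or None
    when the predicate is judged absent from the sentence."""
    propositions = []
    for predicate, classify in classifiers.items():
        label = classify(sentence)
        if label is not None:
            propositions.append(class_to_proposition[label])
    # Post-processing would also resolve cross-references (Id slots) between
    # propositions produced by different classifiers; omitted here.
    return propositions

# Toy usage with trivial keyword-based "classifiers".
classifiers = {
    "velocity": lambda s: "vel-horizontal" if "velocity" in s else None,
    "force":    lambda s: "force-gravity" if "gravity" in s else None,
}
class_to_proposition = {
    "vel-horizontal": "(velocity id1 horizontal ?body ...)",
    "force-gravity":  "(force id2 ?body earth down ...)",
}
print(analyze_sentence("gravity does not change the horizontal velocity",
                       classifiers, class_to_proposition))
```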
Having a large number of classifiers and classes requires a larger, more com-
prehensive set of training data than is needed for a typical text classification
approach. And just as with the preparation of the training data for the statisti-
cal approach, the annotator may still be influenced by the context of a sentence.
However, we expect the impact of contextual dependencies to be less severe
since the representation-defined classes are more formal and finer-grained than
text-defined classes. For example, annotators may still resolve intersentential
anaphora and ellipsis but the content related inferences needed to select a class
are much finer-grained and therefore a closer fit to the actual meaning of the
sentence.
Although we have classifiers and classes defined that cover the entire Why2-
Atlas representation language, we have not yet provided training for the full
representation language. Given the strong dependence of this approach on the
completeness of the training data, we expect this approach to sometimes un-
dergenerate just as an incomplete symbolic approach would and sometimes to
overgenerate because of overgeneralizations during learning, just as with any
statistical approach.
tive, the representation of the essay must be accurate enough to detect when
physics principles are both properly and improperly expressed in the essay.
For the entire test suite we compute the number of true positives (TP), false
positives (FP), true negatives (TN) and false negatives (FN) for the elicitation
topics selected by the system relative to the elicitation topics annotated for the
test suite essays. From this we compute recall = TP/(TP+FN), precision =
TP/(TP+FP), and false alarm rate = FP/(FP+TN).
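For concreteness, the three measures are direct functions of the four counts; the short sketch below merely restates the formulas above in code, and the counts in the example call are made up.

```python
# Direct translation of the evaluation measures defined above.
# The counts in the example call are invented, not results from the paper.

def evaluation_measures(tp, fp, tn, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    false_alarm_rate = fp / (fp + tn)
    return recall, precision, false_alarm_rate

print(evaluation_measures(tp=20, fp=8, tn=5, fn=2))
```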
As a baseline measure, we compute the recall, precision and false alarm rate
that result if all possible elicitations for a physics problem are selected. For
our 35-essay test suite the recall is 1, precision is .61 and the false alarm rate is 1.
Although NL evaluations compute an F-measure (the harmonic mean of recall
and precision) in order to arrive at one number for comparing approaches, it
does not allow errors to be considered as fully as other analysis methods do,
such as receiver operating characteristic (ROC) areas [22] and [23]. These
measures are similar in that they combine the recall and the false alarm rates
into one number but allow for error skewing [22]. Rather than undertaking a full
comparison of the various NL understanding approach configurations for this
paper, we will instead look for those combinations that result in a high recall
and a low false alarm rate. Error skewing depends on what costs we need to
attribute to false negatives and false positives. Both potentially have negative
impacts on student learning in that the former leaves out important information
that should have been brought to the student’s attention and the latter can
confuse the student or cause lack of confidence in the system.
The first competition model tries each approach in a preferred sequential or-
dering, stopping when a representation is acceptable according to a general fil-
tering heuristic and otherwise continuing. The filtering heuristic estimates which
representations are over or undergenerated and excludes those representations
so that it appears that no representation was found for the sentence. A represen-
tation for a sentence is undergenerated if any of the word stems in a sentence are
constants in the representation language and none of those are in the representa-
tion generated or if the representation produced is too sparse. For Why2-Atlas,
it is too sparse if 50% of the propositions in the representation for a sentence
have slots with less than two constants filling them. Most propositions in the
representation language contain six slots which can be filled with constants.
Propositions that are defined to have two or fewer slots that can be filled with
constants are excluded from this assessment (e.g. the relations before and rel-
position are excluded). Representations are overgenerated if the sentences are
shorter than 4 words since in general the physics principles to be recognized
cannot be expressed in fewer words.
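A minimal sketch of this filtering heuristic is given below. It assumes each proposition is represented as a dict with its constant fillers and the number of constant-fillable slots it defines; these data structures are assumptions for illustration, not the system's actual ones.

```python
def is_undergenerated(sentence_stems, propositions, language_constants):
    """Undergeneration test sketched from the description above.
    sentence_stems and language_constants are sets of word stems; each
    proposition is a dict with 'constants' (fillers) and 'fillable_slots'."""
    rep_constants = {c for p in propositions for c in p["constants"]}
    # Word stems that are constants of the representation language...
    candidate_stems = sentence_stems & language_constants
    # ...but none of which appear in the generated representation.
    if candidate_stems and not (candidate_stems & rep_constants):
        return True
    # Too sparse: half or more of the (non-excluded) propositions have
    # fewer than two constant fillers.
    scored = [p for p in propositions if p["fillable_slots"] > 2]
    if scored and sum(len(p["constants"]) < 2 for p in scored) / len(scored) >= 0.5:
        return True
    return False


def is_overgenerated(sentence_words):
    """Sentences shorter than 4 words cannot express the target principles."""
    return len(sentence_words) < 4


def passes_filter(sentence_words, sentence_stems, propositions, language_constants):
    return not (is_overgenerated(sentence_words)
                or is_undergenerated(sentence_stems, propositions, language_constants))
```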
For the sequential model, we use a preference ordering of symbolic, statistical
and hybrid in these experiments because of the way in which Why2-Atlas was
originally designed and our expectations for which approach should produce the
highest quality result at this point in the development of the knowledge sources.
We also created some partial sequential models to look at whether the
more expensive understanding approaches add anything significant at this point
in their development.
The other competition model requests an analysis from all of the under-
standing approaches and then uses the filtering heuristic along with a ranking
heuristic (as described below) to select the best analysis. If all of the analyses
for either competition model fail to meet the selection heuristics then the sen-
tence is regarded as uninterpretable. The run times of the two
competition models are nearly equivalent if each understanding approach in the
second model is run in parallel using a distributed multi-agent architecture such
as OAA [25].
The ranking heuristic again focuses on the weaknesses of all the approaches.
It computes a score for each representation by first finding the number of words
in the intersection of the constants in the representation and the word stems
in the sentence (justified), the number of word stems in the sentence that are
constants in the representation language that do not appear in the representation
(undergenerated) and the number of constants in the representation that are
not word stems in the sentence (overgenerated). It then selects the one with
the highest score, where the score is justified − 2 × undergenerated − 0.5 × overgenerated.
The weightings reflect both the importance and approximate
nature of the terms.
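A small sketch of the ranking heuristic under the same assumed data structures (sets of word stems and representation constants) follows; it is an illustration of the scoring described above, not the system's code.

```python
def ranking_score(sentence_stems, rep_constants, language_constants):
    """score = justified - 2 * undergenerated - 0.5 * overgenerated."""
    justified = len(sentence_stems & rep_constants)
    undergenerated = len((sentence_stems & language_constants) - rep_constants)
    overgenerated = len(rep_constants - sentence_stems)
    return justified - 2 * undergenerated - 0.5 * overgenerated


def select_best(candidate_constant_sets, sentence_stems, language_constants):
    """Ranking model: keep the highest-scoring candidate representation."""
    return max(candidate_constant_sets,
               key=lambda rep: ranking_score(sentence_stems, rep, language_constants))
```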
The main difference between the two models is that the ranking approach
will choose the better representation (as estimated by the heuristics) as opposed
to one that merely suffices.
The bottom part of Table 2 shows the results of combining the NL ap-
proaches. The satisficing model that includes all three NL mapping approaches
performs better than the individual models in that it modestly improves recall
but at the sacrifice of a higher false alarm rate. The satisficing model checks
each representation in the order 1) symbolic, 2) statistical, 3) hybrid, and stops with
the first representation that is acceptable according to the filtering heuristic. We
also see that both of the satisficing models that include just two understanding
approaches perform better than the model in which all approaches are com-
bined; with the symbolic + statistical model being the best since it increases
recall without further increasing the false alarm rate. Finally, we see that the ranking
model, which selects the best representation from all three approaches, provides
the most balanced results of the combined or individual approaches. It provides
the largest increase in recall and the false alarm rate is still modest compared
to the baseline of tutoring all possible topics. To make a final selection of which
combined approach one should use, there needs to be an estimate of which errors
will have a larger negative impact on student learning. But clearly, selecting a
combined approach will be better than selecting a single NL mapping approach.
References
1. VanLehn, K., Jordan, P., Rosé, C., Bhembe, D., Böttner, M., Gaydos, A.,
Makatchev, M., Pappuswamy, U., Ringenberg, M., Roque, A., Siler, S., Srivas-
tava, R.: The architecture of Why2-Atlas: A coach for qualitative physics essay
writing. In: Proceedings of Intelligent Tutoring Systems Conference. Volume 2363
of LNCS., Springer (2002) 158–167
2. Aleven, V., Popescu, O., Koedinger, K.: Pilot-testing a tutorial dialogue system
that supports self-explanation. In: Proceedings of Intelligent Tutoring Systems
Conference. Volume 2363 of LNCS., Springer (2002) 344
3. Zinn, C., Moore, J.D., Core, M.G.: A 3-tier planning architecture for managing
tutorial dialogue. In: Proceedings of Intelligent Tutoring Systems Conference (ITS
2002). (2002) 574–584
4. Makatchev, M., Jordan, P., VanLehn, K.: Abductive theorem proving for analyzing
student explanations and guiding feedback in intelligent tutoring systems. Journal
of Automated Reasoning: Special Issue on Automated Reasoning and Theorem
Proving in Education (2004) to appear.
5. Aleven, V., Popescu, O., Koedinger, K.R.: A tutorial dialogue system with
knowledge-based understanding and classification of student explanations. In:
Working Notes of 2nd IJCAI Workshop on Knowledge and Reasoning in Prac-
tical Dialogue Systems. (2001)
6. Sandholm, T.W.: Distributed rational decision making. In Weiss, G., ed.: Multia-
gent Systems: A Modern Approach to Distributed Artificial Intelligence. The MIT
Press, Cambridge, MA, USA (1999) 201–258
7. Ploetzner, R., VanLehn, K.: The acquisition of qualitative physics knowledge dur-
ing textbook-based physics training. Cognition and Instruction 15 (1997) 169–205
8. Walther, C.: A many-sorted calculus based on resolution and paramodulation.
Morgan Kaufmann, Los Altos, California (1987)
9. Rosé, C.P.: A framework for robust semantic interpretation. In: Proceedings of the
First Meeting of the North American Chapter of the Association for Computational
Linguistics. (2000) 311–318
10. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text
classification. In: Proceedings of the AAAI/ICML-98 Workshop on Learning for Text
Categorization, AAAI Press (1998)
11. Jordan, P.W.: A machine learning approach for mapping natural language to a
domain representation language. in preparation (2004)
12. Abney, S.: Partial parsing via finite-state cascades. Journal of Natural Language
Engineering 2 (1996) 337–344
13. Lin, D.: Dependency-based evaluation of MINIPAR. In: Workshop on the Evalu-
ation of Parsing Systems, Granada, Spain (1998)
14. Levin, B., Pinker, S., eds.: Lexical and Conceptual Semantics. Blackwell Publishers,
Oxford (1992)
15. Jackendoff, R.: Semantics and Cognition. Current Studies in Linguistics Series.
The MIT Press (1983)
16. Rosé, C., Gaydos, A., Hall, B., Roque, A., VanLehn, K.: Overcoming the knowledge
engineering bottleneck for understanding student language input. In: Proceedings
of the AI in Education 2003 Conference. (2003)
17. Dzikovska, M., Swift, M., Allen, J.: Customizing meaning: building domain-specific
semantic representations from a generic lexicon. In Bunt, H., Muskens, R., eds.:
Computing Meaning. Volume 3. Academic Publishers (2004)
18. Rosé, C., Roque, A., Bhembe, D., VanLehn, K.: A hybrid text classification ap-
proach for analysis of student essays. In: Proceedings of HLT/NAACL 03 Workshop
on Building Educational Applications Using Natural Language Processing. (2003)
19. Lin, D., Pantel, P.: Discovery of inference rules for question answering. Journal of
Natural Language Engineering Fall-Winter (2001)
20. Strube, M., Rapp, S., Müller, C.: The influence of minimum edit distance on
reference resolution. In: Proceedings of Empirical Methods in Natural Language
Processing Conference. (2002)
21. Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolu-
tion. In: Proceedings of Association for Computational Linguistics 2002. (2002)
22. Flach, P.: The geometry of ROC space: Understanding machine learning metrics
through ROC isometrics. In: Proceedings of 20th International Conference on
Machine Learning. (2003)
23. MacMillan, N., Creelman, C.: Detection Theory: A User’s Guide. Cambridge
University Press, Cambridge, UK (1991)
24. Franklin, S., Graesser, A.: Is it an agent, or just a program?: A taxonomy for
autonomous agents. In: Proceedings of the Third International Workshop on Agent
Theories, Architectures, and Languages, Springer-Verlag (1996)
25. Cheyer, A., Martin, D.: The open agent architecture. Journal of Autonomous
Agents and Multi-Agent Systems 4 (2001) 143–148
26. Jokinen, K., Kerminen, A., Kaipainen, M., Jauhiainen, T., Wilcock, G., Turunen,
M., Hakulinen, J., Kuusisto, J., Lagus, K.: Adaptive dialogue systems - interaction
with interact. In: Proceedings of the 3rd SIGdial Workshop on Discourse and
Dialogue. (2002)
Evaluating Dialogue Schemata with the
Wizard of Oz Computer-Assisted Algebra Tutor
Abstract. The Wooz tutor of the North Carolina A&T algebra tutorial dialogue
project is a computer program that mediates keyboard-to-keyboard tutoring of
algebra problems, with the feature that it can suggest to the tutor canned struc-
tures of tutoring goals and canned sentences to insert into the tutoring dialogue.
It is designed to facilitate and record a style of tutoring where the tutor and stu-
dent collaboratively construct an answer in the form of an equation, a style of-
ten attested in natural tutoring of algebra. The algebra tutoring dialogue project
collects and analyzes these dialogues with the aim of describing tutoring strate-
gies and language with enough rigor that they may be evaluated and incorpo-
rated in machine tutoring. By plugging our analyzed dialogues into the com-
puter-suggested tutoring component of the Wooz tutor we can evaluate the fit-
ness of our dialogue analysis.
1 Introduction
Tutorial dialogues are often structurally analyzed for purposes of constructing tutor-
ing systems and understanding the tutorial process. However, there are not many ways
of validating the analysis of a dialogue, either for verifying that the analysis matches
the structure that a human would use, or for verifying that the analysis is efficacious.
In the algebra tutorial dialogue project at North Carolina A&T State University we
use a machine-assisted human tutor to evaluate our analysis of elementary college
algebra tutoring dialogues. The project has collected transcripts of human tutoring
using an interface that provides an enhanced chat-window environment for keyboard
to keyboard tutoring of algebra problems [1]. These transcripts of tutorial dialogue are
annotated based on the tutor’s intentions and language. From these annotations we
have created structured tutoring scenarios which we import into an enhanced com-
puter-mediated tutoring interface: the Wooz tutor. In subsequent tutoring sessions, the
tutor has the option of selecting language from the canned scenario, edited or ignored
as the tutor sees fit, for tutoring some of the problems. The resulting transcripts are
then analyzed to evaluate the fitness of our scenarios for tutoring, based on measures
such as pre- and post-test scores and the number of times that the tutor deviated from
the script.
The algebra tutorial dialogue project captures tutoring of high school and college
algebra problems with several goals in mind: 1) cataloging descriptions of tutoring
behavior from both tutor and student, using where possible enough rigor that they
might be useful for dialogue-based computerized tutoring, 2) evaluating the effective-
ness of various tutoring behaviors as they are originally observed, and 3) describing
these computer-mediated human educational dialogue interactions in general, as being
of use to the educational dialogue and cognitive psychology communities. The Wooz
tutor is a useful tool for partially evaluating our success in these endeavors.
The tutoring dialogues we captured consist of a tutor and a student working problems
collaboratively. The dialogue model is of a tutor and student conversing, with both the
problem statement and the equation being worked on being visible to both parties. We
analyze typed communication because, first, this is the mode most tractable for com-
puterization and, second, we can capture all the communication between student and
tutor; there are no gaze, gesture, prosodic features, and so on to capture and annotate.
Thus the computer-supported tutoring environment affords the following:
1. The statement of the problem currently being worked on is always on display in a
dedicated window.
2. The equations being developed while solving the problem are displayed in a dedi-
cated window; there is a toolbar for equation editing.
3. Typed tutorial dialogue appears, interleaved, in a chat-window.
Additionally there is some status information, e.g. which party has the current turn,
and the tutor has some special controls, such as a menu of problem statements to pick
from. One feature of this software environment is that the equation editor toolbar is
customized for each problem, so extraneous controls not needed for solving the prob-
lem under discussion are not displayed.
A phenomenon annotated in other transcripts of algebra tutoring is deixis [2, 3], in
particular pointing at equations or parts of equations. Although our interface has the
capability to display and edit several equations at the same time in its equation area, it
has no good referring mechanism for the participants to use. So far, we have not no-
ticed this to be an issue in the dialogues we have collected.
Regarding our experience with the program, we have collected transcripts from
50+ students to date, each comprising about one hour of tutoring, for a total of ap-
proximately 3000 turns and 300 problems. Students and tutors receive brief instruc-
tion before use; they have had little difficulty learning to use the application, includ-
ing constructing equations.
These problem-oriented tutoring dialogues are similar in form to those studied exten-
sively by the ITS community, e.g. [3, 4, 5], whose salient features were summarized
by [6]. An extract from a typical dialogue is illustrated in Figure 1.
Problems solved during these tutoring sessions include both symbolic manipulation
problems and word problems, viz:
1. Please factor
2. Bob drove “m” miles from Denver to Fargo. Normally this trip takes “n” hours, but
on Tuesday there was good weather and he saved 2 hours. Write an equation for
his driving speed “s”.
Students solve an average of between 5 and 6 problems in an hour session.
One feature of our tutoring data collection protocol is that the student’s perform-
ance on the pre-test determines which categories of problems will be tutored. The
tutor gives priority to problems similar to the ones the student answered incorrectly on
the pre-test, but did not leave totally blank. These are the areas where we judge that
the student is likely most ready to benefit from tutoring. The post-test then covers
only the problem areas that were tutored, so that any learning gains we measure are
specifically measuring learning for the particular tutoring that occurred. For data
analysis purposes the students are coded with an achievement level, on a scale of 1
(lowest) to 5. The achievement judgment is derived from the teacher of the student’s
algebra class, based on previous academic performance in the class.
The NC A&T dialogue project has accumulated 51 one-hour transcripts in this
way. The students are all recruited from the first-year basic algebra classes. About 24
of the transcripts were taught by an expert tutor, a professor of mathematics with
extensive experience tutoring algebra; 16 are divided approximately evenly between
two experienced tutors, people with extensive tutoring experience but no formal
mathematics education background; and 11 were taught by a novice tutor, an
upper-level mathematics student.
Students exhibit a learning gain of 0.35 across all tutoring sessions, calculated as:
(posttest – pretest) / (1 – pretest)
where the test scores range from 0.0 to 1.0. The expert tutor’s sessions exhibit a
learning gain of 0.41, the experienced tutors’ learning gain is 0.33, and the novice
tutor’s learning gain is 0.24. These data show that the dialogues do, in fact, record
learning events. Furthermore, they indicate that even though novice tutors can be
successful, additional tutoring experience seems to improve tutoring outcomes.
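The gain calculation itself is straightforward; a small sketch (with made-up scores) is:

```python
def learning_gain(pretest, posttest):
    """Normalized gain: (posttest - pretest) / (1 - pretest), scores in [0, 1]."""
    return (posttest - pretest) / (1.0 - pretest)

# Hypothetical scores for one student, for illustration only.
print(round(learning_gain(pretest=0.40, posttest=0.79), 2))  # 0.65
```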
Figure 1 shows an extract from a relatively short dialogue where the student solved
one multiplication problem. (In printed transcripts, the evolving equation in the equa-
tion window is interpolated into the dialogue every time the equation changes.) Even
though the student performed perfectly in solving the problem, it illustrates the most
prominent tutoring strategy used by our tutors: ensuring that the student can state the
type of problem (multiplying polynomials in this case) and a technique to solve it (a
mnemonic device in this case) before proceeding with a solution. Rarely do the tutors
skip these steps. This tactic can also be seen in the transcripts of [2]. This tactic alone
is often enough to get the student to solve the problem, as illustrated, even when the
student failed to solve similar problems on the pre-test. Getting the student to explic-
itly state the problem and method is consistent with the view that learning mathemat-
ics often invokes metacognitive processes [7].
explain those schemata. In consequence, many of our schemata are quite problem-
specific. The fact that this assemblage of goals and schemata is imputed from text by
the researchers, and not derived in a principled way, makes evaluating them more
important.
The Atlas-Andes tutor [13] guides the student through problem-solving tasks
where the main tutorial mode consists of model tracing guided by physics reasoning.
Our markup would be unable to capture and our Wooz tutor would be unable to
evaluate such dialogues. However Atlas-Andes also includes, as an adjunct method of
tutoring, dialogue schemata similar to our own called Knowledge Construction Dia-
logues. These dialogues would seem to be amenable to Wooz tutor evaluation.
A reason this style of analysis is possible is that our tutors do not teach much alge-
braic reasoning. Instead they emphasize applying problem-solving methods previ-
ously learned in class, along with teaching the metacognitive skills to know how to
apply these methods.
Figure 2 shows the evolving trace of tutorial goals from one of our typical dia-
logues, as affected by student errors and retries. The three prominent goals discussed
above are labeled identify-operation, identify-approach and solve-problem in this an-
notation scheme.
We abstract general schemata from many instances of tutoring such as Figure 2.
The quite general-purpose schema of identify-problem, identify-approach, and solve-
problem usually involves problem-specific sub-schemata. For example, to satisfy
solve-problem in the trinomial factoring domain, we have a schema of make-binomials
and confirm-factoring. If that fails, solve-problem might be satisfied by an alternate
Fig. 3. Extract From Sentences For Each Goal as Presented to the Wooz Tutor
The tutorial schemata are then evaluated by using them in tutorial dialogues with
students, via the Wooz Tutor1. Running in Wooz Tutor mode, the computer-mediated
communication software presents the human tutor with an additional menu of tutoring
goals and a set of associated sentences for each goal. The tutor can optionally select
and edit a sentence, then send it to the dialogue.
Note that since the Wooz tutor is a superset of our normal computer-mediated tu-
toring interface, it is possible to conduct tutoring dialogues where some of the prob-
lems are mechanically assisted and some are produced entirely from the human tutor.
Following the identification of schemata, we collect examples of language used for
each goal. The sets of goals and associated sentences are then collected together, one
set for each problem, illustrated in Figure 3. Some of the sentences are simple tem-
plates where the variable slots can be filled in with the student’s name or problem-
specific information. On the Wooz tutor interface, the goals hierarchy appears as an
expandable tree of nodes, where expanding a leaf node exposes the sentences that can
be picked. Mouse-over of a goal node shows the first sentence that can be used for
expressing that goal, enabling the tutor to peer inside the tree more readily. Figure 4
shows the Wooz tutor as the tutor sees it.
1 Wooz comes from Wizard of Oz. The public face of the tutor, including its language and
goals, comes from the machine, while there is a human intelligence pulling the strings. The
name is a bit of a misnomer, as we do not try to fool the students.
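One plausible way to organize a problem's goal schema and canned sentences for such an interface is sketched below; the goal names follow the annotation labels mentioned above, but the sentences and the {student_name} template slot are invented examples, not the project's actual database.

```python
# Illustrative structure for one problem's scenario: goals map to canned
# sentences (or to sub-goals), mirroring the expandable tree in the interface.
factoring_scenario = {
    "identify-operation": [
        "What kind of problem is this, {student_name}?",
    ],
    "identify-approach": [
        "What technique can we use to factor a trinomial?",
    ],
    "solve-problem": {
        "make-binomials": [
            "Let's set up the two binomial factors.",
        ],
        "confirm-factoring": [
            "Multiply your factors back out to check the result.",
        ],
    },
}
```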
From the transcripts we can then evaluate how much of the dialogue came from
the canned sentences, edited sentences, or entirely new sentences. We can also tell
when the tutor left the goal script. This gives us an indication of the effectiveness and
completeness of our isolated tutoring schemata and language.
The intelligence for understanding and evaluating student input, and deciding when
and where to switch tutorial goals, still resides in the human tutor. The schemata we
isolate and test with this method do not specify all that is needed for mechanizing the
tutoring process with an ITS. However the tradeoff for leaving the decisions in the
hands of a human tutor is that the simple evaluation of schemata is quite cheap.
We have 6 tutoring sessions where the expert tutor utilized the Wooz structured sce-
nario for the trinomial factoring problem; the other problems in the same tutoring
sessions were tutored by normal means. We also have 15 examples of tutoring this
problem without benefit of the structured scenario, but with so few cases we have no
estimates of statistical significance. The learning gains were 0.75 for the Wooz-assisted sessions and -0.14
(a loss) for the non-assisted sessions. The Wooz-assisted tutoring sessions had only
lower achievement (levels 1 through 3) students, while the non-assisted sessions had a
more mixed population. Considering only the students at the lower achievement lev-
els gives a learning gain of 0.75 for Wooz and 0.0 for the unassisted tutors. Note also
that the Wooz-assisted gains compare favorably to the 0.35 gain over all problems in
all transcripts. These results point toward Wooz-assisted tutoring producing superior
learning gains, but the numbers are so small that we do not have statistical signifi-
cance.
Comparing the number of turns to tutor one problem (both tutor and student com-
bined) and clock time to tutor one problem for Wooz vs. non-Wooz for the same
problem, we see that Wooz is a trifle slower and less wordy in the achievement-
matched group, and much slower and a trifle more wordy overall. Table 1 shows
these results. We would have expected the Wooz assisted dialogue to be faster be-
cause of less typing, but this does not seem to be the case.
In the Wooz-assisted dialogues, the tutors almost always followed the suggested
tutorial goal schemata. This suggests that we have the goal structure correct. We have
not tried the computer-suggested goal structure and dialogue with novice tutors to see
whether it affects their tutoring.
Of the tutor turns in the Wooz-assisted dialogue, 70% were extracted from the da-
tabase of canned sentences with no change, 6% were edits of existing sentences, and
24% were new sentences. There is little difference between the edits and the new
sentences; it seems that once the tutor started editing a sentence she changed almost
the whole thing. The new and changed sentences almost always respond to specifics
of student utterances that did not appear in the attested transcripts used in building the
sentence database. Here is an example of a modified turn:
This phenomenon, the human tutor responding to specific variations in the student
responses, would seem to reduce the Wooz tutor’s evaluative probity. When a tutor
changes a sentence, we have no way to know whether the unchanged sentence would
have worked just as well. Nevertheless, with experience we should build up knowl-
edge of what rates of sentence modifications to reasonably expect. Forcing the tutor to
follow the Wooz tutor's suggestions would mean that discovering gaps in schemata
would become more difficult, making it less useful as an evaluative tool.
Wooz bears a familial similarity to the snapshot analysis technique for evaluating
intelligent tutoring systems, for example [14], whereby at various points in the tutorial
session the choices of experienced tutors are compared with the choices of the ma-
chine tutor. In an ITS project, Wooz could function as a cheap way to partially evalu-
ate the same schemata before they are incorporated into the machine tutor.
The Wooz tutor does not evaluate the completeness or the reliability of coding. It is
thus not a substitute for traditional evaluation measures such as inter-rater reliability.
But by evaluating whether schemata imputed from transcripts are complete and effi-
cacious it could provide an additional measure of evaluation to a dialogue annotation
project. In particular a high inter-rater reliability shows that the analysis is reproduci-
ble, not that it is useful. This technique can help fill that gap.
4 Conclusions
The technique of providing canned tutoring goals structure and sentences to the hu-
man tutor in keyboard-to-keyboard tutoring seems to work well for our purpose of
evaluating whether we have analyzed dialogue in a useful manner. We can evaluate
whether the tutoring language and goal structure are actually complete enough for real
dialogues and actually provide effective tutoring.
The input understanding and decision making structures that would be necessary
for building an ITS are not evaluated here. The positive result is that Wooz tutor
evaluation is cheap and easy, since you do not have to do all the work of committing
to working tutoring software. Furthermore you can evaluate only a few small dialogues
by mixing them in with ordinary un-assisted tutoring. Compared to techniques for
evaluating transcript annotation such as inter-rater reliability measurement, Wooz
tutoring provides the advantage that it tests the final transcript analysis in real dia-
logues.
We have no evidence, partly because of a small number of test cases and partly be-
cause we do not force the tutor to follow the machine’s suggestions, that the artificial
assist to the tutor speeds up the tutoring process or improves learning outcomes.
References
1. Patel, Niraj, Michael Glass, and Jung Hee Kim. 2003. “Data Collection Applications for
the NC A&T State University Algebra Tutoring Dialogue (Wooz Tutor) Project,” Four-
teenth Midwest Artificial Intelligence and Cognitive Science Conference (MAICS-2003),
Cincinnati, 2003.
2. Heffernan, Neil T. 2001. Intelligent Tutoring Systems Have Forgotten the Tutor: Adding a
Cognitive Model of Human Tutors. Ph.D. diss., Computer Science Department, School of
Computer Science, Carnegie Mellon University. Technical Report CMU-CS-01-127.
3. McArthur, David, Cathleen Stasz, and Mary Zmuidzinas. 1990. “Tutoring Techniques in
Algebra,” Cognition and Instruction, vol. 7, pp. 197-244.
4. Fox, Barbara. 1993. The Human Tutorial Dialogue Project, Lawrence Erlbaum Associates.
5. Graesser, Arthur C., Natalie K. Person, and Joseph P. Magliano. 1995. “Collaborative
Dialogue Patterns in Naturalistic One-to-One Tutoring,” Applied Cognitive Psychology,
vol. 9, pp. 495-522.
6. Person, Natalie and Arthur C. Graesser. 2003. “Fourteen Facts about Human Tutoring:
Food for Thought for ITS Developers.” In H.U. Hoppe, M.F. Verdejo, and J. Kay, Artifi-
cial Intelligence in Education (Eleventh International Conference, AIED-2003, Sydney,
Australia), IOS Press.
7. Carr, Martha and Barry Biddlecomb 1998. “Metacognition in Mathematics from a Con-
structivist Perspective.” In Hacker, Douglas, John Dunlosky, and Arthur C. Graesser,
Metacognition in Educational Theory and Practice, Mahwah, NJ: Lawrence Erlbaum,
pp. 69-91.
8. Kim, Jung Hee, Reva Freedman, Michael Glass, and Martha W. Evens. 2004. “Annotation
of Tutorial Goals for Natural Language Generation,” in preparation.
9. Freedman, Reva, Yujian Zhou, Michael Glass, Jung Hee Kim, and Martha W. Evens.
1998a. “Using Rule Induction to Assist in Rule Construction for a Natural-Language
Based Intelligent Tutoring System,” Twentieth Annual Conference of the Cognitive Sci-
ence Society, Madison, pp. 362-367.
10. Freedman, Reva, Yujian Zhou, Jung Hee Kim, Michael Glass, and Martha W. Evens.
1998b. “SGML-Based Markup as a Step toward Improving Knowledge Acquisition for
Text Generation,” AAAI 1998 Spring Symposium: Applying Machine Learning to Dis-
course Processing. Stanford: AAAI Press, pp. 114-117.
11. Person, Natalie K., Arthur C. Graesser, Roger J. Kreuz, Victoria Pomeroy, and the Tutor-
ing Research Group. 2001. “Simulating Human Tutor Dialog Moves in AutoTutor,” Inter-
national Journal of Artificial Intelligence in Education, vol. 12, pp. 23-39.
12. Heffernan, Neil T. and Kenneth R. Koedinger, 2002. “An Intelligent Tutoring System
Incorporating a Model of an Experienced Human Tutor,” Intelligent Tutoring Systems,
Sixth International Conference, ITS-2002, Biarritz, Springer Verlag.
13. Rosé, Carolyn P., Pamela Jordan, Michael Ringenberg, Stephanie Siler, Kurt VanLehn,
and Anders Weinstein. 2001. “Interactive Conceptual Tutoring in Atlas-Andes.” In J.
Moore, C. L. Redfield, and W. L. Johnson, Artificial Intelligence in Education (Tenth In-
ternational Conference, AIED-2001, San Antonio) IOS Press, pp. 256-266.
14. Mostow, Jack, Cathy Huang, and Brian Tobin. 2001. “Pause the Video: Quick but Quan-
titative Expert Evaluation of Tutorial Choices in a Reading Tutor that Listens.” In J.
Moore, C. L. Redfield, and W. L. Johnson, Artificial Intelligence in Education (Tenth In-
ternational Conference, AIED-2001, San Antonio) IOS Press, pp. 243-253.
Spoken Versus Typed Human and Computer
Dialogue Tutoring
1 Introduction
It is widely believed that the best human tutors are more effective than the best
computer tutors, in part because [1] found that human tutors could produce
larger learning gains than current computer tutors (e.g., [2,3,4]).
A major difference between human and computer tutors is that human tutors use
face-to-face spoken natural language dialogue, whereas computer tutors typically
use menu-based interactions or typed natural language dialogue. This raises the
question of whether making the interaction more natural, such as by changing
the modality of the tutoring to spoken natural language dialogue, would decrease
the advantage of human tutoring over computer tutoring.
Three main benefits of spoken tutorial dialogue with respect to increasing
learning have been hypothesized. One is that spoken dialogue may elicit more
student engagement and knowledge construction. [5] found that students who
were prompted for self-explanations produced more when the self-explanations
were spoken rather than typed. Self-explanation is just one form of student
cognitive activity that is known to cause learning gains [6,7,8]. If it can be
increased by using speech, perhaps other beneficial thinking can be elicited as well.
A second hypothesis is that speech allows tutors to infer a more accurate
student model, including long-term factors such as overall competence and mo-
tivation, and short-term factors such as whether the student really understood
Next, students read a short textbook-like pamphlet, which described the major
laws (e.g., Newton's first law) and the major concepts. Students then worked
through a set of up to 10 training problems with the tutor. Finally, students
were given a posttest that was isomorphic to the pretest; both consisted of 40
multiple choice questions. The entire experiment took no more than 9 hours per
student, and was usually performed in 1-3 sessions. Subjects were University
students responding to ads, and were compensated with money or course credit.
The interface used for all experiments was basically the same. The student
first typed an essay answering a qualitative physics problem. The tutor then
engaged the student in a natural language dialogue to provide feedback, correct
misconceptions, and to elicit more complete explanations. At key points in the
dialogue, the tutor asked the student to revise the essay. This cycle of instruction
and revision continued until the tutor was satisfied with the student’s essay, at
which point the tutor presented the ideal essay answer to the student.
For the studies described below, we compare characteristics of student dia-
logues with both typed and spoken computer tutors (Why2-Atlas and ITSPOKE,
respectively), as well as with a single human tutor performing the same task as
the computer tutor for each system. Why2-Atlas is a text-based intelligent tutor-
ing dialogue system [16], developed in part to test whether deep approaches to
natural language processing (e.g., sentence-level syntactic and semantic analysis,
discourse and domain level processing, and finite-state dialogue management)
elicit more learning than shallower approaches. ITSPOKE (Intelligent Tutor-
ing SPOKEn dialogue system) [9] is a speech-enabled version of Why2-ATLAS.
Student speech is digitized from microphone input and sent to the Sphinx2 rec-
ognizer. The most probable “transcription” output by Sphinx2 is sent to the
Why2-Atlas natural language processing “back-end”. Finally, the text response
produced by Why2-Atlas is sent to the Cepstral text-to-speech system.
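Schematically, one ITSPOKE turn cycles through recognition, understanding, and synthesis; in the sketch below the three helper functions are self-contained stand-ins for Sphinx2, the Why2-Atlas back-end and the Cepstral synthesizer, not their real APIs, and the strings they return are invented.

```python
# Stubbed pipeline illustrating the flow of one tutoring turn.
def recognize_speech(audio):
    return "the keys keep the same horizontal velocity"    # placeholder "ASR output"

def why2_atlas_backend(student_text):
    return "Good. What happens to the vertical velocity?"  # placeholder tutor response

def text_to_speech(text):
    return f"<audio: {text}>"                               # placeholder synthesis

def tutoring_turn(audio_input):
    hypothesis = recognize_speech(audio_input)   # most probable transcription
    tutor_text = why2_atlas_backend(hypothesis)  # NL understanding + tutoring
    return text_to_speech(tutor_text)            # spoken tutor turn

print(tutoring_turn(audio_input=b""))
```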
In the typed condition, strict turn-taking was enforced, while in the spoken condition interruptions and
overlapping speech were permitted. This was because we plan to add “bargein” to
ITSPOKE, which will enable students to interrupt ITSPOKE. Sample dialogue
excerpts from both conditions are displayed in Figure 1.
Pre and posttest items were scored as right or wrong, with no partial credit.
Students who were not able to complete all 10 problems due to lack of time took
the posttest after only working through a subset of the training problems.
Experiment 1 resulted in two human tutoring corpora. The typed dialogue
corpus consists of 171 physics problems with 20 students, while the spoken di-
alogue corpus consists of 128 physics problems with 14 students. In subsequent
analyses, a “dialogue” refers to the transcript of one student’s discussion of one
problem with the tutor.
3.2 Results
Table 1 presents the means and standard deviations for two types of analyses,
learning and training time, across conditions. The pretest scores were not reliably
different across the two conditions, F(33) = 1.574, p = 0.219, MSe = 0.009. In
an ANOVA with condition by test phase factorial design, there was a robust
main effect for test phase, F(67) = 90.589, p = 0.000, MSe = 0.012, indicating
that students in both conditions learned a significant amount during tutoring.
However, the main effect for condition was not reliable, F(33) = 1.823, p = 0.186,
MSe = 0.014, and there was no reliable interaction. In an ANCOVA, the adjusted
posttest scores show a strong trend of being reliably different, F(1,33)=4.044,
p=0.053, MSe = 0.01173. Our results thus suggest that the human speech tutored
students learned more than the human text tutored students; the effect size is
0.74. With respect to training time, students in the spoken condition completed
their dialogue tutoring in less than half the time required in the typed condition,
where dialogue time was measured as the sum over the training problems of the
number of minutes between the time that the student was shown the problem
text and the time that the student was shown the ideal essay. The extra time
needed for both the tutor and the student to type (rather than speak) each
dialogue turn in the typed condition was a major contributor to this difference.
An ANOVA shows that the difference in means across the two conditions was
reliably different, with F(33) = 35.821, p = 0.00, MSe = 15958.787. For human
tutoring, our results thus support our hypothesis that spoken tutoring is indeed
more effective than typed tutoring, for both learning and training time.
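An analysis of this kind can be reproduced along the following lines; the sketch uses statsmodels and a tiny made-up data frame purely to show the model specification, not the study's data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Illustrative values only; the real analysis used the tutored students' scores.
df = pd.DataFrame({
    "condition": ["spoken"] * 4 + ["typed"] * 4,
    "pretest":  [0.45, 0.50, 0.60, 0.40, 0.48, 0.55, 0.42, 0.62],
    "posttest": [0.80, 0.78, 0.85, 0.70, 0.65, 0.72, 0.58, 0.75],
})

# ANCOVA: posttest by condition, with pretest as covariate.
model = smf.ols("posttest ~ pretest + C(condition)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```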
It is important to understand why the change in modality (and interruption
policy) increased learning. Table 2 presents the means for a variety of measures
characterizing different aspects of dialogue, to determine which aspects differ
across conditions, and to examine whether different dialogue characteristics cor-
relate with learning across conditions (although the utility of correlation analysis
might be limited by our small subject pool). For each dependent measure (ex-
plained below), the second through fourth columns present the means (across
students) for the spoken and typed conditions, along with the statistical signifi-
cance of their differences. The fifth through eighth columns present a Pearson’s
correlation between each dialogue measure and raw posttest score. However, in
the spoken condition, the pre and posttest scores are highly correlated (R=.72,
p =.008); in the typed condition they are not (R=.29, p=.21). Because of the
spoken correlation, the last four columns show the correlation between posttest
and the dependent measure, after the correlation with pretest is regressed out.
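One reasonable reading of "regressing out" the pretest is a partial correlation: residualize both the posttest and the dialogue measure on pretest and correlate the residuals. The sketch below (with invented arrays) follows that reading; the exact procedure used for the tables may differ.

```python
import numpy as np
from scipy import stats

def correlation_regressing_out_pretest(measure, posttest, pretest):
    """Correlate measure and posttest after removing their linear
    dependence on pretest (a partial-correlation sketch)."""
    s1, b1, *_ = stats.linregress(pretest, posttest)
    s2, b2, *_ = stats.linregress(pretest, measure)
    post_resid = posttest - (s1 * pretest + b1)
    meas_resid = measure - (s2 * pretest + b2)
    return stats.pearsonr(meas_resid, post_resid)

# Illustrative data only.
pretest  = np.array([0.40, 0.55, 0.35, 0.60, 0.50, 0.45])
posttest = np.array([0.70, 0.75, 0.72, 0.78, 0.74, 0.69])
measure  = np.array([12.0, 9.5, 14.1, 8.2, 11.0, 10.3])
print(correlation_regressing_out_pretest(measure, posttest, pretest))
```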
The measures in Table 2 were motivated by previous work suggesting that
learning correlates with increased student language production. In pilot studies
of the typed corpus, average student turn length was found to correlate with
learning. We thus computed the average length of student turns in words (Ave.
Stud. Wds/Turn), as well as the total number of words and turns per student,
summed across all training dialogues (Tot. Stud. Words, Tot. Stud. Turns). We
also computed these figures for the tutor’s contributions (Ave. Tut. Wds/Turn,
Tot. Tut. Words, Tot. Tut. Turns). The slope and intercept measures will be
explained below. Similarly, the studies of [17] examined student language pro-
duction relative to tutor language production, and found that the percentage of
words and utterances produced by the student positively correlated with learn-
ing. This led us to compute the number of student words divided by the number
of tutor words (S-T Tot. Wds Ratio), and a similar ratio of student words per
turn to tutor words per turn (S-T Wd/Trn Ratio).
Table 2 shows interesting differences between the spoken and typed corpora
of human-human dialogues. For every measure examined, the means across con-
ditions are significantly different, verifying that the style of interactions is indeed
quite different. In spoken tutoring, both student and tutor take more turns on
average than in typed tutoring, but these spoken turns are on average shorter.
Moreover, in spoken tutoring both student and tutor on average use more words
to communicate than in typed tutoring. However, in typed tutoring, the ratio of
student to tutor language production is higher than in speech.
The remaining columns attempt to uncover which aspects of tutorial dialogue
in each condition were responsible for its effectiveness. Although the zero order
correlations are presented for completeness, our discussion will focus only on the
last four columns, which we feel present the more valid analysis.
In the typed condition, as in its earlier pilot study, there is a positive correla-
tion between average length of student turns in words and learning (R=.515, p =
.03). We hypothesize that longer student answers to tutor questions reveal more
of a student’s reasoning, and that if the tutor is adapting his interaction to the
student’s revealed knowledge state, the effectiveness of the tutor’s instruction
might increase as average student turn length increases. Note that there is no
correlation between total student words and learning; we hypothesize that how
much a student explains (as estimated by turn length) is more important than
how many questions a student answers (as estimated by total word production).
There is also a positive correlation between average length of tutor turn and
learning (R=.536, p=.02). Perhaps more tutor words per turn means that the
tutor is explaining more or giving more useful feedback. A deeper coding of our
data would be needed to test all of these hypotheses. Finally, as in the typed
pilot study [18], student words per turn usually decreased gradually during the
sessions. In speech, turn length decreased from an average of 6.0 words/turn for
the first problem to 4.5 words/turn by the last problem. In text, turn length de-
creased from an average of 14.6 words for the first problem to 10.7 words by the
last problem. This led us to fit regression lines to each subject and compare the
intercepts and slopes to learning. These measures indicate roughly how verbose
a student was initially and how quickly the student became taciturn. Table 2
indicates a reliable correlation between intercept and learning (R=.593; p=.01)
for the typed condition, suggesting that inherently verbose students (or at least
those who initially typed more) learned more in typed human dialogue tutoring.
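The slope and intercept measures can be obtained by fitting one regression line per student of words-per-turn against problem order, roughly as sketched below; the numbers in the example echo the typed-condition trend just described but are illustrative.

```python
import numpy as np

def verbosity_trend(words_per_turn_by_problem):
    """Return (intercept, slope): initial verbosity and rate of decline."""
    problems = np.arange(1, len(words_per_turn_by_problem) + 1)
    slope, intercept = np.polyfit(problems, words_per_turn_by_problem, 1)
    return intercept, slope

print(verbosity_trend([14.6, 13.0, 12.2, 11.5, 10.7]))
```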
Since there were no significant correlations in the spoken condition, we
have begun to examine other measures that might be more relevant in speech.
For example, the mean number of total syntactic questions per student is 35.29,
with a trend for a negative correlation with learning (R=-.500, p=.08). This
result suggests that, as with our text-based correlations, our current surface-
level analyses will need to be enhanced with deeper codings before we can fully
interpret our results (e.g., by manually coding non-interrogative form questions,
and by distinguishing question types).
Experiment 2 compared typed and spoken tutoring using the Why2-Atlas and
ITSPOKE computer tutors, respectively. The experimental procedure was the
same as for Experiment 1, except that students worked through only 5 physics
problems, and the pretest was taken after the background reading (allowing us to
measure gains caused by the experimental manipulation, without confusing them
with gains caused by background reading). Strict turn-taking was now enforced
in both conditions as bargein had not yet been implemented in ITSPOKE.
While Why2-Atlas and ITSPOKE used the same web interface, during the
dialogue, Why2-Atlas students typed while ITSPOKE students spoke through
a head-mounted microphone. In addition, the Why2-Atlas dialogue history con-
tained what the student actually typed, while the ITSPOKE history contained
the potentially noisy output of ITSPOKE’s speech recognizer. The speech rec-
ognizer’s hypothesis for each student utterance, and the tutor utterances, were
not displayed until after the student or ITSPOKE had finished speaking.
Figure 2 contains excerpts from both Why2-Atlas and ITSPOKE dialogues.
Note that for ITSPOKE, the output of the automatic speech recognizer (the
ASR annotations) sometimes differed from what the student actually said. Thus,
ITSPOKE dialogues contained rejection prompts (when ITSPOKE was not con-
fident of what it thought the student said, it asked the student to repeat, as in the
third ITSPOKE turn). On average, ITSPOKE produced 1.4 rejection prompts
per dialogue. ITSPOKE also misrecognized utterances; when ITSPOKE heard
something different than what the student said but was confident in its hypoth-
esis, it proceeded as if it heard correctly. While the ITSPOKE word error rate
was 31.2%, semantic analysis based on speech recognition versus perfect tran-
scription differed only 7.6% of the time. Semantic accuracy is more relevant for
dialogue evaluation, as it does not penalize for unimportant word errors.
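Word error rate is the word-level edit distance between the reference transcript and the recognizer hypothesis, divided by the reference length; a self-contained sketch (with invented transcripts) is shown below. Semantic accuracy, by contrast, compares the back-end's interpretation of the two strings rather than their surface forms.

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the velocity stays the same",
                      "the velocity stay at the same"))  # 0.4
```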
Experiment 2 resulted in two computer tutoring corpora. The typed Why2-
Atlas dialogue corpus consists of 115 problems (dialogues) with 23 students,
while the ITSPOKE spoken corpus consists of 100 problems (dialogues) with 20
students.
4.2 Results
Table 3 presents the means and standard deviations for the learning and training
time measures previously examined in Experiment 1. The pre-test scores were
not reliably different across the two conditions, F(42) = 0.037, p= 0.848, MSe =
0.036. In an ANOVA with condition by test phase factorial design, there was a
robust main effect for test phase, F(85) = 29.57, p = 0.000, MSe = 0.032, indicat-
ing that students learned during their tutoring. The main effect for condition was
not reliable, F(42)=0.029, p=0.866, MSe=0.029, and there was no reliable inter-
action. In an ANCOVA of the multiple-choice test data, the adjusted post-test
scores were not reliably different, F(1,42)=0.004, p=0.950, MSe=0.01806. Thus,
the Why-Atlas tutored students did not learn reliably more than the ITSPOKE
tutored students. With respect to training time, students in the spoken condition
took more time to complete their dialogue tutoring than in the typed condition.
In the spoken condition, extra utterances were needed to recover from speech
recognition errors; also, listening to tutor prompts often took more time than
reading them, and students sometimes needed to both listen to, then read, the
prompts. An ANOVA shows that this difference was reliable, with F(42)=9.411,
p=0.004, MSe=950.792. In sum, while adding speech to Why2-Atlas did not
yield the hoped for improvements in learning, the degradation in tutor under-
standing due to speech recognition (and potentially in student understanding
due to text-to-speech) also did not decrease student learning. A separate analy-
sis showed no correlation between word error or semantic degradation (discussed
in Section 4.1) with learning or training time.
Table 4 presents the means for the measures used in Experiment 1 to char-
acterize dialogue, as well as for a new “Tot. Subdialogues per KCD” measure
for our computer tutors. A Knowledge Construction Dialogue (KCD) is a line
of questioning targeting a specific concept (such as Newton’s Third Law). When
students answer questions incorrectly, the KCDs correct them through a "sub-
dialogue", which may involve more interactive questioning or simply a remedial
statement. Thus, subdialogues per KCD is the number of student responses
treated as wrong. We hypothesized that this measure would be higher in speech,
due to the previously noted degradation in semantic accuracy.
Compared to Experiment 1, Table 4 shows that there are fewer differences be-
tween spoken and typed computer tutoring dialogues. The total words produced
by students, the average length of turns and initial verbosity, and the ratios
of student to tutor language production are no longer reliably different across
conditions. As hypothesized, Tot. Subdialogues per KCD is reliably different
(p=.01). Finally, the last four columns show a significant negative correlation
between Tot. Subdialogues per KCD and posttest score (after regressing out
pretest) in the typed condition. There is also a trend for a positive correlation
with total student words in the spoken condition, consistent with previous results
on learning and increased student language production.
The main results of our study are that changing the modality from text to
speech caused large differences in the learning gains, time and superficial di-
alogue characteristics of human tutoring, but for computer tutoring it made
less difference. Experiment 1 on human tutoring suggests that spoken dialogue
(allowing interruptions) is more effective than typed dialogue (prohibiting in-
terruptions), with mean adjusted posttest score increasing and training time
decreasing. We also find that typed and spoken dialogues are very different for
the surface measures examined, and for the typed condition we see a benefit for
longer turns (evidenced by correlations between learning and average and initial
student turn length and average tutor turn length). While we do not see these
results in speech, spoken utterances are typically shorter than written sentences
(and in our experiment, turn length was also impacted by interruption policy),
suggesting that other measures might be more relevant. However, we plan to in-
vestigate whether spoken phenomena such as disfluencies and grounding might
also explain the lack of correlation.
The results of Experiment 2 on computer tutoring are less conclusive. On
the negative side, we do not see any evidence that replacing typed dialogue in
Why2-Atlas with spoken dialogue in ITSPOKE improves student learning. How-
ever, on the positive side, we also do not see any evidence that the degradation
in understanding caused by speech recognition decreases learning. Furthermore,
compared to human tutoring, we see less difference between spoken and typed
computer dialogue interactions, at least for the dialogue aspects measured in our
experiments. One hypothesis is that simply adding a spoken “front-end”, with-
out also modifying the tutorial dialogue system “back-end”, is not enough to
change how students interact with a computer tutor. Another hypothesis is that
the limitations of the particular natural language technologies used in Why2-
Atlas (or the expectations that the students had regarding such limitations)
are inhibiting the modality differences. Finally, if there were differences between
conditions, perhaps the shallow measures used in our experiments and/or our
small number of subjects prevented us from discovering them. In sum, while
the results of human tutoring suggest that spoken tutoring is a promising ap-
proach for enhancing learning, more exploration is required to determine how to
productively incorporate speech into computer tutoring systems.
By design, the modality change left the content of the computer dialogues
completely unchanged – the tutors said nearly the same words and asked nearly
the same questions, and the students gave their usual short responses. On the
other hand, the content of the human tutoring dialogues probably changed con-
siderably when the modality changed. This suggests that modality change makes
a difference in learning only if it also facilitates content change. We will investi-
gate this hypothesis in future work by coding for content and other deep features.
Finally, we had hypothesized that the spoken modality would encourage stu-
dents to become more engaged and to self-construct more knowledge. Although a
deeper coding of the dialogues would be necessary to test this hypothesis, we can
get a preliminary sense of its veracity by examining the total number of words
uttered. Student verbosity (and perhaps engagement and self-construction) did
not increase significantly in the spoken computer tutoring experiment. In the
human tutoring experiment, the number of student words did significantly in-
crease, which is consistent with the hypothesis and may explain why spoken
human tutoring was probably more effective than typed human tutoring. How-
ever, the number of tutor words also significantly increased, which suggests that
the human tutor may have “lectured” more in the spoken modality. Perhaps
these longer explanations contributed to the benefits of speaking compared to
the text, but it is equally conceivable that they reduced the amount of engage-
ment and knowledge construction, and thus limited the gains. This suggests that
although we considered how the modality might affect the student, we neglected
to consider how it might affect the tutor, and how that might impact the stu-
dents' learning. Clearly, these issues deserve more research. Our goal is to use
such investigations to guide the development of future versions of Why2-Atlas
and ITSPOKE, by modifying the dialogue behaviors in each system to best
enhance the possibilities for increasing learning.
References
1. Bloom, B.S.: The 2 Sigma problem: The search for methods of group instruction as
effective as one-to-one tutoring. Educational Researcher 13 (1984) 4–16
2. Anderson, J.R., Corbett, A.T., Koedinger, K.R., Pelletier, R.: Cognitive tutors:
Lessons learned. The Journal of the Learning Sciences 4 (1995) 167–207
3. VanLehn, K., Lynch, C., Taylor, L., Weinstein, A., Shelby, R.H., Schulze, K.,
Treacy, D.J., Wintersgill, M.C.: Minimally invasive tutoring of complex physics
problem solving. In: Proc. Intelligent Tutoring Systems (ITS), 6th International
Conference. (2002) 367–376
4. Graesser, A.C., Wiemer-Hastings, K., Wiemer-Hastings, P., Kreuz, R.: Autotutor:
A simulation of a human tutor. Journal of Cognitive Systems Research 1 (1999)
5. Hausmann, R., Chi, M.: Can a computer interface support self-explaining? The
International Journal of Cognitive Technology 7 (2002)
6. Chi, M., Leeuw, N.D., Chiu, M., Lavancher, C.: Eliciting self-explanations improves
understanding. Cognitive Science 18 (1994) 439–477
7. Renkl, A.: Learning from worked-out examples: A study on individual differences.
Cognitive Science 21 (1997) 1–29
8. Chi, M.T.H., Siler, S.A., Jeong, H., Yamauchi, T., Hausmann, R.G.: Learning
from human tutoring. Cognitive Science (2001) 471–477
9. Litman, D.J., Forbes-Riley, K.: Predicting student emotions in computer-human
tutoring dialogues. In: Proc. Association Computational Linguistics (ACL). (2004)
10. Graesser, A.C., Moreno, K.N., Marineau, J.C., Adcock, A.B., Olney, A.M., Person,
N.K.: Autotutor improves deep learning of computer literacy: Is it the dialog or
the talking head? In: Proc. AI in Education. (2003)
11. Moreno, R., Mayer, R.E., Spires, H.A., Lester, J.C.: The case for social agency in
computer-based teaching: Do students learn more deeply when they interact with
animated pedagogical agents. Cognition and Instruction 19 (2001) 177–213
12. Schultz, K., Bratt, E.O., Clark, B., Peters, S., Pon-Barry, H., Treeratpituk, P.:
A scalable, reusable spoken conversational tutor: Scot. In: AIED Supplementary
Proceedings. (2003) 367–377
13. Michael, J., Rovick, A., Glass, M.S., Zhou, Y., Evens, M.: Learning from a com-
puter tutor with natural language capabilities. Interactive Learning Environments
(2003) 233–262
14. Zinn, C., Moore, J.D., Core, M.G.: A 3-tier planning architecture for managing
tutorial dialogue. In: Proceedings Intelligent Tutoring Systems, Sixth International
Conference (ITS 2002), Biarritz, France (2002) 574–584
15. Aleven, V., Popescu, O., Koedinger, K.R.: Pilot-testing a tutorial dialogue system
that supports self-explanation. In: Proc. Intelligent Tutoring Systems (ITS): 6th
International Conference. (2002) 344–354
16. VanLehn, K., Jordan, P., Rosé, C., Bhembe, D., Böttner, M., Gaydos, A.,
Makatchev, M., Pappuswamy, U., Ringenberg, M., Roque, A., Siler, S., Srivas-
tava, R., Wilson, R.: The architecture of Why2-Atlas: A coach for qualitative
physics essay writing. In: Proc. Intelligent Tutoring Systems (ITS), 6th Interna-
tional Conference. (2002)
17. Core, M.G., Moore, J.D., Zinn, C.: The role of initiative in tutorial dialogue.
In: Proc. 11th Conf. of European Chapter of the Association for Computational
Linguistics (EACL). (2003) 67–74
18. Rosé, C.P., Bhembe, D., Siler, S., Srivastava, R., VanLehn, K.: The role of why
questions in effective human tutoring. In: Proc. AI in Education. (2003)
Linguistic Markers to Improve the Assessment of
Students in Mathematics: An Exploratory Study
1 Introduction
1 This research was partially funded by the “Programme Cognitique, école et sciences
cognitives, 2002-2004” from the French Ministry of Research and by the IUFM of Créteil.
Numerous colleagues from the IUFM of Créteil and teachers are acknowledged for testing
Pépite in their classes.
Therefore, in order to have a full diagnosis, the system needs the teacher’s assessment
for answers expressed in “mathural” language such as in Figure 1. By “mathural”, we
mean a language created by students that combines mathematical language and
natural language. The formulations produced by students in this language are often
incorrect or not completely correct from a mathematical point of view. But we assume
that they demonstrate an early level of comprehension of mathematical notions.
Table 1 shows an example of what the educational researchers in our team
diagnosed in students’ justifications [3, 8]. The diagnosis is based on a classification
of justifications, as in other research work [1, 10, 13]. Pépite implements this
analysis and first diagnoses whether the justification is algebraic, numerical or
expressed in mathural language. Then it assesses whether numerical or algebraic
answers are correct. For “mathural” answers it only detects automatically that
students rely on “school authority” by using markers like “il faut” (it is necessary),
“on doit” (you have to), “on ne peut pas” (it is not allowed). In other words, for these
students mathematics consists of respecting formal rules without having to understand
them.
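To make this marker-based routing concrete, here is a minimal sketch of such a detector, assuming a small hand-made marker list; it is not Pépite’s implementation, and the function names and the marker inventory beyond the three markers quoted above are our own.

```python
import re
import unicodedata

# Illustrative French modal markers associated with the "school authority" level;
# the real Pépite marker inventory is larger and is not reproduced here.
SCHOOL_AUTHORITY_MARKERS = ["il faut", "on doit", "on ne peut pas"]

def normalize(text: str) -> str:
    """Lowercase and strip accents so marker matching tolerates spelling variants."""
    text = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in text if unicodedata.category(c) != "Mn")

def uses_school_authority(justification: str) -> bool:
    """Return True if the justification contains a modal marker such as
    'il faut' (it is necessary) or 'on doit' (you have to)."""
    normalized = normalize(justification)
    return any(re.search(r"\b" + re.escape(normalize(m)) + r"\b", normalized)
               for m in SCHOOL_AUTHORITY_MARKERS)

if __name__ == "__main__":
    answer = ("quand on multiplie des nombres avec des puissances "
              "il faut additionner les puissances")
    print(uses_school_authority(answer))  # True: 'il faut' signals school authority
```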
Workshop and classroom experiments with teachers showed that, except on very
special occasions, they need a fully automated diagnosis to get a quick and accurate
overview of the student’s competencies [6]. Thus, one of our research targets is to
enhance the diagnosis software by analyzing answers expressed in “mathural”
language in a more efficient way. We also noticed that our first classification (cf.
Table 1) was too specific to the high school level and that teachers were more tolerant
than Pépite toward mathural justifications. For instance, for the following answer “the
product of two identical numbers with different exponents is this same number but
with both exponents added, thus a to the power 2+3”, Pépite does not consider it an
algebraic proof whereas human assessors do.
We assumed that a linguistic study of our corpus might give important insights to
improve the classification as a first step to automatically analyze the quality of the
justifications in mathural language. Our preliminary study aimed to point out how
linguistic structures used by students could be connected with their algebra thinking.
Hence we adopted a dual point of view: linguistic and didactical. The study was made
up of five steps: (1) an empirical analysis from a purely linguistic point of view in
order to provide new ideas; (2) a categorization of justifications by cross-fertilizing
the first and second authors’ linguistic and didactical points of view; (3) a review of
this categorization in a workshop with teachers, educational researchers,
psychological ergonomists, Pépite designers and a linguist (the first author); (4) a
final categorization refined by the four authors and presented here; (5) a validation of
the categorization set up by the Pépite team. In the following sections we present our
methodology, the final categorization and the data analysis. The paper ends with a
discussion of the results and with perspectives: first to confirm these early results with
other data and then to use these results to build systems that understand some open
answers uttered in “mathural” language in a more efficient way.
2 Methodology
3 Data Analysis
3.1 Question 1
From a mathematical point of view, this equality has three main features. First, the
equality is true. Second, it is very similar to an algebraic rule that is found in every
textbook and in teachers’ courses as part of the curriculum. Third, both members of
the equality can be developed. For this question we determined five categories.
… to the second member: exponents 3, 2 and 5. They overlook a, the stable component.
So we classified these justifications as situated on the contextual level. Specific
linguistic forms are used, such as « lorsque » (when), « quand » (when) and « dans » (in).
For instance: (i) « Dans les multiplication a puissances, on additionne les exposants »
(in multiplications with powers, exponents are added), (ii) « quand on multiplie des
nombres avec des puissances il faut additionner les puissances » (when numbers with
powers are multiplied, it is necessary to add up the powers).
3.2 Question 2
In that, it is different from other false equalities, such as … , which is similar to the form of the …
CP, descriptive mode, contextual level: 5 students, 3 from Group 1, 2 from Group 2.
In this category, the connection with the second member has become implicit: only
one member of the equality is considered. Students describe some algebraic
expressions equivalent to this member and introduce their justification with « c’est »,
« ça fait » (it is, that results in). Their discourse is descriptive and the level contextual.
For example: (i) « ça fait a×a. » (it results in a×a), (ii) « c’est « a+a » qui est égal à
2a. » (it is « a+a » which is equal to 2a).
3.3 Question 3
The given equality is false. Like the previous equality, it is not similar to
any classical rule given in algebra courses. Each member can be developed.
The right part of the equality contains parentheses;
mathematics teachers often underline the role of parentheses in numerical and
algebraic calculus. For this question we have obtained five categories.
(i) « car on multiplie de gauche à droite » (because we multiply from left to right), (ii)
« car les deux résultats sont égaux. » (because both results are equal).
This study is exploratory but offers some significant results and promising
perspectives. We a priori hypothesized links between the discursive modes and the
level of development in students’ algebra thinking. This empirical study allowed us to
define a classification of the students’ answers based on these links. Applying it
systematically to our data did not invalidate our a priori hypothesis. So this study
takes an important step in our project to improve the automatic assessment of
students’ “mathural” answers.
Our first perspective is to validate this hypothesized correlation in the two
following ways. First it remains to be confirmed by systematically triangulating
performance (correctness), level in algebra thinking (classification with linguistic
markers) and students’ profile (built by PepiTest with the whole test), this for every
single student in the corpus we studied here. We began testing our categorization
on some students. We compared their level of development in algebra thinking (as
described in this paper by classifying their answer to this specific exercise) with their
cognitive profile established by Pépite (by analyzing their answers along the whole
test). We noticed that, even in group 1 (correct choices for the three questions), the
distinction between school authority/contextual/conceptual levels we derived from
linguistic markers is relevant from a cognitive point of view. As suggested by
Grugeon [8], students situated at the school authority level have difficulties interpreting
algebraic expressions in other exercises and often invoke malrules when they
make algebraic calculations. Moreover, students adopting argumentative discourse at
a conceptual level obtain good results on the whole test. Concerning the contextual
category, the interpretation of data seems to be more complex. In particular we
hypothesize that the mathematical features of the equality may influence the discourse
mode and we will have to investigate that. Second, we will test our typology on other
corpora to assess its robustness. We have built a new set of questions based on the
same task (to validate or invalidate the equality of two algebraic expressions) but
modulating the variables pointed out in this study (true or false equality, features of
the expressions). We expect to shed light on the nature of partial justifications and of
the contextual level.
Our second perspective is to study how using those linguistic patterns can improve
the diagnosis system of Pépite. The current diagnosis system assesses students’
choices. Then it distinguishes whether the justification is numerical, algebraic or
“mathural”. It can both analyze most algebraic or numerical expressions and detect
some modal auxiliaries to diagnose a “school authority” level. But so far it has been
unable to assess the correctness of justifications in “mathural” language. Once our
categorization is validated we will be able to implement a system that links linguistic
markers and a level in algebra thinking. The correctness of a justification cannot
always be automatically derived, but (i) an argumentative level is likely to be linked
to a correct justification, (ii) a contextual level to a correct or partial one, and (iii) a
legal level to a partial or incorrect one. Moreover, we will investigate whether the level assigned by this
study can be useful to implement an adaptive testing system.
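As a sketch of how points (i)–(iii) could be encoded once the categorization is validated, the hypothetical fragment below (not part of Pépite; the dictionary and function names are ours) maps a diagnosed level to the correctness labels it most plausibly indicates.

```python
# Hypothetical mapping from the diagnosed discursive level to the correctness
# labels it is most likely to indicate, following points (i)-(iii) above.
LEVEL_TO_LIKELY_CORRECTNESS = {
    "argumentative": ["correct"],
    "contextual":    ["correct", "partial"],
    "legal":         ["partial", "incorrect"],   # "legal" = school-authority level
}

def likely_correctness(level: str) -> list[str]:
    """Return the plausible correctness labels for a diagnosed level;
    an unknown level yields no commitment."""
    return LEVEL_TO_LIKELY_CORRECTNESS.get(level, [])

print(likely_correctness("contextual"))  # ['correct', 'partial']
```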
References
1. V. Aleven, K. Koedinger, O. Popescu, A Tutorial Dialog System to Support Self-
explanation: Evaluation and Open Questions, Artificial Intelligence in Education (2003)
39-46.
2. V. Aleven, O. Popescu, A. Ogan, K. Koedinger, A Formative Classroom Evaluation of a
Tutorial Dialog System that Supports Self-explanation: Workshop on Tutorial Dialog
Systems, Supplementary Proceedings of Artificial Intelligence in Education (2003) 303-
312.
3. M. Artigue, T. Assude, B. Grugeon, A. Lenfant, Teaching and Learning Algebra:
approaching complexity through complementary perspectives, Proceedings of the 12th ICMI
Study Conference, Melbourne, December 9-14 (2001) 21-32.
4. J.L. Austin, How to Do Things with Words. Cambridge, Cambridge University Press
(1962)
5. O. Ducrot, Le dire et le dit, Paris, Minuit (1984)
6. É. Delozanne, D. Prévit, B. Grugeon, P. Jacoboni, Supporting teachers when diagnosing
their students in algebra, Workshop Advanced Technologies for Mathematics Education,
Supplementary Proceedings of Artificial Intelligence in Education (2003) 461-470.
7. A. Graesser, K. Moreno, J. Marineau, A. Adcock, A. Olney, N. Person, Auto-Tutor
Improves Deep Learning of Computer Literacy: Is it the Dialog or the Talking Head ?
Artificial Intelligence in Education (2003) 47-54.
8. B. Grugeon, Etude des rapports institutionnels et des rapports personnels des élèves à
l’algèbre élémentaire dans la transition entre deux cycles d’enseignement : BEP et
Première G, thèse de doctorat, Université Paris VII (1995).
9. S. Jean, E. Delozanne, P. Jacoboni, B. Grugeon, A diagnostic based on a qualitative model
of competence in elementary algebra, Artificial Intelligence in Education (1999) 491-498
10. P. Jordan, S. Siler, Student Initiative and Questioning Strategies in Computer-Mediated
Human Tutoring, Workshop on Empirical Methods for Tutorial Dialog Systems,
International Conference on Intelligent Tutoring Systems (2002).
11. M. M. Louwerse, H. H. Mitchell, Towards a Taxonomy of a Set of Discourse Markers in
Dialog: A Theoretical and Computational Linguistic Account, Discourse Processes, 35(3)
(2004) 199-239.
12. C. P. Rosé, A. Roque, D. Bhembe, K. VanLehn, A Hybrid Text Classification
Approach for Analysis of Student Essays, Proceedings of the HLT-NAACL 03 Workshop
on Educational Applications of NLP (2003).
13. C. P. Rosé, A. Roque, D. Bhembe, K. VanLehn, Overcoming the Knowledge Engineering
Bottleneck for Understanding Student Language Input, Artificial Intelligence in Education,
(2003) 315-322.
14. J. R. Searle, Speech Acts: An Essay in the Philosophy of Language, Cambridge, CUP
(1969)
Advantages of Spoken Language Interaction
in Dialogue-Based Intelligent Tutoring Systems
1 Introduction
… dialogue-based ITS to tailor its choice of tactics in the way that humans do, the student
utterances must be spoken rather than typed.
Intelligent tutoring systems that have little to no natural language interaction have
been deployed in public schools and have been shown to be more effective than class-
room instruction alone [19]. However, the effectiveness of both expert and novice
human tutors [3], [9] suggests that there is more room for improvement. Current
results from dialogue-based tutoring systems are promising [22], [24] and suggest that
dialogue-based tutoring systems may be more effective than tutoring systems with no
dialogue. However, most of these systems use either keyboard-to-keyboard interac-
tion or keyboard-to-speech interaction (where the student’s input is typed, but the
tutor’s output is spoken). This progression towards human-like use of natural lan-
guage suggests that tutoring systems with speech-to-speech interaction might be even
more effective. The current state of speech technology has allowed researchers to
build successful spoken dialogue systems in domains ranging from travel planning to
in-car route navigation [1]. There is reason to believe that spoken dialogue tutorial
systems can be just as successful.
Also, recent evidence suggests that spoken tutorial dialogues are more effective
than typed tutorial dialogues. A study of self-explanation (the process of explaining
solution steps in the student’s own words) has shown that spontaneous self-
explanation is more frequent in spoken rather than typed tutorial interactions [17]. In
addition, a comparison of spoken vs. typed human tutorial dialogues showed that the
spoken dialogues contained a higher proportion of student words to tutor words,
which has been shown to correlate with student learning [25].
There are many ways an ITS can benefit from spoken interaction. One idea cur-
rently being explored is that prosodic information from the speech signal can be used
to detect emotion, allowing developers to build more responsive tutoring systems
[21]. Another advantage is that speech allows the student to use their hands to gesture
while speaking (e.g. pointing to objects in the workspace). Finally, spoken input con-
tains meta-communicative information such as hedges, pauses, and disfluencies,
which can be used to make inferences about the student’s understanding. These fea-
tures of spoken language are all things that human tutors have access to when decid-
ing which tactics to use, and that are also available to intelligent tutoring systems with
spoken, multi-modal interfaces (although some are more feasible to detect than oth-
ers). In this paper, we describe how an ITS can take advantage of spoken interaction,
how we have begun to do this in SCoT, and the challenges we have faced.
Spoken dialogue contains many features that human tutors use to gauge student un-
derstanding and student affect. These features include (a rough detection sketch
follows the list):
hedges (e.g. “I guess I just thought that was right”)
disfluencies (e.g. “um”, “uh”, “What-what is in this space?”)
prosodic features (e.g. intonation, pitch, energy)
temporal features (e.g. pauses, speech rate)
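The sketch below is an illustration only, not SCoT’s feature extractor: it pulls simple lexical and temporal cues of this kind from a time-stamped transcript, with invented cue lists and an assumed 0.5-second pause threshold.

```python
from dataclasses import dataclass

# Illustrative cue inventories; a real system would use richer lexicons and
# prosodic features computed from the speech signal itself.
HEDGE_CUES = {"i guess", "i think", "maybe", "probably"}
DISFLUENCY_CUES = {"um", "uh", "er"}
PAUSE_THRESHOLD = 0.5  # seconds; assumed value

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def extract_features(words: list[Word]) -> dict:
    """Count hedges, filled pauses, and long silent pauses in one student turn."""
    text = " ".join(w.text.lower() for w in words)
    long_pauses = sum(1 for prev, cur in zip(words, words[1:])
                      if cur.start - prev.end >= PAUSE_THRESHOLD)
    return {
        "hedges": sum(cue in text for cue in HEDGE_CUES),
        "disfluencies": sum(w.text.lower() in DISFLUENCY_CUES for w in words),
        "long_pauses": long_pauses,
        "speech_rate": len(words) / max(words[-1].end - words[0].start, 1e-6),
    }

turn = [Word("i", 0.0, 0.1), Word("guess", 0.1, 0.4), Word("um", 1.2, 1.4),
        Word("it's", 1.5, 1.7), Word("the", 1.7, 1.8), Word("pump", 1.8, 2.2)]
print(extract_features(turn))
```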
3 Overview of SCoT
Our approach is based on the assumption that the activity of tutoring is a joint activ-
ity1 where the content of the dialogue (language and other communicative signals)
follows basic properties of conversation but is also driven by the activity at hand [8].
Following this hypothesis, SCoT’s architecture separates conversational intelligence
(e.g. turn management, construction of a structured dialogue history, use of discourse
markers) from the activity that the dialogue accomplishes (in this case, reflective
tutoring). SCoT is developed within the Conversational Intelligence Architecture
[20], a general purpose architecture which supports multi-modal, mixed-initiative
dialogue.
1 A joint activity is an activity where participants coordinate with one another to achieve both
public and private goals [8]. Moving a desk, playing a duet, and shaking hands are all exam-
ples of joint activities.
SCoT-DC, the current instantiation of our tutoring system, is applied to the domain
of shipboard damage control. Shipboard damage control refers to the task of contain-
ing the effects of fires, floods, and other critical events that can occur aboard Navy
vessels. Students carry out a reflective discussion with SCoT-DC after completing a
problem-solving session with DC-Train [5], a fast-paced, real-time, multimedia
training environment for damage control. The fact that problem-solving in damage
control occurs in real-time makes reflective tutorial dialogue more appropriate than
tutorial dialogue during problem-solving. Because the student is not performing
problem-solving steps during the dialogue, it is important for the tutor to get as much
information as possible from the student’s utterances. In other words, having access to
both the meaning of an utterance as well as the manner in which it was spoken will
help the tutor assess how well the student is understanding the material.
SCoT is composed of many separate components. The two most relevant for this
discussion are the dialogue manager and the tutor. They are described in sections 3.1
and 3.2. A more detailed system description is available in [7].
3.1 Dialogue Manager
The dialogue manager handles aspects of conversational intelligence (e.g. turn man-
agement, construction of a structured dialogue history, use of discourse markers) in
order to separate purely linguistic aspects of the interaction from tutorial aspects. It
contains multiple dynamically updated components—the two main components are
the dialogue move tree, a structured history of dialogue moves, and the activity tree, a
hierarchical representation of the past, current, and planned activities initiated by
either the tutor or the student. For SCoT, each activity initiated by the tutor corre-
sponds to a tutorial goal; the decompositions of these goals are specified by activity
recipes contained in the recipe library (see section 3.2).
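The sketch below is our own simplification, not the Conversational Intelligence Architecture itself; it illustrates the kind of hierarchical activity tree just described, in which tutor- or student-initiated activities decompose into planned, current, and completed sub-activities. All class and activity names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    """A node in the activity tree: an activity initiated by the tutor or the
    student, with its past, current, and planned sub-activities as children."""
    name: str
    initiator: str                       # "tutor" or "student"
    status: str = "planned"              # "planned" | "current" | "done"
    children: list["Activity"] = field(default_factory=list)

    def add(self, child: "Activity") -> "Activity":
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Print the tree, one node per line, indented by depth."""
        print("  " * depth + f"{self.name} [{self.initiator}, {self.status}]")
        for c in self.children:
            c.walk(depth + 1)

# A toy reflective-tutoring activity tree.
root = Activity("reflective_discussion", "tutor", "current")
review = root.add(Activity("discuss_problem_solving_sequence", "tutor", "current"))
review.add(Activity("elicit_student_summary", "tutor", "done"))
review.add(Activity("remediate_missed_step", "tutor"))
root.walk()
```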
3.2 Tutor
The tutor component contains the tutorial knowledge necessary to plan and carry out
a flexible and coherent tutorial dialogue. The tutorial knowledge is divided between a
planning and execution system and a recipe library (see Figure 1).
The planning and execution system is responsible for selecting initial dialogue
plans, revising plans during the dialogue, classifying student utterances, and deciding
how to respond to the student. All of these tasks rely on external knowledge sources
such as the knowledge reasoner, the student model, and the dialogue move tree (col-
lectively referred to as the Information State). The planning and execution system
“executes” tutorial activities by placing them on the activity tree, where they get in-
terpreted and executed by the dialogue manager. By separating tutorial knowledge
from external knowledge sources, this architecture allows SCoT to lead a flexible
dialogue and to continually re-assess information from the Information State in order
to select the most appropriate tutorial tactic.
The recipe library contains activity recipes that specify how to decompose a tuto-
rial activity into other activities and low-level actions. An activity recipe can be
thought of as a tutorial goal and a plan for how the tutor will achieve the goal. The
recipe library contains a large number of activity recipes for both low-level tactics
(e.g. responding to an incorrect answer) and high-level strategies (e.g. specifications
for initial dialogue plans). The recipes are written in a scripted language [15] allowing
for automatic translation of the recipes into system activities. An example activity
recipe will be shown in section 4.2.
Other components that the tutor makes use of are the knowledge reasoner and the
student model. The knowledge reasoner provides a domain-general interface to do-
main-specific information; it provides the tutor with procedural, causal, and motiva-
tional explanations of domain-specific actions. The student model uses a Bayesian
network to characterize the causal connections between pieces of target domain
knowledge and observable student actions. It can be dynamically updated both during
the problem solving session and during the dialogue.
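As a rough illustration of such a model (not SCoT’s actual network, and with invented conditional probabilities), belief in a single knowledge component can be updated from each observed student action with Bayes’ rule:

```python
def update_knowledge_belief(prior: float, correct: bool,
                            p_correct_if_known: float = 0.9,
                            p_correct_if_unknown: float = 0.2) -> float:
    """One Bayesian update of P(student knows the concept) given one observed
    action; the conditional probabilities are illustrative, not calibrated."""
    likelihood_known = p_correct_if_known if correct else 1 - p_correct_if_known
    likelihood_unknown = p_correct_if_unknown if correct else 1 - p_correct_if_unknown
    evidence = likelihood_known * prior + likelihood_unknown * (1 - prior)
    return likelihood_known * prior / evidence

belief = 0.5
for observed_correct in (True, False, True):   # observed actions during the session
    belief = update_knowledge_belief(belief, observed_correct)
print(round(belief, 3))
```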
… to make observations and form hypotheses about how human tutors use these features
of spoken dialogue. Two such observations are described below.
One observation we have made is that if the student hedges a correct answer, the
tutor will frequently paraphrase what the student said. This seems plausible because
by paraphrasing, the tutor is grounding the conversation [8] while attempting to
eliminate the student’s uncertainty. An example of a hedged answer followed by
paraphrasing is shown in Figure 2 below.
Another observation we have made is that human tutors frequently refer back to past
dialogue following an incorrect student answer with hedges or mid-sentence pauses.
This seems plausible because referring back to past dialogue helps students integrate
new information with existing knowledge, and promotes reflection, which has been
shown to correlate with learning [6]. An example of an incorrect answer with mid-
sentence pauses followed by a reference to past dialogue is shown in Figure 3 (each
colon ‘:’ represents a 0.5 sec pause).
Fig. 3. Dialogue excerpt from Algebra corpus of spoken tutorial interaction [18]
The division of knowledge in the tutor component (between the recipe library and the
planning and execution system) allows us to independently evaluate hypotheses such
as the ones in section 4.1 (i.e. test whether their presence or absence affects the effec-
tiveness of SCoT). Each hypothesis is realized by a combination of activity recipes,
and the planning and execution system ensures that a coherent dialogue will be pro-
duced regardless of which activities are put on the activity tree.
An activity recipe corresponding to the tutorial goal discuss problem solving se-
quence is shown below. A recipe contains three primary sections: DefinableSlots,
MonitorSlots, and Body. The DefinableSlots specify what information is passed in to
the recipe, the MonitorSlots specify which parts of the Information State are used in
determining how to execute the recipe, and Body specifies how to decompose the
activity into other activities or low-level actions. The recipe below decomposes the
activity of discussing a problem solving sequence into either three or four other ac-
tivities (depending on whether the problem has already been discussed). The tutor
places these activities on the activity tree, and the dialogue manager begins to execute
their respective recipes.
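As an illustration of the structure just described (DefinableSlots, MonitorSlots, Body), the fragment below sketches what such a recipe might look like; it is written as a Python dictionary rather than in the actual scripted recipe language of [15], and all slot and activity names are assumptions.

```python
# Illustrative rendering of an activity recipe; the real recipes are written in a
# scripted language whose exact syntax is not reproduced here.
discuss_problem_solving_sequence = {
    # Information passed in when the activity is created.
    "DefinableSlots": ["problem_id", "student_solution"],
    # Parts of the Information State consulted when executing the recipe.
    "MonitorSlots": ["student_model", "dialogue_move_tree", "already_discussed"],
    # Decomposition into sub-activities; the first step is conditional, so the
    # activity expands into either three or four sub-activities.
    "Body": [
        {"activity": "review_problem_statement", "only_if": "not already_discussed"},
        {"activity": "elicit_solution_summary"},
        {"activity": "discuss_errors"},
        {"activity": "summarize_discussion"},
    ],
}

def expand(recipe: dict, state: dict) -> list[str]:
    """Return the sub-activities to place on the activity tree, skipping
    conditional steps whose guard is not satisfied in the Information State."""
    steps = []
    for step in recipe["Body"]:
        guard = step.get("only_if")
        if guard == "not already_discussed" and state.get("already_discussed"):
            continue
        steps.append(step["activity"])
    return steps

print(expand(discuss_problem_solving_sequence, {"already_discussed": False}))
```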
All activity recipes have this same structure. The modular nature of the recipes
helps us test our hypotheses by making it easy to alter the behavior of the tutor. Fur-
thermore, the tutorial recipes are not particular to the domain of damage control;
through our testing of various activity recipes we hope to get a better understanding
of domain-independent tutoring principles.
4.3 Multi-modality
Another way that SCoT takes advantage of the spoken interface is through multi-
modal interaction. Both the tutor and the student can interactively perform actions in
an area of the graphical user interface called the common workspace. In the current
version of SCoT-DC, the common workspace consists of a 3D representation of the
ship which allows either party to zoom in or out and to select (i.e. point to) compart-
ments, regions, and bulkheads (lateral walls of a ship). This is illustrated below in
Figure 4, where the common workspace is the large window in the upper left corner.
The tutor can contextualize the problems being discussed by highlighting com-
partments in specific colors (e.g. red for fire, gray for smoke) to indicate the type and
location of the crises. Because the dialogue in SCoT is spoken rather than typed, the
student also has the ability to coordinate his/her speech with gesture. This latter coor-
dination is an area we are currently working on, and we hope to soon support inter-
changes such as the one in Figure 5 below, where both the tutor and student coordinate
their speech with actions in the common workspace.
Although using spoken language in an intelligent tutoring system can bring about
many of the benefits described above, it has also raised some challenges which ITS
developers should be aware of.
Student affect. Maintaining student motivation is a challenge for all intelligent
tutoring systems. We have observed issues relating to student affect, possibly stem-
ming from the spoken nature of the dialogue. For example, in a previous version of
SCoT, listeners remarked that repeated usage of phrases such as You made this mis-
take more than once and We discussed this same mistake earlier made the tutor seem
overly critical. Other (non-spoken) tutorial systems give similar types of feedback
(e.g. [11]), yet none have reported this sort of feedback causing such negative affect.
This suggests that users have different reactions when listening to, rather than read-
ing, the tutor’s output, and that further work is necessary to better understand this
difference.
Improving Speech Recognition. We are currently running an evaluation of SCoT,
and preliminary results show speech recognition accuracy to be fairly high (see sec-
tion 5). However, we have learned that small recognition errors can greatly reduce the
In this paper, we argued that spoken language interaction is an integral part of human
tutorial dialogue and that information from spoken utterances is very useful in build-
ing dialogue-based intelligent tutors that understand and respond to students as effec-
tively as human tutors. We described the Spoken Conversational Tutor we have built,
and described how SCoT is beginning to take advantage of features of spoken lan-
guage. We do not yet understand exactly how human tutors make use of spoken lan-
guage features such as disfluencies and pauses, but we are building a tutorial frame-
work that allows us to test various hypotheses, and in time reach a better understand-
ing of how to take advantage of spoken language in intelligent tutoring systems.
We are currently evaluating the effectiveness of SCoT-DC (a version that does not
yet make use of meta-communicative information or include a student model) with
students at Stanford University. Preliminary quantitative results suggest that interact-
ing with SCoT improves student learning (measured by performance in DC-Train and
on a written test). Qualitatively, naïve users have found the system fairly easy to in-
teract with, and speech recognition has not been a significant problem—preliminary
References
1. Belvin, R., Burns, R., & Hein, C. (2001). Development of the HRL Route Navigation
Dialogue System. In Proceedings of the First International Conference on Human Lan-
guage Technology Research, Paper H01-1016
2. Bhatt, K. (2004). Classifying student hedges and affect in human tutoring sessions for the
CIRCSIM-Tutor intelligent tutoring system. Unpublished M.S. Thesis, Illinois Institute of
Technology.
3. Bloom, B.S. (1984). The 2 sigma problem: The search for methods of group instruction as
effective as one-to-one tutoring. Educational Researcher, 13, 4-16.
4. Brennan, S. E., & Williams, M. (1995). The feeling of another’s knowing: Prosody and
filled pauses as cues to listeners about the metacognitive states of speakers. Journal of
Memory and Language, 34, 383-398.
5. Bulitko, V., & Wilkins, D. C. (1999). Automated instructor assistant for ship damage
control. In Proceedings of AAAI-99.
6. Chi, M.T.H., Siler, S., Jeong, H., Yamauchi, T., & Hausmann, R.G. (2001). Learning from
human tutoring. Cognitive Science, 25, 471-533.
7. Clark, B., Lemon, O., Gruenstein, A., Bratt, E., Fry, J., Peters, S., Pon-Barry, H., Schultz,
K., Thomsen-Gray, Z., & Treeratpituk, P. (In press). A General Purpose Architecture for
Intelligent Tutoring Systems. In Natural, Intelligent and Effective Interaction in Multimo-
dal Dialogue Systems. Edited by Niels Ole Bernsen, Laila Dybkjaer, and Jan van Kup-
pevelt. Dordrecht: Kluwer.
8. Clark, H.H. (1996). Using Language. Cambridge: University Press.
9. Cohen, P.A., Kulik, J.A., & Kulik, C.C. (1982). Educational outcomes of tutoring: A
meta-analysis of findings. American Educational Research Journal, 19, 237-248.
10. Transcripts of face-to-face and keyboard-to-keyboard tutorial dialogues, between physiol-
ogy professors and first-year students at Rush Medical College (received from M. Evens).
11. Evens, M., & Michael, J. (Unpublished manuscript). One-on-One Tutoring by Humans
and Machines. Computer Science Department, Illinois Institute of Technology.
12. Fox, B. (1993). Human Tutorial Dialogue. New Jersey: Lawrence Erlbaum.
400 H. Pon-Barry et al.
13. Graesser, A.C., Person, N.K., & Magliano J. P. (1995). Collaborative dialogue patterns in
naturalistic one-to-one tutoring sessions. Applied Cognitive Psychology, 9, 1-28.
14. Grasso, M.A., & Finin, T.W. (1997). Task Integration in Multimodal Speech Recognition
Environments. Crossroads, 3(3), 19-22.
15. Gruenstein, A. (2002). Conversational Interfaces: A Domain-Independent Architecture for
Task-Oriented Dialogues. Unpublished M.S. Thesis, Stanford University.
16. Hausmann, R. & Chi, M.T.H. (2002). Can a computer interface support self-explaining?
Cognitive Technology, 7(1), 4-15.
17. Hauptmann, A.G. & Rudnicky, A.I. (1988). Talking to Computers: An Empirical Investi-
gation. International Journal of Man-Machine Studies 28(6), 583-604
18. Heffernan, N. T. (2001). Intelligent Tutoring Systems have Forgotten the Tutor: Adding a
Cognitive Model of Human Tutors. Dissertation. Computer Science Department, School
of Computer Science, Carnegie Mellon University. Technical Report CMU-CS-01-127.
19. Koedinger, K. R., Anderson, J.R., Hadley, W.H., & Mark, M. A. (1997). Intelligent tu-
toring goes to school in the big city. International Journal of Artificial Intelligence in
Education, 8, 30-43.
20. Lemon, O., Gruenstein, A., & Peters, S. (2002). Collaborative activities and multitasking
in dialogue systems. In C. Gardent (Ed.), Traitement Automatique des Langues (TAL, spe-
cial issue on dialogue), 43(2), 131-154.
21. Litman, D., & Forbes, K. (2003). Recognizing Emotions from Student Speech in Tutoring
Dialogues. In Proc. of the IEEE Automatic Speech Recognition and Understanding Work-
shop (ASRU).
22. Person, N.K., Graesser, A.C., Bautista, L., Mathews, E., & the Tutoring Research Group.
(2001). Evaluating student learning gains in two versions of AutoTutor. In J. D. Moore, C.
L. Redfield, & W. L. Johnson (Eds.) Proceedings of Artificial intelligence in education:
AI-ED in the wired and wireless future, 286-293.
23. Person, N.K., & Graesser, A.C. (2003). Fourteen facts about human tutoring: Food for
thought for ITS developers. In Proceedings of the AIED 2003 Workshop on Tutorial Dia-
logue Systems: With a View Towards the Classroom.
24. Rosé, C., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K., & Weinstein, A. (2001).
Interactive Conceptual Tutoring in Atlas-Andes. In Proc. of AI in Education 2001.
25. Rosé, C.P., Litman, D., Bhembe, D., Forbes, K., Silliman, S., Srivastava, R., & VanLehn,
K. (2003). A Comparison of Tutor and Student Behavior in Speech Versus Text Based
Tutoring. In Proc. of the HLT-NAACL 03 Workshop on Educational Applications of NLP.
26. Smith, V. L., & Clark, H. H. (1993). On the course of answering questions. Journal of
Memory and Language, 32, 25-38.
27. Xu, W. & Rudnicky, A. (2000). Language modeling for dialog system. In Proceedings of
ICSLP 2000. Paper B1-06.
CycleTalk: Toward a Dialogue Agent That Guides
Design with an Articulate Simulator
Abstract. We discuss the motivation for a novel style of tutorial dialogue sys-
tem that emphasizes reflection in a design context. Our current research focuses
on the hypothesis that this type of dialogue will lead to better learning than
previous tutorial dialogue systems because (1) it motivates students to explain
more in order to justify their thinking, and (2) it supports students’ meta-
cognitive ability to ask themselves good questions about the design choices
they make. We present a preliminary cognitive task analysis of design explora-
tion tasks using CyclePad, an articulate thermodynamics simulator [10]. Using
this cognitive task analysis, we analyze data collected in two initial studies of
students using CyclePad, one in an unguided manner, and one in a Wizard of
Oz scenario. This analysis suggests ways in which tutorial dialogue can be used
to assist students in their exploration and encourage a fruitful learning orienta-
tion. Finally, we conclude with some system desiderata derived from our analy-
sis as well as plans for further exploration.
1 Introduction
dialogue has greater impact on learning than the impact that has been demonstrated in
previous comparisons of tutorial dialogue to challenging alternative forms of instruc-
tion such as an otherwise equivalent targeted “mini-lesson” based approach (e.g.,
[12]) or a “2nd-generation” intelligent tutoring system with simple support for self-
explanation (e.g., [1]).
We are conducting our research in the domain of thermodynamics, using as a
foundation the CyclePad articulate simulator [10]. CyclePad offers students a rich,
exploratory learning environment in which they apply their theoretical thermody-
namics knowledge by constructing thermodynamic cycles, performing a wide range
of efficiency analyses. CyclePad has been in active use in a range of thermodynamics
courses at the Naval Academy and elsewhere since 1996 [18]. By carrying out the
calculations that students would otherwise have to do by more laborious means (e.g.,
by extrapolation from tables), CyclePad makes it possible for engineering students to
engage in design activities earlier in the curriculum than would otherwise be possible.
Qualitative evaluations of CyclePad have shown that students who use CyclePad have
a deeper understanding of thermodynamics equations and technical terms [4].
In spite of its very impressive capabilities, it is plausible that CyclePad could be
made even more effective. First, CyclePad supports an unguided approach to explo-
ration and design. While active learning and intense exploration have been shown to
be more effective for learning and transfer than more highly directed, procedural help
[7,8], pure exploratory learning has been hotly debated [3,13,14]. In particular, scien-
tific exploratory learning requires students to be able to effectively form and test
hypotheses. However, students experience many difficulties in these areas [13].
Guided exploratory learning, in which a teacher provides some amount of direction or
feedback, has been demonstrated to be more effective than pure exploratory learning
in a number of contexts [14].
Second, CyclePad is geared towards explaining its inferences to students, at the
student’s request. It is likely to be more fruitful if the students do more of the ex-
plaining themselves, assisted by the system. Some results in the literature show that
students learn better when producing explanations than when receiving them [20].
Thus, a second area where CyclePad might be improved is in giving students the
opportunity to develop their ability to think through their designs at a functional level
and then explain and justify their designs.
A third way in which CyclePad’s pedagogical approach may not be optimal is that
students typically do not make effective use of on-demand help facilities offered by
interactive learning environments (for a review of the relevant literature, see [2]).
That is, students using CyclePad may not necessarily seek out the information pro-
vided by the simulator, showing for example how the second law of thermodynamics
applies to the cycle that they have built, with a possibly detrimental effect on their
learning outcomes. Thus, students’ experience with CyclePad may be enhanced if
they were prompted at key points to reflect on how their conceptual knowledge re-
lates to their design activities.
We argue that engaging students in natural language discussions about the pros
and cons of their design choices as a highly interactive form of guided exploratory
learning is well suited to the purpose of science instruction. In the remainder of the
2 CycleTalk Curriculum
We have begun to collect data related to how CyclePad is used by students who have
previously taken or are currently taking a college-level thermodynamics course. The
goal of this effort is to begin to assess how tutorial dialogue can extend CyclePad’s
effectiveness and to refine our learning hypotheses in preparation for our first con-
trolled experiment. In particular we are exploring such questions as: (1) To what
extent are students making use of CyclePad’s on-demand help? (2) What exploratory
strategies are students using with CyclePad? Are these strategies successful or are
students floundering? Do students succeed in improving the efficiency of cycles? (3)
To what extent are student explorations of the design space correlated with their ob-
served conceptual understanding, as evidenced by their explanation behavior?
At present, we have two forms of data. We have collected the results of a take-
home assignment administered to mechanical engineering students at the US Naval
Academy, in which students were asked to improve the efficiency of a shipboard
version of a Rankine cycle. These results are in the form of written reports, as well as
log files of the student’s interactions with the software. In addition, we have directly
observed several Mechanical Engineering undergraduate students at Carnegie Mellon
University working with CyclePad on a problem involving a slightly simpler Rankine
cycle. These students were first given the opportunity to work in CyclePad independ-
ently. Then, in a Wizard of Oz scenario, they continued to work on the problem while
they were engaged in a conversation via text messaging software with a graduate
student in Mechanical Engineering from the same university. For these students we
have collected log files and screen movies of their interactions with CyclePad as well
as transcripts of their typed conversation with the human tutor.
We have constructed a preliminary cognitive task analysis (See Fig. 1) describing
how students might use CyclePad in the type of scenario they encountered during
these studies (i.e., to improve a simple Rankine cycle).
Creating the cycle and defining key parameters. When creating a thermodynamic
cycle according to the problem description, or modifying a given thermodynamic
cycle, students must select and connect components. Further, they must provide a
limited number of assumed parameter values to customize individual cycle compo-
nents and define the cycle state. CyclePad will compute as many additional parame-
ters as can be derived from those assumptions. When each parameter has a value,
either given or inferred, CyclePad calculates the cycle’s efficiency. In order to be
successful, students must carefully select and connect components and be able to
assume values in ways that acknowledge the relationships between the components.
Investigating Variable Dependencies. Once the cycle state has been fully defined
(i.e., the values of all parameters have been set or inferred), students can use Cy-
clePad’s sensitivity analysis tool to study the effect of possible modifications to these
values. With this tool, students can plot one variable’s effect on another variable.
These analyses may have implications for their redesign strategy. For example, when
a Rankine cycle has been fully defined, students can plot the effect of the pressure of
the output of the pump on the thermal efficiency of the cycle as a whole. The sensi-
tivity analysis will show that up to a certain point, increasing the pressure will in-
crease efficiency. The student can then adjust the pressure to its optimum level.
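To make the idea concrete, the sketch below runs such a sweep as a plain loop; it does not call CyclePad, and the efficiency function is an invented stand-in whose only purpose is to show the diminishing-returns shape described above. All names and numbers are assumptions.

```python
import math

def toy_efficiency(pump_pressure_kpa: float) -> float:
    """Invented, saturating relationship between pump outlet pressure and
    thermal efficiency; purely illustrative, not real thermodynamics."""
    return 0.30 + 0.15 * (1 - math.exp(-pump_pressure_kpa / 5000.0))

def sensitivity_sweep(values):
    """Sweep one design parameter and record its effect on the (toy) efficiency."""
    return [(v, toy_efficiency(v)) for v in values]

for pressure, eta in sensitivity_sweep(range(1000, 16000, 3000)):
    print(f"pump pressure {pressure:>5} kPa -> efficiency {eta:.3f}")
```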
Comparing Multiple Cycle Improvements. Students can create their redesigned cy-
cles, and, once the cycle states are fully defined, students can compute the improved
cycle efficiency. Comparing cycle efficiencies of different redesigns lets students
explore the problem space and generate the highest efficiency possible. Suppose a
student began improving the efficiency of the Rankine cycle by including a regenera-
tive cycle. It would then be possible to create an alternative design which included a
reheat cycle (or several reheat cycles) and to compare the effects on efficiency before
combining them. By comparing alternatives, the student has the potential to gain a
deeper understanding of the design space and underlying thermodynamics principles
and is likely to produce a better redesign.
complexity of the redesigned cycles that students constructed (on average, the redes-
igned cycles had 50% more components than the cycle that the students started out
with) made it more difficult for students to identify the key parameters whose values
must be assumed. We have informally observed that our expert tutor is capable of
defining the state of even complicated cycles in CyclePad without much, if any, trial
and error. Perhaps he quickly sees a deep structure, as opposed to novice students
who may be struggling to maintain associations when the number of components
increases (see e.g., [5]). As we continue our data collection, we hope to investigate
how student understanding of the relationships between components affects their
ability to fully define a thermodynamic cycle.
We did observe the complexity of implementing a redesigned cycle directly
through several Wizard-of-Oz-style studies where the student worked first alone, then
with a tutor via text-messaging software. In unguided work with CyclePad, we saw
students having difficulty setting the assumptions for their improved cycle. One stu-
dent was working for approximately 15 minutes on setting the parameters of a few
components, but he encountered difficulty because he had not ordered the compo-
nents in an ideal way. The tutor was able to help him identify and remove the obstacle
so that he could quickly make progress. When the tutoring session began, the tutor
asked the student to explain why he had set up the components in that particular way.
Student: I just figured I should put the exchanger before the htr
[The student is using “htr” to refer to the heater.]
Tutor: How do you think the heat exchanger performance/design will vary with the condi-
tion of the fluid flowing through it? What’s the difference between the fluid going into the
pump and flowing out of it?
Student: after the pump the water’s at a high P
[P is an abbreviation for pressure.]
Tutor: Good! So how will that affect your heat exchanger design?
Student: if the exchanger is after the pump the heating shouldn’t cause it to change phase
because of the high pressure
...
Tutor: But why did you put a heat exchanger in?
Student: I was trying to make the cycle regenerative
...
Tutor: OK, making sure you didn’t waste the energy flowing out of the turbine, right?
After the discussion with the tutor about the plan for the redesign, the student was
able to make the proposed change to the cycle and define the improved cycle com-
pletely without any help from the tutor. Engaging in dialogue forces students to think
through their redesign and catches errors that seem to be difficult for students to de-
tect on their own. By initiating explanation about the design on a functional level, the
tutor was able to elicit an expression of the student’s thinking and give the student a
greater chance for success in fully defining the improved cycle.
One of the most useful tools that CyclePad offers students is the sensitivity analysis.
A sensitivity analysis will plot the relationship between one variable (such as pressure
CyclePad makes it relatively easy for students to try alternative design ideas and
thereby to generate high-quality designs. However, students working independently
with CyclePad tended not to explore the breadth of the design space, even if they
seemed to be aware of design ideas that would improve their design. Although stu-
dents who did the take-home assignment were aware of both the reheat and regenera-
tive strategies through course materials, only 8 of these 19 students incorporated both
strategies into their redesigned cycles. Also, in the written report associated with the
take-home assignment, the students were asked to explain the result of each strategy
on the efficiency of the cycle. 15 of 19 students correctly explained that regeneration
would improve the efficiency of the cycle. However, only 10 of 19 students used a
regeneration strategy in their redesigned cycle.
In contrast, students working with the tutor are prompted to consider as many al-
ternative approaches as they can and they are encouraged to contrast these alterna-
tives with one another on the basis of materials and maintenance cost, in addition to
cycle efficiency. This explicit discussion of alternatives with the tutor should produce
an optimal design. Here is an example dialogue where the tutor is leading the student
to consider alternative possibilities:
Tutor: Yes, very good. How do you think you can make it better? i.e. how will you opti-
mize the new component?
Student: we could heat up the water more
Tutor: That’s one, try it out. What do you learn?
Student: the efficiency increases pretty steadily with the increased heating - should i put the
materials limitation on like there was earlier? or are we not considering that right now
Tutor: OK, how about other parameters? Obviously this temperature effect is something to
keep in mind. Include the material effect when you start modifying the cycle
Student: ok
Tutor: What else can you change?
Student: pump pressure
Tutor: So what does the sensitivity plot with respect to pump pressure tell you?
Student: so there’s kind of a practical limit to increasing pump pressure, after a while
there’s not much benefit to it
Tutor: Good. What other parameters can you change?
Student: exit state of the turbine
Tutor: Only pressure appears to be changeable, let’s do it. What’s your operating range?
Student: 100 to 15000. right?
Tutor: Do you want to try another range? Or does this plot suggest something?
Student: we could reject even lower, since its a closed cycle
Tutor: Good!
4 System Desiderata
In this paper we have presented an analysis of a preliminary data collection effort and
its implications for the design of the CycleTalk tutorial dialogue agent. We have ar-
gued in favor of natural language discussions as a highly interactive form of guided
discovery learning. We are currently gearing up for a controlled study in which we
will test the hypothesis that exploratory dialogue leads to effective learning. During
the study, students will work on a design scenario similar to the ones presented in this
paper. On a pre/post test we will evaluate improvement of students’ skill in creating
designs, in understanding design trade-offs, and in conceptual understanding of ther-
modynamics, as well as their acquisition of meta-cognitive skills such as self-
explanation. In particular we will assess the value of the dynamic nature of dialogue
by contrasting a Wizard-of-Oz version of CycleTalk with a control condition in which
students are led in a highly scripted manner to explore the design space, exploring
each of the three major efficiency-enhancing approaches in turn through step-by-step
instructions.
References
1. Aleven V., Koedinger, K. R., & Popescu, O.: A Tutorial Dialogue System to Support Self-
Explanation: Evaluation and Open Questions. Proceedings of the 11th International Con-
ference on Artificial Intelligence in Education, AI-ED (2003).
2. Aleven, V., Stahl, E., Schworm, S., Fischer, F., & Wallace, R.M.: Seeking and Providing
Help in Interactive Learning Environments. Review of Educational Research, 73(2),
(2003) pp 277-320.
3. Ausubel, D.: Educational Psychology: A Cognitive View, (1978) Holt, Rinehart and
Winston, Inc.
4. Baher, J.: Articulate Virtual Labs in Thermodynamics Education: A Multiple Case Study.
Journal of Engineering Education, October (1999). 429-434.
5. Chi, M. T. H.; Feltovich, P. J.; & Glaser, R.: Categorization and Representation of Physics
Problems by Experts and Novices. Cognitive Science 5(2): 121-152, (1981).
6. Core, M. G., Moore, J. D., & Zinn, C.: The Role of Initiative in Tutorial Dialogue, Pro-
ceedings of the 10th Conference of the European Chapter of the Association for Compu-
tational Linguistics, (2003), Budapest, Hungary.
7. Dutke, S.: Error handling: Visualizations in the human-computer interface and exploratory
learning. Applied Psychology: An International Review, 43, 521-541, (1994).
8. Dutke, S. & Reimer, T.: Evaluation of two types of online help for application software,
Journal of Computer Assisted Learning, 16, 307-315, (2000).
9. Evens, M. and Michael, J.: One-on-One Tutoring by Humans and Machines, Lawrence
Erlbaum Associates (2003).
10. Forbus, K. D., Whalley, P. B., Everett, J. O., Ureel, L., Brokowski, M., Baher, J., Kuehne,
S. E.: CyclePad: An Articulate Virtual Laboratory for Engineering Thermodynamics. Arti-
ficial Intelligence 114(1-2): 297-347, (1999).
11. Graesser, A., Moreno, K. N., Marineau, J. C.: AutoTutor Improves Deep Learning of
Computer Literacy: Is It the Dialog or the Talking Head? Proceedings of AI in Education
(2003).
12. Graesser, A., VanLehn, K., the TRG, & the NLT.: Why2 Report: Evaluation of
Why/Atlas, Why/AutoTutor, and Accomplished Human Tutors on Learning Gains for
Qualitative Physics Problems and Explanations, LRDC Tech Report, (2002) University of
Pittsburgh.
13. de Jong, T. & van Joolingen, W. R.: Scientific Discovery Learning With Computer Simu-
lations of Conceptual Domains, Review of Educational Research, 68(2), pp 179-201,
(1998).
14. Mayer, R. E.: Should there be a three-strikes rule against pure discovery learning? The
Case for Guided Methods of Instruction, American Psychologist 59(1), pp 14-19, (2004).
15. Nückles, M., Wittwer, J., & Renkl, A.: Supporting the computer experts’ adaptation to the
client’s knowledge in asynchronous communication: The assessment tool. In F. Schmal-
hofer, R. Young, & G. Katz (Eds.). Proceedings of EuroCogSci 03. The European Cogni-
tive Science Conference (2003) (pp. 247-252). Mahwah, NJ: Erlbaum.
16. Rosé, C. P., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K., & Weinstein, A.: Interac-
tive Conceptual Tutoring in Atlas-Andes, In J. D. Moore, C. L. Redfield, & W. L. Johnson
(Eds.), Artificial Intelligence in Education: AI-ED in the Wired and Wireless Future, Pro-
ceedings of AI-ED 2001 (pp. 256-266). (2001) Amsterdam, IOS Press.
17. Rosé, C. P., Gaydos, A., Hall, B. S., Roque, A., & VanLehn, K.: Overcoming the
Knowledge Engineering Bottleneck for Understanding Student Language Input , Pro-
ceedings of the 11th International Conference on Artificial Intelligence in Education, AI-
ED (2003).
18. Tuttle, K., Wu, Chih.: Intelligent Computer Assisted Instruction in Thermodynamics at the
U.S. Naval Academy, Proceedings of the 15th Annual Workshop on Qualitative Reason-
ing, (2001) San Antonio, Texas.
19. VanLehn, K., Jordan, P., Rosé, C. P., and The Natural Language Tutoring Group.: The
Architecture of Why2-Atlas: a coach for qualitative physics essay writing, Proceedings of
the Intelligent Tutoring Systems Conference, (2002) Biarritz, France.
20. Webb, N. M.: Peer Interaction and Learning in Small Groups. International Journal of
Education Research, 13, 21-39, (1989).
21. Zinn, C., Moore, J. D., & Core, M. G.: A 3-Tier Planning Architecture for Managing
Tutorial Dialogue. In S. A. Cerri, G. Gouardères, & F. Paraguaçu (Eds.), Proceedings of
the Sixth International Conference on Intelligent Tutoring Systems, ITS 2002 (pp. 574-
584). Berlin: Springer Verlag, (2002).
DReSDeN: Towards a Trainable Tutorial Dialogue
Manager to Support Negotiation Dialogues for Learning
and Reflection
1 Introduction
Current tutorial dialogue systems focus on a wide range of application contexts in-
cluding leading students through directed lines of reasoning to support conceptual
understanding [27], clarifying procedures [33], or coaching the generation of expla-
nations for justifying solutions [32], problem solving steps [1], predictions about
complex systems [10], or descriptions of computer architectures [13]. Formative
evaluation studies of these systems demonstrate that state-of-the-art computational
linguistics technology is sufficient for building tutorial dialogue systems that are
robust enough to be put in the hands of students and to provide useful learning expe-
riences for them. In this paper we introduce DReSDeN, a new tutorial dialogue plan-
ner as an extension of the APE tutorial dialogue planner [12]. This work is motivated
by lessons learned from the first generation of tutorial dialogue systems, with a focus
on Knowledge Construction Dialogues [27,16] that were developed using the APE
framework.
1 DReSDeN stands for Debate-Remediate-Self-explain-for-Directing-Negotiation-dialogues.
The DReSDeN tutorial dialogue planner was developed in the context of the Cy-
cleTalk thermodynamics tutoring project [29] that aims to cultivate self-monitoring
skills by training students to ask themselves valuable questions about the choices they
make in a design context as they work with the CyclePad articulate simulator [11].
The CycleTalk system is meant to do this by engaging students in negotiation dia-
logues in natural language as they design thermodynamic cycles, such as the Rankine
Cycle displayed in Figure 1. A thermodynamic cycle processes energy by transform-
ing a working fluid within a system of networked components (condensers, turbines,
pumps, and such). Power plants, engines, and refrigerators are all examples of ther-
modynamic cycles. In its initial development, the CycleTalk curriculum will empha-
size the improvement of the simple Rankine cycle. Rankine cycles of varying com-
plexities are used in steam-based power plants, which generate the majority of the
electricity in the US.
challenges has been to develop a dialogue manager that can support this type of inter-
action. Note that our focus is not to encourage the students to take initiative in the
dialogue [8], but in the exploratory task itself. Allowing the student to take initiative
at the dialogue level is simply one means to that end.
In the remainder of the paper, we outline the theoretical motivation for the DReS-
DeN tutorial dialogue manager. We then describe how it is used in the CycleTalk
tutorial dialogue system, currently under development. We then give a detailed de-
scription of DReSDeN’s underlying algorithms and data structures, illustrated with a
working example. We conclude with some early work in using machine learning
techniques to adapt DReSDeN’s behavior.
2 Motivation
The development of the DReSDeN tutorial dialogue manager was guided by concerns
specifically related to supporting negotiation and reflection in a tutorial dialogue
context. The role of DReSDeN in CycleTalk is to support student exploration of the
design space, encourage students to consciously reflect on the design choices they are
making, and to offer feedback on their ideas.
The idea of using negotiation dialogue for instruction is not new. For example,
Pilkington et al. (1992) argue for the need for computer-based tutoring systems to move
to more flexible types of dialogues that involve challenging and defending arguments
to support students’ information gathering processes. When students participate in the
argumentation process, they engage higher-order mental processes, including rea-
soning, critical thinking, evaluative assessment of argument and evidence, all of
which are forms of core academic practice [24]. Negotiation provides a context in
which students are encouraged to adopt an evaluative epistemology [18], where
judgments are evaluated using criteria and evidence in order to weigh alternatives
against one another. Baker (1994) argues that negotiation is an active and interactive
approach to instruction that is an effective mechanism for achieving coordination of
both problem solving and communicative actions between peer learners, or between a
learner and a tutor. It keeps both conversational participants equally active and en-
gaged throughout the process. Nevertheless, the potential for using negotiation as a
pedagogical tool within a tutorial dialogue system has not been thoroughly explored.
While much has been written about the potential for negotiation dialogue for instruc-
tion, very few controlled experiments have compared its effectiveness to that of alter-
native forms of instruction, and no current tutorial dialogue system that has been
evaluated with students fully implements this capability.
On a basic level, the DReSDeN flavor of negotiation shares many common fea-
tures with the types of negotiation modelled previously. For example, all types of
negotiations involve agents making proposals that can either be accepted or rejected
by the other agent or agents. Some models, such as [5,15,9], also provide the means
for modeling justifications for choices as well as the ability to modify a proposal in
the light of objections received from other agents. Nevertheless, at a deep level, the
DReSDeN flavor of negotiation is distinctive. In particular, previous models of nego-
tiation are primarily adversarial in that the primary goal of the dialogue participants is
to agree on a proposal or even to convince the other party of some specific view. The
justifications and elaborations that are part of the conversation are in service to the
goal of convincing the other party to adopt a specific view, or at least a mutually
acceptable view. In the DReSDeN flavor of negotiation, on the other hand, the main
objective is to explore the space and to reflect upon the justifications. Thus, the un-
derlying goals and motivation of the tutor agent are quite different from previously
modeled negotiation style conversational agents and may lead to interesting differ-
ences in information presentation and discourse structure. In particular, while the
negotiation dialogues DReSDeN is designed to engage students in share many sur-
face features with previously explored forms of negotiation, the underlying goal is not
to convince the student to adopt a particular decision or even to come to an agree-
ment, but instead to motivate the student to reason through the alternatives, to ask
himself reflective questions, and to make a choice with understanding that thought-
fully takes other alternatives into consideration.
Much prior work on managing negotiation dialogues outside of the intelligent tu-
toring community is based on dialogue game theory [22] and the information state
update approach to dialogue management [31,19]. Larsson (2002a, 2002b) presents
an information state update approach to managing negotiations with plans to imple-
ment it in the GoDiS dialogue framework [4]. The information state in his model is a
representation of Issues Under Negotiation, which explicitly indicates what has been
decided so far and which alternative possible choices for as yet unmade decisions are
currently on the table. Lewin (2001) presents a dialogue manager for a negotiative
type of form filling dialogue where users negotiate the contents of a database query,
including both which pieces of information are required as well as the values of those
particular pieces. The DReSDeN tutorial dialogue manager adopts a similar Issues
Under Negotiation approach to that presented in Larsson (2002b). Thus, the informa-
tion state that is maintained in DReSDeN represents the items that are currently being
discussed as well as their relationships to one another. This representation provides a
structure for organizing the representation for the interwoven conversational threads
[26] out of which the negotiation dialogue is composed.
We build on the foundation of our prior work building and evaluating Knowledge
Construction Dialogues (KCDs) [27]. KCDs were motivated by the idea of Socratic
tutoring. KCDs are interactive directed lines of reasoning that are each designed to
lead students to learn as independently as possible one or a small number of concepts,
thus implementing a preference for an “Ask, don’t tell” strategy. When a question is
presented to a student, the student types a response in a text box in natural language.
The student may also simply click on Continue, and thus neglect to answer the ques-
tion. If the student enters a wrong or empty response, the system will engage the
student in a remediation sub-dialogue designed to lead the student to the right answer
to the corresponding question. The system selects a subdialogue based on the content
of the student’s response, so that incorrect responses that provide evidence of an un-
derlying misconception can be handled differently than responses that simply show
ignorance of correct concepts. Once the remediation is complete, the KCD returns to
the next question in the directed line of reasoning.
In this section we discuss the main data structures and control mechanisms that are
part of the implemented DReSDeN dialogue manager and present a working example
that uses toy versions of the required knowledge sources. Further developing these
knowledge sources is one of our current directions. DReSDeN has four main data
structures that guide its performance. First, it has access to a library of handwritten
KCDs. We also plan to generate some KCDs on the fly using a data structure called
an ArgumentMap that encodes domain information to provide the foundation for the
negotiation or discussion. The KCD library contains lines of reasoning used for ex-
ploring pros and cons of typical design scenarios and for remediating deficits in con-
ceptual understanding that are related to issues under negotiation. The KCD library
also contains generic KCDs for eliciting explanations and design decisions from stu-
dents. Next, there is a threaded discourse history, generated in the course of a conver-
sation, which is a graph with parent-child relationships between threads. Each thread
of the discourse is managed separately with its own KCD like structure. The flexibil-
ity in DReSDeN comes from the potential for multiple threads to be managed in par-
allel. The final data structure, the discourse model describes the rules that determine
how control is passed from one thread to the next.
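To make these data structures concrete, the sketch below (Python, our own illustration; it is not the implemented system, whose rule conditions are written as Lisp predicates) shows one plausible shape for a KCD, a thread, the threaded discourse history, and the discourse model. All class and field names are hypothetical.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class KCD:
    # A directed line of reasoning: an ordered list of tutor questions.
    goal: str
    questions: List[str]

@dataclass
class Thread:
    # One conversational thread, managed with its own KCD-like structure.
    topic: str
    kcd: KCD
    parent: Optional["Thread"] = None   # parent-child link in the history graph
    position: int = 0                   # index of the next question to ask

    def next_tutor_text(self) -> Optional[str]:
        if self.position < len(self.kcd.questions):
            text = self.kcd.questions[self.position]
            self.position += 1
            return text
        return None                     # this thread is exhausted

@dataclass
class DiscourseHistory:
    # Graph of threads with parent-child relationships; one thread is in focus.
    threads: List[Thread] = field(default_factory=list)
    focus: Optional[Thread] = None

    def add_thread(self, thread: Thread) -> None:
        self.threads.append(thread)
        self.focus = thread

@dataclass
class DiscourseModel:
    # Finite state machine: each state carries rules that relate the student's
    # turn to the history and rules that pick the tutor's next move.
    state: str
    attach_rules: Dict[str, Callable[[str, DiscourseHistory], Thread]]
    focus_rules: Dict[str, Callable[[DiscourseHistory], Thread]]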
Each dialogue begins with a single thread, initiated with a single KCD goal. With
the initiation of this thread, a tutor text is produced in order for the dialogue system to
introduce the topic of discussion. When the student responds, the system must decide
whether the student’s text addresses the currently in focus thread, a different thread,
or begins a new thread. This decision is made using the discourse model, which is a
finite state machine. Each state is associated with rules for determining how to relate
the student’s turn to the discourse history as well as rules for determining what the
tutor’s next move should be. For example, part of this decision is whether the tutor
should continue on the currently in focus thread, shift to a different existing thread, or
create a new thread. Currently the conditions on the rules are implemented in terms of
a small number of predicates implemented in Lisp. In the next section we discuss how
we have begun experimenting with machine learning techniques to learn the condi-
tions that determine how to relate student turns to the discourse history.
Figure 4 presents a sample working example. This example was produced using a
discourse model that favors exploring alternative proposals in parallel. In its KCD
library, it has access to a small list of lines of reasoning each exploring a different
proposal as well as a thread for comparing proposals. Its discourse model imple-
ments a state machine that first elicits proposals from the student until the student has
articulated the list that it is looking for. Each proposal is maintained on its own
thread, which is created when the student introduces the proposal. After all proposals
are elicited, the discourse model causes the focus to shift from parallel thread to par-
allel thread on each turn in a round robin manner until each proposal has been ex-
plored. It then calls for the introduction of a final thread that compares proposals and
elicits a final decision.
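As a rough illustration only (not the implemented control logic), the round-robin strategy just described could be realized along the following lines, reusing the hypothetical classes from the earlier sketch; the elicitation of proposals and the contents of each thread are placeholders.

def round_robin(history, proposal_threads, comparison_thread):
    # Shift focus from parallel thread to parallel thread on each turn until
    # every proposal has been explored, then introduce the comparison thread.
    active = list(proposal_threads)
    i = 0
    while active:
        thread = active[i % len(active)]
        tutor_text = thread.next_tutor_text()
        if tutor_text is None:             # this proposal is fully explored
            active.remove(thread)
            continue
        history.focus = thread             # note the alternating thread focus
        yield tutor_text
        i += 1
    history.add_thread(comparison_thread)  # compare proposals, elicit a choice
    text = comparison_thread.next_tutor_text()
    if text is not None:
        yield text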
See Figure 3 for a dialogue created using this dialogue model. First a thread is in-
troduced into the discourse in turn (1) for the purpose of negotiating design choices
about improving the efficiency of a Rankine cycle. Next, two separate threads, each
representing a separate design choice suggested by the student in response to a tutor
request are introduced in turns (2) and (4) and processed in turn using a general elici-
tation KCD construct. Both of these threads are related to the initial thread via a de-
sign-possibility relation. Control passes back and forth between threads as different
aspects of the proposal are explored. Note the alternating thread labels. After the final
design choice elicitation thread is processed, an additional thread, which is subordi-
nate to the two parallel threads just completed, is introduced in order to encourage the
student to compare the two proposals and make a final choice, to which the student
responds by suggesting the addition of a reheat cycle, a preference observed among
the students in our data collection effort. The system responds by offering an alterna-
tive suggestion. As noted, with an alternative discourse model, this dialogue could
have been processed using a different strategy in which each alternative proposal was
completely explored in isolation, in such a way that we would not observe the thread
switching phenomenon observed in Figure 3.
Our learning hypothesis is that negotiation dialogue will prove to be a highly effec-
tive form of tutorial dialogue. Within that framework, however, there exist a multi-
plicity of more specific research questions about how this expansive vision is most
productively implemented in tutorial dialogue. Many local decisions must be made in
the course of a negotiation that influence the direction that negotiation will take. Ex-
amples include which evidence to select as supporting evidence, which alternative
design choice or prediction to argue in favor of, or when to challenge a student versus
when to let the student move on. When the goal is to encourage exploration of a space
of alternatives rather than to lead the student to a pre-determined conclusion, then
there are many potential answers to all of these questions. Thus, we will explore the
relative pedagogical effectiveness of alternative strategies for using negotiation in
different contexts. Part of our immediate plans for future work is to explore this space
using a machine learning based optimization approach such as reinforcement learning
[30] or Genetic Programming [17]. The learned knowledge will be encoded in the
discourse model that guides the management of DReSDeN’s multi-threaded discourse
history.
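As a rough sketch of the kind of optimization we have in mind (and not the implemented system), the discourse-model decision of how to relate a student turn to the history could be treated as an action in a simple tabular Q-learning loop; the state encoding, the action set, and the reward signal (for instance, a learning-gain proxy) are all hypothetical.

import random
from collections import defaultdict

ACTIONS = ["continue_focus_thread", "shift_to_existing_thread", "create_new_thread"]

q = defaultdict(float)              # (state, action) -> estimated value
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def choose_action(state):
    # Epsilon-greedy choice among the discourse-model moves.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state):
    # Standard one-step Q-learning update.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])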
In the KCD approach to dialogue management [27], student answers that do not
express a correct answer to a tutor query are treated as wrong answers. Thus, one
In this paper we have introduced the DReSDeN tutorial dialogue manager as an ex-
tension to the APE tutorial dialogue planner used in our previous research. We cur-
rently have a working prototype implementation of DReSDeN. We are continuing
to collect Wizard-of-Oz data in the thermodynamics domain, which we plan to use as
the foundation for building our domain specific knowledge sources and for continued
machine learning experiments as described.
References
1. Aleven V., Koedinger, K. R., & Popescu, O.: A Tutorial Dialogue System to Support Self-
Explanation: Evaluation and Open Questions. Proceedings of the 11th International Con-
ference on Artificial Intelligence in Education, AI-ED (2003).
2. Baker, M.: A Model for Negotiation in Teaching-Learning Dialogues, International Journal
of AI in Education, 5(2), pp 199-254, (1994).
3. Bhatt, K., Evens, M. & Argamon, S.: Hedged Responses and Expressions of Affect in
Human/Human and Human/Computer Tutorial Interactions, Proceedings of the Cognitive
Science Society (2004).
4. Bohlin, P., Cooper, R., Engdahl, E., Larsson, S.: Information states and dialogue move
engines. In Alexandersson, J. (Ed.) IJCAI-99 Workshop on Knowledge and Reasoning in
Practical Dialogue Systems, (1999) pp 25-32.
5. Chu-Carroll, J., Carberry, S.: Conflict resolution in collaborative planning dialogues.
International Journal of Human-Computer Studies, 53(6):969-1015. (2000)
6. Cohen, W.: Fast Effective Rule Induction. Machine Learning: Proceedings of the Twelfth
International Conference. (1995)
7. Collins, A., Brown, J. S., Newman, S. E.: Cognitive Apprenticeship: Teaching the Crafts
of Reading, Writing, and Mathematics, in L. B. Resnick (Ed.) Knowing, Learning, And
Instruction: Essays in Honor of Robert Glaser, (1989) Hillsdale: Lawrence Erlbaum As-
sociates.
8. Core, M. G., Moore, J. D., & Zinn, C.: The Role of Initiative in Tutorial Dialogue, in
Proceedings of the Conference of the European Chapter of the Association for Com-
putational Linguistics. (2003)
9. Di Eugenio, B., Jordan, P., Thomason, R., Moore, J.: The Acceptance Cycle: An empirical
investigation of human-human collaborative dialogues, International Journal of Human
Computer Studies. 53(6), (2000) 1017-1076.
10. Evens, M. and Michael, J.: One-on-One Tutoring by Humans and Machines, (2003) Law-
rence Erlbaum Associates.
11. Forbus, K. D., Whalley, P. B., Evrett, J. O., Ureel, L., Brokowski, M., Baher, J., Kuehne,
S. E.: CyclePad: An Articulate Virtual Laboratory for Engineering Thermodynamics. Arti-
ficial Intelligence 114(1-2): (1999) 297-347.
12. Freedman, R.: Using a Reactive Planner as the Basis for a Dialogue Agent, Proceedings of
FLAIRS 2000, (2000) Orlando.
13. Graesser, A., Moreno, K. N., Marineau, J. C.: AutoTutor Improves Deep Learning of
Computer Literacy: Is It the Dialog or the Talking Head? Proceedings of AI in Education
(2003)
14. Graesser, A., VanLehn, K., the TRG, & the NLT: Why2 Report: Evaluation of Why/Atlas,
Why/AutoTutor, and Accomplished Human Tutors on Learning Gains for Qualitative
Physics Problems and Explanations, LRDC Tech Report, (2002) University of Pittsburgh.
15. Heeman, P. and Hirst, G.: Collaborating on Referring Expressions. Computational Lin-
guistics, 21(3), (1995) 351-382.
16. Jordan, P., Rosé, C. P., & VanLehn, K.: Tools for Authoring Tutorial Dialogue Knowl-
edge. In J. D. Moore, C. L. Redfield, & W. L. Johnson (Eds.), Artificial Intelligence in
Education: AI-ED in the Wired and Wireless Future, Proceedings of AI-ED 2001 (pp.
222-233). (2001) Amsterdam, IOS Press.
17. Koza, J.: Genetic Programming: On the programming of computers by means of natural
selection, (1992) Bradford Books.
18. Kuhn, D.: A developmental model of critical thinking. Educational Researcher. 28(2),
(1999) pp 16-26.
19. Larsson, S. & Traum, D.: Information state dialogue management in the trindi dialogue
move engine toolkit. NLE Special Issue on Best Practice in Spoken Language Dialogue
Systems Engineering, (2000) pp 323-340.
20. Larsson, S.: Issue-based Dialogue Management, PhD Dissertation, Department of Lin-
guistics, Göteborg University, Sweden (2002)
21. Larsson, S.: Issues Under Negotiation, Proceedings of SIGDIAL 2002.
22. Levin, J. A. ; Moore, J. A.: ‘Dialogue-Games: Meta-communication Structures for Natural
Language Interaction’. Cognitive Science, 1 (4), (1980) 395-420.
23. Lewin, I.: Limited Enquiry Negotiation Dialogues, Proceedings of Eurospeech (2001).
24. McAlister, S. R.: Argumentation and a Design for Learning, CALRG Report No. 197,
(2001) The Open University
25. Pilkington, R. M., Hartley, J. R., Hintze, D., Moore, D.: Learning to Argue and Arguing to
Learn: An interface for computer-based dialogue games. International Journal of Artificial
Intelligence in Education, 3(3), (1992) pp 275-85.
26. Rosé, C. P., Di Eugenio, B., Levin, L. S., Van Ess-Dykema, C.: Discourse Processing of
Dialogues with Multiple Threads, Proceedings of the Association for Computational Lin-
guistics (1995).
27. Rosé, C. P., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K., & Weinstein, A.: Interac-
tive Conceptual Tutoring in Atlas-Andes, In J. D. Moore, C. L. Redfield, & W. L. Johnson
(Eds.), Artificial Intelligence in Education: AI-ED in the Wired and Wireless Future, Pro-
ceedings of AI-ED 2001 (pp. 256-266). (2001) Amsterdam, IOS Press.
28. Rosé, C. P., Roque, A., Bhembe, D., VanLehn, K.: A Hybrid Text Classification Ap-
proach for Analysis of Student Essays, Proceedings of the HLT-NAACL 03 Workshop on
Educational Applications of NLP (2003).
29. Rosé, C. P., Aleven, V. & Torrey, C.: CycleTalk: Supporting Reflection in Design Sce-
narios With Negotiation Dialogue, CHI Workshop on the Designing for the Reflective
Practitioner (2004).
30. Sutton, R. S., & Barto, A. G.: Reinforcement Learning: An Introduction. (1998) The MIT
Press: Cambridge, MA.
31. Traum, D., Bos, R., Cooper, R., Larsson, S., Lewin, I, Mattheson, C., & Poesio, M.: A
model of dialogue moves and information state revision. (2000) Technical Report D2.1,
Trindi.
32. VanLehn, K., Jordan, P., Rosé, C. P., and The Natural Language Tutoring Group: The
Architecture of Why2-Atlas: a coach for qualitative physics essay writing, Proceedings of
the Intelligent Tutoring Systems Conference, (2002) Biarritz, France.
33. Zinn, C., Moore, J. D., & Core, M. G.: A 3-Tier Planning Architecture for Managing
Tutorial Dialogue. In S. A. Cerri, G. Gouardères, & F. Paraguaçu (Eds.), Proceedings of
the Sixth International Conference on Intelligent Tutoring Systems, ITS 2002 (pp. 574-
584). Berlin: Springer Verlag, (2002).
Combining Computational Models of Short Essay
Grading for Conceptual Physics Problems
1 Introduction
Traditional measures of user modeling in intelligent tutoring systems have not de-
pended on an analysis of the meaning of natural language and discourse. However,
natural language understanding has progressed dramatically in recent years with the
development of automated essay graders [5] and tutorial dialogue in natural language
[6], [8].
One challenge in building natural language understanding modules has been the
extension of mainstream representational systems to capture text similarity and the
correctness of the text with respect to some ideal rubric. One framework that has been
successful in meeting this challenge is Latent Semantic Analysis [10], [13]. Latent
Semantic Analysis (LSA) is a statistical language understanding technique that con-
structs relations among words from the analysis of a large corpus of written text.
Word meanings are represented as vectors whereas sentence or essay meanings are
linear combinations of the word vectors. Similarity between two texts is measured by
the cosine between the corresponding two vectors. The input to LSA is a corpus that
is segmented into documents, which are typically paragraphs or sentences. A large
word-document matrix is formed from the corpus, based on the occurrences of the
What do humans do when grading essays and how might different natural language
understanding tools model these processes? Consider the following example:
STUDENT ANSWER: The egg will land behind where the unicycle touches
the ground. The force of gravity and air resistance will slow the egg down.
EXPERT ANSWER: The egg will land beside the wheel, which is the point
where the unicycle touches the ground. The egg has the same horizontal velocity
as the unicycle when it is released.
Many of the same words appear in both answers, yet a human expert grader as-
signed this particular answer an F for being too ambiguous. The correct answer says
that the egg will land beside the wheel whereas the student answer incorrectly says it
lands behind the wheel. Therefore, word similarity can only solve part of the puzzle.
In order to properly evaluate correctness, a human or computer system needs to con-
sider the relationship between the two passages beyond their word similarities, to
consider the surrounding context of each individual word, and to consider combina-
tions of words.
We need to address several questions when measuring how well the content of the
student answer matches the correct answer. Two questions are particularly under
focus in the present research. One question is what comparison benchmark to use
when grading essays. The vocabulary used by experts may be somewhat different
from students, so we examined both expert answers and good student answers as
comparison gold standards. The second question is whether it is worthwhile to com-
bine different natural language understanding metrics of similarity in an attempt to
achieve more accurate prediction of expert grades. Multiple measures of text quality
and similarity may yield a better approximation of the contextual meaning of an es-
say.
The primary techniques we investigated in the present study were LSA, an alterna-
tive corpus-based model called the Union of Word Neighbors (UWN) model, and
word overlap between essay and answer. It is conceivable that simple word overlap
(including all words) may be superior to LSA. The high frequency words may be
extremely important in judging correctness. For instance, if the correct answer is “the
pumpkin will land behind the runner" and the student answer is "the pumpkin will
land beside the runner", LSA and UWN will judge this comparison to be quite high
because behind and beside are highly related in LSA; however, simple word matching
will identify no relationship between these two words. On the other hand, LSA and
UWN can abstract information inferentially from the essay, so they provide relevant
information beyond word matching.
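The contrast can be made concrete with a small sketch (ours, not the paper's code): exact word matching treats behind and beside as unrelated, whereas a cosine over corpus-derived word vectors of the kind LSA or UWN produce would rate them as highly similar. The two-dimensional vectors below are invented purely for illustration.

import math

def cosine(u, v):
    # Cosine between two sparse vectors represented as dicts.
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy "semantic" vectors: behind and beside are near neighbours in a
# corpus-derived space even though they share no surface form.
vectors = {
    "behind": {"d1": 0.9, "d2": 0.4},
    "beside": {"d1": 0.8, "d2": 0.5},
}

print("behind" == "beside")                                      # exact match: False
print(round(cosine(vectors["behind"], vectors["beside"]), 2))    # high cosine, about 0.99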
In the UWN model, the semantic information for any given word w is the pool of words
that co-occur with w in the sentences of the corpus that contain w. This
pool of words is called the neighborhood set; it includes all words that co-occur with
the target word w. These words are assumed to be related to the target word and serve
as the basis for all associations. The neighborhood intersection is the relation that
occurs when two target words share similar co-occurrences with other words. Similar
to LSA, two words (A and B) become associated by virtue of their occurrence with
many of the same third-party words. For example, food and eat may become associ-
ated because they both occur with words such as hungry and table. Therefore, the
neighborhood set N for any word w is the only information we have, based on the
exemplar sentences for words in the corpus.
The neighborhood set for any word is intended to represent the meaning of a word
from a corpus. But there were several theoretical challenges that arose when we de-
veloped the model. One challenge was how to differentially weight neighborhood
words. We assigned neighborhood weights to each neighborhood word n of word w
according to Equation (1).
In order to construct the neighborhood set for a word, we explored an algorithm that
pooled all words N that co-occurred with the target word w. Our subject matter was
conceptual physics so we used a corpus consisting of the textbook Conceptual Phys-
ics [11]. Each sentence in the corpus served as the context for direct co-occurrence.
So for the entire set of sentences that contain the target word w, every unique word in
those sentences was pooled into the neighborhood set N. For example, the
neighborhood of velocity included force, acceleration, and mass because these words
frequently occur in the same sentences that velocity occurs in. Each word in the
resulting neighborhood N of the target word w is weighted by the function described in
Equation (1).
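A minimal sketch of the neighborhood construction just described (our own illustration). Because Equation (1) is not reproduced in the text, a plain co-occurrence count stands in for the paper's neighborhood weight.

from collections import defaultdict

def build_neighborhoods(sentences):
    # sentences: list of token lists from the corpus (one list per sentence).
    # Returns word -> {neighbor: weight}, with a raw co-occurrence count as a
    # stand-in for the weighting function of Equation (1).
    neighborhoods = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        unique = set(sentence)
        for w in unique:
            for n in unique:
                if n != w:
                    neighborhoods[w][n] += 1.0
    return neighborhoods

corpus = [
    "the velocity depends on the force acceleration and mass".split(),
    "net force equals mass times acceleration".split(),
]
nbhd = build_neighborhoods(corpus)
print(sorted(nbhd["velocity"]))   # includes 'force', 'acceleration', 'mass'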
Combining word neighbors to capture essay meanings. In order to capture the
meaning of an essay, a neighborhood is formed that is a linear combination of the indi-
vidual word neighborhoods, pooled into a single set. To evaluate the relation between
any two essays E1 and E2 we applied the following algorithmic procedure:
1. Pool the neighborhood sets for each word w in each essay, computing the weights
for all the neighbors of each word in an essay using Equation (1).
2. Add all neighbors' weights for each word in each essay into N1 (the pooled
neighbors for essay E1) and N2 (the pooled neighbors for essay E2).
3. Calculate the neighborhood intersection as in Equation (2).
The numerator of Equation (2) is the summation of neighbor weights over the
intersection of the neighborhood sets N1 and N2, whereas the denominator is the
summation of neighbor weights over the union of the two neighborhood sets. This
formula produces a value between 0 and 1. In the next section we will discuss the
performance of this model on essay grading.
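Under one plausible reading of the numerator/denominator description of Equation (2), the essay-level score could be computed as follows (our own sketch; the exact weighting and the treatment of weights for shared neighbors may differ from the paper's formulation). It reuses the neighborhoods produced by the previous sketch.

from collections import defaultdict

def essay_neighborhood(essay_tokens, neighborhoods):
    # Steps 1-2: pool the neighborhood sets of every word in the essay.
    pooled = defaultdict(float)
    for w in essay_tokens:
        for n, weight in neighborhoods.get(w, {}).items():
            pooled[n] += weight
    return pooled

def neighborhood_intersection(n1, n2):
    # Step 3: sum of weights over the intersection of the two pooled sets,
    # divided by the sum over their union, giving a value between 0 and 1.
    inter = sum(n1[k] + n2[k] for k in n1.keys() & n2.keys())
    union = sum(n1.values()) + sum(n2.values())
    return inter / union if union else 0.0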
4 Method
The essay questions consisted of 16 deep-level physics problems that tapped various
conceptual physics principles. All essays (n = 344) were graded by one physics ex-
pert, whose grades were used as the gold standard to evaluate the various measures.
Each essay question had two ideal answers, one created by the expert and one taken
randomly from all the student answers that were given an A grade by the expert for
each particular problem. The reason why we used ideal student answers was to evalu-
ate the effect of expert versus student wording on grading performance. Although
both LSA and UWN build semantic relations beyond the words, it is possible that
wording plays an important role in evaluating correctness. Expert wording is some-
what stilted in an academic style whereas student wording is more vernacular.
Additional measures were collected and assessed as possible predictors of essay
grades. These included summed verbosity of essay and answer (measured as number
of words), word overlap, and the adjective incidence score for each student answer.
The adjective incidence score is the number of adjectives per 1000 words, which was
measured by Coh-Metrix [7]. Coh-Metrix is a web facility that analyzes texts on ap-
proximately 200 measures of language, world knowledge, cohesion and discourse.
The adjective incidence score was the only measure in Coh-Metrix that significantly
correlated with expert grades and was not redundant with our other measures (i.e.,
LSA, UWN, verbosity, word overlap). The adjective incidence score captures the
extent to which the student precisely refers to noun referents. The verbosity measure
was included because there is evidence that longer essays receive higher grades [5].
Word overlap captures the extent to which the verbatim articulation of the ideal in-
formation is captured in the student essay. The word overlap score is a proportion of
words shared by the ideal and student essay divided by the total number of words in
both the ideal essay and the student essay.
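The word overlap score described above can be computed directly; a minimal sketch (ours). The text does not say whether repeated words are counted, so this version counts shared word types against the total token count.

def word_overlap(ideal_tokens, student_tokens):
    # Proportion of words shared by the ideal and student essays, divided by
    # the total number of words in both essays.
    shared = len(set(ideal_tokens) & set(student_tokens))
    total = len(ideal_tokens) + len(student_tokens)
    return shared / total if total else 0.0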
Tables 1 and 2 show correlation matrices for the different measures using student
ideal essays versus expert essays. The variables in each correlation matrix were
UWN, LSA, word overlap, verbosity, and adjective incidence score. All correlations
of .13 or higher were statistically significant at p < .05, two tailed.
A number of trends emerged in the data analysis. Use of the ideal student answers
yielded somewhat better performance than the ideal expert answers for both LSA and
UWN. LSA, UWN, and word overlap all performed relatively the same in predicting
expert grades when using ideal student answers, (.44, .43, and .41 for UWN,
LSA, and keyword, respectively). UWN and LSA correlations decreased when using
ideal expert answers. There also were large correlations between LSA, UWN, and
word overlap, which suggests that theses measures explain much of the same variance
of the expert ratings. Multiple regression analyses were conducted to assess the sig-
nificance of each individual measure and their combined contributions. Table 3 shows
two forced-entry multiple regression analyses performed with all measures on ideal
expert answers and ideal student answers. The two multiple regression equations were
statistically significant, with of the variance explained for expert answers and
for ideal student answers. As can be seen in these tables of results, word
overlap and adjective incidence were significant when ideal expert answers served as
the comparison benchmark, whereas LSA, UWN, verbosity, and adjective incidence
were significant when the ideal student answers served as the comparison benchmark.
Therefore, it appears that LSA and UWN are not appropriate measures when com-
paring student essays to expert ideal answers. Expert answers are apparently more
abstract, precise, and stilted than the students'. Experts express principles of physics
(e.g., According to Newton's third law...) with words that cannot be easily substi-
tuted in student answers (i.e., no other word can be used to describe “Newton’s third
law”). However, when ideal student essays are used as a benchmark, LSA and UWN
more accurately predict grades perhaps because of the more vernacular wording or
because of the possible substitutability of words in ideal student answers. Therefore, it
apparently is easier for LSA and UWN to detect isomorphically correct answers using
student ideal essays than ideal expert answers.
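The forced-entry regressions reported above could be reproduced along the following lines with statsmodels; the data below are random placeholders, not the study's data, and the column order is merely illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.random((344, 5))      # columns: UWN, LSA, word overlap, verbosity, adjectives
grades = rng.random(344)      # placeholder for the expert grades

model = sm.OLS(grades, sm.add_constant(X)).fit()   # forced entry: all predictors at once
print(model.rsquared)          # proportion of variance explained
print(model.pvalues)           # significance of each individual measure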
5 General Discussion
Our analysis of student physics essays revealed that the amount of variance explained
by our multiple measures of quality was modest, but statistically significant. How-
ever, given the difficulties of the predicting correctness of short essays [5], [13] we
are encouraged by the results. When inferred semantic similarity plays only a small
role in the correctness of an answer, other metrics are needed that can detect similari-
ties and differences between benchmark ideal answers and student answers. This is
where word overlap and frequency counts of adjectives become useful; they are sen-
sitive to high frequency words and characteristics of a text that are independent of
content words. For example, ideal answers in physics contain specific prepositions
(e.g., behind, beside, across, in), common polysemous verbs (e.g., land, fall), and
many adjectives and adverbs (e.g., greater, less, farther, horizontal, vertical) that play
a large role in the correctness of an essay. The meanings of these words in LSA or
UWN may be inaccurate because of their high frequency and similarity of word con-
texts in the corpus in which they appear. Conversely, when content words do play a role in the
answer (e.g., net force, mass, acceleration), similar words can be substituted (e.g.,
force, energy, speed). LSA and UWN are sometimes able to inferentially abstract and
relate words that are substitutable to determine similarity.
We explored the importance of using different benchmarks to score essays. This
has implications for essay grading as well as the curriculum templates to use in Auto-
Tutor’s interactive dialogues. For example, we use LSA to compare student dialogue
turns to expert written expectations when we evaluate the correctness of student an-
swers. The results of this study support the conclusion that the use of expert answers
alone may not be the best strategy for accurately evaluating student correctness. In-
stead of only using expert derived answers, it might be more suitable to use student
derived explanations, given that the multiple regression model using student ideal
answers predicted essay grades more accurately.
Finally, it appears that the UWN model did a moderately good job in predicting
grades, on par with LSA. While UWN did not do well when expert ideal answers
served as the benchmark, it was a good predictor when ideal student answers served
as the benchmark. UWN identifies related words at the sentence level in a corpus,
whereas LSA identifies word meanings and relations at the paragraph level in a cor-
pus; so UWN may not be able to abstract all of the relevant information to compare to an
ideal expert answer. Nevertheless there are two important benefits of UWN. First, it is
a word meaning estimation metric that can create a score online, with no preprocess-
ing needed to calculate word meanings. In the context of intelligent tutor systems, this
enables one to add any relevant feedback from students to the corpus that the UWN is
using to derive word meanings. This could improve performance of UWN because
specific key terms will be given additional meaning from student input. This advan-
tage would be difficult with LSA since the statistical computations require a nontriv-
ial amount of time to derive word meanings. Any added information to the corpus
would result in a new analysis. Therefore, UWN warrants more investigation as a
metric for text comparison because of its dynamic capability of updating its repre-
sentation, as AutoTutor learns from experience.
References
1. Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors:
Lessons learned. The Journal of the Learning Sciences, 4, 167-207.
From Human to Automatic Summary Evaluation
Iraide Zipitria1,2, Jon Ander Elorriaga2, Ana Arruarte2, and Arantza Diaz de Ilarraza2
1 Developmental and Educational Psychology Department
2 Languages and Information Systems Department
University of the Basque Country (UPV/EHU)
649 P.K., E-20080 Donostia
{iraide,elorriaga,arruarte,jipdisaa}@si.ehu.es
1 Introduction
The work presented here adds to ongoing efforts in automatic free-text evaluation with
the design of a model to evaluate summaries automatically. A first step has been
working towards the development of a model of summarisation evaluation based on
expert knowledge that could stand for almost any user. In order to reach this goal, a
cognitive study of teachers' and lecturers' evaluation procedures has been run. This
study has taken into consideration three main problem groups in summary evaluation:
second language (L2), immature and mature summarisers. Finally, once human
evaluation behaviour had been observed, and taking into account the needs of experts in
different contexts, we have laid the basis of the design of the automatic summary
evaluation environment.
The paper starts with a brief description of related work. Then, there are insights
and a data analysis of human summary evaluation. Next, the design of LEA, an
automatic summary evaluation environment, is presented. Finally, the paper closes
with conclusions and future perspectives.
2 Related Work
The roots of this study lie in the experience of teachers and lecturers in
practice. The final goal is to provide a system that matches as closely as
possible our experts' requirements for summarisation evaluation. The
requirement, on one hand, was to identify the environments where summarisation
assessment occurs and, on the other, the behaviour that summary raters show when
evaluating.
3.1 Subjects
3.2 Methodology
Results show that S4 was rated highest overall and S2 lowest. S2, S3 and S5 showed
very similar overall evaluations, and S1 was rated highest among the non-mature
summarisers. A graphic representation of overall and partial score means can be seen
in Figure 1. The lowest scores in language were produced by the two L2 student summaries
(S2, S5). Notably, S2 received the lowest score in language but the highest in
comprehension.
The subjects noticed that S1 was copied from the text. Some of them even
suggested that they had scored the summary far too high for a plagiarised summary.
The result is therefore strongly influenced by the rater's leniency at the given
moment. Further comments on each summary's ratings can be seen in Table 1.
Contrary to the common belief about free-text evaluation, our experts showed a very high
level of agreement. L2 teachers (L) agreed among themselves, producing significant
correlations ranging from r = 0.75 to r = 0.96. University lecturers (U) agreed from
r = 0.51 to r = 0.9, and secondary school teachers (S) from r = 0.47 to r = 0.84.
This high level of agreement suggests underlying stable variables that would enable us
to reproduce stable, human-like evaluation measures. It also needs to be pointed out
that the raters in some cases came from different backgrounds and had no connection
whatsoever with each other.
In order to identify the underlying predictor variables, a stepwise multiple linear
regression was calculated. The overall score was chosen as the dependent variable, and
coherence, cohesion, language, adequacy and comprehension were chosen as
independent variables. The resulting model explained 89% of the variance.
tools are used by students according to their own criteria. Their work is evaluated,
including summarisation ability, but there is no training whatsoever in summarisation.
Thus, these three groups showed different needs in summary production and
evaluation. In the early stages there is a training period in which summarisation
methodology is acquired by practicing, through a stepwise process, the individual
requirements that a summary must meet. In this period, primary education students learn text
comprehension strategies, main idea identification, use of connectors, text
transformation, etc. In summary, they gain discursive and abstraction competence.
The L2 group tends to be more heterogeneous. Here, summarisation abilities depend
on previous literacy on one hand, and on language proficiency on the other. This second
group also requires specific training that does not necessarily match the requirements
of the previous group. Finally, the university group does not receive any instructive
training at all. Training, if any, becomes a more individual matter.
A summary of the support tools used by these groups is shown in Table 3.
Based on the previous findings, this section aims to lay the foundations of a summary
evaluation environment.
The previously described study on summary evaluation modelling and the analysis
of past studies on summarisation and summary assessment have been taken into account
in laying the basis of the design of an automatic summary evaluation environment, LEA
(Laburpen Ebaluaketa Automatikoa). It takes evaluation decisions based on a model of
human expertise, resembling human responses. LEA is addressed to two types of
users: teachers and students. On one hand, teachers will be able to manage
summarisation exercises and inspect students' responses. On the other hand,
immature, mature or L2 students can create their own summaries.
The main difference from SS is that LEA is designed for virtually any user.
Moreover, this design is aimed not only at training students in summarisation skills but
also at assessing human summary evaluation performance. In addition to coherence,
content coverage and cohesion, LEA also gives feedback on use of language and
adequacy.
The full architecture of LEA can be seen in Figure 2. Next, each component is
briefly described.
Evaluation module
This module is responsible for producing global scores based on partial scores in
cohesion, coherence, adequacy, use of language and comprehension. Global score
decisions will be taken either automatically, based on modelling considerations -see
section 3-, or customised by the teacher. Partial scores will be obtained from the basic
evaluation tools.
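A minimal sketch of how such a global score might be combined from the partial scores, with teacher-configurable weights. The default weights below are purely illustrative (they merely echo the earlier finding that coherence, comprehension and language were the strongest predictors) and are not LEA's actual values.

DEFAULT_WEIGHTS = {            # illustrative, teacher-customisable weights
    "coherence": 0.35,
    "comprehension": 0.25,
    "language": 0.20,
    "cohesion": 0.10,
    "adequacy": 0.10,
}

def global_score(partial_scores, weights=DEFAULT_WEIGHTS):
    # Weighted combination of the five partial scores into one overall score.
    return sum(weights[k] * partial_scores[k] for k in weights)

print(global_score({"coherence": 8, "comprehension": 7, "language": 6,
                    "cohesion": 7, "adequacy": 8}))   # 7.25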
Basic evaluation tools
This set of tools provides measures on domain knowledge and summarisation skills,
using Latent Semantic Analysis, LSA [7] and Natural Language Processing (NLP)
techniques. LSA is a paradigm that makes it possible to approximate human cognitive
competence by means of text similarity measures [6]. The set of NLP tools includes a
lemmatiser, spell and style checkers, etc. The combination of these tools will feed
results on coherence, cohesion, comprehension, language and adequacy.
Teacher’s evaluation viewer
The teacher’s evaluation viewer allows instructors to inspect the student models. This
is the place where lecturers will find all the information obtained by the system. For
each student, it will show not only data on the last summary but also comparative
measures to previous performance.
Student’s evaluation viewer
The functionality of this viewer is to show evaluation results to students. Data will be
obtained from the Student Model and will allow the learner to see not only data on the
last summary but also comparative measures to previous work.
Summarisation environment
This module provides the students an environment to produce summaries. The
summarisation environment includes a reading text and a text editor. In addition, it
facilitates the access to a set of aid tools.
Aid tools
Summarisation aid tools will be offered to guide and help students in text
comprehension and summarisation. Some examples are: lexical aids (dictionaries,
wordnet, corpora, etc.), concept maps & scheme editors, and orthography and
grammar corrector (spell and style checker). These tools have been selected to
virtually emulate the aid tools identified in summarisation practice (see table 3).
Exercise database
This database contains all the exercise collection with specific details on each reading
text.
Student history
It keeps the student history: previous summarisation exercises with their corresponding
evaluations, and general student details.
5 Conclusions
Against the common belief that free-text evaluation criteria are subjective, a global
tendency in summarisation assessment has been observed among our subjects. It is
clear that there is a common inter-rater criterion when rating these summaries. Even
though the subjects' backgrounds are very heterogeneous, it seems clear that they all had
a similar expectation of what a summary should account for in this experiment. Their
mental summary, or text macrostructure, therefore seems to have many features in
common, which explains this agreement. Their criterion points to coherence,
comprehension and language as predictors of the overall score in summary evaluation.
The design presented here is the result of a global study of the requirements of
human summary evaluation. It provides all the required tools and specifications that
we have detected thus far. It takes into account the observed needs in primary,
secondary, L2 and university education. Evaluation can be fully automatic, but it also
offers the chance to configure certain features according to instructors' requirements.
Finally, in addition to instructional assessment, it can be used as a self-learning/self-
evaluation environment. Previous work in summarisation evaluation has focused
on immature summariser training and automatic summary evaluation. In this
case, we deal with a design that takes into consideration mature, immature and L2
summarisers' evaluation. Hence, it is intended for almost any user.
Furthermore, far from being a disadvantage, one of the guarantees of any
automatic design is that it produces assessment criteria that remain stable
from one session or student to the next. This is not the case in human assessment, which
is under the influence of many extrinsic and intrinsic environmental variables. The
stability of human evaluation criteria is lower, but human raters assert that they are able
to evaluate qualities that no machine could (e.g. student motivation). Likewise, the
system cannot produce assessments of calligraphy, opinion, elegance, novelty, etc.
Nonetheless, such assessment is difficult for humans as well and is subject to bias.
It has been concluded that an automatic summary evaluation system should
produce an overall score, together with measures of comprehension, cohesion, coherence,
adequacy and language. Whether these evaluation measures will finally be shown to
students or not has been left to instructors' consideration. Nonetheless, the model
points to text coherence as the main predictor of the overall score in summarisation,
followed by comprehension and language ability.
The inclusion of aid tools has proven to be necessary for certain target users. For
instance, grammar theory of Basque and summarisation instruction theory have
shown to be valuable tools in teaching environments. Basque grammar theory has
been reported as valuable for L2 learners of Basque and summarisation instruction
theory has been identified as a necessary tool in early or immature summarisation.
Bearing in mind the modelling study, we have tried to adapt the design to our
subjects' current working procedures. The intention has been to give them a complete
tool for carrying out, independently and in a different environment, the routine task
they are used to, providing all the required elements. As is known, for many reasons
this task requires continuous teacher supervision; this way, students would be able to
obtain similar feedback independently. Moreover, the system can be included in automated
tutoring environments as a complementary evaluation alongside close-ended tasks.
According to our teachers' reports, they are often not able to assess all the summaries
one by one, and they tend to assess one anonymously to let students know the successes
and failures in the given summary. This would provide an alternative evaluation in these
cases.
Future work is directed at the completion of the automatic summary evaluation
system. It consists of refining this model with more data and further statistical
calculations. Further statistical analysis of the data is being performed in order to find
References
1. Aleven, V., Koedinger, K.R., Popescu, O. A Tutorial Dialog System to Support Self-
Explanation: Evaluation and Open Questions. In: Kay, J., editor. Artificial Intelligence in
Education. Sydney, Australia: IOS Press; (2003). p. 35-46.
2. Foltz, P.W., Gilliam, S., Kendall, S. Supporting content-based feedback in online writing
evaluation with LSA. In: Interactive Learning Environments; (2000).
3. Graesser, A., Wiemer-Hastings, P., Wiemer-Hastings, K., Harter, D., Person, N, the
Tutoring Research Group. Using Latent Semantic Analysis to evaluate the contributions of
students in Auto-Tutor. In: Interactive Learning Environments; (2000). p. 129-148.
4. Ikastolen_Elkartea. OSTADAR DBH-1 Euskara eta Literatura Irakaslearen Gida 3.
zehaztapen maila. In: Ikastolen Elkartea; (2003).
5. Kintsch, E., Steinhart, D., Stahl, G., the LSA research group. Developing summarisation
skills through the use of LSA-based feedback; (2000).
6. Landauer, T.K., Dumais, S.T. A solution to Plato’s problem: The latent semantic analysis
theory of acquisition, induction and representation of knowledge. In: Psychological
Review; (1997). p. 211-240.
7. Landauer, T.K., Foltz, P.W., Laham, D. Introduction to Latent Semantic Analysis. In:
Discourse Processes; (1998). p. 259-284.
8. Lin, C.-Y., Hovy, E. Automatic Evaluation of Summaries Using N-gram Co-occurrence
Statistics. In: Human Technology Conference. Edmonton-Canada; (2003). p. 150-157.
9. Long, J., Harding-Esch, E. Summary and recall of text in first and second languages. In:
Gerver, D., editor. Language Interpretation and Communication: Plenum Press; (1978). p.
273-287.
10. Manning, C., Schutze, H. Foundations of Statistical Natural Language Processing. In: The
MIT Press; (1999).
11. Rickel, J., Lesh, N., Rich, C., Sidner, C.L., Gertner, A. Collaborative Discourse Theory as
a Foundation for Tutorial Dialogue. In: International Conference on Intelligent Tutoring
Systems, ITS; (2002). p. 542-551.
12. Robertson, J., Wiemer-Hastings, P. Feedback on Children’s Stories Via Multiple Interface
Agents. In: International Conference on Intelligent Tutoring Systems, ITS. Biarritz-San
Sebastian; (2002).
13. Rosé, C.P., Gaydos, A., Hall, B.S., Roque, A., VanLehn, K. Overcoming the Knowledge
Engineering Bottleneck for Understanding Student Language Input. In: Kay, J., editor.
Artificial Intelligence in Education. Sydney, Australia: Amsterdam: IOS Press; (2003).
14. Sherrard, C. Teaching students to summarize: Applying textlinguistics. In: Systems;
(1989). p. 1-11.
15. VanLehn, K., Jordan, P.W., Rose, C.P., Bhembe, D., Bottner, D., Gaydos, A., et al. The
Architecture of Why2 Atlas: A Coach for Qualitative Physics Essay Writing. In:
International Conference on Intelligent Tutoring Systems, ITS. Biarritz-San Sebastian;
(2002).
Evaluating the Effectiveness of a Tutorial Dialogue
System for Self-Explanation
Vincent Aleven, Amy Ogan, Octav Popescu, Cristen Torrey, Kenneth Koedinger
1 Introduction
Fig. 1. A student dialog with the tutor, attempting to explain the Separate Supplementary
Angles rule
[2]. The Geometry Cognitive Tutor focuses on geometry problem solving: students
are presented with a diagram and a set of known angle measures and are asked to find
certain unknown angles measures. Students are also required to explain their steps.
We are investigating the effect of two different ways of supporting self-explanation:
In the menu-based version of the system, students explain each step by typing in, or
selecting from an on-line Glossary, the name of a geometry definition or theorem that
justifies the step. By contrast, in the dialogue-based version of the system (i.e., the
Geometry Explanation Tutor), students explain their quantitative answers in their own
words. The system engages them in a dialogue designed to improve their explana-
tions. It incorporates a knowledge-based natural language understanding unit that
interprets students’ explanations [7]. To provide feedback on student explanations,
the system first parses the explanation to create a semantic representation [13]. Next,
it classifies the representation according to a hierarchy of approximately 200 expla-
nation categories that represent partial or incorrect statements of geometry rules that
occur commonly as novices try to state explanation rules. After the tutor classifies the
response, its dialogue management system determines what feedback to present to the
student, based on the classification of the explanation. The feedback given by the
tutor is detailed yet undirected, without giving away too much information. The stu-
dent may be asked a question to elicit a more accurate explanation, but the tutor will
not actually provide the correction. There are also facilities for addressing errors of
commission that suggest that the student remove an unnecessary part of an explana-
tion.
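The control flow described above (parse, classify against the explanation hierarchy, then select feedback) can be sketched as follows; the stub functions, the category label, and the feedback text are hypothetical placeholders, not the system's actual NLU or feedback content.

def parse(explanation: str) -> dict:
    # Stub for the knowledge-based NLU step that builds a semantic
    # representation of the student's explanation.
    return {"text": explanation}

def classify(representation: dict) -> str:
    # Stub for matching the representation against the hierarchy of roughly
    # 200 categories of partial or incorrect rule statements.
    return "supplementary-angles:missing-sum"      # hypothetical category label

FEEDBACK = {
    # hypothetical category -> undirected feedback that does not give the answer away
    "supplementary-angles:missing-sum":
        "You say the angles are supplementary. What does that tell you about "
        "the sum of their measures?",
}

def tutor_turn(student_explanation: str) -> str:
    category = classify(parse(student_explanation))
    return FEEDBACK.get(category,
                        "Can you state the geometry rule that justifies this step?")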
An example of a student-tutor interaction is shown in Fig. 1. The student is focus-
ing on the correct rule, but does not provide a complete explanation on the first at-
tempt. The tutor feedback helps the student in fixing his explanation.
A classroom study was performed with a control group of 39 students using the
menu-based version of the tutor, and an experimental group of 32 students using the
dialogue version (for more details, see [12]). The results reported here focus on 46
students in three class sections, 25 in the Menu condition and 21 in the Dialogue condi-
tion, who had spent at least 80 minutes on the tutor and were present for the pre-test
and post-test. All student-tutor interactions were recorded for further evaluation. The
1 In a previously published analysis of these data [3], a slightly different grading scheme was
used for Explanation items: half credit was given both for providing the name of a correct
rule and for providing an incomplete statement of a rule. The current scheme better reflects
both standards of math communication and the effort required to provide an explanation.
A closer look at the Explanation items shows distinct differences in the type and
quality of explanations given by students in each condition (see Fig. 4). In spite of
written directions on the test to give full statements of geometry rules, students in the
Menu condition only attempted to give a statement of a rule 29% of the time, as op-
posed for example to merely providing the name of a rule or not providing any expla-
nation. The Dialogue condition, however, gave a rule statement in 75% of their Ex-
planation items. When either group did attempt to explain a rule, the Dialogue condi-
tion focused on the correct rule more than twice as often as the Menu group (Dia-
logue .51 ± .27, Menu .21 ±.24; F(1,44) = 16.2, p < .001), and gave a complete and
correct statement of that rule almost seven times as often (Dialogue .44 ± .27 Menu
.06 ± .14; F(1,44) = 37.1, p < .001). A selection effect in which poorer students fol-
low instructions better cannot be ruled out but seems unlikely. The results show no
difference for correctness in answering with rule names (Dialogue .58, Menu .61), but
the number of explanations classified as rule names for the Dialogue group (a total of
12) is too small for this result to be meaningful.
To summarize, in a student population with high prior knowledge, we found that
students who explained in a dialogue learned better to state high-quality explanations
than students who explained by means of a menu, at no expense to overall learning.
Apparently, for students with high prior knowledge, the explanation format affects
communication skills more than it affects students' problem-solving skill or
understanding, as evidenced by the fact that there was no reliable difference on
problem-solving or transfer items.
Fig. 4. Relative frequency of different explanation types at the post-test
In order to better understand how the quality of the dialogues may have influenced
the learning results, and where the best opportunities for improving the system might
be, we analyzed student-tutor dialogues collected during the study. A secondary goal
of the analysis was to identify a measure of dialogue quality that correlates well with
learning so that it could be used to guide further development efforts.
The analysis focused on testing a series of hypothesized relations between the sys-
tem’s performance, the quality of the student/system dialogues, and ultimately the
students’ learning outcomes. First, it is hypothesized that students who tend to make
progress at each step of their dialogues with the system, with each attempt closer to a
complete and correct explanation than the previous, will have better learning results
than students who do not. Concisely, greater progress → deeper learning. Second,
we hypothesize that students who receive better feedback from the tutor will make
greater progress in their dialogues with the system, or better feedback → greater
progress → deeper learning. Finally, before this feedback is given, the system's natural
language understanding (NLU) unit must provide an accurate classification of the
student’s explanation. With a good classification, the tutor is likely to provide better,
more helpful feedback to the student. The complete model we explore is whether
better NLU → better feedback → greater progress → deeper learning.
To test the hypothesized relations in this model, several measures were calculated
from a randomly-selected subset of 700 explanations (each a single student explana-
tion attempt-tutor feedback pair) out of 3013 total explanations. Three students who
did not have at least 10% of their total number of explanations included in the 700
were removed because the explanations included might not represent an accurate
picture of their performance.
First, the quality of the system’s performance in classifying student explanations
was measured as the extent to which two human raters agreed with the classification
provided by the NLU. Each rater classified the 700 explanations by hand with respect
to the system’s explanation hierarchy and then their classifications were compared to
each other and to the system’s classification. Since each explanation could be as-
signed a set of labels, a partial credit system was developed to measure the similarity
between sets of labels. A formula to compute the distance between the categories
within the explanation hierarchy was used to establish a weighted measure of agree-
ment between the humans and the NLU. The closer the categories in the hierarchy,
the higher the agreement was rated (for more details, see [7]). The agreement between
the two human raters was 94% with a weighted kappa measurement [14] of .92. The
average agreement between the humans and the NLU was 87% with a weighted
kappa of .81.
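As an illustration of this kind of partial-credit matching (not the authors’ actual formula, which is detailed in [7]), the following sketch assumes a hypothetical hierarchy_distance function over the explanation hierarchy and computes a symmetric weighted agreement between two sets of labels:

from typing import Callable, Set

def label_similarity(a: str, b: str,
                     hierarchy_distance: Callable[[str, str], int],
                     max_distance: int = 4) -> float:
    """Partial credit for one pair of labels: 1.0 for an exact match,
    decreasing linearly with their distance in the explanation hierarchy."""
    return max(0.0, 1.0 - hierarchy_distance(a, b) / max_distance)

def set_agreement(labels_a: Set[str], labels_b: Set[str],
                  hierarchy_distance: Callable[[str, str], int]) -> float:
    """Weighted agreement between two label sets: match each label in one set
    with its best counterpart in the other, and average in both directions so
    the measure is symmetric."""
    if not labels_a or not labels_b:
        return 1.0 if labels_a == labels_b else 0.0
    def one_way(src, dst):
        return sum(max(label_similarity(a, b, hierarchy_distance) for b in dst)
                   for a in src) / len(src)
    return (one_way(labels_a, labels_b) + one_way(labels_b, labels_a)) / 2.0

# Toy usage with a hypothetical distance function over the hierarchy.
toy_distance = lambda a, b: 0 if a == b else 2
print(set_agreement({"triangle-sum"}, {"triangle-sum", "isosceles"}, toy_distance))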
Second, the feedback given by the tutor was graded independently by two human
raters. On a one-to-five scale, the quality of feedback was evaluated with respect to
the student’s response and the correct geometry rule. Feedback to partial explanations
was placed on the scale based on its appropriateness in assisting the student with cor-
recting his explanation, with 1 being totally unhelpful and 5 being entirely apropos.
Explanations that were complete yet were not accepted by the tutor, as well as expla-
nations that were not correct yet were accepted as such, were given a rating of one.
Responses where the tutor correctly acknowledged a complete and correct explana-
tion were given a five. The two raters had a weighted agreement kappa of .75, with
89% agreement.
Finally, the progress made by the student within a dialogue was assessed. Each of
the 700 explanations was paired with its subsequent student explanation attempt in
the dialogue and two human raters independently evaluated whether the second ex-
planation in each pair represented progress towards the correct explanation, compared
to the first. The raters were blind with respect to the tutor’s feedback that occurred in
between the two explanations. (That is, the feedback was not shown and thus could
not have influenced the ratings.) Responses were designated “Progress” if the student
advanced in the right direction (i.e., improved the explanation). “Progress & Regres-
sion” applied if the student made progress, but also removed a crucial aspect of the
back grade for each category, again illustrating that better feedback was followed by
greater progress.
Fig. 7. Best Fit Progress vs. Learning Gain
Finally, we looked at the last step in our model, greater progress → deeper learning.
Each student was given a single progress score by computing the percentage of
explanations labeled as “Progress.” Learning gain was computed as the commonly-
used measure (post – pre) / (1 – pre). While the relation between learning gain and
progress was not significant (r = .253, p > .1), we hypothesized that this may in part
be a result of greater progress by students with high pre-test scores, who may have
had lower learning gains because their scores were high to begin with. This hypothe-
sis was confirmed by doing a median split that divided the students at a pre-test score
of .46. The correlation was significant within the low pre-test group (r = .588, p < .05),
as seen in Fig. 7, but not within the high pre-test group (r = .031, p > .9). We also
examined the relation better feedback → deeper learning, which is a concatenation of
the last two steps in the model. The relation between learning gain and feedback grade
was statistically significant (r = .588, p < .01).
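As a small illustration (not code from the original study), the normalized gain and the median-split correlation analysis could be computed along these lines, assuming per-student arrays of pre-test scores, post-test scores, and progress percentages:

import numpy as np
from scipy.stats import pearsonr

def learning_gain(pre, post):
    """Normalized learning gain: fraction of the possible improvement realized.
    (Assumes pre-test scores below 1.0, so the denominator is nonzero.)"""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    return (post - pre) / (1.0 - pre)

def progress_gain_correlations(pre, post, progress):
    """Correlate per-student progress scores with normalized gain, overall and
    within a median split on the pre-test score."""
    pre, post, progress = map(np.asarray, (pre, post, progress))
    gain = learning_gain(pre, post)
    low = pre <= np.median(pre)          # median split on the pre-test
    return {
        "overall": pearsonr(progress, gain),
        "low_pretest": pearsonr(progress[low], gain[low]),
        "high_pretest": pearsonr(progress[~low], gain[~low]),
    }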
Merging the results of these separate analyses, we see that each step in the hy-
pothesized chain of relations, better NLU → better feedback → greater progress →
deeper learning, is supported by means of a statistically significant correlation. We
must stress, however, that the results are correlational, not causal. While it is tempting
to conclude that better NLU and better feedback cause greater learning, we cannot
rule out an alternative interpretation of the data, namely, that the better students
somehow were better able to stay away from situations in which the tutor gives poor
feedback. They might more quickly figure out how to use the tutor, facilitated per-
haps by better understanding of the geometry knowledge. Nonetheless, the results are
of significant practical value, as discussed further below.
In order to get a better sense of the type of dialogue that expands geometric knowl-
edge, we investigated whether there were any individual differences in students’ dia-
logues with the tutor and how such differences relate to students’ learning outcomes.
First we conducted a detailed study of the dialogues of four students in the Dialogue
condition. Two students were randomly selected from the quarter of students with the
highest learning gains, two from the quarter with the lowest learning gains. In re-
viewing these case studies, we observed that the low-improvement students often
referred to specific angles or specific angle measures in their explanations. For exam-
ple, one student’s first attempt at explaining the Triangle Sum rule is as follows: “I
added 154 to 26 and got 180 and that’s how many degrees are in a triangle.” In con-
trast, both high-improvement students often began their dialogue by referring to a
single problem feature such as “isosceles triangle.” In doing so, students first con-
firmed the correct feature using the feedback from the tutor, before attempting to ex-
press the complete rule.
Motivated by the case-study review, the dialogues of all students in the Dialogue
condition were coded for the occurrence of these phenomena. An explanation which
referred to the name of a specific angle or a specific angle measure was labeled
“problem-specific” and an explanation which named only a problem feature was la-
beled “incremental.” The sample of students was ordered by relative frequency of
problem-specific instances and split at the median to create a “problem-specific”
group and a “no-strategy” group. The same procedure was done again, on the basis of
the frequency of incremental instances, to create an “incremental” group and a “no-
strategy” group.
The effect of each strategy on learning gain was assessed using a 2×2 repeated-
measures ANOVA with the pre- and post-test scores as repeated measure and strategy
frequency (high/low) as independent factor (see Fig. 8). The effect of the incremental
strategy was not significant. However, the effect of the problem-specific strategy on
learning gain was significant (F(2,23) = 4.77, p < .05). Although the problem-specific
group had slightly higher pre-test scores than the no-strategy group, the no-strategy
group had significantly higher learning gains.
Fig. 8. Overall test scores (proportion correct) for frequent and infrequent users of the
problem-specific strategy
It was surprising that the incremental strategy, which was used relatively fre-
quently by the two high-improving students in the case studies, was not related to
learning gain in the overall sample. Apparently, incremental explanations are not as
closely tied to a deep understanding of geometry as expected. Perhaps some students
use this strategy to “game” the system, guessing at keywords until they receive posi-
tive feedback, but this cannot be confirmed from the present analysis.
On the other hand, students who used the problem-specific strategy frequently
ended up with lower learning gains. One explanation of this phenomenon may be that
the dialogues that involved problem-specific explanations tended to be longer, as il-
lustrated in Figure 9. The extended length of these dialogues may have contributed to
this group’s weaker learning gains. The problem-specific group averaged only 52.5
problems, compared to the no-strategy group’s average of 71 problems in the same
amount of time. An alternative explanation is that the problem-specific group could
be less capable, in general, than the no-strategy group, although the pre-test scores
revealed no difference. Problem-specific explanations might nonetheless reveal an
important aspect of student understanding: a reliance on superficial features might
indicate a weakness in students’ understanding of geometric structures and in their
ability to abstract.
Possibly, they illustrate the fact that students at different levels of geometric under-
standing “speak different languages” [15]. While the implications for the design of
the Geometry Explanation Tutor are not fully clear, it is interesting to observe that
students’ explanations reveal more than their pre-test scores.
6 Conclusion
The hypothesized advantage of dialogue on overall learning did not materialize, possi-
bly because the students in the sample were advanced students, as evidenced by high
pre-test scores, and thus there was not much room for improvement. It is possible also
that the hypothesized advantages of explaining in one’s own words did not material-
ize simply because it takes much time to explain.
Investigating relations between system functioning and student learning, we found
correlational evidence for the hypothesized chain of relations, better NLU → better
feedback → greater progress → deeper learning. Even though these results do not
show that the relations are causal, it is reasonable to concentrate further system devel-
opment efforts on the variables that correlate with student learning, such as progress
in dialogues with the system. Essentially, progress is a performance measure and is
easier to assess than students’ learning gains (no need for pre-test and post-test and
repeated exposure to the same geometry rules).
Good feedback correlates with students’ progress through the dialogues and with
learning. This finding suggests that students do utilize the system’s feedback and can
extract the information they need to improve their explanation. On the other hand,
students who received bad feedback regressed more often. Observation of the expla-
nation corpus suggests that other students recognized that bad feedback was not helpful
and tended to enter the same explanation a second time. Generally, students who (on av-
erage) received feedback of lesser quality had longer dialogues than students who
received feedback of higher quality (r = .49, p < .05). A study of the 10% longest
dialogues in the corpus revealed a recurrent pattern: stagnation (i.e., the repeated
turns in a dialogue in which the student did not make progress) followed either by a
“sudden jump” to the correct and complete explanation or by the teacher’s indicating
to the system that the explanation was acceptable (using a system feature added espe-
cially for this purpose). This analysis suggests that the tutor should be able to recover
better from periods of extended stagnation. Clearly, the system must detect stagnation
– relatively straightforward to do using its explanation hierarchy [6] – and provide
very directed feedback to help students recover.
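A minimal sketch of such stagnation detection, assuming each explanation attempt can be scored by its distance from the target rule in the explanation hierarchy (the function name and window size are illustrative, not taken from the system):

def detect_stagnation(attempt_distances, window=3):
    """Flag stagnation: `window` consecutive explanation attempts that fail to
    move closer to the target rule, where attempt_distances[i] is the distance
    of the i-th attempt from the complete, correct rule statement in the
    explanation hierarchy (0 = correct and complete)."""
    run = 0
    for prev, curr in zip(attempt_distances, attempt_distances[1:]):
        run = run + 1 if curr >= prev else 0   # this turn made no progress
        if run >= window:
            return True
    return False

# Example: three turns in a row with no improvement trigger the flag.
assert detect_stagnation([4, 4, 4, 4]) is True
assert detect_stagnation([4, 3, 2, 0]) is False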
The results indicate that accurate classification by the tutor’s NLU component (and
here we are justified in making a causal conclusion) is crucial to achieving good, pre-
cise feedback, although it is not sufficient – the system’s dialogue manager must also
keep up its end of the bargain. Efforts to improve the system focus on areas where the
NLU is not accurate and areas where the NLU is accurate but the feedback is not very
good, as detailed in [7, 12].
Finally, an analysis of the differences between students with better/worse learning
results found strategy differences between these two groups of students. Two specific
strategies were identified: an incremental strategy, in which students used system feed-
back first to get “in the right ballpark” with minimal effort and then expanded the
explanation, and a problem-specific strategy, in which students referred to specific
problem elements. Students who used the problem-specific expla-
nation strategy more frequently had lower learning gains. Further investigations are
needed to find out whether the use of the problem-specific strategy provides addi-
tional information about the student that is not apparent from their numeric answers
to problems and if so, how a tutorial dialogue system might take advantage of that
information.
Acknowledgements. The research reported in this paper has been supported by NSF
grants 9720359 and 0113864. We thank Jay Raspat of North Hills JHS for his in-
spired collaboration.
Student Question-Asking Patterns in an Intelligent Algebra Tutor
Abstract. Cognitive Tutors are proven effective learning environments, but are
still not as effective as one-on-one human tutoring. We describe an environ-
ment (ALPS) designed to engage students in question-asking during problem
solving. ALPS integrates Cognitive Tutors with Synthetic Interview (SI) tech-
nology, allowing students to type free-form questions and receive pre-recorded
video clip answers. We performed a Wizard-of-Oz study to evaluate the feasi-
bility of ALPS and to design the question-and-answer database for the SI. In
the study, a human tutor played the SI’s role, reading the students’ typed ques-
tions and answering over an audio/video channel. We examine the rate at which
students ask questions, the content of the questions, and the events that stimu-
late questions. We found that students ask questions in this paradigm at a
promising rate, but there is a need for further work in encouraging them to ask
deeper questions that may improve knowledge encoding and learning.
1 Introduction
Intelligent tutoring environments for problem solving have proven highly effective
learning environments [2,26]. These environments present complex, multi-step prob-
lems and provide the individualized support students need to complete them: step-by-
step accuracy feedback and context-specific problem-solving advice. Such environ-
ments have been shown to improve learning one standard deviation over conventional
classrooms, roughly a letter grade improvement. They are two or three times as ef-
fective as typical human tutors, but only half as effective as the best human tutors [7].
While intelligent problem-solving tutors are effective active problem-solving envi-
ronments, they can still become more effective active learning environments by en-
gaging students in active knowledge construction. In problem solving, students can
set shallow performance goals, focusing on getting the right answer, rather than
learning goals, focusing on developing knowledge that transfers to other problems
(c.f., [10]). Some successful efforts to foster deeper student learning have explored
plan scaffolding [18], and self-explanations of problem-solving steps [1]. We are
developing an environment intended to cultivate active learning by allowing students
to ask open-ended questions. Encouraging students to ask deep questions during
problem solving may alter their goals from performance-orientation toward learning-
orientation, perhaps ultimately yielding learning gains. Aleven & Koedinger [1]
showed that getting students to explain what they know helps learning; by extension,
getting students to explain what they don’t know may also help.
In this project, we integrate Cognitive Tutors, a successful problem-solving envi-
ronment, with Synthetic Interviews, a successful active inquiry environment, to create
ALPS, an “Active Learning in Problem Solving” environment. Synthetic Interviews
simulate face-to-face question-and-answer interactions. They allow students to type
questions and receive video clip answers. While others [4,12,13,21] are pursuing
various tutorial dialogue approaches that utilize natural language processing technol-
ogy, one advantage of Synthetic Interviews over these methods is that their creation
may be simpler. A long-term summative goal in this line of research is to determine
whether this strategy is as pedagogically effective as it is cost-effective. Before addressing this
goal, however, we first must address two important formative system-design goals,
which have not been explored in detail in the context of computer tutoring environ-
ments: to what extent will students, when given the opportunity, ask questions of a
computer tutor to aid themselves in problem solving, and what is the content of these
questions? This paper briefly describes the ALPS environment and then focuses on a
Wizard-of-Oz study designed to explore these formative issues.
open a channel for students to ask questions as the basis of such active-learning ac-
tivities.
also serves to collect student questions to populate the ALPS question and answer
databases.
3.1 Methods
Participants. Our participants were 10 middle school students (nine seventh graders,
one eighth grader; eight males, two females) from area schools. Two students had
used the standard Cognitive Tutor algebra curriculum in their classrooms that year,
three students had been exposed to Cognitive Tutors in a previous class session, and
five had never used Cognitive Tutors before.
Procedure. The study took place in a laboratory setting. The students completed
algebra and geometry problems in one session lasting one and a half hours. During a
session, the student sat at a computer running the Cognitive Tutor with a chat session
connected to the Wizard, who was sitting at a computer in another room. The students
were instructed to direct all questions to the Wizard in the other room via the chat
window. In a window on his own computer screen, the Wizard could see the student’s
screen and the questions the student typed. The Wizard responded to student ques-
tions via a microphone and video camera; the student heard his answer through the
computer speakers and saw the Wizard in a video window onscreen. Throughout
problem solving, if the student appeared to be having difficulty (e.g., either he made a
mistake on the same problem-solving action two or more times, or he did not perform
any problem-solving actions for a prolonged period), the Wizard prompted the stu-
dent to ask a question by saying “Do you want to ask a question?”
Measures. The data from the student sessions were recorded via screen capture soft-
ware. All student mouse and keyboard interactions were captured, as well as student
questions in the chat window and audio/video responses from the Wizard. The ses-
sions were later transcribed from the captured videos. All student actions were
marked and coded as “correct,” “error,” “typo,” or “interrupted” (when a student
began typing in a cell but interrupted himself to ask a question). Student utterances
were then separately coded by two of the authors along three dimensions based on the
research questions mentioned above: initiating participant (student or tutor); question
timing in the context of the problem-solving process (i.e., before or after errors or
actions); and question depth. After coding all 10 sessions along the three criteria, the
two coders met to resolve any disagreements. Out of 431 total utterances, disagree-
ment occurred in 12.5% of items; the judges discussed these to reach consensus.
Answer-oriented: These questions ask about the answer to a problem step or about a
concrete calculation by which a student may try to get the answer. The following
interaction occurred in a problem asking about the relationship among pay rate, hours
worked and total pay. An hourly wage of “$5 per hour” was given in the global
problem statement, and the student was answering the following question in the
worksheet: “You normally work 40 hours a week, but one particular week you take
off 9 hours to have a long weekend. How much money would you make that week?”
The student correctly typed “31” for the number of hours worked, but then typed “49”
(40 + 9) for the amount of money made. When the software turned this answer red,
indicating an error, the student asked, “Would I multiply 40 and 9?” The Wizard
asked the student to think about why he picked those numbers. The student answered,
“Because they are the only two numbers in the problem.”
Asking “Would I multiply 40 and 9?” essentially asks “Is the answer 360?” The
student wants the Wizard to tell him if he has the right answer, betraying his perform-
ance-orientation. The student is employing a superficial strategy: trying various op-
erators to arithmetically combine the two numbers (“40” and “9”) that appear in the
question. After the first step in this strategy (addition) fails, he asks the Wizard if
multiplication will yield the correct answer (he likely cannot calculate this in his
head). Rather than ask how to reason about the problem, he asks for the answer to be
given to him.
Process-oriented: These questions ask how to find an answer rather than what the an-
swer is. In one such exchange, a student typed “110” for the area of the rectangle and
asked, “How do you find the area of a
triangle?” The Wizard told him the general formula. In this case, the student correctly
understood what he was supposed to compute, but did not know the formula. He is
not asking to be told the answer, but instead how to find it. The Wizard’s general
answer can then help the student on future problems.
Principle-oriented: General principle-oriented questions show when the student is
moving beyond the current problem context and reasoning about the general mathe-
matical principles involved. We saw only one example of this type of question. It
took place after the student had finished computing the area and perimeter of a square
of side length 8 (area = 64, perimeter = 32). The student did not need help from the
Wizard while solving this problem. He typed “2s+2s” for the formula of a square’s
perimeter, and typed an expression for the formula of a square’s area. He then asked, “Is area
always double perimeter?” The student’s question signified a reflection on his prob-
lem-solving activities that prompted him to make a potential hypothesis about
mathematics. A future challenge is to encourage students to ask more of these kinds
of questions, actively engaging them in inquiry about domain principles.
Figures 1, 2, and 3 show the results from the analysis along three dimensions: initiat-
ing participant, question timing, and question depth. Error bars in all cases represent
the 95% confidence interval. Figure 1 shows the mean number of utterances per stu-
dent per hour that are prompted, unprompted, or part of a dialogue. “Unprompted”
(M= 14.44, SD=7.07) means the student asked a question without an explicit prompt
by the tutor. “Prompted” (M=3.49, SD=1.81) means the student asked after the Wiz-
ard prompted him, e.g., by saying “Do you want to ask a question?” “Dialogue re-
sponse” (M=11.80, SD=12.68) means the student made an utterance in direct re-
sponse to a question or statement by the Wizard, and “Other” (M=8.23, SD=5.04)
includes statements of technical difficulty or post-problem-solving discussions initi-
ated by the Wizard. The latter two categories are not included in further analyses.
Figure 1 shows that students asked questions at a rate of 14.44 unprompted ques-
tions per hour. Students asked approximately four times as many unprompted as
prompted questions (t(18)=4.74, p<.01). The number of prompted questions is
bounded by the number of prompts from the Wizard, but note that the number of
Wizard prompts per session (M=9.49, SD=2.65) significantly outnumbers the number
of prompted questions (t(18)=5.92, p<.01). Even when the Wizard explicitly prompts
students to ask questions, they often do not comply. This suggests that a question-
encouraging strategy in ALPS simply consisting of prompting will not be sufficient.
Figure 2 shows question timing with respect to the student’s problem-solving ac-
tions. “Before Action” (M=8.62, SD=6.26) means the student asked the question
about an action he was about to perform. “After Error” (M=8.46, SD=2.55) means the
student asked about an error he had just made or was in the process of resolving.
“After Correct Action” (M=0.85, SD=1.26) means the student asked about a step he
had just answered correctly. The graph shows that students on average ask signifi-
cantly fewer questions after having gotten a step right than in the other two cases
(t(28)=5.09, p<.01), revealing a bias toward treating the problem-solving experience
as a performance-oriented task. Once they obtain the right answer, students do not
generally reflect on what they have done. This suggests that students might need
encouragement after having finished a problem to think about what they have learned
and how the problem relates to other mathematical concepts they have encountered.
Fig. 2. Mean number of unprompted and prompted questions per hour by question timing
Figure 3 shows the mean number of questions grouped by question topic. “Inter-
face” (M= 10.21, SD=5.60) means the question concerned how to accomplish some-
thing in the software interface or how to interpret something that happened in the
software. “Definition” (M=0.97, SD=1.09) questions asked what a particular term
meant. “Answer” (M=4.98, SD=3.58), “Process” (M=1.68, SD=1.60), and “Principle”
(M=0.07, SD=0.23) questions are defined above. Figure 3 shows an emphasis on
interface questions; although one might attribute the high proportion of student inter-
face questions to the fact that half the participants were students who had not used the
Cognitive Tutor software before, the data show no reliable difference between the
two groups in question rate or content. Yet even among non-interface questions, one
can see that students still focus on “getting the answer right,” as shown by the large
proportion of answer-oriented questions. The difference between the number of
“shallow” questions (answer-oriented) and the number of “deep” questions (process-
oriented plus principle-oriented) is significant (t(28)=4.55, p<.01).
While Figure 2 shows that students on average ask questions before actions and
after errors at about the same rate, the type of question asked varies across the two
contexts. The distinction between the distributions of these two question contexts may
be revealing: asking a question before performing an action may imply forethought
and active problem solving, whereas asking only after an error could imply that the
student was not thinking critically about what he understood. Figure 4 displays a
breakdown of the interaction between question timing and the depth or topic. Based
on the data, when students ask questions before performing an action, they are most
likely to be asking about how to accomplish some action in the interface which they
are intending to perform. When they ask questions after an error, they are most often
asking about how to get the answer they could not get right on their own. The one
principle-oriented question was asked after a correct action and is not represented in
Figure 4.
Fig. 3. Mean number of unprompted or prompted questions per hour by perceived depth
Fig. 4. Comparison of distributions of “Before Action” and “After Error” questions based on
question depth. “After Correct Action” is not included due to low frequency of occurrence
Additional analysis shows that, of the questions that are “After Error” (102 total),
100% are directly about the error that the student has just made or is in the process of
resolving (i.e., through several steps guided by the Wizard). Of those that are “After
Correct Action” (9 total), 4 (44%) are requests for feedback about progress (e.g., “am
I doing ok so far?”), 4 (44%) are clarifications about how the interface works (e.g.,
“can I change my answers after I put them in?”) and only one (11%) is a process- or
principle-oriented query about general mathematics (e.g., “is area always double
perimeter?”). Thus it seems that, although students do take the opportunity to ask
questions, they do not generally try to elaborate their knowledge by asking deep
questions.
There are several ways we might encourage deeper questions. First, prior instruction
on how to structure deep questions can be designed. It has
been shown that training students to self-explain text when working on their own by
asking themselves questions improves learning [22]. By analogy, training students on
how to ask questions of a tutor may be effective in ALPS. Second, it may be possible
to progressively scaffold question-asking by initially providing a fixed set of appro-
priate questions in menu format, and later providing direct feedback and advice on
the questions students ask. It may also be possible to capitalize on shallow questions
students ask as raw material for these scaffolds; the system could suggest several
ways in which a student question is shallow and could be generalized. Finally, it may
be useful to emphasize post-problem review questions as well as problem-solving
questions. Katz and Allbritton [17] report that human tutors often employ post-
problem discussion to deepen understanding and facilitate transfer. Since students do
not have active performance goals at the conclusion of problem solving, it may be an
opportune time not just to invite, but to actively encourage and scaffold deeper ques-
tions.
5 Conclusions
The Wizard-of-Oz study allowed us to evaluate ALPS’ viability and identify design
challenges in supporting active learning via student-initiated questions. The study
successfully demonstrated that students ask questions in the ALPS environment at a
rate approaching that of one-on-one human tutoring. However, based on student
question content, we can conclude that students are still operating with performance
goals rather than learning goals. It may be that the students did not know how to ask
deep questions, or that the question-asking experience was too unstructured to en-
courage deep questions. There may be ways in which we can promote learning goals,
including using prompts specifically designed to elicit deeper questions, implement-
ing various deep-question scaffolds, encouraging reflective post-problem discussions,
and adding a speech recognizer to reduce cognitive load.
References
1. Aleven, V.A.W.M.M., Koedinger, K.R.: An Effective Metacognitive Strategy: Learning
by Doing and Explaining with a Computer-Based Cognitive Tutor. Cognitive Science
26 (2002) 147–179
2. Anderson, J.R., Corbett, A.T., Koedinger, K.R., Pelletier, R.: Cognitive Tutors: Lessons
Learned. Journal of the Learning Sciences 4 (1995) 167–207
3. Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z.: Off-task Behavior in the
Cognitive Tutor Classroom: When Students “Game the System.” Proc. CHI (2004) to ap-
pear
4. Carbonell, J.R.: AI in CAI: Artificial Intelligence Approach to Computer Assisted In-
struction. IEEE Trans. on Man-Machine Systems 11 (1970) 190–202
5. Chi, M.T.H., DeLeeuw, N., Chiu, M.-H., LaVancher, C.: Eliciting Self-explanations Im-
proves Understanding. Cognitive Science 18 (1994) 439–477
6. Chi, M.T.H., Siler, S.A., Jeong, H., Yamauchi, T., Hausmann, R.G.: Learning from Hu-
man Tutoring. Cognitive Science 25 (2001) 471–533
7. Corbett, A.T.: Cognitive Computer Tutors: Solving the Two-Sigma Problem. Proc. User
Modeling (2001) 137–147
8. Corbett, A.T., Koedinger, K.R., Hadley, W.H.: Cognitive Tutors: From the Research
Classroom to All Classrooms. In: P. Goodman (ed.): Technology Enhanced Learning: Op-
portunities for Change. L. Erlbaum, Mahwah New Jersey (2001) 235–263
9. Core, M.G., Moore, J.D., Zinn, C.: Initiative in Tutorial Dialogue. ITS Wkshp on Empiri-
cal Methods for Tutorial Dialogue Systems (2002) 46–55
10. Elliott, E.S., Dweck, C.S.: Goals: An Approach to Motivation and Achievement. Journal
of Personality and Social Psychology 54 (1988) 5–12
11. Freedman, R.: Atlas: A Plan Manager for Mixed-Initiative, Multimodal Dialogue. AAAI
Wkshp on Mixed-Initiative Intelligence (1999)
12. Freedman, R.: Degrees of Mixed-Initiative Interaction in an Intelligent Tutoring System.
AAAI Symposium on Computational Models for Mixed-Initiative Interaction (1997)
13. Graesser, A., Moreno, K.N., Marineau, J.C., Adcock, A.B., Olney, A.M., Person, N.K.:
AutoTutor Improves Deep Learning of Computer Literacy: Is it the Dialog or the Talking
Head? Proc. AIEd (2003) 47–54
14. Graesser, A.C., Person, N.K.: Question Asking During Tutoring. American Educational
Research Journal 31 (1994) 104–137
15. Hausmann, R.G.M., Chi, M.T.H.: Can a Computer Interface Support Self-explaining?
Cognitive Technology 7 (2002) 4–14
16. Jordan, P., Siler, S.: Student Initiative and Questioning Strategies in Computer-Mediated
Human Tutoring Dialogues. ITS Wkshp on Empirical Methods for Tutorial Dialogue
Systems (2002)
17. Katz, S., Allbritton, D.: Going Beyond the Problem Given: How Human Tutors Use Post-
Practice Discussions to Support Transfer. Proc. ITS (2002) 641–650
18. Lovett, M.C.: A Collaborative Convergence on Studying Reasoning Processes: A Case
Study in Statistics. In: Carver, S., Klahr, D. (eds.): Cognition and Instruction: Twenty-five
Years of Progress. L. Erlbaum, Mahwah New Jersey (2001) 347–384
19. Reeves, B., Nass, C.: The Media Equation: How People Treat Computers, Television, and
New Media Like Real People and Places. Cambridge University Press, Cambridge, UK
(1996)
20. Rosé, C.P., Gaydos, A., Hall, B.S., Roque, A., VanLehn, K.: Overcoming the Knowledge
Engineering Bottleneck for Understanding Student Language Input. Proc. AIEd (2003)
315–322
21. Rosé, C.P., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K., Weinstein, A.: Interactive
Conceptual Tutoring in Atlas-Andes. Proc. AIEd (2001) 256–266
22. Rosenshine, B., Meister, C., Chapman, S.: Teaching Students to Generate Questions: A
Review of the Intervention Studies. Review of Educational Research 66 (1996) 181–221
23. Salton, G., Buckley, C.: Term Weighting Approaches in Automatic Text Retrieval. Tech-
nical Report #87-881, Computer Science Dept, Cornell University, Ithaca, NY (1987)
24. Shah, F., Evens, M., Michael, J., Rovick, A.: Classifying Student Initiatives and Tutor
Responses in Human Keyboard-to-Keyboard Tutoring Sessions. Discourse Processes 33
(2002) 23–52
25. Stevens, S.M., Marinelli, D.: Synthetic Interviews: The Art of Creating a ‘Dyad’ Between
Humans and Machine-Based Characters. IEEE Wkshp on Interactive Voice Technology
for Telecommunications Applications (1998)
26. VanLehn, K., Lynch, C., Taylor, L., Weinstein, A., Shelby, R., Schulze, K., Treacy, D.,
Wintersgill, M.: Minimally Invasive Tutoring of Complex Physics Problem Solving. Proc.
ITS (2002) 367–376
Web-Based Intelligent Multimedia Tutoring for High
Stakes Achievement Tests
Abstract. We describe Wayang Outpost, a web-based ITS for the Math section of
the Scholastic Aptitude Test (SAT). It has several distinctive features: help with
multimedia animations and sound, problems embedded in narrative and fantasy
contexts, alternative teaching strategies for students of different mental rotation
abilities and memory retrieval speeds. Our work on adding intelligence for adap-
tivity is described. Evaluations show that students learn with the tutor, but that learning
depends on the interaction of teaching strategies and cognitive abilities. A new
adaptive tutor is being built based on evaluation results, survey results, and analyses
of students’ log files.
1 Introduction
High stakes achievement tests have become increasingly important in the past years
in the United States, and a student’s performance on such tests can have a significant
impact on his or her access to future educational opportunities. At the same time,
concern is growing that the use of high stakes achievement tests, such as the Scholas-
tic Aptitude Test (SAT)-Mathematics exam and others (e.g., the MCAS exam) simply
exacerbates existing group differences, and puts female students and those from tra-
ditionally underrepresented minority groups at a disadvantage. Studies have shown
that women generally perform less well than men on the SAT-M although their aca-
demic performance in college is similar (Wainer & Steinberg, 1992). Performance on
the SAT thus affects students’ access to educational opportunities such as admission
to universities and scholarships. New approaches are
required to help all students perform to the best of their ability on high stakes tests.
Intelligent tutoring systems can help students make rapid progress and dramatically
improve their performance in specific content areas.
Evaluation studies of ITS for school mathematics showed the benefits to student users
in school settings (Arroyo, 2003).
This paper describes “Wayang Outpost”, an Intelligent Tutoring System to prepare
students for the mathematics section of the SAT, an exam taken by students at the end
of high school in the United States. Wayang Outpost provides web-based access to
tutoring on SAT-Math (http://wayang.cs.umass.edu). Wayang Outpost is an im-
provement over other tutoring systems in several ways. First, although they can pro-
vide effective instruction, few ITS have really taken advantage of the instructional
possibilities of multimedia techniques in the help component, in terms of sound and
animation. Second, this paper describes our work on incorporating intelligence to
improve teaching effectiveness in various parts of the system: problem selection, hint
selection and student engagement. Third, although current ITS model the student’s
knowledge on an ongoing basis to provide effective help, there have been only pre-
liminary attempts to incorporate knowledge of student group characteristics (e.g.,
profile of cognitive skills, gender) into the tutor and to use this profile information to
guide instruction (Shute, 1995; Arroyo et al., 2000). Wayang Outpost addresses fac-
tors that have been associated with females scoring lower than males on these tests. It
is suspected that cognitive abilities such as spatial abilities and math fact retrieval are
important determinants of the score in these standardized tests. Math Fact retrieval is
a measure of a student’s proficiency with math facts, the probability that a student can
rapidly retrieve an answer to a simple math operation from memory. In some studies,
math fact retrieval was found to be an important source of gender differences in math
problems (Royer et al., 1999). Other studies found that when mental rotation ability
was statistically adjusted for, the significant gender difference in SAT-M disappeared
(Casey et al, 1995).
2 System Description
Wayang Outpost was designed as a supplement to high school geometry courses. Its
orientation is to help students learn to solve math word problems typical of those on
high stakes achievement tests, which may require the novel application of skills to
tackle unfamiliar problems. Wayang Outpost provides web-based instruction. The
student begins a session by logging into the site and receiving a problem. The setting
is an animated classroom based in a research station in Borneo, which provides rich
real world content for mathematical problems. Each math problem (a battery of SAT-
Math problems provided by the College Board) is presented as a flash movie, with
decisions about problem and hint selection made on the server (the tutor’s “brain”). If
the student answers incorrectly, or requests help, step-by-step guidance is provided in
the form of Flash animations with audio (see figure 1). The explanations and hints
provided in Wayang Outpost therefore resemble what a human teacher might provide
when explaining a solution to a student, e.g., by drawing, pointing, highlighting criti-
cal parts of geometry figures, and talking, in contrast to previous ITS that relied
heavily on static text and images.
Cognitive skills assessment. Past research suggests that the assessment of cognitive
skills is relevant to selecting teaching strategies or external representations that yield
best learning results. For instance, a study of students’ level of cognitive development
in AnimalWatch suggested that hints that use concrete materials in the explanations
yield higher learning than those which explain the solution with numerical procedures
for students at early cognitive development stages (Arroyo et al., 2000). Thus, Way-
ang Outpost also functions as a research test bed to investigate the interaction of gen-
der and cognitive skills in mathematics problem solving, and in selecting the best
pedagogical approach. The site includes integrated on-line assessments of component
cognitive skills known to correlate with mathematics achievement, including an as-
sessment of the student’s proficiency with math facts, indicating the degree of fluency
(accuracy and speed) of arithmetic computation (Royer et al., 1999), and spatial abil-
ity, as indicated by performance on a standard assessment of mental rotation skill
(Vandenberg et al., 1978). Both tests have captured gender differences in the past.
Help in Wayang Outpost. Each geometry problem in Wayang is linked to two alter-
native types of hints, following different strategies to solving the problem: one strat-
egy provides a computational and numeric approach and the second provides spatial
transformations and visual estimations, generally encompassing a spatial “trick” that
makes the problem simpler to solve. An example is shown in Figure 1. The choice of
hint type should be customized for individual students on the basis of their cognitive
profile, to help them develop strategies and approaches that may be more effective for
particular problems. For example, students who score low on the spatial ability as-
sessment might receive a high proportion of hints that emphasize mental rotation and
estimation, approaches that students of poor spatial ability may not apply even though
they are generally more effective in a timed testing situation. This is a major hypothe-
sis we have evaluated, and the findings are described in the evaluation section.
As the student works through a problem, performance data (e.g., latency, answer
choice, hints requested) are stored in a centralized database. These raw data about stu-
dent interactions with the system feed all our intelligent modules: to select problems
at the appropriate level of challenge, to choose hints that will be helpful for the stu-
dent, and to detect negative attitudes towards help and the tutoring system in general.
Major difficulties in building a student model for standardized testing include the
fact that we start without a clear idea of either problem difficulty or which skills
should be taught. Skills are sparse across problems, so there is a high degree of un-
certainty in the estimation of students’ knowledge. This is different from the design
of most other tutoring systems: generally, the ITS designer knows the topics to be
taught, and then needs to create the content and pedagogy. In the case of standardized
testing, the content is given, without a clear indication of the underlying skills. The
only clear goal is to have students improve their achievement in these types of prob-
lems. Although clear indicators of learning have been observed, a more effective Way-
ang Outpost is being built by adapting the tutor’s decisions in various parts of the
system. We are adding artificial intelligence for adaptivity in the following tutoring
decisions:
Problem selection. Problems in Wayang are expensive to build, as the help is so-
phisticated (using animations and sound), and each problem is very different from the
others, making it hard to show a problem more than twice with different arguments
without having students get the impression that it is “the same prob-
lem again”. The result is that we cannot afford the construction of hundreds or thou-
sands of problems, so that certain problems can be used and others discarded. Be-
cause Wayang Outpost currently contains 70 distinct problems, the reality is that a
sophisticated algorithm that uses skill mastery levels to determine the appropriate
skills that a problem should contain is not necessary at this stage. However, we be-
lieve some form of intelligent problem selection would be beneficial. We have thus
implemented an algorithm to optimize word problem “ordering”, a pedagogical agent
whose goal is to show a problem where the student will behave slightly worse than
the average behavior expected for the problem (in terms of mistakes made and hints
seen). Expected values of behavior at a problem are computed from log files of prior
users of the system (who used random problem selection). The agent keeps a “de-
sired problem difficulty” factor for the next problem. The next problem selected is the
one that has the closest difficulty to the desired difficulty, which changes after every
solved problem: when the student behaves better than what is expected for the prob-
lem (based on log files’ data of past users), the “desired problem difficulty” factor
increases. Otherwise, it decreases, and thus the next problem will be easier.
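A rough sketch of this selection scheme is given below; the update step size, the [0, 1] difficulty scale, and the function names are assumptions for illustration rather than the system’s actual implementation:

def update_desired_difficulty(desired, observed, expected, step=0.1):
    """Raise the 'desired problem difficulty' factor when the student behaved
    better than the average behavior expected for the problem (fewer mistakes,
    fewer hints), and lower it otherwise."""
    desired += step if observed > expected else -step
    return min(1.0, max(0.0, desired))

def select_next_problem(difficulties, desired, seen):
    """Pick the unseen problem whose empirical difficulty (estimated from past
    users' log files) is closest to the desired difficulty.
    `difficulties` maps problem id -> difficulty in [0, 1]."""
    candidates = {p: d for p, d in difficulties.items() if p not in seen}
    return min(candidates, key=lambda p: abs(candidates[p] - desired))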
Level of information in hints. When the student asks for help, a hint explains a step
in the solution. Sequences of hints explain the full solution to the problem when stu-
dents keep clicking for help. However, hints have been designed to be “skipped”, in
that each hint contains a summary of the previous steps. Thus, skipping a hint implies
providing minimal information about the step (e.g. if a student clicks for help and the
first hint is skipped, the second hint shown will provide a short static summary of the
first step and the full explanation for the second step in the solution using multime-
dia). Martin & Arroyo (2004) present the results of experiments with simulated stu-
dents, which showed how a Reinforcement Learning agent can learn how to “skip”
hints that don’t seem useful. A more efficient Wayang Outpost will be built by pro-
viding only those hints that seem “useful”. The agent learns the usefulness of hints by
rewarding highly those hints that lead the student to an answer and punishing those
hints that lead to incorrect answers or make the students ask for more help.
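The reward scheme described above could be sketched as a simple running estimate of hint usefulness; the reward magnitudes, learning rate, and skip threshold below are illustrative assumptions and not the actual values used by the AgentX agent of Martin & Arroyo (2004):

# Illustrative reward values; the rewards actually used by the agent are not
# specified here.
HINT_REWARDS = {"answered_correctly": 1.0,
                "answered_incorrectly": -1.0,
                "asked_more_help": -0.5}

def update_hint_value(value, outcome, alpha=0.2):
    """Incrementally update a hint's estimated usefulness from what the student
    did immediately after seeing it (a simple running estimate, not the full
    reinforcement-learning agent)."""
    return value + alpha * (HINT_REWARDS[outcome] - value)

def should_skip_hint(value, threshold=0.0):
    """Skip hints whose learned usefulness falls below a threshold; a skipped
    hint is replaced by the short static summary embedded in the next hint."""
    return value < threshold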
Attitudes inference. There is growing evidence that students may have non-optimal
help seeking behaviors, and that they seek and react to help depending on student
motivation, gender, past experience and other factors (Aleven et al, 2003). We found
that students’ negative attitudes towards help and the system are detrimental to learn-
ing, and that these attitudes are correlated to specific behaviors with the tutor such as
time spent on hints, problems seen per minute, hints seen per problem, standard de-
viation of hints asked per problem, etc. We created a Bayesian Network from stu-
dents’ log files and surveys about attitudes towards the system, with the purpose of
making inferences of students’ attitudes and beliefs while students use the system, and
we proposed remedial actions when specific attitudes are detected (Arroyo et al.,
2004).
4 Evaluation Studies
We tested the relevance of students’ cognitive strengths (e.g., math fact retrieval
speed and mental rotation abilities) to the effective selection of pedagogies described
in previous sections, to evaluate the worth of adapting help strategy selection to basic
cognitive abilities of each student. As described in the previous sections, two help
strategies were provided by the tutor, emphasizing either spatial or computational
approaches to the solution. The question that arises immediately is whether the help
component should capitalize on or compensate for a student’s cognitive strengths. Is the
spatial approach effective for students with high spatial ability (because it capitalizes
on their cognitive strengths) or for those with low spatial ability (because it compen-
sates for their cognitive weaknesses)? Is the computational help better for students
with high mathematics facts accuracy and retrieval speed from memory (because it
capitalizes on the fast retrieval of arithmetic facts), or is it better for students with low
speed of math fact retrieval (because it trains them in the retrieval of facts)? Given a
specific cognitive profile, what type of help should be provided to the student?
Two studies were carried out in rural and urban area schools in Massachusetts. In
each of the studies, students were randomly assigned to two different versions of the
system: one providing spatial help, the other providing computational help. Students
took a computer-based mental rotation test and also a computer-based test that as-
sessed a student’s speed and accuracy in determining whether simple mathematics
facts were true or false (Royer et al., 1999).
In the first study, 95 students were involved, 75% of them female. There were no pre-
and post-test data, so learning was captured with a ‘Learning Factor’ that describes how
students decrease their need for help in subsequent problems during the tutoring ses-
sion, on average. This measure should be higher when students learn more. See a
description of this measure (which can be higher than 100%) in (Arroyo et al., 2004).
Students used Wayang Outpost for about 2 hours. Students also used the adventures
of the system for about an hour. After that, students were given a survey asking for
feedback about the system and evaluating their willingness to use the system again.
The second study involved 95 students in an urban area school in Massachusetts, who
used the tutoring system in the same way for about the same amount of time. These
students were also given the cognitive skills pretest and a post-tutor survey asking
about perceptions of the system.
4.2 Results
In the first study, we found a significant gender difference in spatial ability, specifi-
cally a significant difference in the number of correct responses (independent samples
t-test, t=2, p=0.05), with females giving significantly fewer correct answers than males.
Females also spent more time on each test item, though not significantly more. We did
not find differences for the math fact retrieval test in this experiment, for either ac-
curacy or speed. In the second study, we found a significant gender difference in
math fact accuracy (females scoring higher than males). We did not find, however, a
gender difference in retrieval speed in either study, a difference that other
authors have found (Royer et al., 1999). We created a variable that combined accuracy and
speed to generate an overall score of math fact retrieval ability and spatial ability. By
classifying students into high and low spatial and math fact retrieval ability (by split-
ting at the median score), we established a 2x2x2 design to test the impact of hints
and cognitive abilities on students’ learning, with a group size of 11-15 students.
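One plausible way to form such a combined score and the high/low split is sketched below; the paper does not give the exact combination formula, so the z-score average here is an assumption:

import numpy as np

def composite_score(accuracy, speed):
    """Combine accuracy and speed into a single retrieval-ability score by
    z-scoring each measure and averaging; this is only one plausible choice."""
    z = lambda x: (np.asarray(x, float) - np.mean(x)) / np.std(x)
    return (z(accuracy) + z(speed)) / 2.0

def median_split(scores):
    """High/low group assignment at the median, as used to form the
    2x2x2 (spatial ability x retrieval ability x hint type) design."""
    scores = np.asarray(scores)
    return scores > np.median(scores)   # True = 'high' group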
In the Fall 2003 study, significant interaction effects were found between cognitive
abilities and teaching strategies in predicting learning, based on an ANOVA. An
interaction effect between mental rotation and the type of help was
found (F=3.5, p=0.06; Figure 2, Table 1). The means in this study suggest that hints
capitalize on students’ mental rotation: when a student has low spatial abilities,
learning is higher with computational help, and when the student has high spatial
ability, hints that teach with spatial transformations produce the most learning.
In the second study, pre and posttest improvements were used as a measure of
learning. A significant overall difference in percentage of questions answered cor-
rectly from pre- to post-test was found, F(1,95)=20.20, p < .001. Students showed an
average 27% increase of their pre-test score at post-test time after 2 hours of using the
tutor. An ANOVA revealed an interaction effect between type of hint, gender and
math fact retrieval in predicting pre to posttest score increase (F(1,73)=4.88, p=0.03),
suggesting that girls are prone to capitalize on their math fact re-
trieval ability while boys are not (Table 2). Girls with low math fact retrieval do not
improve their score when exposed to computational hints, while they do improve
when exposed to spatial hints. A similar ANOVA just for boys gave no significant
interaction effect between hint type and math fact retrieval, while another one just for
girls showed a stronger effect (F(1,41)=5.0, p=0.03). The effect is described in fig-
ure 3.
In the first study, the spatial dimension was more relevant than the math fact re-
trieval dimension, while in the second study, math fact retrieval was more important
than spatial abilities, despite the fact that students had similar scores on average in the
two studies. Despite these disparities, both results are consistent in suggesting that the
system should provide teaching strategies that capitalize on the student’s cognitive
strengths whenever one cognitive ability is stronger than the other.
Fantasy component. A second goal in our evaluation studies was to find whether the
fantasy component in the adventures had differential effects on the motivation of girls
and boys to use the system, given the female-friendly characteristics of the fantasy
context and the female role models. After using the plain tutor with no fantasy com-
ponent, we asked students whether they would want to use the system again. Students
then used the adventures (SAT problems embedded in adventures with narratives
about orangutans and female scientists) after using the plain tutor and we then asked
them again whether they would want to use the system. In both occasions, students
were asked how many more times they would like to use the Wayang system (1 to 5
scale), from would not use it again (1) to as many times as possible (5).
In the first study, we found a significant gender difference in willingness to return
to use the fantasy component of the system (independent samples t-test, t=2.2,
p=0.04), boys willing to return to the “adventures” less than girls. This effect was
repeated in the second study (t-test, t=2.2, p=0.03). This suggests that girls enjoyed
the adventures more than boys did, possibly because they identified more with the
female characters; there was no significant difference in willingness to return to the
plain tutor section with no fantasy component. Again, the adventures
section seems to capture females’ attention more than males’, while the plain tutor
attracts both genders equally. However, significant independent samples t-tests indi-
cated that girls liked the overall system more than boys did, took it more seriously,
found the help more useful, and listened to the audio explanations more often.
Fig. 2. Learning with two different teaching strategies in the Fall 2003 study.
Fig. 3. Learning with two different teaching strategies in the 2004 study (girls only).
5 Summary
We have described Wayang Outpost, a tutoring system for the mathematics section of
the SAT (Scholastic Aptitude Test). We described how we are adding intelligence for
adaptive behavior in different parts of the system. Girls are especially motivated to
use the fantasy component. The tutor was beneficial for students in general, with high
improvements from pre to posttest. However, results suggest that adapting the pro-
vided hints to students’ cognitive skills yields higher learning. Students with low-
spatial and high-retrieval profiles learn more with computational help (using arithme-
tic, formulas and equations), and students with high-spatial and low-retrieval profiles
learn more with spatial explanations (spatial tricks and visual estimations of angles
and lengths). These abilities may be diagnosed with pretests before starting to use the
system. Future work involves evaluating the impact of cognitive skills training on
students’ achievement with the tutor, and evaluating the intelligent adaptive tutor.
References
Arroyo, I.; Beck, J.; Woolf, B.; Beal., C; Schultz, K. (2000) Macroadapting Animalwatch to
gender and cognitive differences with respect to hint interactivity and symbolism. Proceed-
ings of the Fifth International Conference on Intelligent Tutoring Systems.
Arroyo, I. (2003). Quantitative evaluation of gender differences, cognitive development differ-
ences and software effectiveness for an elementary mathematics intelligent tutoring system.
Doctoral dissertation. UMass Amherst.
Arroyo, I., Murray, T., Woolf, B.P., Beal, C.R. (2004) Inferring unobservable learning vari-
ables from students’ help seeking behavior. This volume.
Casey, N.B.; Nuttall, R.; Pezaris, E.; Benbow, C. (1995). The influence of spatial ability on
gender differences in math college entrance test scores across diverse samples. Develop-
mental Psychology, 31, 697-705.
Royer, J.M., Tronsky, L.N., Chan, Y., Jackson, S.J., Merchant, H. (1999). Math fact retrieval
as the cognitive mechanism underlying gender differences in math test performance. Con-
temporary Educational Psychology, 24.
Shute, V. (1995). SMART: Student Modeling Approach for Responsive Tutoring. In User
Modeling and User-Adapted Interaction. 5:1-44.
Martin, K., Arroyo, I. (2004). AgentX: Using Reinforcement Learning to Improve the Effec-
tiveness of Intelligent Tutoring Systems. This volume.
Vandenberg, S. G., & Kuse, A. R. (1978). Mental Rotations: A Group Test of Three-
Dimensional Spatial Visualization. Perceptual and Motor Skills, 47, 599-604.
Wainer, H.; Steinberg, L. S. Sex differences in performance on the mathematics section of the
Scholastic Aptitude Test: a bidirectional validity study, Harvard Educational Review 62 no.
3 (1992), 323-336.
Can Automated Questions Scaffold Children’s Reading
Comprehension?
1 Now at University of Southern California Law School, Los Angeles, CA 90089.
Can such interventions be automated? Are the automated versions effective? How can
we tell?
We investigate these questions in the context of Project LISTEN’s Reading Tutor,
which listens to children read aloud, and helps them learn to read [7]. During the
2002-2003 school year, children used the Reading Tutor daily on some 180 Win-
dows™ computers in nine public schools.
The aspect of the 2002-2003 version relevant to this study was its ability to insert
questions when children read. The Reading Tutor presented text incrementally, add-
ing one sentence (or fragment) at a time. Before doing so, it could interrupt the story
to present a multiple-choice question. It displayed a prompt and a menu of choices,
and read them both aloud to the student using digitized human speech, highlighting
each menu item in turn. The student chose a response by clicking on it. The Reading
Tutor then continued, giving the student spoken feedback on whether the answer was
correct, at least when it could tell. We tried to avoid free-response typed input since, aside from the difficulty of scoring responses, students using the Reading Tutor are too young to be skilled typists. In other experiments, students averaged 30 seconds to type a single word. Requiring typed responses would be far too time-consuming.
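A small sketch of the question-insertion flow just described is given below; the class and function names are illustrative assumptions, not the Reading Tutor's implementation, and speech output and menu highlighting are reduced to comments.

# Sketch of inserting a multiple-choice question before the next sentence.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MultipleChoiceQuestion:
    prompt: str
    choices: List[str]
    correct_index: Optional[int]   # None when the tutor cannot score the answer

def present_sentence(sentence: str, question: Optional[MultipleChoiceQuestion]) -> None:
    if question is not None:
        print(question.prompt)                     # also read aloud in the real tutor
        for i, choice in enumerate(question.choices):
            print(f"  {i + 1}. {choice}")          # each item highlighted as it is read
        picked = int(input("Your choice: ")) - 1   # the student clicks a menu item
        if question.correct_index is not None:     # feedback only when scorable
            print("Correct!" if picked == question.correct_index else "Not quite.")
    print(sentence)                                # then display the next sentence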
This paper investigates three research issues:
What kinds of automated questions assist children’s reading comprehension?
Are their benefits within a story cumulative or transient?
At what point do questions frustrate students?
Section 2 describes the automated questions. Section 3 describes our methodology
and data. Section 4 reports results for the three research issues. Section 5 concludes.
sion, excluding students with fewer than 10 non-hasty sentence prediction responses.
The correlation was only 0.03, indicating that they were not a valid test of compre-
hension. In contrast, Mostow et al. [8] had already shown that performance on auto-
mated cloze questions in the 2001-2002 version of the Reading Tutor predicted Pas-
sage Comprehension at R=0.5 for raw % correct, and at R=0.85 in a model that in-
cluded the effects of item difficulty of story level and word type. We didn’t regener-
ate such a model for the 2003 data, but we confirmed that it showed a similar correla-
tion of raw cloze performance to test scores.
Note that the same cloze question operated both as an intervention that might scaf-
fold comprehension, and as a local outcome measure of the preceding interventions.
We use the terms “cloze intervention” and “test question” to distinguish these roles.
Fig. 1 shows the number of recent interventions before 15,196 cloze test items.
We operationalize “recent” as “within the past two minutes,” based on our initial
analysis, which suggested a two-minute window for effects on cloze performance.
The model included three covariates to represent possible temporal effects at dif-
ferent scales. To model improvement over the course of the year, we included the
month when the question was asked. To model changes in comprehension over the
course of the story, we included the time elapsed since the story started. To model
effects of interruption, we included the time since the most recent Reading Tutor
question.
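The sketch below illustrates a logistic regression of this general shape, with the three temporal covariates plus a count of interventions in the recent two-minute window; the column names and the simulated data are assumptions, not the study's dataset or analysis code.

# Sketch of a logistic regression on cloze correctness with temporal covariates.
# All column names and the simulated data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500   # small stand-in for the 15,196 cloze test items
df = pd.DataFrame({
    "month": rng.integers(1, 13, n),                 # month the question was asked
    "secs_in_story": rng.uniform(0, 600, n),         # time since the story started
    "secs_since_q": rng.uniform(0, 300, n),          # time since the last question
    "recent_interventions": rng.integers(0, 5, n),   # interventions in the past 2 minutes
})
logit_p = -0.2 + 0.3 * df["recent_interventions"] - 0.001 * df["secs_in_story"]
df["correct"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))   # simulated outcome

model = smf.logit("correct ~ month + secs_in_story + secs_since_q"
                  " + recent_interventions", data=df).fit(disp=False)
print(model.params)   # each beta is the change in log odds per unit increase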
4 Results
Table 2 shows which predictor variables in the logistic regression model affected
cloze test performance. As expected, student identity and test question type were
highly significant. The beta value for a covariate shows how an increase of 1 in the
value of the covariate affects the log odds of the outcome. Thus the increasingly
negative beta values for successive test question types reflect their increasing diffi-
culty. These beta values are not normalized and hence should not be compared to
measure effect size. The p values give the significance of each predictor variable after
controlling for the other predictors.
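In standard logistic-regression terms (our gloss, not a formula taken from the paper), the model and the reading of a coefficient are:

\[
\log\frac{p}{1-p} = \beta_0 + \sum_i \beta_i x_i ,
\qquad
\frac{\mathrm{odds}(x_i + 1)}{\mathrm{odds}(x_i)} = e^{\beta_i} ,
\]

so a one-unit increase in covariate x_i multiplies the odds of a correct cloze response by e^{beta_i}; the increasingly negative betas for successive question types therefore correspond to smaller odds of success, i.e., greater difficulty.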
Generic questions force readers to carry more of the load than do text-specific
questions. Is this extra burden on the student’s working memory worthwhile [5] or a
hindrance [2]? Generic 3W questions, which let students figure out how a question
relates to the current context, had a positive effect. Cloze interventions, which are
sentence-specific and more explicitly related to the text, did not.
What about feedback? One might expect questions to help more when students are
told if their answers are correct. One reason is cognitive: the feedback itself may
improve comprehension by flagging misconceptions. Another reason is motivational:
students might consider a question more seriously if they receive feedback.
Despite the lack of such feedback, 3W questions bolstered comprehension of later
sentences. Despite providing such feedback, cloze interventions did not help. Evi-
dently the advantages of 3W questions sufficed to overcome their lack of feedback.
Fig. 3. Blowoff rate versus time (in seconds) since previous question
Our analyses illustrate some advantages of networked tutors and storing student-
tutor interactions in a database. The ability to easily combine data from many students
and analyze information as recent as the previous day is very powerful. Capturing
interactions in a suitable database representation makes them easier to integrate with
other data and to analyze [9].
One theme of this research is to focus the AI where it can help the most, starting
with the lowest-hanging fruit. Rather than trying to generate sophisticated questions
or understand children’s spoken answers, we instead focused on when to ask simpler,
generic questions. What stories are most appropriate for question asking? What is an
opportune time to ask questions? There are many ways to apply language technolo-
gies to reading comprehension, some of which may turn out to be feasible and benefi-
cial. However, what ultimately matters is the student’s reading comprehension, not
the computer’s. The Reading Tutor cannot evaluate student answers to some types of
questions it asks, but by asking them can nevertheless assist students’ comprehension.
The analysis methods presented here may one day enable it to measure the effects of those questions in real time.
References
1. Aist, G., Towards automatic glossarization: Automatically constructing and administer-
ing vocabulary assistance factoids and multiple-choice assessment. International Journal
of Artificial Intelligence in Education, 2001. 12: p. 212-231.
2. Anderson, J.R., Rules of the mind. 1993, Hillsdale, NJ: Lawrence Erlbaum Associates.
3. Beck, J.E., J. Mostow, A. Cuneo, and J. Bey. Can automated questioning help children’s
reading comprehension? in Proceedings of the Tenth International Conference on Artifi-
cial Intelligence in Education (AIED2003). 2003. p. 380-382. Sydney, Australia.
4. Brandão, A.C.P. and J. Oakhill. “How do we know the answer?” Children’s use of text
data and general knowledge in story comprehension. in Society for the Scientific Study of
Reading 2002 Conference. 2002. The Palmer House Hilton, Chicago.
5. Kashihara, A., A. Sugano, K. Matsumura, and T. Hirashima. A Cognitive Load Applica-
tion Approach to Tutoring. in Proceedings of the Fourth International Conference on
User Modeling. 1994. p. 163-168.
6. Menard, S., Applied Logistic Regression Analysis. Quantitative Applications in the Social
Sciences, 1995. 106.
7. Mostow, J. and G. Aist, Evaluating tutors that listen: An overview of Project LISTEN, in
Smart Machines in Education, K. Forbus and P. Feltovich, Editors. 2001, MIT/AAAI
Press: Menlo Park, CA. p. 169-234.
8. Mostow, J., J. Beck, J. Bey, A. Cuneo, J. Sison, B. Tobin, and J. Valeri, Using automated
questions to assess reading comprehension, vocabulary, and effects of tutorial interven-
tions. Technology, Instruction, Cognition and Learning, to appear. 2.
9. Mostow, J., J. Beck, R. Chalasani, A. Cuneo, and P. Jia. Viewing and Analyzing Multimo-
dal Human-computer Tutorial Dialogue: A Database Approach. in Proceedings of the
Fourth IEEE International Conference on Multimodal Interfaces (ICMI 2002). 2002. p. 129-134. Pittsburgh, PA: IEEE.
10. NRP, Report of the National Reading Panel. Teaching children to read: An evidence-
based assessment of the scientific research literature on reading and its implications for
reading instruction. 2000, National Institute of Child Health & Human Development:
Washington, DC.
11. Rosenshine, B., C. Meister, and S. Chapman, Teaching students to generate questions: A
review of the intervention studies. Review of Educational Research, 1996. 66(2): p. 181-
221.
Web-Based Evaluations Showing Differential Learning
for Tutorial Strategies Employed by the Ms. Lindquist
Tutor
Abstract. In a previous study, Heffernan and Koedinger [6] reported on the Ms. Lindquist tutoring system, which uses dialog, and Heffernan conducted a web-based evaluation [7]. The previous evaluation considered students coming from three separate teachers and analyzed individual learning gains based on the number of problems completed under each tutoring strategy. This paper examines a set of new web-based experiments. One set of experiments is targeted at determining whether a differential learning gain exists between two of the tutoring strategies provided. Another set of experiments is used to determine whether student motivation depends on the tutoring strategy. We replicate some findings from [7] with regard to the learning and motivation benefits of Ms. Lindquist’s intelligent tutorial dialog. The experiments related to learning report on over 1,000 participants contributing at most 20 minutes each, for a combined total of more than 200 student hours.
1 Introduction
Several groups of researchers are working on incorporating dialog into tutoring sys-
tems: for instance, CIRCSIM-tutor [3], AutoTutor [4], the PACT Geometry Tutor [1],
and Atlas-Andes [8]. The value of dialog in learning is still controversial because
dialog takes up precious time that might be better spent telling students the answer
and moving on to another problem.
In previous work, Heffernan and Koedinger [6] reported on the Ms. Lindquist tutoring system, which uses dialog, and Heffernan [7] conducted a web-based evaluation using the students of one classroom teacher. This paper reports on some additional web-based evaluations using students from multiple teachers. Ms. Lindquist was the first model-tracing tutor that had both a model of student thinking and a model of tutorial planning [5]. The Ms. Lindquist tutoring system helps students
become proficient in writing expressions for algebra word problems. This system is
of the “coached practice” variety that does not offer explicit instruction (i.e., long web
pages or lectures), but instead is meant to scaffold “learning by doing” while students
practice their problem solving skills. An assumption in the development of this sys-
tem was that students would learn more if they could have an intelligent dialog rather
than simply being told the answer. In earlier work [5], we showed that students
learned more when they engaged in a dialog with the Ms. Lindquist tutoring system,
but did not investigate if it was worth the extra time spent. Later we reported some
web-based results [7] that suggested a motivation benefit, as well as some learning benefit, from using the more intelligent dialogs that Ms. Lindquist provided, but the evidence was not unequivocal.
This paper reports the latest results of experiments that were conducted with stu-
dents using the tutor via the internet (at www.AlgebraTutor.org). Each student who
visits the web site is informed that they are taking part in an experiment. Then they
are randomly assigned to a condition in the experiment. There are several experi-
mental conditions in these experiments, each of them representing different types of
“intelligent” feedback that involve a tutorial dialog. These tutorial strategies are based
upon observations of an experienced tutor [5]. For instance, the “Concrete Articulation” strategy conducts a dialog based upon the idea of using a concrete instance. The following is a necessarily short example of the Concrete Articulation strategy:
Tutor: In the marching band there are 7 fewer flute players than drummers. If the
number of drummers is “x”, write an expression for the total number of flute play-
ers and drummers.
Student: 7-x+x {Correct answer is x+(x-7)}
Tutor: No. We will come back to the total number of flute players and drummers.
Let me try to break this down for you. Please compute the number of flute players
if the number of drummers was 10 drummers? {Note: Ms. Lindquist notices that the student computed the number of flute players wrong, so she focuses in on that subgoal.}
Student: 10-7
Tutor: Correct. Now use “x” instead of 10 to again express the number of flute
players.
Student: x-7
Tutor: Correct. Now, let’s try this question again: Please write an expression for
the total number of flute players and drummers.
Student: x-7+x
Tutor: Correct. Let us go onto the next problem.
The next example will illustrate the “Verbal” strategy that Ms. Lindquist uses based
on the idea that students are more accustomed to expressing ideas in English than in
algebraic expressions. It makes sense to first ask the student to explain in English
how to achieve a goal and then to ask them how to do it using algebraic symbols
afterwards. The following dialog illustrates the Verbal Strategy:
Tutor: Anne is rowing a boat in a lake and is 800 yards from the dock from which
she started. She rows back towards the dock at 40 yards per minute for “m” min-
utes and stops to rest. How far is she from the dock now?
Student: {Correct answer is 800 - 40m}
Tutor: Hmm. No. We will come back to the distance Anne has left to row. Let me
try to break this down for you. Can you explain how you would find the distance
rowed so far? The distance rowed so far is equal to:
A menu is displayed with three drop-down boxes containing the following items:

  [Phrase]                                   [Operator]    [Phrase]
  her speed rowing                           Plus          her speed rowing
  the distance Anne has left to row          Minus         the distance Anne has left to row
  the distance rowed so far                  Times         the distance rowed so far
  the distance she started from the dock     Divided by    the distance she started from the dock
  the distance she started from the dock                   the distance she started from the dock
Student: The distance rowed so far is equal to the number of minutes rowing times
the distance Anne has left to row.
Tutor: Correct. Now, follow that up by writing an expression for the distance
rowed so far.
Student: 40m
Tutor: Correct. Now, let’s try this question again: Please write an expression for
the distance Anne has left to row.
Student: 800 - 40m
Tutor: Correct. A new problem is needed.
The experiments reported upon in this paper mainly pertain to the Concrete Articula-
tion Strategy, but the Ms. Lindquist tutoring system is quite complicated and has
several different pedagogical strategies. Please see [6] for more information on Ms.
Lindquist including other more interesting dialog examples.
The control condition in all of these experiments is to simply tell the student the
correct answer if they make a mistake (i.e., “No. A correct answer is 5m-100. Please
type that.”) If a student does not make an error on a problem, and therefore receives
no corrective feedback of any sort, then the student has not participated in either the
control condition or the experimental condition for that problem. For each experiment
“time on task” is controlled, whereby a student is given problems until a timer has gone off and is then advanced to a posttest after completing the problem they are currently working on.
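The sketch below illustrates this time-on-task control; the function names, the strategy labels, and the timer value are illustrative assumptions, not the Ms. Lindquist implementation.

# Sketch of a timer-controlled curriculum section with a per-section strategy.
import random
import time

def tutor_one_problem(problem, strategy):
    # Placeholder for the tutoring dialog: "Cut" simply tells the correct answer
    # after an error; the other strategies open a tutorial dialog.
    pass

def run_curriculum_section(problems, strategies=("IS", "Cut"), time_limit_s=600):
    strategy = random.choice(strategies)        # condition fixed for the whole section
    start = time.monotonic()
    for problem in problems:
        if time.monotonic() - start > time_limit_s:
            break                               # timer went off: advance to the posttest
        tutor_one_problem(problem, strategy)    # student finishes the current problem
    return strategy                             # record the condition for analysis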
The Ms. Lindquist curriculum is composed of five sections, starting with relatively easy one-operator problems (i.e., “5x”), and progressing up to problems that need four or more mathematical operations to symbolize correctly. Few students make it to the fifth section, so the experiments we report on are
only in the first two curriculum sections. At the beginning of each curriculum sec-
tion, a tutorial feedback strategy is selected that will be used throughout the exercise
whenever the student needs assistance. Because of this setup, each student can par-
ticipate in five separate experiments, one for each curriculum section. We would like
to learn which tutorial strategy is most effective for each curriculum area.
Since its inception in September 2000, over 17,000 individuals have logged into
the tutoring system via the website, and hundreds of individuals have stuck around
long enough (e.g., 30 minutes) to provide potentially useful data. The system’s architecture is constructed in such a way that a user downloads a web page with a Java applet on it, which communicates with a server located at Carnegie Mellon University.
Students’ responses are logged into files for later analysis. Individuals are asked to
identify themselves as a student, teacher, parent or researcher. We collect no identi-
fying information from students. Students are asked to make up a login name that is
used to identify them if they return at a later time. Students are asked to specify how
much math background they have. We anticipate that some teachers will log in and
pretend to be a student, which will add additional variance to the data we collect,
thereby making it harder to figure out what strategies are most effective; therefore, we
also ask at the end of each curriculum section if we should use their data (i.e., did
they get help from a teacher, or are they really not a student). Such individuals are
removed from any analyses. We recognize that there will probably be more noise in web-based experiments because individuals vary far more than they would in individual classroom experiments (Ms. Lindquist is used by many college students trying to brush up on their algebra, as well as by some students just starting algebra); nevertheless, we believe there is still the potential for conducting experiments that study student learning. Even though the variation between individuals is higher, introducing more noise into the data, we can compensate by generalizing over a larger number of students than would be possible in traditional laboratory studies.
In all of the experiments described below, the items within a curriculum section
were randomly chosen from a set of problems for that section (usually 20-40 such
problems per section). The posttest items (which are exactly the same as the pretest
items) were fixed (i.e., all students received the same two-item posttest for the first
section, as well as the same three-item posttest for the second section, etc.) We will
now present the experiments we performed.
Thirteen experiments were conducted to see if there was a difference in learning gain
(measured by the difference in posttest and pretest score) according to the tutoring
strategy provided by the tutor. To determine whether the difference in learning gain between the tutoring strategies was statistically significant, an ANOVA was conducted.
The measure of learning gain was considered to be a “lightweight” evaluation due to
the brevity of the pretest and posttest.
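As a minimal sketch of this gain-score comparison (not the authors' code), a one-way ANOVA over two strategy groups can be run as below; the gain scores are invented for illustration.

# One-way ANOVA on learning gains (posttest minus pretest) by tutoring strategy.
# The gain scores below are hypothetical illustration data.
from scipy import stats

is_gains = [0.5, 0.0, 1.0, 0.5, 0.5, 1.0]    # Inductive Support condition
cut_gains = [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]   # Cut-to-the-chase condition

f, p = stats.f_oneway(is_gains, cut_gains)
print(f"F = {f:.2f}, p = {p:.3f}")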
Each experiment involved two tutoring strategies given at random to a group of
students. Each student participating in the experiment answered at least one problem
incorrectly during the curriculum section, causing the tutor to intervene. Students
receiving a perfect pretest were excluded from some of the experiments in an attempt to eliminate the “ceiling effect” caused by the shortness of the pretest and the large number of students scoring perfectly.
The experiments can be divided into two groups, the first examining the difference
between the Inductive Support (IS) and Cut-to-the-chase (Cut) strategy and the sec-
ond examining the difference between the IS and Verbal strategy. If students re-
ported that they were students and were required to use the tutor, they were given
either the IS or Cut strategy (we consider these students to be in the “forced” group).
If students reported that they were students and were not required to use the tutor,
they were given either the IS or Verbal strategy (these students are referred to as the
“non-forced” group). Each experiment was conducted over a single curriculum sec-
tion. In some cases there were multiple experiments for the same curriculum section
and strategy comparison, which was made possible by having several large but dis-
tinct sets of students coming from different versions of the tutor where time on task
had been modified. The thirteen experiments, which are indicated in Table 1, will now
be described along with their results.
An early version of the tutor provided the Verbal and Cut strategies on Section 1 to
forced students, so these two experiments are based on those students. In experiment
1, 64 students received Verbal, whereas 87 students received Cut. Since approxi-
mately 2/3 of these students obtained a perfect pretest, experiment 2 was conducted
with the same students, but removing those students receiving a perfect pretest. The
reason for keeping the first experiment is that reporting on overall learning is only possible if all students taking the pretest are accounted for, even if they received a perfect score. Given the large number of students receiving a perfect pretest, a longer pretest would obviously have helped eliminate this problem, but it might also have reduced the number of students completing the entire curriculum section.
The first experiment showed no evidence of a differential learning gain between the Verbal and Cut strategies, with the learning gain for Verbal being 13% and for Cut being 14%. This was not surprising since 2/3 of the students had received a perfect pretest, which was our motivation for creating Experiment 2 with those students eliminated. For the second experiment, there was also no evidence of a differential
learning gain, although the learning gain for Verbal was 41% and Cut was 35%. For
each of these experiments the number of problems solved by strategy was statistically
different (p<.0001). This is not particularly surprising as the Cut strategy simply
provides the correct answer, whereas the Verbal strategy is more time consuming by
using menus and intelligent dialog, which results in fewer problems being completed
on average. Another observation is that the time on task for each strategy was statis-
tically different (p<.0001). This is explained by a design decision to allow students to
finish the problem they are working on before advancing to the posttest, which means
more time consuming tutoring strategies result in a slightly longer average time on
task.
For the first-section forced students, the IS and Cut strategies were provided in the latest version of the tutor. Although the number of forced students was substantially smaller than the number of non-forced students (due to the tutor being available online rather than used
just in a classroom setting), both experimental conditions had over 60 students. Only
enough data was available for a single experiment on the first section involving the IS
and Cut strategies since the tutor previously provided the Verbal and Cut strategies on
that section as seen in Experiments 1 and 2.
For this experiment, a differential learning gain between the IS and Cut strategies was found to be statistically significant (P=.0224). Students with the IS strategy had a learning gain of 53% and those with the Cut strategy 36%. The pretest scores were surprising in that students given the IS strategy had a lower score (22% correct on average) than those given the Cut strategy (34% correct on average). Interestingly, the students given the IS strategy not only had lower performance on the pretest, but also higher performance on the posttest, which explains the statistically significant learning gain observed.
On the second curriculum section, the IS and Cut strategies were given to the forced
students. Two experiments were conducted using a set of students that were con-
trolled for time. The students in Experiment 5 were given twice as much time as
those in Experiment 4 (1200 seconds vs. 600 seconds).
These six experiments compared differential learning for the IS and Verbal strategies
on the first section, which were given to non-forced students. It was noticed for Ex-
periment 6 that approximately 2/3 of the students received a perfect pretest. To prevent a ceiling effect of students not demonstrating learning, those students receiving a perfect pretest were eliminated.
Experiments 6-11 all showed that students given the IS strategy had a higher learning gain than those receiving the Verbal strategy. Experiment 8 had a p-value
suggesting the difference in learning gain was not statistically significant, which
could partially be explained by the small sample size (approximately 30 students
given each condition) and to the high pretest scores (75% for IS and 84% for Verbal), which resulted in a ceiling effect. Looking at the posttest scores, those given IS received 89% correct, whereas those given Verbal received 93% correct. It should be
noted that Experiment 11, which combined the students from Experiments 9 and 10, increased the statistical significance for learning gain from P=0.1030 and P=0.0803, respectively, to P=0.0210.
These two experiments compared differential learning for the IS and Verbal strategies
on the second section, which were given to non-forced students. Both experiments
involved a separate group of students having a different time on task. In Experiment
12 the average problem solving time was approximately 700 seconds, whereas in
Experiment 13 the average problem solving time was approximately 1200 seconds.
The sample size used for Experiment 13 (approximately 100 students) was almost twice as large as that used for Experiment 12.
Experiments 12 and 13, which are both on the second curriculum section, did not show statistical evidence of a differential learning gain. For Experiment 12, the learning gain of students given the IS strategy (22%) was slightly higher than that of students given the Verbal strategy (18%). Experiment 13, which had twice the number of students and double the time on task, showed a learning gain of 30% for those given the IS strategy and 33% for those given the Verbal strategy. Although the difference in learning gain was insignificant for both of these experiments, it was odd that such a large number of students would show nothing
significant after 20 minutes of problem solving. It was observed that the pretest score difference between conditions in Experiment 13 was statistically significant (P=.0465), which indicates that the lightweight evaluation method may be partially responsible.
For the first experiment, with approximately 150 students in each condition, the percentage of students completing the second section was 50% and 49% for IS and Verbal respectively, which was not statistically different. For the second experiment, with approximately 65 students in each condition, the section completion rate was 55% and 65% for IS and Verbal respectively, which was also not statistically different. The third and fourth experiments contained an even larger number of students, but for both of these experiments no difference in motivation was seen by tutorial strategy. The motivation experiments are summarized in the following table:
From these four experiments, it would appear that student motivation is not influenced by giving either the IS or Verbal strategy. Possibly no motivation difference is seen because students starting the second section after finishing the first have nearly the same motivation. It would be interesting to see whether student motivation on the
first section depends on the strategy given, which we will most likely examine in a future study.
However, these results should be taken with a grain of salt given that students are taking a two- or three-item pretest and posttest, which is due to our decision to provide only a lightweight evaluation, as previously mentioned. For the most part, web-based evaluation makes this lightweight evaluation worthwhile, given the large amount of data that is produced.
5 Conclusion
In earlier work [5], we presented evidence suggesting that students learned more when they engaged in a dialog with the Ms. Lindquist tutoring system, but we did not investigate whether it was worth the extra time spent. Later we reported some web-based results [7] suggesting that the Cut-to-the-chase strategy was inferior to the IS strategy in terms of learning gain.
From the experiments on differential learning by tutorial strategy reported in this work, it appears that the benefit of using one strategy over another is sometimes seen on the first curriculum section. In particular, Experiment 3 is
something of a replication of the work from [7]. This could partially be explained by
the tutorial dialogs on the second section being longer and requiring more time to
read. It should be noted that a student can spend a great deal of time on a single
problem, and these results are making us consider setting a time cut-off for a dialog
so that students don’t spend too much time on any one dialog.
Next we turn to comparing IS with Verbal. It appears that providing the IS strategy is a better choice than Verbal on the first curriculum section, as seen by the significant difference in learning gain in Experiments 7, 9, 10, and 11. We were pleasantly surprised that we could detect differences in learning rates in only 8-10 minutes using such crude measures (two-item pretests and posttests).
The strong evidence for the IS strategy being better than the Cut strategy was not
particularly surprising. Heffernan [7] previously reported seeing a similar result, but
this was for students working on the second curriculum section. We have to study this
further to better understand these results.
Finally, it should be reiterated that no differences in motivation could be found
between the IS and Verbal strategies. This could possibly be explained by both of these strategies being more advanced, in that they keep a participant more involved than the naive Cut strategy. This result is also consistent with [7], which reported the same finding.
Given that students seemed to learn a little better with the IS strategy than the Verbal
strategy, we thought we might see a motivation benefit for the IS strategy but we did
not.
References
1. Aleven V., Popescu, O. & Koedinger, K. R. (2001). Towards tutorial dialog to support self-
explanation: Adding natural language understanding to a cognitive tutor. In Moore, Red-
field, & Johnson (Eds.), Proceedings of Artificial Intelligence in Education 2001. Amster-
dam: IOS Press.
2. Birnbaum, M.H. (Ed.). (2000). “Psychological Experiments on the Internet.” San Diego:
Academic Press. http://psych.fullerton.edu/mbirnbaum/web/IntroWeb.htm
3. CIRCSIM-Tutor (2002). (See http://www.csam.iit.edu/~circsim/)
4. Graesser, A.C., Wiemer-Hastings, P., Wiemer-Hastings, K., Harter, D., Person, N., & the
TRG (in press). Using latent semantic analysis to evaluate the contributions of students in
AutoTutor. Interactive Learning Environments.
5. Heffernan, N. T. (2001). Intelligent Tutoring Systems have Forgotten the Tutor: Adding a
Cognitive Model of an Experienced Human Tutor. Dissertation & Technical Report. Carne-
gie Mellon University, Computer Science, http://www.algebratutor.org/pubs.html.
6. Heffernan, N. T., & Koedinger, K. R. (2002) An intelligent tutoring system incorporating a
model of an experienced human tutor. Sixth International Conference on Intelligent Tutor-
ing Systems.
7. Heffernan, N. T. (2003). Web-Based Evaluations Showing both Cognitive and Motivational
Benefits of the Ms. Lindquist Tutor. International Conference Artificial Intelligence in
Education. Sydney, Australia.
8. Rosé, C., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K. and Weinstein, A. (2001) Inter-
active conceptual Tutoring in Atlas-Andes. In Proceedings of AI in Education 2001 Confer-
ence.
The Impact of Why/AutoTutor on Learning and
Retention of Conceptual Physics
G. Tanner Jackson, Matthew Ventura, Preeti Chewle, Art Graesser,
and the Tutoring Research Group
1 Introduction
Why/AutoTutor is the fourth in a series of tutoring systems built by the Tutoring
Research Group at the University of Memphis. Why/AutoTutor is an intelligent tu-
toring system that uses an animated pedagogical agent to converse in natural language
with students. This recent version was designed to tutor students in Newtonian con-
ceptual physics, whereas all previous versions were designed to teach introductory
computer literacy. The architecture of AutoTutor has been described in previous
publications [1], [2], [3], [4], so only an overview is provided here before we turn to
some empirical tests of Why/AutoTutor on learning gains.
the learner and the pedagogical agent. Subject matter content and general world
knowledge are represented with both a structured curriculum script and latent semantic analysis (LSA), as discussed below [6], [7]. LSA and surface language features determine the assessment metrics for the quality of learners’ contributions.
AutoTutor makes use of an animated conversational agent with facial expressions,
synthesized speech, and rudimentary gestures. Although it is acknowledged that the
conversational dialog will probably never be as dynamic and adaptive as human-to-human conversation, we do believe that AutoTutor’s conversational skills are as good as or better than those of other pedagogical agents. Evaluations of the dialog fidelity have
supported the conclusion that AutoTutor can respond to the vast majority of student
contributions in a conversationally and pedagogically appropriate manner [8], [9].
AutoTutor’s architecture includes a set of permanent databases that do not get up-
dated during the course of tutoring. The first is a curriculum script database, which
contains a complete set of tutoring materials including: tutoring questions, ideal an-
swers, answer expectations (specific components necessary for a complete answer),
associated misconceptions, corrections of misconceptions, and other dialog moves
with related content. A second permanent database is an indexed copy of the Con-
ceptual Physics textbook [10]. When a student asks AutoTutor a question, the tutor
uses a question answering facility to pull a plausible answer from the textbook, or
another relevant document. In a similar manner, AutoTutor makes use of the glossary
from the Conceptual Physics textbook as a third permanent database. Fourth, the
server contains a set of lexicons, syntactic parsers, and other computational linguistics
modules that support information extraction, analyze student contributions, and help
AutoTutor proceed appropriately through a tutoring session. Fifth, the server houses
a space for latent semantic analysis (LSA).
LSA is a core component for representing semantic world knowledge about con-
ceptual physics, curriculum content, or any other subject matter [6], [11]. LSA is a
high-dimensional, statistical representation that assigns vector quantities to words and
documents on the basis of co-occurrence constraints in a large corpus of documents.
These vectors are used to calculate the conceptual similarity of any two segments of
text, which could be as small as a word or as large as a complete document [7], [12],
[13]. We use LSA in AutoTutor as a semantic matching operation that compares the
student contributions to expected good answers and to possible misconceptions.
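The sketch below approximates this kind of LSA matching with a TF-IDF plus truncated-SVD space and cosine similarity; the toy corpus, the number of dimensions, and the example sentences are assumptions, not AutoTutor's trained LSA space or curriculum content.

# Toy LSA-style matcher: compare a student contribution to an expectation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "objects in free fall have the same acceleration",
    "the keys and the boy fall with the same vertical acceleration",
    "displacement is the change in position during the elapsed time",
    "heavier objects fall faster than lighter objects",   # a seeded misconception
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf)

def similarity(a, b):
    va, vb = lsa.transform(vectorizer.transform([a, b]))
    return float(cosine_similarity([va], [vb])[0, 0])

expectation = "the keys and the boy have the same acceleration during free fall"
student = "the keys fall with the same acceleration as the boy"
print(similarity(student, expectation))   # high value -> expectation is covered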
In a normal session, Why/AutoTutor begins with a two-minute introduction and
overview, and then launches into the appropriate learning session. Why/AutoTutor
typically covers five conceptual physics problems in a single session. After the intro-
duction, it starts the conversation by posing a problem and prompting the student for
an answer. A conversation ensues as the tutor and student attempt to collaboratively
cover the material necessary to answer the question completely. For example, one of
the questions is “Suppose a boy is in a free-falling elevator and he holds his keys
motionless right in front of his face and then lets go. What will happen to the keys?
Explain why”. Below is an excerpt from the middle of a dialog between
Why/AutoTutor and an actual student.
TUTOR-1: What about this? The boy and his keys free fall in the elevator
starting with the same vertical ?
STUDENT-1: acceleration
TUTOR-2: Sort of ... vertical position.
TUTOR-3: Moving on, and another thing, objects starting from the same posi-
tion, and having the same velocity will have the same displacement.
TUTOR-4: What happens when two objects start at the same position with
same acceleration?
STUDENT-2: they have the same velocity
TUTOR-5: OK.
TUTOR-6: The displacement of the keys and the boy’s face should be ?
STUDENT-3: equal
TUTOR-7: Yep, you are absolutely right! The same.
This example demonstrates several distinct tutoring features that are common in
Why/AutoTutor. The system (a) prompts the student for specific information missing
from a previous response (TUTOR-1, TUTOR-6), (b) inserts correct information into
an erroneous or incomplete student turn (TUTOR-2), (c) provides a summary of rele-
vant information and then gives a hint with a related question (TUTOR-3 & TUTOR-
4), (d) “understands” and assesses natural language contributions, including semanti-
cally similar statements (STUDENT-2, STUDENT-3), (e) provides feedback to the
student on the student’s previous turn (TUTOR-2, TUTOR-5, TUTOR-7), and (f)
maintains coherence from previous turns while it adapts to student contributions
(STUDENT-2 content excludes specific required information about “equal displace-
ment” so the TUTOR-6 turn asks a question related to this required information).
Research on naturalistic tutoring [5], [14], [15], [16] provided some of the guidance
in designing these dialog moves and tutoring behaviors.
and (i) expert human tutors communicating with students via phone and computer
(Human phone mediated).
A number of outcomes have been drawn from previous analyses, but only a few are
mentioned here. First, AutoTutor is effective at promoting learning gains, especially at deep levels of comprehension (effect sizes are reported in [1], [2]), when compared with the ecologically valid situation where students read nothing, with baseline rates at pretest, or with reading the textbook for a controlled amount of time (equivalent to the time spent with AutoTutor). Second, reading the textbook is not much different from doing
nothing. These two results together support the claim that a tutor is needed to encour-
age the learner to focus on the appropriate content and to comprehend it at deeper
levels.
2.1 Participants
As in our previous experiments on Newtonian physics, students were enrolled in
introductory physics courses, and received extra credit for their participation. Stu-
dents were recruited for the experiment after having completed the related material in
the physics course. In total, 70 students participated in the experiment. Due to incom-
plete data for some students, 67 participants were included in the analyses for the
multiple choice data, and only 56 participants were included in the analyses for the
essay data.
2.2 Procedure
Participation in the experiment consisted of two sessions, one week apart, each in-
volving two testing phases. In the first session (approximately 2.5 to 3 hours) partici-
pants took a pretest, interacted with one of the tutors in a training session, and took an
immediate posttest. During the second session (approximately 30 minutes to 1 hour),
which was one week later, participants took a retention test and a far transfer test. The
pretest consisted of three conceptual physics essay questions. During the training
sessions, participants interacted with one of the tutors in an attempt to answer five
conceptual physics problems. The immediate posttest and the retention test were
counterbalanced, both forms consisting of three conceptual physics essays and 26
multiple choice questions. The far transfer task involved answering seven essay ques-
tions that were designed to test the transfer of knowledge (at deep conceptual levels,
not surface similarities) from the training session.
2.3 Materials
The posttest and retention test both included a counterbalanced set of 26 multiple
choice questions that were extracted from or similar to those in the Force Concept
Inventory (FCI). The FCI is a widely used test of Newtonian physics [19]. An exam-
ple problem is provided below in Table 1. The multiple choice questions in previous
studies were counterbalanced between the pretest and posttest (there was no retention
test). One concern with this procedure is that the participants could possibly become
sensitized to the content of the multiple choice test questions during the pretest, and
would thereby perform better during the posttest phase; the potential pretest sensiti-
zation would confound the overall learning gains. The graded essays correlated
highly (r=.77) with the multiple choice scores in previous studies, so the multiple
choice section was pushed to after the training, and essays alone served as the pretest
measure.
All testing phases included open-ended conceptual physics essay questions that
were designed by experienced physics experts. Each essay question required ap-
proximately a paragraph for a complete answer; an example question is illustrated in
Table 1. All essay questions were evaluated (blind to condition) by accomplished
physics experts both holistically (an overall letter grade) and componentially (by
identifying specific components of an ideal answer, called expectations, and miscon-
ceptions associated with the problem). When grading holistically, the physics experts
read each student essay answer and graded it according to a conventional letter grade
scale (i.e., A, B, C, D, or F). This grade was later translated into numerical form for
analysis purposes, with higher scores corresponding to better grades. Essays were also
graded in a componential manner by grading each expectation and misconception
associated with each essay on an individual basis. The expectations and misconcep-
tions were graded as explicitly present, implicitly present, or absent. To be consid-
ered explicitly present, an expectation/misconception would have to be stated in an
overt, obvious manner. An implicitly present expectation/misconception would be
counted if the participant seemed to have the general idea, but did not necessarily
express it completely. An expectation/misconception would be considered absent if
there were no signs of direct or indirect inclusion, or if it was obviously excluded.
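A small sketch of the two grading schemes is given below. The numeric letter-grade mapping is an assumption (the paper says only that higher numbers correspond to better grades), and the lenient/strict option anticipates the grading criterion used later in the Results.

# Sketch of holistic and componential scoring for one graded essay.
LETTER_TO_SCORE = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}   # assumed mapping

def holistic_score(letter_grade):
    """Translate the expert's letter grade into a number (higher is better)."""
    return LETTER_TO_SCORE[letter_grade]

def expectation_coverage(judgments, lenient=True):
    """Proportion of expectations covered in one essay.

    judgments: a list with "explicit", "implicit", or "absent" per expectation.
    Under the lenient criterion, implicit presence also counts as covered.
    """
    covered = {"explicit", "implicit"} if lenient else {"explicit"}
    return sum(j in covered for j in judgments) / len(judgments)

print(holistic_score("B"))                                        # -> 3
print(expectation_coverage(["explicit", "implicit", "absent"]))   # -> 0.666...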
At the end of the second session, participants answered 7 far transfer essay ques-
tions. The far transfer essays were designed to test knowledge transfer from the
training and testing set to a new set and style of questions that covered the same un-
derlying physics principles. Table 1 shows one of the example questions. The far
transfer questions were also graded both holistically and componentially by the
physics experts.
The two learning conditions in this experiment were Why/AutoTutor, as previously
described, and Minilesson. The Minilesson is an automated information delivery sys-
tem which covers the same physics problems as AutoTutor. The Minilessons pro-
vided relevant and informative summaries of Newton’s laws, along with examples
that demonstrated both good principles and common misconceptions. Students were
presented text by the Minilesson and clicked a “Next” button to continue through the
material (paragraph by paragraph). The following is a small excerpt from the Miniles-
son, using the same elevator-keys problem as before, “As you know, displacement
can be defined as the total change in position during the elapsed time. The man’s
displacement is the same as that of his keys at every point in time during the fall. So,
we can conclude...” The Minilesson condition was designed to convey the informa-
tion necessary for an ideal answer to the posed problems. It is considered to be an
ideal text for covering all aspects of each problem.
3 Results
We conducted several analyses that investigated differences between training condi-
tions across different testing phases. The results from the multiple choice and essay
data confirmed a previously held hypothesis that the students’ prior knowledge level
may be inversely related to proportional learning gains. This hypothesis is discussed
briefly in the conclusion section of this paper (see also [18]).
Table 2 presents effect sizes (d) for Why/AutoTutor, as well as means and standard
deviations from the multiple choice and holistic essay grades. When considering the
effect sizes for Why/AutoTutor alone, it significantly facilitated learning compared
with the pretest baseline rate. For the posttest immediately following training,
Why/AutoTutor showed an effect size of 0.97 sigma compared to the pretest. That
means, on average, participants who interacted with Why/AutoTutor scored almost a
full standard deviation (approximately a full letter grade) above their initial pretest
score. This large learning gain also persisted through a full week delay when the same
participants took the retention test (d=0.93) and the far transfer test (d=1.41). It
should be noted that these students had already finished covering the related material
in class sometime before taking the pretest, so they rarely, if ever, covered the mate-
rial again during subsequent class exposure, i.e., between the pre- and posttests.
Thus, any significant knowledge retention can probably be attributed to the training rather than to intervening relearning. Similarly, Why/AutoTutor had a positive effect size
for almost all comparisons with the Minilesson performance: multiple choice reten-
tion scores (d=0.34), holistic retention grades (d=0.14), and holistic far transfer
grades (d=0.38). Why/AutoTutor had only one negative effect size (d=-0.10) when
compared with the Minilesson condition at the immediate posttest performance. Un-
fortunately, however, most of these comparisons with Minilessons were not statisti-
cally significant.
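A minimal sketch of computing such a within-group effect size (Cohen's d on pretest versus posttest scores) follows; the scores and the pooled-standard-deviation formula choice are illustrative assumptions, not the authors' exact computation.

# Cohen's d for pretest-to-posttest improvement; scores are hypothetical.
import numpy as np

def cohens_d(post, pre):
    post, pre = np.asarray(post, float), np.asarray(pre, float)
    pooled_sd = np.sqrt((post.var(ddof=1) + pre.var(ddof=1)) / 2)
    return (post.mean() - pre.mean()) / pooled_sd

pre = [1.0, 2.0, 1.5, 2.5, 1.0, 2.0]    # hypothetical holistic essay scores (pretest)
post = [2.5, 3.0, 2.0, 3.5, 2.0, 3.0]   # hypothetical holistic essay scores (posttest)
print(f"d = {cohens_d(post, pre):.2f}")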
A statistical analysis of the holistic essays revealed that participants performed sig-
nificantly higher in all subsequent tests than in the pretest, F(1,54) = 27.80, p < .001,
so there was significant learning in both conditions. However, an
ANOVA on the holistically graded essays, across all testing occasions, found no
significant differences between Why/AutoTutor and Minilesson participants, F(1,54) =
1.27, p = .27. A one-way ANOVA on the multiple choice test also indicated that the participants in the Why/AutoTutor condition did not significantly differ from those in the Minilesson condition, F(1,65) = 2.32, p = .13.
Analyses of the detailed expectation/misconception assessments demonstrated similar
trends as the previous analyses. In these assessments, we computed the proportion of
expectations (or anticipated misconceptions) that were present in the essay according
to the expert judges. Remember that each essay was graded in a componential man-
ner by grading each expectation and misconception as explicitly present, implicitly
present, or absent. The analyses included here used a lenient grading criteria, mean-
ing that expectations are considered covered if they are either explicitly or implicitly
present in a student’s essay. Misconceptions used a similar lenient grading criteria
during analysis. Effect sizes for expectations were favorable when comparing pre-
testperformance in AutoTutor to all respective subsequent posttest phases (d=0.52,
d=0.31, d=0.73, respectively). Similarly, when compared to pretest scores, effect
sizes for the analysis on the misconceptions were favorable for Why/AutoTutor (d=-
0.48, d=-0.56, d=-0.20, in respective order). Having fewer misconceptions is consid-
ered good, so lower numbers and negative effects are better. When Why/AutoTutor
was compared to the Minilesson, each effect size was in a favorable direction (ex-
pectations: d=0.24, d=0.16, d=0.33, and misconceptions d=-0.03, d=-0.17, d=-0.24,
respectively).
A repeated measures analysis on the expectations revealed that in both conditions
participants expressed significantly more correct expectations in all subsequent tests
than in the pretest, F(1,54) = 21.99, p < .001, A repeated measures
analysis of the misconceptions similarly revealed that students expressed significantly
fewer misconceptions in the posttest and retention test than in the pretest, F(1,54)
=13.68, p < .001, A one-way ANOVA on the expectations resulted in
non-significant differences between test phases of Why/AutoTutor
and Minilesson, F(1,54) = 1.38, p = .25. An ANOVA on the misconceptions also revealed non-significant differences across test phases between Why/AutoTutor and Minilesson, F(1,54) = .34, p = .56.
There were similar trends in a previous study that had no retention component [18].
There were overall significant learning gains for each condition, but no differences
between the conditions. Both studies used students currently enrolled in physics
courses, which made the participants “physics intermediates”. Since all previous
studies involved participants with intermediate physics knowledge, subsequent analy-
ses were conducted that examined only those students with a pretest score lower than
forty percent, called “physics novices”. These post hoc analyses on physics novices
indicated that students with lower pretest scores had higher learning gains and
showed different trends than the higher pretest students. Specifically, low knowledge
students may benefit the most from interacting with these learning tools. A study in
progress has been specifically designed to have physics novices interact with the
systems in an attempt to provide more discriminating assessments of potential learn-
ing differences.
Several questions remain unanswered from the available research. What is it about
these systems that facilitates learning, and under what conditions? Is it the mode of
content delivery, the content itself, or some complex interaction? Do motivation and
emotions play an important role, above and beyond the cognitive components? One
of the goals in our current AutoTutor research is to further explore what exactly leads
to these learning gains, and to determine how different learning environments pro-
duce such similar effects. Our current and future studies have been designed to ad-
dress these questions directly. Even though a detailed answer is not yet known, the fact remains that students learn significantly from interacting with AutoTutor, and this transferable knowledge is acquired at a level that persists over time.
References
1. Graesser, A.C., Jackson, G.T., Mathews, E.C., Mitchell, H.H., Olney, A.,Ventura, M.,
Chipman, P., Franceschetti, D., Hu, X., Louwerse, M.M., Person, N.K., & TRG:
Why/AutoTutor: A test of learning gains from a physics tutor with natural language dia-
log. In R. Alterman and D. Hirsh (Eds.), Proceedings of the Annual Conference of the
Cognitive Science Society. Boston, MA: Cognitive Science Society. (2003) 1-6
2. Graesser, A.C., Lu, S., Jackson, G.T., Mitchell, H., Ventura, M., Olney, A., & Louwerse,
M.M.: AutoTutor: A tutor with dialog in natural language. Behavioral Research Methods,
Instruments, and Computers. (in press)
3. Graesser, A.C., Wiemer-Hastings, K., Wiemer-Hastings, P., Kreuz, R., & TRG: AutoTu-
tor: A simulation of a human tutor. Journal of Cognitive Systems Research, 1, (1999) 35-
51
4. Graesser, A.C., VanLehn, K., Rose, C., Jordan, P., & Harter, D.: Intelligent tutoring sys-
tems with conversational dialogue. AI Magazine, 22, (2001) 39-51
5. Graesser, A. C., Person, N. K., & Magliano, J. P. : Collaborative dialog patterns in natu-
ralistic one-to-one tutoring. Applied Cognitive Psychology, 9, (1995) 1-28
6. Graesser, A.C., Wiemer-Hastings, P., Wiemer-Hastings, K., Harter, D., Person, N., and the
TRG: Using latent semantic analysis to evaluate the contributions of students in AutoTu-
tor. Interactive Learning Environments, 8, (2000) 129-148
7. Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis.
Discourse Processes, 25, (1998) 259-284
8. Jackson, G. T., Mueller, J., Person, N., & Graesser, A.C.: Assessing the pedagogical ef-
fectiveness and conversational appropriateness in three versions of AutoTutor. In J.D.
Moore, C.L. Redfield, and W.L. Johnson (Eds.) Artificial Intelligence in Education: AI-
ED in the Wired and Wireless Future. Amsterdam: IOS Press. (2001) 263-267
9. Person, N.K., Graesser, A.C., Kreuz, R.J., Pomeroy, V., & TRG: Simulating human tutor
dialog moves in AutoTutor. International Journal of Artificial Intelligence in Education,
12, (2001) 23-39
10. Hewitt, P.G.: Conceptual physics. Reading, MA: Addison-Wesley. (1992)
11. Olde, B. A., Franceschetti, D.R., Karnavat, A., Graesser, A. C. & the TRG: The right stuff: Do
you need to sanitize your corpus when using latent semantic analysis? Proceedings of the
24th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum. (2002)
708-713
12. Foltz, P.W., Gilliam, S., & Kendall, S.: Supporting content-based feedback in on-line
writing evaluation with LSA. Interactive Learning Environments, 8, (2000) 111-127
13. Kintsch, W.: Comprehension: A paradigm for cognition. Cambridge, MA: Cambridge
University Press. (1998)
14. Chi, M. T. H., Siler, S., Jeong, H., Yamauchi, T., & Hausmann, R. G.: Learning from
human tutoring. Cognitive Science, 25, (2001) 471-533
15. Fox, B.: The human tutorial dialog project. Hillsdale, NJ: Erlbaum. (1993)
16. Moore, J.D.: Participating in explanatory dialogs. Cambridge, MA: MIT Press. (1995)
17. Graesser, A.C., Moreno, K., Marineau, J., Adcock, A., Olney, A., & Person, N.: AutoTu-
tor improves deep learning of computer literacy: Is it the dialog or the talking head? In U.
Hoppe, F. Verdejo, and J. Kay (Eds.), Proceedings of Artificial Intelligence in Education.
Amsterdam: IOS Press. (2003) 47-54
18. VanLehn, K. & Graesser, A. C.: Why2 Report: Evaluation of Why/Atlas, Why/AutoTutor,
and accomplished human tutors on learning gains for qualitative physics problems and ex-
planations. Unpublished report prepared by the University of Pittsburgh CIRCLE group
and the University of Memphis Tutoring Research Group. (2002)
19. Hestenes, D., Wells, M., & Swackhamer, G.: Force Concept Inventory. The Physics
Teacher, 30, 141-158.
ITS Evaluation in Classroom: The Case of AMBRE-AWP
Sandra Nogry, Stéphanie Jean-Daubias, and Nathalie Guin-Duclosson
LIRIS
Université Claude Bernard Lyon 1 - CNRS
Nautibus, 8 bd Niels Bohr, Campus de la Doua
69622 Villeurbanne Cedex, France
{Sandra.Nogry, Stephanie.Jean-Daubias, Nathalie.Guin-Duclosson}@liris.cnrs.fr
1 Introduction
This paper describes studies conducted in the framework of the AMBRE project. The
purpose of this project is to design Intelligent Tutoring Systems (ITS) that teach
problem-solving methods. Derived from didactic studies, these methods are based on a
classification of problems and solving tools. The AMBRE project proposes to help the
learner acquire a method by following the steps of the Case-Based Reasoning (CBR)
paradigm. We applied this principle to the domain of additive word problems,
implemented the AMBRE-AWP system, and evaluated it with eight-year-old pupils in
several ways.
In this paper, we first present the AMBRE principle. We then describe its application
to additive word problems and two experiments, one in the laboratory and one in the
classroom, carried out with eight-year-old pupils to evaluate the AMBRE-AWP ITS.
The methods we want to teach in the AMBRE project were suggested by studies in
mathematics didactics [12] [15]. In a small domain, a method is based on a classification
of problems and of solving tools. Acquiring this classification enables the learner to
choose the solving technique best suited to a given problem. However, in some domains
it is not possible to teach problem classes and their associated solving techniques
explicitly. The AMBRE project therefore proposes to enable the learner to build his or
her own method using the case-based reasoning paradigm.
Case-Based Reasoning [7] can be described as a set of sequential steps (elaborate a
target case, retrieve a source case, adapt the source to find the target case solution,
revise the solution, store the case). The CBR paradigm has already been used in various
parts of ITSs (e.g. learner modeling, diagnosis). The closest application to our approach
is Case-Based Teaching [1] [9] [13]. Systems based on this learning strategy present a
similar case to the learner when (s)he encounters difficulties in solving a problem, or
when (s)he faces a problem (s)he has never come across before (in a new domain or of a
new type).
In the AMBRE project, CBR is not used by the system but is proposed to the learner as a
learning strategy. Thus, in order to help the learner acquire a method, we first present
him or her with a few typical worked-out examples (which initialize the case base).
Then, the learner is assisted in solving new problems. The environment guides the
learner's solving of each problem by following the steps of the CBR cycle (Fig. 1): the
learner reformulates the problem in order to identify its structural features (the
elaboration step of the CBR cycle); then (s)he chooses a typical problem (retrieval);
next, (s)he adapts the typical problem's solution to the problem to solve (adaptation);
finally, (s)he classifies the new problem (storing). The steps are guided by the system
but carried out by the learner. In the AMBRE ITS, revision takes the form of a diagnosis
of the learner's responses at each step of the cycle.
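A minimal structural sketch of this guided cycle is given below in Python. It is not the
authors' implementation: the diagnose function and the example entries are illustrative
placeholders standing in for AMBRE's diagnosis modules and the learner's interface
actions.

# Illustrative sketch of the AMBRE-guided CBR cycle: each step is carried out
# by the learner and checked by the system (revision is realized as a
# per-step diagnosis of the learner's response).

def diagnose(step, learner_entry, expected_entry):
    ok = learner_entry == expected_entry
    print(f"{step}: {'correct' if ok else 'incorrect, feedback given'}")
    return ok

def guided_cbr_cycle(learner_entries, expected_entries):
    for step in ("elaboration", "retrieval", "adaptation", "storing"):
        diagnose(step, learner_entries.get(step), expected_entries.get(step))

# Placeholder entries for one problem.
entries = {"elaboration": "reformulation (problem class + place of the unknown)",
           "retrieval": "nearest typical problem",
           "adaptation": "adapted solution",
           "storing": "chosen problem class"}
guided_cbr_cycle(entries, dict(entries))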
The design process adopted in the AMBRE project is iterative: it is based on the
implementation of prototypes that are tested and then modified. This design process
reflects our concern with validating multidisciplinary design choices and detecting
problems of use as early as possible.
Before the AMBRE design, the SYRCLAD solver [5] was designed to be used in ITS.
SYRCLAD solves problems according to the methods we want to teach.
To begin the AMBRE design, we specified the objective of the project (learning
methods) and the approach to be used (the CBR approach). Then we developed a first,
simple prototype (AMBRE-counting) for the domain of counting problems (final
scientific year level, 18-year-old students). This prototype implemented the AMBRE
principle with a limited number of problems and limited functionalities (the Artificial
Intelligence modules were not integrated). It was evaluated in the classroom using the
experimental methods of cognitive psychology to assess the impact of the CBR paradigm
on method learning. The results did not show a significant learning improvement with
the AMBRE ITS. Nevertheless, we identified difficulties experienced by learners during
system use [4]. These results and complementary
AMBRE-AWP is an ITS for additive word problem solving based on the AMBRE
principle. We chose the additive word problems domain because this domain, which is
difficult for children, is well suited to the AMBRE principle. Learners have difficulty
visualizing the problem situation [3]. Didactic studies have proposed classes of additive
word problems [17], defined by problem type (add, change, compare) and by the place
of the unknown, that can help learners visualize the situation. Nonetheless, it is not
possible to teach these classes explicitly. The AMBRE principle might help the learner
to identify a problem's relevant features (its problem class).
These problems are studied in primary school, so we adapted the system to be used
individually in the classroom by eight-year-old pupils.
In accordance with the AMBRE principle, AMBRE-AWP presents examples to the
learner and then guides him or her through the steps described below.
Reformulation of the problem: once the learner has read the problem to solve (e.g.
“Julia had 17 cookies in her bag. She ate some of them during the break. Now, she
has 9 left. How many cookies did Julia eat during the break?”), the first step consists
in reformulating it. The learner is asked to build a new formulation of the submitted
problem that identifies its relevant features (i.e. problem type and place of the
unknown). We chose to represent problem classes by diagrams adapted from didactic
studies [17] [18]. The reformulation retains few of the initial problem's surface features
and becomes a reference for the remainder of the solving process.
Choice of a typical problem: in the second step, the learner compares the problem to be
solved with the typical problems, identifying differences and similarities in each case.
Typical problems are represented by their wording and their reformulation. The learner
should choose the problem that seems nearest to the problem to be solved, this nearness
being based on the reformulations. By choosing a typical problem, the learner implicitly
identifies the class of the problem to be solved.
Adaptation of the typical problem solution to the problem to be solved: in order to
write the solution, the learner should adapt the solution of the typical problem chosen in
the previous step to the problem to be solved (Fig. 2). Writing the solution consists first
in establishing the equation corresponding to the problem. Then, the learner writes how
to calculate the solution and calculates it. Finally, (s)he constructs a sentence that
answers the question. If the learner uses the help functionality,
the system can assist the adaptation by highlighting in color the similarities between the
typical problem (Fig. 2, left side) and the problem to solve (Fig. 2, right side).
Classification of the problem: first, the learner can read the report of the problem
solving. Then, (s)he has to classify the new problem by associating it with a typical
problem that represents a group of existing problems of the same class. During this
step, the learner should identify the group of problems associated with the problem
just solved.
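To make the adaptation step concrete with the cookie problem quoted above: the
equation to establish is 17 - x = 9, the calculation is x = 17 - 9 = 8, and the answer
sentence is “Julia ate 8 cookies during the break.” (This algebraic rendering is our own
illustration; in AMBRE-AWP the learner fills in the boxes of the diagram-based
interface shown in Fig. 2 rather than writing free-form algebra.)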
task makes it possible to evaluate the impact of the system on a paper-and-pencil task
with simple and difficult problems.
In the “Equation writing task” we presented a diagram representing a problem class.
The learner's task consisted in typing the equation corresponding to the diagram (filling
in boxes with numbers and an operation). This task allows us to test the learner's
ability to associate the corresponding equation with the problem class (represented by
a diagram). It is performed only by the groups that carry out the reformulation step
(the AMBRE-AWP group and the “Reformulation and solving system” group).
The experimental design we adopted is an interrupted time-series design: we presented
the problem solving task as a pre-test, after the fourth system use, as a post-test, and as
a delayed post-test one month after the last system use. The “structure features
detection task” was presented after each system use; the “equation writing task” was
presented after the fifth system use and as a post-test.
To complement these data, we adopted a qualitative approach [8]. Before the
experiment, we carried out an a priori analysis in order to highlight the various
strategies that learners might use when solving problems with AMBRE-AWP. During
system use, we noted all the questions asked. Moreover, we observed the difficulties
encountered by learners, the interactions between learners, and the interactions between
learners and the persons supervising the sessions. As a post-test, the learners filled in a
questionnaire so that their satisfaction and remarks could be taken into account. Finally,
we analysed the usage traces in order to identify the strategies used by learners, to
highlight the most frequent errors, and to identify the steps that cause learners
difficulties. With these methods, we aimed to identify the difficulties encountered by
learners while taking into account the complexity of the situation.
4.4 Results
In this section, we present the quantitative results and discuss them in the light of the
qualitative results.
For the problem solving task, we performed an analysis of variance on performance
with group (AMBRE-AWP, simple solver system, Reformulation and solving system)
and test (4 tests) as factors. Performance on the pre-test is significantly lower than
performance on the other tests (F(3,192)=18.1; p<0.001). There is no significant
difference between the tests administered after the fourth system use, as post-test, and
as delayed post-test one month after the last system use. There is no significant
difference between groups (F(2,64)=0.12; p=0.89) and no interaction between group
and session (F(6,192)=1.15; p=0.33). For the “structure features detection task”, there
is no significant difference between the AMBRE-AWP group and the other groups
(χ²(df=1)=0.21; p=0.64). Even at the end of the experiment, surface features interfere
with structure features in problem choice. The “equation writing task” shows that
learners who used AMBRE-AWP and the “Reformulation and solving system” were
both able to write the correct equation corresponding to a problem class represented by
a diagram in fifty percent of cases. Thus there is no difference between the results of
the AMBRE-AWP group and the control groups on any task. The three systems
improve learning outcomes equally. The results of the “structure features detection
task” and the “equation writing task” do not show method learning. So, these
results do not validate the AMBRE principle.
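For readers who wish to reproduce this kind of group × test analysis, the sketch below
shows one way to run a mixed analysis of variance in Python, assuming the scores are
available in long format (one row per pupil and test); the file and column names, and
the choice of the pingouin library, are our own assumptions rather than the authors'
analysis pipeline.

# Hypothetical reproduction of the group x test analysis of variance reported above.
# File name, column names, and the pingouin library are illustrative assumptions.
import pandas as pd
import pingouin as pg

scores = pd.read_csv("problem_solving_scores.csv")  # columns: pupil, group, test, score
aov = pg.mixed_anova(data=scores, dv="score",
                     within="test", between="group", subject="pupil")
print(aov[["Source", "F", "p-unc"]])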
The qualitative analysis helps to explain these results. First, pupils did not use
AMBRE-AWP as we expected: the observations show that when they wrote the
solution, they did not adapt the typical problem to solve the new problem. Second,
learners solved each problem very slowly (mean: 15 minutes). As they are beginning
readers, they had difficulty reading instructions and messages, and were sometimes
discouraged from reading them. In addition, they encountered difficulties during the
reformulation and adaptation steps because they did not identify their mistakes well and
had not mastered the arithmetic techniques. Third, the comparison between the “simple
solving system” and AMBRE-AWP is questionable. Indeed, despite the additional task,
the “simple solving system” group solved significantly more problems than the
AMBRE-AWP group (on average 9 problems for the AMBRE-AWP group vs. 14 for the
“simple solving system” group during the 6 sessions, F(1,45)=9.7; p<0.01). Moreover,
the assistance requested by pupils and given by the persons supervising the sessions
varied between groups. With AMBRE-AWP, questions and assistance often consisted in
rephrasing help and diagnosis messages, whereas with the simple solving system they
consisted in giving mathematical help that was sometimes comparable to the
AMBRE-AWP reformulation. So, even if the AMBRE principle has an impact on
learning, the difference in the number of problems solved by the AMBRE-AWP and
“simple solving system” groups, together with the difference in assistance, could partly
explain why these two groups obtained similar results.
Thus, the quantitative results (no difference between groups) can be explained by three
factors. First, pupils did not use the prototypical problems to solve their problems.
Since we expected that choosing and adapting a typical problem would facilitate
analogy between problems and favour method learning, it is not surprising that we did
not observe method learning. Second, learners solved each problem slowly and were
confronted with many difficulties (reading, reformulation, calculating the solution)
throughout the AMBRE cycle. These difficulties probably disrupted their understanding
of the AMBRE principle. Third, there are methodological issues stemming from the
difficulty of using a comparative method in real-world experiments, where it is not
possible to control all the factors. A pre-test of the control systems would reduce these
difficulties but not eliminate them. These methodological issues confirm our impression
that it is necessary to complement the experimental method with a qualitative approach
when evaluating an ITS in the real world [10].
These qualitative results show that AMBRE-AWP is not well adapted to eight-year-old
pupils. However, the questionnaire and interviews showed that many pupils were
enthusiastic about using AMBRE-AWP (more so than about the “simple solver
system”); in particular, they enjoyed reformulating the problem with diagrams.
The framework of the study described in this paper is the AMBRE project. This project
relies on the CBR solving cycle to help the learner acquire a problem solving method
based on a classification of problems. We implemented a system based on the AMBRE
principle for additive word problem solving (AMBRE-AWP) and evaluated it with
eight-year-old pupils. In the first experiment, we observed five children in the
laboratory in order to identify usability problems and to verify the suitability of the
system for this type of user. Then, we carried out a classroom experiment over six
weeks with 76 pupils. We compared the system with two control systems to assess the
References
1. Aleven, V. & Ashley, K.D.: Teaching Case-Based Argumentation through a Model and
Examples - Empirical Evaluation of an Intelligent Learning Environment. Artificial
Intelligence in Education, IOS Press (1997), 87-94.
2. Bastien, C. & Scapin, D.: Ergonomic Criteria for the Evaluation of Human-Computer
Interfaces. In RT n°156, INRIA, (1993).
3. Greeno, J.G. & Riley, M.S.: Processes and development of understanding. In
metacognition, motivation and understanding, F.E. Weinert, R.H. Kluwe Eds (1987), Chap
10, 289-313.
4. Guin-Duclosson, N., Jean-Daubias, S. & Nogry, S.: The AMBRE ILE: How to Use Case-
Based Reasoning to Teach Methods. In proceedings of ITS, Biarritz, France: Springer
(2002), 782-791.
5. Guin-Duclosson, N.: SYRCLAD: une architecture de résolveurs de problèmes permettant
d’expliciter des connaissances de classification, reformulation et résolution. Revue
d’Intelligence Artificielle, vol 13-2, Paris : Hermès (1999), 225-282
6. Jean, S.: Application de recommandations ergonomiques : spécificités des EIAO dédiés à
l’évaluation. In proceedings of RJC IHM 2000 (2000), 39-42
7. Kolodner, J.: Case Based Reasoning. San Mateo, CA: Morgan Kaufmann Publishers
(1993).
8. Mark, M. A., & Greer, J. E.: Evaluation methodologies for intelligent tutoring systems.
Journal of Artificial Intelligence in Education, vol 4.2/3 (1993), 129-153.
9. Masterton, S.: The Virtual Participant: Lessons to be Learned from a Case-Based Tutor’s
Assistant. Computer Support for Collaborative Learning, Toronto (1997), 179-186.
10. Murray, T.: Formative Qualitative Evaluation for “Exploratory” ITS research. Journal of
Artificial Intelligence in Education, vol 4.2/3 (1993), 179-207.
11. Nielsen, J.: Usability Engineering, Academic Press (1993).
12. Rogalski, M.: Les concepts de l’EIAO sont-ils indépendants du domaine? L’exemple
d’enseignement de méthodes en analyse. Recherches en Didactiques des Mathématiques,
vol 14 n° 1.2 (1994), 43-66.
13. Schank, R. & Edelson, D.: A Role for AI in Education: Using Technology to Reshape
Education. Journal of Artificial Intelligence in Education, vol 1.2 (1990), 3-20.
14. Shneiderman, B.: Designing the User Interface: Strategies for Effective Human-
Computer Interaction. Reading, MA: Addison-Wesley (1992).
15. Schoenfeld, A.: Mathematical Problem Solving. New York: Academic Press (1985).
16. Senach, B.: L’évaluation ergonomique des interfaces homme-machine. L’ergonomie dans
la conception des projets informatiques, Octares editions (1993), 69-122.
17. Vergnaud, G.: A classification of cognitive tasks and operations of thought involved in
addition and subtraction problems. In: Addition and subtraction: A cognitive perspective,
Hillsdale, NJ: Erlbaum (1982), 39-58.
18. Willis, G. B. & Fuson, K.C.: Teaching children to use schematic drawings to solve
addition and subtraction word problems. Journal of Educational Psychology, vol 80
(1988), 190-201.
Implicit Versus Explicit Learning of Strategies in a
Non-procedural Cognitive Skill
This paper compares methods for tutoring non-procedural cognitive skills. A cogni-
tive skill is a task domain where solving a problem requires taking many actions, but
the challenge is not in the physical demands of the actions, which are quite simple
ones such as drawing or typing, but in deciding which actions to take. If the skill is
such that at any given moment, the set of acceptable actions is fairly small, then it is
called a procedural cognitive skill. Otherwise, let us call it a non-procedural cogni-
tive skill. For instance, programming a VCR is a procedural cognitive skill, whereas
developing a Java program is a non-procedural skill because the acceptable actions at
most points include editing code, executing it, turning tracing on and off, reading the
manual, inventing some test cases and so forth. Roughly speaking, the sequence of
actions matters for procedural skills, but for non-procedural skills, only the final state
matters. However, skills exist at all points along the continuum between procedural
and non-procedural. Moreover, even in highly non-procedural skills, some sequences
tracing tutors are often used when learning the problem solving strategy is an instruc-
tional objective. The strategy is usually discussed explicitly by the tutor in its hints,
and presented explicitly in the texts that accompany the tutor. In contrast, the process
critiquing tutors rarely teach an explicit problem solving strategy.
All three techniques have advantages and disadvantages. Different ones are appro-
priate for different cognitive skills. The question posed by this paper is which one is
best for a specific task domain, physics problem solving. Although the argument
concerns physics, elements of it may perhaps be applied to other task domains as
well.
rated algebraically into the main propositions. The justifications are almost never
displayed by students or instructors, although textbook examples often mention a few
major justifications. Such proof-like derivations are the solution structures of many
other non-procedural skills, including geometry theorem proving, logical theorem
proving, algebraic or calculus equation solving, etc.
Although AI has developed many well-defined procedures for deductive problem
solving, such as forward chaining and backwards chaining, they are not explicitly
taught in physics. Explicit strategy teaching is also absent in many other non-procedural
cognitive skills.
Although no physics problem solving procedures are taught, some students do
manage to become competent problem solvers. Although it could be that only the
most gifted students can learn physics problem solving strategies implicitly, two facts
suggest otherwise. First, for simpler skills than physics, many experiments have dem-
onstrated that people can learn implicitly, and that explicit instruction sometimes has
no benefit (e.g., Berry & Broadbent, 1984). Second, the Cascade model of cognitive
skill acquisition, which features implicit learning of strategy, is both computationally
sufficient to learn physics and an accurate predictor of student protocol data
(VanLehn & Jones, 1993; VanLehn, Jones, & Chi, 1992).
If students really are learning how to select principles from their experience, as this
prior work suggests, perhaps a tutoring system should merely expedite such experi-
ential learning rather than replace it with explicit teaching/learning. One way to do
that, which is suggested by stimulus sampling and other theories of memory, is to
ensure that when students attempt to retrieve an experience that could be useful in the
present situation, they draw from a pool of successful problem solving experiences.
This in turn suggests that the tutoring system should just keep students on successful
solution paths. It should prevent floundering, generation of useless steps, traveling
down dead end paths, errors and other unproductive experiences. This pedagogy has
been implemented by Andes, a physics tutoring system (VanLehn et al., 2002). The
pedagogy was refined over many years of evaluation at the United States Naval
Academy. The next section describes Andes’ pedagogical method.
Andes does not teach a problem solving strategy, but it does attempt to fill students’
episodic memory with appropriate experiences. In particular, whenever the student
makes an entry on the user interface, Andes colors it red if it is incorrect and green if
it is correct. Students almost always correct the red entries immediately, asking Andes
for help if necessary. Thus, their memories should contain either episodes of green,
correct steps or well-marked episodes of red errors and remediation.
The most recent version of Andes does present a small amount of strategy instruc-
tion in one special context, namely, when students get stuck and ask for help on what
to do next. This kind of help is called “next-step help” in order to differentiate it from
asking what is wrong with a red entry. Andes’ next-step help suggests applying a
major principle whose equation contains a quantity that the problem is seeking. Even
if there are other major principles in the problem’s solution, it prefers one that
contains a sought quantity. For instance, suppose a student were solving the problem
shown in Table 1, had entered the givens and asked for next-step help. Andes would
elicit a23 as the sought quantity and the definition of average velocity (shown on line
7 of Table 1) as the major principle.
Andes’ approach to tutoring non-procedural skills is different from product cri-
tiquing, process critiquing and model tracing. Andes gives feedback during the prob-
lem solving process, so it is not product critiquing. Like a model-tracing tutor, it uses
rules to represent correct actions, but like a process-critiquing tutor, it does not ex-
plicitly teach a problem solving strategy. Thus, it is pedagogically similar to a process-
critiquing system and technically similar to a model-tracing system.
Andes is a highly effective tutoring system. In a series of real-world (not labora-
tory) evaluations conducted at the US Naval Academy, effect sizes ranged from 0.44
to 0.92 standard deviations (VanLehn et al., 2002).
However, there is still room for improvement, particularly in getting students to
follow more sensible problem solving strategies. Log files suggest that students
sometimes get so lost that they ask for Andes’ help on almost every action, which
suggests that they have no “weak method” or other general problem solving strategy
to fall back upon when their implicit memories fail to show them a way to solve a
problem. Students often produce actions that are not needed for solving the problem,
and they produce actions in an order that conforms to no recognizable strategy. The
resulting disorganized and cluttered derivation makes it difficult to appreciate the
basic physics underlying the problem’s solution.
We tried augmenting Andes’ next-step help system to explicitly teach a problem
solving strategy (VanLehn et al., 2002). This led to such long, complex interactions
that students generally refused to ask for help even when they clearly needed it. The
students and instructors both felt that this approach was a failure.
It seems clear in retrospect that a general problem solving strategy is just too com-
plex and too abstract to teach in the context of giving students hints. It needs to be
taught explicitly. That is, it should be presented in the accompanying texts, and stu-
dents should be stepped carefully through it for several problems until they have
mastered the procedural aspects of the strategy. In other words, students may learn
even better than with Andes if taught in a model-tracing manner.
This section describes an experiment comparing two tutoring systems, a model trac-
ing tutor (Pyrenees) with a tutor that encourages implicit learning of strategies (An-
des). Pyrenees teaches a form of backward chaining called the Target Variable Strat-
egy. It is taught to the students briefly using the instructions shown in the appendix.
Although Pyrenees uses the same physics principles and the same physics problems
as Andes, its user interface differs because it explicitly teaches the Target Variable
Strategy.
Both Andes and Pyrenees have the same 5 windows, which display:
The physics problem to be solved
The variables defined by the student
Vectors and axes
The equations entered by the student
A dialogue between the student and the tutor
In both systems, equations and variable names are entered via typing, and all other
entries are made via menu selections. Andes uses a conventional menu system (pull
down menus, pop-up menus and dialogue boxes), whereas Pyrenees uses teletype-
style menus.
For both tutors, every variable defined by the student is represented by a line in the
Variables window. The line displays the variable’s name and definition. However, in
Pyrenees, the window also displays the variable’s state, which is one of these:
Sought: If a value for the variable is currently being sought, then the line
displays, e.g., “mb = SOUGHT: the mass of the boy.”
Known: If a value has been given or calculated for a variable, then the line
displays the value, e.g., “mb = 5 kg: the mass of the boy.”
Other: If a variable is neither Sought nor Known, then the line displays
only the variable's name and definition, e.g., “mb: the mass of the boy.”
The Target Variable Strategy’s second phase, labeled “applying principles” in the
Appendix, is a form of backwards chaining where Sought variables serve as goals.
The student starts this phase with some variables Known and some Sought. The stu-
dent selects a Sought variable, executes the Apply Principle command, and eventually
changes the status of the variable from Sought to Other. However, if the equation
produced by applying the principle has variables in it that are not yet Known, then the
student marks them Sought. This is equivalent to subgoaling in backwards chaining.
The Variables window thus acts like a bookkeeping device for the backwards chain-
ing strategy; it keeps the current goals visible.
As an illustration, suppose a student is solving the problem of Table 1 and has en-
tered the givens already. The student selects a23 as the sought variable, and it is
marked Sought in the Variable window. The student executes the Apply Principle
command, selects “Projection” and produces the equation shown on line 9 of Table 1,
a23_x=a23. This equation has an unknown variable in it, a23_x, so it is marked
Sought in the Variable window. The Sought mark is removed from a23. Now the
cycle repeats. The student executes the Apply Principle command, selects “definition
of average acceleration,” produces the equation shown on line 7 of Table 1, removes
the Sought mark from a23_x, and adds a Sought mark to v2_x. This cycle repeats
until no variables are marked Sought. The resulting system of equations can now be
solved algebraically, because it is guaranteed to contain all and only the equations
required for solving the problem.
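The sketch below illustrates, in Python, the bookkeeping behind this phase: backward
chaining in which Sought variables act as goals. It is not the Pyrenees implementation;
the representation of principles as sets of variable names and the sample values are
simplifying assumptions, and the step numbers in the comments refer to the strategy
listing in the appendix.

# Illustrative sketch of phase 2 of the Target Variable Strategy.
def apply_principles(sought, known, choose_principle):
    # sought/known: sets of variable names; choose_principle(var) returns the set
    # of variables appearing in the equation produced by applying a principle to var.
    equations, done = [], set()
    while sought:
        target = sought.pop()                   # 2.1 choose a Sought (target) variable
        done.add(target)                        # 2.6 remove its Sought mark
        eq_vars = choose_principle(target)      # 2.2 principle whose equation contains it
        equations.append((target, eq_vars))     # 2.4 the written equation
        for var in eq_vars - known - done:      # 2.7 mark unknown variables Sought
            sought.add(var)
    return equations                            # all and only the equations needed

# Example loosely based on Table 1 (principle contents and known values assumed here).
principles = {"a23": {"a23", "a23_x"},                     # projection: a23_x = a23
              "a23_x": {"a23_x", "v2_x", "v3_x", "t23"}}   # def. of average acceleration
print(apply_principles({"a23"}, known={"v2_x", "v3_x", "t23"},
                        choose_principle=lambda v: principles[v]))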
In Andes, students can type any equation they wish into the Equation window, and
only the equation is displayed in the window. In Pyrenees, equations are entered only
by applying principles in order to determine the value of a Sought variable, so its
equation window displays the equation plus the Sought variable and the principle
application, e.g., “In order to find W, we apply the weight law to the boy:
Some steps, such as defining variables for the quantities given in the problem
statement, are repeated so often that students master them early and find them tedious
thereafter. Both Andes and Pyrenees relieve students of some of these tedious steps.
In Andes, this is done by predefining certain variables in problems that appear late in
the sequence of problems. In Pyrenees, steps in applying the Target Variable Strat-
egy, shown indented in the Appendix, can be done by either the student or the tutor.
When students have demonstrated mastery of a particular step by doing it correctly
the last 4 out of 5 times, then Pyrenees will take over executing that step for the stu-
dent. Once it has taken over a step, Pyrenees will do it 80% of the time; the student
must still do the step 20% of the time. Thus, student’s skills are kept fresh. If they
make a mistake when it is their turn, then Pyrenees will stop doing the step for them
until they have re-demonstrated their competence.
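A rough sketch of this step-fading policy follows, under our own simplifying
assumptions (a sliding window of the last five attempts and a fixed 80/20 split); the
actual Pyrenees bookkeeping may differ in detail.

# Illustrative sketch of the step-fading policy described above.
import random
from collections import deque

class StepFader:
    def __init__(self):
        self.recent = deque(maxlen=5)   # last 5 student attempts at this step (True = correct)
        self.mastered = False

    def tutor_does_step(self):
        # Only mastered steps are faded, and even then the student does 20% of them.
        return self.mastered and random.random() < 0.8

    def record_student_attempt(self, correct):
        self.recent.append(correct)
        if correct and sum(self.recent) >= 4:
            self.mastered = True        # correct on 4 of the last 5 attempts
        elif not correct:
            self.mastered = False       # mistake: tutor stops doing the step for the student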
The experiment used a two-condition, repeated measures design with 20 students per
condition. Students were required to have competence in high-school trigonometry
and algebra, but to have taken no college physics course. They completed a pre-test, a
multi-session training, and a post-test.
The training had two phases. In phase 1, students learned how to use the tutoring
system. In the case of Pyrenees, this included learning the target variable strategy.
During Phase 1, students studied a short textbook, studied two worked example
problems, and solved 3 non-physics algebra word problems. In phase 2, students
learned the major principles of translational kinematics, namely the definition of
average velocity, v=d/t, the definition of average acceleration, a=(vf-vi)/t, the constant-
acceleration equation v=(vi+vf)/2, and the freefall acceleration equation a=g. They
studied a short textbook, studied a worked example problem, solved 7 training prob-
lems on their tutoring system, and took the post-test.
4.3 Results
The post-test consisted of 4 problems similar to the training problems. Students were
not told how their test problems would be scored. They were free to show as much
work as they wished. Thus, we created two scoring rubrics for the tests. The “Answer
rubric” counted only the answers, and the “Show work” rubric counted only the deri-
vations leading up to the answers but not including the answers themselves. The
Show-work rubric gave more credit for writing major principles’ equations than mi-
nor ones. It also gave more credit for defining vector variables than scalar variables.
Table 2 presents the results. Scores are reported as percentages. A one-way
ANOVA showed that the pre-test means were not significantly different. When the
students' post-tests were scored with the Answer rubric, the two groups' scores were
not significantly different according to both a one-way ANOVA (F(29)=.888, p=.354)
and an ANCOVA with the pre-test as the covariate (F(28)=2.548, p=.122). However,
when the post-tests were scored with the Show-work rubric, the Pyrenees students
scored reliably higher than the Andes students according to both an ANOVA
(F(29)=6.076, p=.020) and an ANCOVA with the pre-test as the covariate
(F(28)=5.527, p=.026).
5 Discussion
References
1. Berry, D. C., & Broadbent, D. E. (1984). On the relationship between task performance
and associated verbalizable knowledge. The Quarterly Journal of Experimental Psychol-
ogy, 36A, 209-231.
2. Burton, R. R., & Brown, J. S. (1982). An investigation of computer coaching for informal
learning activities. In D. Sleeman & J. S. Brown (Eds.), Intelligent Tutoring Systems. New
York: Academic Press.
3. Corbett, A. T., & Bhatnagar, A. (1997). Student modeling in the ACT programming tutor:
Adjusting a procedural learning model with declarative knowledge, Proceedings of the
Sixth International Conference on User Modeling.
4. Graesser, A. C., VanLehn, K., Rose, C. P., Jordan, P. W., & Harter, D. (2001). Intelligent
tutoring systems with conversational dialogue. AI Magazine, 22(4), 39-51.
5. Lesgold, A., Lajoie, S., Bunzo, M., & Eggan, G. (1992). Sherlock: A coached practice
environment for an electronics troubleshooting job. In J. H. a. C. Larkin, R.W. (Ed.),
Computer Assisted Instruction and Intelligent Tutoring Systems: Shared Goals and Com-
plementary Approaches (pp. 201-238). Hillsdale, NJ: Lawrence Erlbaum Associates.
6. Mitrovic, A., & Ohlsson, S. (1999). Evaluation of a constraint-based tutor for a database
language. International Journal of Artificial Intelligence and Education, 10, 238-256.
7. Reiser, B. J., Kimberg, D. Y., Lovett, M. C., & Ranney, M. (1992). Knowledge represen-
tation and explanation in GIL, an intelligent tutor for programming. In J. H. Larkin & R.
W. Chabay (Eds.), Computer Assisted Instruction and Intelligent Tutoring Systems:
Shared Goals and Complementary Approaches (pp. 111-150). Hillsdale, NJ: Lawrence
Erlbaum Associates.
8. Scheines, R., & Sieg, W. (1994). Computer environments for proof construction. Interac-
tive Learning Environments, 4(2), 159-169.
9. VanLehn, K., & Jones, R. M. (1993). Learning by explaining examples to oneself: A
computational model. In S. Chipman & A. Meyrowitz (Eds.), Cognitive Models of Com-
plex Learning (pp. 25-82). Boston, MA: Kluwer Academic Publishers.
10. VanLehn, K., Jones, R. M., & Chi, M. T. H. (1992). A model of the self-explanation
effect. The Journal of the Learning Sciences, 2(1), 1-59.
11. VanLehn, K., Lynch, C., Taylor, L., Weinstein, A., Shelby, R., Schulze, K., Treacy, D., &
Wintersgill, M. (2002). Minimally invasive tutoring of complex physics problem solving.
In S. A. Cerri, G. Gouarderes & F. Paraguacu (Eds.), Intelligent Tutoring Systems 2002:
Proceedings of the 6th International Conference (pp. 158-167). Berlin: Springer-Verlag.
Appendix
The Target Variable Strategy has three main phases, each of which consists of
several repeated steps. The strategy is:
1 Translating the problem statement. For each quantity mentioned in the problem
statement, you should:
1.1 define a variable for the quantity; and
1.2 give the variable a value if the problem statement specifies one, or mark the
variable as “Sought” if the problem statement asks for its value to be deter-
mined. The tutoring system displays a list of variables that indicates which are
Sought and which have values.
2 Applying principles. As long as there is at least one variable marked Sought in
the list of variables, you should:
2.1 choose one of the Sought variables (this is called the “target” variable);
2.2 select a principle application such that when the equation for that principle is
written, the equation will contain the target variable;
2.3 define variables for all the undefined quantities in the equation;
2.4 write the equation, replacing its generic variables with variables you have
defined
2.5 (optional) rewrite the equation by replacing its variables with algebraic ex-
pressions and simplifying
2.6 remove the Sought mark from the target variable; and
2.7 mark the other variables in the equation Sought unless those variables are
already known or were marked Sought earlier.
3 Solving equations. As long as there are equations that have not yet been solved,
you should:
3.1 pick the most recently written equation that has not yet been solved;
3.2 recall the target variable for that equation;
3.3 replace all other variables in the equation by their values; and
3.4 algebraically manipulate the equation into the form V=E where V is the target
variable and E is an expression that does not contain the target variable (usu-
ally E is just a number).
On simple problems, the Target Variable Strategy may feel like a simple mechani-
cal procedure, but on complex problems, choosing a principle to apply (step 2.2)
requires planning ahead. Depending on which principle is selected, the derivation of
a solution can be short, long, or impossible. Making an appropriate choice is a skill
that can only be mastered by solving a variety of problems. In order to learn more
quickly, students should occasionally make inappropriate choices, because this lets
them practice detecting when an inappropriate
choice has been made, going back to find the unlucky principle selection (use the
Backspace key to undo recent entries), and selecting a different principle instead.
Detecting Student Misuse of Intelligent Tutoring Systems
R.S. Baker, A.T. Corbett, and K.R. Koedinger
1 Introduction
There has been growing interest in the motivation of students using intelligent tutor-
ing systems (ITSs), and in how a student’s motivation affects the way he or she inter-
acts with the software. Tutoring systems have become highly effective at assessing
what skills a student possesses and tailoring the choice of exercises to a student’s
skills [6,14], leading to curricula which are impressively effective in real-world class-
room settings [7]. However, intelligent tutors are not immune to the motivational
problems that plague traditional classrooms. Although it has been observed that stu-
dents in intelligent tutoring classes are more motivated than students in traditional
classes [17], students misuse intelligent tutoring software in a way that suggests less
than ideal motivation [1,15]. In one recent study, students who frequently misused
tutor software learned only 2/3 as much students who used the tutor properly, con-
trolling for prior knowledge and general academic ability [5]. Hence, intelligent tutors
which can respond to differences in student motivation as well as differences in stu-
dent cognition (as proposed in [9]) may be even more effective than current systems.
Developing intelligent tutors that can adapt appropriately to unmotivated students
depends upon the creation of effective tools for assessing a student’s motivation. Two
different visions of motivation’s role in intelligent tutors have resulted in two distinct
approaches to assessing motivation. In the first approach, increased student motiva-
tion is seen as an end in itself, and the goal is to create more empathetic, enjoyable,
cessive levels of help to prevent rapid-fire usage¹, may reduce gaming, but at the cost
of making the tutor more frustrating and less time-efficient for other students. Since
many students use help effectively [18] and seldom or never game the system [5], the
costs of using such an approach indiscriminately may be higher than the rewards.
Whichever approach we take to remediating gaming the system, the success of that
approach is likely to depend on accurately and automatically detecting which students
are gaming the system and which are not.
In this paper, we report progress towards this goal: we present and discuss a ma-
chine-learned Latent Response Model (LRM) [13] that is highly successful at dis-
cerning which students frequently game the system in a way that is correlated with
low learning. Cross-validation shows that this model should be effective for other
students using the same tutor lesson. Additionally, this model corroborates the hy-
pothesis in Baker et al. (2004) that students who game the system (especially those who
show the poorest learning) are more likely to do so on the most difficult steps.
2 Methods
The tutoring software’s assessment of the action – was the action correct, incor-
rect and indicating a known bug (procedural misconception), incorrect but not
indicating a known bug, or a help request²? (represented as 3 binary variables)
The type of interface widget involved in the action – was the student choosing
from a pull-down menu, typing in a string, typing in a number, plotting a point,
or selecting a checkbox? (represented as 4 binary variables)
The tutor’s assessment, post-action, of the probability that the student knew the
skill involved in this action, called “pknow” (derived using the Bayesian knowledge
tracing algorithm in [6]; a sketch of this update appears after this list).
Was this the student’s first attempt to answer (or get help) on this problem step?
“Pknow-direct”, a feature drawn directly from the tutor log files (the previous
two features were distilled from it). If the current action is the student’s first
attempt on this problem step, then pknow-direct is equal to pknow, but if the
student has already made an attempt on this problem step, then pknow-direct is -1.
Pknow-direct allows a contrast between a student’s first attempt on a skill he/she
knows very well and a student’s later attempts.
¹ A modification currently in place in the commercial version of Cognitive Tutor Algebra.
² Due to an error in tutor log collection, we only obtained data about entire help requests,
not about the internal steps of a help request.
How many seconds the action took (both the actual number of seconds and the
number of standard deviations from the mean time taken by all students on this
problem step, across problems).
How many seconds were spent in the last 3 actions, or 5 actions. (two variables)
How many seconds the student spent on each opportunity to practice this skill,
averaged across problems.
The total number of times the student has gotten this specific problem step
wrong, across all problems. (includes multiple attempts within one problem)
The number of times the student asked for help or made errors at this skill, in-
cluding previous problems.
How many of the last 5 actions involved this problem step.
How many times the student asked for help in the last 8 actions.
How many errors the student made in the last 5 actions.
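Since several of these features build on pknow, the sketch below shows the standard
Bayesian knowledge-tracing update from [6] from which such an estimate is derived;
the parameter values are purely illustrative, not those of the tutor lesson studied here.

# Standard Bayesian knowledge-tracing update (Corbett & Anderson [6]);
# guess/slip/learning-rate values below are illustrative only.
def update_pknow(pknow, correct, p_guess=0.2, p_slip=0.1, p_learn=0.15):
    # Posterior probability that the skill was already known, given this action.
    if correct:
        evidence = pknow * (1 - p_slip)
        posterior = evidence / (evidence + (1 - pknow) * p_guess)
    else:
        evidence = pknow * p_slip
        posterior = evidence / (evidence + (1 - pknow) * (1 - p_guess))
    # Allow for the chance of learning the skill at this opportunity.
    return posterior + (1 - posterior) * p_learn

p = 0.3
for outcome in (False, True, True):
    p = update_pknow(p, outcome)
    print(round(p, 3))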
The second source of data was the set of human-coded observations of student be-
havior during the lesson. This gave us the approximate proportion of time each stu-
dent spent gaming the system.
Since it is not clear that all students game the system for the same reasons or in ex-
actly the same fashion, we used student learning outcomes as a third source of data.
We divided students into three sets: a set of 53 students never observed gaming the
system, a set of 9 students observed gaming the system who were not obviously hurt
by their gaming behavior, having either a high pretest score or a high pretest-posttest
gain (this group will be referred to as GAMED-NOT-HURT), and a set of 8 students
observed gaming the system who were apparently hurt by gaming, scoring low on the
post-test (referred to as GAMED-HURT). It is important to distinguish GAMED-
HURT students from GAMED-NOT-HURT students, since these two groups may
behave differently (even if an observer sees their actions as similar), and it is more
important to target interventions to the GAMED-HURT group than the GAMED-
NOT-HURT group. This sort of distinction has been found effective for developing
algorithms to differentiate cheating from other categories of behavior [11].
Using these three data sources, we trained a density estimator to predict how fre-
quently an arbitrary student gamed the system. The algorithm we chose was forward-
selection [16] on a set of Latent Response Models (LRM) [13]. LRMs provide two
prominent advantages for modeling our data: First, they offer excellent support for
integrating multiple sources of data, including both labeled and unlabeled data. Sec-
ondly, an LRM’s results can be interpreted much more easily by humans than the
results of most neural network, support vector machine, or decision tree algorithms,
facilitating thought about design implications.
The set of possible parameters was drawn from linear effects on the 24 features
discussed above, quadratic effects on those 24 features, and 23×24 interaction effects
between pairs of features. During model selection, the candidate parameter that most
reduced the mean absolute deviation between the model's predictions and the original
data was added, using iterative gradient descent to find the best value for each candidate
parameter. Forward-selection continued until no parameter could be found which appre-
ciably reduced the mean absolute deviation. The best-fitting model had 4 parameters,
and no model considered had more than 6 parameters.
Given a specific model, the algorithm first predicted whether each individual tutor
action was an instance of gaming the system or not. Given a set of n parameters
α1, ..., αn across all students and actions, with each parameter αi associated with a
feature Fi (or a quadratic or interaction term over features), a prediction Pm as to
whether action m was an instance of gaming the system was computed as
Pm = α1·F1(m) + α2·F2(m) + ... + αn·Fn(m). Each prediction was then thresholded
using a step function, such that P'm = 1 if Pm > 0.5 and P'm = 0 otherwise. This gave
us a classification for each action within the tutor. We then determined, for each
student, what proportion of that student's actions were classified as gaming, giving us
one value per student. By comparing these values to the observed proportions of time
each student spent gaming the system, we computed each candidate model's deviation
from the original data. These deviations were used during iterative gradient descent
and model selection, in order to find the best model parameters.
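A compact sketch of this prediction-and-aggregation scheme is given below with
made-up feature values; it shows the structure of the computation (linear combination
per action, threshold at 0.5, per-student proportion, mean absolute deviation), not the
learned parameters or the gradient-descent code.

# Illustrative sketch of the LRM's prediction step and fit criterion.
import numpy as np

def predicted_gaming_proportion(features, alphas):
    # features: (num_actions x num_features) array for one student;
    # alphas: model coefficients (illustrative values below).
    p = features @ alphas             # P_m for each action m
    return (p > 0.5).mean()           # proportion of actions classified as gaming

def model_deviation(per_student_features, observed_proportions, alphas):
    # Mean absolute deviation between predicted and observed gaming proportions.
    preds = [predicted_gaming_proportion(f, alphas) for f in per_student_features]
    return float(np.mean(np.abs(np.array(preds) - np.array(observed_proportions))))

# Tiny made-up example: 2 students, 3 actions each, 2 features.
students = [np.array([[0.1, 1.0], [0.9, 0.0], [0.8, 1.0]]),
            np.array([[0.2, 0.0], [0.1, 0.0], [0.3, 1.0]])]
print(model_deviation(students, [0.4, 0.0], alphas=np.array([0.6, 0.2])))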
Along with finding the best model for the entire data set, we conducted Leave One
Out Cross Validation (LOOCV) to get a measure of how effectively the model will
generalize to students who were not in the original data set (the issue of how well the
model will generalize to different tutor lessons will be discussed in the Future Work
section). In doing a LOOCV, we fit to sets of 69 of the 70 students, and then investi-
gated how good the model was at making predictions about the student.
2.3 Classifier
3 Results
In this section, we discuss our classifier’s ability to detect which students game. All
discussion is with reference to the cross-validated version of our model/classifier, in
order to assess how well our approach will generalize to the population in general,
rather than to just our sample of 70 students.
Since most potential interventions will have side-effects and costs (in terms of
time, if nothing else), it is important both that the classifier is good at correctly identi-
fying the GAMED-HURT students who are gaming and not learning, and that it
rarely assigns an intervention to students who do not game.
If we take a model trained to treat both GAMED-HURT and GAMED-NOT-
HURT students as gaming, it is significantly better than chance at classifying the
GAMED-HURT students as gaming (A' =0.82, p<0.001). At the threshold value with
the highest ratio between hits and false positives, this classifier correctly identifies
88% of the GAMED-HURT students as gaming, while only classifying 15% of the
non-gaming students as gaming. Hence, this model can be reliably used to assign
interventions to the GAMED-HURT students. By contrast, the same model is not
significantly better than chance at classifying the GAMED-NOT-HURT students as
gaming (A' =0.57, p=0.58).
Fig. 1. Empirical ROC Curves showing the trade-off between true positives and false positives,
for the cross-validated model trained on both groups of gaming students.
In our further research, we will use the model trained on both groups of students to
identify GAMED-HURT students.
It is important to note that although gaming is negatively correlated to post-test
score, our classifier is not just classifying which students fail to learn. Our model is
not better than chance at classifying students with low post-test scores (A' =0.60,
p=0.35) or students with low learning (low pre-test and low post-test) (A' =0.56,
p=0.59). Thus, our model is not simply identifying all gaming students, nor is it iden-
tifying all students with low learning – it is identifying the students who game and
have low learning: the GAMED-HURT students.
At this point, our primary goal for creating a model of student gaming has been
achieved – we have developed a model that can accurately identify which students are
gaming the system, in order to assign interventions. Our model does so by first pre-
dicting whether each of a student’s actions is an instance of gaming. Although the
data from our original study does not allow us to directly validate that a specific step
is an instance of gaming, we can investigate what our model’s predictions imply
about gaming, and whether those predictions help us understand gaming better.
The model predicts that a specific action is an instance of gaming when the expres-
sion shown in Table 1 is greater than 0.5.
The feature “ERROR-NOW, MANY-ERRORS-EACH-PROBLEM” identifies a
student as more likely to be gaming if the student has already made at least one error
on this problem step within this problem, and has also made a large number of errors
on this problem step in previous problems. It identifies a student as less likely to be
gaming if the student has made a lot of errors on this problem step in the past, but
now probably understands it (and has not yet gotten the step wrong in this problem).
One interesting aspect of our model is how it predicts gaming actions are distributed
across a student’s actions. 49% of our model’s 21,520 gaming predictions occurred in
clusters where at least 2 of the nearest 4 actions were also instances of gaming. To
determine the chance frequency of such clusters, we ran a Monte Carlo simulation
where each student’s instances of predicted gaming were randomly distributed across
that student’s 71 to 478 actions. In this simulation, only 5% (SD=1%) of gaming
predictions occurred in such clusters. Hence, our model predicts that substantially more
gaming actions occur in clusters than one could expect from chance.
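The sketch below shows the kind of Monte Carlo baseline described above, with
made-up counts; the cluster test (at least 2 of the 4 nearest actions also predicted as
gaming, read here as the two actions on each side) follows the text, while the
per-student numbers are placeholders.

# Illustrative Monte Carlo baseline for the clustering of gaming predictions.
import random

def clustered_fraction(labels):
    n, clustered, total = len(labels), 0, 0
    for i, is_gaming in enumerate(labels):
        if not is_gaming:
            continue
        total += 1
        neighbors = [labels[j] for j in (i - 2, i - 1, i + 1, i + 2) if 0 <= j < n]
        if sum(neighbors) >= 2:
            clustered += 1
    return clustered / total if total else 0.0

def monte_carlo(num_actions, num_gaming, trials=1000):
    fractions = []
    for _ in range(trials):
        labels = [False] * num_actions
        for idx in random.sample(range(num_actions), num_gaming):
            labels[idx] = True
        fractions.append(clustered_fraction(labels))
    return sum(fractions) / trials

print(monte_carlo(num_actions=300, num_gaming=30))  # chance level of clustering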
Our model also suggests that there is at least one substantial difference between
when GAMED-HURT and GAMED-NOT-HURT students choose to game – and this
difference may explain why the GAMED-HURT students learn less. Compare the
model’s predicted frequency of gaming on “difficult skills”, which the tutor estimated
the student had under a 20% chance of knowing (20% was the tutor’s estimated prob-
ability that a student knew a skill upon starting the lesson), to the frequency of gam-
ing on “easy skills”, which the tutor estimated the student had over a 90% chance of
knowing. The model predicted that students in the GAMED-HURT group gamed
significantly more on difficult skills (12%) than easy skills (2%), t(7)=2.99, p<0.05
for a two-tailed paired t-test. By comparison, the model predicted that students in the
GAMED-NOT-HURT group did not game a significantly different amount of the
time on difficult skills (2%) than on easy skills (4%), t(8)=1.69, p=0.13. This pattern
of results suggests that the difference between GAMED-HURT and GAMED-NOT-
HURT students may be that GAMED-HURT students chose to game exactly when it
will hurt them most.
At this point, we have a model which is successful at recognizing students who game
the system and show poor learning. As it has good results under cross-validation, it is
likely that it will generalize well to other students using the same tutor.
We have three goals for our future work. The first goal is to study this phenomenon
in other middle school mathematics tutors, and to generalize our classifier to those
tutors. In order to do so, we will collect observations of gaming in other tutors, and
attempt to adapt our current classifier to recognize gaming in those tutors. Comparing
our model’s predictions about student gaming to the recent predictions about help
abuse in [2] is likely to provide additional insight and opportunities. The second goal
is to determine more conclusively whether our model is actually able to identify ex-
actly when a student is gaming. Collecting labeled data, where we can link the precise
time of each observation to the actions in a log file, will assist us in this goal. The
third goal is to use this model to select which students receive interventions to reduce
gaming. We have avoided discussing how to remediate gaming in this paper, in part
because we have not completed our investigations into why students game. Designing
appropriate responses to gaming will require understanding why students game.
Our long-term goal is to develop intelligent tutors that can adapt not only to a stu-
dent’s knowledge and cognitive characteristics, but also to a student’s behavioral
characteristics. By doing so, we may be able to make tutors more effective learning
environments for all students.
References
1. Aleven, V., Koedinger, K.R. Investigations into Help Seeking and Learning with a Cogni-
tive Tutor. In R. Luckin (Ed.), Papers of the AIED-2001 Workshop on Help Provision and
Help Seeking in Interactive Learning Environments (2001) 47-58
2. Aleven, V., McLaren, B., Roll, I., Koedinger, K. Toward Tutoring Help Seeking: Apply-
ing Cognitive Modeling to Meta-Cognitive Skills. To appear at Intelligent Tutoring Sys-
tems Conference (2004)
3. Arbreton, A. Student Goal Orientation and Help-Seeking Strategy Use. In S.A. Karabe-
nick (Ed.), Strategic Help Seeking: Implications For Learning And Teaching. Mahwah,
NJ: Lawrence Erlbaum Associates (1998) 95-116
4. Baker, R.S., Corbett, A.T., Koedinger, K.R. Learning to Distinguish Between Representa-
tions of Data: a Cognitive Tutor That Uses Contrasting Cases. To appear at International
Conference of the Learning Sciences (2004)
5. Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z. Off-Task Behavior in the
Cognitive Tutor Classroom: When Students “Game the System”. Proceedings of ACM
CHI 2004: Computer-Human Interaction (2004) 383-390
6. Corbett, A.T. & Anderson, J.R. Knowledge Tracing: Modeling the Acquisition of Proce-
dural Knowledge. User Modeling and User-Adapted Interaction Vol. 4 (1995) 253-278
7. Corbett, A.T., Koedinger, K.R., & Hadley, W. S. Cognitive Tutors: From the Research
Classroom to All Classrooms. In P. Goodman (Ed.), Technology Enhanced Learning: Op-
portunities For Change. Mahwah, NJ : Lawrence Erlbaum Associates (2001) 235-263
8. de Vicente, A., Pain, H. Informing the Detection of the Students’ Motivational State: an
Empirical Study. In S. A. Cerri, G. Gouarderes, F. Paraguacu (Eds.), Proceedings of the
Sixth International Conference on Intelligent Tutoring Systems (2002) 933-943
9. del Soldato, T., du Boulay, B. Implementation of Motivational Tactics in Tutoring Sys-
tems. Journal of Artificial Intelligence in Education Vol. 6(4) (1995) 337-376
10. Donaldson, W. Accuracy of d’ and A’ as Estimates of Sensitivity. Bulletin of the Psycho-
nomic Society Vol. 31(4) (1993) 271-274.
11. Jacob, B.A., Levitt, S.D. Catching Cheating Teachers: The Results of an Unusual Experi-
ment in Implementing Theory. To appear in Brookings-Wharton Papers on Urban Affairs.
12. Lloyd, J.W., Loper, A.B. Measurement and Evaluation of Task-Related Learning Behav-
ior: Attention to Task and Metacognition. School Psychology Review vol. 15(3)(1986)
336-345.
13. Maris, E. Psychometric Latent Response Models. Psychometrika vol.60(4) (1995) 523-
547.
14. Martin, J., vanLehn, K. Student Assessment Using Bayesian Nets. International Journal of
Human-Computer Studies vol. 42 (1995) 575-591
15. Mostow, J., Aist, G., Beck, J., Chalasani, R., Cuneo, A., Jia, P., Kadaru, K. A La Recher-
che du Temps Perdu, or As Time Goes By: Where Does the Time Go in a Reading Tutor
that Listens? Sixth International Conference on Intelligent Tutoring Systems (2002) 320-
329
16. Ramsey, F.L., Schafer, D.W. The Statistical Sleuth: A Course in Methods of Data Analy-
sis. Belmont, CA: Duxbury Press (1997) Section 12.3
17. Schofield, J.W. Computers and Classroom Culture. Cambridge, UK: Cambridge Univer-
sity Press (1995)
18. Wood, H., Wood, D. Help Seeking, Learning, and Contingent Tutoring. Computers and
Education vol.33 (1999) 153-159
Applying Machine Learning Techniques to
Rule Generation in Intelligent Tutoring Systems
Matthew P. Jarvis, Goss Nuzzo-Jones, and Neil T. Heffernan
Abstract. The purpose of this research was to apply machine learning tech-
niques to automate rule generation in the construction of Intelligent Tutoring
Systems. By using a pair of somewhat intelligent iterative-deepening, depth-
first searches, we were able to generate production rules from a set of marked
examples and domain background knowledge. Such production rules required
independent searches for both the “if” and “then” portions of the rule. This
automated rule generation allows generalized rules with a small number of sub-
operations to be generated in a reasonable amount of time, and provides non-
programmer domain experts with a tool for developing Intelligent Tutoring
Systems.
[5] [8]. Through this automated method, domain experts would be able to create ITSs
without programming knowledge. When compared to tutor development at present,
this could provide an enormous benefit, as writing the rules for a single problem can
take a prohibitive amount of time.
The CTAT provide an extensive framework for developing intelligent tutors. The
tools provide an intelligent GUI builder, a Behavior Recorder for recording solution
paths, and a system for production rule programming. The process starts with a devel-
oper designing an interface in which a subject matter expert can demonstrate how to
solve the problem. CTAT comes with a set of recordable and scriptable widgets (buttons, menus, text-input fields, as well as some more complicated widgets such as tables), as shown in Figure 1. Figure 1 shows three multiplication problems on one GUI; we do this just to show that the system can generalize across problems, and we would not plan to show students three different multiplication problems at the same time.
Creating the interface shown in Figure 1 involved dragging and dropping three ta-
bles into a panel, setting the size for the tables, adding the help and “done” buttons,
and adding the purely decorative elements such as the “X” and the bold lines under
the fourth and seventh rows. Once the interface is built, the developer runs it, sets the
initial state by typing in the initial numbers, and clicks “create start state”. While in
“demonstrate mode”, the developer demonstrates possibly multiple sets of correct ac-
tions needed to solve the problems. The Behavior Recorder records each action with
an arc in the behavior recorder window. Each white box indicates a state of the inter-
face. The developer can click on a state to put the interface into that state. After dem-
onstrating correct actions, the developer demonstrates common errors, and can write
“bug” messages to be displayed to the student, should they take that step. The developer can also add a hint sequence to each arc; should the student click on the hint button, the hints are presented to the student one by one until the student solves the problem. A hint sequence will be shown later in Figure 4. At this
point, the developer takes the three problems into the field for students to use. The purpose of this is to ensure that the design seems reasonable. At this stage the software will work only for these three problems and has no ability to generalize to another multiplication problem. Once the developer wants to make the system work for any multiplication problem instead of just the three he has demonstrated, he will need to write a set of
production rules that are able to complete the task. At this point, programming by
demonstration starts to come into play. Since the developer has already demonstrated several steps, the machine learning system can use those demonstrations as
positive examples (for correct student actions) or negative examples (for expected
student errors) to try to induce a general rule.
In general, the developer will want to induce a set of rules, as there will be different
rules representing different conceptual steps. Figure 2 shows how the developer could
break down a multiplication problem into a set of nine rules. The developer must then
mark which actions correspond to which rules. This process should be relatively easy
for a teacher. The second key way we make the task feasible is by having the devel-
oper tell us a set of inputs for each rule instance. Figure 1 shows the developer clicking
in the interface to indicate to the system that the greyed cells containing the 8 and 9
are inputs to the rule (that the developer named “mult_mod”) that should be able to
generate the 2 in the A position (as shown in Figure 2). The right hand side of Figure
2 shows the six examples of the “mult_mod” rule with the two inputs being listed first
and the output listed last. These six examples correspond to the six locations in Fig-
ure 1 where an “A” is in one of the tables.
These two hints (labeling rules and indicating the location of input values) that the
developer provides for us help reduce the complexity of the search enough to make
some searches computationally feasible (within a minute). The inputs serve as “is-
lands” in the search space that will allow us to separate the right hand side and the left
hand side searches into two separate steps. Labeling the inputs is something that the
CTAT did not provide, but without which we do not think we could have succeeded at
all.
The tutoring systems capable of being developed by the CTAT are composed of an
interface displaying each problem, the rules defining the problem, and the working
memory of the tutor. Nearly every GUI element (text field, button, and even some entities like columns) has a representation in working memory. Basically, everything
that is in the interface is known in working memory. The working memory of the tu-
tor stores the state of each problem, as well as intermediate variables and structures
associated with any given problem. Working memory elements (JESS facts) are oper-
ated upon by the JESS rules defining each problem. Each tutor is likely to have its
own unique working memory structure, usually a hierarchy relating to the interface
elements. The CTAT provide access and control to the working memory of a tutor
during construction, as well as possible intermediate working memory states. This
allows a developer to debug possible JESS rules, as well as for the model-tracing al-
gorithm [4] [1] of the Authoring Tools to validate such rules.
The search continues until a function/variable binding permutation meets with success
or the search is cancelled.
This search, while basic in design, has proven to be useful. In contrast to the ILP
methods described earlier, this search will specifically develop a single rule that cov-
ers all examples. It will only consider possible rules and test them against examples
once the rule is “complete,” or the rule length is the maximum depth of the search.
However, as one would expect, the search is computationally prohibitive in all but the
simplest cases, as run time is exponential in the number of functions as well as the
depth of the rule. This combinatorial explosion generally limits the useful depth of
our search to about depth five, but for learning ITS rules, this rule length is acceptable
since one of the points of intelligent tutoring systems is to create very finely grained
rules. The search can usually find simple rules of depth one to three in less than thirty
seconds, making it possible that as the developer is demonstrating examples, the sys-
tem is using background processing time to try to induce the correct rules. Depth four
rules can generally be achieved in less than three minutes. Another limitation of the
search is that it assumes entirely accurate examples. Any noise in the examples or
background knowledge will result in an incorrect rule, but this is acceptable as we can
rely on the developer to accurately create examples.
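As a concrete illustration, the sketch below (in Python; the background-knowledge library, the stack-style argument binding, and all names are our simplifications for exposition, not the actual CTAT/JESS machinery) shows an iterative-deepening search that tests a candidate rule against the examples only once it is complete:

```python
from itertools import product

# Hypothetical background-knowledge library; the names are illustrative only.
FUNCTIONS = {
    "add":      lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "mod_ten":  lambda a: a % 10,
    "div_ten":  lambda a: a // 10,
}

def evaluate(sequence, inputs):
    """Apply a sequence of background functions to an example's inputs.
    Each function consumes the most recent values and pushes its result."""
    values = list(inputs)
    for name in sequence:
        fn = FUNCTIONS[name]
        arity = fn.__code__.co_argcount
        if len(values) < arity:
            return None                      # this binding is impossible
        args = values[-arity:]
        values = values[:-arity] + [fn(*args)]
    return values[-1]

def search_rhs(examples, max_depth=5):
    """Iterative deepening over function sequences; a candidate is checked
    against the examples only once it is 'complete' (length == current depth)."""
    for depth in range(1, max_depth + 1):
        for sequence in product(FUNCTIONS, repeat=depth):
            if all(evaluate(sequence, ins) == out for ins, out in examples):
                return sequence              # first rule consistent with all examples
    return None

# The "mult_mod" examples (8, 9) -> 2 and (7, 8) -> 6 yield ("multiply", "mod_ten").
print(search_rhs([((8, 9), 2), ((7, 8), 6)]))
```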
While we have not altered the search in any way so as to affect the asymptotic effi-
ciency, we have made some small improvements that increase the speed of learning
the short rules that we desire. The first was to take advantage of the possible commu-
tative properties of some background knowledge functions. We allow each function to
be marked as commutative, and if it is, we are able to reduce the variable binding
branching factor by ignoring variable ordering in the permutation.
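A minimal sketch of this improvement (the function and parameter names are ours): when a background function is marked commutative, candidate argument bindings are drawn as unordered combinations rather than ordered permutations, which is exactly the reduction in the binding branching factor described above.

```python
from itertools import combinations, permutations

def argument_bindings(variables, arity, commutative):
    """Enumerate candidate variable bindings for one background function.
    Ignoring argument order for commutative functions shrinks the branching
    factor from P(n, k) ordered bindings to C(n, k) unordered choices."""
    if commutative:
        return list(combinations(variables, arity))
    return list(permutations(variables, arity))

# Four candidate variables bound to a binary function:
# non-commutative -> 12 ordered bindings, commutative (e.g. add) -> 6.
print(len(argument_bindings("abcd", 2, commutative=False)),
      len(argument_bindings("abcd", 2, commutative=True)))
```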
We noted that in ITSs, because of their educational nature, problems tend to in-
crease in complexity inside a curriculum, building upon themselves and other simpler
problems. We sought to take advantage of this by creating support for “macro-
operators,” or composite rules. These composite rules are similar to the macro-
operators used to complete sub-goals in Korf’s work with state space searches [7].
Once a rule has been learned from the background knowledge functions, the user can
choose to add that new rule to the background knowledge. The new rule, or even just
pieces of it, can then be used to try to speed up future searches.
Working memory elements, or facts, often have a one-to-one correspondence with elements in the inter-
face. For instance, a text field displayed on the interface will have a corresponding
working memory element with its value and properties. More complex interface ele-
ments, such as tables, have associated working memory structures, such as columns
and rows. A developer may also define abstract working memory structures, relating
interface elements to each other in ways not explicitly shown in the interface.
To generate the left-hand side in a similarly automated manner as the right-hand
side, we must create a hierarchy of conditionals that generalizes the given ex-
amples, but does not “fire” the right-hand side inappropriately. Only examples listed
as positive examples can be used for the left-hand side search, as examples denoted as
negative are incorrect in regard to the right-hand side only. For our left-hand side
generation, we make the assumption that the facts in working memory are connected
somehow, and do not loop. They are connected to form “paths” (as can be seen in the
Figure 4) where tables point to lists of columns which in turn point to lists of cells
which point to a given cell, which has a value.
To demonstrate how we automatically generate the left-hand side, we will step
through an example JESS rule, given in Figure 4. This “Multiply, Mod 10” rule oc-
curs in the multi-column multiplication problem described below. Left-hand side gen-
eration is conducted by first finding all paths searching from the “top” of working
memory (the “?factMAIN_problem1” fact in the example) to the “inputs” (that the
developer has labeled in the procedure shown in Figure 1) that feed into the right-
hand side search (in this case, the cells containing the values being operated on by the
right-hand side operators.) This search yields a set of paths from the “top” to the val-
ues themselves. In this multiplication example, there is only one such path, but in Ex-
periment #3 we had multiple different paths from the “top” to the examples. Even
in the absence of multiple ways to get from “top” to an input, we still had a difficult
problem.
Once we combine the individual paths, and there are no loops, the structure can be
best represented as a tree rooted at “top” with the inputs and the single output as
leaves in the tree. This search can be conducted on a single example of working
memory, but will generate rules that have very specific left-hand sides which assume
the inputs and output locations will always remain fixed on the interface. This as-
sumption of fixed locations is violated somewhat in this example (the output for A
moves and so does the second input location) and massively violated in tic-tac-toe.
Given that we want parsimonious rules, we bias ourselves towards short rules but risk
learning a rule that is too specific unless we collect multiple examples.
One of these trees is what would result if we looked only at the first instance of rule A, as shown in Figure 2; in that case, one would tend to assume that the two inputs and the output will always be in the same last column, as shown graphically in Figure 5.
A different set of paths from top to the inputs arises in the second instance of rule
A, which occurs in the 2nd column, 7th row. In this example we see that the first input
and second input are not always in the same column, but the 2nd input and the output
are in the same column as shown in Figure 5.
One such path is the series of facts given in the example rule, from problem to ta-
ble, to two possible columns, to three cells within those columns. Since this path
branches and contains no loops, it can best be represented as a tree. This search can be
conducted on a single example of working memory, but will generate a very specific left-hand side. To create a generalized left-hand side, we need to conduct this path search over multiple examples.
Fig. 4. An actual JESS rule that we learned. The order of the conditionals on the left hand side has been changed, and indentation added, to make the rule easier to understand
Despite the obvious differences in the two trees shown above, they represent the left-
hand side of the same rule, as the same operations are being performed on the cells
once they are reached. Thus, we must create a general rule that applies in both cases.
To do this, we merge the above trees to create a more general tree. This merge opera-
tion marks where facts are the same in each tree, and uses wildcards to designate
where a fact may apply in more than one location. If a fact cannot be merged, the tree
will then split. A merged example of the two above trees is shown in Figure 5.
In this merged tree (there are many possible trees), the “Table 1” and “Table 2” references have been converted to a wildcard. This generalizes the tree so that the wildcard reference can apply to any table, not a single definite one. Also, the
“Column 2” reference in the first tree has been converted to a wildcard. This indicates
that that column could be any column, not just “Column 2”. This allows this merged
tree to generalize the second tree as well, for the wildcard could be “Column 4.” This
is one possible merged tree resulting from the merge operation, and is likely to be
generalized further by additional examples. However, it mirrors the rule given in Fig-
ure 4, with the exception that “Cell 2” is a wildcard in the rule.
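A toy version of the merge operation (our own simplification: trees are (label, children) pairs, corresponding children are assumed to line up by position, and splits are not handled) is sketched below.

```python
WILDCARD = "?any"   # stands in for the wildcard symbol used in the learned rules

def merge(tree_a, tree_b):
    """Merge two path trees: where the facts agree, the label is kept;
    where they differ, the label is generalized to a wildcard."""
    (label_a, kids_a), (label_b, kids_b) = tree_a, tree_b
    label = label_a if label_a == label_b else WILDCARD
    return (label, [merge(a, b) for a, b in zip(kids_a, kids_b)])

# Merging ("Table 1", [("Column 2", [])]) with ("Table 2", [("Column 4", [])])
# gives ("?any", [("?any", [])]) -- any table, any column -- as described above.
print(merge(("Table 1", [("Column 2", [])]),
            ("Table 2", [("Column 4", [])])))
```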
We can see the wildcards in the rule by examining the pattern matching operators.
For instance, we select any table by using a pattern built from the “$?” operator. The “$?” operators indicate that there may be any number of interface elements before or after the “?factMAIN_tableAny1” fact that we select. To select a fact in a definite position, we use the “?” operator; for example, we can select the 4th column by indicating that there are three preceding facts (three “?”s) and any number of facts following the 4th (“$?”).
We convert the trees generated by our search and merge algorithm to JESS rules by
applying these pattern matching operations. The search and merge operations often
generate more than one tree, as there can be multiple paths to reach the inputs, and to
maintain generality, many different methods of merging the trees are used. This often
leads to more than one correct JESS rule being provided.
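As a rough illustration of this conversion (the helper below merely mimics the “$?”/“?” multislot operators described above; the fact names and the exact slot layout of the real JESS rules are assumptions on our part):

```python
def fact_pattern(fact_var, index=None):
    """Build a JESS-like multislot pattern that selects one fact from a list:
    either anywhere in the list ("$?" on both sides) or at a definite position
    (a '?' placeholder for each preceding fact, then "$?" for the rest)."""
    if index is None:
        return f"($? ?{fact_var} $?)"
    return "(" + " ".join(["?"] * index + [f"?{fact_var}", "$?"]) + ")"

print(fact_pattern("factMAIN_tableAny1"))     # ($? ?factMAIN_tableAny1 $?)
print(fact_pattern("factMAIN_column4", 3))    # (? ? ? ?factMAIN_column4 $?)
```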
We have implemented this algorithm and the various enhancements noted in Java
within the CTAT. This implementation was used in the trials reported below, but re-
mains a work in progress. Following correct generation of the desired rule, the algo-
rithm outputs a number of JESS production rules. These rules are verified for consis-
tency with the examples immediately after generation, but can be further tested using
the model trace algorithm of the authoring tools [4].
4 Methods/Experiments
4.1 Experiment #1: Multi-column Multiplication
The goal of our first experiment was to try to learn all of the rules required for a typi-
cal tutoring problem, in this case, Multi-Column Multiplication. In order to extract
the information that our system requires, the tutor must demonstrate each action re-
quired to solve the problem. This includes labeling each action with a rule name, as
well as specifying the inputs that were used to obtain the output for each action.
While this can be somewhat time-consuming, it eliminates the need for the developer
to create and debug his or her own production rules.
For this experiment, we demonstrated two multiplication problems, and identified
nine separate skills, each representing a rule that the system was asked to learn (see
Figure 2). After learning these nine rules, the system could automatically complete a
multiplication problem. These nine rules are shown in Figure 2.
The right-hand sides of each of these rules were learned using a library of Arithm-
etic methods, including basic operations such as add, multiply, modulus ten, among
others. Only positive examples were used in this experiment, as it is not necessary
(merely helpful) to define negative examples for each rule. The left-hand side search
was given the same positive examples, as well as the working memory state for each
example.
played in the Behavior Recorder allow the student to enter the values in any order
they wish.
In this experiment, we attempted to learn the rules for playing an optimal game of
Tic-Tac-Toe (see Figure 8). The rules for Tic-Tac-Toe differ significantly from the
rules of the previous problem. In particular, the right-hand side of the rule is always a
single operation, simply a mark “X” or a mark “O.” The left-hand side is then essen-
tially the entire rule for any Tic-Tac-Toe rule, and the left-hand sides are more com-
plex than either of the past two experiments. In order to correctly learn these rules, it
was necessary to augment working memory with information particular to a Tic-Tac-
Toe game. Specifically, there are eight ways to win a Tic-Tac-Toe game: one of the
three rows, one of the three columns, or one of the two diagonals. Rather than simply
grouping cells into columns as they were for multiplication, the cells are grouped into
these winning combinations (or “triples”). The following rules to play Tic-Tac-Toe
were learned using nine examples of each:
Rule #1: Win (win the game with one move)
Rule #2: Play Center (optimal opening move)
Rule #3: Fork (force a win on the next move)
Rule #4: Block (prevent an opponent from winning)
5 Results
These experiments were performed on a Pentium IV, 1.9 GHz with 256 MB RAM
running Windows 2000 and Java Runtime Environment 1.4.2. We report the time it
takes to learn each rule, including both the left-hand-side search and the right-hand-
side search.
6 Discussion
6.1 Experiment #1: Multi-column Multiplication
The results from Experiment #1 show that all of the rules required to build a Multi-
Column Multiplication tutor can be learned in a reasonable amount of time. Even
some longer rules that require three mathematical operations can be learned quickly
using only a few positive examples. The rules learned by our algorithm will correctly
fire and model-trace within the CTAT. However, these rules often have overly general
left-hand sides. For instance, the first rule learned, “Rule A”, (also shown in Figure
4), may select arguments from several locations. The variance of these locations
within the example set leads the search to generalize the left-hand side to select mul-
tiple arguments, some of which may not be used by the rule. During design of the left-
hand side search, we intentionally biased the search towards more general rules. De-
spite these over-generalities, this experiment presents encouraging evidence that our
system is able to learn rules that are required to develop a typical tutoring system.
7 Conclusions
Intelligent tutoring systems provide an extremely useful educational tool in many ar-
eas. However, due to their complexity, they will be unable to achieve wide usage
without a much simpler development process. The CTAT [6] provide a step in the
right direction, but to allow most educators to create their own tutoring systems, sup-
port for non-programmers is crucial. The rule learning algorithm presented here pro-
vides a small advancement toward this goal of allowing people with little or no pro-
gramming knowledge to create intelligent tutoring systems in a realistic amount of
time. While the algorithm presented here has distinct limitations, it provides a signifi-
cant stepping-stone towards automated rule creation in intelligent tutoring systems.
Acknowledgements. This research was partially funded by the Office of Naval Re-
search (ONR) and the US Department of Education. The opinions expressed in this
paper are solely those of the authors and do not represent the opinions of ONR or the
US Dept. of Education.
References
1. Anderson, J. R. and Pellitier, R. (1991) A developmental system for model-tracing tutors.
In Lawrence Birnbaum (Eds.) The International Conference on the Learning Sciences. As-
sociation for the Advancement of Computing in Education. Charlottesville, Virginia (pp.
1-8).
2. Blessing, S.B. (2003) A Programming by Demonstration Authoring Tool for Model-
Tracing Tutors. In Murray, T., Blessing, S.B., & Ainsworth, S. (Ed.), Authoring Tools for
Advanced Technology Learning Environments: Toward Cost-Effective Adaptive, Interac-
tive and Intelligent Educational Software. (pp. 93-119). Boston, MA: Kluwer Academic
Publishers
3. Choksey, S. and Heffernan, N. (2003) An Evaluation of the Run-Time Performance of the
Model-Tracing Algorithm of Two Different Production Systems: JESS and TDK. Techni-
cal Report WPI-CS-TR-03-31. Worcester, MA: Worcester Polytechnic Institute
4. Cypher, A., and Halbert, D.C. Editors. (1993) Watch what I do : Programming by Demon-
stration. Cambridge, MA: The MIT Press.
5. Koedinger, K. R., Aleven, V., & Heffernan, N. T. (2003) Toward a rapid development en-
vironment for cognitive tutors. 12th Annual Conference on Behavior Representation in
Modeling and Simulation. Simulation Interoperability Standards Organization.
6. Korf, R. (1985) Macro-operators: A weak method for learning. Artificial Intelligence, Vol.
26, No. 1.
7. Lieberman, H. Editor. (2001) Your Wish is My Command: Programming by Example.
Morgan Kaufmann, San Francisco
8. Muggleton, S. (1995) Inverse Entailment and Progol. New Generation Computing, Special
issue on Inductive Logic Programming, 13.
9. Quinlan, J.R. (1996). Learning first-order definitions of functions. Journal of Artificial In-
telligence Research. 5. (pp 139-161)
10. Quinlan, J.R., and R.M. Cameron-Jones. (1993) FOIL: A Midterm Report. Sydney: Uni-
versity of Sydney.
11. VanLehn, K., Freedman, R., Jordan, P., Murray, C., Rosé, C. P., Schulze, K., Shelby, R.,
Treacy, D., Weinstein, A. & Wintersgill, M. (2000). Fading and deepening: The next steps
for Andes and other model-tracing tutors. Intelligent Tutoring Systems: International
Conference, Montreal, Canada. Gauthier, Frasson, VanLehn (eds), Springer (Lecture
Notes in Computer Science, Vol. 1839), pp. 474-483.
A Category-Based Self-Improving Planning Module
Abstract. Though various approaches have been used to tackle the task of
instructional planning, the compelling need is for ITSs to improve their own
plans dynamically. We have developed a Category-based Self-improving
Planning Module (CSPM) for a tutor agent that utilizes the knowledge learned
from automatically derived student categories to support efficient on-line self-
improvement. We have tested and validated the learning capability of CSPM to
alter its planning knowledge towards achieving effective plans for various
student categories using recorded teaching scenarios.
1 Introduction
Instructional planning is the process of sequencing teaching activities to achieve a
pedagogical goal. Its use in tutoring, coaching, cognitive apprenticeship, or Socratic
dialogue can provide consistency, coherence, and continuity to the teaching process
[20], in addition to achieving selected teaching goals [10].
Though ITSs are generally adaptive, few are capable of self-improvement, despite several authors having identified and reiterated the need for this capability (e.g., [7, 14, 12, 9, 5, 6]). A self-improving tutor is capable of revising instructional plans
and/or learning new ones in response to any perceived inefficiencies in existing plans.
O’Shea’s quadratic tutor [12], for example, could change instructional plans by
backward-reasoning through a set of causal teaching rules, chaining from a desired
change in a teaching “variable” (e.g., the time a student needs to learn a skill) to
executable actions. However, it does not follow that this and similar tutors (e.g., [9, 5,
6]) can self-improve efficiently.
Machine learning techniques have been successfully applied in computerized tutors
in various ways: to infer student models (as reviewed in [16]); to optimize teaching
responses to students [3, 11]; and to evaluate a tutor and understand how learning
proceeds through simulated students [19, 3]. We innovatively utilize an information-
theoretic metric called cohesion, and matching and sampling heuristics to assist a Q-
learning algorithm in developing and improving plans for different student categories.
The result is a learning process that enables the tutor of an ITS to efficiently self-
improve on-line with respect to the needs of different categories of learners. We have
implemented the learning process in a Category-based Self-improving Planning
Module (CSPM) within an ITS tutor agent. As an agent, the tutor becomes capable of
learning and performing on its own during on-line time-constrained interactions.
In the rest of this paper, we first provide an overview of CSPM and describe the
methodology used to test and validate its learning capabilities using real-world data.
We then expound the learning approaches of CSPM, and for each approach, we report
and discuss selected experimental results that demonstrate its viability. Finally, we
give our concluding remarks and future direction.
Fig. 1. The functional view of CSPM as well as its external relationships with the other
components of the ITS
3 Experimentation Methodology
By segregating architecturally the components for pedagogic decision making and
delivery, we can test the learning capability of CSPM with minimal influence from the
other ITS components. This kind of testing follows a layered evaluation framework [8,
4] and opens CSPM to the benefits of an ablative evaluation approach to direct any
future efforts to improve it [2].
Experimentation is performed in three stages. An initial set of category models is
derived and their usefulness is observed in the first stage. In the second, category
knowledge is utilized to construct a map that will serve as source of candidate plans.
The third one simulates the development of the same teaching scenario as different
plans are applied and the results are measured in terms of the changes in the
effectiveness level of the derived plans and the efficiency of the plan learning task.
For us to carry out relevant experiments under the same initial condition, a corpus
of recorded teaching scenarios is used as experiment data. A teaching scenario defines
an instructional plan and the context in which it will succeed. These scenarios were
adapted from [13]’s 105 unique cases of recorded verbal protocols of interactions
between 26 seasoned tutors (i.e., two instructors and 24 peer tutors) and 120 freshman
Computer Science students of an introductory programming course using C language.
Each student received an average of three instructional sessions. Each case contained
a plan, effective or otherwise, and the context in which it was applied. For ineffective
plans, however, repairs which can render them effective were indicated.
Each teaching scenario consists of (1) student attributes: cognitive ability, learning
style, knowledge scope, and list of errors committed; (2) session attributes: session goal
and topic to be discussed; and (3) the corresponding effective plan. The cognitive
abilities of the tutees were measured in terms of their performance in tests and problem-
solving exercises conducted in class prior to their initial tutorial session, and their
learning styles were determined using an assessment instrument. The knowledge scope
attribute indicates how far in the course syllabus the student has been taught.
All in all, this method permits us to do away with the expensive process of evaluating
CSPM while in deployment with all the other ITS components.
familiarization was implemented since no knowledge about C had yet been given, while Plan 2 already included a test since both topics had already been covered.
instructional plans. This relationship is depicted in Fig. 3. With the low-level learners
of A, support comes through easy to grasp examples, exercises, and explanations, and
with the tutor providing sufficient guidance through feedback, advice, and motivation.
With B’s moderate-level learners, the tutor can minimize supervision while increasing
the difficulty level of the activity objects. Transition to a new topic (discussion on
FOR construct precedes that of the WHILE and DO-WHILE) is characterized by
plans that preteach vocabulary, integrate new knowledge, contextualize instruction,
and test current knowledge (A1, A2, and A4); while reference to a previous topic may
call for summarization and further internalization (B1).
Fig. 3. Two [of the initial] 78 category models which exemplify relations in features and plans
Due to the imperfect and incomplete knowledge of its categories, CSPM must be
capable of incremental learning. In building new categories and updating existing
ones, the difficulty lies in deriving and self-improving the plans for each category.
Though it is plausible for CSPM to start by expecting that the plan being sought is the plan local to the category model to which the current student is classified, there is still no guarantee that the plan will immediately work for the student. A more accurate
behavior is for CSPM to acquire that plan but then slowly adapt and improve it to fit
the student. But if the student is classified to a new category, where and how can
CSPM derive this plan? CSPM demonstrates these two requisite intelligent behaviors
– find an initial plan, and then adapt and improve it – by utilizing unsupervised
machine learning techniques and heuristics for learning from experience.
To find this category, CSPM uses an information-theoretic measure called cohesion and applies it to the student attribute values. Unlike a Euclidean distance metric that sums the attribute values independently, cohesion is a distance measure in terms of relations between attributes. (For an elaborate analysis and discussion of this metric, we refer the reader to [18].) Briefly, cohesion relates the average distance between the members of category C to the average distance between C and all other categories: the category that is most cohesive is the one that best maximizes the similarity among its members while concurrently minimizing its similarity with other categories. CSPM pairs the new category with one of the existing categories and treats this pair as one category, say NE. The cohesion score can now be computed for NE and the rest of the existing categories. The computation is repeated, pairing the new category each time with another existing one, until the cohesion score has been computed for all possible pairs. The existing category in the pair that yields the highest cohesion is the nearest.
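The pairing procedure can be sketched as follows (a simplification under our own assumptions: categories are lists of student-attribute vectors, and cohesion is a caller-supplied function that scores a proposed set of categories, standing in for the metric of [18]):

```python
def nearest_category(new_category, existing_categories, cohesion):
    """Pair the new category with each existing category in turn, score the
    resulting set of categories with the cohesion metric, and return the
    existing category whose pairing yields the highest score."""
    best_partner, best_score = None, float("-inf")
    for candidate in existing_categories:
        ne = new_category + candidate                   # treat the pair NE as one category
        others = [c for c in existing_categories if c is not candidate]
        score = cohesion([ne] + others)
        if score > best_score:
            best_partner, best_score = candidate, score
    return best_partner
```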
Once CSPM learns the nearest category, it immediately seeks the branches whose
goal and topic are identical to, or if not, resemble most (i.e., most similar in terms of
subgoals that comprise the goal, and in terms of syntax, semantics, and/or purpose
that describe the topic’s construct), those of the new category. CSPM finally adopts
the plan of the selected branch. Fig. 4 illustrates an outcome of this process. The new
model in (4a) was derived using a teaching scenario that is not among the initial ones.
Fig. 4. The figure in (b) describes the nearest category model learned by CSPM for the new
model in (a). CSPM adopts as initial modifiable plan the one in the selected branch or path (as
indicated by the shaded portion) of the nearest category
To alter the acquired initial plan – be it from the nearest category or from an
already existing one – towards an effective version, CSPM learns a map of alternative
plans which it will intelligently explore until it exploits the one it learned as effective.
Fig. 5. The map of alternative plans for the new category model in Fig. 4a
Given the map, CSPM must intelligently explore it to mitigate the effect of random
selection. Intuitively, the best path is the one that resembles most the initial
modifiable plan. Using a plan-map matching heuristic, CSPM selects the subpath that
preserves most of the initial plan’s activities and their sequence. With this, exploration becomes focused. Afterwards, the selected subpath is augmented with the other necessary activities. CSPM follows the sampling heuristic of selecting first the most frequent successions since they worked well in many, if not most, situations. With this, exploration becomes prioritized. The category-derived subpath and the heuristic values provide a guided exploration mechanism based on experience.
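One concrete reading of the plan-map matching heuristic (ours, not necessarily the paper's exact formulation) is to score each candidate subpath by the length of its longest common subsequence with the initial plan, which rewards preserving both the activities and their order:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two activity sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = (table[i - 1][j - 1] + 1 if x == y
                           else max(table[i - 1][j], table[i][j - 1]))
    return table[len(a)][len(b)]

def best_subpath(initial_plan, candidate_subpaths):
    """Select the subpath that preserves most of the initial plan's
    activities and their sequence."""
    return max(candidate_subpaths, key=lambda path: lcs_length(initial_plan, path))
```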
We demonstrate this learning task using the initial plan from Fig. 4b and the map
in Fig. 5. Executing the plan-map matching heuristic, CSPM selects the subpath
D5,D1,D4,D7,D2,A1. Notice that “recallElaboration” in the initial plan is removed
automatically, which is valid since reviewing the concepts is no longer among the
subgoals, and “giveNonExample” can be replaced with “giveDivergentExample(D7)”
since both can be used to discriminate between concepts. To determine which
activities are appropriate for the first subgoal, CSPM will heuristically sample the
successions. Once sampled, the edge value becomes zero to give way to other
successions. Lastly, depending on the student’s score after A1 is carried out, CSPM
directs the TM to A2 in case the student fails, or ends the session otherwise.
Exploration is used to prevent the Q-learner from getting stuck in a sub-optimal version of the plan. Over time, the exploration rate is gradually reduced and the Q-learner begins to exploit the plan it evaluates as optimal.
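A minimal tabular sketch of the underlying mechanism (standard epsilon-greedy Q-learning in the sense of [21]; the parameter names and default values below are ours, not CSPM's):

```python
import random

def choose_action(q, state, actions, epsilon):
    """Epsilon-greedy selection: explore a random successor with probability
    epsilon, otherwise exploit the currently best-valued one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning update (Watkins & Dayan [21])."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```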
We run the Q-learner using new teaching scenarios as test cases differing in their
level of required learning tasks. We want to know (1) if and when CSPM can learn
the effective plans expected for these scenarios, (2) if it can self-improve efficiently,
and (3) if category knowledge is at all helpful in deriving the effective plans.
This last experiment has three different set-ups: (1) category knowledge and
heuristics are utilized; (2) category knowledge is removed and only heuristic is used;
and (3) CSPM randomly selects among possible successions. Each set-up simulates
the development of the same scenario for 50 successive stages; each stage is
characterized by a version of CSPM’s plan. Each version is evaluated vis-à-vis the
effective plan in the test scenario. CSPM’s learning performance is the mean
effectiveness in every stage across all test scenarios.
Analogous to a teacher improving and perfecting his craft, Fig. 6 (next page) shows how CSPM’s learning performance becomes effective [asymptotically] over time. It took much longer to learn an effective plan using heuristics alone, and longer still with random selection. When category knowledge is infused, however, CSPM acquired the effective plans at an early stage. It can be expected that as more category background knowledge is constructed prior to system deployment, and/or learned during on-line interactions, a better asymptotic behavior can be achieved. Lastly, CSPM was able to discover new plans, albeit without new successions, since it learned the new plans using existing ones. However, this can be addressed by providing other viable sources of new successions, for example, appropriate learner feedback which can be incorporated as new workable paths to be evaluated in the succeeding stages.
References
1. Arroyo, I., Beck, J., Beal, C., Woolf, B., Schultz, K.: Macroadapting AnimalWatch to
gender and cognitive differences with respect to hint interactivity and symbolism.
Proceedings of the Fifth International Conference on Intelligent Tutoring Systems (2000)
574-583
2. Beck, J.: Directing Development Effort with Simulated Students, In: Cerri, S.A.,
Gouardes, G., Paraguacu, F. (eds.). Lecture Notes in Computer Science, 2363 (2002) 851-
860
3. Beck, J.E., Woolf, B.P., Beal, C.R.: ADVISOR: A machine learning architecture for
intelligent tutor construction. Proceedings of the Seventeenth National Conference on
Artificial Intelligence (2000) 552-557
4. Brusilovsky, P., Karagiannidis, C., Sampson, D.: The Benefits of Layered Evaluation of
Adaptive Applications and Services. International Conference on User Modelling,
Workshop on Empirical Evaluations of Adaptive Systems (2001) 1-8
5. Dillenbourg, P.: The design of a self–improving tutor: PROTO-TEG. Instructional Science,
18(3) (1989) 193-216
6. Gutstein, E.: SIFT: A Self-Improving Fractions Tutor. PhD thesis, Department of
Computer Sciences, University of Wisconsin-Madison (1993)
7. Hartley, J.R., Sleeman, D.H.: Towards more intelligent teaching systems. International
Journal of Man-machine Studies, 5 (1973) 215-236
8. Karagiannidis, C., Sampson, D.: Layered Evaluation of Adaptive Applications and
Services. Proceedings on International Conference on Adaptive Hypermedia and Adaptive
Web-Based Systems (2000) 343-346
9. Kimball, R.: A self-improving tutor for symbolic integration. In: Sleeman, D.H., and
Brown, J.S. (eds): Intelligent Tutoring Systems, London Academic Press (1982)
10. MacMillan, S.A., Sleeman, D.H.: An Architecture for a Self-improving Instructional
Planner for Intelligent Tutoring Systems. Computational Intelligence, 3 (1987) 17-27
11. Mayo, M., Mitrovic, A.: Optimising ITS Behaviour with Bayesian Networks and Decision
Theory. International Journal of Artificial Intelligence in Education, 12 (2001) 124-153
12. O’Shea, T.: A self-improving quadratic tutor. International Journal of Man-machine
Studies, 11 (1979) 97-124. Reprinted in: Sleeman, D.H., and Brown, J.S. (eds): Intelligent
Tutoring Systems, London Academic Press (1982)
13. Reyes, R.: A Case-Based Reasoning Approach in Designing Explicit Representation of
Pedagogical Situations in an Intelligent Tutoring System. PhD thesis, College of Computer
Studies, De La Salle University, Manila (2002)
14. Self, J.A.: Student models and artificial intelligence. Computers and Education, 3 (1977)
309-312
15. Singer, B., Veloso, M.: Learning state features from policies to bias exploration in
reinforcement learning. Proceedings of the Sixteenth National Conference on Artificial
Intelligence (1999) 981
16. Sison, R., Shimura, M.: Student modeling and machine learning. International Journal of
Artificial Intelligence in Education, 9 (1998) 128-158
17. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. Cambridge, MA: MIT
Press (1998)
18. Talmon, J.L., Fonteijn, H., Braspenning, P.J.: An Analysis of the WITT Algorithm.
Machine Learning, 11, (1993) 91-104
19. VanLehn, K., Ohlsson, S., Nason, R.: Applications of simulated students: An exploration.
Journal of Artificial Intelligence in Education, 5(2) (1994) 135-175
20. Vassileva, J., Wasson, B.: Instructional Planning Approaches: from Tutoring towards Free
Learning. Proceedings of Euro-AIED ’96 (1996) 1-8
21. Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning, 8, (1992) 279-292
AgentX: Using Reinforcement Learning to Improve the
Effectiveness of Intelligent Tutoring Systems
1 Introduction
The cost of tracing knowledge when students ask for help is high, as students need to
be monitored after each step of the solution. The ITS requires a special interface to
have the student interact with the system at each step, or explain to the tutoring system
what steps have been done. Such is the case of ANDES [6] or the CMU Algebra tutor
[8]. While trying to reduce the cost of Intelligent Tutoring Systems, one possibility is
to try to infer students’ flaws based on the answers they enter or the hints they ask for. However, if a student’s steps in a solution are not traced by asking the student after each
step of the solution, and the student asks for help, how do we determine what hints to
provide? One possibility is to show hints for the first step, and then for the second step
if the student keeps asking for help, and so on. However, we cannot assume that the students using the ITS are all at the same level; in fact, even within a single classroom, students will show a range of strengths and weaknesses. Some
students may need help with the first step, others may be fine with a summary of the first
step and need help on the second one. Efficiency could be improved by skipping hints
that aid on skills that the student already knows. In an ITS that gives hints to a student
in order to assist the student in reaching a correct solution, the hints are ordered by the
ITS developer and may not reflect the true nature of the help needed by the student.
Though feedback may be gathered through formative evaluations after the student has
used the system for future enhancements, traditional tutoring systems get no feedback
on the usefulness of the hints while the student is using the system.
2 Related Work
There exist intelligent tutoring systems that have employed techniques from Machine Learning (ML) in order to reduce the amount of knowledge engineering done at development [1, 3, 4, 5, 7]. These systems are modified so that configuration is done on the fly, making the system more adaptive to the student and reducing the need for rigid constructs at development time. ADVISOR [3] is an ML agent developed
to simplify the structure of an ITS. ADVISOR parameterizes the teaching goals of the
system so that they rely less on expert knowledge a priori and can be adjusted as needed.
CLARISSE [1] is an ITS that uses Machine Learning to initialize the student model by
way of classifying the student into learning groups. ANDES [7] is a Newtonian physics
tutor that uses a Bayes Nets approach to create a student model to decide what type of
help to make available to the student by keeping track of the student’s progress within a
specific physics problem, their overall knowledge of physics and their abstract goals for
solving the problem.
Our goal is to combine methods of clustering students and predicting the type and amount of help that is most useful to the student, in order to boost the overall efficiency of the ITS.
The state space is then made up of all possible states that the agent can perceive, and the set of actions is all actions available to the agent from a perceived state; a reward function maps perceived states of the environment to a single number, a reward, indicating the intrinsic desirability of the state. The value of a state is the total amount of reward the agent can expect to accumulate over the future starting from that state. So, a policy is said to be optimal if the values of all states in the state space are optimal and the policy leads an agent from its current state through states that will lead it to the state with the highest expected return, R.
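In the standard notation of [10] (the discount factor $\gamma$ and the time indexing are our additions, since the paper's own equations are not reproduced in this excerpt), this reads

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad V^{\pi}(s) = \mathrm{E}_{\pi}\left[ R_t \mid s_t = s \right],$$

and a policy $\pi^{*}$ is optimal when $V^{\pi^{*}}(s) \ge V^{\pi}(s)$ for every state $s$ and every policy $\pi$.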
4 Experimental Design
The experiments and setup referred to in this paper are based on the Wayang Outpost [2] web-based intelligent tutoring system. A simplified overview of the archi-
tecture of the Wayang system is as follows: A problem is associated with a set
of skills related to the problem. A problem has hints (that aid on a skill associ-
ated with the problem) for which order is significant. For the purpose of AgentX,
the skills have been mapped to distinct letters A, B, . . . , P, and the hints are then indexed by the skill they aid and by their position in the problem, where the order of the hints is preserved by their appearance in any problem (i.e., an earlier hint can never follow a later one). Hints are presented until the problem is answered or all hints for the problem have been shown. In order to reduce the complexity of the
state space, we consider a distinct path to be a sequence of skills. This reduction speeds
up the learning rate because it reduces the number of distinct states needed to be seen in
optimizing the policy since the set of skills is small. If, in solving a problem, the student could see hints that aid on a particular sequence of skills (as arriving at the solution to this problem involves steps that imply the use of these skills), or some subsequence of this sequence, then Figure 2 shows all of the states associated
with this problem. Any subsequence of skills can be formed by moving up and to the
right (zero or more spaces) in the tree.
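A small sketch of this reduced state space (the skill letters are illustrative):

```python
from itertools import combinations

def skill_subsequences(skills):
    """Enumerate every order-preserving subsequence of a problem's skill
    sequence -- i.e., every state in the reduced subspace of Figure 2."""
    return [tuple(skills[i] for i in idx)
            for r in range(1, len(skills) + 1)
            for idx in combinations(range(len(skills)), r)]

# skill_subsequences(("A", "B", "C")) ->
# ('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')
print(skill_subsequences(("A", "B", "C")))
```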
Wayang has the option that students take a computer-based pretest before using the ITS.
The pretest uncovers the student’s strengths and weaknesses as they relate to problems in the system. With the computer-based pretest, this information is easily accessible, and we can reduce the state space even further by excluding hints for skills in which the student excels. In addition to excluding these would-be superfluous hints, we are able to use
information about the weaknesses the student exhibits by initializing the value of the
states that include skills of weakness with greater expected rewards, making the state
more desirable to the agent instead of initializing each state with the same state-value.
An action in this system can be seen as moving from state to state where a state is a
specific skill sequence containing skills that are related to the problem. Rewards occur
Fig. 2. Possible skill trajectories from the state subspace for problem P.
only at the end of each problem, then propagate back to all states which are sub-states
of the skill sequence.
4.3 Rewards
In acting in the system, the agent seeks states that will lead to greater rewards, then updates the value of each state affected by the action at the end of the problem. In order to guide the agent toward more desirable states, developing a reward structure that makes incorrect answers worse as the student receives more hints and correct answers better as students receive fewer hints allows us to shape the behavior of the action selection process at each state (Table 1). The reward for each problem is then the sum of the rewards given after each hint is seen. By influencing the agent with a reward structure [9] such as this, getting to correct answers sooner becomes most desirable, which speeds up the process of reinforcement learning. The agent updates the states affected by the problem as it moves through
5 Experimental Setup
In creating the learning agent, we randomly generate student data for a student population
of 1000. The random data is in the form of pre-test evaluation scores that allow the student to answer correctly, incorrectly, and not at all with different probabilities based on the data generated from the pre-test evaluation (Equation 2).
As the student learns a skill, the probabilities are shifted away from answering incorrectly. Also, as the student’s actions are recorded by the agent, the percentages of no answers and incorrect answers affect the probability weightings. The students are first sorted: the randomized pretest produces a numerical score for each of the skills utilized in the entire tutoring system, and the harmonic mean of all scores is then used to sort students into the multiple learning levels. Learning levels are created from the students’ expected success after their pretest results are recorded, and are measured in percentiles. Table
2 shows the learning levels. Any student with no pretest data available is automatically
placed into learning level L4 since it contains the students who perform in the 50th per-
centile. Once the clusters are formed, after a short period of question answering (after
x problems are attempted, where x is a small number such as 3 or 4), the students are
able to change clusters based on their success within the tutor. The current success is
measured by actions in percentages as seen in Equation 3.
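A sketch of the sorting step (the harmonic mean is as stated above; the level boundaries are placeholders, since the paper's Table 2 is not reproduced in this excerpt):

```python
def harmonic_mean(scores):
    """Harmonic mean of the (non-zero) pretest skill scores."""
    return len(scores) / sum(1.0 / s for s in scores)

def learning_level(pretest_scores, boundaries):
    """Map a student's pretest scores to a learning level by comparing the
    harmonic mean against descending lower bounds on expected success."""
    mean = harmonic_mean(pretest_scores)
    for level, lower_bound in boundaries:
        if mean >= lower_bound:
            return level
    return boundaries[-1][0]

# Hypothetical boundaries (the real ones come from Table 2), e.g.:
# boundaries = [("L1", 0.90), ("L2", 0.80), ("L3", 0.65), ("L4", 0.0)]
```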
So, answering correctly after each hint seen gives 100% success; answering correctly after two hints gives 50% success if the student answered incorrectly after the first hint, and 100% if the student gave no answer after the first hint.
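This is consistent with reading Equation 3 (not reproduced in this excerpt) as the fraction of answered hints that were answered correctly, with unanswered hints ignored; a sketch of that reading:

```python
def current_success(responses):
    """responses: 'correct', 'incorrect', or 'none' recorded after each hint
    seen within a problem.  Unanswered hints do not count against the student."""
    answered = [r for r in responses if r != "none"]
    if not answered:
        return 0.0
    return answered.count("correct") / len(answered)

print(current_success(["correct"]))               # 1.0  (100%)
print(current_success(["incorrect", "correct"]))  # 0.5  (50%)
print(current_success(["none", "correct"]))       # 1.0  (100%)
```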
While the learning levels are meant to achieve a certain amount of generalization over students, students who are in L1 will perform better over all skills than those in any other grouping of students, which is why it is sufficient to use these learning levels even though the students may have different strengths. By the time students attain L1, they will be good at most skills and need fewer hints. Students in the middle levels will show distinctive strengths and weaknesses, but at different degrees of success, allowing the learning levels to properly sort them based on success. This clarifies a goal of the system: to be able to cluster all students into L1. Figure 4 shows the initial population of students within learning levels.
Fig. 4. Student population in learning levels before any problems are attempted.
6 Results
Fig. 5. Student population in learning levels after each student has attempted 15 problems.
7 Conclusions
Using Reinforcement Learning agents can help to dynamically customize ITSs. In this paper we have shown that it is possible to boost student performance, and we have presented a method for increasing the efficiency of an ITS after a small number of problems have been seen, by incorporating a student model that allows the system to cluster students into learning levels and by choosing subsequences of all possible hints for a problem instead of simply showing all hints available for that problem. Defining a reward structure based on a student’s progress within a problem and allowing their response to affect a region of other, similar students reduces the need to see more distinct problems in creating a policy for how to act when faced with new skill sets, and the need to solicit student feedback
after each hint. With the goal to increase membership in learning level L1 (90–100%
success), which directly relates to the notion of increasing the efficiency of the system,
we have shown that using an RL agent within an ITS can accomplish this.
References
[1] Esma Aimeur, Gilles Brassard, Hugo Dufort, and Sebastien Gambs. CLARISSE: A Machine
Learning Tool to Initialize Student Models. In the Proceedings of the 6th International
Conference on Intelligent Tutoring Systems. 2002.
[2] Carole R. Beal, Ivon Arroyo, James M. Royer, and Beverly P. Woolf. Wayang Outpost: A
web-based multimedia intelligent tutoring system for high stakes math achievement tests.
Submitted to AERA 2003.
[3] Joseph E. Beck, Beverly P. Woolf, and Carole R. Beal. ADVISOR: A machine learning archi-
tecture for intelligent tutor construction. In the Proceedings of the 17th National Conference
On Artificial Intelligence. 2000.
[4] Joseph E. Beck and Beverly P. Woolf. High-level Student Modeling with Machine Learning.
In the Proceedings of the 5th International Conference on Intelligent Tutoring Systems.
2000.
[5] Joseph E. Beck and Beverly P. Woolf. Using a Learning Agent with a Student Model. In the
Proceedings of the 4th International Conference on Intelligent Tutoring Systems. pp. 6-15.
1998.
[6] Gertner, A. and VanLehn, K. Andes: A Coached Problem Solving Environment for Physics.
In the Proceedings of the 5th International Conference, ITS 2000, Montreal Canada, June
2000.
[7] Gertner, A., Conati, C., and VanLehn, K. Procedural help in Andes: Generating hints using
a Bayesian network student model. In the Proceedings of the 15th National Conference on
Artificial Intelligence. Madison, Wisconsin. 1998.
[8] Koedinger, K. R., Anderson, J.R., Hadley, W.H., and Mark, M.A. Intelligent tutoring goes
to school in the big city. International Journal of Artificial Intelligence in Education, 8,
30-43. 1997.
[9] Adam Laud and Gerald DeJong. The Influence of Reward on the Speed of Reinforcement
Learning. In the Proceedings of the 20th International Conference on Machine Learning
(ICML-2003). 2003.
[10] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cam-
bridge, MA. 1998.
An Intelligent Tutoring System Based on Self-Organizing
Maps – Design, Implementation and Evaluation
1 Introduction
Since 1950, the computer has been employed in Education as an auxiliary tool to-
wards successful learning [1] with Computer-Assisted Instruction (CAI). The inclu-
sion of (symbolic) intelligent techniques introduced Intelligent Computer-Assisted Instruction (ICAI), or Intelligent Tutoring Systems (ITS). The adaptation to
the personal user features is one of the main characteristics of this new paradigm [2].
Despite the evolution of ICAI systems, the tutoring methods are basically defined
by the expert conceptual knowledge and by the user learning behavior during the
tutoring process. Moreover, the development of such systems has been limited to the field of
symbolic Artificial Intelligence (AI).
In this article, the use of the most widespread subsymbolic model, artificial neural networks, is proposed together with an original methodology of content engineering (instructional design). Additionally, some experiments are reported in order to compare the proposed system with another system where content navigation is decided by the user’s free will. These navigations are evaluated and the best ones are extracted to build the neural training set. Alencar [3] introduced this idea without empirical evidence. He showed that multilayer perceptron (MLP) networks [6] could find important patterns for the development of dynamic lesson generation (automatic guided content navigation). Our work employs a different neural model, self-organizing maps (SOM), which adaptively build topologically ordered maps with dimensionality reduction.
The main difference between this proposal and traditional ICAI systems lies in the need for expert knowledge: no expert knowledge is required in our work.
Self-organizing maps were introduced by Teuvo Kohonen [4]. They have biological plausibility, since similar maps have been found in the brain. After training has taken place, neurons with similar functions are located in the same region, and the distance between neurons reflects the difference in their responses. Similar stimuli are recognized (lead to the highest responses) by the same set of neurons, which lie in the same region of the topologically ordered map.
Self-organizing maps are composed basically of a single layer (not counting the input layer, where each input is perceived by one neuron); see Fig. 1. Training implements competitive learning: neurons compete to respond to the input patterns that are most similar to their own prototypes (realized by the synaptic weights). Neurons are locally connected by a soft neighborhood scheme, so not only the most excited neuron takes part in the adaptation process but also the ones in its neighborhood. Therefore, not just one neuron but the entire nearby region learns to respond more specifically.
The specification of the winner neuron is typically performed using the Euclidean distance between the neuron prototype and the current input pattern [5]. Fig. 2 shows an example of a topological map built to order a set of colors (represented by their red, green, and blue components). At the end of training, neurons in the same region are focused on similar colors, while two distant neurons respond best to very different colors.
The neuron prototypes are typically initialized at random. This tactic is sometimes abandoned when the examples are not well spread across the input space (for instance, when all the colors are reddish); an alternative is to use randomly chosen samples from the training set. SOM training is conducted in two phases: the first is characterized by global ordering and a fast decrease of the neighborhood, while the second performs local, minor adjustments [8].
The definition of the winner neuron in self-organizing maps can be done using several metrics. The most common procedure is to identify the neuron that has the smallest Euclidean distance to the presented input [4]. This
distance can be calculated as shown below:

d_j = \sqrt{\sum_{i=1}^{n} (x_i - w_{ji})^2}

where d_j is the distance between the j-th neuron and the n-dimensional input pattern, x_i is the i-th dimension of the input pattern, and w_{ji} is the connection weight of the j-th neuron for the i-th dimension of the input pattern.
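For concreteness, the following minimal sketch (our illustration, not the authors' implementation; all function and variable names are hypothetical) selects the winner neuron by this Euclidean distance and performs one competitive-learning update with a Gaussian neighborhood, assuming NumPy is available.

import numpy as np

def winner(prototypes, x):
    """Index of the neuron whose prototype is closest (Euclidean) to input x."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(distances))

def som_update(prototypes, x, lr, sigma):
    """One competitive-learning step: the winner and its neighbours move toward x."""
    j = winner(prototypes, x)
    n_neurons = prototypes.shape[0]
    # Lattice distance on a one-dimensional map; a ring topology would wrap this around.
    lattice_dist = np.abs(np.arange(n_neurons) - j)
    h = np.exp(-(lattice_dist ** 2) / (2 * sigma ** 2))   # neighbourhood function
    prototypes += lr * h[:, None] * (x - prototypes)
    return prototypes

# Toy usage: order random RGB colours on a 10-neuron map (cf. the colour example above).
rng = np.random.default_rng(0)
protos = rng.random((10, 3))
for t in range(5400):
    # Two training phases emerge from the decaying learning rate and neighbourhood width.
    lr = 0.5 * (1 - t / 5400)
    sigma = max(0.5, 3.0 * (1 - t / 5400))
    protos = som_update(protos, rng.random(3), lr, sigma)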
2 Proposed System
The idea of creating a neural-network-based intelligent tutoring system capable of dynamic lesson generation originated from the interest in developing a system able to decide without expert advice; such a constraint is commonly found in the literature [7].
In the proposed system, neural networks are responsible for the decision making. They are trained to imitate the best content navigations encountered when users were guided by their own free will. Notice that the control group is also the source of the knowledge needed to train the neural networks employed with the experimental group. Our target is to produce faster content navigation with performance similar to the best occurrences of free navigation.
The first phase is the data collection that originates from free navigation. Fig. 3 shows its dynamics and, in particular, the content engineering. Lessons are organized as sequences of topics, and each topic defines a context. Each context is expressed in five levels: intermediary, easy, advanced, examples, and FAQ (frequently asked questions). The last two levels are considered auxiliary to the others. The intermediary level is the entry point of every context. The advanced level includes extra information in order to keep the interest of advanced students. The easy level, on the other hand, simplifies the intermediary content in an attempt to aid the student's comprehension. The examples level is intended for students who learn best from concrete situations. The FAQ level tries to anticipate questions commonly raised while learning that specific content.
After contact with each level (in every context), learners face a multiple-choice exercise. Before the lesson starts, aspects of the environment are introduced to the learner and an initial evaluation is applied. After the lesson, a final test measures the resulting retention of information (which serves as an estimate of learning efficiency).
2.1 Implementation
Despite the typical use of two-dimensional SOMs, we opted for one-dimensional SOMs arranged in a ring topology (with 10 neurons each). The training of each SOM was completed after 5,400 cycles, and each SOM was evaluated for global ordering and accuracy. To force the SOMs to decide on destinations within the tutor, each neuron must be labeled. This labeling was carried out by a simple ranking rule: each neuron is assigned the destination to which it was most similar (in the sense of average Euclidean distance) over the training set. If a neuron was most responsive to situations where the next destination is the next context, then that is its label, and also its decision whenever it is the most excited neuron of the map (refer to [9] for details).
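As an illustration of this labeling and decision scheme (a sketch under our own assumptions; the names and data layout are hypothetical, not taken from the paper), each neuron of a trained map is assigned the destination whose training examples lie closest to its prototype on average, and the tutor's decision is the label of the most excited neuron:

import numpy as np

def label_neurons(prototypes, train_X, train_dest):
    """Assign each neuron the destination whose training examples lie closest,
    on average, to its prototype (the ranking rule described above)."""
    train_dest = np.array(train_dest)
    labels = []
    for proto in prototypes:
        dists = np.linalg.norm(train_X - proto, axis=1)
        best, best_avg = None, np.inf
        for dest in set(train_dest.tolist()):
            avg = dists[train_dest == dest].mean()
            if avg < best_avg:
                best, best_avg = dest, avg
        labels.append(best)
    return labels

def next_destination(prototypes, labels, student_state):
    """The tutor's decision: the label of the most excited (closest) neuron."""
    j = int(np.argmin(np.linalg.norm(prototypes - student_state, axis=1)))
    return labels[j]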
2.2 Experiments
Randomly chosen first-year students of Computer Engineering and Information Systems at the State University of Goiás were recruited to test our hypotheses. Some instruction was given to the students to explain how the system works, and individual sessions were kept below one hour. The experimental design therefore involved two independent samples. The initial and final tests were composed of 11 questions each. The level of correctness and the time latency were recorded throughout each student session.
Twenty-two students were submitted to free navigation. One of them was discarded because he showed no improvement (comparing the final and initial evaluations).
The subject of the tutor was “First Concepts in Informatics”, structured in 11 contexts (with 5 levels each). As a consequence, 55 SOM networks were trained. The visits to these contexts and exercises produced 1,418 records.
2.3 Results
With respect to session duration, a relevant aspect in every learning process (particularly in web training), we compared the control and experimental groups after excluding the time spent on the initial and final tests. Fig. 4 shows the average session duration for each group. By applying the t-test, we confirmed the hypothesis that significantly less time was spent by the experimental group (a difference of approximately 10 minutes on average).
The application of the t-test resulted in an observed t of 2.65. At a significance level of 5% and with 39 degrees of freedom (df), the critical t is 1.68. Therefore, the observed t statistic falls within the critical region and the null hypothesis (which states no significant difference) should be rejected in favor of the experimental hypothesis.
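A minimal sketch of this kind of comparison, assuming SciPy (1.6 or later for the alternative argument) and purely hypothetical duration data rather than the authors' measurements; the one-sided alternative mirrors the directional hypothesis that the experimental group spends less time:

from scipy import stats

# Hypothetical session durations, in minutes, with initial/final tests excluded.
control_minutes = [52, 47, 55, 49, 61, 44, 58, 50, 53, 46]
experimental_minutes = [41, 38, 45, 36, 43, 40, 39, 44, 37, 42]

# One-sided independent-samples t-test: experimental < control.
t_stat, p_value = stats.ttest_ind(experimental_minutes, control_minutes,
                                  alternative="less")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")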
With respect to the improvements shown by the initial and final tests, we compared the control and experimental groups by employing the t-test again. In doing so, we tried to assess the learning efficiency of both methods.
Fig. 5 shows the average number of correct answers in both tests. One can see that the control group produced slightly better averages. In fact, these differences are not significant when inferential statistics are employed. The observed value of t was 1.55. As before, the critical t is 1.68 with 39 degrees of freedom at a significance level of 5%. In this situation, the observed value lies outside the critical region and the null hypothesis should not be rejected on this empirical evidence; we therefore cannot rule out that the observed differences occurred by chance (and/or sampling error). Furthermore, one should notice the relevant improvement in both groups: by the end, students had more than doubled their correct answers. Recall that each test is composed of 11 questions (one question for each of the contexts).
3 Conclusion
This article has formalized the proposal of an intelligent tutoring system based on self-organizing maps (also known as Kohonen maps) without any use of expert knowledge. Additionally, we have implemented this proposal with web technology and tested it with two groups in order to contrast free and intelligent navigation. The control (free navigation) group is also the source of the examples used for SOM training.
The content is organized as a sequence of contexts, each expressed in 5 levels: intermediary, easy, advanced, examples, and frequently asked questions. The subject of the implemented tutor was “First Concepts in Informatics”, structured in 11 contexts. This structure is modular and easily applied to other subjects, which is an important feature of the proposed system.
Results from the experimental work have shown significant differences in session duration with no loss of learning. This work contributes by presenting a new model for the creation of intelligent tutoring systems. We are not claiming its superiority, but rather that it deserves consideration in specific situations or in the design of hybrid systems.
References
Modeling the Development of Problem Solving Skills in Chemistry
1 Introduction
Fig. 1. HAZMAT. This composite screen shot of Hazmat illustrates the challenge to the student
and shows the menu items on the left side of the screen. Also shown are two of the test items
available. The item in the upper left corner shows the result of a precipitation reaction and the
frame at the lower left is the result of flame testing the unknown
To ensure that students gain adequate experience, this problem set contains 34 cases
that can be performed in class, assigned as homework, or used for testing. These cases
are of known difficulty from item response theory (IRT) analysis [14], helping
teachers select “hard” or “easy” cases depending on their students' ability [15].
Developing learning trajectories from these sequences of intentional student actions is
a two-stage process. First, the strategies used on individual cases of a problem set are
identified and classified with artificial neural networks (ANN) [16], [15], [17], [18].
Then, as students solve additional problems, the sequences of strategies are modeled
into performance states by Hidden Markov Modeling (HMM) [19].
The most common student approaches (i.e., strategies) to solving Hazmat are identified with competitive, self-organizing artificial neural networks (SOM), using the students' selections of menu items while solving the problem as input vectors [15], [17]. Self-organizing maps learn to recognize groups of similar performances in such a way that neurons near each other in the neuron layer respond to similar input vectors [20]. The result is a topological ordering of the neural network nodes according to the structure of the data, where geometric distance becomes a metaphor for strategic similarity. We typically use a 36-node neural network and train it with between 2,000 and 5,000 performances derived from students of different ability levels (i.e., regular, honors, and AP high school students and university freshmen), where each student performed at least 6 problems of the problem set.
different architectures, neighborhoods, and training parameters have been described
previously [17]. The components of each strategy in this classification can be
visualized for each of the 36 nodes by histograms showing the frequency of items
selected (Figure 2).
Fig. 2. Sample Neural Network Nodal Analysis. A. This analysis plots the selection frequency
of each item for the performances at a particular node (here, node 15). General categories of
these tests are identified by the associated labels. This representation is useful for determining
the characteristics of the performances at a particular node, and the relation of these
performances to those of neighboring neurons. B. This figure shows the item selection
frequencies for all 36 nodes following training with 5284 student performances
Most strategies defined in this way consist of items that are always selected for
performances at that node (i.e. those with a frequency of 1) as well as items that are
ordered more variably. For instance, all Node 15 performances shown in Figure 2 A
contain the items 1 (Prologue) and 11 (Flame Test). Items 5, 6, 10, 13, 14, 15 and 18
have a selection frequency of 60 - 80% and so any individual student performance
would contain only some of these items. Finally, there are items with a selection
frequency of 10-30%, which we regard more as background noise.
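A small sketch of how such per-node item-frequency profiles can be computed once performances have been classified to nodes; the function and data names are hypothetical, and NumPy is assumed:

import numpy as np

def nodal_item_frequencies(node_ids, item_vectors, n_nodes=36):
    """For each SOM node, the fraction of performances classified there that
    selected each menu item (the per-node histograms of Fig. 2)."""
    item_vectors = np.asarray(item_vectors, dtype=float)   # shape: (n_performances, n_items)
    node_ids = np.asarray(node_ids)                        # winning node per performance
    freqs = np.zeros((n_nodes, item_vectors.shape[1]))
    for node in range(n_nodes):
        mask = node_ids == node
        if mask.any():
            freqs[node] = item_vectors[mask].mean(axis=0)
    return freqs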
Figure 2 B is a composite ANN nodal map, which illustrates the topology generated
during the self-organizing training process. Each of the 36 graphs in the matrix
represents one node in the ANN, where each individual node summarizes a group of
similar students' problem-solving performances automatically clustered together by the
ANN procedure. As the neural network was trained with vectors representing the
items students selected, it is not surprising that a topology developed based on the
quantity of items. For instance, the upper right hand of the map (nodes 6, 12)
represents strategies where a large number of tests have been ordered, whereas the
lower left corner contains strategies where few tests have been ordered.
A more subtle strategic difference is where students select a large number of
Reactions and Chemical Tests (items 15-21), but no longer use the Background
Information (items 2-9). This strategy is represented in the lower right hand corner of
Figure 2 B (nodes 29, 30, 34, 35, 36) and is characterized by extensive selection of
items mainly on the right-hand side of each histogram. The lower-left hand corner and
the middle of the topology map suggest more selective picking and choosing of a few,
relevant items. In these cases, the SOMs show us that the students are able to solve the problem efficiently: they know which items impact their decision processes the most and which items are less significant, and they select accordingly.
Once ANNs are trained and the strategies represented by each node are defined, new performances can be tested on the trained neural network, and the node (strategy) that best matches each new performance can be identified. Were a student to order
many tests while solving a Hazmat case, this performance would be classified with
the nodes of the upper right hand corner of Figure 2 B, whereas a performance where
few tests were ordered would be more to the left side of the ANN map. The strategies
defined in this way can be aggregated by class, grade level, school, or gender, and
related to other achievement and demographic measures. This classification is an
observable variable that can be used for immediate feedback to the student, serve as
input to a test-level scoring process, or serve as data for further research.
This section describes how we can use the ANN performance classification procedure
described in the previous section to model student learning progress over multiple
problem solving cases. Here students perform multiple cases in the 34-case Hazmat
problem set, and we then classify each performance with the trained ANN (Table 1).
Some sequences of performances localize to a limited portion of the ANN topology map, like examples 1 and 3, suggesting only small shifts in strategy with each new performance. Other performance sequences, like example 2, show localized activity on the topology map early in the sequence followed by large topology shifts, indicating more extensive strategy changes. Others illustrate diverse strategy shifts moving over the entire topology map (i.e., examples 4 and 5).
Both of these features are shown in Figure 3, with the transitions between the different states in the center and the ANN nodes representing each state at the periphery. States 1, 4, and 5 appear to be absorbing states, as these strategies, once used, are likely to be used again. In contrast, students adopting State 2 and 3 strategies are less likely to persist with those states and more likely to transit to another state. When the emission matrix of each state was overlaid on the 6 x 6 neural network grid, each state represented topology regions of the neural network that were often contiguous (with the exception of State 4) (Figure 3).
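As a rough sketch of this modeling step (not the authors' implementation), a discrete-emission HMM can be fitted to the sequences of ANN node labels with the hmmlearn package, whose recent releases expose this model as CategoricalHMM (older releases used MultinomialHMM for the same discrete model); the sequences below are toy data and the node labels are 0-indexed:

import numpy as np
from hmmlearn import hmm   # assumption: the hmmlearn package is available

# Hypothetical data: each row is one student's sequence of ANN node labels (0-35).
sequences = [[5, 17, 17, 29], [2, 2, 14, 30, 30], [8, 20, 33, 33]]
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]

# Five hidden performance states, as in the Hazmat model described above.
model = hmm.CategoricalHMM(n_components=5, n_iter=100, random_state=0)
model.fit(X, lengths)

print(model.transmat_.round(2))      # 5 x 5 transition matrix between states
print(model.predict(X, lengths))     # most likely state for each performance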
Fig. 3. Mapping the HMM Emission and Transition Matrices to Artificial Neural Network
Classifications. The five states comprising the HMM for Hazmat are indicated by the central
circles with the transitions between the states shown by the arrows. Surrounding the states are
the artificial neural network nodes most closely associated with each state
2 Results
As we wish to use the HMM to determine how students' strategic reasoning changes with time, we performed initial validation studies to determine 1) how the state distribution changes with the number of cases performed, 2) whether these changes reflect learning progress, and 3) whether the changes over time ‘make sense’ from the perspective of novice/expert cognitive differences.
The overall solution frequency for the Hazmat dataset (N = 7,630 performances) was 56%, but when students' performance was mapped to their strategy usage via the HMM states, these states revealed the following quantitative and qualitative characteristics:
State 1 – 55% solution frequency, showing variable numbers of test items and little use of Background Information;
State 2 – 60% solution frequency, showing equal usage of Background Information and action items, with little use of precipitation reactions;
State 3 – 45% solution frequency, with nearly all items being selected;
State 4 – 54% solution frequency, with many test items and limited use of Background Information;
State 5 – 70% solution frequency, with few items selected; Litmus and Flame tests uniformly present.
We next profiled the states for the dynamics of state changes, and possible gender and
group vs. individual performance differences.
Dynamics of State Changes. Across 7 Hazmat performances the solved rate increased from 53% (case 1) to 62% (case 5) (Pearson test), and this was accompanied by corresponding state changes (Figure 4). These longitudinal changes were characterized by a decrease in the proportion of State 1 and State 3 performances, an increase and then decrease in State 2 performances, and a general increase in State 5 (the state with the highest solution frequency).
Fig. 4. Dynamics of HMM State Distributions with Experience and Across Classrooms. The
bar chart tracks the changes in all student strategy states (n=7196) across seven Hazmat
performances. Mini-frames of the strategies in each state are shown for reference
Group vs. Individual Performance. In some settings the students worked on the
cases in teams of 2-3 rather than individually. Group performance significantly
increased the solution frequency from a 51% solve rate for individuals to 63% for the
students in groups. Strategically, the most notable differences were the maintenance
of State 1 as the dominant state, the nearly complete lack of performances in States 2
and 3, and the more rapid adoption of State 4 performances by the groups (Figure 5).
In addition, the groups stabilized their performances faster, changing little after the
third performance whereas males and females stabilized only after performance 5.
This makes sense because states 2 and 3 represent transitional phases that students
pass through as they develop competence. Collaborative learners may spend less time
in these phases if group interaction indeed helps students see multiple perspectives
and reconcile different viewpoints [23].
Also shown in Figure 5 are the differences in the state distribution of performances across males and females (Pearson test). While there was a steady reduction in State 1 performances for both groups, the females entered State 2 more rapidly and exited more rapidly to State 5. These differences became non-significant at the stable phase of the trajectories (performances 6 and 7). Thus, males and females have different learning trajectories but appear to arrive at similar strategy states.
Ability and State Transitions. Learning trajectories were then developed according
to student ability as determined by IRT. For these studies, students were grouped into
high (person measure = 72-99, n = 1300), medium (person measure 50-72, n = 4336)
and low (person measure 20-50, n = 1994) abilities. As expected from the nature of
IRT, the percentage solved rate correlated with student ability. What was less expected was that, when the solved rate by ability was examined for the sequence of performances, the students with the lowest ability had not only the highest solved rate on the first performance, but also one that was significantly better than that of the highest-ability students (57% vs. 44%, n = 866, p < 0.00). Predictably, this was rapidly reversed
on subsequent cases. To better understand these framing differences, a cross-tabulation analysis was conducted between student ability and neural network nodal classifications on the first performances. This analysis highlighted nodes 3, 4, 18, 19, 25, 26, and 31 as having the highest residuals for the low-ability students, and nodes 5, 6, 12, and 17 for the highest-ability students. From these data, it appeared that the higher-ability students more thoroughly explored the problem space on their first performance, to the detriment of their solution frequency, but took advantage of this knowledge on subsequent performances to improve their strategies. These improvements during the transition and stabilization stages include increased use of State 5 performances and decreased use of States 1 and 4; i.e., the students become both more efficient and effective.
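A hedged sketch of such a cross-tabulation with standardized residuals, assuming pandas and SciPy and using invented toy records (the real analysis would draw on the full performance dataset):

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical records: ability group and ANN node of the first performance.
df = pd.DataFrame({
    "ability": ["low", "low", "medium", "high", "high", "medium", "low", "high"],
    "first_node": [3, 18, 12, 5, 6, 25, 31, 17],
})

table = pd.crosstab(df["ability"], df["first_node"])
chi2, p, dof, expected = chi2_contingency(table)

# Standardized residuals: cells well above 0 are over-represented for that group.
residuals = (table - expected) / np.sqrt(expected)
print(residuals.round(2))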
Predicting Future Student Strategies. An additional advantage of an HMM is that predictions can be made regarding the student's learning trajectory. The prediction
accuracy was tested in the following way. First, a ‘true’ mapping of each node and the
corresponding state was conducted for each performance of a performance sequence.
For each step of each sequence, i.e. going from performance 2 to 3, or 3 to 4, or 4 to
5, the posterior state probabilities of the emission sequence (ANN nodes) were
calculated to give the probability that the HMM is in a particular state when it
generated a symbol in the sequence, given that the sequence was emitted. For
instance, ANN nodal sequence [6 18 1] mapped to HMM states (3 4 4). Then, this
‘true’ value is compared with the most likely value obtained when the last sequence
value was substituted by each of the 36 possible emissions representing the 36 ANN
nodes describing the student strategies. For instance, the HMM calculated the
likelihood of the emission sequences, [6 18 X] in each case where X = 1 to 36. The
most likely emission value for X (the student’s most likely next strategy) was given
by the sequence with the highest probability of occurrence, given the trained HMM.
The student’s most likely next performance state was then given by the state with the
maximum likelihood for that sequence.
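The enumeration procedure just described could be sketched as follows, assuming a trained hmmlearn model as in the earlier sketch; the node symbols are 0-indexed here, and all names are hypothetical rather than the authors' code:

import numpy as np

def predict_next(model, prefix, n_symbols=36):
    """Most likely next ANN node (strategy) and its HMM state, by substituting
    each possible emission for the last position and scoring the sequence."""
    best_symbol, best_loglik = None, -np.inf
    for symbol in range(n_symbols):
        seq = np.array(prefix + [symbol]).reshape(-1, 1)
        loglik = model.score(seq)            # log P(sequence | trained HMM)
        if loglik > best_loglik:
            best_symbol, best_loglik = symbol, loglik
    seq = np.array(prefix + [best_symbol]).reshape(-1, 1)
    posteriors = model.predict_proba(seq)    # state posteriors per position
    next_state = int(np.argmax(posteriors[-1]))
    return best_symbol, next_state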
Comparing the ‘true’ state values with the predicted values estimated the predictive
accuracy of the model at nearly 90% (Table 2). As the performance sequence
increased, the prediction rate also increased, most likely reflecting that by
performances 4, 5 and 6, students are repeatedly using similar strategies.
3 Discussion
The goal of this study was to explore the use of HMMs to begin to model how
students gain competence in domain-specific problem solving. The idea of ‘learning
trajectories’ is useful when thinking about how students progress on the road to
competence [24]. These trajectories are developed from the different ways that
novices and experts think and perform in a domain, and can be thought of as defining
stages of understanding of a domain or discipline [4]. During early learning, students’
domain knowledge is limited and fragmented, the terminology is uncertain and it is
difficult for them to know how to properly frame problems. In our models, this first
strategic stage is best represented by State 3 where students extensively explore the
problem space and select many of the available items. As expected, the solved rate for
such a strategy was poor. This approach is characteristic of surface level strategies or
those built from situational (and perhaps inaccurate) experiences. From the transition
matrix in Figure 4, State 3 is not an absorbing state and most students move from this
strategy type on subsequent performances.
With experience, the student's knowledge base becomes qualitatively more structured and quantitatively deeper, and this is reflected in the way competent students, or experts, approach and solve difficult domain-related problems. In our model, States 2 and 4 would best represent the beginning of this stage of understanding. State 2 consists of an equal selection of background information and test information, suggesting a lack of familiarity with the nature of the data being observed. State 4, on the other hand, shows little or no selection of background information but still extensive and
non-discriminating test item selection. Whereas State 2 is a transition state, State 4 is
an absorbing state - perhaps one warranting intervention for students who persist with
strategies represented by this state.
Once competence is developed, students would be expected to employ both effective
and efficient strategies. These are most clearly shown by our States 1 and 5. These
states show an interesting dichotomy in that they are differentially represented in the
male and female populations with males having a higher than expected number of
State 1 strategies and females higher than expected State 5 strategies.
The solution frequencies at each state provide an interesting view of progress. For
instance, if we compare the earlier differences in solution frequencies with the most
likely state transitions from the matrix shown in Figure 4, we see that most of the
students who enter State 3, having the lowest problem solving rate (45%), will transit
either to State 2 or 4. Those students who transit from State 3 to 2 will show on
average a 15% performance increase (from 45% to 60%) and those students who
transit from States 3 to 4 will show on average a 9% performance increase (from 45%
to 54%). The transition matrix also shows that students who are performing in State 2
(with a 60% solve rate) will tend to either stay in that state, or transit to State 5,
showing a 10% performance increase (from 60% to 70%). This analysis shows that
students’ performance increases as they solve science inquiry problems through the
IMMEX Interactive Learning Environment, and that by using ANN and HMM
methods, we are able to track and understand their progress.
When given enough data about students' previous performances, our HMM models performed at over 90% accuracy when tasked to predict the most likely problem-solving strategy the student will apply next. Knowing whether or not a student is likely to continue to use an inefficient problem-solving strategy allows us to determine whether or not the student is likely to need help in the near future. Perhaps more interesting, however, is the possibility that knowing the distribution of
students’ problem solving strategies and their most likely future behaviors may allow
us to strategically construct collaborative learning groups containing heterogeneous
References
1. Anderson, J. R. (1980). Cognitive psychology and its implications. San Francisco: W.H.
Freeman
2. Chi, M. T. H., Glaser, R., and Farr, M.J. (eds.), (1988). The Nature of Expertise, Hillsdale,
Lawrence Erlbaum, pp 129-152
3. Chi, M. T. H., Bassok, M., Lewis, M. W., Reinmann, P., and Glaser, R. (1989). Self-
Explanations: how students study and use examples in learning to solve problems.
Cognitive Science, 13, 145-182
4. VanLehn, K., (1996). Cognitive Skill Acquisition. Annu. Rev. Psychol 47: 513-539
5. Schunn, C.D., and Anderson, J.R. (2002). The generality/specificity of expertise in
scientific reasoning. Cognitive Science
6. Corbett, A. T. & Anderson, J. R. (1995). Knowledge tracing: Modeling the acquisition of
procedural knowledge. User Modeling and User-Adapted Interaction, 4, 253-278
7. Schunn, C.D., Lovett, M.C., and Reder, L.M. (2001). Awareness and working memory in
strategy adaptivity. Memory & Cognition, 29(2); 254-266
8. Haider, H., and Frensch, P.A. (1996). The role of information reduction in skill
acquisition. Cognitive Psychology 30: 304-337
9. Alexander, P., (2003). The development of expertise: the journey from acclimation to
proficiency. Educational Researcher, 32: (8), 10-14
10. Stevens, R.H., Ikeda, J., Casillas, A., Palacio-Cayetano, J., and S. Clyman (1999).
Artificial neural network-based performance assessments. Computers in Human Behavior,
15: 295-314
11. Underdahl, J., Palacio-Cayetano, J., and Stevens, R., (2001). Practice makes perfect:
assessing and enhancing knowledge and problem-solving skills with IMMEX software.
Learning and Leading with Technology. 28: 26-31
12. Lawson, A.E. (1995). Science Teaching and the Development of Thinking. Wadsworth
Publishing Company, Belmont, California
13. Olson, A., & Loucks-Horsley, S. (Eds). (2000). Inquiry and the National Science
Education Standards: A guide for teaching and learning. Washington, DC: National
Academy Press
14. Linacre, J.M. (2004). WINSTEPS Rasch measurement computer program. Chicago.
Winsteps.com
15. Stevens, R.H., and Najafi K. (1993). Artificial neural networks as adjuncts for assessing
Medical students’ problem-solving performances on computer-based simulations.
Computers and Biomedical Research 26(2), 172-187
16. Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations
in the Microstructure of Cognition. Volume 1: Foundations. Cambridge, MA: MIT Press
17. Stevens, R., Wang, P., Lopo, A. (1996). Artificial neural networks can distinguish novice
and expert strategies during complex problem solving. JAMIA vol. 3 Number 2 p 131-138
18. Casillas, A.M., Clyman, S.G., Fan, Y.V., and Stevens, R.H. (1999). Exploring alternative
models of complex patient management with artificial neural networks. Advances in
Health Sciences Education 1: 1-19, 1999
19. Rabiner, L., (1989). A tutorial on Hidden Markov Models and selected applications in
speech recognition. Proc. IEEE, 77: 257-286
20. Kohonen, T., 2001. Self Organizing Maps. 3rd extended edit. Springer, Berlin, Heidelberg,
New York
21. Soller, A. (2004). Understanding knowledge sharing breakdowns: A meeting of the
quantitative and qualitative minds. Journal of Computer Assisted Learning (in press)
22. Soller, A., and Lesgold, A. (2003). A computational approach to analyzing online
knowledge sharing interaction. Proceedings of Artificial Intelligence in Education, 2003.
Australia, 253-260
23. Lesgold, A., Katz, S., Greenberg, L., Hughes, E., & Eggan, G. (1992). Extensions of
intelligent tutoring paradigms to support collaborative learning. In S. Dijkstra, H.
Krammer, & J. van Merrienboer (Eds.), Instructional Models in Computer-Based Learning
Environments. Berlin: Springer-Verlag, 291-311
24. Lajoie, S.P. (2003). Transitions and trajectories for studies of expertise. Educational
Researcher, 32: 21-25
25. Giordani, A., & Soller, A. (2004). Strategic Collaboration Support in a Web-based
Scientific Inquiry Environment. European Conference on Artificial Intelligence,
“Workshop on Artificial Intelligence in Computer Supported Collaborative Learning”,
Valencia, Spain
Pedagogical Agent Design: The Impact of Agent Realism,
Gender, Ethnicity, and Instructional Role
Abstract. In the first of two experimental studies, 312 students were randomly
assigned to one of 8 conditions, where agents differed by ethnicity (Black,
White), gender (male, female), and image (realistic, cartoon), yet had identical
messages and computer-generated voice. In the second study, 229 students
were randomly assigned to one of 12 conditions where agents represented dif-
ferent instructional roles (expert, motivator, and mentor), also differing by eth-
nicity (Black, White), and gender (male, female). Overall, it was found that
students had greater transfer of learning when the agents had more realistic im-
ages and when agents in the “expert” role were represented non-traditionally
(as Black versus White). Results also generally confirmed prior research where
agents perceived as less intelligent lead to significantly improved self-efficacy.
The presence of motivational messages, as employed through the motivator and
mentor agent roles, led to enhanced learner self-regulation and self-efficacy.
Results are discussed with respect to social cognitive theory.
1 Introduction
Pedagogical agent design has recently been placing greater emphasis on the impor-
tance of the agent as an actor rather than as a tool (Persson, Laaksolahti, & Lonnqvist,
2002), thus focusing on the agent’s implicit social relationship with the learner. The
social cognitive perspective in teaching and learning emphasizes the importance that
social interaction (e.g., Lave & Wenger, 2001; Vygotsky, Cole, John-Steiner, Scrib-
ner, & Souberman, 1978) plays in contributing to motivational outcomes such as
learner self-efficacy (Bandura, 2000) and self-regulation (Zimmerman, 2000).
According to Bandura (1997), attribute similarities between a social model and a
learner, such as gender, ethnicity, and competency, often have predictive significance
for the learner’s efficacy beliefs and achievements. Similarly, pedagogical agents of
the same gender or ethnicity or similar competency as learners might be viewed as
more affable and could instill strong efficacy beliefs and behavioral intentions to
learners. Learners may draw positive judgments about their capabilities when they
observe agents who demonstrate successful performance.
Even so, while college students were not more likely to choose to work with an agent
of the same gender (Baylor, Shen, & Huang, 2003), in a between-subjects study they
were more satisfied with their performance and reported that the agent better facili-
tated self-regulation if it was male (Baylor & Kim, 2003). Similarly, Moreno and
colleagues (2002) revealed that learners applied gender stereotypes to animated
agents, and this stereotypic expectation affected their learning. With respect to the
ethnicity of pedagogical agents, empirical studies do not provide consistent results. In both a computer-mediated communication setting and an agent environment, participants who had similar-ethnicity partners, compared with those who had different-ethnicity partners, presented more persuasive and better arguments, elicited more conformity to their partners' opinions, and perceived their partners as more attractive and trustworthy (Lee & Nass, 1998). In a more recent study, Baylor and Kim (2003b) examined the impact of
pedagogical agents’ ethnicity on learners’ perception of the agents. Undergraduate
participants who worked with pedagogical agents of the same ethnicity rated the
agents as more credible, engaging, and affable than those who worked with agents of
different ethnicity. However, Moreno and colleagues (2002) indicated that the ethnic-
ity of pedagogical agents did not influence students’ stereotypic expectations or
learning.
Given their function for supporting learning, pedagogical agents must also
represent different instructional roles, such as expert, instructor, mentor, or learning
companion. These roles may also interact with the agent's gender and ethnicity, given that social relationships influence people's perceptions and understanding in general (Dunn, 2000). In a similar fashion, the instructional roles of the pedagogical
agents may influence the perceptions or expectations of and the social bonds with
learners. Along this line, Baylor and Kim (2003c, in press) showed that distinct roles
for pedagogical agents—as expert, motivator, and mentor—significantly influenced
the learners’ perceptions of the agent persona, self-efficacy, and learning.
Lastly, Norman (1994; 1997) expressed concerns about human-like interfaces. If an interface is anthropomorphized too realistically, people tend to form unrealistic expectations. That is, an overly realistic human-like appearance and interaction can be deceptive and misleading, implying promises of functionality that can never be reached. On the other hand, socially intelligent agents are of “no virtual difference” from humans (Vassileva, 1998) and can provoke an “illusion of life” (Hays-Roth & Doyle, 1998), thus impressing the learners interacting with a “living” virtual being (Rizzo, 2000). So we may ask how realistic agent images should be in order to establish social relations with learners. Norman argues that people will be more accepting of an intelligent interface when their expectations match its real functionality. To what extent agent realism will match learners' expectations with agent functionality is, however, an open question.
Consequently, the relationships among pedagogical agent gender, ethnicity,
instructional role, and realism seem to play a role in enhancing learner motivation (e.g., self-efficacy), self-regulation, and learning. The purpose of this research was to examine these relationships through two controlled experiments. Experiment I examined the impact of agent gender, ethnicity, and realism; Experiment II examined the impact of agent gender, ethnicity, and instructional role.
Eight agent images were designed by a graphic artist based on the same basic face,
but differing by gender, ethnicity, and realism. The animated agents were then
developed using a 3D character design tool, Poser 5, and Microsoft Agent Character
Builder. Next, the agents were incorporated into the web-based research application,
MIMIC (Multiple Intelligent Mentors Instructing Collaboratively) (Baylor, 2002). To
control confounding effects, we used consistent parameters and matrices to delineate
facial expression, mouth movement, and overall silhouettes across the agents. Also,
except for image, the agents had identical scripts, voice, animation, and emotion. For
voice, we used computer-generated male and female voices. For animation, blinking
and mouth movements were included. Emotion was expressed using the scripts
together with facial expression, such as smiling. Figure 1 presents the images of the
eight agents used in the study.
2.2 Method
they needed to finish each phase of the tasks. The entire session took about an hour
with individual variations.
Design and Analysis. The study employed a 2 × 2 × 2 design, including agent gender
(Male vs. Female), agent ethnicity (Caucasian vs. African-American), and agent real-
ism (realistic vs. cartoon-like) as the factors. For self-regulation, a MANOVA (multi-
variate analysis of variance) was conducted. For self-efficacy and learning, analysis
of variance (ANOVA) was conducted. The significance level was set at .05.
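For illustration only, a 2 × 2 × 2 between-subjects ANOVA of this kind could be run in Python with statsmodels; the data frame below is synthetic and the column names are our own assumptions, not the study's actual data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data: one row per participant, factors coded as labels.
rng = np.random.default_rng(0)
n = 80
df = pd.DataFrame({
    "agent_gender": rng.choice(["male", "female"], n),
    "agent_ethnicity": rng.choice(["black", "white"], n),
    "agent_realism": rng.choice(["realistic", "cartoon"], n),
    "learning": rng.normal(3.0, 1.0, n),     # transfer-of-learning score
})

# 2 x 2 x 2 between-subjects ANOVA on the learning score.
model = smf.ols("learning ~ C(agent_gender) * C(agent_ethnicity) * C(agent_realism)",
                data=df).fit()
print(anova_lm(model, typ=2))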
2.3 Results
Self-efficacy. ANOVA indicated a significant main effect for agent gender where the
presence of the male agent led to increased self-efficacy, F(1, 289)=4.20, p<.05.
Analysis of additional Likert items revealed that students perceived the male agents as
significantly more interesting, intelligent, useful, and leading to greater satisfaction
than the female agents.
Learning. For all students (male and female) ANOVA revealed a marginally signifi-
cant main effect for agent realism on learning, F (1, 289) = 4.2, p =.09. Overall, stu-
dents who worked with the realistic agents (M = 3.13, SD = 1.05) performed margin-
ally better than students who worked with the cartoon-like agents (M = 2.94, SD =
1.1). Interestingly, a post-hoc ANOVA indicated a significant main effect for agent
realism where males working with realistic agents (M=3.50) learned more than males
working with cartoon agents (M=2.51), F(1,84) = 6.50, p = .01. For female students, the
main effect for agent realism was not significant.
For the second study, a different set of twelve agents, differing by gender, ethnicity,
and role, was designed using a 3D character design tool, Poser 5, together with Mimic Pro 2.
These agents were richer than those in Experiment I, where the focus was on the
agent image. Consequently, to establish distinct instructional roles, it was important to
consider a set of media features that influence agent “persona,” including image,
animation, affect, and voice. Image is a key factor in affecting learners’ perception of
the computer-based agent as credible (Baylor & Ryu, 2003b) and motivating (Baylor
& Kim, 2003a; Baylor, Shen, & Huang, 2003; Kim, Baylor, & Reed, 2003). Anima-
tion includes body movements such as hand gestures, facial expression, and head
nods, which can convey information and draw students’ attention (Cassell, 1998;
Johnson, Rickel, & Lester, 2000; McNeill, 1992; Roth, 2001). Affect, or emotion, is
also an integral part of human intellectual and cognitive functioning (Kort, Reilly, &
Picard, 2001; Picard, 1997) and thus was deemed as critical for facilitating the social
relationship with learners and affecting their emotional development (Saarni, 2001).
Finally, voice is a powerful indicator of social presence (Nass & Steuer, 1993), and so
human voices were recorded to match the gender, ethnicity, and role of each agent, as well as their behaviors, attitudes, and language. Figure 2 shows
the images of the twelve agents.
The agent-student dialogue was pre-defined to control for agent functionality across
students. Given that people tend to apply the same social rules and expectations from
human-human interaction to computer-human interaction (Reeves & Nass, 1996), we
referred to research on human instructors for implications for the agent role design.
Agent as Expert. The design of the Expert was based on research that shows that the
development of expertise in humans requires years of deliberate practice in a domain
(Ericsson, Krampe, & Tesch-Romer, 1993) and that experts exhibit mastery or exten-
sive knowledge and perform better than the average within a domain (Ericsson, 1996;
Gonzales, Burdenski, Stough, & Palmer, 2001). Also, experts tend to be confident and stable in performance and not easily swayed emotionally by momentary internal or external stimulation. Based on this, we operationalized the expert agent through the image of a professor in his forties. His animation was limited to deictic gestures, and he spoke in a
formal and professional manner, with authoritative speech. Being emotionally de-
tached from the learners, his function was to provide accurate information in a suc-
cinct way (see sample script in Table 2).
Agent as Motivator. The design of the Motivator was based on social modeling
research dealing with learners’ efficacy beliefs, a critical component of learner moti-
vation. According to Bandura (1997), attribute similarity between the learner and
social model significantly affects the learners’ self-efficacy beliefs. In other words,
learning and motivation are enhanced when learners observe a social model of the
same age (Schunk, 1989). Further, verbal encouragement in support of the learner
performing a task facilitates learners’ self-efficacy beliefs. Thus, we operationalized a
motivator agent with a peer-like image of a casually-dressed student in his twenties,
considering that our target population was college students. Given that expressive
gestures of pedagogical agents may have strong motivating effects (Johnson et al.,
2000), the agent's gestures were expressive and highly animated. He spoke enthusiasti-
cally and energetically, while sometimes using colloquial expressions, e.g., ‘What’s
your gut feeling?’ He was not presented as particularly knowledgeable but as an eager
participant who suggested his own ideas, verbally encouraged the learner to persist at the tasks, and, by asking questions, stimulated the learner to reflect on their thinking
(see sample script in Table 2). He expressed emotion that commonly occurs in learn-
ing, such as frustration, confusion, and enjoyment (Kort et al., 2001).
Agent as Mentor. An ideal human mentor does not simply give out information;
rather, a mentor provides guidance for the learner to bridge the gap between the cur-
rent and desired skill levels (Driscoll, 2000). Thus, a mentor should not be an
authoritarian figure, but instead should be a guide or coach with advanced experience
and knowledge who can work collaboratively with the learners to achieve goals.
Thus, the agent as mentor should demonstrate competence to the learner while si-
multaneously developing a social relationship to motivate the learner (Baylor, 2000).
Consequently, the design of the Mentor included an image that was less formal than
the Expert, yet older than the peer-like Motivator. The Mentor’s gestures were de-
signed to be identical to the Motivator, incorporating both deictic and emotional ex-
pressions. His voice was friendly and approachable, yet more professional and confi-
dent than the Motivator. We operationalized the Mentor’s functionality to incorporate
the characteristics of both the Expert and Motivator (i.e., to provide information and
motivation); thus, his script was a concatenation of the content of the Expert and
Motivator scripts.
Validation. We initially validated that each agent was effectively representing the
intended gender, ethnicity, and roles with 174 undergraduates in a between-subjects
design. The results indicated successful instantiations of the twelve agents.
3.2 Method
Design and Analysis. The study employed a 2 × 2 × 3 design, including agent gender
(Male vs. Female), agent ethnicity (White vs. Black), and agent role (expert vs. moti-
vator vs. mentor) as the factors. For self-regulation, a MANOVA (multivariate analy-
sis of variance) was conducted. For self-efficacy and learning, analysis of variance
(ANOVA) was conducted. The significance level was set at .05.
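Similarly, a MANOVA over several self-regulation subscales, reporting Wilks' lambda, can be sketched with statsmodels' multivariate module; again, the data and column names below are hypothetical, not the study's.

import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: three self-regulation subscale scores per participant.
rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame({
    "agent_role": rng.choice(["expert", "motivator", "mentor"], n),
    "agent_gender": rng.choice(["male", "female"], n),
    "agent_ethnicity": rng.choice(["black", "white"], n),
    "sr1": rng.normal(3, 1, n),
    "sr2": rng.normal(3, 1, n),
    "sr3": rng.normal(3, 1, n),
})

# MANOVA over the self-regulation subscales; the printout includes Wilks' lambda.
result = MANOVA.from_formula(
    "sr1 + sr2 + sr3 ~ C(agent_role) + C(agent_gender) + C(agent_ethnicity)",
    data=df)
print(result.mv_test())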
3.3 Results
Self-regulation. MANOVA revealed a significant main effect for agent role on self-
regulation, Wilks’ Lambda = .94, F (6, 430) = 2.22, p < .05. Overall, students who
worked with the mentor or motivator agents rated their self-regulation significantly
higher than students who worked with the expert agent. MANOVA also revealed a
main effect for agent ethnicity on self-regulation where Black agents led to increased
self-regulation as compared to White agents, Wilks’ Lambda =.96, F(3, 205) =2.90,
p<.05.
Self-efficacy. There was a significant main effect for agent gender on self-efficacy, F
(1, 217) = 6.90, p <.05. Students who worked with the female agents (M = 2.36, SD =
1.16) showed higher self-efficacy beliefs than students who worked with the male
agents (M = 2.01, SD = 1.12). Analysis of additional Likert items revealed that stu-
dents perceived the female agents as significantly less knowledgeable and intelligent
than the male agents. There was also a significant main effect for agent role on self-
efficacy, F (2, 217) = 4.37, p =.01. Students who worked with the motivator (M =
2.37, SD = 1.2) and mentor agents (M = 2.32, SD = 1.2) showed higher self-efficacy
beliefs than students who worked with the expert agent (M = 1.86, SD = 0.94).
Learning. There was a significant interaction of agent role and agent ethnicity on
learning, F (2, 214) = 3.36, p <.05. Post hoc t-tests of the cell means indicated that
there was a significant difference between the Black Experts (M = 2.61, SD = .75) and White Experts (M = 2.13, SD = .84), p < .01, indicating that the Black agents were significantly more effective in the role of Expert than the White agents. This interaction is illustrated in Figure 3. Additional analysis of Likert items regarding the level to which students paid attention during the program revealed that students with the Black Experts better “focused on the relevant information” (M = 3.03, SD = 1.08 vs. M = 2.42, SD = 1.11) and “concentrated” (M = 2.70, SD = .95 vs. M = 2.23, SD = 1.10).
4 Discussion
Results from Experiment I highlight the potential value of more realistic agent images
(particularly for male students) to positively affect transfer of learning. This supports
the value in designing pedagogical agents to best represent the live humans that they
attempt to simulate (e.g., Hays-Roth & Doyle, 1998; Rizzo, 2000). Even so, a variety
of permutations of agents with different levels of realism needs to be examined to
more fully substantiate this finding.
In Experiment II, the Black agents in the role of expert led to significantly
improved learning as compared to the White agents as experts, even though both had
identical messages. Students working with the Black experts also reported enhanced
concentration and focus, which could be explained by the fact that they perceived the
agents as more novel (and thereby more worthy of paying attention to) than the White
experts. Similarly, Black agents overall (in all roles) led to enhanced learner self-
regulation in the same experiment, perhaps because they also warranted greater atten-
tion and focus. In support of this explanation (i.e., that students pay more attention to
agents that represent non-traditional roles), we recently found that a female agent
acting as a non-traditional engineer (e.g., outgoing, highly attractive) significantly
enhanced student interest in engineering as compared to a more stereotypical “nerdy”
version (e.g., introverted, homely) (Baylor, 2004).
The importance of the agent message was demonstrated in Experiment II, where
the presence of motivational messages (as delivered through the motivator and men-
tor agent instructional roles) led to greater learner self-regulation and self-efficacy.
This finding is supported by Bandura (1997), who suggests that such verbal persua-
sion leads to positive motivational outcomes.
Our prior research has indicated that agents that are perceived as less intelligent
lead to greater self-efficacy (Baylor, 2004; Baylor & Kim, in press). This was repli-
cated in Experiment II since the female agents (who were perceived as significantly
less intelligent than the males) led to enhanced self-efficacy. Similarly, the finding
that the motivator and mentor agents led to greater self-efficacy could be attributed to
the fact that they were validated to be perceived as significantly less expert-like (i.e.,
knowledgeable, intelligent) than the expert agents. While results from Experiment I
initially seem contradictory because the agents rated as most intelligent (males) also
led to improved self-efficacy, this can be attributed to an overall positive student bias
toward the male agents in this particular study (e.g., they were rated as more useful,
interesting, and leading to overall more satisfaction and self-regulation).
Overall, while the agent message is undoubtedly important, results support the
conclusion that a seemingly superficial interface feature like pedagogical agent image
plays a very important role in impacting learning and motivational outcomes. The
image is key because it directly affects how the learner perceives the agent as a human-like
instructor; consequently, pedagogical agent designers must take great care in choos-
ing how to represent the agent’s gender, ethnicity, and realism.
References
Arroyo, I., Beck, J. E., Woolf, B. P., Beal, C. R., & Schultz, K. (2000). Macroadapting ani-
malwatch to gender and cognitive differences with respect to hint interactivity and sym-
bolism. In Intelligent Tutoring Systems, Proceedings (Vol. 1839, pp. 574-583).
Arroyo, I., Murray, T., Woolf, B. P., & Beal, C. R. (2003). Further results on gender and
cognitive differences in help effectiveness. Paper presented at the The International Confer-
ence of Artificial Intelligence in Education, Sydney, Australia.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman.
Bandura, A. (Ed.). (2000). Self-Efficacy: The Foundation of Agency. Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.
Baylor, A. L. (2000). Beyond butlers: intelligent agents as mentors. Journal of Educational
Computing Research, 22(4), 373-382.
Baylor, A. L. (2004). Encouraging more positive engineering stereotypes with animated in-
terface agents. Unpublished manuscript.
Baylor, A. L., & Kim, Y. (2003a). The Role of Gender and Ethnicity in Pedagogical Agent
Perception. Paper presented at the E-Learn (World Conference on E-Learning in Corpo-
rate, Government, Healthcare, & Higher Education), Phoenix, Arizona.
Baylor, A. L., & Kim, Y. (2003b). The role of gender and ethnicity in pedagogical agent per-
ception. Paper presented at the E-Learn, the Annual Conference of Association for the Ad-
vancement of Computing in Education., Phoenix, AZ.
Baylor, A. L., & Kim, Y. (2003c). Validating Pedagogical Agent Roles: Expert, Motivator,
and Mentor. Paper presented at the International Conference of Ed-Media, Honolulu, Ha-
waii.
Baylor, A. L. & Kim, Y. (in press). The effectiveness of simulating instructional roles with
pedagogical agents. International Journal of Artificial Intelligence in Education.
Baylor, A. L., & Ryu, J. (2003a). The API (Agent Persona Instrument) for assessing pedagogi-
cal agent persona. Paper presented at the International Conference of Ed-Media, Honolulu,
Hawaii.
Baylor, A. L., & Ryu, J. (2003b). Does the presence of image and animation enhance peda-
gogical agent persona? Journal of Educational Computing Research, 28(4), 373-395.
Baylor, A. L., Shen, E., & Huang, X. (2003). Which Pedagogical Agent do Learners Choose?
The Effects of Gender and Ethnicity. Paper presented at the E-Learn (World Conference on
E-Learning in Corporate, Government, Healthcare, & Higher Education), Phoenix, Ari-
zona.
Cassell, J. (1998). A Framework For Gesture Generation And Interpretation. In A. Pentland
(Ed.), Computer Vision in Human-Machine Interaction. New York: Cambridge University
Press.
Cooper, J., & Weaver, K. D. (2003). Gender and Computers: Understanding the Digital Di-
vide. NJ: Lawrence Erlbaum Associates.
Driscoll, M. P. (2000). Psychology of Learning for Instruction: Allyn & Bacon.
Dunn, J. (2000). Mind-reading, emotion understanding, and relationships. International Jour-
nal of Behavioral Development, 24(2), 142-144.
Ericsson, K. A. (1996). The acquisition of expert performance: an introduction to some of the
issues. In K. A. Ericsson (Ed.), The Road to Excellence: The Acquisition of Expert Per-
formance in the Arts, Sciences, Sports, and Games (pp. 1-50). Hillsdale, NJ: Erlbaum.
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in
the acquisition of expert performance. Psychological Review, 100(3), 363-406.
Gonzales, M., Burdenski, T. K., Jr., Stough, L. M., & Palmer, D. J. (2001, April 10-14). Iden-
tifying teacher expertise: an examination of researchers’ decision-making. Paper presented
at the American Educational Research Association, Seattle, WA.
Hays-Roth, B., & Doyle, P. (1998). Animate Characters. Autonomous Agents and Multi-Agent
Systems, 1, 195-230.
Johnson, W. L., Rickel, J. W., & Lester, J. C. (2000). Animated pedagogical agents: face-to-
face interaction in interactive learning environments. International Journal of Artificial
Intelligence in Education, 11, 47-78.
Kim, Y., Baylor, A. L., & Reed, G. (2003). The Impact of Image and Voice with Pedagogical
Agents. Paper presented at the E-Learn (World Conference on E-Learning in Corporate,
Government, Healthcare, & Higher Education), Phoenix, Arizona.
Kort, B., Reilly, R., & Picard, R. W. (2001). An affective model of interplay between emotions
and learning: reengineering educational pedagogy-building a learning companion. Pro-
ceedings IEEE International Conference on Advanced Learning Technologies, 43-46.
Lave, J., & Wenger, E. (2001). Situated learning: legitimate peripheral participation: Cam-
bridge University Press.
Lee, E., & Nass, C. (1998). Does the ethnicity of a computer agent matter? An experimental
comparison of human-computer interaction and computer-mediated communication. Paper
presented at the WECC Conference, Lake Tahoe, CA.
McCrae, R. R., & John, O. P. (1992). An introduction to the five factor model and its applica-
tions. Journal of Personality, 60, 175-215.
McNeill, D. (1992). Hand and mind: what gestures reveal about thought. Chicago: University
of Chicago Press.
Moreno, K. N., Person, N. K., Adcock, A. B., Eck, R. N. V., Jackson, G. T., & Marineau, J. C.
(2002). Etiquette and Efficacy in Animated pedagogical agents: the role of stereotypes.
Paper presented at the AAAI Symposium on Personalized Agents, Cape Cod, MA.
Nass, C., & Steuer, J. (1993). Computers, voices, and sources of messages: computers are
social actors. Human Communication Research, 19(4), 504-527.
Norman, D. A. (1994). How might people interact with agents? Communications of the ACM,
37(7), 68-71.
Norman, D. A. (1997). How might people interact with agents. In J. M. Bradshaw (Ed.), Soft-
ware agents (pp. 49-55). Menlo Park, CA: MIT Press.
Passig, D., & Levin, H. (2000). Gender preferences for multimedia interfaces. Journal of Com-
puter Assisted Learning, 16(1), 64-71.
Persson, P., Laaksolahti, J., & Lonnqvist, P. (2002). Understanding social intelligence. In K.
Dautenhahn, A. H. Bond, L. Canamero & B. Edmonds (Eds.), Socially intelligent agents:
Creating relationships with computers and robots. Norwell, MA: Kluwer Academic Pub-
lishers.
Piaget, J. (1962). Play, dreams, and imitation in childhood. New York: Norton.
Piaget, J. (1995). Sociological studies (I. Smith, Trans. 2nd ed.). New York: Routledge.
Picard, R. (1997). Affective Computing. Cambridge: The MIT Press.
Reeves, B., & Nass, C. (1996). The Media Equation: How people treat computers, television,
and new media like real people and places. Cambridge: Cambridge University Press.
Rizzo, P. (2000). Why should agents be emotional for entertaining users? A critical analysis. In
A. M. Paiva (Ed.), Affective interaction: Towards a new generation of computer interfaces
(pp. 166-181). Berlin: Springer-Verlag.
Roth, W.-M. (2001). Gestures: their role in teaching and learning. Review of Educational
Research, 71(3), 365-392.
Saarni, C. (2001). Emotion communication and relationship context. International Journal of
Behavioral Development, 25(4), 354-356.
Schunk, D. H. (1989). Social cognitive theory and self-regulated learning. In B. J. Zimmerman
& D. H. Schunk (Eds.), Self-regulated learning and academic achievement: Theory, re-
search, and practice (pp. 83-110). New York: Springer-Verlag.
Vassileva, J. (1998). Goal-based autonomous social agents: Supporting adaptation and
teaching in a distributed environment. Paper presented at the 4th International Conference
of ITS 98, San Antonio, TX.
Vygotsky, L. S., Cole, M., John-Steiner, V., Scribner, S., & Souberman, E. (1978). Mind in
society. Cambridge, Massachusetts: Harvard University Press.
Zimmerman, B. J. (2000). Attaining self-regulation: A social cognitive perspective. In M.
Boekaerts, P. Pintrich & M. Zeidner (Eds.), Self-Regulation: Theory, Research and Appli-
cation (pp. 13-39). Orlando, FL: Academic Press.
Designing Empathic Agents: Adults Versus Kids
Lynne Hall1, Sarah Woods2, Kerstin Dautenhahn2, Daniel Sobral3, Ana Paiva3,
Dieter Wolke4, and Lynne Newall5
1 School of Computing & Technology, University of Sunderland, UK, [email protected]
2 Adaptive Systems Research Group, University of Hertfordshire, UK, s.n.woods, [email protected]
3 Instituto Superior Technico & INESC-ID, Porto Salvo, Portugal, [email protected]
4 Jacobs Foundation, Zurich, Switzerland, [email protected]
5 Northumbria University, Newcastle, UK, [email protected]
1 Introduction
Virtual Learning Environments (VLEs) populated with animated characters offer
children a safe environment where they can explore and learn through experiential
activities [5, 8]. Animated characters offer a high level of engagement, through their
use of expressive and emotional behaviours [6], making them intuitively applicable for
exploring personal and social issues. However, the design and implementation of
VLEs populated with animated characters are complex tasks, involving an iterative
development process with a range of stakeholders.
The VICTEC (Virtual ICT with Empathic Characters) project uses synthetic char-
acters and Emergent Narrative as an innovative means for children aged 8-12 years to
explore issues surrounding bullying behaviour. FearNot (Fun with Empathic Agents to
Reach Novel Outcomes in Teaching), the application being developed in VICTEC, is
a 3D VLE featuring a school populated by 3-D self-animated agents representing
various character roles involved in bullying behaviour through improvised dramas.
The main focus of this paper is to consider the different perspectives and empathic
reactions of adult and child populations in order to optimise the design and ultimately
usage of a virtual world to tackle bullying problems. The perspective that we have
taken is that if children empathise with characters a deeper exploration and under-
standing of bullying issues is possible [3]. Whilst it is less critical for other
stakeholder groups, such as teachers, to exhibit similar empathic reactions to children,
the level of empathy and its impact on agent believability [9] has strong implications
for teachers' usage of such applications in classroom-based teaching. As relatively
few teachers have exposure to sophisticated, innovative educational environments, they
may have inappropriately low or high expectations of an unknown technology. To
offer an alternative perspective, the views and empathic reactions of discipline-
specific experts were also obtained to enable us to gain the view of stakeholders who
were “early adopters” of VLEs and synthetic characters.
The main questions we are seeking to answer in this paper are: Are there differ-
ences in the views, opinions and attitudes of children and adults? And, if there are
differences, what are their design implications? In the first section we discuss devel-
opment and technical issues for our early prototype. In the second section we discuss
our approach to using this prototype. We then present the results and discuss our
findings.
Fig. 1 identifies how interaction will occur with the final version of FearNot. How-
ever, we needed to gain feedback from users and stakeholders at an early stage in the
lifecycle, when there was no stable version of the final product and development
emerges as a response to research findings. Recognising this issue early in the
design of FearNot prompted the creation of the trailer approach, which is a snapshot
vision of the final product, similar to the trailers seen for movies, where the major
themes of a film are revealed. And just as a movie trailer uses real movie clips,
our trailer used a technology closely resembling the final application.
The trailer depicts a physical bullying episode containing 3 characters, Luke the
bully, John the victim and Martina the narrator. The trailer begins with an introduction
to the main characters, Luke and John and subsequently shows Luke knocking John’s
pencil case off the table and then kicking him to the floor. John then asks the user
what he should do to try and stop Luke bullying him and arrives at 3 possible choices:
1) Ignore Luke, 2) Fight back, 3) Tell someone that he trusts such as his teacher or
parents.
Development constraints did not allow us to include the dia-
logue phase in the first trailer developed. Nonetheless, the importance of the dialogue
phase for the overall success of the application required us to include it; as an ad-
vance, we built a dialogue phase between the bullying situation and the final message.
We are using the Wizard of Oz technique [1] to iterate on our dialogue system and
adjust the user interaction during this stage.
The re-use of the trailer technology in the final application is possible due to the
agent-based approach [14] we adopted for the FearNot application, as depicted in Fig.
2. Several agents share a virtual symbolic world where they can perform high-level
acts. These can be simply communicative acts or can change the symbolic world,
which contains domain-specific information, in this case information regarding bul-
lying situations. A specific agent must manage the translation of such symbolic infor-
mation and the agents' acts to a particular display system. Such a process is outlined in
Fig. 2 (the ellipse outlines the technology used in the trailer).
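To make this agent-based organisation concrete, the sketch below (Python; all class and function names are ours, invented for illustration, since the paper does not show FearNot's code) mirrors the description: character agents post high-level acts into a shared symbolic world, and a separate display agent translates those acts for whatever rendering system is in use.

class SymbolicWorld:
    """Shared store of domain-specific facts, here about a bullying episode."""
    def __init__(self):
        self.facts = set()
        self.pending_acts = []

    def post_act(self, agent_name, act):
        # High-level acts may simply communicate, or may change the symbolic world.
        self.pending_acts.append((agent_name, act))
        if act.get("changes"):
            self.facts.update(act["changes"])

class CharacterAgent:
    def __init__(self, name):
        self.name = name

    def act(self, world):
        # Placeholder decision: a real agent would select an act from its role model.
        world.post_act(self.name, {"type": "say", "content": "...", "changes": None})

class DisplayAgent:
    """Translates symbolic acts into commands for a particular display system."""
    def __init__(self, renderer):
        self.renderer = renderer

    def flush(self, world):
        for agent_name, act in world.pending_acts:
            self.renderer(f"{agent_name}: {act['type']} {act.get('content', '')}")
        world.pending_acts.clear()

world = SymbolicWorld()
display = DisplayAgent(renderer=print)        # a 3D engine binding would go here
for agent in (CharacterAgent("Luke"), CharacterAgent("John")):
    agent.act(world)
display.flush(world)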
Teachers in the sample were from a wide range of primary and secondary schools
in the South of England. They were predominantly female (90%), aged between 25 and
56. The children, aged 8-13 (M=9.83, SD=1.04), were from primary schools
located in urban and rural areas of Hertfordshire, UK (47%) and Cascais, Portugal
(53%). The experts were attendees at the Intelligent Virtual Agents workshop in Klo-
ster Irsee, Germany and were predominantly male (80%) and under 35 (67%). Table
2 illustrates the procedure used for showing the FearNot trailer and completion of the
trailer questionnaire.
4 Results
Frequency distributions were examined using histograms for questions that employed
Likert scales to ensure that the data were normally distributed. Chi-square tests in the
form of cross-tabulations were calculated to determine relationships between different
variables for categorical data. One-way analyses of variance (ANOVA) with
Scheffé's post-hoc test were carried out to examine mean differences between the three
stakeholder groups' questionnaire responses on the Likert scales.
There were significant differences between the stakeholder groups in views of the
believability (F=6.16, (225, df=2), p=0.002), realism (F=9.16, (225, df=2), p=0.00)
and smoothness (F=12.96, (224, df=2), p=0.00) of character movement, with children
finding character movement more believable, realistic and smooth compared to adults
(see Table 3). No significant gender differences were revealed for the believability or
smoothness of character movement. An independent samples T-test revealed signifi-
cant gender differences for the realism of character movement (t=2.91, 225, df=220,
p=0.004). Females (m=3.17) found character movement significantly more realistic
than males (m=3.63).
Significant differences were found for the believability (F=11.82, (224, df=2),
p=0.00) and likeability (F=9.35, (221, df=2), p=0.00) of character voices, with teach-
ers finding voices less believable and likeable. An independent samples T-test re-
vealed significant differences between gender and believability of voices (t=-2.65,
221, df = 219, p=0.01). Females (m=2.53) found the character voices less believable
than males (m=2.15).
4.2 Storyline
No significant differences were found between children, teachers and experts or gen-
der for the believability of character conversation and interest levels of character con-
versation. Significant differences were found in the views of the storyline believability
(F=10.17, (224, df=2), p=0.00) and the true-to-lifeness of both the character conver-
sation (F=6.45, (223, df=2), p=0.002) and the storyline (F=14.08, (225, df=2),
p=0.00), with children finding the conversation and storyline more true to life and
believable.
There were significant differences between child, expert and teacher views in rela-
tion to the match between the school environment and the characters (F=10.40, (220,
df=2), p=0.00). Children were significantly more positive towards the match between
the school environment and characters compared to teachers (Fig. 4). Children were
also more positive about the school's appearance (F=22.08, (224, df=2), p=0.00).
Fig. 4. Mean Group Differences for the Attractiveness of the Virtual School Environment and
the Match between Characters and the School Environment.
Significant gender differences were found for children only when character preference
was considered (χ²=20.46, N=195, df=2, p=0.000), indicating no overall gender pref-
erence for John (the victim) but that significantly more female children preferred
Martina (the narrator), and significantly more male children preferred Luke (the
bully).
Fig. 5. Percentages for Least Liked Characters According to Children, Experts and Teachers.
Significant differences were revealed between teachers, children and experts for the
least liked character (χ²=18.35, N=201, df=4, p=0.001) (Fig. 5). Significantly more
teachers least liked John (the victim), compared to children and experts. Female adults
disliked John (the victim) more than children and experts (37%), and male children
disliked Martina the most (52%). 78% of female children disliked Luke the most,
closely followed by the male adults, 60% of whom disliked Luke the most.
There were no significant differences between children, teachers and experts in
which of the characters they would like to be. However, significant differences
emerged when gender and age were taken into account. 40% of male children chose to
be John; 88% of female children, followed by 73% of female adults, chose to be
Martina. No female children (n=59) chose to be Luke, compared to 44% of male chil-
dren. Male adults did not wish to be John, with 51% wishing to
be Martina and 34% wanting to be Luke.
4.4 Empathy
Significant differences were found between children, experts and teachers for ex-
pressing sorrow (χ²=10.33, N=216, df=2, p=0.006) and anger (χ²=26.13, N=213, df=2,
p=0.000). Children were the most likely to feel sorry or angry (see Table 4); however,
whilst most children felt sorry for the victim, significantly more experts felt sorry for
Luke (the bully) compared to teachers and children (χ²=13.60, N=175, df=2,
p=0.001). Significant age and gender differences emerged (χ²=27.42, N=210, df=3,
p=0.000), with more female children expressing anger towards the characters com-
pared to adults. This anger was almost exclusively directed at Luke (90%).
5 Discussion
The main aims of this paper were to consider whether there were any differences in
the opinions, attitudes and empathic reactions of children and adults towards FearNot,
and whether differences uncovered offer important design implications for VLEs
addressing complex social issues such as bullying.
A summary of the main results revealed that (1) children were more favourable to-
wards the appearance of the school environment, character voices, and character
movement compared to teachers, who viewed these aspects less positively. (2) Chil-
dren, particularly male children, found the conversation and storyline most believable,
realistic and true-to-life. (3) No significant differences were revealed between children
and adults for most-liked character, although teachers disliked ‘John’ the victim char-
acter the most compared to children and experts. (4) Children preferred same-gender
characters, with male children disliking the female narrator character, female children
disliking the male bully, and children choosing to be same-gender characters. (5)
Children, particularly females, expressed more empathic reactions (feeling sorry for
and/or angry at the characters) compared to adults.
Throughout the results, a recurrent finding was the more positive attitude and per-
spective of children towards the FearNot trailer in terms of the school environment,
character appearance, character movement, conversation between the characters and
engagement with the storyline. The views children expressed were typically in the
positive range, below 3 on the 1-to-5 scale. Children's engagement and high level of em-
pathic reactions to the trailer are encouraging, as they indicate the potential for experi-
ential learning, with children clearly having a high level of belief and comprehension
of a physical virtual bullying scenario.
The opposite trend seems to have emerged from the teacher responses, where
teachers clearly have high expectations that are not met, or are possibly unable to en-
gage effectively with such a novel system as FearNot. Experts were positive
about the technical aspects of FearNot, such as the physical representation of the char-
acters. However, they failed to engage with the educational theme of bullying and
applied generic criteria, ignoring the underlying domain. Thus, whilst character move-
ment and voices were rated highly, limited levels of empathy were seen, with experts
taking a somewhat voyeuristic approach.
We consider that self-animated characters bring a richness to the interaction that is
essential for believable interactions. Nevertheless, the danger of unbelievable “schizo-
phrenic” behaviour [10] is real, and enormous technical challenges emerge. To over-
come these, constant interaction between agent developers and psychologists is cru-
cial. Furthermore, the use of higher-level narrative control arises as another technical
challenge that is being explored, towards the achievement of a story coherence that
characters are unable, on their own, to attain. The use of a cartoon style offers a
technical safety net that masks some of the jerkiness natural to experimental software. Fur-
thermore, the cartoon metaphor already provides design decisions that most cartoon-
viewing children accept naturally.
6 Conclusion
The trailer approach described in this paper enabled us to obtain a range of viewpoints
and perspectives from different stakeholder groups. Further, the re-use of the technol-
ogy for the trailer within the final application highlights the benefits of adopting an
agent-based approach, allowing the development of a mid-tech prototype that can
evolve into the final application. Input from a range of stakeholders is essential for the
development of an appropriate application. There must be a balance between true-to-
life behaviours and language and those acceptable to teachers and parents. The use of
stereotypical roles (e.g. the typical bully) can bias children’s understanding, and simple
design decisions can influence children’s perception of a character (e.g., Luke
looks a lot “cooler” than John). The educational perspective inhibits the applicability
of the «game» label, which children most of the time instantly apply
to an application like this. Achieving a balance between the expectations of all the
stakeholders involved may be the hardest goal to achieve, over and above the technical
challenges.
References
1. Anderson, G., Höök, K., Paiva, A., & Costa, M. (2002). Using a Wizard of Oz study to
inform the design of SenToy. Paper presented at the Designing Interactive Systems.
2. Badler, N., Phillips, C., & Webber, B. (1993). Simulating humans. Paper presented at the
Computer graphics animation and control, New York.
3. Dautenhahn, K. (2002). Design spaces and niche spaces of believable social robots. Paper
presented at the International Workshop on Robots and Human Interactive Communica-
tion.
4. Magnenat-Thalmann, N., & Thalmann, D. (1991). Complex models for animating syn-
thetic actors. Computer Graphics and Applications, 11, 32-44.
5. Moreno, R., Mayer, R. E., Spires, H. A., & Lester, J. C. (2001). The Case for Social
Agency in Computer-Based Teaching: Do Students Learn More Deeply When They Inter-
act With Animated Pedagogical Agents. Cognition and Instruction, 19(2), 177-213.
6. Nass, C., Isbister, K., & Lee, E. (2001). Truth is beauty: researching embodied conversa-
tional agents. Cambridge, MA: MIT Press.
7. Perlin, K., & Goldberg, A. (1996). Improv: A system for scripting interactive actors in
virtual worlds. Paper presented at the Computer Graphics, 30 (Annual Conference Series).
8. Pertaub, D.-P., Slater, M., & Barker, C. (2001). An Experiment on Public Speaking Anxi-
ety in Response to Three Different Types of Virtual Audience. Presence: Teleoperators
and Virtual Environments, 11(1), 68-78.
9. Prendinger, H., & Ishizuka, M. (2001). Let’s talk! Socially intelligent agents for language
conversation training. IEEE Transactions on Systems, Man, and Cybernetics - Part A:
Systems and Humans, 31(5), 465-471.
10. Sengers, P. (1998). Anti-Boxology: Agent Design in Cultural Context. PhD Thesis, Tech-
nical Report CMU-CS-98-151, Carnegie Mellon University.
11. Wooldridge, M. (2002). An Introduction to Multiagent Systems. London: John Wiley and
Sons Ltd.
RMT: A Dialog-Based Research Methods Tutor
With or Without a Head
P. Wiemer-Hastings, D. Allbritton, and E. Arnott
1 Introduction
Research on human to human tutoring has identified one primary factor that
influences learning: the cooperative solving of example problems [1]. Typically, a
tutor poses a problem (selected from a relatively small set of problems that they
frequently use), and gives it to the student. The student attempts to solve the
problem, one piece at a time. The tutor gives feedback, but rarely gives direct
negative feedback. The tutor uses pumps (e.g. “Go on.”), hints, and prompts (e.g.
“The groups would be chosen ...”) to keep the interaction going. The student
and tutor incrementally piece together a solution for the problem. Then the tutor
often offers a summary of the final solution [1]. This model of tutoring has been
adopted by a number of recent dialog-based intelligent tutoring systems.
Understanding natural language student responses has been a major chal-
lenge for ITSs. Approaches have ranged from encouraging one-word answers [2]
to full syntactic and semantic analysis of the responses [3,4,5]. Unfortunately,
it can take man-years of effort to develop the specialized lexical, syntactic, and
conceptual knowledge to make such language analysis successful, which limits
how far these approaches can spread.
The AutoTutor system took a different approach to the natural language pro-
cessing problem. AutoTutor uses a mechanism called Latent Semantic Analysis
(LSA, described more completely below) which is automatically derived from
a large corpus of texts, and which gives an approximate but useful similarity
metric between any two texts [6]. Student answers are evaluated by comparing
them to a set of expected answers with LSA. This greatly reduces the knowledge
acquisition bottleneck for tutoring systems. AutoTutor’s tutoring style is mod-
eled on human tutors. It maintains only a simple model of the student, and uses
the same dialog moves mentioned above (prompts and pumps, for example) to
do constructive, collaborative problem solving with the student. AutoTutor has
been shown to produce learning gains of approximately one standard deviation
unit compared to reading a textbook [7], has been ported to a number of domains,
and has been integrated with another tutoring system: Why/AutoTutor [7].
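As a rough illustration of this style of answer evaluation (a sketch only, not AutoTutor's or RMT's implementation), the snippet below builds a small reduced vector space with truncated SVD and scores a student answer by its best cosine match against a set of expected answers; the corpus, dimensionality, and example texts are placeholders.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in corpus; a real LSA space is derived from a large domain corpus.
corpus = [
    "the dependent variable is what the experimenter measures",
    "the independent variable is what the experimenter manipulates",
    "reliability is the consistency of a measurement",
]
vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2).fit(term_doc)   # dimensionality is a placeholder

def lsa_vector(text):
    return lsa.transform(vectorizer.transform([text]))

def answer_quality(student_answer, expected_answers):
    # Best cosine match between the student's answer and any expected answer.
    return max(cosine_similarity(lsa_vector(student_answer), lsa_vector(e))[0, 0]
               for e in expected_answers)

print(answer_quality("it is the thing you measure",
                     ["the dependent variable is what the experimenter measures"]))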
This paper describes RMT (Research Methods Tutor) which is a descendant
of the AutoTutor system. RMT uses the same basic tutoring style that AutoTu-
tor does, but was developed with a modular architecture to facilitate the study
of different tools and techniques for dialog-based tutoring. One primary goal
of the project is to create a system which can be integrated into the Research
Methods in Psychology classes at DePaul University (and potentially elsewhere).
We describe here the basic architecture of RMT, our first attempts to integrate
it with the courses, and the results of an experiment that compares the use of
an animated agent with text-only tutoring.
2 RMT Architecture
As mentioned above, RMT is a close descendant of the AutoTutor system. While
AutoTutor incorporates a wide variety of artificial intelligence techniques, RMT
was designed as a lightweight, modular system that would incorporate only those
techniques required to provide educationally beneficial tutoring to the student.
This section gives a brief description of RMT’s critical components.
Dialog Manager. The dialog manager (DM) maintains information about the parameters of the
tutoring session and the current state of the dialog. The DM reads student re-
sponses as posts from a web page, and then asks the Dialog Advancer Transition
Network (DATN) to compute an appropriate tutor response.
Each tutor “turn” can perform three different functions: evaluate the stu-
dent’s previous utterance (e.g. “Good!”), confirm or add some additional infor-
mation (e.g. “The dependent variable is test score.”), and produce an utterance
that keeps the dialog moving. Like AutoTutor, RMT uses pumps, prompts, and
hints to try to get the student to add information about the current topic. RMT
also asks questions, summarizes topics, and answers questions.
The DATN determines which type of response the tutor will give using a
decision network which graphically depicts the conditions, actions and system
outputs. Figure 2 shows a segment of RMT’s decision network. For every tutor
turn, the DATN begins processing at the Start state. The paths through the
network eventually join back up at the Finish state, not shown here. On the
arcs, the items marked C are the conditions for that arc to be chosen. The
items labeled A are actions that will be performed. For example, on the arc
from the start state, the DATN categorizes the student response. The items
marked O are outputs — what the tutor will say next. Because this graph-based
representation controls utterance selection, the tutor’s behavior can be modified
by simply modifying the graph.
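The sketch below gives a minimal, illustrative rendering of such a graph in Python: each arc carries conditions (C), actions (A), and outputs (O), and changing the tutor's behaviour amounts to editing the data structure. The states, conditions, and utterances are invented for illustration; this is not the network of Fig. 2.

def categorize(response):
    # Placeholder classifier; RMT evaluates responses with LSA.
    return "good" if len(response.split()) > 3 else "vague"

DATN = {
    "Start": [
        {"condition": lambda ctx: categorize(ctx["response"]) == "good",
         "actions": [lambda ctx: ctx.update(score=ctx["score"] + 1)],
         "outputs": ["Good!", "What else can you say about the design?"],
         "next": "Finish"},
        {"condition": lambda ctx: True,                              # default arc
         "actions": [],
         "outputs": ["Well...", "The groups would be chosen ..."],   # hint/prompt
         "next": "Finish"},
    ],
}

def tutor_turn(state, ctx):
    while state != "Finish":
        for arc in DATN[state]:
            if arc["condition"](ctx):
                for action in arc["actions"]:
                    action(ctx)
                for line in arc["outputs"]:
                    print(line)
                state = arc["next"]
                break
    return ctx

tutor_turn("Start", {"response": "the dependent variable is the test score", "score": 0})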
Logging. For data collection purposes, RMT borrows a piece of wisdom from
a very successful reading tutor called Project LISTEN, “Log everything” [9].
As it interacts with a student, RMT stores information about each interaction
in a database. The database collects and relates the individual utterances and
a variety of other variables, for example, the type and quality of a student
response. The database also contains information about the students and the
tutoring conditions that they are assigned to. Thus, in addition to providing
data for the experiments described below, we will be able to perform post hoc
analyses by selecting relevant tutoring topics. (For example, “Is there a difference
in student response quality on Mondays and Fridays?”)
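A small sketch of the "log everything" idea with a relational store (sqlite3 here purely for illustration; RMT's actual schema is not described in the paper):

import sqlite3

conn = sqlite3.connect(":memory:")   # a file or server database in practice
conn.execute("""CREATE TABLE utterances (
    student_id TEXT, condition TEXT, topic TEXT, speaker TEXT, text TEXT,
    response_type TEXT, quality REAL, logged_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def log_utterance(student_id, condition, topic, speaker, text,
                  response_type=None, quality=None):
    conn.execute("""INSERT INTO utterances
                    (student_id, condition, topic, speaker, text, response_type, quality)
                    VALUES (?, ?, ?, ?, ?, ?, ?)""",
                 (student_id, condition, topic, speaker, text, response_type, quality))
    conn.commit()

log_utterance("s001", "text-only", "reliability", "tutor", "What is reliability?")
log_utterance("s001", "text-only", "reliability", "student",
              "consistency of a measure", response_type="answer", quality=0.8)

# Post hoc analyses become queries, e.g. mean answer quality by day of week.
print(conn.execute("""SELECT strftime('%w', logged_at), AVG(quality)
                      FROM utterances WHERE quality IS NOT NULL
                      GROUP BY 1""").fetchall())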
Talking Heads. As AutoTutor does, RMT uses an animated agent with syn-
thesized speech to present the tutor’s utterances to the student. In principle, this
allows the system to use multiple modes of communication to deliver a richer
message. For example, the tutor can avoid face-threatening direct negative feed-
back, but still communicate doubt about an answer with a general word like
“Well” with the proper intonation. Furthermore, in relation to text-only tutor-
ing, the student is more likely to “get the whole message” because they can not
simply skim over the text.
Curriculum Script. A number of studies have shown that human tutors use
a “curriculum script”, or a rich set of topics which they plan to cover during a
tutoring session [1]. RMT’s curriculum script serves the same function. It is the
repository of the system’s knowledge about the tutoring domain. In particular,
it contains the topics that can be covered, the questions that the tutor can ask,
the answers that it expects it might get from the students, and a variety of dialog
moves to keep the discourse going. RMT’s curriculum script currently contains
approximately 2500 items in 5 topics. We believe that this gives us a reasonable
starting point for using the tutoring system throughout a significant portion of
a quarter-long class.
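One possible shape for a curriculum script entry is sketched below; the fields and content are illustrative only, not RMT's actual format.

curriculum_script = {
    "experimental design": {
        "questions": [{
            "text": "What makes a study a true experiment?",
            "expected_answers": [
                "participants are randomly assigned to conditions",
                "the experimenter manipulates the independent variable",
            ],
            "hints": ["Think about how the groups would be assigned."],
            "prompts": ["The groups would be chosen ..."],
            "summary": "A true experiment manipulates a variable and randomly "
                       "assigns participants to conditions.",
        }],
    },
}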
3 Experiment
Our design was a 2 × 2 factorial, with agent (the Miyako head vs. text only) and
task version (traditional tutoring task vs. simulated employment as a research
assistant, described in more detail below) as between-subjects factors. Students
were randomly assigned to the conditions except that participation in the agent
conditions required the ability to install software on their Windows-based com-
puter. As a result, more students interacted with the text-only presentation
rather than the Miyako animated agent. 101 participants took the pretest. 23
were assigned to the “Miyako” agent, 78 to text-only presentation. 59 were as-
signed to the research assistant task version, and 42 to the tutor task version.
Each participant had one or two modules available (experimental design, re-
liability) to be completed.1 We first reviewed the transcripts to code whether
each participant had completed each module. We discarded data from partici-
pants who were non-responsive or who had technical difficulties.
Many students appeared to have difficulty installing the speech and agent
software and getting it to work properly. A 2 x 2 between-subjects ANOVA
comparing the number of modules completed (0, 1 or 2) for the four conditions
in the study also suggested that there were significant technical issues with the
agent software. Although there was no significant difference in the number of
modules completed by participants in the two task versions (RA = .69; tutor =
.81 modules completed), participants in the Miyako agent condition completed
significantly fewer modules (.47) than those in the text-only condition (1.0).
Our primary dependent measure was gain score, defined as the difference
between the number correct on a 40-question multiple-choice post-test and an
identical pre-test. All analyses of gain scores included pre-test score as a covari-
ate, an analysis which is functionally equivalent to analyzing post-test scores
with pre-test scores as a covariate [11].
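As a sketch of this analysis strategy (hypothetical data; the column names are ours), the gain score can be regressed on condition with the pre-test score entered as a covariate, for example with statsmodels:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical records, one row per participant.
df = pd.DataFrame({
    "pretest":  [18, 22, 15, 25, 20, 17, 23, 19],
    "posttest": [22, 24, 15, 30, 21, 18, 28, 20],
    "modules":  [2, 2, 0, 2, 1, 0, 2, 1],       # modules completed
})
df["gain"] = df["posttest"] - df["pretest"]

# ANCOVA-style model: gain by modules completed, controlling for pre-test score.
model = smf.ols("gain ~ C(modules) + pretest", data=df).fit()
print(model.summary())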
We first examined whether completion of the tutor modules was associated
with greater gain scores compared to students who took the pre- and post-
tests but did not successfully complete the modules. Of the 75 participants who
completed both the pre-test and the post-test, 28 completed both modules, 26
completed one module, and 21 did not satisfactorily complete either module
before taking the post-test. In a one-way ANCOVA, gain scores were analyzed
with number of modules completed as the independent variable and pre-test
score as the covariate. The main effect of number of modules was significant.
Although the mean pre-test to post-test gain score for
those completing two modules (4.4 on a 40-item multiple-choice test) was greater
than that for those who completed no modules (2.4), participants who completed
only one module showed no gain at all (gain = -.3). Only the difference between
the mean gain for one module (-.3) versus 2 modules (4.4) was statistically
significant, as indicated by non-overlapping 95% confidence intervals.
1 One week into the experiment, we found that students were completing the first
topic too quickly, so we added another.
Breaking down the effects on gain scores for each of the two modules, it
appeared that the “reliability” module significantly improved learning, but the
“experimental design” module did not. Students who completed the reliability
module had higher gain scores (4.4) than those who did not (0.9), and this dif-
ference was significant in an ANCOVA in which pre-test score was entered as
the covariate. A similar analysis for the experimen-
tal design module revealed non-significantly lower gain scores for students who
completed the experimental design module than those who did not, with mean
gains of 2.1 vs. 2.4 respectively, F(1,72) < 1.
The reliability module was considerably longer than the experimental de-
sign module, so time on task may be partly responsible for the differences in
effectiveness between the two modules.
We next examined the effects of our two primary independent variables, agent
and task version, on gain scores. For these analyses we included only participants
who had successfully completed at least one module after taking the pre-test and
before taking the post-test. Of the 54 participants who completed at least one
module, 6 interacted with the Miyako agent and 48 used the text-only interface.
Students were more evenly divided between the two task versions, with 25 in the
tutor and 29 in the research assistant version.
Gain scores were entered into a 2 x 2 ANCOVA with agent and task version
as between-subjects factors and pre-test score as the covariate. Gain scores were
greater for students using the text-only interface (mean = 2.6, N = 48) than
for those interacting with the Miyako agent (mean = -1.5, N = 6).
Neither the main effect of task version nor the agent × task version
interaction was significant, Fs < 1.
Because of the low number of participants interacting with the animated
agent, the effect of agent in this analysis must be interpreted with caution,
but it is consistent with our other findings indicating that students had difficulty
with the Miyako agent. We suspect that technical difficulties may have been
largely responsible.
4 Discussion
In this section, we describe some of the aspects of the system that may have
contributed to the results of the experiment. In particular, we look at the
tutoring modules that were used, the animated agent, and the task version.
It could also be the case that the subject pool students had enough familiarity
with the experimental design material that they performed better on the pre-
test, and therefore had less opportunity for gain.
The Agent. There were two significant weaknesses of the agent used here that
may have affected our results. First, there may have been software installation
difficulties. The participants were using the system on their own computers in
their homes, and had to install the agent software if they were assigned to the
agents version. The underlying agent technology that we used, Microsoft Agents,
requires three programs to be installed from a Microsoft server. The participants
could have had difficulty following the instructions for downloading the software
or could have been nervous about installing software that they did not search
out for themselves.
Second, the particular animated agent that we used was rather limited. A
good talking head should be able not just to tap into the social dynamics present
between a human tutor and student, but also provide an additional modality of
communication: prosody. In particular, human tutors are known to avoid giving
explicit negative feedback because that could cause the student to “lose face”
and make her nervous about offering further answers. Instead, human tutors
tend to respond to poor student answers with vague verbal feedback (“well” or
“okay”) accompanied by intonation that makes it clear that the answer could
have been better [12].
Unfortunately, the agent that we used was essentially a shareware agent that
had good basic graphics, but virtually no additional animations that might
display the affect that goes along with the tutor's verbal feedback. Furthermore,
the text-to-speech synthesizer that we used (Lernout & Hauspie British English)
was relatively comprehensible, but we have not yet tackled the difficult task of
trying to make the speech engine produce the type of prosodic contours that
human tutors use. Thus, all of the tutor utterances are offered in a relatively
detached, stoic conversational style.
Despite these limitations, we had hypothesized that the agent version would
have an advantage over text-only for at least one reason: in the text-only version,
the students might well just scan over the feedback text to find the next question.
With audio feedback, the student is essentially forced to listen to the entire
feedback and question before entering the next response. Of course, this may
have also contributed to the lower completion rate of students in the agent
version because they may have become frustrated by the relatively slow pace of
presentation of the agent’s synthesized speech.
Task Version. As mentioned above, we tested two different task versions, the
traditional tutor and a simulated research assistant condition. In the former, the
tutor poses questions,2 the student types in an answer, and the dialog continues
with both parties contributing further information until a relatively complete
2 As in human-human tutoring, students may ask questions, but rarely do [12].
answer has been given. In the research assistant condition, the basic “rules of
the game” are the same with one subtle, but potentially significant difference:
instead of a tutor, the system is assuming the role of an employer who has
hired the student to work on a research project. As previous research has shown,
putting students into an authentic functional role — even when it is simulated
— can greatly increase their motivation to perform the task, and thereby also
increase their learning [13].
Unfortunately, in the current version of RMT, our simulation of the research
advisor role is rather minimal. The only difference is in the initial “introduction”
that the agent gives to the student. In the traditional tutor condition, the agent
(or text) describes briefly how the tutoring session will progress with the stu-
dent typing their responses into the browser window. In the research assistant
version, the agent starts with an introduction that is intended to establish the
social relationship between the research supervisor and student/research assis-
tant. Unfortunately, there are no continuing cues to enforce this relationship.
We intend to develop this aspect of the system further, but for the current eval-
uation we needed to focus on getting the basic mechanisms of the tutor in place
along with the research methods tutoring content.
5 Conclusions
References
1. Graesser, A.C., Person, N.K., Magliano, J.P.: Collaborative dialogue patterns in
naturalistic one-to-one tutoring. Applied Cognitive Psychology 9 (1995) 359–387
2. Glass, M.: Processing language input in the CIRCSIM-tutor intelligent tutoring
system. In Moore, J., Redfield, C., Johnson, W., eds.: Artificial Intelligence in
Education, Amsterdam, IOS Press (2001) 210–221
3. Rosé, C., Jordan, P., Ringenberg, M., Siler, S., VanLehn, K., Weinstein, A.: Interactive
conceptual tutoring in Atlas-Andes. In: Proceedings of AI in Education 2001
Conference. (2001)
4. Aleven, V., Popescu, O., Koedinger, K.R.: Towards tutorial dialog to support
self-explanation: Adding natural language understanding to a cognitive tutor. In:
Proceedings of the 10th International Conference on Artificial Intelligence in Ed-
ucation. (2001)
5. Zinn, C., Moore, J.D., Core, M.G., Varges, S., Porayska-Pomsta, K.: The BE&E
tutorial learning environment (BEETLE). In: Proceedings of the Seventh Workshop
on the Semantics and Pragmatics of Dialogue (DiaBruck 2003). (2003) Available
at http://www.coli.uni-sb.de/diabruck/.
6. Wiemer-Hastings, P., Graesser, A., Harter, D., the Tutoring Research Group: The
foundations and architecture of AutoTutor. In Goettl, B., Halff, H., Redfield, C.,
Shute, V., eds.: Intelligent Tutoring Systems, Proceedings of the 4th International
Conference, Berlin, Springer (1998) 334–343
7. Graesser, A., Jackson, G., Mathews, E., Mitchell, H., Olney, A., Ventura,
M., Chipman, P., Franceschetti, D., Hu, X., Louwerse, M., Person, N., TRG:
Why/autotutor: A test of learning gains from a physics tutor with natural lan-
guage dialog. In: Proceedings of the 25th Annual Conference of the Cognitive
Science Society, Mahwah, NJ, Erlbaum (2003)
8. Landauer, T.K., Laham, D., Rehder, R., Schreiner, M.E.: How well can passage
meaning be derived without using word order? a comparison of Latent Seman-
tic Analysis and humans. In: Proceedings of the 19th Annual Conference of the
Cognitive Science Society, Mahwah, NJ, Erlbaum (1997) 412–417
9. Mostow, J., Aist, G.: Evaluating tutors that listen. In Forbus, K., Feltovich, P.,
eds.: Smart Machines in Education. AAAI Press, Menlo Park, CA (2001) 169–234
10. Moreno, K., Klettke, B., Nibbaragandla, K., Graesser, A.: Perceived character-
istics and pedagogical efficacy of animated conversational agents. In Cerri, S.,
Gouarderes, G., Paraguacu, F., eds.: Proceedings of the 6th Annual Conference on
Intelligent Tutoring Systems, Springer (2002) 963–972
11. Werts, C.E., Linn, R.L.: A general linear model for studying growth. Psychological
Bulletin 73 (1970) 17–22
12. Person, N.K., Graesser, A.C., Magliano, J.P., Kreuz, R.J.: Inferring what the
student knows in one-to-one tutoring: The role of student questions and answers.
Learning and Individual Differences 6 (1994) 205–229
13. Schank, R., Neaman, A.: Motivation and failure in educational simulation design.
In Forbus, K., Feltovich, P., eds.: Smart machines in education. AAAI Press, Menlo
Park, CA (2001) 37–69
Using Knowledge Tracing to Measure Student Reading
Proficiencies
J.E. Beck and J. Sison
1 Introduction
Project LISTEN’s Reading Tutor [8] is an intelligent tutor that listens to students read
aloud with the goal of helping them learn how to read English. Target users are stu-
dents in first through fourth grades (approximately 6- through 9-year olds). Students
are shown one sentence (or fragment) at a time, and the Reading Tutor uses speech
recognition technology to (try to) determine which words the student has read incor-
rectly. Much of the Reading Tutor’s power comes from allowing children to request
help and from detecting some mistakes that students make while reading. It does not
have the strong reasoning about the user that distinguishes a classic intelligent tutor-
ing system, although it does base some decisions, such as picking a story at an appro-
priate level of challenge, on the student’s reading proficiency.
We have constructed models that assess a student’s overall reading proficiency [2],
but have not built a model of the student’s performance on various skills in reading.
Much of the difficulty comes from the inaccuracies inherent in speech recognition.
Providing explicit feedback based only on student performance on one attempt at
reading a word is not viable since the accuracy at distinguishing correct from incor-
rect reading is not high enough [13]. Due to such problems, student modeling has not
2 Knowledge Tracing
Knowledge tracing [4] is an approach for estimating the probability a student knows a
skill given observations of him attempting to perform the skill. First we briefly dis-
cuss the parameters used in knowledge tracing, then we describe how to modify the
approach to work with speech recognition.
For each skill in the curriculum, there is a P(k) representing the probability the stu-
dent knows the skill, and there are also two learning parameters:
P(L0) is the initial probability a student knows a skill
P(t) is the probability a student learns a skill given an opportunity
However, student performance is a noisy reflection of his underlying knowledge.
Therefore, there are two performance parameters for each skill:
P(slip) = P(incorrect | knows skill), i.e., the probability a student gives an in-
correct response even if he has mastered the skill. For example, hastily typ-
ing “32” instead of “23.”
P(guess) = P(correct | didn't know skill), i.e. the probability a student man-
ages to generate a correct response even if he has not mastered the skill. For
example, a student has a 50% chance of getting a true/false question correct.
When the tutor observes a student respond to a question either correctly or incor-
rectly, it uses the appropriate skill’s performance parameters (to discount guesses and
slips) to update its estimate of the student’s knowledge. A fuller discussion of knowl-
edge tracing is available in [4].
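A minimal sketch of the standard knowledge-tracing update for a single skill, following the parameter definitions above (the function and the example values are ours):

def knowledge_trace(p_know, correct, p_slip, p_guess, p_learn):
    """One knowledge-tracing update for a single skill.

    p_know  -- current P(k), the probability the student knows the skill
    correct -- whether the observed response was scored correct
    p_slip  -- P(incorrect | knows skill)
    p_guess -- P(correct | didn't know skill)
    p_learn -- P(t), the probability of learning at each opportunity
    """
    if correct:
        evidence = p_know * (1 - p_slip)
        p_given_obs = evidence / (evidence + (1 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        p_given_obs = evidence / (evidence + (1 - p_know) * (1 - p_guess))
    # Allow for the chance of learning from this opportunity.
    return p_given_obs + (1 - p_given_obs) * p_learn

# Example: P(L0) = 0.3, then one correct and one incorrect observation.
p = 0.3
for observed_correct in (True, False):
    p = knowledge_trace(p, observed_correct, p_slip=0.1, p_guess=0.2, p_learn=0.1)
    print(round(p, 3))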
FA stands for the probability of a False Alarm and MD stands for the probability of Miscue Detec-
tion. A false alarm is when the student reads a word correctly but the word is rejected
by the ASR; a detected miscue is when the student misreads a word and it is scored as
incorrect by the ASR. In a perfect environment, FA would be 0 and MD would be 1,
and there would therefore be no need for the additional transitions. Overall in the
Reading Tutor, and (only counting cases where the student said
some other word, the tutor is much better at scoring silence as incorrectly reading a
word).
All we are able to observe is whether the student’s response is scored as being cor-
rect, and the tutor’s estimate of his knowledge. Given these limitations, any path that
takes the student from knowing a skill to generating an incorrect response is consid-
ered a slip; it does not matter if the student actually slipped, or if his response was
observed as incorrect due to a false alarm. Similarly, a guess is any path from the
student not knowing the skill to an observed correct performance. Therefore, we can
define two additional variables, slip' and guess', to account for both paths.
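Using the FA and MD probabilities defined above, one plausible way to write these combined parameters (our reconstruction of the two paths just described, not necessarily the paper's exact equations) is:

\begin{align*}
P(\mathit{slip}')  &= P(\mathit{slip})\,\mathit{MD} + \bigl(1 - P(\mathit{slip})\bigr)\,\mathit{FA},\\
P(\mathit{guess}') &= P(\mathit{guess})\,(1 - \mathit{FA}) + \bigl(1 - P(\mathit{guess})\bigr)\,(1 - \mathit{MD}).
\end{align*}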
Since we expect ASR performance to vary based on the words being read, it is not
appropriate to use a constant MD and FA for all words. Therefore, when we observe
a slip, while it would be informative to know whether it was caused by the student or
the ASR, there is no good way of knowing which is at fault. As a result, we do not
try to infer the FA, MD, slip, and guess parameters. Instead, we estimate the
slip' and guess' parameters for each skill directly from data (see Section 3.4).
For simplicity, we henceforth refer to guess' and slip' as guess and slip. How-
ever, note that the semantics of P(slip) and P(guess) change when using knowledge
tracing in this manner. These parameters now model both the student and the method
for scoring the student’s performance. However, the application of knowledge trac-
ing and the updating of student knowledge remain unchanged.
Our data come from 284 students who used the Reading Tutor in the 2002-2003
school year. The students using the Reading Tutor were part of a controlled study of
learning gains, so were pre- and post-tested on several reading tests. Students were
administered the Woodcock Reading Mastery Test [14], the Test of Written Spelling
[6], the Gray Oral Reading Test [12], and the Test of Word Reading Efficiency [11].
All of these tests are human administered and scored.
Students’ usage ranged from 27 seconds to 29 hours, with a mean of 8.6 hours and
a median of 5.9 hours. The 27 seconds of usage was anomalous, as only four other
users had less than one hour of usage.
While using the Reading Tutor, students read from 3 words to 35,102 words. The mean
number of words read was 8129 and the median was 5715. When students read a
sentence, their speech was processed by the ASR and aligned against the sentence
[10]. This alignment scores each word of the sentence as either being accepted (heard
by the ASR as read correctly), rejected (the ASR heard and aligned some other
word), or skipped. In Table 1, the student was supposed to read “The dog ran behind
the house.” The bottom row of the table shows how the student’s performance would
be scored by the tutor.
and would not generalize to new words the student encounters. Instead, we assess a
student's knowledge of grapheme-to-phoneme mappings. A grapheme is a
group of letters in a word that produces a particular phoneme (sound). So our goal is
to assess the student's knowledge of these mappings.
For example, ch can make the /CH/ sound as in the word “Charles.” However, ch
can also make the /K/ sound as in “chaos.” By assessing students on the component
skills necessary to read a word, we hope to build a model that will allow the tutor to
make predictions about words the student has not yet seen. For example, if the student
cannot read “chaos” then he probably cannot read “chemistry” either.
Modeling the student’s proficiency at a subword level is difficult, as we do not
have observations of the student attempting to read mappings in isolation.
There are two reasons for this lack. First, speech recognition is imperfect differenti-
ating individual phonemes. Second, the primary goal of the 2002-2003 Reading Tutor
is to have students learn to read by reading connected text, not to read isolated graph-
emes with the goal of allowing the tutor to assess their skills. To overcome this prob-
lem, we apply knowledge tracing to the individual mappings that make up the
particular word. For example, the word “chemist” contains
and mappings.
However, which mappings are indicative of a student's skill? Prior research on
children's reading [9] shows that children are often able to decode the beginning and
end of a word, but have problems with the interior. Therefore, we ignore the first and
last mappings of a word and use the student's performance reading the word to
update the tutor's estimate of the student's knowledge of the interior mappings.
In the above example we would update the student's knowledge of the interior
mappings of “chemist.” Words with fewer than three graphemes do not adjust the esti-
mate of the student's knowledge.
When students read a sentence in the Reading Tutor, sometimes they do not attempt
to read all of the words in the sentence. If the student pauses in his reading, the ASR
will score what the student has read so far. For example, in Table 1, the student ap-
pears to have gotten stuck on the word “behind” and stopped reading. It is reasonable
to infer the student could not read the word “behind.” However, the scoring of “the”
and “house” depends on what skills are being assessed. If the goal is to measure the
student’s overall reading competency, then counting those words as read incorrectly
will provide a better estimate since stronger readers will need to pause fewer times.
Informal experiments on our data bear out this idea.
However, our goal is not to assess a student’s overall reading proficiency, but to
estimate his proficiency at particular grapheme-to-phoneme mappings. For this goal, the words "the"
and “house” provide no information about the student’s competency on the mappings
that make up those words. Therefore we do not apply knowledge tracing to those
words.
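A sketch of the per-word update just described (Python; the grapheme-to-phoneme segmentation, the parameter values, and the helper names are placeholders of ours, not the Reading Tutor's):

def bkt_update(p_know, correct, p_slip=0.2, p_guess=0.3, p_learn=0.05):
    # Standard knowledge-tracing update (see Section 2); parameter values are placeholders.
    if correct:
        post = p_know * (1 - p_slip) / (p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        post = p_know * p_slip / (p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return post + (1 - post) * p_learn

def update_word(word_mappings, scored_correct, estimates):
    """Apply knowledge tracing to the interior grapheme-to-phoneme mappings of a word."""
    if len(word_mappings) < 3:
        return estimates                    # too few graphemes: nothing to update
    for mapping in word_mappings[1:-1]:     # ignore the first and last mappings
        estimates[mapping] = bkt_update(estimates.get(mapping, 0.3), scored_correct)
    return estimates

# Hypothetical segmentation of a word the ASR accepted as read correctly.
print(update_word([("ch", "K"), ("e", "EH"), ("m", "M"), ("i", "IH"), ("st", "S T")],
                  scored_correct=True, estimates={}))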
More formally, we estimate the words a student attempted as follows:
1 Source code is courtesy of Albert Corbett and Ryan Baker and is available at
http://www.cs.cmu.edu/~rsbaker/curvefit.tar.gz
4 Validation
We now discuss validating our model of the student’s reading proficiency. First we
demonstrate that, overall, it is a good model of how well students can identify words.
Then we show that the individual estimates have predictive power.
one, 25 mappings for grade two, five mappings for grade three, and four mappings
for grade four.
Using leave-one-out cross-validation, the resulting regression model for WI scores
had an overall correlation of 0.88 with the WI test. It is reasonable to conclude
that our model of students' word identification abilities is in good agreement
with a well-validated instrument for measuring the skill. We examined the case where
our model’s error from the student’s actual WI was greatest: a fourth grader whose
pretest WI score was 3.9, her posttest was 3.3, and our model’s prediction was 6.1. It
is unlikely the student’s proficiency declined by 0.6 grade levels over the course of
the year, and it was unclear whether we should believe the 3.3 or the 6.1. Perhaps our
model is more trustworthy than the gold standard against which we validated it?
There are a variety of reasons not to trust a single test measurement, including that it
was administered on a particular day. Perhaps the student was feeling ill or did not
take the test seriously? Also, we would like to know if our measure is better than WI.
To get around these limitations, we looked at an alternate method of measuring word
identification.
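As an illustration of the kind of leave-one-out validation reported above (synthetic data; the actual per-grade mapping features are not listed here), one could proceed as follows:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n_students, n_mappings = 40, 6              # placeholder sizes
X = rng.random((n_students, n_mappings))    # per-student P(k) estimates for selected mappings
y = 2.0 + X @ (3 * rng.random(n_mappings)) + rng.normal(0, 0.3, n_students)  # stand-in WI scores

predictions = np.empty(n_students)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

print("held-out correlation:", np.corrcoef(predictions, y)[0, 1])
print("mean absolute error:", np.mean(np.abs(predictions - y)))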
closer to the WI3 score. The WI test was closer to the WI3 score 52.8% of the time,
while our model was closer 47.2% of the time. An alternate evaluation is to examine
the mean absolute error (MAE) between each estimate and WI3. WI had an MAE of
0.71 (SD of 0.56), while our model had an MAE of 0.77 (SD of 0.67), a difference of
only 0.06 GE (roughly three weeks). So our model was marginally worse than the
WI test at assessing (a proxy for) a student’s word identification abilities. However,
the WI test is a well-validated instrument, and to come within 0.06 GE of it is an ac-
complishment. Although marginally worse than the paper test, the knowledge tracing
model can estimate the student's proficiency at any time throughout the school year,
and requires no student time to generate an assessment.
References
1. Canadian Psychological Association: Guidelines for Educational and Psychological
Testing. 1996: Also available at: http://www.acposb.on.ca/test.htm.
2. Beck, J.E., P. Jia, and J. Mostow. Assessing Student Proficiency in a Reading Tutor that
Listens, in Ninth International Conference on User Modeling. 2003. p. 323-327,
Johnstown, PA.
3. Carver, R.P., The highly lawful relationship among pseudoword decoding, word identifi-
cation, spelling, listening, and reading. Scientific Studies of Reading, 2003. 7(2): p. 127-
154.
4. Corbett, A. and J. Anderson, Knowledge tracing: Modeling the acquisition of procedural
knowledge. User modeling and user-adapted interaction, 1995. 4: p. 253-278.
5. Heift, T. and M. Schulze, Student Modeling and ab initio Language Learning. System, the
International Journal of Educational Technology and Language Learning Systems, 2003.
31(4): p. 519-535.
6. Larsen, S.C., D.D. Hammill, and L.C. Moats, Test of Written Spelling. fourth ed. 1999,
Austin, Texas: Pro-Ed.
7. Michaud, L.N., K.F. McCoy, and L.A. Stark. Modeling the Acquisition of English: an
Intelligent CALL Approach. in Eighth International Conference on User Modeling.
2001. Springer-Verlag.
8. Mostow, J. and G. Aist, Evaluating tutors that listen: An overview of Project LISTEN, in
Smart Machines in Education, K. Forbus and P. Feltovich, Editors. 2001, MIT/AAAI
Press: Menlo Park, CA. p. 169-234.
9. Perfetti, C.A., The representation problem in reading acquisition, in Reading Acquisition,
P.B. Gough, L.C. Ehri, and R. Treiman, Editors. 1992, Lawrence Erlbaum: Hillsdale, NJ.
p. 145-174.
10. Tam, Y.-C., J. Beck, J. Mostow, and S. Banerjee. Training a Confidence Measure for a
Reading Tutor that Listens, in Proc. 8th European Conference on Speech Communication
and Technology (Eurospeech 2003). 2003. p. 3161-3164, Geneva, Switzerland.
11. Torgesen, J.K., R.K. Wagner, and C.A. Rashotte, TOWRE: Test of Word Reading Effi-
ciency. 1999, Austin: Pro-Ed.
12. Wiederholt, J.L. and B.R. Bryant, Gray Oral Reading Tests. 3rd ed. 1992, Austin, TX:
Pro-Ed.
13. Williams, S.M., D. Nix, and P. Fairweather. Using Speech Recognition Technology to
Enhance Literacy Instruction for Emerging Readers. in Fourth International Conference
of the Learning Sciences. 2000. p. 115-120. Erlbaum.
14. Woodcock, R.W., Woodcock Reading Mastery Tests - Revised (WRMT-R/NU). 1998,
Circle Pines, Minnesota: American Guidance Service.
The Massive User Modelling System (MUMS)
1 Introduction
A recent trend within intelligent tutoring systems and related educational technologies
research is to move away from monolithic tutors that deal with individual learners,
and instead favour “adaptive learning communities” that provide a related variety of
collaborative learning services for multiple learners [9]. An urgent challenge facing
this new breed of tutoring systems is the need for precise and timely coordination that
facilitates effective adaptation in all constituent components. In addition to supporting collaboration between the native parts of a tutoring system, an effective inter-component communication system must allow the tutor to know of and react to learner actions in external applications. For example, consider the kinds of errors a student encounters when trying to solve a Java programming problem. If the errors are syntactic, a tutor may find it useful to intervene directly within the development environment the student is using. If the errors are related to higher-level course concepts, the tutor may instead find it useful to dynamically assemble and deliver external resources (learning objects) to the student. Finally, if no appropriate solution can be found that helps the student resolve their errors, the tutor may find it useful to refer the student to a domain expert or peer who has had success at similar tasks.
To provide this level of adaptation, the tutor must be able to form a coherent model
of students as they work with different domain applications. The tutor must be able to
collect, understand, and respond to user modelling “events” both in real time and on
an archival basis. These needs can be partially addressed by integrating intelligent
tutoring system functionality within larger web-based e-learning systems including
learning management systems such as WebCT [28] and Blackboard [3] or e-learning
portals like uPortal [26]. These applications provide an array of functionality meant to directly support learning activities, including social communication, learner management, and content delivery. An inherent problem with these e-learning systems is that they are often unable to capture interaction between a learner and the other applications the learner may be using to complete a learning task. While a potential solution to this problem is to integrate all possible external applications that may be used by the student within an e-learning system, this task is difficult at best due to proprietary APIs and the heterogeneity of e-learning systems.
In [27] we proposed a method of integrating various e-learning applications using a
multi-agent architecture, where each application was represented by an agent that
negotiated with other agents to provide information about learners using the system.
A learner using the system could then see portions of this information by interacting with a personal agent, which represented the tutor of the system. In this system, the tutor's sole job was to match learners with one another based on learner preferences and competencies. The system was useful at a conceptual level, but it proved difficult to implement and hard to scale up. Integrating agent features (in particular reasoning and negotiation) within every application instance demanded considerable computational power, forcing the user into a more centralized computing environment. To provide the required performance and reliability, agents had to be carefully crafted using a proprietary communication protocol, which hindered both agent interoperability and system extensibility.
This paper presents a framework and prototype specifically aimed at supporting the
process of collecting and disseminating user information to software components
interested in forming user models. This framework uses both semantic web and web
service technologies to encourage interoperability and extensibility at both the se-
mantic and the syntactic levels. The organization of this paper is as follows: Section 2
describes the framework at a conceptual level. Section 3 follows with an outline of the
environment we are using to prototype the system, with a particular emphasis on the
integration of our modelling framework with the legacy e-learning applications we are
trying to support. Section 4 contrasts our work with similar work in the semantic web
community. Finally, Section 5 concludes with a look at future goals.
intentions. While the opinions created can be of any size, the focus is on creating
brief contextualized statements about a user, as opposed to fully modelling the
user.
2. Modellers: are interested in acting on opinions about the user, usually by reasoning
over these to create a user model. The modeller then interacts with the user (or the
other aspects of the system, such as learning materials) to provide adaptation.
Modellers may be interested in modelling more than one user, and may receive
opinions from more than one producer. Further, modellers may be situated and per-
form purpose-based user modelling by restricting the set of opinions they are inter-
ested in receiving.
3. Broker: acts as an intermediary between producers and modellers. The broker re-
ceives opinions from producers and routes them to interested modellers. Modellers
communicate with the broker using either a publish/subscribe model or a
query/response model. While the broker is a logically centralized component, dif-
ferent MUMS implementations may find it useful to distribute and specialize the
services being provided for scalability reasons.
While the definition of an opinion centers on human users, it does not restrict the
producer from describing other entities and relationships of interest. For instance, an
evidence producer embedded within an integrated software development environment
might not just express information about the particular compile-time errors a student
receives, but may also include the context of the student’s history for this program-
ming session, as well as some indication of how the tutor should provide treatment for
the problem. The definition also allows for evidence producers to have disagreeing
opinions about users, and for the opinion of a producer to change over time.
This three-entity system purposefully supports the notion of active learner model-
ling [17]. In the active learner modelling philosophy, the focus is on creating a
learner model situated for a given purpose, as opposed to creating a complete model
of the learner. This form of modelling tends to be less intensive than traditional user
modelling techniques, and focuses on the just-in-time creation and delivery of models
instead of the storage and retrieval of models. The MUMS architecture supports this
by providing both a stream-based publish/subscribe and an archival query/response
method of obtaining opinions from a broker. Both of these modes of event delivery
require that modellers provide a semantic query for the opinions they are interested in,
as opposed to the more traditional event system notions of channel subscription and
producer subscription. This approach decouples the producers of information from the
consumers of information, and leads to a more easily adaptable system where new
producers and modellers can be added in an as-needed fashion. The stream-based
method of retrieving opinions allows modellers to provide just-in-time reasoning,
while the archival method allows for more resource-intensive user modelling to occur.
All opinions transferred within the MUMS system include a timestamp indicating
when they were generated, allowing modellers to build up more complete or historical
user models using the asynchronous querying capabilities provided by the broker.
By applying the adaptor pattern [8] to the system, a fourth entity of interest can be
derived, namely the filter.
4. Filters: act as broker, modeller, and producer of opinions. By registering for and
reasoning over opinions from producers, a filter can create higher level opinions.
This reduces the work a modeller must do to form a user model, while maintaining the more flexible decentralized environment. Filters can be chained to-
gether to provide any amount of value-added reasoning that is desired. Finally, fil-
ters can be specialized within a particular instance of the MUMS framework by
providing domain specific rules that govern the registration of, processing of, and
creation of opinions.
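As a rough illustration of these four roles (this sketch is ours, not part of the MUMS implementation, and all class and method names are invented), the entities might be expressed as follows, with an opinion carried as a timestamped RDF payload:

import java.util.Date;
import java.util.List;

// Illustrative sketch only; these names are not taken from the MUMS code base.
// An opinion is a brief, timestamped, contextualized statement about a user,
// carried here as an opaque RDF payload string.
class Opinion {
    final String aboutUser;    // identifier of the user the opinion describes
    final String rdfPayload;   // RDF statements expressing the opinion
    final Date timestamp;      // when the producer generated the opinion

    Opinion(String aboutUser, String rdfPayload, Date timestamp) {
        this.aboutUser = aboutUser;
        this.rdfPayload = rdfPayload;
        this.timestamp = timestamp;
    }
}

// Modellers (and filters) receive opinions matching a semantic subscription.
interface Modeller {
    void onOpinion(Opinion opinion);
}

// The broker routes opinions from producers to interested modellers, supporting
// both stream-based publish/subscribe and archival query/response.
interface Broker {
    void publish(Opinion opinion);                  // called by producers
    void subscribe(String rdqlPattern, Modeller m); // real-time delivery
    List<Opinion> query(String rdqlPattern);        // archival retrieval
}

// A filter acts simultaneously as a modeller (consuming opinions) and as a
// producer (republishing derived, higher-level opinions).
interface Filter extends Modeller {
    void attach(Broker broker);
}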
Interactions between the entities are shown in Fig. 1. A set of evidence producers publish opinions, based on observations of the user, to a given broker. The broker
routes these opinions to interested parties (in this case, both a filter and the modeller
towards the top of the diagram). The filter reasons over the opinions, forms derivative
statements, and publishes these new opinions back to the broker and any modellers
registered with the filter. Lastly, modellers interested in retrieving archival statements
about the user can do so by querying any entity which stores these opinions (in this
example, the second modeller queries the broker instead of registering for real time
opinion notification).
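Using the illustrative interfaces sketched above, the flow of Fig. 1 might be exercised roughly as follows; the in-memory broker and the substring matching are simplifications invented for the example (a real broker would evaluate RDQL against the RDF payload):

import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory broker for illustration only; it treats the registered pattern
// as a plain substring match rather than a real RDQL evaluation.
class ToyBroker implements Broker {
    private final Map<String, List<Modeller>> subscriptions = new HashMap<>();
    private final List<Opinion> archive = new ArrayList<>();

    public void publish(Opinion opinion) {
        archive.add(opinion);
        subscriptions.forEach((pattern, modellers) -> {
            if (opinion.rdfPayload.contains(pattern)) {
                modellers.forEach(m -> m.onOpinion(opinion));
            }
        });
    }

    public void subscribe(String pattern, Modeller m) {
        subscriptions.computeIfAbsent(pattern, k -> new ArrayList<>()).add(m);
    }

    public List<Opinion> query(String pattern) {
        List<Opinion> hits = new ArrayList<>();
        for (Opinion o : archive) {
            if (o.rdfPayload.contains(pattern)) hits.add(o);
        }
        return hits;
    }
}

class MumsFlowDemo {
    public static void main(String[] args) {
        Broker broker = new ToyBroker();

        // A modeller registers for compile-error opinions in real time.
        broker.subscribe("compileError",
                op -> System.out.println("real-time opinion about " + op.aboutUser));

        // An evidence producer (e.g., embedded in an IDE) publishes an opinion.
        broker.publish(new Opinion("student42",
                "<rdf> ... compileError ... </rdf>", new Date()));

        // A second modeller later retrieves archived opinions instead.
        System.out.println(broker.query("compileError").size() + " archived opinion(s)");
    }
}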
The benefits of this architecture are numerous. First, the removal of reasoning and
negotiation abilities from the producers of opinions greatly decreases the complexity of creating new producer types. Instead of being rebuilt from scratch with user
modelling in mind, existing applications (be they applications explicitly meant to
support the learning process, or domain-specific applications) can be easily extended
and added to the system. Second, the decoupling between the producers and the mod-
ellers serves to increase both the performance and the extensibility of the system. By
adding more physical brokers to store and route messages, a greater number of pro-
ducers or modellers can be supported. This allows for a truly distributed system,
where modelling is done on different physical machines throughout the network.
Third, the semantic querying and decoupling between modellers and producers allows
for the dynamic addition of arbitrary numbers of both types of application to the
MUMS system. Once these entities have joined the system, their participation can
increase the expressiveness of the user models created, without requiring modifica-
tions to existing producers and modellers. Finally, the logical centralization of the broker
allows for the setting of administration policies, such as privacy rules and the mainte-
nance of data integrity, through the addition of filters.
All of these benefits address key challenges for adaptive learning systems. These
systems must allow for the integration of both existing domain applications and learning-management-specific applications. This integration must be able to take place
with a minimal amount of effort to accommodate the various stakeholders within an
institution (e.g. administrators, researchers, instructional designers), and must be able
to be centrally managed to provide for privacy of user data. Last, the system must be
able to scale not just to the size of a single classroom, but to the needs of a whole
department or institution.
3 Implementation Prototype
3.1 Interoperability
With the goal of distributing the system to as many domain-specific applications as necessary, interoperability is a key concern. To this end, all opinion publishing from
producers is done using our implementation of the Web Services Events (WS-Events)
[5] infrastructure specification. This infrastructure defines a set of data types and rules
for passing events using web services. Events include administrative information about the producer or modeller (e.g. contact information, quality of service, etc.), a
payload that contains the semantics of the opinion, and information on managing
advertising and subscriptions. Using this infrastructure helps to protect entities from
future changes in the way opinion distribution is handled. Further, modellers can
either subscribe to events using WS-Events (publish/subscribe), or can query the
broker directly using standard web service technologies (query/response). This allows
for both the real-time delivery of new modelling information and access to previously archived information, in a manner independent of platform and
programming language.
We enhance semantic interoperability by expressing the payload of each event us-
ing the Resource Description Framework (RDF) [16]. This language provides a natu-
rally extensible and ontology-neutral method for describing modeling information in a
format that is easily computer readable. It has become the lingua franca of the seman-
tic web, and a number of toolkits (notably, Jena [13] and Drive [24]) have arisen to
make RDF graph manipulation easier. When registering for events, modellers provide
patterns to match using the RDF Data Query Language (RDQL) [23].
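For concreteness, the payload of an opinion and the pattern a modeller registers might look like the following; the namespace and property names are invented for this example and do not come from any MUMS ontology:

// Illustrative only; the vocabulary below is invented.
class ExamplePayloads {
    // An opinion payload in N-Triples form, stating that a student received a
    // particular compile-time error while working on an exercise.
    static final String OPINION_NTRIPLES =
        "<http://example.org/user/student42> " +
        "<http://example.org/vocab#receivedError> " +
        "<http://example.org/vocab#MissingSemicolon> .";

    // An RDQL pattern a modeller might register with the broker so that it only
    // receives opinions describing errors for that user.
    static final String RDQL_SUBSCRIPTION =
        "SELECT ?error " +
        "WHERE (<http://example.org/user/student42>, " +
        "       <http://example.org/vocab#receivedError>, ?error)";
}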
Finally, design time interoperability is achieved by maintaining a separate ontology
database which authors can inspect when creating new system components. This
encourages the reuse of previously deployed ontologies, while maintaining the flexi-
bility of opinion routing independent of ontology.
3.2 Extensibility
Besides the natural extensibility afforded by the use of RDF as a payload format,
the MUMS architecture provides for distributed reasoning through the use of filters.
In general, a filter is a component that masquerades as any combination of producer,
modeller, or broker of events. There are at least two specialized instances of a filter:
1. Reasoners: register or query for events with the goal of being able to produce
higher level derivative events. For instance, one might create a reasoner to listen
for events related to message sending from the web-based discussion and instant
messenger producers, and then create new opinions which indicate the changing
social structure amongst peers in the class.
2. Blockers: are placed between producers and modellers with the goal of modifying
or restricting events that are published. Privacy filters are an example of a blocker.
These filters can anonymize events or require that a modeller provide special
authentication privileges when subscribing.
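A privacy blocker of the kind just described might, for instance, sit between a producer-facing broker and a modeller-facing broker and strip identifying information before forwarding. The sketch below reuses the illustrative interfaces introduced earlier; the pseudonym scheme and all names are invented:

import java.util.Date;
import java.util.UUID;

// Illustrative blocker; names and the pseudonym scheme are invented. A real
// blocker would also rewrite identifying resources inside the RDF payload.
class AnonymizingBlocker implements Modeller {
    private final Broker modellerFacingBroker;

    AnonymizingBlocker(Broker producerFacingBroker, Broker modellerFacingBroker) {
        this.modellerFacingBroker = modellerFacingBroker;
        // Register for the (invented) identifying opinions it should mediate.
        producerFacingBroker.subscribe("receivedError", this);
    }

    public void onOpinion(Opinion opinion) {
        // Replace the real user identifier with a stable pseudonym before forwarding.
        String pseudonym = UUID.nameUUIDFromBytes(opinion.aboutUser.getBytes()).toString();
        modellerFacingBroker.publish(new Opinion(pseudonym, opinion.rdfPayload, new Date()));
    }
}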
While the system components in our current implementation follow a clear separation
between those that are producers and consumers of information, we expect most fu-
ture components will add value to the network by reasoning over data sources
before producing opinions. Thus we imagine that the majority of the network will be
made up of reasoner filters chained together with a few blockers to implement ad-
ministrative policies.
3.3 Scalability
Early lessons learned from testing the implementation prototype indicated that there
are two main factors involved in slowing down the propagation of opinions:
1. Message serialization: The deserialization of SOAP messages into native data
types is an expensive process. This process is especially important to the broker,
which shares a many-to-one relationship with producers.
2. Subscription evaluation: Evaluating RDF models against an RDQL query is a time-
consuming operation. This operation grows with the complexity of the models, the
complexity of the query, and the number of queries (number of modeller registra-
tions) that a broker has.
To counteract this, the MUMS architecture can be extended to include the notion of
domain brokers. A domain broker is a broker that is ontology aware, and can provide
enhanced quality of service because of this awareness. This quality of service usually
comes in the form of more efficient model storage, and thus faster query resolution.
Further, brokers are free to provide alternative transport mechanisms which may lead
to faster data transfers (e.g. a binary protocol which compresses RDF messages could
be used for mobile clients with error-prone connections, while a UDP protocol de-
scribing RDF using N-Triples [10] could be used to provide for the more real-time
delivery of events). The use of domain brokers can be combined with reasoners and
blockers to meet the performance, management, and expressiveness requirements of
the system.
Finally, the architectural notion of a broker as a centralized entity is a logical no-
tion only. Physically we distribute the load of the broker amongst a small cluster of
machines connected to a single data store to maintain integrity.
An overview of the prototype, including the technologies in use, is presented in
Fig. 2. Evidence producers are written in a variety of languages, including a Java
producer for the course delivery system, a C# producer for the public discussion sys-
tem and a C++ producer (in the works) for the Mozilla web browser. The broker is
realized through a cluster of Tomcat web application servers running an Apache Axis application that manages subscriptions and semantic routing. This application uses a PostgreSQL database to store both subscription and archival information. Subscriptions
are stored as a tuple indicating the RDQL pattern that should be matched, and the
URL at which the modeller can be contacted. At this moment there is one Java-based
modeller which graphically displays aggregate student information for instructors
from the I-Help public forums. Besides a description of student posting frequency,
this modeller displays statistics for a whole forum, as well as a graphical picture of
student interaction. In addition there are two other applications under development
including a pedagogical content planner and a peer help matchmaker.
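The subscription tuples mentioned above might be represented along the following lines; the class and field names are invented rather than taken from the prototype's database schema:

// Illustrative sketch of a stored subscription; on each published opinion the
// broker evaluates the RDQL pattern against the opinion's RDF payload and, on a
// match, notifies the modeller's web service endpoint.
class SubscriptionRecord {
    final String rdqlPattern;  // pattern the opinion must satisfy
    final String modellerUrl;  // endpoint to notify when a match occurs

    SubscriptionRecord(String rdqlPattern, String modellerUrl) {
        this.rdqlPattern = rdqlPattern;
        this.modellerUrl = modellerUrl;
    }
}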
4 Related Work
While inspired by the needs of distributed intelligent tutoring systems, we see this work overlapping three distinct fields of computer science: distributed computing, the
semantic web, and learner modelling. Related research in each of these fields will be
addressed in turn.
The distributed systems field is a mature field that has provided a catalyst for much
of our work. Both general and specific kinds of event systems are described through-
out the literature, and a number of mature specifications, such as Java RMI and CORBA, exist. Unlike MUMS, these event systems require the consumers of
events (modellers) to subscribe to events (opinions) based on the expected event pro-
ducer or the channel (subject) the events will arrive on. This increases the coupling
between entities in the system, requiring either that the consumer be aware of a given producer or that they share a strict messaging ontology. In [4], Carzaniga et al. de-
scribe a model for content-based addressing and routing at the network level. We
build upon this model by applying similar principles in the application layer, allowing
the modellers of opinions to register for those opinions which match some semantic
pattern. This allows for the ad hoc creation and removal of both evidence producers
and modellers within the system.
While the semantic web as a research area has been growing quickly for a number
of years, the focus of this area has been on creating formalisms for knowledge man-
agement representation. The general approach to sharing data over the semantic web is to consider it just “an extension of the current web” [2], and to follow a
query/response communication model. Thus, a fair amount of work has been done in
conjunction with database research to produce efficient mechanisms for storing (e.g.
[22], [12]) and querying data (e.g. [23]), but new methods for transmitting this data remain largely unexplored. For instance, the HP Joseki project [1] and the Nokia URI Query Agent Model [19] provide methods for publishing, updating, and retriev-
ing RDF data models using HTTP. This approach is useful for large centralized mod-
els where data transfer uses more resources than data querying; however, it provides
poor support for the real-time delivery of modeling information. Further, it supports
the notion of a single model per user which is formed through consensus between
producers, as opposed to the more lightweight situated user modeling suggested by
active modelling researchers. We instead provide a method which completely decouples producers from one another, and offloads the work of forming user models to the consumers of opinions.
The work done by Nejdl et al. in [18] and Dolog and Nejdl in [7] and [6] marries
the idea of the semantic web with learner modelling. In these works the authors de-
scribe a network of learning materials set up in a peer-to-peer fashion. Resources are
described in RDF using both general pedagogical metadata (in particular the IEEE
Learning Object Metadata specification) and learner-specific metadata (such as IMS LIP or PAPI). The network is searchable by end-users through the use of per-
sonal learning assistants who can query peers in the network for learning resource
metadata, then filter the results based on a user model. While this architecture distrib-
utes the responsibility for user modeling, it also limits communication to the
query/response model. Thus, personal learning agents must continually query data
sources to discover new information about the student they are modelling. In addition,
by arranging data sources in a peer network, the system loses the ability to centrally control these sources effectively. For instance, an institution would need to control all
of the peers in the network to provide for data integrity or privacy over the data being
shared.
As cited by Picard et al., the ITS working group of 1995 described tutoring systems
as:
“...hand-crafted, monolithic, standalone applications. They are time-consuming
and costly to design, implement, and deploy. Each development team must rede-
velop all of the component functionalities needed. Because these components are
so hard and costly to build, few tutors of realistic depth and breadth ever get built,
and even fewer ever get tested on real students.” [21]
Despite research invested in providing agent-based architectures for tutoring systems, tutors remain largely centralized in deployment. These tutors are generally domain specific, and are unable to easily interface with the various legacy applications that students may be using to augment their learning. When such interfacing is available, it comes with a high cost to designers, as integration requires both a shared ontology to describe what the student has done and considerable low-level soft-
ware integration work. MUMS provides an alternative architecture where producers
can be readily associated with legacy applications and where modellers and reasoners
can readily produce useful learner modelling information.
5 Conclusions
This paper has presented both a framework and a prototype to support the just-in-
time production and delivery of user modelling information. It provides a general
architecture for e-learning applications to share user data, as well as details on a spe-
cific implementation for this architecture, which builds on technologies being used
within the web services and semantic web communities. It provides an approach to
student modelling that is platform, language, and ontology independent. Further, this
approach allows for both the just-in-time delivery of modelling information, as well
as the archival and retrieval of past modelling opinions.
Our immediate future work involves further integration of domain-specific applications within this framework. We will use this new domain-specific information to provide more accurate resource suggestions to the learner, including both the acquisition of learning objects from learning object repositories and expertise location through peer matchmaking. Tangential to this, we are interested in pursuing the use of user-defined filters through personal learning agents. These agents can act as a “front-end” through which the learner has input over the control and dissemination rights of their learner information. Finally, we are examining the issue of design-time interoperability through ontology sharing using the Web Ontology Language (OWL).
Acknowledgements. We would like to thank the reviewers for their valuable recom-
mendations. This work has been conducted with support from a grant funded by the
Natural Sciences and Engineering Research Council of Canada (NSERC) for the
Learning Object Repositories Network (LORNET).
References
1. Joseki: The Jena RDF Server. Available online at http://www.joseki.org/. Last accessed
March 22, 2004.
2. Berners-Lee, T., Hendler, J., and Lassila, O. The Semantic Web. Scientific American, May 2001.
3. Blackboard Inc. blackboard. Available online at http://www.blackboard.com/. Last ac-
cessed March 22, 2004.
4. Carzaniga, A., Rosenblum, D. S., and Wolf, A. L. Content-Based Addressing and Routing: A General Model and its Application. Technical Report CU-CS-902-00.
5. Catania, N., et al. Web Services Events (WS-Events) Version 2.0. Available online at
http://devresource.hp.com/drc/specifications/wsmf/WS-Events.pdf. Last accessed March
22, 2004.
6. Dolog, P. and Nejdl, W. Challenges and Benefits of the Semantic Web for User Model-
ling. In Workshop on Adaptive Hypermedia and Adaptive Web-Based Systems 2003, Held
at WWW 2003.
7. Dolog, P. and Nejdl, W. Personalisation in Elena: How to cope with personalisation in
distributed eLearning Networks. In International Conference on Worldwide Coherent
Workforce, Satisfied Users - New Services For Scientific Information. Oldenburg, Ger-
many.
8. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (eds) Design Patterns, 1st edition.
Addison-Wesley, 1995.
An Open Learner Model for Children and Teachers: Inspecting Knowledge Level
Abstract. This paper considers research on open learner models, which are
usually aimed at adult learners, and describes how this has been applied to an
intelligent tutoring system for 8-9 year-old children and their teachers. We in-
troduce Subtraction Master, a learning environment with an open learner model
for two and three digit subtraction, with and without adjustment (borrowing). It
was found that some children were quite interested in their learner model and in
a comparison of their own progress to that of their peers, whereas others did not
demonstrate such interest. The level of interest and engagement with the learner
model did not clearly relate to ability.
1 Introduction
There have been several investigations into open learner models (OLM). One of the
aims of opening the learner model to the individual modelled is to encourage students
to reflect on their learning. For example, Mr Collins [1] and STyLE-OLM [2] employ
a negotiation mechanism whereby the student can debate the contents of their model
with the learning environment, if they disagree with the representations of their be-
liefs. This process is intended to help improve the accuracy of the learner model while
also promoting learner reflection on their understanding, as users are required to jus-
tify any changes they wish to make to their model, before these are incorporated.
Mitrovic and Martin argue that self-assessment is important in learning, and this
might be facilitated by providing students access to their learner model [3]. Their
system employs a simpler skill meter to open the model, to consider whether self-assessment can be enhanced even with a simple learner model representation. They
suggest their open learner model may be especially helpful for less able students.
The above examples are for use by university students, who can be expected to un-
derstand the role of reflection in learning. Less research has been directed at children’s
use of OLMs, and whether children might benefit from their availability. One exam-
ple is Zapata-Rivera and Greer, who allowed 10-13 year-old children in different
experimental conditions to browse their learner model, changing it if they felt this to
be appropriate [4]. They argue that children of this age can perform self-assessment
and undertake reflection on their knowledge in association with an OLM. In contrast,
Barnard and Sandberg found that secondary school children did not look at their
learner model when this was available optionally [5].
Another set of users who have received some attention are instructors, i.e. tutors who can access the representations of the knowledge of those they teach. For example, in
some systems the instructor can use their students’ learner models as a source of in-
formation to help them adapt their teaching to the individual or group [6], [7].
Kay suggests users might want to see how they are doing compared to others in
their cohort [8]. Linton and Schaefer display a learner’s knowledge in skill meter form
against the combined knowledge of other user groups [9]. Bull and Broady show co-
present pairs their respective learner models, to prompt peer tutoring [10].
Given the interest in using various forms of OLM to promote reflection by university students, both by showing them their own models and, in some cases, the models of peers, and given the work on showing learner models to instructors, it would be interesting to extend this approach to children and teachers. Some work has been un-
dertaken with children [4], [5], but we wish to consider the possibilities for younger
students. We therefore use a simple learner model representation.
We introduce Subtraction Master, an intelligent tutoring system (ITS) for mathe-
matics for use by 8-9 year olds. Subtraction Master opens its learner model to the
child, including a comparison of their progress against the general progress of their
peers; and opens individual and average models to the teacher. The aim is to investi-
gate whether children of this age will sufficiently understand a simple OLM and,
moreover, whether they will want to use it. If so, do they wish to view information
about their own understanding, and/or about how they relate to others in their class?
Will they want to try to improve if their knowledge is shown to be weak?
2 Subtraction Master
Subtraction Master is an ITS with an OLM, for 8-9 year-olds. The aim of developing
the system was to investigate the potential of OLMs for teachers and children at a
younger age than previously investigated. The domain of subtraction was chosen as
there is comprehensive research on children’s problems in this area [11], [12].
Subtraction Master is a standard ITS, comprising a domain model, learner model
and teaching strategies. The teaching strategies are straightforward, selected based on
a child’s progress, with random questions of appropriate difficulty according to their
knowledge. Questions also elicit further information about misconceptions if it is
inferred that these may exist. Additional help can be consulted at any time, and can
also be recommended by the system. Help is adaptive, related to the question and
question type the child is currently attempting, and is presented in the format most
suitable for the individual. This section provides an overview of the system.
The domain is based on the U.K. National Numeracy Strategy [13], incorporating
common calculation errors and misconceptions. The domain covers 2- and 3-digit subtraction, ranging from 2-digit subtraction with no adjustment (borrowing) to 3-digit subtraction with hundreds-to-tens and tens-to-units adjustment. Specifically, the following are considered:
two-digit subtraction (no adjustment), e.g. 23-12
two-digit subtraction (adjustment from tens to units), e.g. 76-28
will outweigh any misconceptions. The data in the learner model is associated with a
degree of certainty, depending on the extent of evidence available to support it.
This section presents the open learner model as seen by children and teachers.
The OLM can be accessed as a means to raise children’s awareness of their progress,
using a menu or buttons. These are labelled: ‘see how you are doing’ and ‘compare
yourself to children of your age’. The individual learner model is displayed in Fig. 1.
The children have not yet learnt about graphs, so the learner model data cannot be presented in that form. Instead, images are used that correspond to the skill levels for each question type:
(level 1: no image, not attempted / none correct / weak performance)
level 2: tick, satisfactory
level 3: smiling face, good
level 4: grinning face, very good
level 5: ‘cool’ grinning face with sunglasses, fantastic
Weak performances are not shown. A blank square might mean the child has not
attempted questions of that type, or they have performed badly. This avoids demoti-
vating children by showing negative information. Their aim is to ‘achieve faces’.
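The mapping from assessed skill level to the displayed image can be pictured roughly as follows; the enum and method names are ours, not Subtraction Master's:

// Illustrative mapping of the five skill levels to the display described above.
enum SkillDisplay {
    NOT_SHOWN,     // level 1: not attempted / none correct / weak performance
    TICK,          // level 2: satisfactory
    SMILING_FACE,  // level 3: good
    GRINNING_FACE, // level 4: very good
    COOL_FACE;     // level 5: grinning face with sunglasses, fantastic

    static SkillDisplay forLevel(int level) {
        // Weak or unattempted question types deliberately show nothing, so that
        // children are not demotivated by explicit negative information.
        int clamped = Math.max(1, Math.min(level, 5));
        return values()[clamped - 1];
    }
}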
If the child chooses ‘compare yourself to children of your age’, they can view the
model of themselves compared to the ‘average peer’, as in Fig. 2. In this example, the
child is doing extremely well compared to their peers in the first question type, indi-
cated by the grinning face with sunglasses; and very well with the second and third
Fig. 1. The individual learner model
Fig. 2. Comparison to the average peer
types. They are performing in line with others in the fourth. However, in the final
type, there is no representation. In this case, it is because the child, and the class as a
whole, have not yet attempted many questions of this kind. Where a child was not
doing well compared to others, there would also be no representation. The aim is that
the child will want to improve after making this comparison to their peers.
After 20 questions, the child is presented with their individual learner model and
offered the chance to improve specific areas if these have been assessed as weak
(bottom left of Fig. 1). This may be simply where they are having most difficulty, or
where misconceptions are inferred, or it might be where the system is less certain of
its representations. This is in part to encourage those who have not explored their learner model to do so, and in part to guide learners towards areas that they could improve. While guidance occurs during individualised tutoring, this prompting within the OLM explicitly alerts learners to where they might best invest their effort.
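One possible way of choosing which area to suggest after the 20 questions is sketched below; the prioritization (inferred misconceptions first, then low proficiency, then low certainty) and all names are our own illustration, since the paper lists these criteria without specifying an ordering:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch; the class, fields, and weighting are not taken from Subtraction Master.
class ImprovementPrompt {
    static class SkillEntry {
        final String questionType;
        final double proficiency;            // estimated skill for this question type (0..1)
        final double certainty;              // system's confidence in that estimate (0..1)
        final boolean misconceptionInferred;

        SkillEntry(String questionType, double proficiency, double certainty,
                   boolean misconceptionInferred) {
            this.questionType = questionType;
            this.proficiency = proficiency;
            this.certainty = certainty;
            this.misconceptionInferred = misconceptionInferred;
        }
    }

    // Prefer areas with inferred misconceptions, then low proficiency, then low certainty.
    static Optional<SkillEntry> areaToSuggest(List<SkillEntry> model) {
        return model.stream()
                .min(Comparator.comparing((SkillEntry s) -> s.misconceptionInferred ? 0 : 1)
                        .thenComparingDouble(s -> s.proficiency)
                        .thenComparingDouble(s -> s.certainty));
    }
}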
In systems for use by adults, an approach of negotiating the learner model has been
used [1], [2], to allow learners to try to persuade the system that their perception of
their understanding is correct, if they disagree with the system’s representations in the
model. One way in which they can do this is to request a short test to prove their
point. Since negotiation with a system over one’s knowledge state is quite a complex
procedure, this may not be appropriate for younger children. Thus the idea of a brief test to provoke change in the model if a child disagrees with it is maintained in Sub-
traction Master, but the possibilities for adjusting the model are suggested by the
system. The child can take up the challenge of a test if they believe they can improve
the representation in their learner model; or they can accept the test while at the same
time working through further examples to improve their skills in their weaker areas.
The former quick method of correcting the model is useful, for example, if a child
suddenly understands their problem. This can be illustrated with an example from one
of the children in the study (see section 4), who showed misconceptions about com-
mutative laws. On viewing help, she exclaimed ‘I got it, I keep changing the numbers
around instead of borrowing’. The student’s learner model contained a history of the
problem. When offered a test to change the model contents, she accepted and man-
aged to remove the problem area from her model. She therefore did not have to com-
plete a longer series of questions in order for the model to reflect this progress.
Teachers can access the models of individuals, or they can view the average model of
the group. Figs. 3 and 4 show the teacher’s view, which can be accessed while they are
with the child during the child’s interaction, or later at their own PC. Teachers can edit
the model of any individual if they believe it to have become incorrect (such as when
a child has suddenly grasped a concept during coaching by the teacher, or if new
results from paper-based testing are available, etc.). That is, teachers can update the model
to improve its accuracy, in order that the system continues to adapt appropriately to
meet children’s needs if they have been learning away from Subtraction Master.
Fig. 3. The teacher’s view of the individual
Fig. 4. The teacher’s view of the individual compared to the group
Children are not shown misconceptions. However, this may be useful data for teach-
ers. Figs. 3 and 4 show the learner model of Tom. Fig. 3 illustrates areas in which he
could have shown misconceptions given the questions attempted (shaded light), and
the misconceptions that were observed (shaded dark). The last column shows ‘unde-
fined’ errors. In the above example, from a possible 15 undefined errors (15 questions
were attempted), 2 undefined errors were exhibited. Three incorrect responses, out of 3 questions attempted where this problem could be manifested, suggest a likely place value misconception (column 3). The first column shows Tom answered 6 questions
where he could have shown a commutative error, but did not.
The upper portion of the right side of the screen shows Tom’s performance across
question types (number attempted, number correct). Below this is the strength of
evidence for the five types of misconception or bug. As the 0 for place value shows, the teacher has edited the model to reflect the fact that Tom no
longer holds this misconception after help from the teacher.
Fig. 4 shows Tom’s performance against the average achievement of the group.
The group data can also be presented without comparison to a specific individual.
Thus teachers can also investigate where the group is having most difficulty.
4.1 Subjects
Subjects were eleven 8-9 year-olds at a Birmingham school, selected by the Head Teacher
to represent high achievers (4), average (2), low achievers (5); with 6 boys and 5 girls
spread quite evenly in the high and low groups. Both average pupils were boys.
Audio recordings were made while children used Subtraction Master. They were
sometimes prompted for further information. Written notes were made to provide
contextual information. Additional information was obtained by structured interview
after the interaction. Sessions lasted around half an hour.
4.3 Results
Table 1 shows use of the open learner model by children. Students are listed in se-
quence as ranked for ability by the Head Teacher, from lowest to highest.
Four children made little or no use of their OLM after the first inspection, while 7 returned to it spontaneously, 2 using it extensively (S6 and S10). There is no clear
relationship between ability and preference for viewing the learner model, though in
general it appears that the higher ranked students tend to be more interested. How-
ever, the lowest ranked child did use their model, and the third highest did not.
Transcripts of children’s comments while using the OLM suggest many understand
and benefit from it, illustrated by the following. (E=experimenter, S=subject.)
E: [Asks about the open learner model]
S1: Well at first that little face and then afterwards the big face was there.
E: And what did that mean to you?
4.4 Discussion
Our aim was not to develop a full ITS, but rather to investigate the potential for using
OLMs with 8-9 year olds. Hence the system is relatively simple. We recognise that, with only 11 subjects, our results are merely suggestive. It does seem, however, that the issue of using OLMs with children of this age is worth investigating further.
We return to the questions raised earlier: Would children want to view their learner
model? Would children want to view the peer comparison model? Would children be
able to interpret their learner model? Would children be able to interpret the peer
comparison model? Would children find the open learner model useful?
There appears to be a difference between children in their interest in the OLM.
Seven children chose to use their model on more than one occasion, with 2 using it
extensively. These two kept referring back to their model to monitor their progress.
For them, the OLM was clearly motivating. One was high ability, and the other, me-
dium ability. Thus the OLM can be a motivating factor for at least these two levels of
student. The remaining 5 children who used their learner model spontaneously were
two low-achievers, the other medium-level student, and two further high-achievers.
The model therefore appears of interest to children of all abilities, though in general
the higher level children had a greater tendency to refer to their learner models.
Of the 4 children who did not use their learner model, 2 were also uninterested in learning with computers generally (S2 and S5). Thus it may be this factor, rather than the learner model itself, that explains their lack of use of the learner model.
In addition to observations of students returning to their models, the transcript ex-
cerpts from low and high ability children demonstrate that 8-9 year olds can under-
stand simple learner model representations. S1, the child with most difficulties, ar-
ticulated his views only after prompting, but nevertheless showed an understanding of
the learner model, albeit at a simple level. S10 and S11, high ability students, gave
spontaneous explanations. The excerpts given show their views of the comparative
peer model. Both wanted to check their progress relative to others. S11 spontaneously
asked how other children were doing before viewing the peer model. When the peer
model was then shown to him, he became particularly interested in it. S6, an average
student, referred to his learner model in order to report his progress to his mother.
The above questions can be answered positively for over half the children, as noted
in the structured interview and student comments while using Subtraction Master.
There was a tendency for higher- and medium-ability children to show greater inter-
est, but two of the five lower-ability children also used their learner model.
We do not know to what extent the results are influenced by the novelty of the ap-
proach to the children, and the fact that they were selected for ‘special treatment’. This
needs to be followed up with a longer study with a greater number of subjects, which
also considers learning gains. (A short pre- and post-test were administered, showing
an average 16% improvement across subjects, but due to the limited time with the
children, extended use of the system and delayed post-test were not possible.)
5 Summary
This paper introduced Subtraction Master, an ITS for subtraction for 8-9 year-olds. It
was designed as a vehicle for investigating the potential of simple individual OLMs
and comparison of the individual to peers, to enhance children’s awareness of their
progress. The children demonstrated an understanding of their learner model, and 7 of
the 11 showed an interest in using it. These had a range of abilities. The next step is to
allow children to use the system longer-term, to discover whether this level of interest
is maintained over time, and if so, to develop a more complex ITS and investigate
further open learner modelling issues with a larger number of children.
References
1. Bull, S. & Pain, H. (1995). ‘Did I say what I think I said, and do you agree with me?’:
Inspecting and Questioning the Student Model, Proceedings of World Conference on Arti-
ficial Intelligence in Education, Association for the Advancement of Computing in Education (AACE), Charlottesville, VA, 501-508.
2. Dimitrova, V., Self, J. & Brna, P. (2001). Applying Interactive Open Learner Models to
Learning Technical Terminology, User Modeling 2001: 8th International Conference,
Springer-Verlag, Berlin Heidelberg, 148-157.
3. Mitrovic, A. & Martin, B. (2002). Evaluating the Effects of Open Student Models on
Learning, Adaptive Hypermedia and Adaptive Web-Based Systems, Proceedings of Sec-
ond International Conference, Springer-Verlag, Berlin Heidelberg, 296-305.
4. Zapata-Rivera, J.D. & Greer, J.E. (2002). Exploring Various Guidance Mechanisms to
Support Interaction with Inspectable Learner Models, Intelligent Tutoring Systems: In-
ternational Conference, Springer-Verlag, Berlin, Heidelberg, 442-452.
5. Barnard, Y.F. & Sandberg, J.A.C. (1996). Self-Explanations, do we get them from our
students?, Proceedings of European Conference on AI in Education, Lisbon, 115-121.
6. Bull, S. & Nghiem, T. (2002). Helping Learners to Understand Themselves with a Learner
Model Open to Students, Peers and Instructors, Proceedings of Workshop on Individual
and Group Modelling Methods that Help Learners Understand Themselves, International
Conference on Intelligent Tutoring Systems 2002, 5-13.
7. Zapata-Rivera, J-D. & Greer, J.E. (2001). Externalising Learner Modelling Representa-
tions, Proceedings of Workshop on External Representations of AIED: Multiple Forms
and Multiple Roles, International Conference on Artificial Intelligence in Education 2001,
71-76.
8. Kay, J. (1997). Learner Know Thyself: Student Models to Give Learner Control and Re-
sponsibility, Proceedings of ICCE, AACE, 17-24.
9. Linton, F. & Schaefer, H-P. (2000). Recommender Systems for Learning: Building User
and Expert Models through Long-Term Observation of Application Use, User Modeling
and User-Adapted Interaction 10, 181-207.
10. Bull, S. & Broady, E. (1997). Spontaneous Peer Tutoring from Sharing Student Models,
Artificial Intelligence in Education, IOS Press, Amsterdam, 143-150.
11. Brown, J.S. & Burton, R.R. (1978). Diagnostic Models for Procedural Bugs in Basic
Mathematical Skills, Cognitive Science 2, 155-192.
12. Burton, R.R. (1982). Diagnosing Bugs in a Simple Procedural Skill, Intelligent Tutoring
Systems, Academic Press, 157-183.
13. Department for Education and Skills (2004). The Standards Site: The National Numeracy
Strategy, http://www.standards.dfes.gov.uk/numeracy.
Scaffolding Self-Explanation to Improve Learning in
Exploratory Learning Environments
1 Introduction
Several studies in Cognitive Science and ITS have shown the effectiveness of the
learning skill known as self-explanation, i.e., spontaneously explaining to oneself
available instructional material in terms of the underlying domain knowledge [6].
Because there is evidence that this learning skill can be taught (e.g., [2]), several
computer-based tutors have been devised to provide explicit support for self-
explanation. However, all these tutors focus on coaching self-explanation during
fairly structured interactions targeting problem-solving skills (e.g., [1], [7, 8] and
[10]). For instance, the SE-Coach [7], [8] is designed to model and trigger students’ self-explanations as they study examples of worked-out solutions for physics problems. The Geometry Explanation Tutor [1] and Normit-SE [10] support self-explanations of problem-solving steps in geometry theorem proving and data normalization, respectively. In this paper, we describe how we are extending support for
self-explanation to the type of less structured pedagogical interactions supported by
open learning environments.
Open learning environments place less emphasis on supporting learning through
structured, explicit instruction and more on allowing the learner to freely explore the
available instructional material [11]. In theory, this type of active learning should
enable students to acquire a deeper, more structured understanding of concepts in the
domain [11]. In practice, empirical evaluations have shown that open learning envi-
ronments are not always effective for all students. The degree of learning from such
environments depends on a number of student-specific features, including activity
level, whether or not the student already possesses the meta-cognitive skills necessary to learn from exploration, and general academic achievement (e.g., [11] and [12]).
To improve the effectiveness of open learning environments for different types of
learners, we have been working on devising adaptive support for effective explora-
tion. The basis of this support is a student model that monitors the learners’ explora-
tory behaviour and detects when they need guidance in the exploration process. The
model is implemented in the Adaptive Coach for Exploration (ACE), an open learn-
ing environment for mathematical functions [3, 4]. An initial version of this model
integrated information on both student domain knowledge and the number of exploratory actions performed during the interaction to dynamically assess the effec-
tiveness of student exploration. Empirical studies showed that hints based on this
model helped students learn from ACE. However, these studies also showed that the
model sometimes overestimated the learners’ exploratory behaviour, because it al-
ways interpreted a large number of exploratory actions as evidence of good explora-
tion. In other words, the model was not able to distinguish between learners who merely
performed actions in ACE’s interface and learners who self-explained those actions.
In this paper, we describe 1) how we modified ACE’s student model to assess a
student’s self-explanation behaviour, and 2) how ACE uses this assessment to im-
prove the effectiveness of a student’s exploration through tailored scaffolding for
self-explanation. ACE differs from the Geometry Explanation Tutor and Normit-SE not only because it supports self-explanation in a different kind of educational activity, but also because these systems do not model a student’s need or tendency to self-explain. The Geometry Explanation Tutor prompts students to self-explain every problem-solving step, while Normit-SE prompts students to self-explain every new or incorrect problem-solving step. Neither of these systems considers whether it is dealing
with a self-explainer who would have initiated the self-explanation regardless of the
coach’s hints, even though previous studies on self-explanations have shown that
some students do self-explain spontaneously [6]. Thus, these approaches are too re-
strictive for an open learning environment, because they may force spontaneous self-
explainers to perform unnecessary interface actions, contradicting the idea of inter-
fering as little as possible with students’ free exploration. Our approach is closer to
the SE-Coach’s, which prompts for self-explanation only when its student model
assesses that the student actually needs the scaffolding [9]. However, the SE-Coach
mainly relies on the time spent on interface actions to assess whether or not a student
is spontaneously self-explaining. In contrast, ACE also relies on the assessment of a
student’s self-explanation tendency, including how this tendency evolves as a conse-
quence of the interaction with ACE. Using this richer set of information, ACE can
provide support for self-explanation in a more timely and tailored manner.
In the rest of the paper, we first describe ACE’s interface, and the general structure
of its student model. Next, we illustrate the changes made to the interface and the
model to provide explicit guidance for self-explanation. Finally, we illustrate the
model’s behaviour based on sample simulated scenarios.
ACE is an adaptive open learning environment for the domain of mathematical func-
tions. ACE’s activities are divided into units, which are collections of exercises. Fig-
ure 1 shows the main interaction window for two of ACE’s units: the Machine Unit
and the Plot Unit. ACE’s third unit, the Arrow Unit, is not displayed for brevity.
The Machine and the Arrow Units allow a learner to explore the relation between
input and output of a function. In the Machine Unit, the learner can drag the inputs
displayed at the top of the screen to the tail of the “machine” (the large arrow shown
in Fig. 1, left), which then computes the corresponding output. The Arrow Unit al-
lows the learner to match a function’s inputs and outputs, and is the only unit within
ACE that has a clear definition of correct behaviour. The Plot Unit (Fig. 1, right),
allows the learner to explore the relationship between a function’s graph and its equa-
tion by manipulating one entity, and then observing the corresponding changes in the
other.
To support the exploration process, ACE includes a coaching component that pro-
vides tailored hints when ACE’s student model assesses that students have difficulties
exploring effectively. For more detail on ACE’s interface and coaching component
see [4]. In the next section, we describe the general structure of ACE’s student model.
ACE’s student model uses Bayesian Networks to manage the uncertainty inherent in
assessing students’ exploratory behaviour. The main cause of this uncertainty is that
both exploratory behaviour and the related meta-cognitive skills are not easily ob-
servable unless students are required to make them explicit. However, forcing stu-
dents to articulate their exploration steps would clash with the unrestricted nature of
open learning environments.
The structure of ACE’s student model derives from an iterative design process [3]
that gave us a better understanding of what defines effective exploration. Figure 2
shows a high-level description of this structure, which comprises several types of
nodes used to assess exploratory behaviour at different levels of granularity:
Relevant Exploration Cases: the exploration of individual exploration cases in an
exercise (e.g., dragging the number 3, a small positive input, to the back of the
function machine in the Machine Unit).
Exploration of Exercises: the exploration of individual exercises.
Exploration of Units: the exploration of individual units.
Exploration of Categories: the exploration of groups of relevant exploration cases
that appear across multiple exercises (e.g., all the cases involving a positive slope
in the Plot unit).
Exploration of General Concepts: the exploration of general domain concepts
(e.g., the input/output relation for different types of functions).
The links among the different types of exploration nodes represent how they inter-
act to define effective exploration. Exploration nodes have binary values representing
the probability that the learner has sufficiently explored the associated item.
ACE’s student model also includes binary nodes representing the probability that
the learner understands the relevant domain knowledge (summarized by the node
Knowledge in Fig. 2). The links between knowledge and exploration nodes represent
the fact that the degree of exploration needed to understand a concept depends on
how much knowledge of that concept a learner already has. Knowledge nodes are
updated only through actions for which there is a clear definition of correctness (e.g.,
linking inputs and outputs in the Arrow Unit).
However, these studies also showed that sometimes ACE overestimated students’
exploratory behaviour (as indicated by post-test scores).
We believe that a likely cause of this problem was that ACE considered a student’s
interface actions to be sufficient evidence of good exploratory behaviour, without
taking into account whether s/he was self-explaining the outcome of these actions. To
understand how self-explanation can play a key role in effective exploration, consider
a student who quickly moves a function graph around the screen in the Plot unit,
without reflecting on how these movements change the function equation. Although
this learner is performing many exploratory actions, s/he can hardly learn from them
because s/he is not self-explaining their outcomes. We observed this exact behaviour
in a number of our subjects who tended to spend little time on exploratory actions,
and who did not learn the associated concepts (as demonstrated by pre-test / post-test
differences).
To address this limitation, we decided to extend ACE’s interface and student
model to provide tailored support for self-explanation. We first describe modifica-
tions made to ACE’s interface to provide this support.
The original version of ACE only generated hints indicating that the student should
further explore some elements of a given exercise. The new version of ACE can also
generate tailored hints to support a student’s self-explanation, if this is detected to be
the cause of poor exploration. Deciding when to hint for self-explanation is a chal-
lenging issue in an open learning environment. The hints should interfere as little as
possible with the exploratory nature of the interaction, but should also be timely so
that even the more reluctant self-explainers can appreciate their relevance. Thus, ACE's hints to self-explain individual actions are provided as soon as the model predicts that
the student is not self-explaining their outcomes when s/he should be.
The first of these hints is a generic suggestion to slow down and think a little more
about the outcome of the performed actions. Following the approach of the SE-Coach
[7, 8], ACE provides further support for those students who cannot spontaneously self-explain by suggesting the usage of interface tools designed to help students generate explanations.
In the absence of explicit SE actions, the model tries to assess whether the student is
implicitly self-explaining each exploratory action. Figure 4 shows two time slices
created after two exploratory actions. Since the remainder of the exploration hierar-
chy (see Fig. 2) has not undergone significant change, we omit it for clarity. In this
figure, the learner is currently exploring exercise 0, which has three relevant exploration cases (all shown as nodes in Fig. 4). At time T, the learner performed an action corresponding to the exploration of one of these cases; at time T+1, the action corresponded to another case. Nodes representing the assessment of self-
explanation are shaded grey. All nodes in the model are binary, except for time,
which has values Low/Med/High. We now describe each type of self-explanation
node:
Implicit SE: represents the probability that the learner has self-explained a case
implicitly, without using the interface tools. The factors influencing this assess-
ment include the time spent exploring the case and the stimuli that the learner has
to self-explain. Low time on action is always taken as negative evidence for im-
plicit explanation. The probability that self-explanation happened with longer time
depends on whether there is a stimulus to self-explain.
Stimuli to SE: represents the probability that the learner has stimuli to self-explain,
either from the learner’s general SE tendency or from a coach’s explicit hint.
SE Tendency: represents the model’s assessment of a student’s SE tendency. The
prior probability for this node will be set using either default population data or,
when possible, data for a specific student. In either case, the student model’s be-
lief in that student’s tendency will subsequently be refined by observing her be-
haviour with ACE. More detail on this assessment is presented in section 5.3.
Time: represents the probability that the learner has spent a sufficient time cover-
ing the case. We use time spent as an indication of effort (i.e., the more time spent
the greater the potential for self-explanation). Time nodes are observed to low,
medium and high according to the intervals between learner actions.
Coach Hint to SE: indicates whether or not the learner’s self-explanation action
was preceded by a prompt from the coach.
We now discuss the impact of the above nodes on the model’s assessment of the
learner’s exploratory behaviour. Whether or not a learner’s action implies effective
exploration of a given case depends on the probability that: 1) the stu-
dent self-explained the action and 2) s/he knows the corresponding concept, as as-
sessed by the set of knowledge nodes in the model (summarized in Fig. 4 by the node
Knowledge). In particular, the CPT for a case exploration node is defined so that low
self-explanation with high knowledge generates an assessment of adequate explora-
tion and, thus, does not trigger a Coach hint. This accounts for the fact that a student
with high knowledge of a concept does not need to dwell on the related case to im-
prove her understanding [3]. Note that the assessment of implicit SE is independent
from the student’s knowledge. We consider implicit self-explanation to have occurred
regardless of correctness, consistent with the original definition of self-explanation
[6].
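As a rough illustration of how a conditional probability table could encode this behaviour, the sketch below (our own, with invented probabilities; it is not ACE's actual CPT) marginalises over the two binary parents of a case-exploration node and shows that high knowledge keeps the exploration estimate high even when assessed self-explanation is low.

# Hypothetical CPT: P(case adequately explored | implicit_SE, knowledge).
# The numbers are illustrative only; ACE's real parameters are not reported here.
P_EXPLORED = {
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.85,   # high knowledge compensates for low self-explanation
    (False, False): 0.10,  # neither: exploration of the case is likely inadequate
}

def p_case_explored(p_se, p_knowledge):
    """Marginalise the CPT over the current beliefs about the two binary parents."""
    total = 0.0
    for se in (True, False):
        for know in (True, False):
            weight = (p_se if se else 1.0 - p_se) * (p_knowledge if know else 1.0 - p_knowledge)
            total += weight * P_EXPLORED[(se, know)]
    return total

# Low assessed self-explanation but high knowledge still yields a high estimate,
# so no coach hint to explore the case further would be triggered.
print(round(p_case_explored(p_se=0.2, p_knowledge=0.9), 2))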
Usage of ACE’s SE tools provides the model with additional information on the stu-
dent’s self-explanation behaviour. Self-explanation actions using these tools generate
explicit self-explanation slices; two such slices are displayed in Figure 5. Compared
to implicit SE slices, explicit SE slices include additional evidence nodes represent-
ing: 1) the usage of the SE tool (SE Action node in Fig. 5), and 2) the correctness of
this action (Correctness node in Fig. 5). The SE Action node, together with the time
the student spent on this action, influences the assessment of whether an explicit self-
explanation actually occurred (Explicit SE node in Fig. 5). As was the case for the
implicit SE slices, correctness of the SE action does not influence this assessment.
However, correctness does influence the assessment of the student’s corresponding
knowledge, since it is a form of explicit evidence. Consequently, if the explicit SE
action is correct, the belief that the student effectively explored the corresponding
case is further increased through the influence of the corresponding knowledge
node(s).
One novel aspect of ACE’s student model is its ability to assess how a student’s ten-
dency to self-explain evolves during the interaction with the system. In particular, the
model currently represents the finding that explicit coaching can improve SE ten-
dency [2]. Fig. 5 shows how the model assesses this tendency in the explicit SE slices.
6 Sample Assessment
We now illustrate the assessment generated by our model with two sample scenarios.
Scenario 1: Explicit SE Action. Suppose a student, assessed to have a low initial
knowledge and a fairly high SE tendency, is exploring an exercise in the Plot Unit.
She first performs an exploratory action, and then chooses to self-explain explicitly
using the SE tools. Figure 6 (Top) illustrates the model’s assessment of the relevant
knowledge, SE tendency, and case exploration for the first three slices of the interac-
tion. Slice 1 shows the model’s assessment prior to any student activity. Slice 2 shows
the assessment after one exploratory action with medium time has been performed,
but not explicitly self-explained. Since the student’s SE tendency is fairly high and
medium time was spent performing the action, the assessment of case exploration
increases. Slice 3 shows the assessment after the student performed an explicit SE
action (corresponding to the same exploration case). Since the action was performed
without a Coach hint, the appraisal of her SE tendency increases in that time slice.
The self-explanation action was correct, which increases knowledge of the related
concept. Finally, case exploration increases in Slice 3 because 1) the learner spent
enough time self-explaining and 2) has provided direct evidence of her knowledge.
In contrast, the original model took into account only coverage of exploratory actions, without assessing whether the
student had self-explained the implications of those actions. Thus, that model would
have assessed our student to have adequately explored the cases covered by her ac-
tions.
Metacognition in Interactive Learning Environments
Claudia Gama
Federal University of Bahia, Department of Computer Science
Salvador(BA), Brazil
www.dcc.ufba.br
[email protected]
1 Introduction
Metacognition is a form of cognition, a second or higher order thinking process
which involves active control over cognitive processes [1]. Sometimes it is simply
defined as thinking about thinking or as a person’s cognition about cognition.
Extensive research suggests that metacognition has a number of concrete
and important effects on learning, as it produces a distinctive awareness of the
processes, as well as the results of the learning endeavour [1]. Recently, many
studies have examined ways in which theories of metacognition can be applied
to education, focusing on the fundamental question “Can explicit instruction
of metacognitive processes facilitate learning?”. The literature points to several
successful examples (see [2], for instance).
Research also indicates that metacognitively aware learners are more strate-
gic and perform better than unaware learners [3]. One explanation is that me-
tacognitive awareness enables individuals to plan, sequence, and monitor their
¹ This research was supported by grant No. 200275-98.4 from CNPq-Brazil.
learning in a way that directly improves performance [4]. However, not all stu-
dents engage spontaneously in metacognitive thinking unless they are explicitly
encouraged to do so through carefully designed instructional activities [5].
Hence it is important to do research on effective ways to include metacogni-
tive support in the design of natural and computer-based learning environments.
Some attempts have been made to incorporate metacognition training into in-
teractive learning environments (ILEs) and Intelligent Tutoring Systems (ITSs),
mostly in the form of embedded reflection on the learning task or processes.
Researchers in this area have recognized the importance of incorporating
metacognitive models into ILE design [6]. However the lack of an operational
model of metacognition makes this task a difficult one. Thus, the development of
models or frameworks that aim to develop metacognition, cognitive monitoring,
and regulation in ILEs is an interesting and open topic of investigation.
This paper describes a framework called the Reflection Assistant (RA) Model
for metacognition instruction. This model was implemented into an ILE called
MIRA. The model and some illustrations of the MIRA environment are presen-
ted, together with the results of an empirical evaluation performed.
student’s cognitive load; and (ii) to get students to recognize the importance of
the metacognitive activities to the learning process.
and (c) evaluation or judgement stage [12]. The RA is organized around these
stages, matching specific metacognition instruction to the characteristics of each
of these stages as shown in Fig.1.
The Post-task reflection stage creates a space where the student thinks
about her actions during the past problem solving activity, comparing them with
her reflections expressed just before the beginning of that problem.
The RA is divided into two main modules: the pre-task reflection and familiarization assistant, and the post-task reflection assistant.
The pre-task reflection and familiarization assistant aims at preparing the
student for the problem solving activity, promoting reflection on knowledge mo-
nitoring, assessment of the understanding of the problem to be attempted, and
awareness of useful metacognitive strategies.
The post-task reflection assistant presents activities related to the evaluation
of problem solving and takes place just after the student finishes a problem.
Besides the modules, the RA includes an inference engine to assess students’
level of metacognition. Finally, the RA incorporates a series of data reposito-
ries, which contain either general knowledge about metacognition or information
about students’ demonstrated or inferred metacognition.
From the user’s perspective the interaction takes place in the following se-
quence: (1) the student starts by performing the first two activities proposed in
the pre-task and familiarization assistant, then the ILE presents a new problem
and she proceeds to the other two activities of the same assistant; (2) she solves
the problem with the aid of the problem solving tools provided by the ILE;
and (3) after finishing the problem, she performs the activities proposed by the
post-task reflection assistant.
activities; and a control group, which interacted with a version of MIRA from which all reflective activities had been removed.
As both groups had the same amount of time to interact with MIRA, we
predicted that the experimental group would solve fewer problems than the
control group. However, we also predicted that the experimental group would
perform better.
Indeed, the number of problems attempted by the experimental group (N=112) was highly significantly smaller (Mann-Whitney U test, z=2.56) than that of the control group (N=136). At the same time, the experimental group had a significantly better performance in MIRA than the control group: the number of correct answers per total of problems attempted was significantly greater than that of the control group (z=1.66). The same was true for the number of almost-correct answers (those with minor errors) per total of problems attempted (z=1.82).
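For readers who wish to reproduce this style of comparison, the following sketch (with invented per-student counts, since the raw MIRA data are not reported here) shows how a one-sided Mann-Whitney U test can be run with SciPy.

# Hypothetical per-student numbers of problems attempted; not the actual MIRA data.
from scipy.stats import mannwhitneyu

experimental = [5, 6, 4, 7, 5, 6, 4, 5]    # reflective activities included
control      = [8, 9, 7, 10, 8, 9, 7, 8]   # reflective activities removed

# One-sided test: did the experimental group attempt fewer problems than the control group?
u_statistic, p_value = mannwhitneyu(experimental, control, alternative="less")
print(u_statistic, p_value)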
At the metacognitive level, there was a higher increase of KMA in the ex-
perimental group than in the control group. However, this difference was not
statistically significant. So, even though we have some evidence of benefits to students' knowledge monitoring skill, we cannot make strong claims about the validity of the RA model for knowledge monitoring development.
5 Conclusions
The Reflection Assistant model is a generic framework that can be tailored
and used together with different types of problem solving environments. All
the elements can be adjusted and augmented according to the objectives of the
designers and the requirements of the domain. The interface has to be designed
according to the ILE as it was done in the MIRA System presented. The ultimate
goal is to create a comprehensive problem solving environment that provides
activities that serve to anchor new concepts into the learner’s existing cognitive
knowledge to make them retrievable.
One important innovation introduced by the RA Model is the idea that specific moments and activities must be set aside to promote awareness of aspects of the problem solving process. The RA is therefore designed as a separate component from the problem solving environment, with specialized activities. Accordingly, two new stages in the problem solving activity are proposed: the
pre-task reflection stage and the post-task reflection stage. During these stages the student interacts only with the reflective activities proposed in the RA Model, which focus on her metacognitive skills and on reflection about her problem solving experience.
As the evaluation of MIRA demonstrated, a shift from quantity to quality was an interesting consequence of including the RA model in MIRA. As seen in the experiment, attempting a larger quantity of problems did not lead to better performance.
Another experiment with a larger number of subjects is necessary to draw more definite conclusions about the influence and benefits of the Reflection Assistant model proposed here.
References
1. Flavell, J.H.: Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist 34 (1979) 906–911
2. Hacker, D.J., Dunlosky, J., Graesser, A.C., eds.: Metacognition in Educational
Theory and Practice. Hillsdale, NJ: Lawrence Erlbaum Associates (1998)
3. Pressley, M., Ghatala, E.S.: Self-regulated learning: Monitoring learning from text.
Educational Psychologist 25 (1990) 19–33
4. Schraw, G., Dennison, R.S.: Assessing metacognitive awareness. Contemporary
Educational Psychology 19 (1994) 460–475
5. Lin, X.D., Lehman, J.D.: Supporting learning of variable control in a computer-
based biology environment: Effects of prompting college students to reflect on their
own thinking. Journal of Research in Science Teaching 36 (1999) 837–858
6. Aleven, V., Koedinger, K.R.: Limitations of student control: Do students know
when they need help? In Gauthier, G., Frasson, C., VanLehn, K., eds.: 5th International
Conference on Intelligent Tutoring Systems - ITS 2000, Berlin: Springer Verlag
(2000) 292–303
7. Puntambekar, S.: Investigating the effect of a computer tool on students’ meta-
cognitive processes. PhD thesis, University of Sussex (1995)
8. Conati, C., Vanlehn, K.: Toward computer-based support of meta-cognitive skills:
a computational framework to coach self-explanation. International Journal of
Artificial Intelligence in Education 11 (2000) 398–415
9. Gama, C.: Metacognition and reflection in ITS: increasing awareness to improve
learning. In Moore, J.D., ed.: Proceedings of the Artificial Intelligence in Education
Conference, San Antonio, Texas, IOS Press (2001) 492–495
10. Tobias, S., Everson, H.T.: Knowing what you know and what you don’t: further
research on metacognitive knowledge monitoring. College Board Research Report
2002-3, College Entrance Examination Board: New York (2002)
11. Tobias, S., Everson, H.T., Laitusis, V., Fields, M.: Metacognitive Knowledge Mo-
nitoring: Domain Specific or General? Paper presented at the Annual meeting of
the Society for the Scientific Study of Reading, Montreal (1999)
12. Artzt, A.F., Armour-Thomas, E.: Mathematics teaching as problem solving: A
framework for studying teacher metacognition underlying instructional practice in
mathematics. Instructional Science 26 (1998) 5–25
Predicting Learning Characteristics in a Multiple
Intelligence Based Tutoring System
Abstract. Research on learning has shown that students learn differently and
that they process knowledge in various ways. EDUCE is an Intelligent Tutoring
System for which a set of learning resources has been developed using the
principles of Multiple Intelligences. It can dynamically identify user learning
characteristics and adaptively provide customised learning material tailored to
the learner. This paper introduces the predictive engine used within EDUCE. It
describes the input representation model and the learning mechanism
employed. The input representation model consists of features that describe how different resources were used, inferred from fine-grained information collected during student–computer interactions. The predictive
engine employs the Naive Bayes classifier and operates online using no prior
information. Using data from a previous experimental study, a comparison was
made between the performance of the predictive engine and the actual
behaviour of a group of students using the learning material without any
guidance from EDUCE. Results indicate a correlation between students'
behaviour and the predictions made by EDUCE. These results suggest that the
concept of learning characteristics can be modelled using a learning scheme
with appropriately chosen attributes.
1 Introduction
Research on learning has shown that students learn differently, that they process and
represent knowledge in different ways, that it is possible to diagnose a student’s
learning style and that some students learn more effectively when taught with
preferred methods [1, 2].
Individual learning characteristics could be used as the basis of selecting material
but, identifying learning characteristics can be difficult. Furthermore it is not clear
which aspects of learning characteristics are worth modelling, how the modelling can
take place and what can be done differently for users with different learning styles in
a computer based environment [3].
Typically questionnaires and psychometric tests are used to assess and diagnose
learning characteristics [4] but these can be time-consuming, require the student to be
explicitly involved in the process and may not be accurate. Once the profile is
generated, it is static and does not change regardless of user interactions. What is desirable is a learning environment that has the capacity to develop and refine the profile of the student's learning characteristics whilst the student is engaged with the computer.
Several systems adapting to the individual's learning characteristics have been developed [5], [6]. In attempts to build a model of a student's learning characteristics,
information from the student is obtained using questionnaires, navigation paths,
answers to questions, link sorting, stretch text viewing and explicit updates by the
user to their own student model. Machine learning techniques offer a solution in the
quest to develop and refine a model of learning characteristics [7], [8]. Typically
these systems contain a variety of instructional types, such as explanations or examples,
and fragments of different media types representing the same content, with the
tutoring system choosing the most suitable for the learner. Another approach is to
compare the student’s performance in tests to that of other students, and to match
students with instructors who can work successfully with that type of student [9].
Other systems try to model learning characteristics such as logical, arithmetic and
diagrammatic ability [10].
EDUCE[11] is an Intelligent Tutoring System that uses a predictive engine, built
using machine learning techniques, to identify and predict learning characteristics
online in order to provide a customised learning path. It uses a pedagogical model
based on Gardner’s Multiple Intelligence(MI) concept [12] to classify content, model
the student and deliver material in diverse ways. In EDUCE[13] four different
intelligences are used to develop four categories of content: verbal/linguistic,
visual/spatial, logical/mathematical and musical/rhythmic intelligences. Currently,
science is the subject area for which content has been developed.
This paper describes the predictive engine within EDUCE. The predictive engine is
based upon the assumption that students do exhibit patterns of behaviour appropriate
to their particular learning characteristics and it is possible to describe those patterns.
Through observations of the student, it builds an individual predictive model for each
learner and allows EDUCE to adapt the presentation of content.
The input representation model to the learning scheme consists of fine-grained
features that describe the student’s interest in and use of different resources available.
The predictive engine employs the Naive Bayes algorithm [14]. It operates online
using no prior information, develops a predictive model for each individual student
and continues to refine that model with further observations of the student. At the
start of each learning unit, predictions are made as to what the learner's preferred resource is and when it will be used.
The paper outlines how, using data from a previous experimental study, an
evaluation was made on the predictive accuracy of the adaptive engine and the
appropriateness of the input features chosen. The results suggest that the concept
of learning characteristics can be modelled using a learning scheme with
appropriately chosen attributes.
2 EDUCE Architecture
Fig. 1. The Awaken stage of “Opposites Attract” with four options for different resources
3 Predictive Engine
In EDUCE predictions are made about which resource a student prefers. Being able to
predict student behaviour provides the mechanism by which instruction can be
adapted and by which to motivate a student with appropriate material. As the student
progresses through a tutorial, each learning unit offers four different types of resources. The student has the option to view only one, view them all, or to repeatedly view some. The prediction task is to identify at the start of each learning unit which resource the student would prefer; this is referred to as the predicted preferred
resource. Fig. 2 illustrates the main phases of the prediction process and their
implementation within EDUCE.
Fig. 2. The different stages in the predictive engine and their implementation within EDUCE.
Instead of modelling the static features of the learning resources themselves, a set
of dynamic features describing the usage of the different resources has been
identified. The following attributes have been chosen to reflect how the student uses the different resources (a sketch of how such features might be derived from an interaction log appears after the list).
NormalTime {Yes, No}: Yes if the student spent more than 2 seconds viewing a resource, otherwise No. The assumption is made that if a student has spent less than 2 seconds he has not had the time to use it. The value is also No if the student does not select the resource. 2 seconds was chosen as in experimental studies it provided the optimal classification accuracy.
LongTime {Yes, No}: Yes if the student spends more than 15 seconds on the resource, otherwise No. The assumption is that if the student spends more than 15 seconds he is engaged with the resource. 15 seconds provided the optimal classification accuracy.
FirstChoice {Yes, No}: Yes if the student views the resource first otherwise No
OnlyOne {Yes, No}: Yes if this is the only resource the student looks at
otherwise No
Repeat {Yes, No}: Yes if the student looks at the resource more than once
otherwise No
QuestAtt {Yes, No}: Yes if the student looks at the resource and attempts a
question otherwise No.
QuestRight {Yes, No}: Yes if the student looks at the resource and gets the
question right otherwise no.
Resource {VL, LM, VS, MR}: The name of the resource: Verbal/Linguistic,
Logical/Mathematical, Visual/Spatial and Musical/Rhythmic. This is the feature
the learning scheme will classify.
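As a concrete illustration of these attributes, the sketch below (our own; the log format and helper names are assumptions, while the 2-second and 15-second thresholds follow the list above) derives one feature vector for a single resource from a simplified interaction log.

# Hypothetical log for one learning unit: resources in the order viewed, with seconds
# spent on each view, plus per-resource question outcomes.
views = [("VL", 22.0), ("VS", 1.2), ("VL", 9.0)]
question_attempted = {"VL": True, "VS": False, "LM": False, "MR": False}
question_correct = {"VL": True, "VS": False, "LM": False, "MR": False}

def yes_no(condition):
    return "Yes" if condition else "No"

def features_for(resource):
    times = [t for r, t in views if r == resource]
    order = [r for r, _ in views]
    return {
        "NormalTime": yes_no(any(t > 2 for t in times)),
        "LongTime": yes_no(any(t > 15 for t in times)),
        "FirstChoice": yes_no(order[:1] == [resource]),
        "OnlyOne": yes_no(set(order) == {resource}),
        "Repeat": yes_no(len(times) > 1),
        "QuestAtt": yes_no(question_attempted[resource]),
        "QuestRight": yes_no(question_correct[resource]),
        "Resource": resource,
    }

print(features_for("VL"))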
The goal is to construct individual user models based upon the user’s own data.
However this results in only a small number of training instances per user. The other
requirement is that the classifier may have no prior knowledge of the user. With these
requirements in mind, the learning mechanism chosen was the Naïve Bayes algorithm
as it works well with sparse datasets [14]. Naïve Bayes works on the assumption that
all attributes are conditionally independent of one another given the class value. The formula for the Naïve Bayes classifier can be expressed as

$v_{NB} = \operatorname{argmax}_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$

where $v_j$ is the target value, which can be any value from the finite set $V$, and $P(a_i \mid v_j)$ is the probability of attribute value $a_i$ for the given class $v_j$. The probability for the target value of a particular instance, i.e. of observing the conjunction $a_1, \ldots, a_n$, is proportional to the product of the probabilities of the individual attributes.
During each learning unit observations are made about how different resources are
used. At the end of the learning unit, one instance is created for each target class
value. For example, the instances generated for one student after the interaction with
one particular learning unit and four resources are given in Table 1.
The training data is updated with these new instances. The entire training data set
for each student consists of all the instances generated, with equal weighting, from the
learning units that have been used. At the start of each learning unit the predictive
engine is asked to classify the instance that describes what the student spends time on,
what he views first, what he repeatedly views and what helps him to answer
questions, namely the instance illustrated in Table 2.
The range of target values is {VL, LM, VS, MR} one for each class of resource.
For each possible target value the Naive Bayes classifier calculates a probability on
the fly. The probabilities are obtained by counting the frequency of various data
combinations within the training examples. The target class value chosen is the one
with the highest probability. Figure 3 illustrates the main steps in the algorithm of the
predictive engine.
Fig. 3. The algorithm describing how instances are created and predictions made.
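The sketch below (our own reconstruction, not the EDUCE source code; the Laplace smoothing and data structures are assumptions) shows how such an online Naive Bayes predictor could be implemented, accumulating instances unit by unit and returning the class with the highest posterior as the predicted preferred resource.

# Minimal online Naive Bayes over the Yes/No attributes listed above (hypothetical code).
from collections import defaultdict

CLASSES = ["VL", "LM", "VS", "MR"]
FEATURES = ["NormalTime", "LongTime", "FirstChoice", "OnlyOne",
            "Repeat", "QuestAtt", "QuestRight"]

class NaiveBayesPredictor:
    def __init__(self):
        self.class_counts = defaultdict(int)     # instances seen per class
        self.value_counts = defaultdict(int)     # (class, feature, value) -> count

    def add_instance(self, cls, instance):
        """Add one labelled instance generated at the end of a learning unit."""
        self.class_counts[cls] += 1
        for f in FEATURES:
            self.value_counts[(cls, f, instance[f])] += 1

    def predict(self, query):
        """Return the class maximising P(class) * prod_f P(query[f] | class)."""
        total = sum(self.class_counts.values())
        best_cls, best_score = None, -1.0
        for cls in CLASSES:
            n = self.class_counts[cls]
            score = (n + 1) / (total + len(CLASSES))          # Laplace-smoothed prior
            for f in FEATURES:
                score *= (self.value_counts[(cls, f, query[f])] + 1) / (n + 2)
            if score > best_score:
                best_cls, best_score = cls, score
        return best_cls

# The query asked at the start of a unit: which resource will the student spend time on,
# view first, revisit, and use to answer the question correctly?
query = {f: "Yes" for f in FEATURES}

After each learning unit, one labelled instance per class value (as in Table 1) would be added with add_instance, and predict would be called again at the start of the next unit.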
4 Evaluation
Data involving 25 female students from a previous experimental research study [15]
was used to evaluate the accuracy of the predictive engine. Each student interacted
with EDUCE for approximately 40 minutes giving a total of 3381 observations over
the entire group. 840 of these interactions were selections for a particular type of
resource. In each learning unit students had a choice of four different modes of
instruction: VL, VS, MR, LM. As no prior knowledge of student preference is
available, the first learning unit experienced by the student was ignored when doing
the evaluation.
For individual modeling, one approach is to load all of the student’s data at the end
of a session and evaluate the resultant classifier against individual selections made.
The other approach is to evaluate the classifier predictions against user choices made
only using data up to the point the user’s choice was made. This approach simulates
the real behaviour of the classifier when working with incomplete profiles of the
student. The second approach was used as this reflects the real performance when
dynamically making predictions in an online environment.
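A minimal sketch of this incremental evaluation strategy follows (our own formulation; it reuses the hypothetical NaiveBayesPredictor from the earlier sketch and assumes each unit's log yields one labelled instance per resource class).

def prequential_accuracy(units, query):
    """Score predictions using only data available before each user choice was made.

    `units` is a chronological list of (chosen_resource, labelled_instances) pairs,
    where labelled_instances are (class, feature_dict) tuples generated after the
    unit. The first unit is skipped, since no prior data about the student exists.
    """
    predictor = NaiveBayesPredictor()
    hits, trials = 0, 0
    for i, (chosen, labelled_instances) in enumerate(units):
        if i > 0:                                  # predict before seeing this unit's data
            trials += 1
            if predictor.predict(query) == chosen:
                hits += 1
        for cls, instance in labelled_instances:   # then fold the unit's data into the model
            predictor.add_instance(cls, instance)
    return hits / trials if trials else 0.0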
A number of different investigations were made to determine answers to the following questions:
Is it possible to predict if the student will use a resource in a learning unit?
Is it possible to predict when the student will use a resource in a learning unit?
What range of resources did students use?
How often did the prediction of students' preferred type of resource change?
Can removing extreme cases, where there is no discernible pattern of behaviour, help in predicting the preferred resource?
Each learning unit has up to four types of resources to use. At the start of each unit,
the student’s most preferred type of resource was predicted based on previous
selections the student had made. After the student had completed the learning unit, it
was investigated to see if the student had used the predicted preferred resource. In 75
% of cases the prediction was correct and the student had used the resource. In other
words EDUCE was able to predict with 75 % accuracy that a student will use the
predicted preferred resource. The results suggest that there is a pattern of behaviour
when choosing among a range of resources and that students will continually use their
preferred resource.
In each learning unit, the student can determine the order in which resources can be
viewed. Is it possible to know at what stage the student will use his preferred
resource? When inspecting the learning units where the predicted preferred resource
was used, it was found that in 78 % of cases the predicted preferred resource was used
first, i.e. in the 75% of cases where the prediction was correct the predicted resource
was visited first 78% of the time. The results suggest that predicting the first resource a student will use in a learning unit is a challenging classification task. However, when the student does use the predicted preferred resource, it will with 78% accuracy be the first one used. Figure 4 illustrates these results. The analogy is that of shooting an arrow at a target: 75% of the time the target is hit and, when the target is hit, 78% of the time it is a bull's-eye.
To determine the extent of how stable the predicted preferred resource is, an analysis
was made of the number of times the prediction changed. The average number of
changes in the preferred resource was 1.04. The results suggest that as students progress through a tutorial they quite quickly identify which type of resource they prefer, as the predicted resource will on average change about once per student.
Did students use all available resources or just a subset of those resources? By
performing an analysis of the resources selected from those available in each unit, it
was found that students on average used 40 % of the available resources. This result
suggests that students identified for themselves a particular subset of resources which
appealed to them and ignored the rest. But did all students choose the same subset?
To determine which subset, a breakdown of the resources used against each class of
resource was calculated. Table 3 displays the results.
The even breakdown across all resources suggests that each student chose a
different subset of resources. (If all students chose the same subset of VL and LM
resources, VS and MR would be 0%.) It is interesting to note that the MR approach appeals to the largest number of students and the LM approach appeals to the smallest number of students. Taking this into account, each class of resource appeals to a different group of students of roughly equal size.
Inspecting students with extreme preferences, very strong and very weak, reveals some further insights into the modelling of learning characteristics. For one student with a very strong preference for the VL approach, it could be predicted with 100% accuracy that she would use the VL resource in a learning unit, and with 92% accuracy that she would use it first, before any other resources. At the other extreme, some students seem to have a complex selection process that is not easily recognisable. For example, for one student it could only be predicted with 33% accuracy that she would use her predicted preferred resource in a learning unit, and only with 11% accuracy that she would use it first. In this particular case, the results suggest that she
was picking a different resource in each unit and not looking at alternatives.
Some students will not display easily discernible patterns of behaviour, and these outliers can be removed to get a clearer picture of the prediction accuracy for students with strong patterns of behaviour. After removing the 5 students with the lowest prediction rates, the prediction accuracy for the rest of the group was recalculated. This resulted in an accuracy of 84% that the predicted preferred resource will be used and an accuracy of 65% that the predicted preferred resource will be used first in a learning unit. This suggests that strong predictions can be made about the preferred
class of resource. However predicting what will be used first is still a difficult
classification task.
5 Conclusions
In this paper the predictive engine in EDUCE was described. The prediction task was
defined as predicting which resource a student prefers to use. The input representation model is a
fine-grained set of features that describe how the resource is used. On entry to a
particular unit, the learning scheme predicts which resource the student will use.
A number of evaluations were carried out and the performance of the predictive
engine was compared against the real behaviour of students. The purpose of the
evaluation was to determine whether it is possible to model a concept such as learning
characteristics. The results suggest that strong predictions can be made about the
students' preferred resource. It can be determined with a relatively high degree of
probability that the student will use the predicted preferred resource in a learning unit.
However, determining whether the preferred resource will be used first is a more difficult task. The results also suggest that predictions about the preferred resource are
relatively stable, that students only use a subset of resources and that different
students use different subsets. Combining the results suggests that there is a concept such as learning characteristics that differs across groups of
students and that it is possible to model this concept.
Currently empirical studies are taking place to examine the reaction of students to
the predictive engine in EDUCE. The study is examining two instructional strategies: giving students content they like to see, and giving them content they do not like to see. The
purpose of these studies is to examine the relationship between instructional strategy
and learning performance.
Future work with the predictive engine involves further analysis in order to
identify the relevance of different features. Other work will involve generalizing the
adaptive engine to use different categories of resources. Here the range of categories
is based on the Multiple Intelligence concept; however, it can easily be replaced
with another set of resources based on a different learning theory.
References
1. Riding, R. & Rayner, S. (1997): Cognitive Styles and Learning Strategies. David Fulton.
2. Rasmussen, K. L. (1998): Hypermedia and learning styles: Can performance be
influenced? Journal of Multimedia and Hypermedia, 7(4).
3. Brusilovsky, P. (2001): Adaptive Hypermedia. User Modeling and User-Adapted Interaction, Volume 11, Nos 1-2. Kluwer Academic Publishers.
4. Riding, R. J. (1991): Cognitive Styles Analysis, Learning and Training Technology,
Birmingham.
5. Carver, C., Howard, R., & Lavelle, E. (1996): Enhancing student learning by
incorporating learning styles into adaptive hypermedia. 1996 ED-MEDIA Conference on
Educational Multimedia and Hypermedia. Boston, MA.
6. Specht, M. and Oppermann, R. (1998): ACE: Adaptive CourseWare Environment, New
Review of HyperMedia & MultiMedia 4,
7. Stern, M. & Woolf, B. (2000): Adaptive Content in an Online Lecture System. In: Proceedings of the First Adaptive Hypermedia Conference, AH2000.
8. Castillo, G., Gama, J., Breda, A. (2003): Adaptive Bayes for a Student Modeling
Prediction Task based on Learning Styles. Proceedings of the User Modeling Conference,
Johnstown, PA, USA, 2003.
9. Gilbert, J. E. & Han, C. Y. (1999): Arthur: Adapting Instruction to Accommodate
Learning Style. In: Proceedings of WebNet’99, World Conference of the WWW and
Internet, Honolulu, HI.
10. Milne, S. (1997): Adapting to Learner Attributes, experiments using an adaptive tutoring
system. Educational Psychology Vol 17 Nos 1 and 2, 1997
11. Kelly, D. & Tangney, B. (2002): Incorporating Learning Characteristics into an Intelligent
Tutor. In: Proceedings of the Sixth International Conference on Intelligent Tutoring Systems, ITS2002.
12. Gardner H. (1983) Frames of Mind: The theory of multiple intelligences. New York. Basic
Books.
13. Kelly, D. (2003). A Framework for using Multiple Intelligences in an ITS. Proceedings of
EDMedia’03, World Conference on Educational Multimedia, Hypermedia & Telecom-
munications, Honolulu, HI.
14. Duda, R. & Hart, P. (1973). Pattern Classification and Scene Analysis. Wiley, New York.
15. Kelly, D. & Tangney, B. (2003). Learner’s responses to Multiple Intelligence
Differentiated Instructional Material in an ITS. Proceedings of the Eleventh International
Conference on Artificial Intelligence in Education, AIED’2003.
Alternative Views on Knowledge:
Presentation of Open Learner Models
Abstract. This paper describes a study in which individual learner models were
built for students and presented to them with a choice of view. Students found
it useful, and not confusing, to be shown multiple representations of their
knowledge, and individuals exhibited different preferences for which view they
favoured. No link was established between these preferences and the students’
learning styles. We describe the implications of these results for intelligent tu-
toring systems where interaction with the open learner model is individualised.
1 Introduction
Many researchers argue that open learner modelling in intelligent tutoring systems
may enhance learner reflection (e.g. [1], [2], [3], [4]), and a range of externalisations
for learner models have been explored. In Mr Collins [1], learner and system negoti-
ate over the system’s representation of the learner’s understanding. Vismod [4] pro-
vides a learner with a graphical view of their Bayesian learner model. STyLE-OLM
[2] works with the learner to generate a conceptual representation of their knowledge.
ELM-ART’s [5] learner model is viewed via a topic list annotated with proficiency
indicators. These examples demonstrate quite varied interaction and presentation
mechanisms, but in any specific system, the interaction style remains constant.
It is accepted that individuals learn in different ways and much research into
learning styles has been carried out (e.g. [6], [7]). This suggests not all learners may
benefit equally from all types of interaction with an open learner model. Ideally, the
learner’s model may be presented in whatever form suits them best and they may
interact with it using the mechanism most appropriate to them. In discussion of
learner reflection, Collins and Brown [8] state: “Students should be able to inspect
their performance in different ways”, concluding that multiple representations are
helpful. However, there has been little research on offering a learner model with a
choice of representations or interaction methods. Some studies [9], [10], suggest
benefit in tailoring a learning environment to suit an individual's learning style, so it
may be worth considering learning style as a basis for adapting interaction with an
open learner model.
This paper describes a study in which we use a short web-based test to construct
simple learner models, representing students’ understanding of control of flow in C
programming. Students are offered a choice of representations of the information in
their model. We aim to assess whether this is beneficial, or if it causes information
overload. We investigate whether there is an overall preference for a particular view,
or whether individuals have particular preferences, and if so, whether it is possible to
predict these from information about their learning style. We also consider other ways
of individualising the interaction with an open learner model, such as negotiation
between learner and system, and comparing individual learner models with those of
peers or for the group as a whole. The system employed is not intended to be a com-
plete intelligent tutoring system, and consists of only those aspects associated with
presenting the learner model. Such an arrangement would not normally be used in
isolation, but is useful for the purpose of investigating the issues described above.
learn X?”. For effective reflection on knowledge, the learner must be able to answer
these questions easily, particularly the first two. Thus a simple and intuitive system
was used where the learner’s knowledge on a topic is represented by a single coloured
node on a scale from grey, through yellow to green, with bright green indicating
complete knowledge and grey indicating none. Where a misconception is detected,
this overrides and the topic is coloured red. This simplicity means that learners should
require little time to familiarise themselves with the environment.
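As a rough illustration of such a colour scale (our own mapping; the study's actual thresholds are not specified here), the sketch below converts a knowledge estimate in [0, 1] into a display colour for a topic node, with a detected misconception overriding the scale.

def topic_colour(knowledge, has_misconception=False):
    """Map a knowledge estimate in [0, 1] to a node colour; thresholds are illustrative."""
    if has_misconception:
        return "red"        # a detected misconception overrides the knowledge level
    if knowledge < 0.1:
        return "grey"       # no evidence of knowledge
    if knowledge < 0.6:
        return "yellow"     # partial knowledge
    return "green"          # bright green: (near-)complete knowledge

print(topic_colour(0.75), topic_colour(0.3), topic_colour(0.95, has_misconception=True))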
Figures 1 to 4 illustrate the four views available to the learner. Tabs above the
model allow navigation between views, with misconceptions listed above this.
The lectures view (Fig. 1) lists topics according to the order they were presented in
the lecture course. This may aid integration of knowledge gained from using the sys-
tem with knowledge learned from the course, and help students who wish to locate
areas of poor understanding to revise from the lecture slides. Factors such as concep-
tual difficulty and time constraints may affect decisions on the ordering of lecture
material, such that related topics are not always covered together. The related con-
cepts view (Fig. 2) shows a logical, hierarchically structured grouping of subject
matter. This allows a topic to be easily located and may correspond better to a stu-
dent’s mental representation of the course. The concept map view (Fig. 3) presents
the conceptual relationship between the topics. To date, research combining concept
maps (or similar) with open learner models has focused on learner constructed maps
[2], [11], but in the wider context of information presentation, arguments have been
made for the use of pre-constructed concept maps, or the similar knowledge maps
[12], [13]. Finally, the pre-requisites view (Fig. 4) shows a suggested order for
studying topics, similar to Shang et al’s [14] annotated dependency graph.
A student’s choice of view may not be based purely on task, but also preference. If
differences in learning style contribute to these preferences, the Kolb [6] and Felder-
Silverman [7] learning style models may have relevance in the design of the views.
According to these models, learning involves two stages: reception, and subsequent
processing, of information. In terms of reception, Felder and Silverman’s use of the
terms sensing and intuitive is similar to Kolb’s use of concrete and abstract. Sensing
learners prefer information taken in through the senses, while intuitive learners prefer
information arising introspectively. Both models label learners’ preferences for proc-
essing using the terms active or reflective. Active learners like to do something active
with the information while reflective learners prefer to think it over.
The Felder-Silverman model has two further dimensions, sometimes referred to as
dimensions of cognitive style, and defined by Riding & Rayner [15] as “an individ-
ual’s preferred and habitual approach to organising and representing information”.
The sequential-global dimension incorporates Witkin et al.’s [16] notion of field-
dependence/field-independence and Pask’s [17] serialist/holist theory. It describes
whether an individual understands new material through a series of linear steps or by
relating it to other material. The visual-verbal dimension describes which type of
information the individual finds easier to process: text or images.
In a multiple-view system, reflective learners may appreciate the opportunity to
view their knowledge from multiple perspectives while active learners may like to
compare different views to see how they are related. Intuitive learners may use the
concept map and pre-requisites view to focus on conceptual interrelationships, while
sensing learners may favour the simpler lecture-oriented presentation as a link with
the real world. The lectures and related concepts views are more sequentially organ-
ised, while the concept map and pre-requisites views may better suit the global learner.
3 The Study
A group of students were given the 30-question test and presented with the four views
on their open learner model. They completed questionnaires indicating their opinions
on the usefulness of the different views, and the experience in general.
3.1 Subjects
Subjects were 23 Electronic, Electrical, and Computer Engineering students studying
a module entitled “Educational Technology”. Eighteen of these, on a one-year MSc
programme, had undertaken the course called “Introduction to Procedural Program-
ming and Software Design”. The remainder, finalists on a four-year MEng pro-
gramme, had previously covered the similar “Introduction to Computing Systems and
C Programming”. The subjects had yet to be introduced to the idea of open learner
modelling or indeed intelligent tutoring systems more generally.
3.3 Results
Students spent between 8 and 30 minutes on the test, scoring from 8 to 29 out of 30.
All but two students were identified as holding at least one misconception. None had
more than four. Seven students discovered that they could send multiple test submis-
sions. The maximum number of submissions from an individual was seven.
In the first questionnaire, students rated, on a five-point scale, how useful they
found each view. The number selecting each option is shown in Table 1.
Responses appear more neutral regarding comparisons with peer or group models.
More detailed analysis of the individual responses shows a number of students are
very interested in comparing models, but this is offset by a number of students who
have very little interest. During the study, many students were seen viewing each
other’s models for comparison purposes, without any prompting to do so. One student
remarked that he would like to see “the distribution of other participant’s answers”,
another said: “Feedback in comparison to other students would be useful”.
Nineteen students completed the ILS questionnaire [18]. Table 3 shows the aver-
age learning style scores (in each of the four dimensions) for the group as a whole
compared to the average scores for the students who favour each view. The similarity
between the overall figures and the figures for each view indicates no obvious link
between any of the style dimensions and preferences for presentation form. The re-
sults also show that in most style categories the distribution is biased towards one end
of the scale. In the poll regarding accuracy of the ILS, seventeen students voted that
they “agreed” with their results, while two abstained.
3.4 Discussion
The important question is whether providing multiple views of an open learner model
may enhance learning. It is argued that an open learner model may help the learner to
become aware of their current understanding and reflect upon what they know, by
raising issues that they may not otherwise have considered [20]. A motivation for
providing multiple views of the learner model is that this reflection may be enhanced,
if the learner can view their model in a form they are most comfortable with. As each representation was regarded as the most useful by at least several students, removing any view would leave some students using a form they consider less useful, and their quality of reflection might suffer. Providing a representation students
find more useful may help to counter problems discussed by Kay [21] or Barnard and
Sandberg [22], where few or no students viewed their model.
In addition to having knowledge represented in the most useful form, results show
that having multiple representations is considered useful. Students are not confused
by the extra information, as indicated by the fact that only two gave a negative re-
sponse to how easily they could tell the strength of their knowledge from the model.
It is important to remember that the information for the study comes from students’
self-reports on an open learner model in isolation. It does not necessarily follow that a
multiple-view system helps provide better reflection or increased learning, only that
students believe it may help. Nor can we assume students know which representation
is best for them. Such a system needs evaluating within an intelligent tutoring system.
Positive results here suggest this may be a worthwhile next step.
High levels of agreement with the system’s representation validate the modelling
technique used. However, they raise questions about the possibility of including a
negotiation mechanism, the intention of which would be to improve the accuracy of
the model and provide more dynamic interaction for active learners. While Bull &
Pain [1] conclude that students will negotiate their model with the system in cases of
4 Summary
This paper has described a study where students were presented with their open
learner models and offered a choice of how to view them. The aim was to investigate
whether this may be beneficial, and how it might integrate into an intelligent tutoring
system where the interaction with the open learner model is individualised.
Results suggest students can use a simple open learner model offering multiple
views on their knowledge without difficulty. Students show a range of preferences for
presentation so such a system can help them view their knowledge in a form they are
comfortable with, possibly increasing quality of reflection. Results show no clear link
with learning styles, but students were capable of selecting a view for themselves, so
intelligent adaptation of presentation to learning style does not seem beneficial.
A colour-based display of topic proficiency proved effective in conveying knowl-
edge levels, but to improve the quality of the experience, a much greater library of
misconceptions must be built with more detailed feedback available in the form of
evidence from incorrectly answered questions. Allowing the student to state confi-
dence in answers may be investigated as a means of improving the diagnosis. The
student should have the facility to inspect their learner model whenever they choose.
The limitations of self-reports and using a small sample of computer-literate sub-
jects necessitate further studies before drawing stronger conclusions. The educational
impact of multiple presentations must be evaluated in an environment where increases
in subjects’ understanding may be observed over time, and using subjects with less
computer aptitude. A learner model with several presentations is only the first part of
an intelligent tutoring system where the interaction with the model is personalisable.
Further studies may investigate individualising other aspects of the interaction, such
as negotiation of the model. Students like the idea of comparing models with others
and investigation may show which learners find this most useful.
References
1. Bull, S. and Pain, H.: “Did I Say What I Think I Said, and Do You Agree With Me?”:
Inspecting and Questioning the Student Model. Proceedings of World Conference on Arti-
ficial Intelligence in Education, Charlottesville, VA (1995) 501-508
2. Dimitrova, V.: STyLE-OLM: Interactive Open Learner Modelling. International Journal of
Artificial Intelligence in Education, Vol 13 (2002) 35-78
3. Kay, J.: Learner Know Thyself: Student Models to Give Learner Control and Responsi-
bility. Proc. of Intl. Conference on Computers in Education, Kuching, Malaysia (1997)
18-26
4. Zapata-Rivera, J.D., and Greer, J.: Externalising Learner Modelling Representations.
Workshop on External Representations of AIED: Multiple Forms and Multiple Roles. In-
ternational Conference on Artificial Intelligence in Education (2001) 71-76
5. Weber, G. and Specht, M.: User Modeling and Adaptive Navigation Support in WWW-
Based Tutoring Systems. Proceedings of User Modeling ’97 (1997) 289-300
6. Kolb, D. A.: Experiential Learning: Experience as the Source of Learning and Develop-
ment. Prentice-Hall, New Jersey (1984)
7. Felder, R. M. and Silverman, L. K.: Learning and Teaching Styles in Engineering Educa-
tion. Engineering Education, 78(7) (1988) 674-681.
8. Collins, A. and Brown, J. S.: The Computer as a Tool for Learning through Reflection. In
H. Mandl and A. Lesgold (eds.) Learning Issues for Intelligent Tutoring Systems.
Springer-Verlag, New York (1988) 1-18
9. Bajraktarevic, N., Hall, W., and Fullick, P.: Incorporating Learning Styles in Hypermedia
Environment: Empirical Evaluation. Proceedings of the Fourteenth Conference on Hy-
pertext and Hypermedia, Nottingham (2003) 41-52
10. Carver, C. A.: Enhancing Student Learning through Hypermedia Courseware and Incorpo-
ration of Learning Styles. IEEE Transactions on Education 42(1) (1999) 33-38
11. Cimolino, L., Kay, J. and Miller, A.: Incremental Student Modelling and Reflection by
Verified Concept-Mapping. Proc. of Workshop on Learner Modelling for Reflection, In-
ternational Conference on Artificial Intelligence in Education, Sydney (2003) 219-227
12. Carnot, M. J., Dunn, B., Cañas, A. J.: Concept Maps vs. Web Pages for Information
Searching and Browsing. Available from the Institute for Human and Machine Cognition
website: http://www.ihmc.us/users/acanas/Publications/CMapsVSWebPagesExp1/CMaps-
VSWebPagesExp1.htm, accessed 18/05/2004 (2001)
13. O’Donnell, A. M., Dansereau, D. F. and Hall, R. H.: Knowledge Maps as Scaffolds for
Cognitive Processing. Educational Psychology Review, 14 (1) (2002) 71-86
14. Shang, Y., Shi, H. and Chen, S.: An Intelligent Distributed Environment for Active
Learning. Journal on Educational Resources in Computing 1(2) (2001) 1-17
15. Riding, R. and Rayner, S.: Cognitive Styles and Learning Strategies. David Fulton Pub-
lishers, London (1998)
16. Witkin, H.A., Moore, C.A., Goodenough, D.R. and Cox, P.W.: Field-Dependent and
Field-Independent Cognitive Styles and Their Implications. Review of Educational Re-
search 47 (1977) 1-64.
17. Pask, G.: Styles and Strategies of Learning. British Journal of Educational Psychology 46.
(1976) 128-148.
18. Felder, R. M. and Soloman, B. A.: Index of Learning Styles. Available:
http://www.ncsu.edu/felder-public/ILSpage.html, accessed 24/02/04 (1996)
19. Felder, R.: Author’s Preface to Learning and Teaching Styles in Engineering Education.
Avail.: http://www.ncsu.edu/felder-public/Papers/LS-1988.pdf, accessed 05/03/04 (2002)
20. Bull, S., McEvoy, A. and Reid, E.: Learner Models to Promote Reflection in Combined
Desktop PC/Mobile Intelligent Learning Environments. Proceedings of Workshop on
Learner Modelling for Reflection, International Conference on Artificial Intelligence in
Education, Sydney (2003) 199-208.
21. Kay, J.: The um Toolkit for Cooperative User Modelling. User Modelling and User
Adapted Interaction. 4, Kluwer, Netherlands (1995) 149-196
22. Barnard, Y., and Sandberg, J. Self-explanations, Do We Get them from Our Students?
Proc. of European Conf. on Artificial Intelligence in Education. Lisbon (1996) 115-121
23. Loo, R.: Kolb’s Learning Styles and Learning Preferences: Is there a Linkage? Educa-
tional Psychology, 24 (1) (2004) 98-108
24. Linton, F., Joy, D., Schaefer, P., Charron, A.: OWL: A Recommender System for Organi-
zation-Wide Learning. Educational Technology & Society, 3(1) (2000) 62-76
25. Bull, S. and Broady, E.: Spontaneous Peer Tutoring from Sharing Student Models. Pro-
ceedings of Artificial Intelligence in Education ’97. IOS Press (1997) 143-150
26. Beck, J., Stern, M. and Woolf, B. P.: Cooperative student models. Proceedings of Artifical
Intelligence in Education ’97. IOS Press (1997) 127-134
Modeling Students’ Reasoning About
Qualitative Physics:
Heuristics for Abductive Proof Search
1 Introduction
Fig. 2. An informal proof of the excerpt “The keys would be pressed against the ceiling
of the elevator” (From the essay in Figure 1). The buggy assumption is preceded by
an asterisk.
wrongly assumed that the elevator is not in freefall. A highly plausible wrong
assumption in the student’s reasoning triggers an appropriate tutoring action [6].
The theorem prover, called Tacitus-lite+, is a derivative of SRI’s Tacitus-
lite that, among other extensions, incorporates sorts (sorts will be described in
Section 2.3) [7, p. 102]. We further adapted Tacitus-lite+ to our application by
(a) adding meta-level consistency checking, (b) enforcing a sound order-sorted
inference procedure, and (c) expanding the proof search heuristics. In the rest of
the paper we will refer to the prover as Tacitus-lite when talking about features
present in the original SRI release, and as Tacitus-lite+ when talking about more
recent extensions.
The goal of the proof search heuristics is to maximize (a) the measure of
plausibility of the proof as a model of a student’s reasoning and (b) the measure
of utility of the proof for generating tutoring feedback. The measure of plausibil-
ity can be evaluated with respect to the misconceptions that were identified as
present in the essay by the prover and by a human expert. A more precise plau-
sibility measure may take into account plausibility of the proof as a whole. The
measure of utility for the tutoring task can be interpreted in terms of relevance
of the tutoring actions (triggered by the proof) to the student’s essay, whether
the proof was plausible or not.
A previous version of Tacitus-lite+ was evaluated as part of the Why2-Atlas
evaluation studies, as well as on its own. The stand-alone evaluation uses manually
constructed propositional representations of essays to measure the performance
of the theorem prover (in terms of the recognition of misconceptions in
the essay) on 'gold' input [8]. The results of the latter evaluation were encour-
aging enough for us to continue development of the theorem proving approach
for essay analysis.
In this paper we focus on the recent additions to the set of proof search
heuristics for Tacitus-lite+: a specificity-sensitive assumption cost and a rule
choice preference that is based on the similarity between the graph of cross-
references between the propositions in a candidate rule and the graph of cross-
references between the set of goals. The paper is organized as follows: Section 2
introduces knowledge representation aspects of the prover; Section 3 defines the
order-sorted abductive inference framework and describes the new proof search
heuristics; finally, a summary is given in Section 4.
Fig. 3. Representation for “The keys have a downward acceleration due to gravity.”
The atoms are paired with their sorted signatures.
We adopted first-order predicate logic with sorts [24] as the representation lan-
guage. Essentially, it is a first-order predicate language that is augmented with
an order-sorted signature for its terms and predicate argument places. For the
sake of computational efficiency, and since function-free clauses are the natural
output of the sentence-level understanding module (see Section 1), we do not
implement functions; instead, we use cross-referencing between atoms by means
of shared variables. There is a single predicate symbol for each rela-
tion. For this reason predicate symbols are omitted in the actual representation.
Each atom is indexed with a unique identifier, a constant of sort Id. The identi-
fiers, as well as variable names, can be used for cross-referencing between atoms.
For example, the proposition “The keys have a downward acceleration due to
gravity” is represented as shown in Figure 3, where a1, d1, and ph1 are atom
identifiers. For this example we assume (a) a fixed coordinate system, with a
vertical axis pointing up (thus Dir value is neg); (b) that the existence of an
acceleration is equivalent to existence of a nonzero acceleration (thus Mag-zero
value is nonzero).
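To make the cross-referencing scheme concrete, the sketch below models atoms as identifier-bearing, function-free records that point to one another through shared identifiers. It is only an illustration: the field names, sorts, and argument layout are hypothetical, and the paper's actual encoding is the one shown in Figure 3.

# A rough illustration of function-free atoms cross-referenced via shared
# identifiers; field names and sort names here are hypothetical, not the
# paper's actual encoding (which appears in Figure 3).
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    ident: str                  # unique identifier of sort Id, e.g. "a1"
    signature: Tuple[str, ...]  # sorted signature of the argument places
    args: Tuple[str, ...]       # constants, variables, or identifiers of other atoms

# "The keys have a downward acceleration due to gravity", sketched as three
# cross-referenced atoms: the acceleration, its direction, and its cause.
a1 = Atom("a1", ("body", "quantity"), ("keys", "?mag1"))
d1 = Atom("d1", ("id", "direction"), ("a1", "neg"))    # direction of atom a1
ph1 = Atom("ph1", ("id", "force"), ("a1", "gravity"))  # physical cause of a1

if __name__ == "__main__":
    for atom in (a1, d1, ph1):
        print(atom)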
2.4 Rules
Fig. 4. Representation for the rule “If the velocity of a body is zero over a time interval
then its initial position is equal to its final position.”
3 Abductive Reasoning
3.1 Order-Sorted Abductive Logic Programming
Similar to [25] we define the abductive logic programming framework as a triple
(T, A, I), where T is the set of givens and rules, A is the set of abducible atoms
(potential hypotheses) and I is a set of integrity constraints. Then an abductive
explanation of a given set of sentences G (observations) consists of (a) a subset Δ
of the abducibles A such that T ∪ Δ ⊨ G and T ∪ Δ satisfies I, together with
(b) the corresponding proof of G. Since an abductive explanation is generally
not unique, various criteria can be considered for choosing the most suitable
explanation (see Section 3.2).
An order-sorted abductive logic programming framework is an abductive
logic programming framework with all atoms augmented with the sorts
of their argument terms (so that they are sorted atoms) [8]. Assume the following
notation: a sorted atom is of the form p(x1:s1, ..., xn:sn), where the
term xi is of the sort si. Then, in terms of unsorted predicate logic, the formula
p(x1:s1, ..., xn:sn) can be written as p(x1, ..., xn) ∧ s1(x1) ∧ ... ∧ sn(xn). For our domain we restrict the
sort hierarchy to a tree structure that is naturally imposed by set semantics and
that has the property that the subsort relation s ⊑ s′ is
equivalent to ∀x (s(x) → s′(x)).
Tacitus-lite+ does backward chaining using the order-sorted version of modus
ponens:
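The displayed rule, referred to later as (1), is not reproduced here; a generic sorted form of backward-chainable modus ponens, which may differ from the paper's exact formulation, is:

\[
\frac{\forall \bar{x}\,\bigl(A_1 \wedge \dots \wedge A_n \rightarrow B\bigr) \qquad A_1\theta,\ \dots,\ A_n\theta}{B\theta}
\]

where θ is a sort-respecting substitution, i.e., each variable of sort s is instantiated only by terms whose sort is a subsort of s.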
The rule from Figure 4 can be applied to prove the goal “(Axial, or total) position of
?body3 has magnitude ?mag-num3”:
The same rule can be applied to prove the more specific goal “Horizontal
position of ?body3 has magnitude ?mag-num3”:
and will generate the more specific subgoal “Horizontal velocity of ?body3 is
zero”:
Rule choice heuristics. Although the rules in Tacitus-lite are applied to prove
individual goal atoms, a meaningful proposition usually consists of a few atoms
cross-referenced via shared variables (see Section 2.3). When a rule is used to
prove a particular goal atom, (a) a unifier is applied to the atoms in the head
and the body of the rule; (b) atoms from the head of the rule are added to the
list of proven atoms; and (c) atoms from the body of the rule are added to the
list of goals. Consequently, suppose there exists a unifier θ that unifies both (a)
a goal atom g1 with an atom h1 from the head of the rule R, so that g1 can be
proved with R via modus ponens, and (b) a goal atom g2
with an atom h2 from the head of the rule R, so that g2 could be proved via R.
Then, proving goal g1 via R (and applying θ to h2 and g2) adds the atom θ(h2)
to the list of provens, thus allowing for its potential factoring with the goal θ(g2). In
effect, a single application of a rule in which its head atoms match multiple goal
atoms can result in proving multiple goal atoms via a number of subsequent
factoring steps. This property of the prover is consistent (a) with backchaining
using modus ponens (1), and (b) with the intuitive notion of cognitive economy,
namely that the shortest (by the total number of rule applications) proofs are
usually considered good by domain experts.
Moreover, if an atom b in the body of R can be unified with a goal g,
then the application of rule R will probably not result in an increase of the total
cost of the goals due to the new goal θ(b), since it is possible to factor it with g
and set the cost of the resultant atom as the minimum of the costs of θ(b)
and g. In other words, applying a rule where multiple atoms in its head and
body match multiple goal atoms is likely to result in a faster reduction of the
goal list, and therefore a shorter final proof.
The new version of Tacitus-lite+ extends the previous rule choice heuristics
described in [9] with rule choice based on the best match between the set of
atoms in a candidate rule and the set of goal atoms. To account for the structure
of cross-references between the atoms, a labeled graph is constructed offline for
every rule, so that the atoms are vertices labeled with respective sorted signatures
and the cross-references are edges labeled with pairs of respective argument
positions. Similarly a labeled graph is built on-the-fly for the current set of goal
atoms. The rule choice procedure involves comparison of the goal graph and
graphs of candidate rules so that the rule that maximizes the graph matching
metric is preferred.
The match metric between two labeled graphs is based on the size of the
largest common subgraph (LCSG). We have implemented the decision-tree-based
LCSG algorithm proposed in [27]. The advantage of this algorithm is that the
time complexity of its online stage is independent of the size of the rule graph:
if n is the number of vertices in the goal graph, then the time complexity of the
LCSG computation depends only on n.
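A much-simplified stand-in for this rule-choice score is sketched below: instead of computing the largest common subgraph, it merely counts the overlap of vertex and edge labels between the goal graph and each candidate rule graph. All class, field, and function names are hypothetical.

# Simplified proxy for the LCSG-based rule-choice metric described above:
# it scores label overlap rather than computing a largest common subgraph.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class LabeledGraph:
    vertex_labels: Dict[str, str]                         # atom id -> sorted signature label
    edge_labels: Dict[Tuple[str, str], Tuple[int, int]]   # cross-reference -> argument positions

def match_score(goals: LabeledGraph, rule: LabeledGraph) -> int:
    """Multiset overlap of vertex labels plus multiset overlap of edge labels."""
    v_overlap = sum((Counter(goals.vertex_labels.values())
                     & Counter(rule.vertex_labels.values())).values())
    e_overlap = sum((Counter(goals.edge_labels.values())
                     & Counter(rule.edge_labels.values())).values())
    return v_overlap + e_overlap

def choose_rule(goals: LabeledGraph, candidates: Dict[str, LabeledGraph]) -> str:
    """Prefer the candidate rule whose graph best matches the current goal graph."""
    return max(candidates, key=lambda name: match_score(goals, candidates[name]))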
Since the graph matching includes independent subroutines for matching ver-
tices (atoms with sorted signatures) and matching edges (cross-referenced atom
arguments), the precision of both match subroutines can be varied to balance the
trade-off between search precision and efficiency of the overall matching proce-
dure. Currently we are evaluating the performance of the theorem prover under
various settings.
4 Conclusion
We described an application of theorem proving for analyzing students' essays
in the context of an interactive tutoring system. While formal methods have
been applied to student modeling, there are a number of challenges to overcome:
representing varying levels of formality in student language, the limited scope of
the rule base, and limited resources for generating explanations and consistency
checking. In our earlier paper [9] we argued that a weighted abduction theorem
proving framework augmented with appropriate proof search heuristics provides
a necessary deep-level understanding of a student’s reasoning. In this paper we
describe the recent additions to our proof search heuristics that have the goal of
improving the plausibility of the proofs as models of students’ reasoning as well
as the computational efficiency of the proof search.
Acknowledgments. This work was funded by NSF grant 9720359 and ONR
grant N00014-00-1-0600. We thank the entire Natural Language Tutoring group,
in particular Michael Ringenberg and Roy Wilson for their work on Tacitus-lite+,
and Uma Pappuswamy, Michael Böttner, and Brian ‘Moses’ Hall for their work
on knowledge representation and rules.
References
1. VanLehn, K., Jordan, P., Rosé, C., Bhembe, D., Böttner, M., Gaydos, A.,
Makatchev, M., Pappuswamy, U., Ringenberg, M., Roque, A., Siler, S., Srivas-
tava, R.: The architecture of Why2-Atlas: A coach for qualitative physics essay
writing. In: Proceedings of Intelligent Tutoring Systems Conference. Volume 2363
of LNCS., Springer (2002) 158–167
2. Rosé, C., Roque, A., Bhembe, D., VanLehn, K.: An efficient incremental archi-
tecture for robust interpretation. In: Proceedings of Human Language Technology
Conference, San Diego, CA. (2002)
3. Jordan, P., VanLehn, K.: Discourse processing for explanatory essays in tuto-
rial applications. In: Proceedings of the 3rd SIGdial Workshop on Discourse and
Dialogue. (2002)
4. Poole, D.: Probabilistic Horn abduction and Bayesian networks. Artificial Intelli-
gence 64 (1993) 81–129
5. Young, R.M., O’Shea, T.: Errors in children’s subtraction. Cognitive Science 5
(1981) 153–177
6. Jordan, P., Makatchev, M., Pappuswamy, U.: Extended explanations as student
models for guiding tutorial dialogue. In: Proceedings of AAAI Spring Symposium
on Natural Language Generation in Spoken and Written Dialogue. (2003) 65–70
7. Hobbs, J., Stickel, M., Martin, P., Edwards, D.: Interpretation as abduction. In:
Proc. 26th Annual Meeting of the ACL, Association of Computational Linguistics.
(1988) 95–103
8. Makatchev, M., Jordan, P.W., VanLehn, K.: Abductive theorem proving for ana-
lyzing student explanations to guide feedback in intelligent tutoring systems. To
appear in Journal of Automated Reasoning, Special issue on Automated Reasoning
and Theorem Proving in Education (2004)
9. Jordan, P., Makatchev, M., VanLehn, K.: Abductive theorem proving for analyzing
student explanations. In: Proceedings of International Conference on Artificial
Intelligence in Education, Sydney, Australia, IOS Press (2003) 73–80
10. Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic anal-
ysis. Discourse Processes 25 (1998) 259–284
11. McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text
classification. In: Proceeding of AAAI/ICML-98 Workshop on Learning for Text
Categorization, AAAI Press (1998)
12. Jonassen, D.: Using cognitive tools to represent problems. Journal of Research on
Technology in Education 35 (2003) 362–381
13. Conati, C., Gertner, A., VanLehn, K.: Using Bayesian networks to manage uncer-
tainty in student modeling. Journal of User Modeling and User-Adapted Interac-
tion 12 (2002) 371–417
14. Zapata-Rivera, J.D., Greer, J.: Student model accuracy using inspectable bayesian
student models. In: International Conference of Artificial Intelligence in Education,
Sydney, Australia (2003) 65–72
15. Charniak, E., Shimony, S.E.: Probabilistic semantics for cost based abduction. In:
Proceedings of AAAI-90. (1990) 106–111
16. Matsuda, N., VanLehn, K.: GRAMY: A geometry theorem prover capable of
construction. Journal of Automated Reasoning 32 (2004) 3–33
17. Murray, W.R., Pease, A., Sams, M.: Applying formal methods and representations
in a natural language tutor to teach tactical reasoning. In: Proceedings of Interna-
tional Conference on Artificial Intelligence in Education, Sydney, Australia, IOS
Press (2003) 349–356
18. Self, J.: Formal approaches to student modelling. In McCalla, G.I., Greer, J.,
eds.: Student Modelling: the key to individualized knowledge-based instruction.
Springer, Berlin (1994) 295–352
19. Dimitrova, V.: STyLE-OLM: Interactive open learner modelling. Artificial Intelli-
gence in Education 13 (2003) 35–78
20. Forbus, K., Carney, K., Harris, R., Sherin, B.: A qualitative modeling environment
for middle-school students: A progress report. In: QR-01. (2001)
21. Ploetzner, R., Fehse, E., Kneser, C., Spada, H.: Learning to relate qualitative and
quantitative problem representations in a model-based setting for collaborative
problem solving. The Journal of the Learning Sciences 8 (1999) 177–214
22. Reimann, P., Chi, M.T.H.: Expertise in complex problem solving. In Gilhooly,
K.J., ed.: Human and machine problem solving. Plenum Press, New York (1989)
161–192
23. de Kleer, J.: Multiple representations of knowledge in a mechanics problem-solver.
In Weld, D.S., de Kleer, J., eds.: Readings in Qualitative Reasoning about Physical
Systems. Morgan Kaufmann, San Mateo, California (1990) 40–45
24. Walther, C.: A many-sorted calculus based on resolution and paramodulation.
Morgan Kaufmann, Los Altos, California (1987)
25. Kakas, A., Kowalski, R.A., Toni, F.: The role of abduction in logic programming.
In Gabbay, D.M., Hogger, C.J., Robinson, J.A., eds.: Handbook of logic in Artificial
Intelligence and Logic Programming. Volume 5. Oxford University Press (1998)
235–324
26. Stickel, M.: A Prolog-like inference system for computing minimum-cost abduc-
tive explanations in natural-language interpretation. Technical Report 451, SRI
International, 333 Ravenswood Ave., Menlo Park, California (1988)
27. Shearer, K., Bunke, H., Venkatesh, S.: Video indexing and similarity retrieval by
largest common subgraph detection using decision trees. Pattern Recognition 34
(2001) 1075–1091
From Errors to Conceptions – An Approach to Student
Diagnosis
Carine Webber
1 Introduction
taking into account the student's actions related to a particular task, the system is able to
provide explanations of the student's reasoning by recognizing the underlying knowledge.
More precisely, we consider that a learning system must be able to represent student’s
actions in terms of knowledge used in a problem solving activity. In this direction, we
will introduce the Conception Model (section 3), which allows the representation of
errors in terms of knowledge having a specific domain of validity. Next, we will
briefly describe the spatial multiagent diagnosis approach we have implemented (sec-
tion 4). Finally, we present the experiments we have carried out and evaluate the results
obtained (sections 5 and 6).
In educational technology, the user model has received intensive research effort over
the last three decades but, so far, no single best method has emerged. The very first
method employed was the method of overlay. This method assumes that student’s
knowledge is a subset of the expert’s knowledge in one domain. Learning is related to
the acquisition of expert’s knowledge, which is absent from the incomplete student’s
model. A learning environment based on this approach will try to create interactions
with the student in order to enrich the student's model and bring it closer to the expert's.
Easy to implement, the overlay method is unable to account for the student's
misconceptions in the domain. Since the overlay model represents the student's
knowledge only within the scope of an expert model, it does not take into account
anything beyond that scope: any knowledge outside the expert's knowledge is not
recognized and is often treated by the system as incorrect. In these terms, overlay
modeling classifies the student's knowledge as correct or incorrect relative to the
expert's knowledge. If the student fails, the environment tries to apply the different
available learning strategies until the student succeeds. West [5] and Guidon [7] are
systems based on the overlay model.
The first solution to overcome the limitations of the overlay model was to construct
bug libraries, or databases of misconceptions, which gave rise to the perturbation
model. The term bug, imported from computer science, was used to represent
systematic errors. As static libraries very quickly proved difficult
to construct and to maintain, machine learning algorithms have been applied to
overcome the limitations of bug library construction and maintenance by inducing
bugs from examples of students' behaviors. The perturbation model differs from the
overlay model since it does not perceive the student's knowledge as a simplification of
expert knowledge, but rather as perturbations of the expert knowledge.
The perturbation model is the first that can be considered dynamic, since it could evolve
using machine learning techniques. Such techniques were employed for learning
and discriminating systematic errors and solution procedures. Errors were
identified from the analysis of student protocols, or they were learned by using
machine learning algorithms (in which case a representative set of examples was
required). Such algorithms also allow modeling students' intentions when solving
problems by associating actions with solution plans that students could use in
the context of a problem. Ideally, each systematic error could be associated with an
erroneous conception in the domain. Among the systems built on the perturbation model,
we cite Buggy [4] and Andes [8]. Buggy is a system developed as an educational
game to prepare future teachers. In another field, Andes is a tutoring system in the
domain of physics for college students.
The third approach that we discuss here is model tracing, which comes from
the ACT theory (Adaptive Control of Thought) proposed by Anderson [1]. Systems based
on the model tracing approach work in parallel with the student, simulating his
behavior on each step toward the problem solution. This allows the system to interact
with him at each step. However, the system must be able to reconstruct each step of a
solution in order to simulate and understand the student's reasoning about the problem.
Each step of the solution is a production rule; correct and incorrect rules need to be
represented. Once an error is detected by the system, immediate feedback is generated.
In fact, this model exerts control over the solution built by the student, preventing him
from developing his solution in a direction that would not lead to the
correct solution. Knowledge acquisition is attested by the application of correct rules.
This approach was implemented by John Anderson and his group in three domains:
the LISP language with the LISP Tutor, elementary geometry with the Geometry Tutor, and
algebra with Algebra I and II.
Important research has been carried out on students' conceptions. A relevant
synthesis is presented by Confrey, whose work has concerned the paradigm of
“misconceptions” (erroneous conceptions) [9]. According to Confrey, if we attentively look
for the sense of a wrong answer given by a student, we may discover that it is
reasonable. The problem of dealing with students' mistakes and misconceptions has
also been studied in depth by Balacheff [2]. According to him, when analyzing students'
behavior, one must consider the existence of mental structures that are contradictory
and incorrect from the viewpoint of an observer. Such (contradictory and incorrect)
mental structures may however be seen as coherent when applied to a particular
context (a class of problems or tasks). Following these important principles, when a
student solves a problem, he employs coherent and justifiable knowledge related to the
particular learning situation.
Although the student's knowledge may be recognized as contradictory or wrong
throughout multiple interactions, it can be taken as a temporarily stable knowledge
structure. One main principle of our work is to consider that any topic of knowledge
has a valid domain, which characterizes it as knowledge. Understanding
which valid domain a student gives to a topic of knowledge is a condition
for a computer-based system to construct hypotheses about the student's behavior. In this
sense, the Conception Model that we introduce here constitutes a model with a
cognitive and epistemic basis for representing and formalizing the student's knowledge and
its valid domain. The conception model has been developed by researchers in the
field of mathematics education, and the formalization that we employ was proposed
by Balacheff [2]. For the purposes of this work, we consider the conception model
the appropriate theoretical framework for representing the student's knowledge; its
formal model is presented in the next section.
Usually the word conception is taken in a very general sense by authors in the
computers-in-education field. Some of them use the word conception to mean something
conceived in the mind, like a thought or an idea. In our sense, a conception is a
well-defined structure that can be ascribed by an observer to a student according to
his behavior. As our work is concerned with problem solving, we consider that a
conception has a valid domain: a domain in which the conception applies correctly.
Nonetheless, describing conceptions precisely is a difficult problem; thus we use a
model developed in mathematics education with a cognitive foundation. In this model
a conception is characterized by a quadruplet (P, R, L, Σ), where:
P represents a set of problems, which describes the conception's domain of
validity; it is the seminal context where the conception may appear;
R represents a set of operators or rules, which are involved in the solutions of
problems from P;
L is a representation system; it allows the representation of problems and
operators;
Σ is a control structure, which guarantees that the solution respects the conception's
definition; it allows making choices and decisions in the solution process.
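A minimal data-structure sketch of the quadruplet follows; the concrete encodings of problems, operators, and the control structure are hypothetical placeholders rather than the formalization actually used by the system.

# Minimal sketch of the conception quadruplet (P, R, L, Sigma); the concrete
# encodings below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Conception:
    problems: Set[str]              # P: classes of problems (domain of validity)
    operators: Set[str]             # R: operators or rules used in solutions
    representation: str             # L: representation system for problems and operators
    control: Callable[[str], bool]  # Sigma: accepts or rejects a candidate solution step

# Example: the "parallelism" conception about reflection (figure 1).
parallelism = Conception(
    problems={"reflect a line segment across an axis"},
    operators={"draw the image segment parallel to the original"},
    representation="geometric figures",
    control=lambda step: "parallel" in step,  # validates only parallelism-preserving steps
)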
We continue this section by presenting examples of conceptions in the domain of
reflection.
A common conception that students hold about reflection is the conception of
“parallelism” (figure 1). Holding such a conception, students believe that two line segments
are symmetrical if they are parallel. We can easily observe that for some configurations
(figure 1, frame a), two symmetrical line segments are indeed parallel, even
though this condition is not always true (figure 1, frame b).
The field of reflection has given rise to several studies on the conceptions held by
students and on their evolution in a learning process [3, 10]. Additional conceptions
The purpose of the micro level is to characterize the students' state of knowledge
during a problem solving activity, in order to construct an image of students' cognitive
abilities. A set of these images allows observing the behavior of a particular
student and detecting changes in problem solving procedures, for instance.
The micro level is modeled by a multiagent system whose agents have sensors for
elements from the conception model (problems, operators, and control structures).
The multiagent system is composed of 150 different agents. They share an environment
where the problem configuration and the proof (representing the student's solution)
are described. Agents react to the presence of their encapsulated element in the
environment. Interactions between agents and the environment follow the stimulus-
response model.
Once an agent perceives its own encapsulated element in the environment, it becomes
active. Active agents have a particular behavior towards the spatial organization
of the society. The agents' behavior has been formally described in [12]. A spatial
multiagent approach has been implemented where agents share an n-dimensional issue
space and form coalitions according to their proximity. The agents' behavior is based on
group decision-making strategies (a spatial voting mechanism) and coalition formation
[11]. Diagnosis is not seen as an exclusive function of one agent, but as the result of
a collective decision-making process. Agents dynamically organize themselves in a
spatial configuration according to the affinity of the particles they encapsulate.
Agents form coalitions, whose positions in the Euclidean space represent conceptions.
When the process of coalition formation ends, groups of agents are spatially organized.
The winning coalition represents the conception(s), such as the parallelism conception
shown in figure 1, that the majority considers to be the state of knowledge
of the student being observed. Coalitions of agents are observed and finally
interpreted by the macro level in terms of conceptions.
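As a rough illustration of proximity-based coalition formation in an n-dimensional issue space, the sketch below groups active agents whose positions lie within a distance threshold and declares the largest group the winner; the actual voting and coalition-formation mechanism follows [11, 12], and the positions and threshold here are hypothetical.

# Crude illustration of proximity-based coalition formation; the real spatial
# voting mechanism follows [11, 12], and all values here are hypothetical.
import math
from typing import Dict, List, Tuple

Position = Tuple[float, ...]

def form_coalitions(agents: Dict[str, Position], threshold: float) -> List[List[str]]:
    """Group agents whose positions are within `threshold` of every group member."""
    coalitions: List[List[str]] = []
    for name, pos in agents.items():
        for coalition in coalitions:
            if all(math.dist(pos, agents[member]) <= threshold for member in coalition):
                coalition.append(name)
                break
        else:
            coalitions.append([name])
    return coalitions

def winning_coalition(agents: Dict[str, Position], threshold: float) -> List[str]:
    """The largest coalition is interpreted as the diagnosed conception(s)."""
    return max(form_coalitions(agents, threshold), key=len)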
The main goal of the macro level is to observe and interpret the final state of the micro-level
agents in terms of a diagnosis result. The macro level has been modeled as a multiagent
learning environment called Baghera [3]. One or more agents may have the role
of observing and interpreting the micro level. In the case of our implementation, we
have ascribed this role to a Tutor agent.
The role of tutor agents also includes deciding on the best strategy to
apply in order to reach the learning goal. It may include proposing a new activity
to the student in order to reinforce correct conceptions or to confront the student with
more complex situations; showing examples or counterexamples; or promoting
interactions with peers or teachers.
In order to carry out the necessary tests and analyze the results obtained from the
diagnoses of conceptions, we have created a corpus of students' solutions to five
problems. The problems proposed belong to the domain of reflection and
involve proving that a line segment has a symmetrical image with respect to an axis.
As an example, consider figure 3, where the problem was to prove, using
geometrical properties of reflection, that the line segment [NM] has a symmetrical
segment with respect to the axis (d).
6 Evaluating Results
The purpose of this evaluation is to compare results coming from the automatic diagnosis
to those obtained from human diagnosis. To carry out this task, we created
a corpus containing students' solutions to five problems in the domain of
reflection. Around 150 students (11-15 years old) participated, solving the problems
in a paper-and-pencil format. From the whole corpus, the work done by 28 students was
chosen for analysis. The choice was based on the diversity
of the solutions presented and on the students' apparent engagement in the activities.
Once these two steps had been concluded, students' solutions were submitted to
the diagnosis of three teams of researchers in mathematics education (the Did@TIC team
from Grenoble (France), and the mathematics education teams from the University of Pisa (Italy)
and the University of Bristol (UK) [3]). Besides ascribing a diagnosis in terms of four
different conceptions to the solution presented by each student, each team of
researchers was asked to present arguments in order to justify their diagnosis. In parallel,
solutions were also submitted to the automatic diagnosis system.
Once the human and automatic diagnoses were concluded, we were able to compare their
results. Four situations have been identified among the human and automatic diagnoses:
total convergence, partial convergence, divergence and, finally, situations where a
comparison was not possible.
Situations of total convergence: in 17 cases (out of 28) human and automatic di-
agnoses have fully converged to the same diagnosis.
Situations of partial convergence: in 4 cases (out of 28) a situation of partial con-
vergence was observed. This situation occurs when at least one human diagnosis con-
verges to the automatic diagnosis. In a few cases, human teams have ascribed a low
degree of confidence to the diagnosis to reflect their uncertainty.
Situations of divergence: in 2 cases automatic and human teams have diverged
about the diagnosis ascribed to the solution.
Impossible to compare: in 5 cases comparison among the diagnoses could not be
carried out because of the great number of divergences between human teams and
abstentions.
In the next section, we proceed with an analysis of the divergent situations.
Two cases of divergence between human and automatic diagnoses were detected. In
both cases, the three human teams converged towards an identical diagnosis. In
order to understand the divergent behavior of the system, it is important to examine the
arguments given by the human teams to justify their diagnoses. At this point, differences
between human and automatic diagnoses become apparent.
In both cases of divergence, the human teams remarked that the students had
not employed clear steps to construct their solutions. Note that all problems involved
the construction of a proof. What actually happened is that the students employed
rather general properties and operators of geometry, trying to justify a preconceived
solution based on the graphical representation of the problem. Because of that, the
steps of the proof given by the students were not logically coherent with the given
answer. The human teams were able, without any effort, to identify such behavior,
whereas the automatic diagnosis was not. Thus the answers given by the students strongly
guided the human diagnoses. Concerning the automatic diagnosis, the agents engaged in the
diagnosis task represented rather general notions of reflection. Even though the students
chose a wrong answer to the two problems, they were not able to justify it by
means of a proof. This explains why the system was not able to exhibit a convergent
diagnosis.
We observed a strong coherence between automatic and human diagnoses. In clear
cases, when the human diagnoses fully converged with a high degree of confidence (17
cases), the automatic diagnosis also converged to the formation of one unique
coalition.
When handling incomplete or not easily interpretable cases, we observed that the
human teams had ascribed a low degree of confidence to the diagnoses (4 cases).
Moreover, for certain cases, the diagnosis task could not be carried out by some teams
(5 cases), not allowing any comparison between human and automatic diagnoses. In
addition, divergent human diagnoses were noticed. For these incomplete cases, the
automatic diagnosis also received a low degree of confidence.
Regarding some students' solutions, neither the humans nor the system were able to
decide between two or three candidate conceptions. In a few cases, the system converged
in a more restrictive way with at least one human team.
To conclude this analysis, we have observed that in the majority of the cases, when
the three human teams converged towards a diagnosis, the system arrived at the same
diagnosis. However, when the humans diverged or expressed a low degree of confidence
concerning the diagnosis, the system also exhibited the same behavior. Even
though convergence of all diagnoses was not observed in every case, we consider that
the spatial multiagent approach to diagnosis is an effective and coherent approach.
We consider that any diagnosis system must “imitate” human behavior not only in
cases of convergence between the human diagnoses, but also in more difficult cases where no
convergence is observed.
7 Conclusion
once it deals well with applications where crucial issues (distance, cooperation among
different entities and integration of different components of software) are found.
To conclude, the process of evaluating the automatic diagnosis has involved three
teams of researchers in mathematics education. Results obtained from the computer-based
diagnosis system have been positively evaluated by the human teams. As the
most important perspective so far, we have been working to apply the diagnosis
approach to diagnosing conceptions in the domain of learning programming.
Acknowledgement. The author would like to thank Did@ctic and Magma Teams,
from Leibniz Laboratory (Grenoble, France) where this work was developed when
the author was a PhD candidate (1999-2003).
References
1. Anderson, J.: The Architecture of Cognition. Cambridge: Harvard University Press (1983)
2. Balacheff, N.: A modelling challenge: untangling students' knowing. Journées Internatio-
nales d'Orsay sur les Sciences Cognitives: L'apprentissage (JIOSC'2000). (http://www-
didactique.imag.fr/Balacheff) (2000)
3. BAP: Designing an hybrid and emergent educational society. Research Report, Labora-
toire Leibniz, April, number 81. (http://www-leibniz.imag.fr/NEWLEIBNIZ/LesCahiers/)
(2003)
4. Brown, J.S., Burton, R.: Diagnostic models for procedural bugs in basic mathematical
skill. Cognitive Science, 2, (1978) 155-192
5. Burton, R., Brown, J.S.: An investigation of computer coaching for informal learning
activities. In: Sleeman, D., Brown, J. (eds.): Intelligent Tutoring Systems. Academic Press
Orlando FL (1982)
6. Carr, B., Goldstein, I.P.: Overlays: a theory of modeling for computer-aided instruction,
AI Memo 406, MIT, Cambridge, Mass (1977)
7. Clancey, W.J.: GUIDON. Journal of Computer-Based Instruction, Vol.10,n.1 (1983) 8-14
8. Conati, C., Gertner, A., VanLehn, K.: Using Bayesian Networks to Manage Uncertainty in
Student Modeling. J. of User Modeling and User-Adapted Interaction, Vol. 12(4) (2002)
9. Confrey, J.: A review of the research on students conceptions in mathematics, science, and
programming. In: Courtney C. (ed.): Review of research in education. American Educa-
tional Research Association, Vol.16 (1990) 3-56
10. Hart, K.D.: Children's understanding of mathematics: 11-16. Alden Press, London (1981)
11. Sandholm, T.W.: Distributed Rational Decision Making. In: Weiss, G. (ed.): Multiagent
Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press (1999) 201-258
12. Webber, C., Pesty, S.: Emergent diagnosis via coalition formation. In: Garijo, F. (ed.):
Proceedings of Iberamia Conference. Lecture Notes in Computer Science, Vol.2527.
Springer-Verlag, Berlin Heidelberg New York (2002) 755-764
13. WebSite Conception, Knowledge and Concept Discussion Group.
http://conception.imag.fr
Discovering Intelligent Agent: A Tool for Helping
Students Searching a Library
Abstract. Nowadays, the explosive growth of the Internet has brought us such a
huge number of books, publications, and documents that hardly any student can
consider all of them. Finding the right book at the right time is an exhausting
and time-consuming task, especially for new students who have diverse learn-
ing styles, needs, and interests. Moreover, the growing number of books in one
subject can overwhelm students trying to choose the right book. This paper
overcomes this challenge by ranking books using the pyramid collaborative fil-
tering method. Based on this method, we have designed and implemented an
agent called Discovering Intelligent Agent (DIA). The agent searches both the
University of Montreal's and Amazon's libraries and then returns a list of books
related to students' models and the contents of the books.
1 Introduction
Currently, the rapidly spreading Internet has become a great resource for students
searching for papers, documents, and books. However, the variety of students' learning
styles, performances, and needs makes finding the right book a complex task. Fre-
quently, students rely on recommendations from their colleagues or professors to get
the required books.
There are several methods used to support students. Recommendation systems try
to personalize users’ needs by building up information about their likes, dislikes and
interests [14]. Those systems rely on two techniques: the content-based filtering (CB)
and the collaborative filtering (CF) [2].
These approaches are acceptable and relevant; however, none of them considers
students' models. To solve this problem, this paper uses the Pyramid Collaborative Fil-
tering Approach (PCFA) [18] for filtering and recommending books. PCFA has four lev-
els. Moving from one level to another depends on three filtering techniques: domain
model filtering, user model filtering, and credibility model filtering. Based on these
techniques, we have designed and implemented an agent called Discovering Intelli-
gent Agent (DIA). This agent searches both the University of Montreal’s and Ama-
zon’s library and then returns a list of books related to students’ models and contents
of the books.
This paper is organized as follows. Section 1 is this introduction.
Section 2 briefly describes some related work. In section 3, we present, in detail, the
architecture of DIA. Section 4 shows the methodology of DIA. Section 5 discusses
the pending problems of implementation. Section 6 presents an online scenario. And
finally, section 7 concludes the paper and suggests future projects.
2 Related Work
Recommendation systems have been widely discussed in the past decade and two
main approaches have emerged: the Content-Based filtering (CB) and the Collabora-
tive Filtering (CF) [2], [6], [20]. The first approach recommends to a user items
similar to those he liked in the past, by studying the content of the items. Libra [15], for
example, proposes books based on the user's ratings and the description of the book.
Web Watcher [10] and Letizia [12] use CB filtering to recommend links and Web
pages to users.
The second approach, CF, recommends items that other users with matching
tastes have liked. In other words, the system determines a set of users similar to the
active user, and then recommends the items they have chosen (i.e. items highly rated
or already bought). Many CF systems have been implemented in research, with projects
such as GroupLens [11] and MovieLens [5]. The first one is a Usenet news
recommender, whereas the second one is a movie recommender.
Each of these approaches (CB, CF) has its own advantages and disadvantages.
Since content-based filtering draws on the information retrieval field, it
is applicable only to text-based recommendations. On the other hand, CF is suitable
for most recommendable items; however, it suffers from the problems of scalability,
sparsity, and synonymy [19]. Nevertheless, these two techniques should not be
seen as competing with one another, but as complementary to each other. Many
systems have used both approaches, and thus took advantage of the benefits
of both while eliminating most, if not all, of their weaknesses. Fab [1] and
METIOREW [3], for example, use this hybrid approach to recommend Websites
meeting the users' interests.
In the past years, recommendation systems have witnessed a growing popularity in
the commercial field [8], [13], [21] and can be found at many e-commerce sites, such
as Amazon1 [13], CDNow2 and Netflix3. These commercial systems suggest products
to consumers based on previous transactions and feedbacks or based on the content of
the shopping cart. They are becoming part of a global e-marketing schema that can
enhance e-commerce sales by converting browsers to buyers, increasing cross-selling,
and building customer loyalty [22].
More recently, recommendation systems have entered the e-learning domain. In
[24], the system guides the learners by recommending online activities, based on their
profiles, their access history, and their collected navigation patterns. A pedagogy-
oriented paper recommender [23] was developed to recommend papers based on the
learners' profile and their expertise.
1 http://www.amazon.com
2 http://www.cdnow.com
3 http://www.netflex.com
Book recommenders could be helpful for students. While many book recommen-
dation systems have been implemented [8], [9], [13], to our knowledge none of them
are well adapted for e-learning since they exploit the user profile in its general basic
form, and not in its academic form. In other words, these systems are not using stu-
dent models.
In this paper we propose a book recommendation system, adapted to an e-learning
environment, taking into consideration the learning style of each student, so it can
predict the most pedagogically and academically suitable book for him. To maximize
the utility of the system, it should recommend books from the local university library
due to its easy access and the lack of any additional cost to the student.
4 The Methodology
We have applied the first two levels of the Pyramid Collaborative Filtering Approach
(PCFA) to the Université de Montréal library books: domain model filtering and user
model filtering.
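A rough sketch of these two levels as a filtering pipeline is given below, under the assumption that domain model filtering keeps books whose concept matches the query's domain and user model filtering ranks the rest by the choices of similar users; the function and field names are hypothetical.

# Hypothetical sketch of the two PCFA levels applied here: domain model
# filtering followed by user model filtering.
from typing import Dict, List, Set

def domain_model_filter(books: List[dict], query_domain: str) -> List[dict]:
    """Level 1: keep only books whose concept belongs to the query's domain."""
    return [book for book in books if book["domain"] == query_domain]

def user_model_filter(books: List[dict], similar_users: List[Dict[str, Set[str]]]) -> List[dict]:
    """Level 2: rank the remaining books by how many similar users selected them."""
    def votes(book: dict) -> int:
        return sum(book["id"] in user["selected_books"] for user in similar_users)
    return sorted(books, key=votes, reverse=True)

def recommend(books, query_domain, similar_users, top_n=10):
    return user_model_filter(domain_model_filter(books, query_domain), similar_users)[:top_n]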
We use the dominant meaning distance between a query and the concept of the book
to measure the closeness between them [17]. That is to say, the smaller the distance between
them, the more related they are. Suppose that c is a concept of a book and DM(c)
is the set of this concept's dominant meanings. Suppose also that DM(Q) is the
dominant meaning set of a query Q. So, the aim is to evaluate the books that have the
highest degree of similarity to the query Q. We can calculate the dominant meaning
similarity as follows,
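The displayed equation (1) is not reproduced here; one natural overlap-based form, using the sets DM(Q) and DM(c) introduced above (the authors' exact definition in [17] may differ), is

\[
\mathrm{sim}(Q, c) \;=\; \frac{\lvert DM(Q) \cap DM(c) \rvert}{\lvert DM(Q) \cup DM(c) \rvert},
\]

so that books whose concepts share more dominant meanings with the query are ranked higher.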
Since users’ profiles contain many attributes, several of them might have sparse or in-
complete data [7]; the task of finding appropriate similarities is usually difficult. To
avoid this situation, we classify users according to their learning styles. Following
[16], we distinguish several learning styles (LS): visual (V), auditory (A), kinesthetic
(K), visual & kinesthetic (VK), visual & auditory (VA), auditory & kinesthetic (AK)
and visual & auditory & kinesthetic (VAK). Therefore, we can calculate the learning
style similarity LSS between users as follows,
5 System Overview
This system is mainly implemented with Java (J2SE v1.4.2_04) and XML, on a Win-
dows NT environment. For the Web interface we have used the Java Servlet technol-
ogy v.2.3 and Tomcat v. 4.1.30. Essentially, the system is divided into 3 stages: the
offline or the data collection stage, the profile update stage, and the online stage. In
the following section, we will present a brief description of each.
4 http://www.loc.gov/z3950/agency/
tracks any changes in the user’s preferences and adapts its future recommendations
based on the additional data.
Online Stage
The online stage represents the interaction between the user and DIA during a Web
session. There are 3 key tasks:
User modeling: A first-time user has to register using the registration form. During
this process, following [16], the user is asked a series of questions and depending on
the answers, DIA classifies the user by his learning style. The system will associate
one of the following learning styles to each user: visual, auditory, kinesthetic, visual-
auditory, visual-kinesthetic, auditory-kinesthetic, and visual-auditory-kinesthetic
style. This learning style is then saved in the learner’s profile since it will be used in
the computation of the users’ similarity.
Search Process: Once the user is registered or logged-in, he has access to the
search interface. When the user submits the query Q, the system compares it with the
dominant meaning of the previously analyzed subjects during the offline stage. Then,
DIA looks for books that match this query. This is achieved by building an ordered
set of books using the similarity value seen in equation (1).
Recommendation: Based on the predicted learning style and the users' selected
books, DIA computes the most suitable books for the active learner. This task has two
main subtasks: the computation of the user similarity and the ranking of the pertinent
books. By “active learner”, we mean the user seeking the recommendation. We compute
the user similarity (SIM) as the average of the learning style similarity,
as seen in equation (2), and the dominant meaning distance between the
dominant meaning words W available in the users' profiles (see figure 2),
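A minimal sketch of this user similarity follows, assuming that equation (2) reduces to the fraction of shared V/A/K components and that SIM simply averages it with an overlap of the dominant meaning words; the actual equations in the paper may differ.

# Minimal sketch of the user similarity SIM: the learning style similarity is
# assumed to be the shared fraction of V/A/K components, and SIM is assumed to
# average it with a dominant-meaning word overlap (the paper's equations may differ).
from typing import Set

def learning_style_similarity(ls_a: Set[str], ls_b: Set[str]) -> float:
    """E.g. {'V', 'K'} vs {'V', 'A', 'K'} -> 2/3 shared components."""
    if not ls_a or not ls_b:
        return 0.0
    return len(ls_a & ls_b) / len(ls_a | ls_b)

def dominant_meaning_similarity(words_a: Set[str], words_b: Set[str]) -> float:
    """Overlap between the dominant meaning words stored in two user profiles."""
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def user_similarity(ls_a: Set[str], ls_b: Set[str],
                    words_a: Set[str], words_b: Set[str]) -> float:
    """SIM: average of the learning style and dominant meaning similarities."""
    return 0.5 * (learning_style_similarity(ls_a, ls_b)
                  + dominant_meaning_similarity(words_a, words_b))

# Example: a visual-kinesthetic learner compared with a VAK learner.
print(user_similarity({"V", "K"}, {"V", "A", "K"},
                      {"agent", "search", "library"}, {"agent", "planning"}))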
Basically, students borrow books from the university library so they could deepen
their knowledge in a domain, understand a course, or solve a special problem. By
submitting a simple query to the library database, they are faced with a huge number
of titles. Obviously, they do not have the time or the resources to choose the right
books. Some of them may feel the need to get personalized advice about which book
to look for.
Let us take the example of Frederic, a student following the artificial intelligence
course at the Université de Montréal. If he looks for books in the library, he will have
more than 900 books to choose from. Evaluating all of these books is not an easy task.
Since he wants the books most adapted to his learning style, he decides to use DIA to
get what he needs.
When Frederic enters the site for the first time, he must register. During this phase,
DIA will ask Frederic some questions so it can evaluate his learning style. The system
will save the obtained learning style in Frederic’s profile, so it can be accessed easily
the next time Frederic logs in.
When this process is finished, Frederic is invited to enter his search query. Since he
wants books about artificial intelligence, he submits to the system a query composed
of the following two keywords: “artificial” and “intelligence”. Consequently, DIA
searches the dominant meaning XML files so it can check to which domain this query
belongs. If the domain is found, using equation (3), the system looks for Frederic's
K most similar users (figure 3a) and recommends the books they have liked in the
past (figure 3b).
Finally, the list of the recommended books is shown to the user (figure 4).
Fig. 3. 3a illustrates the K most similar algorithm. 3b shows the application of the Book
Ranking Algorithm
not consider student models. In contrast, this paper takes into consideration not only
the contents of books but also students’ learning styles (visual, auditory, kinesthetic).
We have developed a book recommending agent called Discovering Intelligent
Agent: DIA. DIA employs the first two levels of the pyramid collaborative filtering
model to index, rank, and present books to students.
Even though the system is still under validation, a first test using 30 users showed some
promising results. However, DIA can be improved in many ways in order to increase
its accuracy. In the long run, we are going to apply all the levels of the pyramid
collaborative filtering model. Such an application could provide a useful service with
regard to the credibility and accuracy of books. We are also looking into ways to
integrate DIA into a global e-learning environment or into adaptive hypermedia environments,
since these systems usually have rich learners' profiles that can help DIA improve
its recommendations.
Finally, we are interested in means to generalize the recommendations, i.e. to be
able to recommend books from any university library using the Z39.50 protocol. This
protocol, which is used by many university libraries like McGill or Concordia Univer-
sity (Canada), enables the client to query the database server without any knowledge
of its structure. By implementing this protocol, DIA is able to access and search all
the libraries employing this standard, and thus allows the learner to select the univer-
sity library he wants recommendations from.
References
[1] Balabanovic M., and Shoham Y., Fab: Content-based, collaborative recommendation as
classification. Communications of the ACM, pp. 66-70, March 1997.
[2] Breese J. S., Heckerman D., and Kadie C., Empirical analysis of predictive algorithms
for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty
in Artificial Intelligence, UAI-98, pp. 43-52, San Francisco, USA, July 1998.
[3] Bueno D., Conejo R., and David A., METIOREW: An Objective Oriented Content Based
and Collaborative Recommending System. In Revised Papers from the international
Workshops OHS-7, SC-3, and AH-3 on Hypermedia: Openness, Structural Awareness,
and Adaptivity, pp. 310-314, 2002.
[4] Corfield A., Dovey M., Mawby R., and Tatham C., JAFER ToolKit project: interfacing
Z39.50 and XML. In Proceedings of the second ACM/IEEE-CS joint conference on
Digital libraries, pp. 289-290, Portland OR, USA, 2002.
[5] Dahlen B. J., Konstan J. A., Herlocker J. L., Good N., Borchers A., and Riedl J., Jump-
starting movielens: User benefits of starting a collaborative filtering system with “dead
data”. Technical Report TR 98-017, University of Minnesota, USA, 1998.
http://movielens.umn.edu
[6] Goldberg K., Roeder T., Gupta D., and Perkins C., Eigentaste: A constant time collabo-
rative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.
[7] Herlocker J. L., Konstan J. A., and Riedl J., Explaining Collaborative Filtering Recom-
mendations. In Proceedings of the ACM 2000 Conference on Computer Supported Coop-
erative Work, CSCW’00, pp. 241-250, Philadelphia PA, USA, 2000.
[8] Hirooka Y., Terano T., Otsuka Y., Recommending books of revealed and latent interests
in e-commerce. In Industrial Electronics Society, the 26th Annual Conference of the
IEEE, IECON 2000, pp. 1632-1637 vol: 3, Nagoya, Japan, October 2000.
[9] Huang Z., Chung W., Ong T., and Chen H., Studying users: A graph-based recommender
system for digital library. In Proceedings of the second ACM/IEEE-CS joint conference
on Digital libraries, pp. 65-73, Portland OR, USA, 2002.
[10] Joachims T., Freitag D., and Mitchell T., WebWatcher: A Tour Guide for the World
Wide Web. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelli-
gence, IJCAI97, pp. 770-777, Nagoya, Japan, 1997
[11] Konstan J. A., Miller B. N., Maltz D., Herlocker J. L., Gordon L. R., and Riedl J., Grou-
pLens: Applying collaborative filtering to Usenet news. Communications of the ACM 40
(3), pp. 77-87, 1997.
[12] Lieberman H., Letizia: An Agent That Assists Web Browsing. International Joint Con-
ference on Artificial Intelligence, IJCAI-95, pp. 924-929, Montreal, Canada, August
1995.
[13] Linden G., Smith B., and York J., Amazon.com recommendations: item-to-item collabo-
rative filtering. Internet Computing, IEEE, 7(1):76-80, 2003
[14] Lynch C. Personalization and Recommender Systems in the Larger Context: New Direc-
tions and Research Questions. Second DELOS Network of Excellence Workshop on Per-
sonalization and Recommender Systems in Digital Libraries, Dublin, Ireland, June 2001.
[15] Mooney R. J., and Roy L., Content-based book recommending using learning for text
categorization. In Proceedings of the Fifth ACM Conference on Digital Libraries, DL’00,
pp. 195–204, San Antonio TX, USA, June 2000. http://www.cs.utexas.edu/users/libra/
[16] Razek M. A., Frasson C., and Kaltenbach M., Using Machine Learning approach To
Support Intelligent Collaborative Multi-Agent System. Technologies de l’Information et
de la Communication dans les Enseignements d’ingénieurs et dans l’industrie, TICE2002,
Lyon, France, November 2002.
[17] Razek M. A., Frasson C., and Kaltenbach M., A Context-Based Information Agent for
Supporting Intelligent Distance Learning Environments. In the Twelfth International
World Wide Web Conference, Budapest, Hungary, May 2003.
[18] Razek A. M., Frasson C., and Kaltenbach M., Building an Effective Groupware System.
IEEE/ITCC 2004 International Conference on Information Technology, Las Vegas NV,
USA, April 2004.
[19] Sarwar B. M., Karypis G., Konstan J. A., and Reidl J., Analysis of recommendation algo-
rithms for e-commerce. In Proceedings of the ACM Conference on Electronic Com-
merce, pp. 158-167, New York NY, USA, 2000.
[20] Sarwar B. M., Karypis G., Konstan J. A., and Reidl J., Item-based collaborative filtering
recommendation algorithms. In Proceedings of the 10th International World Wide Web
Conference, WWW10, pp. 285-295, Hong Kong, May 2001.
[21] Schafer J. B., Konstan J. A., and Riedl J., Recommender systems in e-commerce. In Pro-
ceedings of the ACM Conference on Electronic Commerce, EC’99, pp. 158-166, Denver
CO, USA, November 1999.
[22] Schafer J., Konstan J., and Riedl J., E-commerce recommendation applications. Data
Mining and Knowledge Discovery, pp. 115-153 vol:5, 2001.
[23] Tang T.Y., and McCalla G., Towards Pedagogy-Oriented Paper Recommendation and
Adaptive Annotations for a Web-Based Learning System. In the 18th International Joint
Conference on Artificial Intelligence, Workshop on Knowledge Representation and
Automated Reasoning for E-Learning Systems, IJCAI-03, pp. 72-80, Acapulco, Mexico,
August 2003
[24] Zaïane O. R., Building a Recommender Agent for e-Learning Systems. In Proceedings of
the 7th International Conference on Computers in Education, ICCE 2002, pp. 55-59,
Auckland, New Zealand, December 2002.
Developing Learning by Teaching Environments That
Support Self-Regulated Learning
Abstract. Betty’s Brain is a teachable agent system in the domain of river eco-
systems that combines learning by teaching and self-regulation strategies to
promote deep learning and understanding. Scaffolds in the form of hypertext
resources, a Mentor agent, and a set of quiz questions help novice students
learn and self-assess their own knowledge. The computational architecture is
implemented as a multi-agent system to allow flexible and incremental design,
and to provide a more realistic social context for interactions between students
and the teachable agent. An extensive study that compared three versions of the system (a tutor-only version, learning by teaching, and learning by teaching with self-regulation strategies) demonstrates the effectiveness of learning by teaching environments and the impact of self-regulation strategies in improving novice learners' preparation for learning.
1 Introduction
This paper describes the design of a teachable agent system, Betty's Brain, and reports the results of an experiment
that manipulated the metacognitive support students received when teaching the agent
to determine its effects on the students’ abilities to subsequently learn new content
several weeks later.
Studies of expertise have shown that knowledge needs to be connected and organ-
ized around important concepts, and these structures should support transfer to other
contexts. Other studies have established that improved learning happens when the
students take control of their own learning, develop metacognitive strategies to assess
what they know, and acquire more knowledge when needed. Thus the learning proc-
ess must help students build new knowledge from existing knowledge (constructivist
learning), guide students to discover learning opportunities while problem solving
(exploratory learning), and help them to define learning goals and monitor their prog-
ress in achieving them (metacognitive strategies).
The cognitive science and education research literature supports the idea that
teaching others is a powerful way to learn. Research in reciprocal teaching, peer-
assisted tutoring, small-group interaction, and self-explanation hint at the potential of
learning by teaching [3,4]. The literature on tutoring has shown that tutors benefit as
much from tutoring as their tutees [5]. Biswas et al. [6] report that students preparing
to teach made statements about how the responsibility to teach forced them to gain
deeper understanding of the materials. Other students focused on the importance of
having a clear conceptual organization of the materials.
Teaching is a problem solving activity [7]. Learning-by-teaching is an open-ended
and self-directed activity, which shares a number of characteristics with exploratory
and constructivist learning. A natural goal for effective teaching is to gain a good un-
derstanding of domain knowledge before teaching it to others. Teaching also in-
cludes a process for structuring knowledge in communicable form, and reflecting on
interactions with students during and after the teaching task [5]. Good learners bring
structure to a domain by asking the right questions to develop a systematic flow for
their reasoning. Good teachers build on the learners’ knowledge to organize informa-
tion, and in the process, they find new knowledge organizations, and better ways for
interpreting and using these organizations in problem solving tasks. From a system
design and implementation viewpoint, this brings up an interesting question: “How do
we design learning environments based on the learning by teaching paradigm?” This
has led us to look more closely at the work on pedagogical and intelligent agents as a
mechanism for modeling and analyzing student-teacher interaction.
Intelligent agents have been introduced into learning environments to create better
and more human-like support for exploratory learning and social interactions between
tutor and tutee [8,9]. Pedagogical agents are defined as “animated characters designed
to operate in an educational setting for supporting and facilitating learning” [8]. The
agent adapts to the dynamic state of the learning environment, and it makes the user
aware of learning opportunities as they arise, much as a human mentor can. Agents use
speech, animation, and gestures to extend the traditional textual mode of interaction,
and this may increase students’ motivation and engagement. They can gracefully
combine individualized and collaborative learning, by allowing multiple students and
their agents to interact in a shared environment [10]. However, the locus of control
stays with the intelligent agent, which plays the role of the teacher or tutor.
Recently, there have been efforts to implement the learning by teaching paradigm
using agents that learn from examples, advice, and explanations provided by the stu-
dent-teacher [11]. A primary limitation of these systems is that the knowledge struc-
tures and reasoning mechanisms used by the agent are not made visible to the student,
so students find it difficult to uncover, analyze, and learn from their interactions
with the agent. Moreover, some of the systems provide outcome feedback or no feed-
back at all. It is well known that outcome feedback is less effective in supporting
learning and problem solving than cognitive feedback [12].
On the positive side, students like interacting with these agents. Some studies
showed increased motivation but it was not clear that this approach helped achieve
deep understanding of complex domain material. We have adopted a new approach to
designing learning by teaching environments that supports constructivist and ex-
ploratory activities, and at the same time suggests the use of metacognitive strategies
to promote learning that involves deep understanding and transfer.
Betty’s Brain provides important visual structures that are tailored to a specific form
of knowledge organization and inference to help shape the thinking of the learner-as-
teacher. In general, our agents try to embody four principles of design: (i) they teach
through visual representations that organize the reasoning structures of the domain;
(ii) they build on well-known teaching interactions to organize student activity; (iii)
they ensure the agents have independent performances that provide feedback on how
well they have been taught, and (iv) they keep the start-up costs of teaching the agent
very low (as compared to programming). This is achieved by implementing only one
modeling structure with its associated reasoning mechanisms.
Betty’s Brain makes her qualitative reasoning visible through a dynamic, directed
graph called a concept map [13]. The fact that the TA environments represent knowl-
edge structures rather than the referent domain is a departure from many simulation-
based learning environments. Simulations often show the behavior of a process, for
example, how an algal bloom increases the death of fish. On the other hand, TAs
simulate the behavior of a person’s thoughts about a system. Learning empirical facts
is important, but learning to use the expert structure that organizes those facts is
equally important. Therefore, we have structured the agents to simulate particular
forms of thought that help teacher-students structure their thinking about a domain.
Betty’s Brain is designed to teach middle school students about the concepts of in-
terdependence and balance in river ecosystems [6,14]. Fig. 1 illustrates the interface
of Betty’s Brain. Students explicitly teach Betty, using the Teach Concept, Teach
Link and Edit buttons to create and modify their concept maps in the top pane of the
window. Once taught, Betty can reason with her knowledge and answer questions.
Users can formulate queries using the Ask button, and observe the effects of their
teaching by analyzing Betty’s answers. Betty provides explanations for her answers
by depicting the derivation process using multiple modalities: text, animation, and
speech. Betty uses qualitative reasoning to derive her answers to questions through a
chain of causal inferences. Details of the reasoning and explanation mechanisms in
Betty’s Brain are presented elsewhere [15].
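Betty's causal reasoning can be pictured as the propagation of qualitative changes along the signed links of the concept map. The following sketch is our own minimal illustration of that idea, not the published Betty's Brain implementation; the class and method names (ConceptMap, teach_link, ask) and the ecosystem links are invented for the example.

```python
# Illustrative sketch of qualitative reasoning over a causal concept map.
# This is a simplification for illustration, not the published Betty's Brain
# code; the names and the example links are assumptions made for the example.
from collections import deque

class ConceptMap:
    def __init__(self):
        self.links = {}  # source concept -> list of (target concept, +1 or -1)

    def teach_concept(self, name):
        self.links.setdefault(name, [])

    def teach_link(self, src, dst, sign):
        """sign = +1 for 'increase causes increase', -1 for 'increase causes decrease'."""
        self.teach_concept(src)
        self.teach_concept(dst)
        self.links[src].append((dst, sign))

    def ask(self, concept, change=+1):
        """Propagate a qualitative change (+1/-1) and return the inferred effects."""
        effects = {concept: change}
        queue = deque([concept])
        while queue:
            current = queue.popleft()
            for target, sign in self.links.get(current, []):
                if target not in effects:          # follow each causal chain once
                    effects[target] = effects[current] * sign
                    queue.append(target)
        return effects

# Example: teaching a fragment of a river ecosystem and asking a question.
betty = ConceptMap()
betty.teach_link("algae", "dissolved oxygen", +1)
betty.teach_link("fish", "dissolved oxygen", -1)
betty.teach_link("dissolved oxygen", "fish", +1)
print(betty.ask("algae", +1))  # algae up -> dissolved oxygen up -> fish up
```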
The visual display of the face with animation in the lower left is one way in which
the user interface attempts to provide engagement and motivation to users by in-
creasing social interactions with the system. We should clarify that Betty does not use
machine learning algorithms to achieve automated learning. Our focus is on the well-
defined schemas associated with teaching that support a process of instruction, as-
sessment, and remediation. These schemas help organize student interaction with the
computer.
To accommodate students who are novices in the domain knowledge and in
teaching, the learning environment provides a number of scaffolds and feedback
mechanisms. The scaffolds are in the form of well-organized online resources, struc-
tured quiz questions that support users in systematically building their knowledge,
and Mentor feedback that is designed to provide hints on domain concepts along with
strategies on how to learn and how to teach. We adopted the framework of self-
regulated learning, described by Zimmerman [16] as situations where students are
“metacognitively, motivationally, and behaviorally active participants in their own learning
process.” Self-regulated learning strategies involve actions and processes that can
help one to acquire knowledge and develop problem-solving skills [17]. Zimmerman
describes a number of self-regulated learning skills that include goal setting and plan-
ning, seeking information, organizing and transforming, self-consequating, and keeping records and monitoring.
With time, as we refined the system, it became clear that an incremental, modularized design strategy was required to minimize the code changes needed each time the system was refined further. We turned to multi-agent architectures to achieve this goal. The current multi-agent architecture in Betty's Brain is organized into four agents: the teachable agent (Betty), the mentor agent (Mr. Davis), and two auxiliary agents, the student agent and the environment
agent. The last two agents help achieve greater flexibility by making it easier to up-
date the scenarios in which the agents operate without having to recode the communi-
cation protocols. The student agent represents the interface of the student teacher into
the system. It provides facilities that allow the user to manipulate environmental
functions and to teach the teachable agent.
All agents interact through the Environment Agent, which acts as a “Facilitator.”
This agent maintains information about the other agents and the services they pro-
vide. When an agent sends a request to the Environment Agent, the Environment Agent decomposes the request if different parts are to be handled by different agents, forwards the parts to the respective agents, and translates the communicated information to match each agent's vocabulary.
A variation of the FIPA ACL agent communication language [18] is used for agent
communication. Each message sent by an agent contains a description of the message,
message sender, recipient, recipient class, and the actual content of the message.
Communication is implemented using a Listener interface, where each agent listens
only for messages from the Environment Agent and the Environment Agent listens
for messages from all other agents.
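A minimal sketch of this facilitator-style routing is shown below, assuming a simplified message format loosely modeled on the FIPA ACL fields named above (performative, sender, recipient, content); the class names and the example query are our own illustration, not the actual Betty's Brain code.

```python
# Minimal sketch of facilitator-style message routing (illustrative only,
# not the actual Betty's Brain code). Messages loosely follow the FIPA ACL
# idea of carrying a performative, sender, receiver, and content.
class Message:
    def __init__(self, performative, sender, receiver, content):
        self.performative = performative
        self.sender = sender
        self.receiver = receiver
        self.content = content

class Agent:
    def __init__(self, name):
        self.name = name

    def on_message(self, message):
        # Each agent listens only for messages forwarded by the facilitator.
        print(f"{self.name} received {message.performative}: {message.content}")

class EnvironmentAgent:
    """Acts as the facilitator: all agents register here and talk only to it."""
    def __init__(self):
        self.registry = {}

    def register(self, agent):
        self.registry[agent.name] = agent

    def route(self, message):
        # A real facilitator would also decompose compound requests and
        # translate vocabulary; here we only forward to the named recipient.
        self.registry[message.receiver].on_message(message)

env = EnvironmentAgent()
for name in ("Betty", "Mr. Davis", "Student"):
    env.register(Agent(name))
env.route(Message("query", "Student", "Betty",
                  "What happens to fish if algae increase?"))
```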
contains two components: the reasoner and the emotion generator. It performs rea-
soning tasks (e.g., answering questions) and updates the state of the agent. The
Executive posts multimedia (speech, text, graphics, animation) information from an
agent to the environment. This includes the agent’s answer to a question, explanation
of an answer and other dialog with the user. The Executive is made up of Agent
Speech and Agent View, which handle speech and visual communication, respec-
tively.
5 Experiments
An experiment was designed for fifth graders in a Nashville school to compare three
different versions of the system. The version 1 baseline system (ITS) did not involve
any teaching. Students interacted with the mentor, Mr. Davis, who asked them to con-
struct concept maps to answer three sets of quiz questions. The quiz questions were
ordered to meet curricular guidelines. When students submitted their maps for a quiz,
Mr. Davis, the pedagogical agent, provided feedback based on errors in the quiz an-
swers, and suggested how the students may correct their concept maps to improve
their performance. The students taught Betty in the version 2 and 3 systems. In the
version 2 (LBT) system, students could ask Betty to take a quiz after they taught her,
and the mentor provided the same feedback as in the ITS system. Here the feedback
was given to Betty because she took the quiz. The version 3 (SRL) system had the
new, more responsive Betty with self-regulation behavior (section 3), and a more ex-
tensive mentor agent, who provided help on how to teach and how to learn in addition
to domain knowledge. But this group had to explicitly query Mr. Davis to receive any
feedback. Therefore, the SRL condition was set up to develop more active learners by
promoting the use of self-regulation strategies. The ITS condition was created to
contrast learning by teaching environments with tutoring environments. The two
other groups, LBT and SRL, were told to teach Betty and help her pass a test so she
could become a member of the school Science club. Both groups had access to the
query and quiz features. All three groups had access to identical resources on river
ecosystems, the same quiz questions, and the same access to the Mentor agent, Mr.
Davis.
The two primary research questions we set out to answer were:
1. Are learning by teaching environments more effective in helping students to
learn independently and gain deeper understanding of domain knowledge than peda-
gogical agents? More specifically, would LBT and SRL students gain a better under-
standing of interdependence and balance among the entities in river ecosystems than
ITS students? Further, would SRL students demonstrate deeper understanding and
better ability in transfer, both of which are hallmarks of effective learning?
2. Does self-regulated learning enhance learning in learning by teaching environ-
ments? Self-regulated learning should be an effective framework for providing feed-
back because it promotes the development of higher-order cognitive skills [17] and it
is critical to the development of problem solving ability [13]. In addition, cognitive
feedback is more effective than outcome feedback for decision-making tasks [10].
Cognitive feedback helps users monitor their learning needs (achievement relative to
goals) and guides them in achieving their learning objectives (cognitive engagement
by applying tactics and strategies).
Experimental Procedure
The fifth grade classroom in a Nashville Metro school was divided into three equal
groups of 15 students each using a stratified sampling method based on standard
achievement scores in mathematics and language. The students worked on a pretest
with twelve questions before they were separately introduced to their particular ver-
sions of the system. The three groups worked for six 45-minute sessions over a period
of three weeks to create their concept maps. All groups had access to the online re-
sources while they worked on the system.
At the end of the six sessions, every student took a post-test that was identical to
the pretest. Two other delayed posttests were conducted about seven weeks after the
initial experiment: (i) a memory test, where students were asked to recreate their eco-
system concept maps from memory (there was no help or intervention when per-
forming this task), and (ii) a preparation for future learning transfer test, where they
were asked to construct a concept map and answer questions about the land-based ni-
trogen cycle. Students had not been taught about the nitrogen cycle, so they would
have to learn from resources during the transfer phase.
In this study, we focus on the results of the two delayed tests, and the conclusions
we can draw from these tests on the students’ learning processes. As a quick review
of the initial learning, students in all conditions improved from pre- to posttest on
their knowledge of interdependence (p's < .01, paired t-tests), but not in their under-
standing of ecosystem balance. There were few differences between conditions in
terms of the quality of their maps (the LBT and SRL groups had a better grasp of the
role of bacteria in processing waste at posttest). However, there were notable differ-
ences in their use of the system during the initial learning phase.
Fig. 3. Resource Requests, Queries Composed, and Quizzes Requested per session
Fig. 3 shows the average number of resource, query, and quiz requests per session
by the three groups. It is clear from the plots that the SRL group made a slow start as
compared to the other two groups. This can primarily be attributed to the nature of the
feedback; i.e., the ITS and LBT groups received specific content feedback after a
quiz, whereas the SRL group tended to receive more generic feedback that focused on
self-regulation strategies. Moreover, in the SRL condition, Betty would refuse to take
a quiz unless she felt the user had taught her enough, and prepared her for the quiz by
asking questions. After a couple of sessions the SRL group showed a surge in map
creation and map analysis activities, and their final concept maps and quiz perform-
ance were comparable to the other groups. It seems the SRL group spent their first
few sessions in learning self-regulation strategies, but once they learned them their
performance improved significantly. Table 1 presents the mean number of expert
concepts and expert causal links in the student maps for the delayed memory test. Re-
sults of an ANOVA test on the data, with Tukey’s LSD to make pairwise comparisons
showed that the SRL group recalled significantly more links that were also in the ex-
pert map (which nobody actually saw).
We thought that the effect of SRL would not be to improve memory, but rather to
provide students with more skills for learning subsequently. When one looks at the
results of the transfer task in the test on preparation for future learning, the differ-
ences between the SRL group and the other two groups are significant. Table 2 sum-
marizes the results of the transfer test, where students read resources and created a
concept map for the land-based nitrogen cycle with very little help from the Mentor
agent (a topic they had not studied previously). The Mentor agent’s only feed-
back was on the correctness of the answers to the quiz questions. All three groups re-
ceived the same treatment. There are significant differences in the number of expert
concepts in the SRL versus ITS group maps, and the SRL group had significantly
more expert causal links than the LBT and ITS groups. Teaching self-regulation strategies thus had a clear impact on the students' ability to learn a new domain.
6 Conclusions
The results demonstrate the significant positive effects of SRL strategies in under-
standing and transfer in a learning by teaching environment. We believe that the dif-
ferences between the SRL and the other two groups would have been even more pro-
nounced if the transfer test study had been conducted over a longer period of time.
Last, we believe that the concept map and reasoning schemes have to be extended to
References
[1] Wenger, E. (1987). Artificial Intelligence and Tutoring Systems. Los Altos, California:
Morgan Kaufmann Publishers.
[2] Brusilovsky, P. (1999). Adaptive and Intelligent Technologies for Web-based Education,
Special Issue on Intelligent Systems and Teleteaching, C. Rollinger and C. Peylo (eds.),
4: 19-25.
[3] Palincsar, A. S. & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering
and comprehension-monitoring activities. Cognition and Instruction, 1: 117-175.
[4] Chi, M. T. H., De Leeuw, N., Mei-Hung, C., & LaVancher, C. (1994). Eliciting self-
explanations. Cognitive Science, 18: 439-477.
[5] Chi, M. T. H., et al. (2001). “Learning from Human Tutoring.” Cognitive Science 25(4):
471-533.
[6] Biswas, G., Schwartz, D., Bransford, J., & The Teachable Agents Group at Vanderbilt
University. (2001). Technology Support for Complex Problem Solving: From SAD Envi-
ronments to AI. In Forbus & Feltovich (eds.), Smart Machines in Education, 71-98.
Menlo Park, CA: AAAI Press.
[7] Artzt, A. F. and E. Armour-Thomas (1999). “Cognitive Model for Examining Teachers’
Instructional Practice in Mathematics: A Guide for Facilitating Teacher Reflection.”
Educational Studies in Mathematics 40(3): 211-335.
[8] G. Clarebout, J. Elen, W. L. Johnson, and E. Shaw. (2002). “Animated Pedagogical
Agents: An Opportunity to be Grasped?” Journal of Educational Multimedia and Hyper-
media, 11: 267-286.
[9] Johnson W., Rickel, J.W., and Lester J.C. (2001). “Animated Pedagogical Agents: Face-
to-Face Interaction in Interactive Learning Environments”, International Journal of Arti-
ficial Intelligence in Education 11: 47-78
[10] Moreno, R. & Mayer, R. E. (2002). Learning science in virtual reality multimedia envi-
ronments: Role of methods and media. Journal of Educational Psychology, 94: 598-610.
[11] Nichols, D. M. (1994). Intelligent Student Systems: an application of viewpoints to in-
telligent learning environments, Ph.D. thesis, Lancaster University, Lancaster, UK.
[12] Butler, D. L. and P. H. Winne (1995). “Feedback and Self-Regulated Learning: A Theo-
retical Synthesis.” Review of Educational Research 65(3): 245-281.
[13] Novak, J.D. (1996). Concept Mapping as a tool for improving science teaching and
learning, in Improving Teaching and Learning in Science and Mathematics, D.F.
Treagust, R. Duit, and B.J. Fraser, eds. Teachers College Press: London. 32-43.
[14] Leelawong, K., et al. (2003), “Teachable Agents: Learning by Teaching Environments
for Science Domains,” Proc. Innovative Applications of Artificial Intelligence Conf,
Acapulco, Mexico, 109-116.
[15] Leelawong, K., Y. Wang, et al. (2001). Qualitative reasoning techniques to support
learning by teaching: The Teachable Agents project. International Workshop on
Qualitative Reasoning, San Antonio, Texas. AAAI Press. 73-80.
Adaptive Interface Methodology for Intelligent Tutoring Systems
G. Curilem, F.M. de Azevedo, and A.R. Barbosa
Didactic Ergonomy relates Characteristics and Attributes [15]. For example, visual students receive more images or animations, and active students receive more exploration environments. Table 1 establishes the relationship between Attributes and Char-
acteristics. The Tutor Module stores the knowledge in this table, which represents the pedagogical conceptions of the human designer. It is worth noting that all these conceptions, as well as the variables, can be changed depending on the specific TLP and on the pedagogical conceptions of the educators in charge. An adaptation mechanism was created to store and correctly process this knowledge.
An ITS can be described as an automaton A = (X, xo, U, Y, δ, λ), where:
X: Student’s Model: is the finite set of system’s states. Each state corresponds to a
Student’s Model formed by the set of detected characteristics. The models are inferred
by the systems, during pedagogical activities.
xo: Student’s Initial Model: is the initial state of the system. This state corresponds
to a default model or to an initial diagnosis of the apprentice and is the starting point
for the system’s operation.
U: Student’s Action: is the finite set of inputs. Each input is a user action on the
interface. The set is formed by the commands, selections, questions, etc. requested by
the user. The user acts through the interface by means of controls (menus, icons, etc.)
or commands.
Y: System’s Interface: is the finite set of outputs. Each output is a specific inter-
face. The outputs depend on the selected attributes. To configure the output, the sys-
tem evaluates the user’s actions (inputs) and the Student’s Model (state).
δ: X × U → X is the state transition function. Depending on the apprentice's actions (inputs) and on the ITS's pedagogical knowledge, a new Student's Model can be reached (new model).
λ: X × U → Y is the output function. Given an input and a specific state, the ITS's pedagogical knowledge determines how to configure the next screen (new output).
As these elements define an automaton [16], it can be concluded that the behavior of an Intelligent Tutoring System can be modeled by means of the automaton defined by the sets U, Y, and X, by the initial state xo, and by the functions δ and λ. The theorem is therefore demonstrated.
Didactic Ergonomics defines the six elements of this automaton: the attributes of the interface define the inputs and outputs of the system; the apprentice's characteristics define the states; the pedagogical conceptions determine the δ and λ functions. So, didactic ergonomics can be implemented as an ITS.
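Operationally, this formalization is a Mealy-style machine whose state is the Student's Model, whose inputs are student actions, and whose outputs are interface configurations. A minimal sketch follows, assuming invented placeholder states, actions, and outputs; it only illustrates the roles of δ and λ.

```python
# Minimal sketch of the ITS-as-automaton view (illustrative only; the
# concrete states, actions, and outputs below are invented placeholders).
class ITSAutomaton:
    def __init__(self, x0, delta, lam):
        self.state = x0      # current Student's Model
        self.delta = delta   # (state, action) -> new state
        self.lam = lam       # (state, action) -> interface configuration

    def step(self, action):
        output = self.lam(self.state, action)   # configure the next screen
        self.state = self.delta(self.state, action)  # update the Student's Model
        return output

# Hypothetical pedagogical knowledge: visual students get images,
# active students get exploration environments.
def delta(state, action):
    return "active" if action == "opens_exploration" else state

def lam(state, action):
    return {"visual": "show_images", "active": "show_exploration"}[state]

its = ITSAutomaton("visual", delta, lam)
print(its.step("clicks_next"))          # show_images
print(its.step("opens_exploration"))    # show_images; state becomes 'active'
print(its.step("clicks_next"))          # show_exploration
```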
The IAC (Interactive Activation and Competition) ANN type is an Associative Mem-
ory like ANN whose original model was proposed by [17]. In this model, neurons
representing specific “properties” or “values” are grouped in categories called pools
representing “concepts”. These pools, called visible pools, receive excitations from
the exterior. Connections among groups are carried out by means of an intermediary
pool, also called the mirror pool (or hidden pool), because it is a copy of the biggest pool
of the net. This pool has no connections with the exterior and its function is to spread
the activations among groups, contributing with a new competition instance. The
connections among neurons are bi-directional and have weights that can take only the
values –1 (inhibition), 1 (activation) or 0. Inside a pool, the connections are always
inhibitory, taking the value -1, so the neurons compete with each other, resulting in a
winner (the “competition” of IAC). Among groups, the connections can be excitatory
taking the value 1, or null. When two neurons have excitatory connections, if one is
excited, the other one will be excited also (the “activation” of IAC). The connection’s
weights constitute a symmetric matrix W of dimension m × m, where m is the number of neurons in the network. So, if there is a connection from neuron i to neuron j, there is also a connection with the same value from neuron j to neuron i. As a
result, processing becomes interactive since processing in a given pool influences and
is influenced by processing in other pools (the “interactive” of IAC). Figure 1 shows
the IAC original model.
In this model, knowledge is not distributed among the weights of the net, like in
most ANN. Here knowledge is represented by the processing neurons, organized in
groups and by the connections among them.
As in many models, the net input of a neuron i is the weighted sum of the influences of the neurons connected to it and of the external input, as shown in (1):

net_i = Σ_j w_ij a_j + ext_i   (1)

where w_ij represents the weight between neuron i and neuron j, a_j are the outputs from the other neurons, and ext_i are the external inputs.
It is observed that the new activation depends on the current activation and on its variation. The variation of the activation is proportional to the net input coming from the other neurons, to the external input, and to the decay factor, as shown in (4). The parameters max, min, decay, and rest of equation (4) define the maximum value, the minimum value, the decay factor, and the rest value of the neurons, respectively. The decay tends to return the neurons to their rest value.
The computer model has other parameters, such as α, γ, and estr, which weight the influences of the activations, of the inhibitions, and of the external input that arrive at each neuron. These parameters affect all the neurons at the same time.
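Since equations (1)-(4) are not reproduced here, the following sketch assumes the usual form of the IAC update rule as given in Rumelhart and McClelland's handbook [17]; the parameter values and the two-neuron example are illustrative only, not the authors' exact configuration.

```python
# Sketch of the standard IAC activation update (after Rumelhart & McClelland [17]);
# this is an assumption about the usual form of equations (1)-(4), not a
# transcription of the authors' exact equations.
import numpy as np

def iac_step(a, W, ext, max_=1.0, min_=-0.2, rest=-0.1, decay=0.1,
             alpha=0.1, gamma=0.1, estr=0.4):
    """One synchronous update of the activation vector a."""
    excitation = alpha * (W.clip(min=0) @ a.clip(min=0))   # positive influences
    inhibition = gamma * (W.clip(max=0) @ a.clip(min=0))   # negative influences
    net = excitation + inhibition + estr * ext              # eq. (1): net input
    delta = np.where(net > 0,
                     (max_ - a) * net - decay * (a - rest),  # eq. (4), net > 0
                     (a - min_) * net - decay * (a - rest))  # eq. (4), net <= 0
    return np.clip(a + delta, min_, max_)

# Tiny example: two mutually inhibitory neurons, one receiving external input.
W = np.array([[0.0, -1.0], [-1.0, 0.0]])
a = np.full(2, -0.1)
ext = np.array([1.0, 0.0])
for _ in range(60):                                          # 60 cycles, as in the text
    a = iac_step(a, W, ext)
print(a)   # the externally driven neuron wins the competition
```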
In contrast to other paradigms, where the main problem is the learning process, in IAC networks the design task consists of defining the best topology to represent a given problem. The design process does not include a phase of weight adjustment, also known as a learning phase. Since total inhibition is required among the neurons inside a group, the task of finding an appropriate topology is not trivial and, in many cases, impossible.
The “A” model [18] of the IAC network was developed to overcome this restriction. In this model, the connections can take fuzzy values in the interval [-1, 1]. Negative values represent inhibition and positive values represent activation. The absolute value of a weight represents the strength of the influence between two neurons. Inside the groups, the weights are negative, so the neurons compete with each other. Among the groups, the values of the weights depend on the strength of the relationship between the corresponding neurons.
The equations and parameters of the “A” model are similar to Rumelhart’s. Nevertheless, the weights have to be adjusted through an activity similar to the knowledge engineering used in the implementation of expert systems [18]. Since no standard learning algorithm exists for this model, the adjustment of the weights is carried out manually. The specialist must determine the values and signs of the weights among all the neurons. This task is complex, because between –1 and 1 there are infinitely many possible combinations. De Azevedo [18] demonstrated that the IAC ANN behaves like an automaton.
To satisfy the requirements of didactic ergonomics, the AM should store the pedagogical knowledge of the ITS. To do so, the AM should fulfill three indispensable properties: parallelism, bi-directionality, and the treatment of uncertainty. The first guarantees that the apprentice's characteristics are all processed simultaneously by the system. Bi-directionality allows the AM to configure the Interface starting from the apprentice's characteristics (function λ), but also to update the Student's Model (function δ) according to the apprentice's actions. The treatment of uncertainty allows reasonable outputs to be obtained from incomplete or uncertain input data.
Several aspects make IAC networks suitable for the implementation of the AM. The first is the automaton formalization that relates the two approaches. IAC networks also fulfill the three requirements stated above. The structure of pools, whose concepts or neurons compete internally and activate other concepts externally, offers a natural representation of the problem, increasing the system's selectivity toward particular student stereotypes.
To implement the IAC network, the variables of the problem (Characteristics and Attributes) were represented by neurons and their relationships by weights. The groups of neurons were formed from mutually exclusive categories, as shown in Table 2, where some of the groups are presented.
The parameters of the net were configured as max = 1, min = -0.2, and estr = 0.4, with 60 cycles. The most difficult task was to determine the weights of the net, which represent the pedagogical conceptions stored in Table 1.
Two kinds of tests were developed to adjust the weights consistently with the pedagogical conceptions:
λ Tests: the Characteristics are placed at the net input and the activated Attributes are analyzed at the output (the λ function).
δ Tests: the Attributes are placed at the input of the net and the activated Characteristics are analyzed at the output (the δ function).
The IAC network performed correctly on 94% of the λ tests and 70% of the δ tests. For this last case, the reasons for the errors were identified, so corrections can be made in future versions. The main conclusion of the simulations is that an IAC network can implement the Adaptive Mechanism.
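The two test protocols can be sketched as follows. For simplicity, a plain symmetric weight matrix stands in for the full IAC dynamics, and the characteristic and attribute labels, weights, and threshold are invented; the point is only to show characteristics-to-attributes (λ) and attributes-to-characteristics (δ) querying over the same weights.

```python
# Simplified sketch of the two test protocols used to tune the weights
# (lambda tests: characteristics in, attributes out; delta tests: attributes in,
# characteristics out). A plain weight matrix stands in for the full IAC
# dynamics; the labels and weights below are invented.
import numpy as np

characteristics = ["visual", "active"]
attributes = ["images", "animations", "exploration"]

# W[c, a]: strength of the relation between characteristic c and attribute a.
W = np.array([[0.9, 0.7, 0.1],    # visual
              [0.1, 0.2, 0.8]])   # active

def lambda_test(active_characteristics, threshold=0.5):
    """Clamp characteristics, read which attributes become active."""
    c = np.array([1.0 if name in active_characteristics else 0.0
                  for name in characteristics])
    activation = c @ W
    return [a for a, v in zip(attributes, activation) if v > threshold]

def delta_test(active_attributes, threshold=0.5):
    """Clamp attributes, read which characteristics become active (bidirectionality)."""
    a = np.array([1.0 if name in active_attributes else 0.0
                  for name in attributes])
    activation = W @ a
    return [c for c, v in zip(characteristics, activation) if v > threshold]

print(lambda_test({"visual"}))        # ['images', 'animations']
print(delta_test({"exploration"}))    # ['active']
```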
4 ITS Design
The methodology obtained from this work establishes how to model a TLP and how to design each ITS module. The design method is summarized next.
Student Module stores the Student’s Models (States) and is formed by the appren-
tice’s characteristics. The methodology uses two ways to update the Student’s Model.
First, to obtain the initial State, the ITS presents a diagnostic activity implemented by
questionnaires. Once the system has identified the apprentice, the initial state is reached,
the corresponding environments are configured and the system is ready to work. The
second way to update the Student’s Model occurs during the TLP. The Tutor Module
evaluates the student’s actions and updates the model using the δ function.
Tutor Module stores the pedagogical knowledge (the δ and λ functions) and is imple-
mented by an “A” model IAC ANN. This module is permanently processing the ap-
prentice’s inputs to determine changes at the outputs or states. Once the Initial Stu-
dent’s Model is established, the Tutor Module configures the interface and waits for
the student's actions. If the student's actions are consistent with the tutor's plan, the outputs are updated and the current state is maintained. If the student's actions change the tutor's plan, by selecting new attributes of the interface (another topic or other media, for example), the state changes: the Tutor Module analyses the new attributes and updates the Student's Model, reaching a new state that will influence future presenta-
tions.
Specialist Module stores the contents. To facilitate the design and implementation
of this module, contents are structured as a set of topics. Each topic is stored in sev-
eral files called nodes. Each node contains the topic’s information using a specific
medium and a specific pedagogical activity (Figure 2a). Buttons and other controls are
also available as nodes. Links between nodes represent the relationship: “next node to
be presented”. The establishment of the links is dynamic and depends on the Stu-
dent’s Model and actions (Figure 2b). At the end of the process the system generates
a specific graph for each student (Figure 2c). To facilitate the management of the
great variety of interface attributes, each node must be stored in an independent file.
A database allows the Tutor Module to load in the screen the files corresponding to
each attribute activated by the specific Student’s Model. The design process is facili-
tated by the construction of a table that stores all the possible files needed to satisfy a
specific TLP.
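A sketch of the dynamic "next node to be presented" selection might look as follows; the node files, media types, and the selection rule are invented placeholders, not the system's actual database schema.

```python
# Sketch of the dynamic "next node to be presented" selection (illustrative only;
# node names, media types, and the selection rule are invented placeholders).
nodes = [
    {"file": "diet_intro_text.html",  "topic": "diet", "media": "text",      "activity": "tutorial"},
    {"file": "diet_intro_video.html", "topic": "diet", "media": "animation", "activity": "tutorial"},
    {"file": "diet_explore.html",     "topic": "diet", "media": "animation", "activity": "exploration"},
]

def next_node(student_model, current_topic):
    """Pick the next node for the current topic according to the Student's Model."""
    preferred_media = "animation" if "visual" in student_model else "text"
    preferred_activity = "exploration" if "active" in student_model else "tutorial"
    candidates = [n for n in nodes if n["topic"] == current_topic]
    # Prefer nodes matching both the media and the activity the model calls for.
    candidates.sort(key=lambda n: (n["media"] != preferred_media,
                                   n["activity"] != preferred_activity))
    return candidates[0]["file"]

print(next_node({"visual", "active"}, "diet"))   # diet_explore.html
```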
Interface Module allows inputs and outputs. Once the initial state is reached, the
Tutor Module configures a specific interface (output). The interface offers controls
(icons, menus, buttons) or commands to make possible student’s interactions (input).
The Tutor Module processes this input, updates the interface and, eventually, the
Student’s Model. The interface is updated or reconfigured. As the output depends on
the student’s input and model (which is continually being updated), the resulting
interface is configured at run-time and is highly personalized and dynamic.
5 Final Considerations
The formalization of the ITS as an automaton was necessary to have a mathematical vo-
cabulary to describe the components of the system and to unify the different ap-
proaches involved in its design and implementation.
The proposed system does not try to solve all the problems that arise in the development of ITSs. Nevertheless, this project tries to simplify the Student's Model, designing it as a group of characteristics that offers a basis for selecting pedagogical activities. On the other hand, the domain is modeled using several strategies, which increases the possibilities of interacting with the student in an appropriate way. The interface design is strongly bound to didactic criteria, and can lead to the construction of more effective and efficient systems: effective by interacting with the student in an appropriate way, and efficient because technological resources are used strictly when required by the pedagogical criteria of the specific TLP.
The contribution of this system depends in large part on the relevance of its content, on the correct selection and identification of the users, and on the capacity of the Tutor Module to suggest pedagogical activities appropriately. Interdisciplinary work is an indispensable condition for achieving the proposed objectives. A great effort in this direction can help increase the impact of pedagogical software.
The problem of “who has the control” during the process is a much-debated matter in pedagogical software research. The system described here allows either the system or the student to have control of the process, depending on the characteristics detected in the student. Some characteristics, like an “active” learning style, inhibit the action of the system, leaving it in the background and suggesting interactive activities like exploration environments. Other characteristics require that the system take control, planning the activities as in a tutorial. As a result, the interface adapts the contents, the form of presentation, and also the pedagogical strategy to the apprentice.
The case study allowed the design of a specific system. The design used Didactic
Ergonomy to model the TLP for diabetic people. The experimental model based on
an IAC ANN validated the Adaptive Mechanism. Future works must be developed to
References
1. Curilem GMJ.: Metodologia para a Implementação de Interfaces Adaptativas em Sistemas
Tutores Inteligentes. Doctoral Thesis, Dept. of Electrical Engineering, Federal University of
Santa Catarina, Florianópolis, Brasil (2002)
2. Brusilovsky, P.: Methods and techniques of adaptive hypermedia. In: P. Brusilovsky, A.
Kobsa and J. Vassileva (eds.): Adaptive Hypertext and Hypermedia. Kluwer Academic
Publishers, Dordrecht (1998) 1-43
3. Wenger E.: Artificial intelligence and Tutoring Systems. Computational and Cognitive
Approaches to the Communication of Knowledge, Morgan Kaufmann, San Francisco
(1987)
4. Bruillard E.: Les Machines a Enseigner. Editions Hermes, Paris (1997)
5. Briceño L.R.: Siete tesis sobre la educación sanitaria para la participatión comunitaria.
Cad. Saúde Públ., v. 12, n. 1, Rio de Janeiro (1996) 7-30.
6. Zagury L. Zagury T.: Diabetes sem medo, Ed. Rocco Ltda (1984)
7. Curilem, G.M.J., Brasil, L.M., Sandoval, R.C.B., Coral, M.H.C., De Azevedo F.M.,
Marques J.L.B.: Considerations for the Design of a Tutoring System Applied to Diabetes.
In Proceedings of the World Congress on Biomedical Engineering, Chicago, USA, 25-27 July
(2000)
8. Rouet J.F.: Cognition et Technologies d’Apprentissage.
http://perso.wanadoo.fr/arkham/thucydide/rouet.html (Setembro 2001)
9. Curilem, G.M.J., De Azevedo, F.M.: Didactic Ergonomy for the Interface of Intelligent
Tutoring Systems in Computers and Education: Toward a Lifelong Learning Society.
Kluwer Academic Publishers (2003) 75-88
10. Choplin H., Galisson A., Lemarchand S.: Hipermedias et pedagogie: Comment promou-
voir l’activité de l’élève? Congrès Hypermedia et Apprentissage. Poitiers, France (1998)
11. Gagne R.M., Briggs L.J., & Wagner W.W.: Principles of instructional design. Third edi-
tion: Holt Rinehart and Winston, New York (1988)
12. Piaget J.: A psicologia da Inteligência. Editora Fundo de Cultura S.A Lisboa (1967).
13. Felder R.M.: Matters of Styles ASEE Prism 6(4), December (1996) 18-23
14. Gardner H.: Multiple Intelligences: The Theory in Practice. NY: Basic Books. (1993).
15. Curilem, G.M.J., De Azevedo, F.M.: Implementação Dinâmica de Atividades num Sis-
tema Tutor Inteligente. In Proceedings of the XII Brazilian Symposium of Informatics in
Education, SBIE2001, Vitória, ES, Brasil, 21-23 November (2001).
16. Hopcroft J.E., Ullman J.D.: Introduction to automata theory, Languages and Computa-
tion. Addison-Wesley. (1979).
17. Rumelhart, D.E., McClelland, J.L.: Explorations in Parallel Distributed Processing. A Handbook
of Models, Programs and Exercises. Ed. Bradford Book. Massachusetts Institute of Tech-
nology. (1989).
18. De Azevedo, F. M.: Contribution to the Study of Neural Networks in Dynamical Expert
System, PhD Thesis – Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium.
(1993).
Implementing Analogies in an Electronic Tutoring
System
1 Introduction
2 CIRCSIM-Tutor
five, correct inferences were made by the student four times. In the remaining thirty-
seven examples, inferences were requested by the tutor resulting in fifteen successful
mappings (correct inferences) and twenty-two failed mappings (incorrect inferences).
Out of the twenty-two failed mappings, the tutor successfully repaired/explained the
analogy resulting in correct inferences by the student fifteen times. The corpus
reflected an 81% success rate—the use of analogy, after an incorrect inference was
made by students, resulted in a correct inference made by students in 34 of the 42
times the tutors employed the strategy. The tutor abandoned the analogy in favor of a
different teaching plan only seven times.
Table 2 [5, 6] lists the different bases that appeared in the corpus with the number
of times they were found. Tutors proposed “another neural variable” twenty-nine
times resulting in successful inferences made by the students twenty-four times—83%
success rate. More interesting bases—balloons, compliant structures, Ohm’s Law, and
traffic jam—were used less frequently by tutors and students. However, their use
resulted in interesting and successful structural mappings, and was followed by
successful inferences by students.
4 Implementation
Joel Michael reviewed the examples of analogies identified and decided that we
should implement: “another neural variable,” “another procedure,” “Ohm’s Law”
(pressure/flow/resistance model), “balloons/compliant structure” (elastic properties of
tissues model), the “reflex and the control systems” model, and the
“accelerator/brake” analogy. “Another neural variable” is most often used in tutoring the Direct Response phase (although it can be used in the RR and SS phases). It is
always invoked when the student gets one or two neural variables correct, but gets the
other(s) wrong. It is generally very effective. The work of Kurtz et al. [10] and Katz’s
series of experiments at Pittsburgh [11, 12, 13] have confirmed the importance of this
kind of explicit discussion of meta-knowledge and of reflective tutoring in general.
The use of this analogy to test Gentner’s mutual alignment theory of analogies [10] is
being explored. Analogies to other procedures are only evoked after the student has
been exposed to a number of different procedures. As a result, there are not many
examples of the use of this base in the human tutoring sessions, which typically
involve only one or two procedures. However, we expect students to complete 8-10
procedures in hour-long laboratory sessions with the ITS. Joel Michael believes that
it would be especially beneficial for students to be asked to recognize parallels
between different procedures that move MAP in the same direction.
Schemas have been created for the “another neural variable” and “another procedure”
analogies, based on the examples found in our human-human tutoring sessions. A
successful use of another neural variable analogy was seen in Example 5. The tutor
requests an inference and the student infers (that the new variable behaves like the
earlier one) correctly. This sequence happens most of the time and the tutor moves to
the next topic. The tutor explains the analogy only when the student fails to understand the
analogy or fails to make the inference [5].
If the tutor decides to explore the analogy further:
    the tutor asks the student to map the analogs (or tells the student the mapping)
    the tutor asks the student to map the relationships (or tells the student...)
    the tutor prompts the student to make an inference to determine understanding
Another neural variable analogy can be used in any phase (DR or RR or SS),
whenever the student has made an error in two or three of the neural variables, just
after the tutor has finished tutoring the first one. Assume that the tutor has just tutored
neural variable (NV1) successfully and that another non-clamped neural variable was
incorrectly predicted by the student.
If there is one other neural variable that was not predicted correctly and
if that variable is not clamped
the tutor asks “What other variable is neurally controlled?”
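Read as a rule, this trigger condition can be sketched as follows; this is our paraphrase of the schema, not CIRCSIM-Tutor's actual rule base, and the variable names are placeholders.

```python
# Sketch of the "another neural variable" trigger described above (a paraphrase
# of the schema, not CIRCSIM-Tutor's actual rule-based code; the variable
# names below are placeholders).
NEURAL_VARIABLES = ["TPR", "CC", "HR"]   # placeholder names for the neural variables

def another_neural_variable_hint(predictions, correct, clamped, just_tutored):
    """Return the tutor's question if exactly one other neural variable is wrong
    and not clamped, otherwise None (the analogy does not apply)."""
    wrong = [v for v in NEURAL_VARIABLES
             if v != just_tutored and predictions[v] != correct[v]]
    if len(wrong) == 1 and wrong[0] not in clamped:
        return "What other variable is neurally controlled?"
    return None

predictions = {"TPR": "up", "CC": "down", "HR": "up"}
correct     = {"TPR": "up", "CC": "up",   "HR": "up"}
print(another_neural_variable_hint(predictions, correct,
                                   clamped=set(), just_tutored="TPR"))
```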
The explicit analogies that involve bases outside the domain, such as the balloon
analogies, are interesting to implement, but more complex. These analogies initiate
the biggest revelation, the most effective “aha” experience for the students. They also
provide the most opportunities for student misconceptions. It is, therefore, very
important for the tutor to forestall these possible misunderstandings by pointing out
where the analogy applies and where it does not and to correct any misconceptions
that may show up later.
We have chosen the Structure Mapping Engine (SME) [17, 18, 19] to implement
this second group of analogies. SME utilizes alignment-first mapping between the
target and the base, and then selects the best mapping and all those within 10% of it,
as described in Gentner [20]. SME appears to model our expert tutors’ behavior as
seen in the corpus, especially the example using Ohm’s Law as a base. In most of the
examples using this analogy, students understood the mapping, resulting in an
immediate clarification of the issue. This was not the case in Example 6 above. As a
result, we can observe the tutor pushing the student through each step in the mapping
process. SME will be used for handling the Ohm’s law (pressure/flow/resistance
model), balloons/compliant structure (elastic properties of tissues model), reflex and
the control systems model, and the accelerator/brake analogies.
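The selection criterion mentioned above (keep the best mapping and all mappings within 10% of it) can be sketched as follows; SME's actual structural evaluation is far richer than the single numeric score assumed here, and the candidate mappings shown are invented.

```python
# Sketch of the mapping-selection criterion described above: keep the
# best-scoring mapping and every mapping within 10% of it (illustrative only;
# the scores and candidate mappings are invented).
def select_mappings(scored_mappings, tolerance=0.10):
    best = max(score for _, score in scored_mappings)
    return [m for m, score in scored_mappings if score >= best * (1 - tolerance)]

candidates = [("pressure->voltage, flow->current, resistance->resistance", 0.95),
              ("pressure->current, flow->voltage, resistance->resistance", 0.60),
              ("pressure->voltage, flow->current", 0.88)]
print(select_mappings(candidates))   # keeps the 0.95 and 0.88 mappings
```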
5 Conclusion
In order to implement analogies in our ITS, CIRCSIM-Tutor, we analyzed eighty-one
human tutoring sessions conducted by experts Michael and Rovick for the use of
analogies. Although analogies were not very frequent, they were highly effective
when used. The analogies were categorized by the base and the target according to
Gentner’s [20] structure-mapping model. Analogies and models to implement in
CIRCSIM-Tutor have been chosen by Michael, who uses this system in a course he
teaches at Rush Medical College. CIRCSIM-Tutor already has a rule-based system set
up to utilize the schemas described here to generate tutor initiated “another neural
variable” and “another procedure” analogies. The SME model [17, 18, 19] is being
used to generate the other analogies—Ohm’s law (pressure/flow/resistance model),
balloons/compliant structure (elastic properties of tissues model), reflex and the
control systems model, and the accelerator/brake analogy. During the human tutoring
sessions, students also proposed analogies. Future research will include the
mechanisms for recognizing and responding to these proposals using the SME.
References
1. Michael, J., Rovick, A., Glass, M., Zhou, Y., & Evens, M. (2003). Learning from
a computer tutor with natural language capabilities. Interactive Learning Environments,
11(3): 233-262.
2. Li, J., Seu, J. H., Evens, M. W., Michael, J. A., & Rovick, A. A. (1992). Computer
dialogue system: A system for capturing computer-mediated dialogues. Behavior Research
Methods, Instruments, and Computers (Journal of the Psychonomic Society), 24(4): 535-
540.
3. Kim, J. H., Freedman, R., Glass, M., & Evens, M. W. (2002). Annotation of tutorial goals
for natural language generation. Unpublished paper, Department of Computer Science,
Illinois Institute of Technology.
4. Lulis, E. & Evens, M. (2003). The use of analogies in human tutoring dialogues. AAAI
7:2003 Spring Symposium Series Natural Language Generation in Spoken and Written
Dialogue, 94-96.
5. Lulis, E., Evens, M., & Michael, J. (2003). Representation of analogies found in human
tutoring sessions. Proceedings of the Second IASTED International Conference on
Information and Knowledge Sharing, 88-93. Anaheim, CA: ACTA Press.
6. Lulis, E., Evens, M., & Michael, J. (To appear). Analogies in Human Tutoring Sessions. In
Proceedings of the Twenty-Sixth Conference of the Cognitive Science Society, 2004.
7. Modell, H. I. (2000). How to help students understand physiology? Emphasize general
models. Advances in Physiology Educ. 23: 101-107.
8. Feltovich, P.J., Spiro, R., & Coulson, R. (1989). The nature of conceptual understanding in
biomedicine: The deep structure of complex ideas and the development of misconceptions.
In D. Evans and V. Patel (Eds.), Cognitive Science in Medicine. Cambridge, MA: MIT
Press.
9. Michael, J. A. & Modell, H. I. (2003). Active learning in the college and secondary
science classroom: A model for helping the learner to learn. Mahwah, NJ: Lawrence
Erlbaum Associates.
10. Kurtz, K., Miao, C., & Gentner, D. (2001). Learning by analogical bootstrapping. Journal
of the Learning Sciences, 10(4):417-446.
11. Katz, S., O’Donnell, G., & Kay, H. (2000). An approach to analyzing the role and
structure of reflective dialogue. International Journal of Artificial Intelligence and
Education, 11, 320-343.
12. Katz, S., & Albritton, D. (2002). Going beyond the problem given: How human tutors use
post-practice discussions to support transfer. Proceedings of Intelligent Tutoring Systems
2002, San Sebastian, Spain, 2002. Berlin: Springer-Verlag. 641-650.
13. Katz, S. (2003). Distributed tutorial strategies. Proceedings of the Cognitive Science
Conference. Boston, MA.
14. Yang, F.J., Kim, J.H., Glass, M. & Evens, M. (2000). Turn Planning in CIRCSIM-Tutor.
In J. Etheredge and B. Manaris (Eds.), Proceedings of the Florida Artificial Intelligence
Research Symposium. Menlo Park, CA: AAAI Press. 60-64.
15. Moore, J.D. (1995). Participating in explanatory dialogues. Cambridge, MA: MIT Press.
16. Moore, J.D., Lemaire, B. & Rosenblum, J. (1996). Discourse generation for instructional
applications: Identifying and using prior relevant explanations. Journal of the Learning
Sciences, 5(1), 49-94.
17. Gentner, D. (1998). Analogy. In W. Bechtel & G. Graham (Eds.), A companion to
cognitive science, (pp. 107-113). Oxford: Blackwell.
18. Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity.
American Psychologist, 52(1): 45-56.
19. Forbus, K. D. Gentner, D., Everett, J. O. & Wu, M. (1997) Towards a computational
model of evaluating and using analogical inferences, Proc. of the 19th Annual Conference
of the Cognitive Science Society, Mahwah, NJ, Lawrence Erlbaum Associates. 229-234.
20. Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive
Science 7(2):155-170.
Towards Adaptive Generation of Faded
Examples*
1 Introduction
So far, faded examples have been produced manually. However, for realistic
applications, as opposed to lab experiments, it makes sense to generate several
variants of an exercise by fading and to do it automatically. For such an automatic
generation, a suitable knowledge representation of examples and exercises is
needed.
In our mathematics seminars we experienced the value of faded examples for
learning. We are now interested in generating adaptively faded examples which
can then be used in our learning environment for mathematics, ACTIVEMATH.
Several steps are needed before ACTIVEMATH's course generator and suggestion
mechanism can present appropriate faded examples to the learner: the knowl-
edge representation has to be extended in a general way, the adaptive generation
procedure has to be developed, and finally, the ACTIVEMATH-components have
to request the dynamic generation of specially faded examples in response to
learners' actions. This article concentrates on a knowledge representation of ex-
amples and exercises that allows for distinguishing parts to be faded and for
characterizing those parts. This is non-trivial work, because worked examples
from mathematics can have a pretty complex structure, even more so, if inno-
vative pedagogical ideas are introduced. We discuss general adaptations of the
fading procedure we are currently implementing.
2 Example
Example. [1], p. 82, provides a worked-out solution of the problem
The sequence ((-1)^n) is divergent.
Solution
Step 1. This sequence is bounded (take M := 1), so we cannot
invoke Theorem 3.2.2. ...
Step 2. ... However, assume that a := lim((-1)^n) exists. ...
Step 3. ... Let ε := 1. ...
Step 4. ... so that there exists a natural number K1 such that |(-1)^n - a| < 1 for all n ≥ K1. ...
Step 1 is, formally seen, not necessary for the solution. But it provides a meta-
cognitive comment showing why an alternative proof attempt would not work.
It would be sensible to fade this step, and request from the learner to indicate
valid or invalid alternatives or to fade parts of this step.
In steps 2 and 3 two hypotheses are defined. These hypotheses are dependent.
Fading both hypotheses introduces more under-specification than fading only one
assumption.
Some good textbook authors omit little subproofs or formula-manipulations
and instead ask “Why?” in order to keep the attention of the reader and make
her think. For instance, the proof of the example in [1] contains:
... If n is an odd number with n ≥ K1, this gives |-1 - a| < 1, so that -2 < a < 0.
(Why?) ...
A response trains application skills.
3 Psychological Findings
Some empirical studies have investigated faded examples [11,12,10,9]. Van Merriënboer [12] reports positive effects of faded examples in programming courses. In a context in which the subjects have little prior knowledge, Stark investigates faded examples and shows a clear positive correlation between learning with faded examples and performance on near and medium transfer problems [10]. He also suggests that, in comparison with worked-out examples, faded examples better prevent passive and superficial processing. His experiments included immediate feedback in the form of a complete problem-solving step.
Renkl and others found that backward fading of solution steps produces more
accurate solutions on far transfer problems [9] – an effect that was inconsistent
across experiments in other studies. These studies suggest that mixing faded
examples with worked-out examples (with self-explanation) is more effective than
self-explanation on worked-out examples only.
Understand the Problem. The description of the initial problem includes markup
elements situation-description and problem-statement. The first element
describes what is given and what it depends on. Dependencies can be provided
in the metadata of situation-description. The second element encodes the
question (statement) of the problem, i.e. what has to be found, proven, etc.
These elements prove to be useful not only for faded examples.
Carry out the Plan. The sequence of bottom nodes of the solution element is
the actual solution. In the encoding of the solution, the steps carrying out the plan occur inside the corresponding plan steps; in the presentation they can be separated from the plan steps, if wished.
Look Back at the Solution. Here, an element conclude is used. This element
has the same meaning as in OMDOC and is used not only if the solution is the
proof of some fact. For example, if the root of the equation is calculated in the
solution, in the conclude step the result is verified.
The reference to other problems for which the result of the current problem
can be useful, is provided in the metadata record, as discussed below.
Figure 1 shows the internal representation of the previously considered example 1, embedded into the Polya framework. Bold face shows the actual steps of the exercise; italics show additional steps introduced to build the Polya framework.
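As an illustration of such an annotated representation, the sketch below mirrors the element names mentioned above (situation-description, problem-statement, conclude) in a plain Python structure; the actual ACTIVEMATH representation is OMDoc-based XML markup, and the step roles, dependency fields, and fading function here are our own assumptions.

```python
# Sketch of a worked-out example annotated with fadable parts. The actual
# ACTIVEMATH representation is OMDoc-based XML; this structure only mirrors
# the element names mentioned in the text (situation-description,
# problem-statement, conclude); everything else is an invented placeholder.
example = {
    "situation-description": "The sequence ((-1)^n), with metadata on dependencies",
    "problem-statement": "Show that the sequence is divergent.",
    "solution": [
        {"id": "step1", "text": "The sequence is bounded, so Theorem 3.2.2 does not apply.",
         "fadable": True, "role": "metacognitive-comment"},
        {"id": "step2", "text": "Assume that a := lim((-1)^n) exists.",
         "fadable": True, "role": "hypothesis", "depends-on": []},
        {"id": "step3", "text": "Let eps := 1.",
         "fadable": True, "role": "hypothesis", "depends-on": ["step2"]},
    ],
    "conclude": "The assumption leads to a contradiction, so the sequence is divergent.",
}

def fade(example, step_ids):
    """Return a copy of the example with the chosen solution steps blanked out."""
    faded = dict(example)
    faded["solution"] = [
        {**step, "text": "(fill in this step)"} if step["id"] in step_ids else step
        for step in example["solution"]
    ]
    return faded

print(fade(example, {"step3"})["solution"][2]["text"])   # (fill in this step)
```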
Choice of Fading. The structure of the worked-out example determines the possibilities of fading. The annotation of fadable parts gives rise to reasoning about the choices, depending on the purpose of the faded example.
To start with, for adaptation we consider the student's mastery of the concept and the learning goal level. This information is available in ACTIVEMATH's user model. The rules we use for fading are still prototypical and not tested with students. The reasoning underlying those fading rules includes, among other factors, this mastery and goal-level information.
2 For full reference to all metadata extensions made by ACTIVEMATH, see [3].
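As a rough illustration of this adaptation step, the following sketch (in Python, not a notation used by ACTIVEMATH) derives a fading choice from the learner's mastery of the involved concepts and from the learning-goal level; the rule, the function and field names, and the thresholds are all assumptions made here for illustration, not the authors' actual fading rules.

    # Illustrative sketch only (hypothetical names and thresholds): choose which
    # annotated parts of a worked-out example to fade, given the learner's mastery
    # of the underlying concepts and the learning-goal level from the user model.
    def choose_fading(fadable_parts, user_model, goal_level):
        """Return the ids of the example parts that should be faded.

        fadable_parts: list of dicts with keys 'id', 'concept', 'difficulty'
        user_model:    maps concept name -> mastery value in [0, 1]
        goal_level:    e.g. 'knowledge', 'application', 'transfer'
        """
        # Fade more (and harder) steps the higher the mastery demanded by the goal.
        threshold = {'knowledge': 0.8, 'application': 0.6, 'transfer': 0.4}[goal_level]
        faded = []
        for part in fadable_parts:
            mastery = user_model.get(part['concept'], 0.0)
            if mastery >= threshold and part['difficulty'] <= mastery:
                faded.append(part['id'])
        return faded

    # Example: a learner with high mastery of 'odd-numbers' gets that step faded.
    example_parts = [
        {'id': 'step-2-hypothesis', 'concept': 'odd-numbers', 'difficulty': 0.5},
        {'id': 'step-3-hypothesis', 'concept': 'quadratic-roots', 'difficulty': 0.7},
    ]
    print(choose_fading(example_parts, {'odd-numbers': 0.9}, 'application'))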
References
1. R.G. Bartle and D.R. Sherbert. Introduction to Real Analysis. John Wiley & Sons,
New York, 1982.
2. B.S. Bloom, editor. Taxonomy of educational objectives: The classification of educa-
tional goals: Handbook I, cognitive domain. Longmans, Green, New York, Toronto,
1956.
3. J. Büdenbender, G. Goguadze, P. Libbrecht, E. Melis, and C. Ullrich. Metadata
in activemath. Seki Report SR-02-02, Universität des Saarlandes, FB Informatik,
2002.
4. C. Conati and G. Carenini. Generating tailored examples to support learning
via self-explanation. In Seventeenth International Joint Conference on Artificial
Intelligence, 2001.
5. G. Goguadze, E. Melis, C. Ullrich, and P. Cairns. Problems and solutions for
markup for mathematical examples and exercises. In A. Asperti, B. Buchberger,
and J.H. Davenport, editors, International Conference on Mathematical Knowledge
Management, MKM03, LNCS 2594, pages 80–93. Springer-Verlag, 2003.
A Multi-dimensional Taxonomy for Automating Hinting
D. Tsovaltzi, A. Fiedler, and H. Horacek
1 Introduction
Empirical evidence has shown that natural language dialogue capabilities are a crucial
factor in making human explanations effective [16]. Moreover, the use of teaching strate-
gies is an important ingredient of intelligent tutoring systems. Such strategies, normally
called dialectic or socratic, have been demonstrated to be superior to pure explanations,
especially regarding their long-term effects [6,18,1]. Consequently, an increasing though
still limited number of state-of-the-art tutoring systems use natural-language interaction
and automatic teaching strategies, including some notion of hints.
Ms. Lindquist [9], a tutoring system for high-school algebra, uses some domain
specific types of questions in elaborate strategies, such as breaking down a problem
into simpler parts and elaborating examples. Thereby, the notion of gradually revealing
information by rephrasing the question is prominent, which can be considered some sort
of hint. The CIRCSIM-Tutor [10], an intelligent tutoring system for blood circulation,
applies a taxonomy of hints, relating them to constellations in a planning procedure
that solves the given tutorial task. AutoTutor [17] uses curriculum scripts on which
the tutoring of computer literacy is based, where hints are associated with each script.
AutoTutor also aims at making the student articulate expected answers and does not
distinguish between the cognitive function and the dialogue move realisation of hints.
The emphasis is put on self-explanation, in the sense of re-articulation, rather than on
trying to help the student actively produce the content of the answer itself. Matsuda
and VanLehn [14] investigate hinting for helping students solve geometry proof
problems. They orient themselves towards tracking the student’s mixed directionality,
which is characteristic of novices, rather than assisting the student with specific reference
to the directionality of a proof. Melis and Ullrich [15] are looking into Polya scenarios
in order to extract possible hints. They intend these hints for a proof presentation approach.
On the whole, these models of hints are somewhat limited in capturing their various
underlying functions explicitly. Putting emphasis on making the cognitive functions of hints
explicit, we present a multi-dimensional hint taxonomy where each dimension defines a
decision point for the associated function. Such hints are part of a tutoring model which
promotes actively producing the content of the answer itself, rather than just phrasing it.
We thus guard against over-emphasising self-explanation, which can be counter-productive
to learning as it directs the student’s attention to consciously tractable knowledge. The
latter can potentially hinder intractable forms of learning from taking place, which are considered
superior [12].
The approach to automating hints presented here is also oriented towards integrating
hinting in natural language dialogue systems [23]. In the framework of the DIALOG
project [2], we are currently investigating tutoring mathematics in a system where domain
knowledge, dialogue capabilities, and tutorial phenomena can be clearly identified and
intertwined for the automation of tutoring. More specifically, we aim at modelling a
socratic teaching strategy, which allows us to manipulate aspects of learning, such as
helping the student build a deeper understanding of the domain, reducing cognitive load,
promoting schema acquisition, and managing motivation levels [25,13,24], within natural
language dialogue interaction.
The overall goal of the project is (i) to empirically investigate the use of flexible
natural language dialogue in tutoring mathematics, and (ii) to develop a prototype
system gradually embodying empirical findings. The prototype system will engage in a
dialogue in written natural language to help a student construct mathematical proofs. In
contrast to most existing tutorial systems, we envision a modular design, making use of
the powerful proof system ΩMEGA [19]. This design enables detailed reasoning about
the student’s action and bears the potential of elaborate system feedback [21].
The structure of the paper is as follows: Section 2 looks at the pedagogical motivations
for our amended taxonomy. Section 3 reports on a preliminary evaluation on which our
enhanced taxonomy is based. Section 4 presents the taxonomy itself and briefly talks
about its different dimensions and classes.
3 Experiment Results
In order to test the adequacy of the hint categories and other tutoring components,
we have conducted a WOz experiment [3] with a simulated system [7], thereby also
collecting a corpus of tutorial dialogues in the naive set theory domain. In the course of
these experiments, a preliminary version of the hinting taxonomy was used, with very
limited meta-reasoning hints, and without the functional problem referential perspective.
24 subjects with varying educational background and prior mathematical knowledge
ranging from little to fair participated in the experiment. The experiment consisted of
three phases: (1) preparation and pre-test on paper, (2) tutoring session mediated by a
WOz tool, and (3) post-test and evaluation questionnaire, on paper again. During the
session, the subjects had to prove three theorems (K and P stand for set complement and
power set respectively): (i)
(ii) and (iii) If then The interface
enabled the subjects to type text and insert mathematical symbols by clicking on buttons.
The subjects were instructed to enter steps of a proof rather than a complete proof at
once, in order to encourage a dialogue with the system. The tutor-wizard’s task
was to respond to the student’s utterances following a given algorithm, which selected
hints from our preliminary hint taxonomy [8].
In the experiments, our pre- and post-tutoring test comparison supported the didactic
method, which explained the solution without hinting, as opposed to the socratic con-
dition and a control group that received only minimal feedback on the correctness of
the answer. However, through the analysis of our data, we spotted some experimental
confounds, which might have been decisive [3]. For instance, the socratic subjects had a
late start due to the nature of the strategy, and it was de-motivating to be stopped because
of time constraints just as they had started following the hints. In fact, four out of six
subjects in the socratic condition who tried to follow hints did indeed improve during
tutoring, as evidenced by their attempts. Nonetheless, their performance did not improve
in the post-test. We also found that the didactic condition subjects spent significantly
more time on the post-test. This may well derive from
factors like frustration and low motivation. A side-effect of the above confounds
was that the didactic condition subjects were tutored on a larger part of every proof.
The same subjects also had a significantly higher level at the outset, as evidenced by the
pre-test. This fact might explain their relatively higher
improvement as reflected in the post-test.
Moreover, despite the results of the test, the analysis of the questionnaires filled in by
the subjects after the post-test showed that the socratic condition subjects stated that
they learned significantly more about set theory than the didactic condition subjects
did. However, the didactic condition subjects stated
significantly more often that they had fun with the system.
That might explain why they were motivated to reach a solution (i.e., spend more time
on it) in the post-test, which followed immediately after tutoring, and hence performed
better. In addition, all subjects of the didactic condition complained about the feedback
in open questions about the system, either for not having been given the opportunity to
reach the solution themselves, or for not having received more step-by-step feedback,
or for having been given too much feedback for their level. All these complaints can
be taken care of by the socratic method. By contrast, most of the socratic condition
subjects chose aspects of the feedback as the best attribute of the system (four out of six).
In addition, all but one subject said that they would use the system in a mathematics
seminar at university. The subject who would not use the system had one of the best
performances among all conditions, and was taught with the didactic method. This
subject also explicitly said that they would have liked more eliciting feedback.
Such issues allow us to conclude that although the hinting tutoring strategy undoubt-
edly needs improvements, it can, contingent upon the specific improvements, become
better than the didactic method. Extra support for this claim comes from the psycholog-
ical grounding of hinting as a teaching method (cf. Section 2). The fact that the didactic
condition was nonetheless better led us to search for improvements in the way the hinting
strategy was performed. Our objective is to get the best of both worlds.
The most striking characteristic of the didactic method was the fact that the tutor
gave accompanying meta-reasoning information every time along with the proof step
information. However, he still avoided giving long explanations, a characteristic of the
didactic method, which makes it easier for us to adapt such feedback. Not only can such
meta-reasoning reinforce the anchoring points necessary for the creation of a schema,
but it also reduces the cognitive load. For the socratic condition this probably means
that the lack of meta-reasoning hints was among the reasons why we did not manage to
achieve the goal of self-sufficiency necessary for the post-test. Therefore, our major
improvement to hinting was to formalise meta-reasoning, deduced from suggestions by
our human tutor, our own observations, the didactic condition feedback, and our newly
defined, detailed teaching model for psychological motivation.
1. Speak-to-answer hints refer to the preceding answer of the student. They, for exam-
ple, indicate that some elements of a list are missing, narrow down possible choices,
or elicit a discrepancy between the student’s answer and the expected answer.
2. Point-to-information hints refer the student to some information given previously,
either during the dialogue or in the lesson material.
3. Take-for-granted hints ask the students to accept some information without further
explanation, for example, because explaining it would require delving into another math-
ematical topic, which would shift the focus of the session to its detriment. This is
motivated by local axiomatics [26], a prominent notion in teaching mathematics.
On the whole, the elicitation status dimension is the only one that most other approaches
capture explicitly, through designing sets of related hints with increasing degrees of
precision in revealing required information. Moreover, the three dimensions domain
knowledge, inferential role, and problem referential perspective, are typically combined
into a unique whole.
We now determine hint categories in terms of the four dimensions. We elucidate the
combinatory operation of the four dimensions by giving example hint categories.
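Before turning to the concrete examples, a hypothetical encoding of such a category as a tuple of the four dimension values may help. The dimension names are taken from the text above, but the assignment of the example values to dimensions only reflects our reading of category names such as "active conceptual inference-rule performable-step hint"; it is not an official definition from the taxonomy.

    # Hypothetical encoding of a hint category along the four dimensions.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class HintCategory:
        elicitation_status: str               # e.g. 'active' or 'passive'
        domain_knowledge: str                 # e.g. 'conceptual' or 'functional'
        inferential_role: str                 # e.g. 'inference-rule', 'domain-relation'
        problem_referential_perspective: str  # e.g. 'performable-step', 'meta-reasoning'

        def name(self):
            return " ".join([self.elicitation_status, self.domain_knowledge,
                             self.inferential_role,
                             self.problem_referential_perspective, "hint"])

    first_example = HintCategory('active', 'conceptual', 'inference-rule', 'performable-step')
    print(first_example.name())  # "active conceptual inference-rule performable-step hint"

With such an encoding, a hinting algorithm could treat each dimension as an independent decision point, which is exactly the role the taxonomy assigns to the dimensions.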
The first example we consider is an active conceptual inference-rule performable-step
hint, which elicits the inference rule used in the proof step. This can be done by giving
away the relevant concept of the proof step: “You need to make use of P”1, where P is the
relevant concept. The passive counterpart would give away the inference rule: “You need
to apply the definition of P.”. An equivalent example of an active functional inference-
rule performable-step hint would be: “Which variable can you eliminate here?”. Its
passive counterpart would be: “You have to eliminate P”.
As a second example, consider an active conceptual inference-rule meta-reasoning
hint, which leads the student through a way of choosing with reference to the concrete
anchoring points. Such a hint produced by our human tutor is: “Think of a theorem or
lemma which you can apply and involves P and where P is the relevant concept
and the hypotactical concept. If the student already knows the general technique to
be applied, e.g. elimination, but they still do not know which specific inference rule can
1 Realisation examples come from the corpus collected in the WOz experiments, unless otherwise stated.
help them realise this, the latter only needs to be elicited. A constructed example of the
active conceptual hint appropriate in this case is: “Can you think of a theorem or lemma
that would help you eliminate P?”.
The proof-step meta-reasoning hints address the step as a whole. However, because
of their overview nature their production makes sense at the beginning of the hinting
session to motivate the whole step. This way, these hints capture a hermeneutic process
formalised in the actual hinting algorithm. That is, the hinting session for a step starts
with a proof-step meta-reasoning hint and finishes with a proof-step performable-step
hint. A constructed active conceptual realisation is: “Do you have an idea where you
can start attacking this problem?”. Or it may recapitulate the meta-reasoning of the
step. Other proof-step meta-reasoning hints deal with techniques (methodology) and
technique-related concepts (e.g., premise, conclusion) in the domain. To name a con-
structed example, the passive conceptual hint of this sort could be realised as: “Your aim
is to try to manipulate the given expression in order to reach the conclusion.”
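A minimal sketch of this ordering constraint is given below; everything between the opening and the closing hint is left abstract, since the intermediate hint selection is not specified here.

    # Sketch only: the hinting session for one proof step opens with a proof-step
    # meta-reasoning hint and closes with a proof-step performable-step hint.
    def hint_session_for_step(intermediate_hints):
        return (["proof-step meta-reasoning hint"]
                + list(intermediate_hints)
                + ["proof-step performable-step hint"])

    print(hint_session_for_step(["active conceptual inference-rule hint"]))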
Let us now turn our attention to some pragmatic hints as well. Consider the situation,
where the student has mentioned two out of three properties of the definition that must
be applied and the one needed is missing. In this case, different forms of an active
speak-to-answer inference-rule proof-step hint can be used, according to the specific
needs of the hinting session. If the properties in the definition are ordered, a possible
realisation of the hint would be: “You missed the second property.” If the properties
are unordered, the hint could be realised simply as: “And?”. When the student gives an
almost correct answer, our tutor often elicited the discrepancy to the correct answer by
a plain but very helpful “Really?”. Another example of a pragmatic hint is an active
point-to-information hint where the student is referred to the lesson material: “You
didn’t use the de Morgan rule correctly. Please check once again your accompanying
material.” The pedagogical motivation of this pragmatic aspect is that the student is
encouraged to make better use of the available material, while at the same time being directed
to the piece of information currently needed for the task, which addresses the anchoring
points. When it appears that the student cannot be helped by tutoring because they have
not read the study material carefully enough, a hint would point the student to the lesson
in general: “Go back and read the material again.”
So far, we have only seen combinations of the four dimensions which are motivated
by our teaching model. However, combinations like the active conceptual domain-relation
performable-step hint would serve the specific purpose of explicitly teaching such
relations in the form of declarative knowledge, which is not among our tutoring goals.
Such hints would elicit the relation between two mathematical objects in the proof step
(e.g., the duality between and ). The passive counterpart, in contrast, can be used to
elicit, for example, the relevant concept. If the student mentioned instead of a hint
could be formulated as “Not really but something closely related.”
dynamically producing hints that fit the needs of the student with regard to the particular
proof and the hinting situation. Hinting situations are defined based on the values of
information fields, which are pedagogically relevant and relate to the dialogue context
as well as to the more specific tutoring status.
A significant portion of the taxonomy has been tested in a WOz experiment, which
has inspired us to incorporate improvements in the taxonomy. In terms of evaluating
the improved taxonomy and algorithm in our next phase of experiments, particular care
will be taken over issues like the sufficient preparation of subjects and assigning them
tasks of the right level. Moreover, we want to evaluate the effectiveness over time of our
modelled teaching method, taking into account how well declarative and procedural
knowledge have improved. This presupposes that the possibility of fatigue is minimised
in the experiment design and that the post-test is carefully chosen to test the results
of the intended qualifications.
References
1. Kevin D. Ashley, Ravi Desai, and John M. Levine. Teaching case-based argumentation
concepts using dialectic arguments vs. didactic explanations. In Proceedings of the 6th
International Conference on Intelligent Tutoring Systems, pages 585–595, 2002.
2. Chris Benzmüller, Armin Fiedler, Malte Gabsdil, Helmut Horacek, Ivana Kruijff-Korbayová,
Manfred Pinkal, Jörg Siekmann, Dimitra Tsovaltzi, Bao Quoc Vo, and Magdalena Wolska.
Tutorial dialogs on mathematical proofs. In Proceedings of the IJCAI Workshop on Knowledge
Representation and Automated Reasoning for E-Learning Systems, pages 12–22, Acapulco,
2003.
3. Chris Benzmüller, Armin Fiedler, Malte Gabsdil, Helmut Horacek, Ivana Kruijff-Korbayová,
Manfred Pinkal, Jörg Siekmann, Dimitra Tsovaltzi, Bao Quoc Vo, and Magdalena Wolska.
A Wizard-of-Oz experiment for tutorial dialogues in mathematics. In aied03 Supplementary
Proceedings, Workshop on Advanced Technologies for Mathematics Education, pages 471–
481, Sydney, Australia, 2003.
4. D. Berry and D. Broadbent. On the relationship between task performance and the associated
verbalizable knowledge. Quarterly Journal of Experimental Psychology, 36(A):209–231,
1984.
5. M. T. H. Chi, R. Glaser, and E. Rees. Expertise in problem solving. Advances in the Psychology
of Human Intelligence, pages 7–75, 1982.
6. Michelene T. H. Chi, Nicholas de Leeuw, Mei-Hung Chiu, and Christian Lavancher. Eliciting
self-explanation improves understanding. Cognitive Science, 18:439–477, 1994.
7. Armin Fiedler, Malte Gabsdil, and Helmut Horacek. A Tool for Supporting Progressive Re-
finement of Wizard-of-Oz Experiments in Natural Language. In Intelligent Tutoring Systems
— 6th International Conference, ITS 2002, 2004. In print.
8. Armin Fiedler and Dimitra Tsovaltzi. Automating hinting in mathematical tutorial dialogue.
In Proceedings of the EACL-03 Workshop on Dialogue Systems: Interaction, Adaptation and
Styles of Management, pages 45–52, Budapest, 2003.
9. Neil T. Heffernan and Kenneth R. Koedinger. Building a 3rd generation ITS for symbolization:
Adding a tutorial model with multiple tutorial strategies. In Proceedings of the ITS 2000
Workshop on Algebra Learning, Montréal, Canada, 2000.
10. Gregory Hume, Joel Michael, Allen Rovick, and Martha Evens. Student responses and follow
up tutorial tactics in an ITS. In Proceedings of the 9th Florida Artificial Intelligence Research
Symposium, pages 168–172, Key West, FL, 1996.
11. Gregory D. Hume, Joel A. Michael, Rovick A. Allen, and Martha W. Evens. Hinting as a
tactic in one-on-one tutoring. Journal of the Learning Sciences, 5(1):23–47, 1996.
12. Pawel Lewicki, Thomas Hill, and Maria Czyzewska. Nonconscious acquisition of informa-
tion. Journal of American Psychologist, 47:796–801, 1992.
13. Eng Leong Lim and Dennis W. Moore. Problem solving in geometry: Comparing the effects
of non-goal specific instruction and conventional worked examples. Journal of Educational
Psychology, 22(5):591–612, 2002.
14. Noboru Matsuda and Kurt VanLehn. Modelling hinting strategies for geometry theorem
proving. In Proceedings of the 9th International Conference on User Modeling, Pittsburgh,
PA, 2003.
15. Erica Melis and Carsten Ullrich. How to Teach it - Polya-Inspired Scenarios In ActiveMath.
In Proceedings of, pages 141–147, Biarritz, France, 2003.
16. Johanna Moore. What makes human explanations effective? In Proceedings of the Fifteenth
Annual Meeting of the Cognitive Science Society, Hillsdale, NJ, 1993.
17. Natalie K. Person, Arthur C. Graesser, Derek Harter, Eric Mathews, and the Tutoring Re-
search Group. Dialog move generation and conversation management in AutoTutor. In
Carolyn Penstein Rosé and Reva Freedman, editors, Building Dialog Systems for Tutorial
Applications—Papers from the AAAI Fall Symposium, pages 45–51, North Falmouth, MA,
2000. AAAI press.
18. Carolyn P. Rosé, Johanna D. Moore, Kurt VanLehn, and David Allbritton. A comparative
evaluation of socratic versus didactic tutoring. In Johanna Moore and Keith Stenning, ed-
itors, Proceedings 23rd Annual Conference of the Cognitive Science Society, University of
Edinburgh, Scotland, UK, 2001.
19. Jörg Siekmann, Christoph Benzmüller, Vladimir Brezhnev, Lassaad Cheikhrouhou, Armin
Fiedler, Andreas Franke, Helmut Horacek, Michael Kohlhase, Andreas Meier, Erica Melis,
Markus Moschner, Immanuel Normann, Martin Pollet, Volker Sorge, Carsten Ullrich, Claus-
Peter Wirth, and Jürgen Zimmer. Proof development with ΩMEGA. In Andrei Voronkov,
editor, Automated Deduction — CADE-18, number 2392 in LNAI, pages 144–149. Springer
Verlag, 2002.
20. J. Sweller. Cognitive technology: Some procedures for facilitating learning and problem
solving in mathematics and science. Journal of Educational Psychology, 81:457–66, 1989.
21. Dimitra Tsovaltzi and Armin Fiedler. An approach to facilitating reflection in a mathematics
tutoring system. In aied03 Supplementary Proceedings, Workshop on Learner Modelling for
Reflection, pages 278–287, Sydney, Australia, 2003.
22. Dimitra Tsovaltzi and Armin Fiedler. Enhancement and use of a mathematical ontology in a
tutorial dialogue system. In Proceedings of the IJCAI Workshop on Knowledge and Reasoning
in Practical Dialogue Systems, pages 19–28, Acapulco, Mexico, 2003.
23. Dimitra Tsovaltzi and Elena Karagjosova. A dialogue move taxonomy for tutorial dialogues.
In Proceedings of 5th SIGdial Workshop on Discourse and Dialogue, Boston, USA, 2004. In
print.
24. B. Weiner. Human Motivation: metaphor, theories, and research. Sage Publications Inc.,
1992.
25. Brent Wilson and Peggy Cole. Cognitive teaching models. In D.H. Jonassen, editor, Handbook
of Research for educational communications and technology. MacMillan, 1996.
26. H. Wu. What is so difficult about the preparation of mathematics teachers. In National
Summit on the Mathematical Education of Teachers: Meeting the Demand for High Quality
Mathematics Education in America, November 2001.
Inferring Unobservable Learning Variables from
Students’ Help Seeking Behavior
1 Introduction
One of the main components of an interactive learning environment (ILE) is the help
provided during problem solving. Some studies have found a link between students’
help seeking and learning, suggesting that higher help seeking behaviors result in higher
learning (Wood & Wood, 1999; Renkl, 2002). However, there is growing evidence that
students may have non-optimal help seeking behaviors, and that they seek and respond to
help depending on student characteristics, motivation, attitudes, beliefs, and gender (Aleven,
2003; Ryan & Pintrich, 1997; Arroyo, 2001). There are still many questions to answer in
relation to suboptimal use of help in tutoring systems, such as: 1) How do different
attitudes towards help and beliefs about the system get expressed in actual help seek-
ing behavior? 2) Can attitudes be diagnosed from students’ behavior with the tutoring
system? 3) If non-productive attitudes, goals and beliefs can be detected while using
the system, what are possible actions that can be taken to encourage positive learn-
ing attitudes? This paper begins to explore these questions by showing the results of a
quantitative analysis of the presence and strength of these links, and our work towards
building a Bayesian Network that diagnoses attitudes from behaviors, with the final goal
of building tutoring systems that are responsive and adaptable to students’ needs.
2 Methodology
The tutoring system used was Wayang Outpost, a geometry tutor that provides multime-
dia web-based instruction. If the student requests help, step-by-step guidance is provided.
The hints provided in Wayang Outpost therefore resemble what a human teacher might
provide when explaining a solution to a student, e.g., by drawing, pointing, highlighting
critical parts of geometry figures, and talking. Wayang was used in October 2003 by
150 students (15–18 year olds) from two high schools in Massachusetts. Students were
provided headphones, and used the tutor for about 2 hours. After using the tutor, students
filled out a survey about their perceptions of the system, and attitudes towards help and
the system. Results of a correlation analysis of multiple student variables are shown in
figure 1.
Fig. 1. Correlations among attitudes, perceptions and student behaviors in the tutor
Variables on the left of figure 1 are survey questions about attitudes, those on the right
are obtained from log files of students’ use of the system. Two learning measures were
considered. One of them is students’ perception of how much they learned (Learned?),
collected from surveys. The second one is a ‘Learning Factor’ that describes how stu-
dents decrease their need for help in subsequent problems during the tutoring session.
Performance at each problem is defined as the ‘expected’ number of requested hints
for this problem (over all subjects) minus the help requests made by the current student
at the problem, divided by the expected number of requested hints for the problem.
For instance, if students on average tended to ask for 2 hints in a problem
before answering it correctly, and the current student requested 3 hints, performance
was 50% worse than expected, and thus performance is -0.5. Ideally, students would
perform better as tutoring progresses, so these values should increase with time. The av-
erage difference of performance between pairs of subsequent problems
in the whole tutoring session becomes a measure of how students’ need
for help fades away before choosing a correct answer. This measure of learning is higher
when students learn more.
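Under our reading of this definition, the two quantities could be computed as in the following sketch; the variable names and the exact aggregation into the Learning Factor are assumptions, and the example numbers are made up apart from the 2-hints/3-hints case quoted above.

    # Sketch of the two measures described above.
    def performance(expected_hints, requested_hints):
        """Performance on one problem: expected hints (over all subjects) minus the
        student's hint requests, divided by the expected hints for that problem."""
        return (expected_hints - requested_hints) / expected_hints

    def learning_factor(per_problem_performance):
        """Average difference in performance between subsequent problems in the
        session; higher values mean the need for help fades away faster."""
        diffs = [b - a for a, b in zip(per_problem_performance, per_problem_performance[1:])]
        return sum(diffs) / len(diffs)

    # The worked example from the text: 2 hints expected, 3 requested -> -0.5.
    print(performance(2, 3))                    # -0.5
    print(learning_factor([-0.5, 0.0, 0.25]))   # 0.375: performance improves over time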
From the correlation graph in figure 1, a directed acyclic graph was created by: 1)
eliminating the links among observable variables; 2) giving a single direction to the
3 Conclusions
We conclude that links exist between students’ behaviors with the tutor and their attitudes
and perceptions. We found correlations between help requests and learning, which
are consistent with other authors’ findings (Wood & Wood, 1999; Renkl, 2002). However,
help seeking by itself is not sufficient to achieve learning: students need to stay
within the hints for higher learning. Learning and learning beliefs are linked to behaviors
such as hints per problem, time spent per problem or in hints. Data collected from post-
test surveys were merged with behavioral data of interactions with the system to build
a Bayesian model that infers negative and positive attitudes of student users, while they
are using the system. Future work involves estimation of accuracy of this model, and
evaluations with students of a new tutoring system that detects and remediates negative
attitudes and beliefs towards help and the system.
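As a toy illustration of the kind of inference such a model performs (this is not the authors' network, and the probabilities below are invented), a single observed behavior can update the belief in a hidden attitude by Bayes' rule:

    # Toy example only: inferring a hidden attitude from one observed behavior.
    p_attitude = 0.3                   # prior P(negative attitude toward help)
    p_behavior_given_attitude = 0.8    # P(few hint requests | negative attitude)
    p_behavior_given_not = 0.25        # P(few hint requests | otherwise)

    p_behavior = (p_behavior_given_attitude * p_attitude
                  + p_behavior_given_not * (1 - p_attitude))
    posterior = p_behavior_given_attitude * p_attitude / p_behavior
    print(round(posterior, 2))  # about 0.58: the behavior raises belief in the attitude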
References
Aleven, V., Stahl, E., Schworm, S., Fischer, F., & Wallace, R. (2003) Help Seeking and Help Design
in Interactive Learning Environments. Review of Educational Research.
Arroyo, I., Beck, J. E., Beal, C. R., Wing, R. E., & Woolf, B. P. (2001) Analyzing students’
response to help provision in an elementary mathematics Intelligent Tutoring System. Help
Provision and Help Seeking in Interactive Learning Environments Workshop. Tenth Inter-
national Conference on Artificial Intelligence in Education.
Renkl, A., & Atkinson, R. K. (2002). Learning from examples: Fostering selfexplanations in
computer-based learning environments. Interactive Learning Environments, 10, 105–119.
Ryan, A. & Pintrich, P. (1997) Should I ask for help? The role of motivation and attitudes in
adolescents’ help-seeking in math class. Journal of Educational Psychology, 89, 1–13
Wood, H.; Wood, D. (1999). Help seeking, learning and contingent tutoring. Computers & Edu-
cation, 33(2-3):153–169.
The Social Role of Technical Personnel in the
Deployment of Intelligent Tutoring Systems
Ryan Shaun Baker, Angela Z. Wagner, Albert T. Corbett, and Kenneth R. Koedinger
1 Introduction
In recent years, Intelligent Tutoring Systems (ITSs) have emerged from the research
laboratory and pilot research classrooms into widespread use [2]. Before one of our
laboratory’s ITSs reaches a point where it is ready for large-scale distribution, it goes
through multiple cycles of iterative development in research classrooms. In the first
stage of tutor development, this process is supported by a teacher who both teaches
the tutor class and participates in its design. In a second stage, the tutoring curriculum
is deployed from the teacher-designer’s classroom to further research classrooms, and
refined based on feedback and data from those classrooms. Finally, a polished tutor-
ing curriculum is disseminated in collaboration with our commercial partner, Carne-
gie Learning Inc. This process requires considerable collaboration and cooperation
across several years from individuals at partner schools, from principals and assistant
superintendents, to teachers, to school technical staff.
In this paper, we briefly discuss how the deployment of prototype ITSs to research
classrooms is facilitated by the creation of working and social relationships between
school personnel and project technical personnel. We discuss the role played by a
member of our research laboratory, “Rose” (a pseudonym), whose job was first con-
ceived as being primarily technical -- including tasks such as writing tutor problems,
testing tutoring software, installing tutoring software on school machines (in collabo-
ration with school technical staff), and developing workarounds for bugs. We studied
Rose’s work practices and collaborative relationships by conducting a set of retro-
spective contextual inquiries [1], an interview technique based on developing under-
standing of how a participant understands his or her own process.
Fig. 1. The primary roles in our project, according to our contextual inquiry
able to commiserate with the teachers about the problem and then bring the informa-
tion back to the programmer or researcher who can fix the problem.
Rose’s relationships with teachers have also aided her in the technical part of her
job. Interviews with staff at other intelligent tutoring projects suggest that it is com-
mon for project staff to have difficulty obtaining cooperation from school technical
staff (the “techs”). Getting the tutor software working is a low priority for the techs --
since the tutor software is supplied and supported by our laboratory, there is simulta-
neously comparatively little reward for the techs if the tutor software is working
properly, and a natural and credible scapegoat (our programmers) if it is working
poorly. By contrast, teachers have a strong interest in getting the software to work,
since if it fails to work, it is very disruptive to their classes. Hence, Rose enlists
teacher assistance in getting cooperation from the techs.
3 Conclusions
Our findings suggest that even in an educational project built around technology, the
human relationships supporting that technology are essential to the project’s success.
Rose’s example shows a way to enhance the communication between large-scale
educational projects and partner schools, by placing an individual in regular and mutu-
ally beneficial contact with teachers -- creating an informal conduit for negotiation,
communication, and problem-solving. Our wider research (discussed in a CMU tech-
nical report available from the first author’s website) suggests that other individuals
can also play a similar role – but however it is accomplished, educational technology
projects will benefit from having an individual on their team who serves as a bridge
to partner schools.
As a final note, we would like to thank Jack Mostow, Laura Dabbish, Shelley
Evenson, John Graham, and Kurt vanLehn for helpful suggestions and feedback.
References
Intelligent Tools for Cooperative Learning in the Internet
Flávia de Almeida Barros1, Fábio Paraguaçu2, André Neves1, and Cleide Jane Costa3
1 Universidade Federal de Pernambuco, Centro de Informática
[email protected], [email protected]
2 Universidade Federal de Alagoas, Departamento de Tecnologia da Informação
Maceió – AL, Brazil
[email protected]
3 SEUNE, Av. Dom. Antonio Brandão, 204, Maceió, Alagoas, Brazil
[email protected]
1 Introduction
The growth of the Internet in the past decade, together with the emergence of the
social-interactive-constructivism pedagogical approaches [5], has posed a new
demand for computational tools capable of supporting cooperation during computer
mediated learning processes. Some attempts have been made by the Computer
Science community to build such tools. However, in general, the systems available so
far are either incomplete regarding pedagogical needs, or they offer domain-
dependent solutions. In this sense, the Internet has emerged as a promising media to
overcome these problems, offering information regarding the most varied domains
(subjects), as well as synchronous and asynchronous communication via the so-called
Virtual Learning Environments (VLEs) [1].
In this light, we are developing the FIACI project based on the experience
acquired in the construction of tools that follow the cooperative pedagogical
approach. These tools are being applied to the construction of virtual learning
environments based on the Web. The VLEs can be used as a complement to ordinary
classes as well as in distance learning.
We present here a general description of the FIACI project, as well as the main
results obtained. Section 2 gives a general description of the project. Section 3
presents the development phases of our research work, as well as the results obtained so
far. Finally, section 4 presents the conclusions.
2 Project’s Overview
The FIACI project falls within the cooperative model, following the social-
interactive-constructivism pedagogical approaches [5], which (we believe) are the
most appropriate to guide learning groups (virtual or real). Our central aim was to
provide software tools to give support to the construction of cooperative virtual
learning environments based on the Web. As we have said before, these VLEs can be
used as a complement to ordinary classes as well as in distance learning.
This project was developed by a consortium of three groups, and two different
kinds of VLEs were investigated in a collaborative fashion. The group SIANALCO
concentrated on the development of VLEs to teach children between 6 and 7 years old
how to read, which was already their main research interest. Their starting point was
the SIANALCO environment (Sistema de Análise da Alfabetização Colaborativa) [2],
[3]. The group Virtus, on the other hand, focused on VLEs for mature students,
having as a starting point the VIRTUS project [1].
Both systems were used in the initial fieldwork phase, reaching some common
conclusions. Subsequently, they were modified to incorporate some features that
would help them to provide for cooperative VLEs: (1) communication between
teachers and students as well as among students; (2) ease of use of the
environment for users who are not Computer Science experts; and (3) individual
monitoring of students within the environment. The main innovation of the FIACI methodology
is its empirical nature, realised through three phases: initial design,
experimentation, and changes followed by further experimentation. In what follows, we describe the
development of the tools and the results obtained.
Presenter: so far, this agent has been implemented only for the literacy VLE. It is
responsible for showing the interactive stories (course material) to the students.
Librarian: this agent has been implemented by the group VIRTUS. It searches
the Web for pages with bibliographic citations and/or tutorials related to the
course domain.
Monitor: two versions of this agent are needed, due to the environments’
implementation differences. As it stands, the VIRTUS version just keeps
the logs of each student’s session and creates individual reports. In the
SIANALCO environment, this agent also offers some help to the students in the
resolution of the proposed exercises.
Case-based: this agent is particular to the literacy VLEs. It presents to the
students tasks which are similar to the one he/she has executed wrongly, as well
as fragments of stories related to the one being learned.
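A schematic rendering of these four agents is sketched below; the class and method names are invented for illustration and do not come from the FIACI implementation.

    # Illustrative sketch only: the four agents and their described responsibilities.
    class Presenter:
        """Shows the interactive stories (course material); literacy VLE only."""
        def show_story(self, story, student):
            print(f"presenting '{story}' to {student}")

    class Librarian:
        """Searches the Web for bibliography and tutorials on the course domain."""
        def search(self, domain):
            return [f"http://example.org/{domain}/tutorial"]  # placeholder result

    class Monitor:
        """Keeps session logs and builds individual reports; in SIANALCO it also
        helps students with the proposed exercises."""
        def __init__(self):
            self.logs = {}
        def log(self, student, event):
            self.logs.setdefault(student, []).append(event)

    class CaseBased:
        """Literacy VLEs only: proposes tasks similar to the one answered wrongly."""
        def similar_tasks(self, failed_task, task_pool):
            return [t for t in task_pool if t["topic"] == failed_task["topic"]]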
4 Final Remarks
We presented here the FIACI project, whose main aim is to develop a methodology
for the construction of software tools to give support to cooperative learning on the
Internet, following the social-interactive-constructivism pedagogical approaches. Agent
technology was used, since it offers the functionality needed for this kind of
VLE.
As a result, teachers will be able to easily build new VLEs or to update existing
ones, and students will work within easy-to-use VLEs which facilitate their
cooperation and the learning process as a whole.
References
1. Neves, A.M.M. “Ambientes Virtuais de Estudo Cooperativo”. Master Dissertation,
Universidade Federal de Pernambuco. 1999.
2. Paraguaçu, F. & Jurema, A. “Literacy in a Social Learning Environment (SLE):
collaborating with cognitive tools”. X Simpósio Brasileiro de Informática na Educação
(SBIE’1999). pp. 318-324. Curitiba, PR. Editora SBC. 1999.
3. Paraguaçu, F. & Costa, C. “Em direção a novas tecnologias colaborativas para alfabetizar
crianças em idade escolar”. XI Simpósio Brasileiro de Informática na Educação
(SBIE’2000) pp. 148-153. Editora SBC. 2000.
4. Paraguaçu, F., Prata, D. & Reis, A. “A Collaborative Environment for Visual
Representation of the Knowledge on the Web – VEDA”. ED-MEDIA World Conference on
Educational Multimedia, Hypermedia & Telecommunications. pp. 324-325. Tampere,
Finland, AACE. 2001.
5. Vygotsky LS. “The Genesis of Higher Mental Functions”. In J. V. Wertsch (ed.) The
concept of activity in Soviet Psychology. Armonk: Sharp. 1981.
A Plug-in Based Adaptive System: SAAW
L. de Oliveira Brandão, S. Isotani, and J. Gomes Moura
Abstract. The expansion of the World Wide Web and the use of computers in
education have increased the demand for Web courses and, consequently, the
need for systems that simplify their production and reuse. Such systems must
provide means to show the contents in an individualized and dynamic way,
which requires that they offer flexibility and interactivity as their main characteristics.
Nowadays, Adaptive Hypermedia Systems (AHS) have been developed to support
these characteristics. However, most of them do not allow the extension or
modification of their resources. In this work we present SAAW, a prototype
of an AHS that allows the insertion/removal of plug-ins, among them
iGeom, an application for geometry learning, which makes the system more interactive and
dynamic.
1 Introduction
Despite the importance of mathematics and geometry in engineering and
computer science, there are many difficulties in developing mathematical and
geometric abilities among university students, as well as among high school
students. In this work we present a prototype of such an AHS, SAAW (Adaptive
System for Learning on the Web). We also present a plug-in for geometry, iGeom -
Interactive Geometry for Internet. iGeom is a complete multi-platform dynamic
geometry software (DGS) that we have been developing since 2000. iGeom can be freely
downloaded from http://www.matematica.br/igeom. SAAW is not yet available, since it
is in its first test phase.
Plug-ins are an important part of the SAAW architecture, because they are
directly related to the application domain. In addition, they are responsible for the
evaluation of the user’s interactions and provide most of the interactivity with the system.
iGeom [1] is a DGS, used to draw the Euclidean constructions that are
traditionally made with ruler and compass. However, with a DGS the student gets a
more precise drawing and can freely move points over the screen. iGeom is
implemented in Java and can be used as a stand-alone application or as an applet. It
has some specific features such as “recurrent scripts” and “automatic evaluation of
exercises”. The use of iGeom in SAAW allows: the creation/editing of exercises;
automatic evaluation; the adaptation of resources, taking into account the exercise
evaluation; and communication of the results of user interactions to the server.
The SAAW prototype uses the PHP language and the MySQL database manager, and the
first plug-in used is iGeom. This prototype dynamically generates HTML pages
adapted for each course and user, considering the system preferences and the student’s
model. This prototype (figure 2) is being used by students and teachers in a
compulsory course offered in an undergraduate mathematics program at the
University of São Paulo (http://www.ime.usp.br/~leo/mac118/04).
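The following language-agnostic sketch (in Python, whereas SAAW itself uses PHP and MySQL) illustrates the kind of adaptation step described above; the data fields, the mastery cut-off, and the selection rule are assumptions, not the actual SAAW logic.

    # Hypothetical sketch of selecting page content from the student model.
    def select_page_content(course_items, student_model, preferences):
        """Pick the course items to show, given the student's model and the
        system preferences, e.g. skipping exercises already mastered."""
        page = []
        for item in course_items:
            mastered = student_model.get(item["topic"], 0.0) >= preferences["mastery_cutoff"]
            if item["kind"] == "exercise" and mastered:
                continue  # already solved well enough; offer something else
            page.append(item)
        return page

    items = [
        {"kind": "text", "topic": "bisector"},
        {"kind": "exercise", "topic": "bisector"},  # an iGeom exercise, auto-evaluated
    ]
    print(select_page_content(items, {"bisector": 0.9}, {"mastery_cutoff": 0.7}))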
4 Conclusion
In this work we present the architecture for a plug-in based AHS (SAAW). The
plug-in is responsible for the subject-related interactivity with the user. A prototype (SAAW)
of this system is in use with a plug-in to teach/learn geometry (iGeom). iGeom
and SAAW produce an interactive environment allowing: teachers to produce on-line
lessons, with automatic evaluation of exercises; students to make geometry
constructions directly on Internet pages; and individualized instruction
considering the student’s navigation style, knowledge level and learning pace.
References
1. Brandão, L. O., Isotani, S.: A tool for teaching dynamic geometry on Internet: iGeom. In
Proceedings of the Brazilian Computer Society Congress, Campinas, Brazil (2003) 1476-
1487
2. Brusilovsky, P. and Nijhawan, H. (2002) A Framework for Adaptive E-Learning Based on
Distributed Re-usable Learning Activities. In: M. Driscoll and T. C. Reeves (eds.):
Proceedings of World Conference on E-Learning, Montreal, Canada (2002) 154-161
3. Fiala, Z., Hinz, M., Houben, G., Frasincar, F.: Design and implementation of component-
based adaptive Web presentations. In Proceedings of ACM Symposium on Applied
Computing, Nicosia, Cyprus (2004) 1698-1704
4. Ritter, S., Brusilovsky, P., Medvedeva, O.: Creating more versatile intelligent learning
environments with a component-based architecture. In Proceedings of International
Conference on Intelligent Tutoring Systems, Texas, USA (1998) 554-563
Helps and Hints for Learning with Web Based Learning
Systems: The Role of Instructions*
A. Brunstein and J.F. Krems
Abstract. This study investigated the role of specific and unspecific tasks for
learning declarative knowledge and skills with a web based learning system.
Results show that learners with specific tasks were better for both types of
learning. Nevertheless, not all kinds of learning outcomes were equally
influenced by instruction. Therefore, instructions should be selected carefully in
correspondence with desired learning goals.
1 Introduction
Web based learning systems have some interesting properties that make them suitable
for knowledge acquisition and are expected to support active, self-guided, and
lifelong learning.
An advanced design of web based learning systems and appropriate instruction
are both expected to improve E-Learning. It is often reported that the instruction presented
is an essential factor for navigating and learning with hypertext [e.g. 1].
Instructions sometimes dominate the influence of hypertext design [e.g. 2], or it is at
least postulated that different forms of design may be appropriate for different tasks
[3].
Two plausible goals for using hypertext systems are either unspecific, such as reading
chapters of a learning system, or specific, such as searching for details within them or
practicing specific tasks with the help of the system. Reading a hypertext requires
deciding which information is essential, but involves only a few navigation
decisions. Searching for details and practicing specific tasks within the hypertext
require deciding where to go next to find the desired information. However, searchers and users do
not have to separate central from secondary information, since this is already given by their tasks
[cf. 4].
In one of our studies we tested the following hypotheses: we expected that readers
should acquire unspecific knowledge, and that searchers and users should acquire specific
knowledge and skills without incidentally picking up details beyond their tasks. Therefore
searchers should demonstrate more declarative knowledge after processing the
learning system than readers, and users should demonstrate a higher amount of skill
2 Methods
3 Results
Skills. All three groups performed better after processing the chapter (64% of the
items answered correctly) than before (58% of the items answered correctly), F(1,
53) = 7.14, p = 0.01. Moreover, there was an effect of the performed task on skill level
improvement, F (2, 53) = 3.31, p < .05. Contrary to our expectations, searchers
performed best after processing the chapter (68%) and improved most by processing
the chapter (7.1%). Users (3.5%) and readers (6.4%) both improved their
performance. Nevertheless, their gains were less pronounced than the
improvement of searchers. Moreover, users (M = 65%) and readers (M = 59%)
performed worse than searchers after processing the chapters.
4 Discussion
This study has shown that knowledge and skill acquisition is affected by instructions
even with exactly the same hypertext design: Searchers answered more multiple
choice items on declarative knowledge than readers and users. Moreover, searchers
also demonstrated better application skills than readers and users. Therefore, only one
of the two specific learning tasks led to better learning with a web based learning
system for advanced learners. One reason for these findings could be that learning to
practice a foreign language is a difficult task that can hardly be managed within 30
minutes. In contrast, it is much easier to answer detailed questions on application
instead. It is also remarkable that not all test tasks were affected by instruction in the same
manner: open-ended questions were answered equally well after processing the
chapter by all three groups.
For the design of web based learning tools, the results show the following: First, it
can be useful not only to manipulate the appearance of the system but also to guide
learners through the material by instruction relevant to their goals.
Second, not all kinds of desired knowledge are susceptible to manipulation of
instruction and web design. It seems that some of them have to be practiced in
“real life” instead of being simulated by learning systems.
References
1. Chen, C., Rada, R.: Interacting with Hypertext: A Meta-Analysis of Experimental Studies.
Human-Computer-Interaction 11 (1996) 125-156
2. Foltz, P.W.: Comprehension, Coherence, and Strategies in Hypertext and Linear Text. In:
Rouet, J.F., Levonen, J.J., Dillon, A.P., Spiro, R.J. (eds.): Hypertext and Cognition.
Erlbaum, Hillsdale, NJ (1996) 109-136
3. Dee-Lucas, D.: Instructional Hypertext: Study strategies for different types of learning tasks.
Proceedings of the ED-MEDIA 96. AACE, Charlottesville, VA (1996)
4. Dee-Lucas, D., Larkin, J.H.: Hypertext Segmentation and Goal Compatibility: Effects on
Study Strategies and Learning. Journal of Educational Multimedia and Hypermedia 9 (1999)
279-313
Intelligent Learning Environment for Film Reading in
Screening Mammography
Joao Campos1, Paul Taylor1, James Soutter2, and Rob Procter2
1 Centre for Health Informatics, University College London, UK
2 School of Informatics, University of Edinburgh, UK
1 Introduction
In this paper we describe our work on building a computer based training system to
support breast cancer screening. We examine the design constraints required by
screening practices and consider the contributions of teaching and learning principles
of existing theoretical frameworks. Breast cancer is one of the main forms of cancer.
In Britain more than 40,000 cases are diagnosed each year [1]. The scale of the
problem has led several countries to implement screening programmes. In the UK,
women aged between 50 and 64 are invited for screening every three years.
2 Screening Practice
Breast screening demands a high level of skill. Readers must identify abnormal
features and then decide whether or not to recall the patient. Radiological signs may
be very small, faint and are often equivocal. The interpretation of such signs involves
setting a threshold for the risk of disease that warrants recall. The threshold should
maximise the detection of cancer without recalling too many healthy women. The
boundary between recallable and non-recallable will vary. Interpretation, therefore,
involves recognising signs of both normal and abnormal appearance and also an
understanding of the consequences of decision errors.
number of cases read and the sensitivity and specificity of readers [2]. However, the
low prevalence of cancer means radiologists must examine a large number of cases to
detect even a small number of cancers. The quality of feedback is also a factor [3].
Side-by-side mentoring, third reading, assessment clinics and reviews of missed
cancers all provide opportunities for feedback.
5 Our Design
Our work is carried out as part of a larger project [6] to demonstrate the advantages of
a digital infrastructure for breast screening. The aim is to trial a small high-bandwidth
network providing access to a substantial database of digital mammograms and to
demonstrate a number of applications including a CBT. The data used in this work
have been gathered through interviews, group discussions and observational work.
The aim of the first prototype is to provide readers with additional reading
experience from a broad range of cases accompanied by immediate, appropriate and
accurate feedback. Training will be provided using high-resolution digital images and
a soft copy reading workstation. The Grid infrastructure allows both the cases and
work involved in annotating them to be shared between centres.
Our design allows for exploratory and experiential learning. It will permit
experiments to evaluate how users explore the available data; to collect data on user
performance, skill and expertise; and on individual case difficulty and roller
composition. The course of a typical training session would be: start by choosing
which set of cases to view, then for each case, identify all the notable features on each
mammogram. Next, decide whether the case as a whole is recallable or non-recallable
and, after all the cases have been read, complete the session by reviewing the correct
solutions and performance statistics. Feedback would be provided on each task and on
the overall progress of the user. The difficulty of the tasks may be adjusted. The
system would also present suggestions of areas that the user might wish to review
again or to concentrate on, and would keep a record of what the user has done. In this
way, the training system can induce users to reflect on strategy and plans.
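A sketch of this session flow is given below; the case data, the reader's decisions, and the feedback shown are stand-ins for illustration, not the actual prototype.

    # Sketch only: read a set of cases, then review solutions and performance.
    cases = [
        {"id": "case-01", "truth": "recall"},
        {"id": "case-02", "truth": "no recall"},
    ]

    def run_session(case_set, decide):
        """For each case record a recall/no-recall decision, then give feedback."""
        decisions = {c["id"]: decide(c) for c in case_set}
        correct = sum(decisions[c["id"]] == c["truth"] for c in case_set)
        # Feedback phase: correct solutions and overall statistics.
        for c in case_set:
            print(c["id"], "answered", decisions[c["id"]], "- truth:", c["truth"])
        print("summary:", correct, "of", len(case_set), "correct")

    # A trivial stand-in reader that recalls every case.
    run_session(cases, lambda case: "recall")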
References
1. Cancer Research UK: Press Release, (2003) 2 June.
2. L. Esserman, H. Cowley, C. Eberle, et al. Improving the accuracy of mammography:
volume and outcome relationships, JNCI (2002) 94 (5), 369-375.
3. M. Trevino and C. Beam: Quality trumps quantity in improving mammography
interpretation, Diagnostic Imaging Online, (2003)18 March.
4. B. du Boulay. What does the AI in AIED buy? In Colloquium on Artificial Intelligence in
Educational Software, (1998) 3/1-3/4. IEE Digest No: 98/313.
5. Akhras, F. and Self, J.: System Intelligence in Constructivist Learning. International Journal
of Artificial Intelligence in Education,(2000)11(4):344-376.
6. J.M. Brady, D.J. Gavaghan, A.C. Simpson et al. eDiaMoND: A Grid-enabled federated
database of annotated mammograms. In Berman, Fox, and Hey, Grid Computing: Making
the Global Infrastructure a Reality, (2003) 923-943, Wiley.
Reuse of Collaborative Knowledge in Discussion Forums
Weiqin Chen
1 Introduction
Discussion forums have been widely used in Web-based education and computer
supported collaborative learning (CSCL) to assist learning and collaboration. These
discussion forums include questions and answers, examples, articles posted by former
students; thus they contain tremendous educational potential for future students [1].
By reusing these discussion forums as new learning resources, future students can
benefit from previous students’ knowledge and experiences.
However, it is not a trivial task to extract relevant information from discussion fo-
rums given their thread-based structure. Some efforts have been made on re-
using discussion forums. Helic and his colleagues [1] described a tool to support
conceptual structuring of discussion forums. They attached a separate conceptual
schema to a discussion forum and the students manually assigned their messages to
the schema. From our experience in fall 2003, this method has two drawbacks. First,
some messages could be assigned to more than one concept in the schema. Second,
the students were not motivated enough to make extra effort in assigning their mes-
sages to concepts.
In our research, we combine an automatic document classification approach with a
domain model to find relevant messages (with a certainty factor) from the previous
knowledge building process and present them to students. The students’ feedback is
used to improve the performance of the system.
In this section we present the main elements in reusing the collaborative knowledge,
including the conceptual domain model, the message classification method and the
integration with a learning environment.
A conceptual domain model is used to describe the domain concepts and the relation-
ships among them, which collectively describe the domain space. A simple concep-
tual domain model can be represented by a topic map. Topic maps [4] are a new ISO
standard for describing knowledge structures and associating them with information
resources. They are used to model topics and their relations at different levels. The main
components of topic maps are topics, associations, and occurrences. Topics represent
the subjects, i.e. the things in the application domain, and make
them machine-understandable. A topic association represents a relationship between
topics. Occurrences link topics to one or more relevant information resources. Topic
maps provide a way to represent semantically the conceptual knowledge in a certain
domain.
In our prototype, we use a topic map to represent the domain model of Artificial
Intelligence (AI). This domain model includes AI concepts and their relations such as
machine learning, agents, knowledge representation, searching algorithm, etc. These
concepts are described as topics in the topic map. Relations between the concepts are
represented as associations. The occurrence describes the links to the messages where
the concept was discussed in the discussion forum. The occurrence is generated by
the automatic classification algorithm presented in the next subsection.
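To make the structure concrete, the following is a minimal Python sketch of a topic-map-style domain model with topics, associations, and occurrences; the concept names, class layout, and occurrence format are illustrative assumptions, not the actual prototype.

from dataclasses import dataclass, field

@dataclass
class Topic:
    """A domain concept, e.g. 'machine learning', with its name variants."""
    name: str
    variants: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)  # links to forum messages

@dataclass
class Association:
    """A relationship between two topics, e.g. 'agents' -- uses --> 'searching'."""
    source: str
    target: str
    relation: str

class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, name, variants=()):
        self.topics[name] = Topic(name, list(variants))

    def relate(self, source, target, relation):
        self.associations.append(Association(source, target, relation))

    def add_occurrence(self, name, message_url, relevance):
        # occurrences are produced by the automatic classifier (next subsection)
        self.topics[name].occurrences.append((message_url, relevance))

# Illustrative fragment of an AI domain model
tm = TopicMap()
tm.add_topic("machine learning", ["ML", "learning algorithms"])
tm.add_topic("agents", ["agent", "multi-agent"])
tm.relate("agents", "machine learning", "uses")
tm.add_occurrence("agents", "forum/msg/42", 0.8)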
Once the conceptual domain model is constructed, messages from previous knowl-
edge building can be classified based on this model [2].
In the prototype we designed a keyword recognizer and an algorithm to determine
the relevance of a message to a concept in the domain model. The keyword recog-
nizer identifies the occurrence of the concepts, including their basenames and variants
of the basenames in the domain model. Relevance is determined using an algorithm
that applies a weight to the keywords in the documents. There are several factors that
the algorithm uses to compute the relevance. For example:
Keyword weight is based on where a concept or its variant is located within
a message. A keyword receives the highest rating if it appears in a title.
Frequency of occurrence is based on the number of times a concept or its
variant appears in a message in relation to the size of the message.
The classification results are stored in a MySQL database. The database includes
both the messages (title, author, timestamp, thread information) and the concepts to
which they are related, together with the relevance values.
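The sketch below illustrates one plausible reading of the scoring factors described above (a higher weight for title hits and a length-normalized frequency for body hits); the exact weights, variable names, and the simple regular-expression matcher are assumptions for illustration only, not the paper's algorithm.

import re

def relevance(message_title, message_body, concept, variants, title_weight=3.0):
    """Score how relevant a forum message is to one domain concept."""
    terms = [concept] + list(variants)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    title_hits = len(pattern.findall(message_title))
    body_hits = len(pattern.findall(message_body))
    body_words = max(len(message_body.split()), 1)

    # keyword weight: occurrences in the title count more than in the body;
    # frequency: body occurrences are normalized by message length
    return title_weight * title_hits + body_hits / body_words

# A message mentioning "agents" in its title scores higher than one that only
# mentions it once deep inside a long body.
print(relevance("Question about agents", "How do agents communicate?", "agents", ["agent"]))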
Acknowledgments. The author would like to thank the anonymous reviewers for
their constructive comments which helped improve this paper.
References
1. Helic, D., H. Maurer, and N. Scerbakov, Reusing discussion forums as learning resources in
WBT systems, in Proc. of the IASTED Int. Conf. on Computers and Advanced Technology
in Education. 2003: Rhodes, Greece. p. 223-228.
2. Craven, M., et al. Learning to extract symbolic knowledge from the World Wide Web. in
Proc. of the 15th National Conference on AI. 1998: Madison, Wisconsin. p. 509-516
3. Muukkonen, H., K. Hakkarainen, and M. Lakkala. Collaborative technology for facilitating
progressive inquiry: future learning environment tools. in Proc. of the Int. Conf. on Com-
puter Supported Collaborative Learning (CSCL’99). 1999. Palo Alto, CA. p. 406-415.
4. Pepper, S. and G. Moore, XML Topic Maps (XTM) 1.0 -TopicMaps.Org Specification.
2001. http://www.topicmaps.org/xtm/1.0/
A Module-Based Software Framework for E-learning
over Internet Environment*
1 Introduction
E-learning overcomes spatial and temporal limitations of traditional education, pro-
motes interaction between teachers and learners, and enables personalized instruction
[1]. However, in many countries, E-learning is not yet as widespread as we would
expect, although the Internet infrastructure and the number of Internet users are growing rapidly.
This leads to an important idea: most problems of E-learning lie in its contents, software,
and human aspects, not in the Internet infrastructure. This paper discusses various
problems of E-learning and proposes a novel software framework to avoid them.
into cognitive overload, because they have to judge whether it is helpful to their
learning whenever they encounter searched or linked materials. They sometimes miss
core information, since they have to understand and process it by themselves.
Contents: Internet educational content often lacks systematic, well-organized,
and well-developed materials. Many web sites have duplicated or overlapping content.
Much of the educational content on free web sites lacks depth because, in many
cases, volunteers prepare it as a hobby. Teachers can hardly discover useful materials
on the Internet, since they are scattered across it without systematic
connections, systematic arrangement, or mutual correlation.
4 Conclusion
In this paper, a novel E-learning software framework for the Internet environment is
proposed. It is a module-based learning engine with five modules. By reconfiguring
the connections among modules, it can be flexibly adapted to various educational
applications such as distance education and collaborative learning. A user can search
for other users with the same interests over the Internet and can access and reuse their
modules to compose an effective learning engine, saving considerable time and money.
References
1. Moore, M.G., Kearsley, G.: Distance Education, Wadsworth Publishing (1996)
2. Yi, D.B.: The Psychology of Learners in Multimedia Assisted Language Learning, Multi-
media-Assisted Language Learning 1 (1998) 163-176
3. IEEE P1484 LTSA Draft 8: Learning technology standard architecture, http://ltsc.ieee.org/
doc/wg1/IEEE_1484_01_D08_LTSA.doc
4. ISO/IEC JTC1/SC29/WG1 15938: Multimedia Content Description Interface, http://www.
cselt.it/mpeg/standards/mpeg-7/mpeg-7.zip
Improving Reuse and Flexibility in Multiagent Intelligent
Tutoring System Development Based on the COMPOR
Platform
1 Introduction
Currently, multiagent systems have been widely used as an effective approach for
developing different kinds of complex software systems. Indeed, Intelligent Tutoring
Systems can be considered complex systems and have been influenced by this
trend. The designer of an ITS must take into account different kinds of complex and
dynamic expertise, such as the domain knowledge and pedagogical aspects, among
others. Thus, the design of an ITS is a difficult and time-consuming task. Building an
ITS requires not only knowledge of the tutoring domain and of different pedagogical
approaches, but also various technical efforts in terms of software engineering.
In this paper we adopt the COMPOR platform [1] as a multiagent development
infrastructure to support the development of Cooperative Intelligent Tutoring Systems
(ITS) based on the Mathema environment [2], as shown in the next section. By
adopting COMPOR, we can provide ITS designers with software engineering
facilities such as reuse and flexibility, saving time on ITS development.
4 Final Remarks
In this paper we have briefly introduced the use of COMPOR as a software engineering
platform for improving reuse and flexibility in multiagent intelligent tutoring
system development. By encapsulating the ITS functionalities in functional
components and using COMPOR to assemble these components, it
is possible to develop multiagent ITSs more effectively, reducing development time.
References
1. Costa, E. B., Almeida, H. O., Perkusich, A., Paes, R. B. COMPOR: A component-based
framework for building Multi-agent systems. In Proceedings of Software Engineering
Large-scale Multi-agent systems - SELMAS’03, Portland – Oregon - USA, (2003) 84-89
2. Costa, E.B.; Perkusich, A.; Ferneda, E. From a Tridimensional view of Domain Knowledge
to Multi-agent Tutoring System. In F. M. De Oliveira, editor, Proc. of 14th Brazilian Sym-
posium on Artificial Intelligence, LNAI 1515, Springer-Verlag, Porto Alegre,
RS, Brazil, (1998) 61-72
3. Costa, E. B., Almeida, H. O., Lima, E. F., Nunes Filho, R. R. G, Silva, K. S., Assunção, F.
M. A Cooperative Intelligent Tutoring System: The case of Musical Harmony domain. Pro-
ceedings of 2nd Mexican International Conference on Artificial Intelligence - MICAI’02,
Mérida, Yucatán, México, LNAI, Springer Verlag (2002) 367-376.
Towards an Authoring Methodology in Large-Scale
E-learning Environments on the Web
Evandro de Barros Costa1, Robério José R. dos Santos2, Alejandro C. Frery1, and
Guilherme Bittencourt3
1 Departamento de Tecnologia da Informação, Universidade Federal de Alagoas,
Campus A. C. Simões, Tab. do Martins, Maceió - AL, Brazil, Phone: +55 82 214-1401
{Evandro, frery}@tci.ufal.br
2 Instituto de Tecnologia em Informática e Informação do Estado de Alagoas,
Maceió, Alagoas, Brazil
[email protected]
3 Universidade Federal de Santa Catarina,
Santa Catarina, Brazil
[email protected]
1 Problem Statement
We propose a critical evaluation of some assumptions and paradigms adopted by the
AI community during the last three decades, mainly examining the gap between per-
ception and description in the process of content annotation. In particular, we focus
on that gap in AI-ED research in the context of distributed environments, speculating
about the content annotation process in authoring systems.
The problem of authoring educational content for limited and controlled commu-
nities has been extensively studied. This paper tackles the broader problem of
authoring for large-scale, distributed, fuzzy communities, such as those emerging in
modern e-learning systems on the Web. Differently from other approaches in such
The Depth dimension provides room for epistemological refinements in our per-
ceptions of each context, depending on the methodologies used to deal with objects
and their relationships inside that context.
The Laterality dimension describes ecological facilities for each context and
depth. These facilities allow grasping other related bodies of annotated content,
favoring the reuse and sharing of annotated content.
Consider the problem of modeling the classical logic domain for pedagogical pur-
poses. Should it be modeled with an axiomatic, a natural deduction, or a semantic
approach (three possible contexts for the same domain)? If we choose the
semantic approach, to which depth should one go: to the zero order (propositional
logic), to the first order (predicate logic), or to higher order logics?
Assume that the axiomatic context with zero order depth has been chosen. Two
possible lateralities for this view are set theory and the principle of finite induction.
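A rough sketch of how the Context x Depth x Laterality view of a domain could be captured in a simple index is given below; the classical-logic entries follow the example above, but the data layout, field names, and content files are purely illustrative assumptions.

# Each body of annotated content is indexed by (context, depth) and lists its
# lateralities: related bodies of content that can be reused or shared.
domain = "classical logic"

annotations = {
    ("axiomatic", "zero order"): {
        "lateralities": ["set theory", "principle of finite induction"],
        "content": ["axioms.html", "exercises-propositional.html"],
    },
    ("semantic", "first order"): {
        "lateralities": ["model theory"],
        "content": ["tarski-semantics.html"],
    },
}

def related_content(context, depth):
    """Return the annotated content and laterally related bodies for one view."""
    view = annotations.get((context, depth), {"lateralities": [], "content": []})
    return view["content"], view["lateralities"]

print(related_content("axiomatic", "zero order"))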
3 Conclusions
In this work we made a critical review of some assumptions and paradigms adopted
by the AI community during the last three decades, with special attention to examining
the gap between perception and description. A new set of requirements for maintaining
adaptive behaviour in the process of content annotation and authoring for
large-scale, distributed, fuzzy communities was identified. Such communities emerge,
for instance, in modern e-learning systems on the Web. In doing so, we have
presented steps towards a formal definition of a new methodology for generating
annotated content in the context of the AI-ED community.
ProPAT: A Programming ITS Based on Pedagogical
Patterns
1 Introduction
Research on programming psychology points out two challenges that a novice
programmer has to handle: (i) learning a new programming language, which requires
learning and memorizing its syntax and semantics; (ii)
learning how to solve problems to be executed by a computer, where the student has
to learn to think in terms of computer operations.
Although a programming language has a lot of details, the first challenge is not the
most difficult part. Evidence shows that learning a second language is, in general,
easier. One hypothesis is that the student has already acquired the ability to solve
problems using the computer, which is the skill common to learning different languages.
Regarding the second challenge, research on cognitive theories of programming
learning has shown evidence that experienced programmers store and retrieve
old problem-solving experiences that can be applied to a new problem and
adapted to solve it. However, a novice programmer does not have any such
experience, only the primitive structures of the programming language he or she is
currently learning [3].
Inspired by these ideas, the Pedagogical Patterns community proposes a strategy
for teaching programming by presenting small programming pieces (elementary
programming patterns), instead of leaving the student to program from scratch. Supposing
that students who have learned elementary programming patterns will, in fact, construct
programs with them, an Intelligent Tutoring System (ITS) could draw a number of
advantages from this teaching strategy, such as: (i) the tutor can establish a dialogue
with the student in terms of problem solving strategies [3]; (ii) the tutor module for
diagnosing the student's program would be able to reason about the patterns in a
hierarchical fashion, i.e., to detect program faults at different levels of abstraction.
In this paper, we present a new Eclipse-based IDE for learning programming, based on
the Pedagogical Patterns teaching strategy and extended with a Model Based Diagnosis
system, to detect errors in the student's program in terms of: (1) wrong use of the
language statements and (2) wrong use and decomposition of Pedagogical Patterns.
3 Diagnosis
The basic idea for diagnosing programs is to derive a component model directly from
the program and from the programming language semantics. This model must distinguish
components and connections and describe their behavior and the program structure.
As in the diagnosis of physical devices, the system description in this case is the
student's program behavior, which reflects its errors. The observations are the incorrect
outputs at different points of the original program code. The predictions are not
made by the system but by the student, and therefore in this situation it is possible for
the student to communicate her programming goals to the tutor. We propose an
addition to the diagnosis method described in [2] so that programming patterns can also
be modeled as new components. Thus, the diagnosis module would be able to reason
about patterns in a hierarchical fashion, i.e., to detect program faults at different
levels of abstraction.
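A highly simplified sketch of this component-based view is given below for the averaging problem discussed next; the component names, the toy values, and the single-fault reasoning are illustrative assumptions, not the diagnosis engine of [2].

# Each component computes an output from inputs; the student supplies the
# expected (predicted) values at probe points, the system records the computed ones.
def run(components, env):
    for name, fn, inputs, output in components:
        env[output] = fn(*[env[i] for i in inputs])
    return env

components = [
    ("sum_loop",   lambda xs: sum(xs),  ["numbers"],          "total"),
    ("count_loop", lambda xs: len(xs),  ["numbers"],          "count"),
    ("divider",    lambda t, c: t / c,  ["total", "count"],   "average"),
]

env = run(components, {"numbers": [2, 4, 6]})
predicted = {"total": 12, "count": 3, "average": 4.0}   # the student's predictions

# Single-fault candidates: components whose computed output disagrees with the
# student's prediction at the corresponding probe point.
candidates = [name for name, _, inputs, output in components
              if abs(env[output] - predicted[output]) > 1e-9]
print(candidates)   # [] here; a buggy divider would yield ['divider']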
Figure 1 shows the component model (for a C program) for the problem: Read
numbers, taking their sum until the number 99999 is seen. Report the average. Do not
include the final 99999 in the average. By identifying patterns in the program model,
we can construct a new model with a reduced number of components. By doing so,
besides getting a model that can improve the efficiency of the diagnosis process, the
student will be asked to make predictions in terms of high-level strategies and goals.
Fig. 1. A structural model of a program solution. The box including four components repre-
sents a pattern that can be treated as a regular component of the language for the MBD system.
The identification of the patterns used by the student can be done in two different
programming modes in PROPAT: (I) high control mode, where the teacher has to
specify all the problem subgoals and the student has to select a pattern to solve each
one of them; (II) medium control mode, where the student can also freely type his
own code.
4 Conclusions
PROPAT is a new programming environment that allows the student to program
using pedagogical patterns. By using a model-based diagnosis approach for detecting
the student's errors, we add to PROPAT the state of the art in program diagnosis. We
also propose identifying the patterns used by the student in order to create a program
model that includes these patterns as components. This idea will allow for better
communication between the tutor system and the student. The PROPAT programming
interface is already implemented, as an Eclipse plug-in, with two programming modes:
high control and medium control.
References
1. Benjamins, R.: Problem Solving Methods for Diagnosis. PhD thesis, University of Amster-
dam (1993)
2. Stumptner, M., Mateis, C., Wotawa, F.: A Value-Based Diagnosis Model for Java Programs.
In: Eleventh International Workshop on Principles of Diagnosis (2000)
3. Johnson, W. L.: Understanding and Debugging Novice Programs. In: Artificial Intelligence,
Vol. 42. (1990) 51-97
4. Wallingford, E.: The Elementary Patterns home page,
http://www.cs.uni.edu/~wallingf/patterns/elementary (2001)
AMANDA: An ITS for Mediating
Asynchronous Group Discussions
1 Introduction
3 System Implementation
AMANDA was first implemented in Lisp, where most of the research on its
mediation mechanisms was conducted. When the algorithms were properly tested and
tuned, the system was redeveloped in Java. The current version of AMANDA [6] is
composed of a Java core on the server side and a web-based interface on the client
side. The Java core comprises the mediation algorithms, while the web-based
interface provides tutors and learners with a suitable means for interacting with the
discussion.
AMANDA has been used in several group discussions, and the results obtained in the
field so far are promising. AMANDA is capable of autonomously mediating collective
discussions and motivating the students by finding patterns of interaction within the
group, regardless of the type of learners, the subject under discussion, and the number
of participants. AMANDA has proven to be advantageous over traditional (human-
mediated) forum systems by improving group interaction. In AMANDA-mediated
discussions, with no human mediating effort, we have observed high participation
rates (over 78% on average). Another positive outcome is that AMANDA discussions
tend to remain focused on the proposed issues, with little or no deviation from the subject,
due to the strong argumentative nature of the mediation.
In addition, AMANDA has proven to be an effective tool for online tutors. It is
known that in traditional discussion forums, tutors spend considerable effort in
articulating the students’ ideas, filtering unrelated postings and keeping track of the
discussion. In AMANDA discussions, tutors tend to play more cognitive roles, such
as resolving specific disagreements, clarifying concepts, and providing thought-provoking
ideas to motivate reflection and debate.
Ongoing research on AMANDA involves the design of algorithms that assess the
learners according to their contribution to the discussion. This research aims at
providing online tutors with a computational assessment method that takes into
account the contribution of each participant to the collective learning.
References
1. Quignard M., Baker M. Favouring modellable computer-mediated argumentative dialogue
in collaborative problem-solving situations; Proceedings of the Conference on Artificial
Intelligence in Education (AI-Ed 99) 1-8, Le Mans. IOS Press, Amsterdam, 1999.
2. Veerman, A. Computer-supported collaborative learning through argumentation; PhD
Thesis; University of Utrecht, 2000.
3. Leary D. Using AI in Knowledge Management: Knowledge Bases and Ontologies; IEEE
Intelligent Systems, May/June, 1998.
4. Greer, Jim et al. Lessons Learned in Deploying a Multi-Agent Learning Support System:
The I-Help Experience. Artificial Intelligence in Education; J.D. Moore et al. (Eds.). IOS
Press 410-421, 2001.
5. Eleuterio M. AMANDA – A Computational Method for Mediating Asynchronous Group
Discussions. PhD Thesis. Université de Technologie de Compiègne and Pontifícia
Universidade Católica do Paraná, 2002.
6. Amanda website, available at www.amanda-system.com.br
An E-learning Environment in Cardiology Domain
Abstract. The research reported in this short paper explores the integration of
virtual reality, case-based reasoning, and multiple linked representations in a
learning environment for medical education. We have focused on the cardiology
domain by adopting a pedagogical approach based on case-based
teaching and cooperative learning. Our aim is to engage apprentices in
appropriate problem situations connected with a rich and meaningful virtual medical
world. To accomplish this, we have adopted the MATHEMA environment to
model the domain knowledge and to define an agent society in order to generate
productive interactions with the apprentices during problem solving regarding a
given case. Also, the agent society may provide apprentices with adequate
multimedia content support and simulators to help them in solving a problem.
1 Introduction
Case-based learning has been used in medical schools [2][3]. In this approach,
apprentices learn by solving clinical problems, for instance by being engaged in a problem
situation where actual patient cases are presented for diagnosis. In so doing,
the apprentices have the opportunity to summarize what they know, what their
hypotheses are, and what they still need to know. They can also plan their next steps
and separately do whatever research is needed to continue solving the problem.
The research reported in this paper is part of an ongoing project which aims to
simulate a Web-based virtual medical office. In this paper, we present an e-Learning
* Scholarship CNPQ. Electrical Engineering Doctorate Program COPELE/DEE.
3 E-learning Environment
4 Conclusion
References
Mining Data and Providing Explanation to Improve Learning in Geosimulation
Abstract. This poster describes the pedagogical aspects of the ExpertCop tutorial
system, a multi-agent geosimulator of criminality in a region. Assisting
the user, a pedagogical agent aims to define interaction strategies between the
student and the geosimulator in order to make the simulated phenomena better
understood.
1 Introduction
Allocating the police force in an urban area to perform preventive policing is a tactical
management activity that is usually decentralized across sub-sectors in the police
departments spread over the area. These tactical managers are expected to
analyze the distribution of crime in their region and to allocate
the police force based on this analysis.
Experiments in this domain cannot be performed without high risk and high costs,
since they involve human lives and public property. In this context, simulation systems
for teaching and decision support are an essential tool. The ExpertCop system
aims to support education by inducing reflection on simulated phenomena
of crime rates in an urban area. The system receives as input a police resource
allocation plan and simulates how the crime rate would behave over a certain
period of time. The goal is to lead the student to understand the consequences of
his/her allocation as well as the cause-and-effect relations involved.
In the ExpertCop system, the simulations occur in a learning environment and are
accompanied by graphical visualizations that help the student's learning. The system
allows the student to enter parameters dynamically and analyze the results, in addition
to supporting the educational process by means of an intelligent tutoring agent, the
pedagogical agent.
The pedagogical agent (PA) is responsible for helping the student to understand the
implicit and explicit information generated during the simulation process. It is also the
PA's mission to induce the student to reflect on the causes of the events.
The PA, endowed with a set of pedagogical strategies, constitutes the tutoring module
of the ExpertCop system. These strategies are the following:
The computational simulation per se, which leads the student to learn by doing
and to understand the cause-effect relationship of his/her interaction;
An interactive system providing a usable interface with graphics showing the
evolution of the simulation and allowing user/student intervention;
User-adaptive explanation capabilities, which allow macro and micro level
explanation of the simulation. Adaptation is done in terms of vocabulary and
level of detail according to the user’s profile.
Micro-level explanation refers to the agents' individual behavior. The criminal
behavior in ExpertCop is modeled in terms of a Problem Solving Method (PSM) (Fensel
et al. 2000), in which the phases of the reasoning process of evaluating whether to commit a
crime are represented. ExpertCop explains the simulation events by accessing a log of
the evaluation PSM of the criminals for all crimes.
Macro-level explanation refers to emergent or global behavior. In ExpertCop, the
emergent behaviour represents the growth or reduction of crime and its
tendencies. This emergent behavior reflects the effect of the events generated by the
agents and of their interaction with the environment. To identify this emergent behavior,
the pedagogical agent applies a Knowledge Discovery in Databases (KDD) process (Fayyad
1996), searching for patterns in the database generated by the simulation process (Fig.
1). First it collects the simulation data (events generated from the interaction of the
agents, such as date, motive, crime type, start time, end time, and patrol route) and
pre-processes them, adding geographic information such as escape routes, notable place
coordinates, distances between events, agents, and notable places, and social and
References
Jean-Mathias Heraud
1 Introduction
During a learning process, when a learner hesitates in choosing what educational
activity to do next, it would be interesting to use similar situations to propose a new
way to learn the targeted concept. We therefore propose adapting a path of
alternative educational activities that has been successful in the past in a similar
situation. In Pixed, teachers can index educational activities by concepts of the
domain knowledge ontology. Learners can then access these educational activities via
three navigation modes according to a chosen concept. These modes are:
The free path mode: a hyperspace map representing the whole domain knowledge
ontology is the only navigation means available. The learner is free to navigate
among all the course concepts. Moreover, for each concept s/he can choose among
associated educational activities.
The assisted mode: the learner gets a graphical map representing a conceptual path.
This map represents the concepts semantically close to the concepts preceding the
goal concept.
The experience-based mode: the learner gets an experience path. The learner can
navigate in this experience path, choose notions, play educational activities that
previously have helped other learners to reach the same goal, and consult
annotations on these educational activities. This navigation mode is described in
the next section.
When the learner navigates, the system traces learning interactions as learning
episodes. Using episode dissimilarity, the system retrieves episodes similar to the
desired situation. From these episodes, the system creates an adapted episode, trying
to maximize the episode potential. Then an experience path is extracted from this
adapted episode.
A learning episode is a high-level model of the student's learning task, composed
of the learning context, the actions performed, and the episode result.
The different parts of the learning context are the learner identifier, the timestamp,
the list of previous educational activities exploited by the learner with optional
evaluation results, the domain knowledge ontology, the goal concept in this episode,
the current conceptual path and the concepts the learner is supposed to master
represented by the learner’s domain knowledge model.
Actions performed by the learner to try to reach the targeted concept make up a
sequence of elements called trials. A trial is an ordered sequence of logged elements.
A trial always begins with a unique concept currently selected by the learner to try to
progress towards the targeted concept: the current concept. The following elements
are a combination of educational activities, annotations, and quizzes about the
mastering level of the current concept. A trial ends with the beginning of a new trial
(the choice of a new current concept) or by the last part of an episode.
The different parts of the episode result are quizzes played by the learner
concerning the goal concept and the learner’s domain knowledge model at the end of
the episode.
In order to use past users' experience to guide future users, it is important that the
system has some way of evaluating the quality of those past experiences. However,
before selecting good cases, we first filter the experience base down to similar experiences.
We use a set of similarity and dissimilarity measures suited to the specific features of a
learning episode, and we choose a metric for both notion and trial
dissimilarities.
The analysis of simple dependences between how trials work and the result of the
episodes allows us to build what we call the "potential" of a source trial, which is
then combined with other trial potentials to obtain the episode potential.
We compose trial dissimilarities and trial potentials in order to build,
respectively, trial-sequence dissimilarities and the trial-sequence potential. Moreover,
we propose calculating the potential of educational activities for a specific goal, in
order to enable a finer adaptation of the episode.
The adaptation consists of building and proposing a new episode adapted from
existing ones. This episode is presented as an adapted path through existing experience.
The learner can navigate in this experience path through the interface. The adaptation is
based on adding the best-potential educational activities to an adapted list of
the best-potential trials (the worst ones are removed, and new ones are added according to their
potential value).
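A rough sketch of this potential-based adaptation is shown below; the trial representation, the potential formula, and the selection policy are toy stand-ins for Pixed's actual metrics, which are not detailed in this short paper.

# A trial is (current_concept, activities, success_flag); an episode is a list of
# trials. The potentials and the adaptation rule below are illustrative only.
def trial_potential(trial):
    # e.g. successful trials needing few activities get a higher potential
    _, activities, success = trial
    return (1.0 if success else 0.2) / max(len(activities), 1)

def episode_potential(episode):
    return sum(trial_potential(t) for t in episode) / max(len(episode), 1)

def adapt(similar_episodes, keep=3):
    """Build an adapted episode from the best-potential trials of similar episodes."""
    all_trials = [t for ep in similar_episodes for t in ep]
    best = sorted(all_trials, key=trial_potential, reverse=True)[:keep]
    return best  # presented to the learner as an experience path

ep1 = [("recursion", ["ex1", "quiz1"], True), ("loops", ["ex2"], False)]
ep2 = [("recursion", ["video1"], True)]
print(episode_potential(ep1))
print(adapt([ep1, ep2]))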
Fig. 1. Left) a conceptual path and Right) an experience path in the Pixed navigation frame
Figure 1 is composed of two screenshots of the system Pixed. The left one
illustrates a frame containing a conceptual path and the right one contains an
experience path. When the learner navigates in the experience path, s/he can choose
notions (dots), quizzes (question marks) or educational activities (document icons)
already played by past users, and annotations written by past users (note icons)
concerning these educational activities.
Improving Knowledge Representation, Tutoring, and
Authoring in a Component-Based ILE
1 Introduction
Research into reducing the high expense and complexity of ITS development is taking
on increased significance as more and more systems are built for use in
the classroom. The rationale for such research is clear: it takes approximately 100-200
hours of development time to produce 1 hour of instruction from an ITS [7]. Although
a reusable interface and a separate tutoring component can reduce the complexity, they do
not overcome one of the major challenges: that of enabling domain experts to be
directly involved in authoring.
which have the potential to decrease the time, cost and skill threshold as well as
support the whole design process and enable rapid prototyping [7]. Additionally, the
expense of system development can be reduced by designing with interoperability and
component reusability in mind. This approach has been successfully demonstrated in
[4].
This paper highlights work in progress to improve the authoring and tutoring
capabilities of DANTE, an applied ILE in the field of mathematics (see [5], [6]). We
discuss the improvements made in the system's knowledge representation through the
employment of the Java Expert System Shell (JESS3). In addition, we outline our
evaluation of CTAT, a suite of Cognitive Tutor Authoring Tools [3], and present our
research into the feasibility of integrating it with the existing framework.
1 Parts of the research reported here were completed while the first author was studying
for an MSc in the School of Informatics, and others while he was employed in the School of
Mathematics under a University of Edinburgh Moray Endowment Fund.
2 Corresponding author: [email protected]. School of Mathematics, The University of
Edinburgh, Mayfield Road, EH93JZ, Edinburgh, UK. Tel: +44-131-6505086
3 JESS: http://herzberg.ca.sandia.gov/jess
Fig. 1. DANTE applied in different situations with activities for triangles and vectors
2 Employing Jess
Although DANTE's framework was adequate for observations, cognitive task
analysis, and small activities, it was quite limited, particularly with respect to the time
taken to author and modify the embedded knowledge. Therefore, we first employed
the Java Expert System Shell (JESS). The execution engine of JESS offers a complete
programming language from which one can invoke JavaBean code (allowing us to use
DANTE's state-aware JavaBean objects). In addition, it gives us the flexibility to
have an advanced solution even on the web. JESS has a storage area which can be
used to store and fetch Java objects, allowing inputs and results to be passed
between JESS and Java. More importantly, facts and rules can be composed from
properties of Java objects as well as from fuzzy linguistic terms.
With JESS employed, DANTE's architecture includes the inference engine that
JESS provides, a working memory where the current state of the student is kept, and a
rule base that provides the generic mechanism tackling general aspects of user
behaviour and goal-subgoal tracking. For each activity, a second set of rules represents
the domain knowledge. Authoring is now easier at both a conceptual and a
technical level: the rules in JESS are isolated from each other both procedurally and
logically, and the semantics of the syntax (even for authors with less programming
experience) are far more intuitive than in a procedural programming language.
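The following is not JESS syntax; it is a minimal Python analogue, under our own assumptions, of the working-memory-plus-rules arrangement described above, meant only to illustrate how generic tracking rules and activity-specific domain rules can remain isolated from one another.

# Working memory holds facts about the student's current state; rules match on
# facts and assert new ones. Generic rules (goal tracking) and domain rules
# (per activity) live in separate functions, mirroring the separation described above.
working_memory = {("goal", "solve-triangle"), ("input", "angle", 200)}

def generic_subgoal_rule(wm):
    if ("goal", "solve-triangle") in wm and ("subgoal", "find-angles") not in wm:
        return {("subgoal", "find-angles")}
    return set()

def domain_angle_rule(wm):
    # activity-specific knowledge: an interior angle above 180 degrees is invalid
    return {("feedback", f"angle {fact[2]} is not a valid interior angle")
            for fact in wm if fact[:2] == ("input", "angle") and fact[2] > 180}

rules = [generic_subgoal_rule, domain_angle_rule]

changed = True
while changed:                       # naive forward chaining to a fixed point
    new = set().union(*(r(working_memory) for r in rules))
    changed = not new <= working_memory
    working_memory |= new

print(working_memory)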
In order to reduce the authoring time for activities, we tried to integrate CTAT with
DANTE and identified differences between the frameworks. For example, a
limitation in representing ranges of values in the state-aware components causes
problems in using some of DANTE's components (e.g. a slider). In addition, CTAT
tutors are based on modeling discrete states, so modeling DANTE's
exploratory activities presented a problem. However, we were able to replicate other,
purely procedural, activities. We constructed a custom Dormin widget (a matrix
control) for use with activities that involve matrices. Using this widget we
authored a tutor that can teach the conversion of quadratic equations to their standard
form. Using the behaviour recorder, debugging and validating our rules was
substantially faster. Our study indicates that there is a promising basis for further
integration of CTAT with elements of DANTE.
References
1. S. B. Blessing. A Programming by Demonstration Authoring Tool for Model-Tracing
Tutors. International Journal of AIED (1997), 8, 233-261
2. Hunn, C. Employing JESS for a web-based ITS. Master’s thesis, The University of
Edinburgh, School of Informatics (2003)
3. Koedinger, K., Aleven, V. & Heffernan, N.T. Toward a Rapid Development Environment
for Cognitive Tutors. 12th Annual Conference on Behaviour Representation in Modelling
and Simulation. SISO (2003)
4. Koedinger, K. R., Suthers, D. D., & Forbus, K. D. Component-based construction of a
science learning space. International Journal of AIED, 10 (1999).
5. M. Mavrikis. Towards more intelligent and educational DGEs. Master’s thesis, The
University of Edinburgh, Division of Informatics; AI, 2001.
6. M. Mavrikis and A. Maciocia. WaLLiS: a web-based ILE for science and engineering
students studying mathematics. Workshop of Advanced Technologies for Mathematics in
11th International Conference on AIED, Sydney, 2003.
7. Murray, T.: An Overview of ITS Authoring Tools: Updated analysis of the state of the art.
In: Murray, T., Blessing, S., Ainsworth, S. (eds.): Authoring Tools for Advanced Learning
Environments. Kluwer Academic Publishers (2003)
A Novel Hybrid Intelligent Tutoring System and Its Use
of Psychological Profiles and Learning Styles
Weber Martins1,2, Francisco Ramos de Melo1,
Viviane Meireles1, and Lauro Eugênio Guimarães Nalini2
1 Federal University of Goias, Computer Engineering,
{weber, chicorm, vmeireles}@pireneus.eee.ufg.br
2 Catholic University of Goias, Department of Psychology,
[email protected]
1 Introduction
In a classical tutorial, users access the content progressively at basic, intermediate, and
advanced levels. In an activity-focused tutorial, the goal activity is preceded by another
activity providing additional information or motivation. In a learner-customized tutorial,
between the introduction and the summary there are cycles of option (navigation) pages
and content pages; an option page presents a list of alternatives to the apprentice, or a
test, used to define the next step. In a progress-by-knowledge tutorial, the apprentice can
skip content already mastered, taking tests of progressive difficulty to determine the entry
point in the sequence of contents. In an exploratory tutorial, the initial exploration page
has access links to documents, databases, or other information sources. In a lesson-generating
tutorial, the result of a test defines the personalized sequence of topics to be presented to
the apprentice [1].
2 Proposed System
The presented work is based on the capacity of artificial neural networks (ANNs) [4]
to extract patterns useful for content navigation in intelligent tutoring systems by selecting
the best historical examples. This proposal improves the student's performance
through the consideration of personal characteristics (and technological ability in
interface usage) when deriving proper navigation patterns [5]-[6]. A navigation
pattern establishes global probability distributions over visits to the five levels
of each context in the structure of the connectionist tutoring system. To treat the local
situation, expert (human) rules [7] are introduced by means of probability distributions.
By integrating the global and local strategies, we have composed a hybrid
intelligent tutoring system. In the proposed structure (see Figure 1), there is a single,
generic network for the whole tutor. The decision of the proposed ITS is based on the
navigation pattern (defined by the ANN) and on the apprentice's local performance (current
level and the score on the test).
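A schematic sketch of how a global (ANN-derived) distribution and a local expert rule could be blended into a navigation decision is given below; the level names, the distributions, the rule, and the mixing weight are illustrative assumptions rather than the system's actual design.

import random

LEVELS = ["basic-1", "basic-2", "intermediate", "advanced-1", "advanced-2"]

# Global pattern: visitation probabilities over the five levels for one context,
# as an ANN trained on the best students might output (assumed values).
ann_pattern = [0.10, 0.20, 0.40, 0.20, 0.10]

def expert_rule(test_score):
    """Local adjustment: a low score shifts probability mass to easier levels."""
    if test_score < 0.5:
        return [0.35, 0.30, 0.20, 0.10, 0.05]
    return [0.05, 0.10, 0.30, 0.30, 0.25]

def next_level(test_score, weight=0.5):
    local = expert_rule(test_score)
    mixed = [weight * g + (1 - weight) * l for g, l in zip(ann_pattern, local)]
    total = sum(mixed)
    probs = [p / total for p in mixed]
    return random.choices(LEVELS, weights=probs, k=1)[0]

print(next_level(test_score=0.4))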
The use of individual psychological and learning-style characteristics in guiding the tutor
through the course contents allows the system to decide what should be
presented based on the student's individual preferences. The dimensions that characterize
the psychological profiles [8] and learning styles [9] are used in the determination
of the navigation patterns. Such patterns are extracted by the neural networks
from the individual preferences (dimensions that characterize the type) of
the best students.
The composition of the (neural) training set led to the implementation of a tutoring
system for data collection, called the Free Tutor, and of a guided tutor (without
intelligence), called the Random Tutor, for evaluating the navigation decisions
of the intelligent tutor. The Free Tutor and the Random Tutor have the same
structure as the Intelligent Tutor, but without the advice of the ANN and the set of expert
rules. The Intelligent Tutor employed two individual characterizations: psychological
profiles (PP) and learning styles (LE). Descriptive results are shown in
Table 1.
Using Student's t-test at the 5% significance level, there are significant differences
in the resulting improvements (normalized gains) between Intelligent and Free navigation
(p-value = 0.2%) and between Intelligent and Random navigation (p-value = 0.02%).
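A comparison of this kind can be reproduced in outline with a standard two-sample t-test; the gain values below are made up purely to show the call and are not the study's data.

from scipy import stats

# Hypothetical normalized gains (NOT the study's data) for two tutor conditions.
intelligent = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
free        = [0.41, 0.38, 0.52, 0.33, 0.45, 0.40]

t_stat, p_value = stats.ttest_ind(intelligent, free)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # compare p against the 0.05 level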
References
1. Horton, William K. Designing Web-based Training, Wiley, USA, 2000.
2. Martins, W. & CARVALHO, S. D. “Mapas Auto-Organizáveis Aplicados a Sistemas
Tutores Inteligentes”. Anais do VI Congresso Brasileiro de Redes Neurais, pp. 361-366,
São Paulo, Brazil, 2003. [in Portuguese].
3. Alencar, W. S., Sistemas Tutores Inteligentes Baseados em Redes Neurais. MSc disserta-
tion. Federal University of Goias, Goiânia, Brazil, 2000. [in Portuguese].
4. Haykin, S. S.; Redes Neurais Artificiais - Princípio e Prática. Edição, Bookman, São
Paulo, Brazil, 2000 [in Portuguese].
5. Martins, W. Melo, F.R. Nalini, L. E. G. Meireles, V. “Características psicológicas na
condução de Sistemas Tutores Inteligentes”. Anais do VI Congresso Brasileiro de Redes
Neurais, pp. 367-372, São Paulo, Brazil, 2003. [in Portuguese].
6. Martins, W. Melo, F.R. Nalini, L. E. G. Meireles, V. “Sistemas Tutores Inteligentes em
Ambiente Web Baseados Em Tipos Psicológicos”. X Congresso Internacional de
Educação A Distancia – ABED. Porto Alegre, Brazil. 2003. [in Portuguese].
7. Russell, S. & Norvig, P. Artificial Intelligence: a modern approach. Prentice-Hall, USA,
1997.
8. Keirsey, D. and Bates, M. Please Understand Me – Character & Temperament Types,
Intelligence, Prometheus Nemesis Book Company, USA, 1984.
9. Kolb, D. A. Experiential Learning: Experience as The Source of Learning and Develop-
ment. Prentice-Hall, USA, 1984.
Using the Web-Based Cooperative Music Prototyping
Environment CODES in Learning Situations
Prototyping is a cyclic process used in industry to create a simplified version
of a product in order to understand its characteristics and the process of its conception
and construction. This process aims at creating successive product versions
incrementally, providing improvements from one version to the next. The final
product is the result of the many modifications made since the first version.
However, in the musical field, some peculiarities make the creation and conception
process different from those carried out in other fields. Musical composition is a
complex activity with no consensually established systematization: each person has a
unique style and way of working. Most composers still do not have a tradition of
sharing their musical ideas.
From our point of view, music is an artistic product that can be designed through
prototyping. A musical idea (a note, a set of chords, a rhythm, a structure, or a
rest) is created by someone (typically for a musical instrument) and afterwards
cyclically and successively modified and refined according to her initial intention or
to ideas that come up during the prototyping process. Besides musicians, non-specialists
(laymen) in music are probably also interested in creating and participating
in musical experiments.
CODES - Cooperative Sound Design is a web-based environment for cooperative
music prototyping that aims to provide users (musicians or non-specialists in music)
with the possibility of interacting with the environment and each other in order to
create musical prototypes. In fact, CODES is related to other works – like FMOL
System (F@ust Music On Line) [6] , EduMusical System [3] , TransJam System [1] ,
PitchWeb [2], CreatingMusic [4] and HyperScore [5] – that enable nonmusicians to
3 Final Considerations
The CODES approach to cooperation among users in creating collective music
prototypes is an example of a very promising educational tool for musicians and
laymen, because it enables knowledge sharing by means of rich interaction and
argumentation mechanisms associated with each prototype modification. Consequently,
each participant may come to understand the principles and rules involved in the complex
process of music creation and experimentation.
Our cooperative approach to music prototyping has been applied in a private, real
case study in order to validate the results obtained, to identify and correct problems,
and to determine new requirements. An ultimate goal of our work is to make CODES
available for public use in order to broaden our audience.
References
[1] Burk, P. (2000) Jammin’ on the Web - a new Client/Server Architecture for Multi-User
Musical Performance – International Computer Music Conference - ICMC2000.
[2] Duckworth, W. Making Music on the Web. Leonardo Music Journal, Vol. 9, pp. 13 – 18,
MIT Press, 2000
[3] Ficheman, I. K.(2002) Aprendizagem colaborativa a distância apoiada por meios
eletrônicos interativos: um estudo de caso em educação musical. Master Thesis. Escola
Politécnica da Universidade de São Paulo. São Paulo, 2002. (in Portuguese)
[4] Subotnick, M. Creating Music. Available in the web at http://creatingmusic.com/,
accessed in June/2004.
[5] Farbood, M.M.; Pasztor, E.; Jennings, K. Hyperscore: A Graphical Sketchpad for Novice
Composers, IEEE Computer Graphics and Applications, Volume: 24, Issue: 1, Year:
Jan.-Feb. 2004
[6] Jordà, S. (1999) Faust Music On Line: An approach to real-time collective composition
on the Internet. Leonardo Music Journal, Vol 9, 5-12., 1999.
A Multi-agent Approach to Providing Different Forms of
Assessment in a Collaborative Learning Environment
1 Introduction
A collaborative learning environment is an environment that allows participants to
collaborate and share access to information, instrumentation, and colleagues [1]. It is
recognized that the main goal of professional education is to help students develop into
reflective practitioners who are able to reflect critically upon their own professional
practice. Assessment is now regarded as a tool for learning, and present approaches
to it focus on one new dimension of assessment innovation, namely the changing place
and function of the assessor. Alternatives in assessment have therefore received much
attention in the last decade, and with respect to this, several forms of more authentic
assessment, such as self-, peer-, and co-assessment, have been introduced [4].
As building assessment systems in different contexts and for different forms of
assessment is a very expensive, exhausting, and time-consuming process [2,3], a multi-agent
approach to designing an Intelligent Assessment System has been used, which provides three
advantages for the developers: easier programming and expansion, harmless modification,
and distribution of the system across different computers [2].
In the next sections, the proposed multi-agent framework and its components are
introduced, and finally arguments for the feasibility and applicability of the system are
presented.
schema is illustrated in Figure 1. The first layer is called the test layer, which is similar to
the general multi-agent architecture of an Intelligent Tutoring System but also addresses
the basic requirements of an assessment process. The second layer is called the assessor
layer and is responsible for setting the best form of assessment for the current situation, based
on the decision made by the test administrator or the critic agent.
This is the main underlying part of the system, where the selected theory of measurement
and the methods for adaptive testing, activity selection, response processing, and scoring reside.
The task library is a database of task materials (or references to such materials) along
with all the information necessary to select, present, and score the tasks. The test layer
consists of four different agents (tutor, assessor, student model, and presentation), each
of which has its own responsibilities.
The tutor agent is responsible for managing and administering the tests. Estimating item
parameters, test calibration and equating, and selecting the next task to be presented to
the user are among its main responsibilities.
The assessor agent is responsible for response processing (key matching) and for estimating
students' abilities from their raw scores. This agent focuses on aspects
of the student's response and assigns them to categories. The assessor agent's
estimates of learners' abilities are used as the criterion for evaluating results obtained
from other forms of assessment.
The student model agent is responsible for modeling individual students' knowledge and
abilities in the particular domain.
The presentation agent is responsible for presenting the task to the examinee and for
collecting his/her responses.
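A skeletal sketch of the test-layer agents and the responsibilities listed above is shown below; the class and method names, the ability update, and the task format are illustrative assumptions, not the framework's actual interfaces.

class StudentModel:
    """Holds the current ability estimate for one learner."""
    def __init__(self):
        self.ability = 0.0

class TutorAgent:
    """Administers tests: selects the next task from the task library."""
    def __init__(self, task_library):
        self.task_library = task_library

    def next_task(self, student_model):
        # pick the task whose difficulty is closest to the current ability estimate
        return min(self.task_library,
                   key=lambda t: abs(t["difficulty"] - student_model.ability))

class AssessorAgent:
    """Matches responses against keys and updates the ability estimate."""
    def score(self, task, response, student_model):
        correct = response == task["key"]
        student_model.ability += 0.1 if correct else -0.1
        return correct

class PresentationAgent:
    """Presents tasks to the examinee and collects responses."""
    def present(self, task):
        return input(task["stem"] + " ")  # placeholder for the real interface

library = [{"stem": "2 + 2 = ?", "key": "4", "difficulty": -0.5},
           {"stem": "Solve x^2 = 9", "key": "3 or -3", "difficulty": 0.7}]
model = StudentModel()
task = TutorAgent(library).next_task(model)
print(task["stem"])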
3 Concluding Remarks
The framework envisioned in this paper is an environment where non-co-located learners
can gather and interact with each other to reach the goals of assessment. One can construct
a class of students from different parts of the world, who can be assessed according to
modern learner-centered methods of assessment, benefit from the advances
of technology to attend more reliable learning courses, and receive feedback from their
peers and tutors. They can also evaluate themselves and finally reach a better agreement
on their abilities and weaknesses.
The proposed framework certainly has some other advantages. First, it can be seen
as a general, standard framework of assessment that can be easily added to existing
designs with few modifications. Second, educational researchers can benefit from
having an integrated basis for the comparative analysis of different forms of assessment,
which not only brings more accuracy and precision to research outcomes but also
reduces the complexity of their work. Finally, using artificial intelligence techniques,
it can be the basis for building an adaptive assessment system that changes its form of
assessment to reach better performance and learning outcomes.
To sum up, to support different forms of learner assessment, where a variety of
possible forms of assessment exists, uniformity is needed from which we can converge
in several directions. With this purpose in mind, we proposed an integrated multi-agent
framework that enables the provision of different forms of assessment. In designing the
proposed system, we aimed to be consistent with general multi-agent frameworks
of Intelligent Tutoring Systems.
References
1. M.C. Dorneich, P.M. Jones, The Design and Implementation of learning collaboratively, IEEE
International Conference on Systems Man and Cybernetics, (2000).
2. M. Badjonski, M. Ivanovic, Z. Budimac, Intelligent Tutoring System as Multi-agent System,
IEEE International Conference on Intelligent Processing Systems,(1997).
3. L.M.M. Giraffa, R.M. Viccari, The use of Agents Techniques on Intelligent Tutoring Systems,
IEEE International Conference on Computer Science,SCCC’98, (1998).
4. D. Sluijsmans, F. Dochy, G. Moerkerke, The use of self-, peer- and co-assessment in higher
education: a review of literature, Studies in Higher Education, Vol. 24, No. 3, (1991), p. 331.
The Overlaying Roles of Cognitive and Information
Theories in the Design of Information Access Systems
McGill University
Education Building, room 513
3700 McTavish Street
Montreal, Quebec H3A 1Y2
[email protected]
1 Introduction
The BioWorld online library and the patient chart are the two sources of additional
information that students can use to solve a patient case. From an information science
perspective, the patient chart does not represent a great design challenge since it only
contains a very limited amount of information that is directly related to the virtual
patient’s disease. If students work from the hypothesis that the patient is afflicted by
diabetes, for example, they can order urine and blood glucose tests to confirm or
refute their hypothesis. The online library, on the other hand, contains a much larger
body of information that is not directly related to any specific patient case. Therefore,
it is much more fertile ground for testing and finding new ways of facilitating access to
information.
The design of the online library involves four main tasks: (1) the definition of the
library’s content; (2) the design of the database structure; (3) the definition of how
information will be presented; and (4) the design of the user-interface. The outcomes
of these four tasks will define the effectiveness of the online library in supporting
BioWorld’s instructional goals.
Contextualized but indirect recommendations relate to the specific search the user is
performing but have a less explicit directive character.
4 Implications
We can argue that IASs provide indirect support to the development of higher order
cognitive skills in a PBLE by delivering just-in-time declarative knowledge.
However, we are still trying to define to what extent IASs can provide direct support to
the development of higher order cognitive skills. Even within the educational
research community there is no full consensus about the interplay between lower
and higher order cognitive skills in problem-solving contexts. Back in the seventies,
the work of Minsky and Papert on artificial intelligence had already suggested a shift
from a power-based to a knowledge-based paradigm. In other words, in terms of machine
performance, better ways to express, recognize, and use particular forms of
knowledge were identified as more important than computational power per se [3].
However, tracing the connection between expert performance and domain-specific
problem-solving heuristics does not necessarily mean being able to precisely identify
at what point, in a problem-solving context, lower order cognitive skills become
insufficient and higher order cognitive skills take over. Even in ill-structured domains,
the most trivial problems can be solved by a simple pattern-matching strategy. As the
complexity of the problems increases, more robust analogies and more complex reasoning
become necessary. Establishing how far one can go with a pattern-matching strategy
will define an IAS's limits in providing direct support to problem-solving skills.
Hence, the next question becomes: how atypical must a patient case be in order to
define a problem that goes beyond the kind of help an IAS can provide? That is one of
the questions that the BioWorld research team is currently trying to answer, and one that
could only have emerged from an interdisciplinary approach that feeds on both
cognitive and information theories.
References
1. Lajoie, S., Lavigne, N. C., Guerrera, C., & Munsie, S. D. (2001). Constructing knowledge in
the context of BioWorld. Instructional Science 29: 155-186.
2. Belkin, N.J. (2000). Helping People Find What They Don't Know. Communications of
the ACM, vol. 43, no. 8, pp. 58-61.
3. Minsky, M. & Papert, S. (1974). Artificial intelligence. Condensed lectures, Oregon State
System of Higher Education, Eugene.
A Personalized Information Retrieval Service for an
Educational Environment
Abstract. The paper presents the PortEdu Project (an Educational Portal),
which consists of a MAS (MultiAgent System) architecture for a learning
environment on the Web, strongly based on personalized information retrieval.
Experience with search mechanisms has shown that the success of
computer-mediated distance learning is linked to the quality of contextual
search tools. Our goal in this project is to aid students in their learning process
and to retrieve information pertaining to the context of the problems they are
studying.
1 Introduction
Experience with distance learning shows that students who have difficulties with
specific topics, while using a distance learning environment, turn in most cases
to web searches with the intention of finding additional information on the studied
topic. However, this search is not always satisfactory. The existing tools classify results
in a generic way, taking into consideration neither the specific needs of the
user nor the purpose of the search.
Most personalized search tools simply rank the obtained results in a binary way,
as “interesting” or “non-interesting”, according to a previously and
explicitly elaborated user profile.
To address this problem, we have devised a model to deal with these difficulties in
the educational context. This model is based on two autonomous agents: the User Profile
Agent (UP Agent) and the Information Retrieval Agent (IR Agent). These agents
communicate with each other and with the other agents of the learning environment
“anchored” in the portal, through the multiagent platform FIPA-OS [1].
In our system, the search refinement is done automatically, based on the information
available in the user profiles, the student models (information about the student's
cognitive level), and an ontology (each learning environment has its own ontology).
Thus, the student makes a high-level information request and receives a distilled reply.
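A minimal sketch of this request flow is given below; the class names and fields are assumptions made for illustration, and the real system exchanges FIPA-OS messages rather than direct method calls.

# Minimal sketch of the PortEdu request flow (illustrative; the actual agents communicate via FIPA-OS).
class UserProfileAgent:
    def __init__(self):
        self.recent_terms = []                  # terms inferred from the student's recent actions

    def observe(self, action_keywords):
        self.recent_terms.extend(action_keywords)

    def profile_terms(self):
        return self.recent_terms[-5:]           # favour the most recent behaviour

class InformationRetrievalAgent:
    def __init__(self, up_agent, student_model):
        self.up_agent = up_agent
        self.student_model = student_model      # e.g. {"level": "beginner", "topic": "recursion"}

    def answer(self, request):
        query = [request, self.student_model["topic"]] + self.up_agent.profile_terms()
        # A real implementation would now query a search engine and filter by the student's level.
        return f"search({' '.join(query)}) filtered for {self.student_model['level']} level"

up = UserProfileAgent()
up.observe(["forum", "base case"])
ir = InformationRetrievalAgent(up, {"level": "beginner", "topic": "recursion"})
print(ir.answer("stack overflow in recursive call"))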
The term agent is widely used and has several definitions. This work is based
on the definition of Russell and Norvig, who characterize an agent as anything
that perceives its environment through sensors and acts upon that environment
through effectors. These authors treat agents as software built with Artificial
Intelligence techniques [5].
In order to make the consultation intelligent, two agents of the multiagent society
provide information to the IR Agent: the agent that builds the user profile, which makes
search terms available based on information about the student's behavior when interacting
with classmates and using the web; and the student model agent (from the educational
application running in PortEdu), which has information on each student's knowledge of
the pedagogical content at issue and specific information on each student's cognitive level.
The UP Agent has two characteristics: reactivity and continuity. It is
reactive because it perceives changes in the student's behavior, including departures
from the activities foreseen in the learning application; that is, it
perceives the actions performed by the student in PortEdu. It is continuous because it
runs constantly in the portal.
The IR Agent is cognitive and proactive, since it elaborates search plans from the
information received from the UP Agent and from the student model. Unlike the UP Agent,
this agent is not continuous. It acts when requested by the
student, or offers help to the student (a search result, for example) when activated by
the student model. Our research is thus based on additional cognitive information,
unlike [4], where the extra information used to improve the search is obtained
through DNA algorithms.
3 The Agents
The creation of parameters for intelligent search must take into account the result to
be obtained. In this work, the intention is to aid students in their learning while they
use the learning systems anchored to the portal.
This aid is provided either through the content obtained by the
intelligent search mechanism or by indicating a participant in the group who has
the knowledge to help the student learn a specific subject. At present, the UP Agent is
independent of the educational application.
The Learner Model Agent supplies the UP Agent with the pertinent information on the
specific knowledge of the educational system in use. From the information obtained
from the pedagogical agent and from the interface, together with the information inferred
from the student's behavior, a search term is built to carry out the retrieval of the
desired information.
Note that the user profile is updated at all times. The intention is
to obtain a model closer to what the user is at his or her most recent
moment in the environment, and not only a historical profile.
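The paper does not give an update formula; one simple way to favour the user's most recent behaviour over a purely historical profile is an exponentially decayed interest score, sketched here as an assumption.

# Illustrative recency-weighted profile update (the actual PortEdu update rule is not specified).
def update_profile(profile, observed_terms, decay=0.8):
    """Decay old interests, then reinforce the terms seen in the latest interaction."""
    for term in profile:
        profile[term] *= decay
    for term in observed_terms:
        profile[term] = profile.get(term, 0.0) + 1.0
    return profile

profile = {}
update_profile(profile, ["recursion", "stack"])
update_profile(profile, ["recursion", "base case"])
print(sorted(profile.items(), key=lambda item: -item[1]))
# "recursion" dominates; terms not seen recently fade instead of accumulating forever.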
Nowadays, there are many applications based on intelligent agents, such as Letizia [3]
and InfoFinder [2]. However, few agents are capable of obtaining knowledge about the
4 Final Considerations
In this short paper, we have presented the ongoing PortEdu project, a distance
learning portal based on a multiagent architecture that is intended to aid the student
in the learning process through a personalized search tool.
The main contribution of this work is to make available a personalized search tool
that considers the specific needs of the educational context and user preferences. We
believe that refined and personalized information retrieval in this context
contributes to distance learning via the Web.
References
1. FIPA – FIPA2000 Specification Part 2, Agent Communication Language. URL:
www.fipa.org
2. Krulwich, B. and Burkley, C., (1997) ‘The InfoFinder Agent: Learning User Interests
through Heuristic Phrase Extraction’, IEEE Intelligent Systems, Vol. 12, No. 5, pp. 22-27
3. Lieberman, Henry. (1995). Letizia: An agent that assists web browsing. In Proceedings of IJCAI-
95. URL: http://lieber.www.media.mit.edu/people/lieber/Lieberary/Letizia/Letizia.html
4. MacMillan, I. C. (2003) ‘In Search of Serendipity: Bridging the Gap That Separates
Technologies and New Markets’, 2 July,
http://knowledge.wharton.upenn.edu/index.cfm?fa=viewarticle&id=812
5. Russell, S., Norvig, P. (1995) Artificial Intelligence: A Modern Approach, Prentice Hall,
Upper Saddle River, NJ, USA.
Optimal Emotional Conditions for Learning
with an Intelligent Tutoring System
1 Introduction
People often separate emotions and reason, believing that emotions are an obstacle in
rational decision making or reasoning. However, recent research has shown that an
individual's cognitive processes, for instance decision making [3], invariably
depend strongly on his or her emotions.
An important special case of a cognitive process, one involving a variety of different
cognitive abilities, is the learning process. Learning requires fulfilling a variety of
tasks such as understanding, memorizing, analyzing, reasoning, or applying. Given
the above-mentioned relation between feeling and thinking, the student’s performance
in these different learning tasks will depend on his emotions. Systems have been
proposed for modeling learners’ emotions and their variation during a learning session
with an Intelligent Tutoring System (ITS). However, all previous work is based on the
hypothesis that only a very restricted class of—mainly positive—emotions can have a
positive influence on learning.
The goal of this paper is to improve the effectiveness of emotion-based tutoring
systems by determining in much more detail than previously done the impact of
different emotions on learning. This analysis allows us to define the optimal
emotional conditions for learning. More precisely, we aim at determining the optimal
emotional state of the learner, the state that leads to the best performance, and how an ITS
can directly use the influence of emotions connected to the learning content to
improve the learner's cognitive abilities.
Teaching and learning are emotional processes: a teacher who communicates the
content in an emotional way will be more successful than another who behaves and
communicates unemotionally. In fact, situations, objects, or data with emotional
charge are better memorized [1]. An ITS should be able, like a successful human
teacher, to emotionalize the learning content by giving it an emotional connotation.
This connotation can be naturally linked to the learning content: for instance, events
in history can naturally generate certain emotions. However, emotions can also be
artificially added to learning content that is a priori unemotional, for example by
associating emotionally charged images with random words or numbers.
It has been shown that people in a given emotional state will attribute more
attention to stimulus events, objects, or situations that are affectively congruent with
their emotional state [1]. An ITS can use this fact for gaining more attention from the
learner by emotionalizing the learning content. Two approaches can be used. First, the ITS
can adaptively add to the learning content an emotional charge that is similar or
related to the present emotional state of the learner, who will then pay more attention
to the material presented to him. Second, an ITS can instead change the
emotional state of the learner so as to make it more similar to the emotional charge
of the content to be learned.
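A sketch of the first approach follows; representing an emotional charge as a (valence, arousal) pair is our assumption for illustration, not the authors' model.

# Illustrative sketch: pick the content variant whose emotional charge best matches the learner's state.
def congruence(state, charge):
    return -((state[0] - charge[0]) ** 2 + (state[1] - charge[1]) ** 2)   # closer pairs score higher

def select_variant(learner_state, variants):
    """variants: list of (text, (valence, arousal)) pairs."""
    return max(variants, key=lambda variant: congruence(learner_state, variant[1]))[0]

variants = [
    ("The treaty ended a long and painful war...", (-0.6, 0.4)),   # sombre framing
    ("The treaty opened an era of optimism...",    (0.7, 0.5)),    # upbeat framing
]
print(select_variant((-0.5, 0.3), variants))   # a sad learner receives the affectively congruent framing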
When a large quantity of data lacking any emotional content has to be memorized
and later retrieved, i.e., distinguished, then adding emotional charges with respect to
very different emotions—saddening, comforting, disturbing, disgusting, arousing—
can help the memorization process. If, during the step of memorization, an ITS
associates the learning content with an emotional charge, the learner will be
conditioned such as to establish a connection between the subject matter and his
specific emotional reaction. Then, so conditioned on having certain emotional
reactions to different matters, the learner will be able to recall and distinguish the (non
emotional) learning contents as easily as his emotional reactions to them. An ITS
therefore pushes the learner to structure and memorize the knowledge in categories of
emotion, which is in fact precisely the natural organization of memory [1].
References
1. Bower, G. 1992. How might emotions affect learning? Handbook of Emotion and Memory,
edited by Sven-Ake Christianson.
2. Compton, R. 2000. Ability to disengage attention predicts negative affect. Cognition and
Emotion.
3. Damasio, A. 1995. L'erreur de Descartes: la raison des émotions. Editions Odile Jacob.
4. Isen, A. M. 2000. Positive Affect and Decision Making. Handbook of Emotions, second
edition, Guilford Press.
5. Lisetti, Schiano. 2000. Automatic Facial Expression Interpretation: Where Human-
Computer Interaction, Artificial Intelligence and Cognitive Science Intersect. Pragmatics
and Cognition, Vol. 8(1): 185-235.
6. Ortony, A., Clore, G.L., Collins, A. 1988. The Cognitive Structure of Emotions. Cambridge
University Press.
FlexiTrainer: A Visual Authoring Framework for
Case-Based Intelligent Tutoring Systems
Stottler Henke Associates, Inc., 951 Mariner’s Island Blvd. #360, San Mateo, CA, 94404
{sowmya, remolina, fu}@stottlerhenke.com
Abstract. The need for rapid and cost-effective development of Intelligent Tutor-
ing Systems with flexible pedagogical approaches has led to a demand for
authoring tools. The authoring systems developed to date provide a range of
options and flexibility, such as authoring simulations, or authoring tutoring
strategies. This paper describes FlexiTrainer, an authoring framework that en-
ables the rapid creation of pedagogically rich and performance-oriented learn-
ing environments with custom content and tutoring strategies. FlexiTrainer pro-
vides tools for specifying the domain knowledge and derives its power from a
visual behavior editor for specifying the dynamic behavior of tutoring agents
that interact to deliver instruction. The FlexiTrainer runtime engine is an agent-
based system where different instructional agents carry out teaching-related ac-
tions to achieve instructional goals. FlexiTrainer has been used to develop an
ITS for training helicopter pilots in flying skills.
1 Introduction
As Intelligent Tutoring Systems gain currency in the world outside academic re-
search, there is an increasing need for re-usable authoring tools that will accelerate
creation of such systems. At the same time there exists a desire for flexibility in terms
of the communications choices made by the tutor. Several authoring frameworks have
been developed that provide varying degrees of control, such as content, student mod-
eling and instructional planning [3]. Some allow the authoring of simulations [2],
while some provide a way to write custom tutoring strategies [1,4]. However, among
the latter type, none can create tutors with sophisticated instruction including rich in-
teractions like simulations [3]. Our goal was to develop an authoring tool and engine
for domains that embraced simulation-based training. In addition, our users needed
facilities for creating and modifying content, performance evaluation, assessment pro-
cedures, student model attributes, and tutoring strategies. In response, we developed
the FlexiTrainer framework which enables rapid creation of pedagogically rich and
performance-oriented learning environments with custom content and tutoring strate-
gies.
2 FlexiTrainer Overview
FlexiTrainer consists of two components: the authoring tool, and the runtime engine.
The core components of the FlexiTrainer authoring tool are the Task-skill-principle
Editor, the Exercise Editor, the Student Model Editor, and the Tutor Behavior Editor.
FlexiTrainer’s behavior model is a hierarchical finite state machine where the flow
of control resides in stacks of hierarchical states. Condition logic is evaluated ac-
cording to a prescribed ordering, making the flow of control explicit. FlexiTrainer
employs four constructs: actions, which define all the different actions FlexiTrainer
can perform; behaviors, which chain actions and conditional logic; predicates, which set
the conditions under which each action and behavior will happen; and connectors,
which control the order in which conditions are evaluated, and actions and behaviors
take place. These four allow one to create behavior that ranges from simple sequences
to complex conditional logic. Figure 1 shows an example “teach for mastery” behav-
ior invoked whenever the student wants to improve his flying skills. It starts in the
upper left rectangle. The particular skill to practice is determined by the selectSkill
behavior. Once the skill to practice is chosen, the teachSkill behavior is invoked: it
will pick an exercise that reinforces the skill (and is appropriate for the student mas-
tery level) and then will call the teachExercise behavior to actually carry out the exer-
cise. If the student has not taken the assessment test yet, he will take the test before
any skills are selected.
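The much-reduced sketch below imitates the four constructs and the "teach for mastery" behavior in plain code; the behavior names follow the paper, but the data structures are assumptions, since FlexiTrainer itself is authored visually as a hierarchical finite state machine.

# Much-simplified sketch of the four constructs (actions, behaviors, predicates, connectors).
student = {"assessed": False, "skill": None, "mastery": {"hover": 0.2, "landing": 0.6}}

# predicate: a condition under which actions and behaviors fire
needs_assessment = lambda s: not s["assessed"]

# actions: primitive things the tutor can do
def run_assessment(s): s["assessed"] = True
def run_exercise(s, skill): print(f"running a {skill} exercise at mastery {s['mastery'][skill]:.1f}")

# behaviors: chains of actions and conditional logic
def select_skill(s): s["skill"] = min(s["mastery"], key=s["mastery"].get)   # weakest skill first
def teach_skill(s): run_exercise(s, s["skill"])

def teach_for_mastery(s):
    # connector role: a fixed order in which the predicate and behaviors are evaluated
    if needs_assessment(s):
        run_assessment(s)
    select_skill(s)
    teach_skill(s)

teach_for_mastery(student)   # assesses first, then practises the weakest skill ("hover")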
Instructional agents carry out teaching-related actions to achieve instructional
goals. The behaviors specified with the Behavior Editor define how agents satisfy dif-
ferent goals. The engine also incorporates a student modeling strategy using Bayesian
inference.
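The paper does not detail the Bayesian student model; a knowledge-tracing-style update is one common choice and is sketched below purely as an illustration, with invented parameter values.

# Illustration only: a knowledge-tracing-style Bayesian update of skill mastery
# (FlexiTrainer's actual Bayesian model and parameters are not specified in the paper).
def update_mastery(p_known, correct, slip=0.1, guess=0.2, learn=0.1):
    if correct:
        posterior = p_known * (1 - slip) / (p_known * (1 - slip) + (1 - p_known) * guess)
    else:
        posterior = p_known * slip / (p_known * slip + (1 - p_known) * (1 - guess))
    return posterior + (1 - posterior) * learn      # chance of learning on this opportunity

p = 0.3
for outcome in [True, True, False, True]:
    p = update_mastery(p, outcome)
print(round(p, 2))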
So far the FlexiTrainer framework has been used to develop an ITS to train novice
helicopter pilots in flying skills [5]. We plan to add other functionality such as: ability
to support development of web-based tutoring systems; support for creating ITSs for
team training; a pre-defined library of standard tutoring behaviors reflecting diverse
instructional approaches for different types of skills and knowledge.
The work reported here was funded by the Office of the Secretary of Defense un-
der contract number DASW01-01-C-5317.
References
1. Major, N., Ainsworth, S. and Wood, D. (1997) REDEEM: Exploiting symbiosis between
psychology and authoring environments. International Journal of Artificial Intelligence in
Education, 8 (3-4) 317-340.
2. Munro, A., Johnson, M.C., Pizzini, Q.A., Surmon, D.S., Towne, D.M. and Wogulis, J.L.
(1997). Authoring simulation-centered tutors with RIDES. International Journal of Artifi-
cial Intelligence in Education. 8(3-4), 284-316.
3. Murray, T (1999). Authoring Intelligent Tutoring Systems: An analysis of the state of the
art. International Journal of Artificial Intelligence in Education, 10, 98-129.
4. Murray T. (1998). Authoring knowledge-based tutors: Tools for content, instructional strat-
egy, student model, and interface design. Journal of the Learning Sciences, 7(1).
5. Ramachandran, S. (2004). An Intelligent Tutoring System Approach to Adaptive Instruc-
tional Systems, Phase II SBIR Final Report, Army Research Institute, Fort Rucker, AL.
Tutorial Dialog in an Equation Solving Intelligent
Tutoring System
Leena M. Razzaq and Neil T. Heffernan
1 Introduction
This research is focused on building a better tutor for the task of solving equations by
replacing traditional model-tracing feedback in an ITS with a dialog-based feedback
mechanism. This system, named “E-tutor”, for Equation Tutor, is novel because it is
based on the observation of an experienced human tutor and captures tutorial
strategies specific to the domain of equation-solving. In this context, a tutorial dialog
is the equivalent of breaking down problems into simpler steps and then asking new
questions before proceeding to the next step. This research does not deal with natural
language processing (NLP), but rather with dialog planning.
Studies indicate that experienced human tutors provide the most effective
form of instruction known [2]. They raise the mean performance about two standard
deviations compared to students taught in classrooms. Intelligent tutoring systems can
offer excellent instruction, but not as good as human tutors. The best ones raise
performance about one standard deviation above classroom instruction [7]. Although
Ohlsson [9] observed that teaching strategies and tactics should be one of the guiding
principles in the development of ITSs, incorporating such principles in ITSs has
remained largely unexplored [8].
2 Our Approach
E-tutor is able to carry on a coherent dialog that consists of breaking down problems
into smaller steps and asking new questions about those steps, rather than simply
giving hints. Several tutorial dialogs were chosen from the transcripts of human
tutoring sessions collected to be incorporated in the ITS. The dialogs were designed to
take the place of the hints that are available in the control condition. E-tutor does not
have a hint button. When students make errors they are presented with a tutorial
dialog if one is available. The student must respond to the dialog to exit it and return
to solving the problem in the problem window. Students stay in the loop until they
respond correctly or the tutor has run out of dialog. This forces the student to
participate actively in the dialog. It is this loop that we hypothesize will do better at
teaching equation-solving than hint sequences do. When the tutor has run out of
dialog, the last tutorial response presents the student with the correct action and input
similar to the last hint in a hint sequence. A close mapping between the human tutor
dialog and the ITS’ dialog was attempted.
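The loop described above can be summarized as follows; the structure and the example scaffolding questions are illustrative, not E-tutor's actual implementation.

# Sketch of the tutorial-dialog loop (illustrative structure, not E-tutor's code).
def run_dialog(dialog_steps, get_student_answer, reveal_answer):
    """dialog_steps: list of (question, expected_answer) scaffolding pairs."""
    for question, expected in dialog_steps:
        if get_student_answer(question).strip() == expected:
            return True                       # student recovered; back to the problem window
    reveal_answer()                           # dialog exhausted: show the correct action and input
    return False

steps = [("What should you do to both sides to isolate x?", "subtract 3"),
         ("So 2x + 3 - 3 = 11 - 3 leaves what on the right?", "8")]
run_dialog(steps,
           get_student_answer=lambda q: input(q + " "),
           reveal_answer=lambda: print("Correct step: subtract 3 from both sides, giving 2x = 8."))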
Evaluation. E-tutor was evaluated with a traditional model-tracing tutor as a control.
We will refer to this tutor as “The Control.” The Control did not engage a student in
dialog, but did offer hint and buggy messages to the student. Table 1 shows how the
experiment was designed.
Because of the small sample size, statistical significance was not obtainable in most
of the analyses done in the following sections. It should be noted that with such small
sample sizes, detecting statistically significant effects is less likely.
caution is also called for, since using such small sample sizes does make our
conclusions more sensitive to a single child, thus possibly skewing our results.
3 Conclusion
The experiment showed evidence that suggested incorporating dialog in an equation-
solving tutor is helpful to students. Although the sample size was very small, there
were some results in the analyses that suggest that, when controlling for number of
problems, E-tutor performed better than the Control with an effect size of 0.4 standard
deviations for overall learning by condition.
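For readers unfamiliar with the metric, an effect size in standard-deviation units is typically the difference between mean gains divided by a pooled standard deviation; the sketch below uses made-up gain scores, not the study's data.

# Cohen's-d-style effect size on invented gain scores (not the study's actual data).
from statistics import mean, stdev

def effect_size(experimental, control):
    pooled_sd = (((len(experimental) - 1) * stdev(experimental) ** 2 +
                  (len(control) - 1) * stdev(control) ** 2) /
                 (len(experimental) + len(control) - 2)) ** 0.5
    return (mean(experimental) - mean(control)) / pooled_sd

print(round(effect_size([0.9, 1.2, 1.5, 0.8], [0.6, 0.9, 1.0, 0.5]), 2))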
There were some limitations in this research that may have affected the results of
the experiment. E-tutor presented tutorial dialogs to students when they made certain
errors. However, the Control depended on student initiative for the appearance of
hints. That is, the students had to press the Hint button if they wanted a hint. Although
students in the control group were told that they could request hints whenever they
wanted, the results may have been confounded by this dependence on student
initiative in the control group. We may also be skeptical about the results because the
sample size was very small. Additionally, the experimental group performed better on
the pre-test than the control group, so they were already better at solving equations
than the control group.
In the future, an experiment could be run with a larger and more balanced sample
of students which would eliminate the differences between the groups on the pre-test.
The confound with student initiative could be removed for a better evaluation of the
two conditions. Another improvement would be to employ more tutorial strategies.
Another experiment that controls for time rather than for the number of problems
would examine whether E-tutor was worth the extra time.
References
1. Anderson, J. R. & Pelletier, R. (1991). A development system for model-tracing tutors. In
Proceedings of the International Conference of the Learning Sciences, 1-8. Evanston, IL.
2. Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction
as Effective as One-to-one Tutoring. Educational Researcher, 13, 4-16.
3. Graesser, A.C., Person, N., Harter, D., & TRG (2001). Teaching tactics and dialog in
AutoTutor. International Journal of Artificial Intelligence in Education.
4. Heffernan, N. T., (2002-Accepted) Web-Based Evaluation Showing both Motivational and
Cognitive Benefits of the Ms. Lindquist Tutor. SIGdial endorsed Workshop on “Empirical
Methods for Tutorial Dialogue Systems” which was part of the International Conference
on Intelligent Tutoring System 2002.
5. Heffernan, N. T (2001) Intelligent Tutoring Systems have Forgotten the Tutor: Adding a
Cognitive Model of Human Tutors. Dissertation. Computer Science Department, School
of Computer Science, Carnegie Mellon University. Technical Report CMU-CS-01-127
<http://reports-archive.adm.cs.cmu.edu/anon/2001/abstracts/01-127.html>
6. Koedinger, K. R., Anderson, J. R., Hadley, W. H. & Mark, M. A. (1995). Intelligent
tutoring goes to school in the big city. In Proceedings of the 7th World Conference on
Artificial Intelligence in Education, pp. 421-428. Charlottesville, VA: Association for the
Advancement of Computing in Education.
7. Koedinger, K., Corbett, A., Ritter, S., Shapiro, L. (2000). Carnegie Learning's Cognitive
Tutor™: Summary Research Results.
http://www.carnegielearning.com/research/research_reports/CMU_research_results.pdf
8. McArthur, D., Stasz, C., & Zmuidzinas, M. (1990) Tutoring techniques in algebra.
Cognition and Instruction. 7 (pp. 197-244.)
9. Ohlsson, S. (1986) Some principles for intelligent tutoring. Instructional Science, 17, 281-
307.
10. Razzaq, Leena M. (2003) Tutorial Dialog in an Equation Solving Intelligent Tutoring
System. Master Thesis. Computer Science Department, Worcester Polytechnic Institute.
<http://www.wpi.edu/Pubs/ETD/Available/etd-0107104-155853>
A Metacognitive ACT-R Model of Students’ Learning
Strategies in Intelligent Tutoring Systems
Ido Roll, Ryan Shaun Baker, Vincent Aleven, and Kenneth R. Koedinger
1 Introduction
Studies have found some evidence of a connection between students' metacognitive
decisions while working with an ITS and their learning gains (Aleven et al. in press,
Baker et al. 2004, Wood and Wood 1999). We describe here a computational model
that explains such relations by identifying various learning goals and strategies,
assigning them to students, and relating them to learning outcomes.
We based our model on log-files of students working with the Geometry Cognitive
Tutor, an ITS based on ACT-R theory (Anderson et al, 1995), which is now in exten-
sive use in American public high schools.
2 The Model
The model identifies various goals and associates each goal with a different local-
strategy that attempts to accomplish it. It assumes that students’ actions, which are
determined by the strategies, are driven by (i) their estimated ability to solve the step,
(ii) their earlier actions and the system’s feedback (e.g., error messages), and (iii)
their tendency towards the different goals. The model assumes that every student has
some tendency towards all goals. The exact combination of tendencies uniquely iden-
tifies the pattern of the individual student.
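A reduced sketch of how such tendencies could drive action choice on a new step is given below; the goal names follow the paper, while the sampling scheme and thresholds are assumptions made for illustration.

# Reduced sketch: sample a goal from the student's tendencies, then act accordingly.
import random

TENDENCIES = {"learning-oriented": 0.29, "help-avoider": 0.28,
              "i-know-it": 0.15, "performance-oriented": 0.15, "least-effort": 0.12}

def choose_action(p_can_solve, tendencies=TENDENCIES, rng=random):
    goal = rng.choices(list(tendencies), weights=list(tendencies.values()))[0]
    if goal in ("help-avoider", "i-know-it") or p_can_solve > 0.7:
        return "attempt the step"
    if goal == "least-effort":
        return "drill down to the bottom-out hint"
    return "read a hint carefully"            # learning- or performance-oriented with low ability

print(choose_action(p_can_solve=0.4))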
The model is implemented in ACT-R, a theory of mind and a framework for cog-
nitive modeling (Anderson et al., 1998).
The correlation between the data and the model's predictions is 1.00 for all students, and
the average SD across all students is 0.09 (SD = 0.02). The high correlation is proba-
bly an over-fit resulting from too many parameters.
We see a high tendency towards Learning-Oriented and Help-Avoider (0.29 and
0.28 respectively), whereas tendencies towards I-know-it, Performance-Oriented and
Least-Effort are 0.15, 0.15 and 0.12 respectively. These values make sense, given that
students take their time and rarely use hints on their first actions on a new step.
We calculated the correlation between these tendencies and an independent meas-
ure of learning outcomes (as measured by the progress students made from pre- to
post-test, divided by their maximum possible improvement). The only significant
result is that Help-Avoider is highly correlated with learning gain, F(1,9)=5.14,
p<0.05, r=0.58, suggesting that students with higher tendency to avoid help on their
first actions did better in the overall learning experience.
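The learning-outcome measure described in parentheses above is what is often called a normalized gain; under that reading it can be computed as follows.

# Normalized gain: improvement divided by the maximum possible improvement.
def normalized_gain(pre, post, max_score):
    return (post - pre) / (max_score - pre)

print(normalized_gain(pre=2.0, post=3.0, max_score=4.0))   # 0.5: half of the available headroom was gained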
We observe a high correlation with the actions of students, but a poorer than expected
correlation to learning gains. We hypothesize that, due to too many parameters, the
students' behavior can be explained in more than one manner, affecting the single
representation of each student and the correlation to learning outcomes. We are currently
reducing the number of parameters and updating the characteristics of the strategies.
The model should be fitted to all collected data, across all skill levels and including
actions taken after errors and hints. In addition, we plan to run the model on data
from other tutors and correlate the findings to other means of analysis.
We would like to thank John R. Anderson for his suggestions and helpful advice.
References
1. Aleven, V., McLaren, B., Roll, I., Koedinger, K. Toward Tutoring Help Seeking: Applying
Cognitive Modeling to Meta-Cognitive Skills. To appear at Intelligent Tutoring Systems
Conference (2004)
2. Anderson, J. R., A. T. Corbett, K. R. Koedinger, and R. Pelletier, (1995). Cognitive tutors:
Lessons learned. The Journal of the Learning Sciences, 4, 167-207.
3. Baker, R. S., Corbett, A. T., Wagner, A. Z. & Koedinger, K. R., Off-Task Behavior in the
Cognitive Tutor Classroom: When Students “Game the System”, Proceedings of the
SIGCHI conference on human factors in computing systems (2004), p. 383-390, Vol. 6 no.
1.
4. McNeil, N.M. & Alibali, M.W. (2000), Learning Mathematics from Procedural Instruction:
Externally Imposed Goals Influence What Is Learned, Journal of Educational Psychology,
92 #4, 734-744.
5. Wood, H., & Wood, D. (1999). Help seeking, learning and contingent tutoring. Computers
and Education, 33, 153-169.
Promoting Effective Help-Seeking Behavior
Through Declarative Instruction
Abstract. Research has shown that students’ help-seeking behavior is far from
being ideal. In trying to make it more efficient, 27 students using the Geometry
Cognitive Tutor regularly received individual online instruction. The instruc-
tion given to the HELP group, aimed at improving their help-seeking behavior, in-
cluded a walk-through metacognitive example. The CONTROL group received
“placebo instruction” with a similar walk-through but without the help-seeking
content. In two subsequent weeks, the HELP group used the system’s hints
more frequently than the CONTROL group. However, we did not observe a sig-
nificant difference in the learning outcomes. These results suggest that appro-
priate instruction can improve help-seeking behavior in ITS usage. Further
evaluation should be performed in order to design better instruction and im-
prove learning.
1 Introduction
Efficient help-seeking behavior in intelligent tutoring systems (ITS) can improve
learning outcomes and reduce learning duration (Renkl, 2002; Wood & Wood, 1999).
Nevertheless, studies have shown that students use help in suboptimal ways in various
ITS (Mandl et al. 2000, Aleven et al. 2000).
The Geometry Cognitive tutor, investigated in this study, is now in extensive use
in American public high schools. The tutor has two forms of on-demand help: con-
text-sensitive hints and a decontextualized glossary.
One way to try to improve students' use of help is to guide them towards more
effective use. White et al. (1998) showed that by developing students' metacognitive
knowledge and skills the students learn better. McNeil et al. (2000) showed that stu-
dents’ goals can be modified in lab settings by prompting them appropriately. These
studies suggest that appropriate instruction about desired help seeking behavior might
be effective in improving that behavior.
2 Experiment
Students from an urban high school were divided into two groups:
The HELP group (including 14 students) received instruction aimed to improve
their help-seeking behavior. The CONTROL group (including 13 students) received
“placebo-instruction” which focused only on the subject matter without any metacog-
nitive content.
The instructions were given through a website, and students read them at their own
pace. Both the HELP and CONTROL instruction led the students through solved
examples in the unit the students were working on. The HELP instruction incorpo-
rated the desired help-seeking behavior and included the following principles:
(i) ask for a hint when you do not know what to do, (ii) read the hint before you ask
for an additional one, and (iii) do not guess quickly after committing an error.
Fig. 1. A snapshot from the instruction. Both the HELP instruction (left hand side) and the
CONTROL instruction (right hand side) convey the same cognitive information. In addition,
the help-instruction offers a way to obtain that information.
The study was built into the students’ existing curriculum, and the students were
proficient in working with the Geometry Cognitive Tutor. Students took a pre- and
post-test before and after the study, and reported how much attention they paid to the
instruction. Since the students were in different lessons of the same unit, each student
took a test that matched her progress in the curriculum.
On the first day, students went individually through the help-seeking instruction,
which took about 15 minutes. On the second day, the students went through an additional
5 minutes of similar instruction; this time they had to solve a question. In addition to
the feedback on the cognitive level, which students in both groups received from the
tutor, students in the HELP group received feedback on their help-seeking actions.
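One way such feedback could be generated is a small set of rules mirroring principles (i)-(iii); the thresholds and messages below are invented for illustration and are not the study's actual implementation.

# Illustrative rule-based help-seeking feedback implementing principles (i)-(iii).
def metacognitive_feedback(event):
    """event: dict with keys 'type' ('attempt' or 'hint'), 'p_know', 'seconds_since_last', 'last_was_error'."""
    if event["type"] == "attempt" and event["p_know"] < 0.3:
        return "You seem unsure about this step: ask for a hint instead of guessing."        # principle (i)
    if event["type"] == "hint" and event["seconds_since_last"] < 5:
        return "Read the hint you just received before asking for another one."              # principle (ii)
    if event["type"] == "attempt" and event["last_was_error"] and event["seconds_since_last"] < 3:
        return "Slow down: think about why the last answer was wrong before trying again."   # principle (iii)
    return None   # no metacognitive message needed

print(metacognitive_feedback({"type": "hint", "p_know": 0.5, "seconds_since_last": 2, "last_was_error": False}))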
In total, students worked on the tutor for approximately 3 hours spread out across 2
weeks. At the end of the two weeks, the students took a post-test.
3 Results
As in Wood & Wood (1999), we calculated the ratio of hints to errors, measured by
hints/(hints+errors). This ratio was much higher for the HELP group (0.24) than for
the CONTROL group (0.09). The result is marginally significant (F(1,21)=2.96,
p=0.10). However, this does not reveal whether the hint-requests were appropriate.
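The ratio is straightforward to compute from per-student counts; the counts below are hypothetical and merely reproduce the order of magnitude of the reported group means.

# The hints-to-errors ratio used above, computed from hypothetical per-student counts.
def hint_error_ratio(hints, errors):
    return hints / (hints + errors) if (hints + errors) else 0.0

help_group = [hint_error_ratio(6, 19), hint_error_ratio(5, 15)]
control_group = [hint_error_ratio(2, 20), hint_error_ratio(1, 12)]
print(sum(help_group) / len(help_group), sum(control_group) / len(control_group))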
The students' self-reported attention did not affect the help use of the CONTROL
group (the hints-to-errors ratio for both low- and high-attention students was 0.09).
However, students who reported paying low attention in the HELP group used sig-
nificantly more help than those who reported paying high attention (0.43 hints-to-errors
for low-attention students vs. 0.12 for the high-attention ones, F(1, 11)=8.31, p=0.01).
We hypothesize that students who paid low attention to the instruction understood
only that they should use help a lot, and thus engaged in an inappropriate hint abuse.
Students showed learning during the experiment (average pre-test score: 1.15 out
of 4; average post-test score: 1.67). This improvement was significant, T(0,26)=2.10,
p=0.04. Direct comparison between conditions was difficult, given the design of our
study where students were working on different tutor lessons, and thus we did not
observe any significant influence of the condition on the learning outcomes.
References
1. Aleven, V., & Koedinger, K. R. (2000). Limitations of student control: Do students know
when they need help? In C. F. G. Gauthier & K. VanLehn (Eds.), Proceedings of the 5th
International Conference on Intelligent Tutoring Systems, ITS 2000 (pp. 292-303).Berlin:
Springer Verlag.
2. Anderson, J. R., A. T. Corbett, K. R. Koedinger, and R. Pelletier, (1995). Cognitive tu-
tors: Lessons learned. The Journal of the Learning Sciences, 4, 167-207.
3. Mandl, H., Gräsel, C. & Fischer, F. (2000). Problem-oriented learning: Facilitating the use
of domain-specific and control strategies through modeling by an expert. In W. J. Perrig
& A. Grob (Eds.), Control of Human Behavior, Mental Processes and Consciousness
(pp.165-182). Mahwah: Erlbaum.
4. McNeil, N.M. & Alibali, M.W. (2000), Learning Mathematics from Procedural Instruc-
tion: Externally Imposed Goals Influence What Is Learned, Journal of Educational Psy-
chology, 92 #4, 734-744.
5. Renkl, A. (2002). Learning from worked-out examples: Instructional explanations sup-
plement self-explanations. Learning & Instruction, 12, 529-556.
6. White, B.Y. & Frederiksen, J.R. (1998), Inquiry, Modeling, and Metacognition: Making
Science Accessible to All Students. Cognition and Instruction, 16(1), 3-118.
7. Wood, H., & Wood, D. (1999). Help seeking, learning and contingent tutoring. Comput-
ers and Education, 33, 153-169.
Supporting Spatial Awareness in Training on a
Telemanipulator in Space
Département d’informatique,
1 Université du Québec à Montréal, Montréal (Québec) H3C 3P8, Canada
2 Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1, Canada
{roy.jean-a, nkambou.roger}@uqam.ca, [email protected]
1 Introduction
The capabilities of spatial representation and reasoning required by the operation of a
remote manipulator, such as Canadarm II on the International Space Station (ISS) or
other remotely operated devices, are often compared to those required by the opera-
tion of a sophisticated crane. In the case of a remote manipulator, however, the ma-
nipulator has several joints to control and there can be several operating modes based
on distinct frames of reference. Furthermore, and most importantly, the task is re-
motely executed and controlled on the basis of feedback from video cameras. The
operator must not only know how to operate the arm, avoiding singularities and dead
ends, but he must also choose and orient the cameras so as to execute its task in the
safest and most efficient manner. Computer 3D animation provides an complemen-
tary tool for increasing the safety of operations.
The goal of training on operating a telemanipulator like Canadarm II is notably to
improve the situation awareness (Currie & Peacock, 2002) and the spatial awareness
(Wickens 2002) of astronauts. Distance evaluation, orientation and navigation are
basic dimensions of spatial awareness. Two key limits of traditional ITS in this re-
spect are cognitive tunnelling, i.e. the fact that observers tend to focus attention on
information from specific areas of a display to the exclusion of information presented
outside of these highly attended areas, and the difficulty of integrating different camera
views. Our challenge is to produce animations (as learning resources) that are effi-
cient in restoring spatial awareness, i.e. in improving distance estimation,
orientation and navigation. A training environment based on the use of automati-
cally generated animations offers a natural integration of different camera views that
preserves spatial and temporal continuity. Pedagogically, the use of such
animations is justified by the fact that astronauts, who look alternately at different displays,
are compelled to achieve such an integration of different camera views.
To examine the learning of these three tasks, we have developed a 3D environment
(Figure 1) reproducing different configurations of the International Space Station and
Canadarm II. This environment includes a simulator enabling the manipulation of the
Canadarm II robot manipulator, different viewpoints and camera functionalities, as
well as an automated movie production module.
References
Berendt, B. 1999. Representation and Processing of Knowledge about Distances in Environ-
mental Space. Amsterdam: IOS Press.
Currie, N. and B. Peacock 2002. International Space Station Robotic Systems Operations: A
Human Factors Perspective. Habitability & Human Factors Office (HHFO). NASA.
Wickens, C. D. 2002. Spatial Awareness Biases, University of Illinois Institute of Aviation
Final Technical Report (ARL-02-6/NASA-02-4). Savoy, IL: Aviation Research Lab.
Validating DynMap as a Mechanism to Visualize the
Student’s Evolution Through the Learning Process
Abstract. This paper describes a study conducted with the aim of validating
DynMap, a system based on Concept Maps, as a mechanism to visualize the
evolution of the students through the learning process of a particular subject.
DynMap has been developed with the aim of providing the educational commu-
nity with a tool that facilitates the inspection of the student data. It offers the
user a graphical representation of the student model and enables learners and
teachers to understand the model better.
1 Introduction
Up to now, the research community has considered visualization and inspection of the
student model [9]. This component collects the learning characteristics of the student
and his/her evolution during the whole learning process. [6] collects some of the rea-
sons that different authors argue for making the learner model available. [1] claims
that the use of simple learner models, easy to show in different ways, allows teachers
and students to improve the understanding of students’ learning of the target domain.
2 DynMap
CM-ED (Concept Map EDitor) [8] is a general purpose tool for editing Concept Maps
(CMs) [7]. The aim of the tool is to be useful in different contexts and uses of the edu-
cational agenda, specifically within the area of computer-based teaching and learning.
DynMap [8] uses the core of CM-ED and facilitates the inspection of the student
data. Given that the student's knowledge changes throughout the learning
process, it is useful for the student module to reflect this evolution. Unlike most of
the student models reviewed, DynMap is able to show this evolution graphically. It
shows student models based on CMs following the overlay approach [5]. Thus, the
knowledge that a student has about a domain is represented as a subset of the whole
domain, which is represented in a CM. Considering Bull’s classification [2] DynMap
would be included in the viewable models. It is designed for student models automati-
cally inferred by a teaching/learning system or manually gathered from the teacher.
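A minimal sketch of such an overlay model is shown below; the concept names and mastery values are invented, and DynMap's own data model is not reproduced here.

# Minimal sketch of an overlay student model on a concept map (invented names and values).
domain_cm = {                        # concept -> related concepts in the domain map
    "cryptography": ["symmetric ciphers", "public-key ciphers"],
    "symmetric ciphers": ["block ciphers"],
    "public-key ciphers": ["RSA"],
}
overlay = {"cryptography": 0.8, "symmetric ciphers": 0.6}   # the student's known subset, with levels

def snapshot(domain, student_overlay):
    """What a DynMap-like view would render: every domain concept with the student's level (0 if unseen)."""
    return {concept: student_overlay.get(concept, 0.0) for concept in domain}

history = [snapshot(domain_cm, {"cryptography": 0.4}), snapshot(domain_cm, overlay)]
for week, view in enumerate(history, start=1):
    print(f"week {week}:", view)     # successive snapshots show the evolution of the learner's knowledge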
Understandability [3] is the first criterion that an open student model should meet. Fo-
cusing on this criterion and after validating, in a preliminary study [8], the set of graphi-
cal resources selected to show different circumstances of the student model, a second
experiment has been carried out. The main aim of the new study is to evaluate Dyn-
Map as a mechanism for visualising the evolution of the students through the learn-
ing sessions of a particular subject.
Context. The study has been conducted in the context of a Computer Security
course in the Computer Science Faculty at the University of the Basque Country [4].
In order to carry out continuous assessment, the teachers gather information on the
students’ performance throughout the term. Due to the complexity and dynamism of
the assessment system, the students need to check their marks frequently.
Participants. A group of 32 students from the Computer Security course in 03-04.
Procedure. A questionnaire was constructed to investigate students’ opinions about
DynMap. It was conducted anonymously during the first part of a normal lab session
and the students did not receive any help in using the tool. First of all, they were asked to
search for some data in the CM that represented the learner model. Next, each student
answered a questionnaire composed of 6 open questions and 17 multiple choice ques-
tions, where they had to choose a number between 1 and 5.
Results. The first part of the questionnaire was related to the accessibility of the in-
formation that DynMap offered in carrying out the above mentioned searches. 63.6%
of students considered it easy to look for specific data. In the second part the partici-
pants were asked about the organization of the presented CM. 66.6% of students
thought a CM organization is a good approach for representing the student's knowl-
edge. The third part evaluated the suitability of the information that DynMap pro-
vided. 73.9% of students considered the information provided by the CM sufficient.
In part four, participants were questioned about accessing individual and group
data. 78.9% of students considered students' data private. Moreover, 92.8% of stu-
dents did not have much interest in accessing other students' models. However,
64.03% of students thought that knowing the marks of their own group was valuable.
46.87% agreed with knowing the marks of other groups learning the same subject.
Part five explored new uses of CMs in the teaching of a subject. 72.2% of students
were in favour of using CMs for management purposes inside the teaching/learning
process. CMs would be useful for organizing the subject material (68%), for planning
the whole course (66.6%) or for managing personal assignments with the teacher
(50%). Finally, in part six users had the opportunity to contribute suggestions. Most
comments suggested improvements in the visualization of the student model, such as
including the whole information on just a single screen or using some graphical re-
sources for highlighting special circumstances.
Regarding the other partner in teaching/learning, the teacher of the subject said
that the tool could help in assessment decisions and also that it could be useful as a
medium for communicating the marks to the students. He added the following points:
The graphical view of the student model allows the teacher to analyse the distribu-
tion of the students' activities among the units of the subject. This is useful in
identifying weaknesses and strengths in the student's knowledge and also in detecting
learners who are focusing exclusively on some parts.
It is interesting to observe the evolution of the student through the learning process
due to the continuous assessment of the subject.
The teacher was even more convinced of the utility of having group models. Again, this
feature would be useful in detecting weaknesses and strengths, but at the group level, and
also in identifying the most popular contents.
4 Conclusions
The experiment confirmed that the graphical representation of the student model pro-
vided by DynMap is easily understandable. Moreover, DynMap offers handy
mechanisms for inspecting the student information, such as showing the evolution of
the learner's knowledge. The study results confirmed that users are able to read, ma-
nipulate and communicate with concept maps.
The assessment of the subject presented here is carried out continuously throughout the
term and therefore needs an appropriate medium to show the evolution of the
marks. As a result of preparing the study reported in this paper, a tool is now available for
graphically visualizing students' marks for both teachers and students.
References
1. Bull, S. and Nghiem, T.: Helping Learners to Understand Themselves with a Learner Model
Open to Students, Peers and Instructors. In: Brna, P. and Dimitrova, V. (eds.): Proceedings
of Workshop on Individual and Group Modelling Methods that Help Learners Understand
Themselves, ITS2002 (2002) 5-13.
2. Bull, S., McEvoy, A.T. & Reid, E.: Learner Models to Promote Reflection in Combined
Desktop PC/Mobile Intelligent Learning Environments. In: Aleven, V., Hoppe, U., Kay, J.,
Mizoguchi, R., Pain, H., Verdejo, F., Yacef, K. (eds): AIED2003 Sup. Proc.(2003) 199-208.
3. Dimitrova, V.: Interactive cognitive modelling agents – potential and challenges. In: Brna,
P. and Dimitrova, V. (eds.): Proceedings of Workshop on Individual and Group Modelling
Methods that Help Learners Understand Themselves, ITS2002, (2002) 52-62.
4. Elorriaga, J.A., Gutiérrez, J., Ibáñez, J. And Usandizaga, I.: A Proposal for a Computer Se-
curity Course. ACM SIGCSE Bulletin (1998) 42-47.
5. Goldstein, I.P.: The Genetic Graph: a representation for the evolution of procedural knowl-
edge. In: Sleeman, D. and Brown, J.S. (eds.): Intelligent Tutoring Systems, Academic Press (1982) 51-77.
6. Kay, J.: Learner Control. User Modelling and User-Adapted Interaction, Vol. 11 (2001) 111-127.
7. Novak, J.D.: A theory of education. Cornell University, Ithaca, NY (1977)
8. Rueda, U., Larrañaga, M., Ferrero, B., Arruarte, A., Elorriaga, J.A.: Study of graphical is-
sues in a tool for dynamically visualising student models. In: Aleven, V., Hoppe, U., Kay, J.,
Mizoguchi, R., Pain, H., Verdejo, F., Yacef, K. (eds): AIED (2003) Suppl. Proc. 268-277.
9. Workshop on Open, Interactive, and other Overt Approaches to Learner Modelling.
AIED’99, Le Mans, France, July, 1999 (http://cbl.leeds.ac.uk/ijaied/).
Qualitative Reasoning in Education of Deaf Students:
Scientific Education and Acquisition of Portuguese as a
Second Language*
Brazilian deaf students are nowadays integrated in the classroom along with non-deaf
students. In spite of all sorts of limitations for implementing bilingual education [6],
most educational methods have been oriented by the assumption that the Brazilian
Sign Language (henceforth, LIBRAS) is the native language of the deaf community,
Portuguese being their second language. In this context, tools to articulate knowledge
and mediate second language acquisition are required. Qualitative Reasoning (QR)
may support the education of deaf students, since QR models articulate knowledge
with explicit representations of causality. Our objective here is to verify the
understanding and use of the causal relations by deaf students, assuming that (i) the
causal relations represented in the models should be understood, due to their ability to
* An extended version of this paper can be found in the Proceedings of the International
Workshop on Qualitative Reasoning, held in Evanston, Illinois, August 2004.
work out logical deductions; (ii) the understanding of the causal relations and the
articulation of old and new vocabulary can be read off the linguistic description of
processes and the textual connectivity in their written composition in Portuguese; (iii)
while conceptual connectivity (coherence) is a function of the understanding of the
causal relations, grammatical connectivity (cohesion) is a function of the level of
proficiency in each language, LIBRAS and Portuguese [3].
We adopt the Qualitative Process Theory [2], an ontology that has been the basis for
a number of studies in cognitive science (for example, [4]), and implemented the
models in the qualitative simulator GARP [1]. Causal relations are modelled by using
two primitives: direct influences that represent processes (I+ and I–), and qualitative
proportionalities (P+ and P–) to represent how changes caused by processes
propagate through the system (see Figure 1).
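A toy illustration of how these two primitives propagate change is given below; the quantity names are guessed from the Cataguazes scenario, since the actual model's quantities appear only in Figure 1.

# Toy illustration: a direct influence (I+/I-) sets a derivative, and qualitative
# proportionalities (P+ keeps the sign, P- flips it) propagate it through the system.
def propagate(process_rate_sign, proportionalities, influenced_quantity):
    derivatives = {influenced_quantity: process_rate_sign}          # the I+ link from the process
    changed = True
    while changed:
        changed = False
        for source, target, sign in proportionalities:
            if source in derivatives and target not in derivatives:
                derivatives[target] = derivatives[source] * sign
                changed = True
    return derivatives

links = [("pollution concentration", "water quality", -1),          # P-
         ("water quality", "fish population", +1)]                  # P+
print(propagate(+1, links, "pollution concentration"))
# {'pollution concentration': 1, 'water quality': -1, 'fish population': -1}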
Deaf students were presented with three models. The first model introduced
vocabulary and modelling primitives. The second model was used to explore logical
deductions. The third model (Figure 1) is inspired by an ecological accident that occurred
in the Brazilian city of Cataguazes, involving the chemical pollution of several rivers
in a densely populated area in the Paraíba do Sul river water basin [5].
The study was run in a secondary state school in Brasília, with deaf students from the
year, their teachers and interpreters of LIBRAS-Portuguese in the classroom.
Questionnaires and diagrams were used as evaluation tools, and explored the
formulation of predictions and explanations about changes in quantities, by means of
exploring the causal model. The final question was a written composition about the
third model. The performance of five out of eight students allows for interesting
observations. They were successful in recognizing objects, quantities and changes of
quantity values during the simulations and building up causal chains based on the
given models. They were partially successful in building up causal chains given initial
values for some quantities, and identifying processes. Finally, they were successful in
reporting the consequences of the ecological accident in a (written) composition. The
results of the remaining three students are not conclusive at present.
3 Discussion
This paper describes exploratory studies on the use of qualitative models to mediate
second language acquisition by deaf students in the context of science education.
The consistency of the results allows for a correlation between the writing skills of the
students and their understanding of the causal model. In particular, conceptual
connectivity in the text seems to be a function of the ability to recognize objects and
processes, to build up causal chains and to apply them to a given situation, assessing
derivative values of quantities and making predictions about the consequences of their
changes. The results reported here constitute a first approach in a research program
concerned with the acquisition of Portuguese as second language by deaf students
(see below). Ongoing work includes a similar experiment with a qualitative model
developed for the understanding of electrochemistry in secondary schools [7].
Acknowledgements. We thank the deaf students that took part in the experiment, as
well as their teachers and educational coordinators and the APADA for their support.
H. and P. Salles are grateful to CAPES/MEC/ PROESP for the financial support to
the project Portuguese as a second language in the scientific education of deaf.
References
1. Bredeweg, B. (1992) Expertise in Qualitative Prediction of Behaviour. Ph.D. thesis,
University of Amsterdam, Amsterdam, The Netherlands, 1992.
2. Forbus, K.D. (1984) Qualitative process theory. Artificial Intelligence, 24:85–168.
3. Halliday, M. A. K. & R. Hasan (1976) Cohesion in Spoken and Written English. London:
Longman.
4. Kuehne, S. (2003) On the representation of physical quantities in natural language. In Salles,
P. & Bredeweg, B. (eds.) Proceedings of the Seventeenth International Workshop on
Qualitative Reasoning (QR'03), pages 131-138, Brasília, Brazil, August 20-22, 2003.
5. Martins, J. (2003) Uma onda de irresponsabilidades. Ciência Hoje, 33(195): 52-54.
6. Quadros, R. (1997) Educação de Surdos: a Aquisição da Linguagem. Porto Alegre: Artes
Médicas.
7. Salles, P.; Gauche, R. & Virmond, P. (2004) A qualitative model of the Daniell cell for
chemical education. This volume.
A Qualitative Model of Daniell Cell for Chemical
Education
Abstract. Understanding how students learn chemical concepts has been a great
concern for researchers of chemical education, who want to identify the most
important misunderstandings and develop strategies to overcome conceptual
problems. Qualitative Reasoning has great potential for building conceptual
models that can be useful for chemical education. This paper describes a
qualitative model for supporting understanding the interaction between
chemical reactions and electric current in the Daniell cell. We discuss the
potential of the model for science education of deaf students.
1 Introduction
Why does the colour of copper sulphate change when the Daniell cell is functioning?
Any Brazilian student in a secondary school should be able to answer this question,
given that the Daniell cell is largely used to build up concepts on the relation between
chemical reactions and electric current. However, the students can hardly give a
causal account of the Daniell cell typical behaviour. Textbooks are widely used in
Brazilian schools, but they fail in developing fundamental concepts [3]. The
laboratory is not an option in this case, because experiments in general do not work
very well. Computer models and simulations are interesting alternatives. However,
are they actually being used by the teachers? Ribeiro [5] reviewed papers published in
15 leading international journals of chemical education during a period of 10 years
(up to 2002) and showed that the use of software is far less than expected. Qualitative
Reasoning (QR) has great potential for supporting science education. This potential
was explored by Mustapha et al. [4], who describe a system for simulating a chemistry
laboratory. Here we describe a qualitative model for understanding the structure and
behaviour of the Daniell cell.
The Daniell cell consists of a zinc rod dipping into a solution of zinc sulphate, connected by a wire to a copper rod dipping into a copper (II) sulphate solution. Spontaneous oxidation and reduction reactions generate electric current, with electrons passing from the zinc rod (the anode) to the wire and from it to the copper rod (the cathode). While the battery works, the zinc rod undergoes corrosion and its mass decreases, while the concentration of zinc ions increases in that half-cell. The copper rod receives a deposit of metal and its mass increases, so that the concentration of copper ions in the solution decreases. A bulb that goes on and off and the colour of the solution in the cathode half-cell are external signs of the battery functioning. Copper sulphate produces a blue coloured solution; as the concentration of this substance decreases, the liquid becomes colourless. The process-centred approach [2] was chosen as an ontology for representing the cell, and the models were implemented in GARP [1]. Causality is represented by direct influences and qualitative proportionalities. The former represent processes, the primary cause of change. Proportionalities propagate the changes caused by processes to other quantities. In this case, the causal link is established via the derivatives (see Figure 1). Eventually there is no longer a difference of potentials between the electrodes (the chemical equilibrium), and the battery does not work. The bulb is off and the copper sulphate solution becomes colourless.
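As a rough, hedged sketch of this causal structure (quantity names and code are ours, not the authors' GARP model), the propagation of qualitative derivatives through direct influences and proportionalities could be expressed as:

```python
# Minimal sketch of qualitative derivative propagation for the Daniell cell.
# Hypothetical names; not the authors' GARP implementation.
# Derivative values: -1 decreasing, 0 steady, +1 increasing.
derivatives = {"current": +1, "zinc_mass": 0, "cu_mass": 0,
               "cu_ion_concentration": 0, "colour_intensity": 0}

# Direct influences: the electric current generation process is the primary cause of change.
direct_influences = [("zinc_mass", "current", -1),    # I-(zinc_mass, current)
                     ("cu_mass", "current", +1)]      # I+(cu_mass, current)

# Qualitative proportionalities propagate the changes to the remaining quantities.
proportionalities = [("cu_ion_concentration", "cu_mass", -1),           # P-(concentration, cu_mass)
                     ("colour_intensity", "cu_ion_concentration", +1)]  # P+(colour, concentration)

def propagate(derivs, influences, props):
    for target, source, sign in influences:   # processes set the derivatives first
        derivs[target] = sign if derivs[source] > 0 else 0
    changed = True
    while changed:                            # then propagate along the proportionalities
        changed = False
        for target, source, sign in props:
            new = sign * derivs[source]
            if derivs[target] != new:
                derivs[target], changed = new, True
    return derivs

print(propagate(derivatives, direct_influences, proportionalities))
# zinc mass decreases, copper mass increases, Cu ion concentration and colour fade
```

Tracing such a causal chain is what allows the student to relate the colour change back to the current generation process.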
This work describes a qualitative model of the Daniell cell. Textbooks do not describe how chemical energy transforms into electric energy. A QR approach has an added value because it focuses on the causal relations that determine the behaviour of the cell. A description of the mechanism of change, the electric current generation process, indicates the origin of the dynamic phenomenon, which is then propagated to and observed in the rest of the system. In this way, by inspecting only the causal model of the battery, the student can explain why the masses of the rods change, and why the bulb goes off as the colour of the solution at the cathode disappears. The work described here is part of an umbrella project that aims at the acquisition of Portuguese as a second language by deaf students (see below). The use of qualitative models to support second language acquisition by deaf students is already being investigated, and the results obtained so far are encouraging. Ongoing work includes exploring the qualitative model of the Daniell cell with a group of deaf students in an experiment similar to the one described in Salles [6], improved by the lessons learned.
Acknowledgements. This work was partially funded by the project “Português como segunda língua na educação científica de surdos” (Portuguese as a second language in the scientific education of the deaf), a MEC/CAPES/PROESP grant from the Brazilian government.
References
1. Bredeweg, B. (1992) Expertise in Qualitative Prediction of Behaviour. Ph.D. thesis,
University of Amsterdam, Amsterdam, The Netherlands, 1992.
2. Forbus, K.D. (1984) Qualitative process theory. Artificial Intelligence, 24:85–168.
3. Lopes, A. R. C. (1992) Livros didáticos: obstáculo ao aprendizado da ciência Química.
Química Nova, 15(3): 254-261.
4. Mustapha, S.M.F.D.; Jen-Sen, P. & Zaim, S.M. (2002) Application of Qualitative Process
Theory to qualitative simulation and analysis of inorganic chemical reaction. In: N. Angell
& J. A. Ortega (Eds.) Proceedings of the International workshop on Qualitative Reasoning,
(QR’02), pages 177-184, Sitges - Barcelona, Spain, June 10-12, 2002.
5. Ribeiro, A.A. & Greca, I.M. (2003) Simulações computacionais e ferramentas de
modelização em educação química: uma revisão de literatura publicada. Química Nova,
26(4): 542-549.
6. Salles, H.; Salles, P. & Bredeweg, B. (2004) Qualitative reasoning in education of deaf
students: scientific education and acquisition of Portuguese as a second language. This
volume.
Student Representation Assisting Cognitive Analysis
1 Introduction
2 Student Framework
Let us describe domain knowledge through four-dimensional generalized constraints.
Next, the student’s perception of a target domain will reflect the bounded human abil-
ity to resolve detail and unbounded capability for information compression [10]. Hu-
man cognition is by definition fuzzy granular, as a consequence of the fuzziness of
concepts like indistinguishability, proximity and functionality. Student perceptions
will be extracted as propositions in natural language. It is demonstrated in [11],[12]
how propositions in natural language translate into generalized constraints. Conven-
iently, the framework in Fig. 2 is already described through generalized constraints.
Therefore, we can introduce a unified approach to student representation based on
both their performance and perceptions.
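As a hedged illustration (the class and field names are ours, not the authors' notation), a generalized constraint in Zadeh's sense, "X isr R", can be captured by a small structure pairing a constrained variable with a relation and a modality:

```python
from dataclasses import dataclass

# Hedged sketch of a generalized constraint "X isr R" (Zadeh), where r is the modality.
# Names are illustrative only, not the authors' notation.
@dataclass
class GeneralizedConstraint:
    variable: str      # the constrained variable X
    modality: str      # r: e.g. "possibilistic", "probabilistic", "veristic"
    relation: object   # R: the constraining relation, e.g. a fuzzy membership function

# A perception such as "the student solves problems slowly" could then be recorded as:
slow = GeneralizedConstraint(
    variable="problem_solving_time",
    modality="possibilistic",
    relation=lambda minutes: max(0.0, min(1.0, (minutes - 10.0) / 10.0)),  # membership rises from 10 to 20 min
)
print(slow.relation(18.0))  # degree to which 18 minutes counts as "slow" in this sketch
```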
3 Further Research
Further work involves developing a diagnostic strategy able to determine cognitive states in terms of subsets of generalized constraints. Depending on the complexity of the task, the strategy will employ propagation of generalized constraints or evolutionary computation [6]. A demonstration application will illustrate how the overall approach works in a real setting. This involves instantiating the domain framework within the area of financial risk analysis, particularly the valuation and risk analysis of assets and derivatives [6],[7],[8].
References
1. de Koning, K., Bredeweg, B., Breuker, J., Wielinga, B.: Model-Based Reasoning About
Learner Behaviour. Artificial Intelligence 117 (2000) 173-229
2. Forbus, K.: Using Qualitative Physics to Create Articulate Educational Software. IEEE
Expert 12 (1997) 32-41
3. Forbus, K., Whalley, P., Everett, J., Ureel, L., Brokowski, M., Baher, J., Kuehne, S.: CyclePad: An Articulate Virtual Laboratory for Engineering Thermodynamics. Artificial Intelligence 114 (1999) 297-347
4. Khan, T., Brown, K., Leitch, R.: Managing Organisational Memory with a Methodology
Based on Multiple Domain Models. Proceedings of the Second International Conference
on Practical Application of Knowledge Management (1999) 57-76
5. Leitch, R., et al.: Modeling choices in intelligent systems. Artificial Intelligence and the
Simulation of Behavior Quarterly 93 (1995) 54-60
6. Serguieva, A., Kalganova, T.: A Neuro-fuzzy-evolutionary classifier of low-risk invest-
ments. Proceedings of the IEEE Int. Conf. on Fuzzy Systems (2002) 997-1002 IEEE Press
7. Serguieva, A., Hunter, J.: Fuzzy interval methods in investment risk appraisal. Fuzzy Sets
and Systems 142 (2004) 443-466
8. Serguieva, A., Khan, T.: Modelling techniques for cognitive diagnosis. EPSRC Deliver-
able Report on Cognitive Diagnosis in Training. Brunel University (2003)
9. Serguieva, A., Khan, T.: Domain Representation Assisting Cognitive Analysis. In Pro-
ceedings of the Sixteenth European Conference on Artificial Intelligence. IOS Press
(2004) to be published
10. Zadeh, L.: Toward a theory of fuzzy information granulation and its centrality in human
reasoning and fuzzy logic. Fuzzy Sets and Systems 90 (1997) 111-127
11. Zadeh, L.: Outline of Computational Theory of Perceptions Based on Computing with
Words. In: Soft Computing and Intelligent Systems, Academic Press (2000) 3-22
12. Zadeh, L.: A new direction in AI: Toward a computational theory of perceptions. Artificial
Intelligence Magazine 22 (2001) 73-84
An Ontology-Based Planning Navigation in
Problem-Solving Oriented Learning Processes
Abstract. Our research aims are to propose a support model for problem-
solving oriented learning and implement a human-centric system that supports
learners and thereby develops their ability. The characteristic of our research is
that our system understands the principle knowledge (ontology) to support us-
ers through human-computer interactions.
1 Introduction
Our research aims are to propose an ontology-based navigation framework for Prob-
lem-Solving Oriented Learning (PSOL) [1], and implement a human-centric system
based on the ontology to support learners and thereby develop their ability. By ontol-
ogy-based, we mean that we do not develop the ad hoc system but the theory-aware
system based on the principle knowledge.
We define problem-solving oriented learning as learning whereby a learner must
not only accumulate sufficient understanding for planning and performing problem-
solving processes but also acquire capacity for making efficient problem-solving
processes according to a sophisticated strategy. Therefore, in PSOL it is important for
learner not only to execute problem-solving processes or learning processes (Object
activity) but also to encourage meta-cognition (Meta activity) that monitors/controls
her internal mental image.
Figure 2 shows that the system provides appropriate information for a learner when she reaches an impasse because the feasibility of the learning process is not confirmed. Here, by suggesting the causes of the impasse as well as showing its influence on problem-solving, the system encourages the learner to observe and control her internal mental image (meta-cognition), which contributes to effective PSOL.
4 Concluding Remarks
This paper systematized the PSOL task ontology and then proposed a human-computer interactive navigation framework based on the ontology.
References
1. Kazuhisa Seta, Kei Tachibana, Ikuyo Fujisawa and Motohide Umano: “An ontological
approach to interactive navigation for problem-solving oriented learning processes,” Inter-
national Journal of Interactive Technology and Smart Education, (2004, to appear)
2. Rasmussen, J.: “A Framework for Cognitive Task Analysis”, Information Processing and
Human-Machine Interaction: An Approach to Cognitive Engineering, North-Holland, New
York, (1986) pp.5–8.
A Formal and Computerized Modeling Method of
Knowledge, User, and Strategy Models in PIModel-Tutor
Jinxin Si 1,2
1 Institute of Computing Technology, Chinese Academy of Sciences
2 The Graduate School of the Chinese Academy of Sciences
[email protected]
1 Introduction
2 ULMM Overview
The knowledge logic model is the fine-grained knowledge base of concepts and relations for the pedagogical process. We define a knowledge logic model as a 4-tuple consisting of the concept set C, the set of semantic relations, the set of pedagogical relations, and the set A of axioms among the concepts and their relations.
In many cases, the designation of pedagogical relations can in fact be combined intimately with semantic relations. Some authors have proposed that the whole is more than the sum of its parts, and that a “glue” must be specified to tie the pieces of knowledge together. We therefore give two examples of translation rules involving semantic relations (i.e. “part-of”, “has-instance” and prerequisite), as depicted below.
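As a hedged illustration only (relation names and the rule itself are ours, not the authors' formalism), a rule translating semantic relations into pedagogical prerequisites might be sketched as:

```python
# Hedged sketch of a translation rule (illustrative names, not the authors' formalism):
# a "part-of" or "has-instance" semantic relation induces a pedagogical prerequisite.
semantic_relations = {
    "part-of": [("cpu", "computer"), ("memory", "computer")],
    "has-instance": [("sorting_algorithm", "quicksort")],
}

def derive_prerequisites(relations):
    prerequisites = set()
    for part, whole in relations.get("part-of", []):
        prerequisites.add((part, whole))        # teach the part before the whole
    for concept, instance in relations.get("has-instance", []):
        prerequisites.add((concept, instance))  # teach the concept before its instances
    return prerequisites

print(derive_prerequisites(semantic_relations))
```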
The user logic model can help ITSs to determine static characteristics and dynamic requirements for a given user in an interactive manner. Inspired by the performance and expectation of a learning event, student states can be depicted at any time by a tuple in which one component indicates the actual student state from the student's perspective and the other indicates the planned student state from the tutor's perspective. Both unit exercises and class quizzes need to be considered during the pedagogical evaluation.
Error identification and analysis is a central task in the user logic model, as in other user modeling methods including bug libraries, model tracing, and the constraint-based method. However, concrete errors are thought to depend strongly on the domain and on pedagogical, or even psychological, theory. To some extent, error abstraction decreases the complexity of state computation and increases the ease of state expression. For example, detailed explanations for misclassification and misattribution, two classical error types in concept learning, can be formalized with first-order logic, where the suffix -w denotes the wrongness of an atomic predicate.
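As a hedged illustration only (the predicates below are hypothetical and do not reproduce the authors' original formulae), such error predicates might be written as:

```latex
% Hypothetical illustration; predicate names are assumptions, not the authors' formulae.
\begin{align*}
&\forall s,i,c_1,c_2.\;
  \mathit{classify}(s,i,c_1) \wedge \mathit{instanceOf}(i,c_2) \wedge c_1 \neq c_2
  \rightarrow \mathit{classify\text{-}w}(s,i,c_1) && \text{(misclassification)}\\
&\forall s,c,p.\;
  \mathit{attribute}(s,c,p) \wedge \neg\,\mathit{hasProperty}(c,p)
  \rightarrow \mathit{attribute\text{-}w}(s,c,p) && \text{(misattribution)}
\end{align*}
```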
The evaluation of the student state does not reside in the sequence of actions the student executed, but in the situation the student created. In fact, some empirical research suggests that the effects of an imposed action on the student state are uncertain. As a result, it is a vital task for the ITS designer to build a large set of testing, feedback and remedial strategies in order to obtain and evaluate student states. At the same time, the strategy layer should be able to provide an open authoring portal for formalizing versatile, psychologically sound learning strategies.
The novelty of ULMM lies in the fact that it offers a formal representation schema for global modeling of an ITS architecture, rather than local modeling of each component. We do not think that the modeling strategy in the ITS research domain should simply be “divide and rule”. So far, the ULMM method has not been subjected to an overall evaluation within our PIModel-Tutor system. In future work, we need to address several issues in order to promote the validity and soundness of the concrete implementation of PIModel-Tutor. During the process of ITS authoring, how can ULMM provide mechanisms for conflict detection and resolution to facilitate system construction? As a central element, how can effective and automated instructional remediation be generated so as to adapt to student requirements in their psychological and pedagogical aspects? How can ULMM ease the difficulty of computing student states through a flexible interface?
References
1. Kay, J. 2001. Learner control, User Modeling and User-Adapted Interaction, Tenth Anni-
versary Special Issue, 11(1-2), Kluwer, 111-127.
2. Si J. ; Yue X.; Cao C.; Sui Y. 2004. PIModel: A Pragmatic ITS Model Based on Instruc-
tional Automata Theory, To appear in the proceedings of The 17th International FLAIRS
Conference, Miami Beach, Florida, May 2004. AAAI Press.
3. Si J.; Cao C.; Sui Y.; Yue X.; Xie N. 2004. ULMM: A Uniform Logic Modeling Method
in Intelligent Tutoring Systems, To appear in the proceedings of The 8th International
Conference on Knowledge-based Intelligent Information & Engineering Systems,
Springer.
4. Vassileva, J.; McCalla, G.; and Greer, J. 2003. Multi-Agent Multi-User Modelling in I-
Help, User Modelling and User Adapted Interaction, 2003, 13(1) 179-210.
5. Yue X.; and Cao C. 2003. Knowledge Design. In Proceedings of International Workshop
on Research Directions and Challenge Problems in Advanced Information Systems Engi-
neering, Japan, Sept.
SmartChat – An Intelligent Environment for
Collaborative Discussions
1 Introduction
2 Chat Environments
We have analyzed three environments: jXChat [1], Comet [2] and BetterBlether [3], ac-
cording to the following criteria: (1) record of the interaction log; (2) technique employed
in the interaction analysis; (3) goal of the interaction analysis; (4) way of intervening in
the conversation; (5) provision of feedback for the teacher or the student; (6) use of an
argumentation model; (7) interface usability. We have observed that when a chat offers
more resources to support teachers and/or students during the discussion, its interface
becomes a hindrance to the users. We have also observed that these systems only pro-
vide feedback to the teacher. Furthermore, even the systems that provide feedback do so through reports or statistics generated from the interaction log, and only at the end of the discussion. None of the systems makes use of an argumentation model to structure the conversation.
3 SmartChat
SmartChat’s prototype was implemented using RMI (Remote Method Invocation). Its
reasoning mechanism uses an agent society composed by two intelligent agents: The
Monitor Agent that is responsible for getting all the perceptions necessary for deciding
whether to interfere or not in conversation. And the Modeller Agent centralizes the main
activities in the chat, models the profile of the users logged-in. This agent communicates
with the rule database generated by JEOPS [4], which is used to classify the user as one
of the stereotypes [5] stored in the user model. The Modeller interferes in the discussion
to perform one of three actions: (1) send support messages to the users according to
their stereotype; (2) suggest references related to the subject being discussed; and (3)
name another user that may collaborate with the user having difficulties. Fig.1 shows
the SmartChat architecture.
SmartChat uses a simplified argumentation model, based on the IBIS model [6], to
structure the information contained in the interactions and to categorize the messages
exchanged between students. A user who wishes to interact with the environment should select an abstraction from a predefined set (for example, Argument, Question, etc.), in order to provide explicit information about the intention of her/his messages. The
use of an argumentation model favours the resolution of conflicts and the understanding
of problems, helping the participants to structure their ideas more clearly. Fig. 2 shows
the argumentation model used by SmartChat.
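As a hedged sketch (the abstraction set and the profile computation are illustrative; this is not SmartChat's JEOPS rule base), messages tagged with IBIS-style abstractions could be represented and summarised per user as follows:

```python
from collections import Counter
from dataclasses import dataclass

# Hedged sketch: IBIS-style message abstractions (illustrative set, not SmartChat's exact one).
ABSTRACTIONS = {"Issue", "Position", "Argument", "Question", "Answer"}

@dataclass
class ChatMessage:
    author: str
    abstraction: str   # selected by the user from the predefined set
    text: str

def participation_profile(messages):
    """Count, per user, how many messages of each abstraction were sent;
    a Modeller-like agent could use such counts when choosing a stereotype."""
    profile = {}
    for m in messages:
        if m.abstraction not in ABSTRACTIONS:
            raise ValueError(f"unknown abstraction: {m.abstraction}")
        profile.setdefault(m.author, Counter())[m.abstraction] += 1
    return profile

log = [ChatMessage("ana", "Question", "Why does the mass of the zinc rod decrease?"),
       ChatMessage("bia", "Argument", "Because zinc atoms are oxidised and go into solution.")]
print(participation_profile(log))
```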
the classification of the users using the environment. In the near future, we intend to extend the domain ontology, implement an on-line feedback area to inform students about their performance, and enrich the student and teacher reports with more relevant information.
References
1. Martins, F. J.; Ferrari, D. N.; Geyer, C. F. R. jXChat - Um Sistema de Comunicação Eletrônica
Inteligente para apoio a Educação a Distância. Anais do XIV Simpósio de Informática na
Educação - SBIE - NCE/UFRJ. (2003).
2. Soller, A.; Wiebe, J. ; Lesgold, A. A Machine Learning Approach to Assessing Knowledge
Sharing During Collaborative Learning Activities. In: Proceedings of Computer Support for
Collaborative Learning 2002. Boulder, CO, (2002). 128-137.
3. Robertson, J.; Good, J.; Pain, H. BetterBlether: The Design and Evaluation of a Discussion
Tool for Education. In: International Journal of Artificial Intelligence in Education, N.9 (1998),
219-236.
4. Figueira Filho, C.; Ramalho G. JEOPS - Java Embedded Object Production System. Monard,
M. C; Sichman, J. S (Eds). IBERAMIA-SBIA 2000, Proceedings. Lecture Notes in Computer
Science 1952. Springer (2000), 53-62.
5. Rich, E. Stereotypes and user modeling. A. Kobsa & W. Wahlster (Eds.), User Models in Dialog
Systems. Berlin, Heidelberg: Springer, (1989). 35-51.
6. Conklin, J.; Begeman, M. L. gIBIS: A Hypertext Tool for Exploratory Policy Discussion. In: ACM Transactions on Office Information Systems. V. 6, N. 4, (1988).
Intelligent Learning Objects:
An Agent Based Approach of Learning Objects
1 Introduction
Many people have been working hard to produce metadata specifications for the construction of Learning Objects, in order to improve the efficiency, efficacy and reusability of learning content based on the object-oriented design paradigm. According to Sosteric and Hesemeier [7], learning objects are now firmly on the educational agenda. Organizations such as the IMS Global Learning Consortium [4] and the IEEE [3] have contributed significantly by helping to define indexing (metadata) standards for object search and retrieval. There has also been some commercial and educational work accomplished.
Learning resources are objects in an object-oriented model. They have methods and properties. Typical methods include rendering and assessment methods. Typical properties include content and relationships to other resources [5]. Downes [2] points out that a lot of work has to be done before a learning object can be used: one must first build an educational environment in which it can function, somehow locate these objects, and arrange them in their proper order according to their design and function; one must also arrange for the installation and configuration of appropriate viewing software. Although it seems easier to do all this with learning objects, we need smarter learning objects.
Fig. 1. The Intelligent Learning Object, designed as a pedagogical FIPA agent, implements the same API specification, performs message sending and receiving, and performs the agent's specific task according to its knowledge base. When the agent receives a new FIPA-ACL message, it processes the API function according to the message content, performing the adequate behavior and acting on the SCO. According to the agent behavior model, the message-receiving event can trigger message sending, mental-model updating and particular agent actions on the SCO
… the course specialist. The smaller and simpler the pedagogical task performed by the ILO, the more adaptable, flexible and interactive is the learning experience it provides. The FIPA-ACL protocol, performed by a FIPA agent communication manager platform, ensures excellent support for cooperation. Fig. 1 shows the proposed architecture of the set of pedagogical agents.
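A minimal sketch (plain Python with illustrative names; not an actual FIPA platform API) of how an ILO might react to FIPA-ACL-style messages and act on its SCO:

```python
# Hedged sketch of an Intelligent Learning Object reacting to FIPA-ACL-style messages.
# This is plain Python, not a real FIPA platform API; names are illustrative.
class SCO:
    def launch(self, learner_id):
        print(f"presenting content to {learner_id}")

class IntelligentLearningObject:
    def __init__(self, sco):
        self.sco = sco
        self.beliefs = {}                      # the agent's simple "mental model"

    def on_message(self, performative, sender, content):
        # The message-receiving event may update the mental model, trigger replies,
        # and/or trigger a pedagogical action on the SCO.
        if performative == "request" and content.get("action") == "present":
            self.sco.launch(content["learner"])
            return ("inform", sender, {"status": "presented"})
        if performative == "inform":
            self.beliefs.update(content)
            return None
        return ("not-understood", sender, {})

ilo = IntelligentLearningObject(SCO())
print(ilo.on_message("request", "tutor-agent", {"action": "present", "learner": "ana"}))
```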
The Sharable Content Object Reference Model (SCORM®) [1] is perhaps the best reference from which to start thinking about how to build learning objects based on an agent architecture. SCORM defines a Web-based learning “Content Aggregation Model” and “Run-time Environment” for learning objects. At its simplest, it is a model that references a set of interrelated technical specifications and guidelines designed to meet the requirements of learning objects. Learning content in its most basic form is composed of Assets, which are electronic representations of media, text, images, sound, web pages, assessment objects or other pieces of data that can be delivered to a Web client.
3 Conclusions
At this point, we quote Downes [2]: We need to stop thinking of learning objects as
chunks of instructional content and to start thinking of them as small, self-reliant
computer programs. When we think of a learning object we need to think of it as a
small computer program that is aware of and can interact with its environment.
This project is funded by the Brazilian research agencies CNPq and FAPERGS.
References
1. Advanced Distributed Learning (ADL). Sharable Content Object Reference Model (SCORM®) 2004 Overview. 2004. Available at: <www.adlnet.org>.
2. Downes, Stephen. Smart Learning Objects. May 2002.
3. IEEE Learning Technology Standards Committee (1998) Learning Object Metadata (LOM): Draft Document v2.1
4. IMS Global Learning Consortium. IMS Learning Resource Meta-data Best Practices and Implementation Guide v1.1. 2000.
5. Robson, Robby (1999) Object-oriented Instructional Design and Web-based Authoring. [Online] Available at: <www.eduworks.com/robby/papers/objectoriented.pdf>
6. Shoham, Y. Agent-oriented programming. Artificial Intelligence, Amsterdam, v.60, n.1, p.51-92, Feb. 1993.
7. Sosteric, Mike; Hesemeier, Susan. When is a Learning Object not an Object: A first step towards a theory of learning objects. International Review of Research in Open and Distance Learning (October 2002) ISSN: 1492-3831
8. Wooldridge, M.; Jennings, N. R.; Kinny, D. A methodology for agent-oriented analysis and
design. In: International Conference on Autonomous Agents, 3. 1999. Proceedings
Using Simulated Students for Machine Learning
Abstract. In this paper we present how simulated students have been generated
in order to obtain a large amount of labeled data for training and testing a neu-
ral network-based fuzzy model of the student in an Intelligent Learning Envi-
ronment (ILE). The simulated students have been generated by modifying real
students’ records and classified by a group of expert teachers regarding their
learning style category. Experimental results were encouraging and similar to the experts' classifications.
1 Introduction
One of the critical issues currently limiting the real-world application of machine learning techniques for user modeling is the need for large data sets of explicitly labeled examples [7]. Simulated students, originally proposed as a modeling approach [6], have been used in ITS studies [1], [6]. This paper presents how simulated students have been generated in order to train a neural network-based fuzzy model that updates the student model with respect to the student's learning style in an Intelligent Learning Environment (ILE). The ILE consists of the educational software “Vectors in Physics and Mathematics” [4] and the neural network-based fuzzy model [5].
The educational software “Vectors in Physics and Mathematics” [4] is a discovery learning environment that allows students to carry out selected activities referring to real-life situations; e.g., they experiment with forces acting on objects and run simulations.
The neural network-based fuzzy model makes use of neuro-fuzzy synergism in order to evaluate, taking into consideration the teacher's personal opinion/judgment, an aspect of the surface/deep approach [3] to the student's learning style, which is then used for sequencing the educational material. Deep learners often prefer self-regulated learning; conversely, surface learners often prefer externally regulated learning [2]. “The student's tendency to learn by discovery in a deep or surface way” is described with
the term set {Deep, Rather Deep, Average, Rather Shallow, Shallow}. This process involves dealing with uncertainty, and eliciting and expressing the teacher's qualitative knowledge in a convenient and interpretable way. Neural networks are used to equip the fuzzy model with learning and generalization abilities, which are eminently useful when the teacher's reasoning process cannot be defined explicitly.
Fig. 1. Membership functions for the linguistic variable “problem solving speed”.
In order to construct the simulated students' records, the student's actions until s/he quits an activity are decomposed into episodes of actions. Each episode includes a series of actions, and begins or ends when the student clears the screen in order to start a new attempt on the same activity, or a new equilibrium activity. Within each episode the student conducts, successfully or unsuccessfully, an experiment. The simulated students' records have been produced by modifying the number of episodes, or elements of patterns within each episode or between episodes, i.e. inserting, deleting or changing the actions that are used to calculate the measured values. Thus, starting with 10 real students' records we can generate simulated students, altering the values in
the students’ patterns by giving appropriate values within their universes of discourse
For example, reducing the number of episodes will cause a decrease to the value of
which gives the measured value of Thus, a particular student performing an
unsuccessful experiment, needs 5 episodes and 18 minutes overall to produce a cor-
rect solution in this activity. For the particular activity that the student is performing,
the group of experts estimated the average time is 10 minutes. Thus, calculating the
percentage that corresponds to 10 minutes multiplied by 2 (i.e. 20 minutes) for this
student which corresponds to the linguistic value “Slow” with membership
degree very close to 1 (see Fig. 1). Reducing the number of episodes of this activity to
4, the total time of the episodes needed to find the correct solution is 15 minutes; this
corresponds to a value of and the linguistic value for problem solving speed
is now “slow” with a degree of 0.5 and “Medium” with a degree of 0.5 (see Fig. 1).
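A hedged sketch of membership functions consistent with the two data points above (the breakpoints are assumptions of ours, not necessarily those of Fig. 1):

```python
# Hedged sketch: membership functions for "problem solving speed" over the student's
# time expressed as a multiple of the experts' average time. The breakpoints are
# assumptions chosen only to be consistent with the two examples in the text.
def medium(ratio):
    # peaks at the experts' average (ratio 1.0) and fades out by twice the average
    return max(0.0, min((ratio - 0.5) / 0.5, (2.0 - ratio) / 1.0, 1.0))

def slow(ratio):
    # starts rising at the experts' average and saturates at twice the average
    return max(0.0, min((ratio - 1.0) / 1.0, 1.0))

avg = 10.0                       # experts' estimated average time (minutes)
for minutes in (20.0, 15.0):
    r = minutes / avg
    print(minutes, "min ->", {"Slow": round(slow(r), 2), "Medium": round(medium(r), 2)})
# 20 min -> Slow 1.0, Medium 0.0 ; 15 min -> Slow 0.5, Medium 0.5
```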
References
1. Beck J. E. (2002). Directing Development Effort with Simulated Students. In Proc. of ITS 2002, Biarritz, France and San Sebastian, Spain, June 2-7, pp. 851-860, LNCS, Springer-Verlag.
2. Beshuizen J. J., Stoutjesdijk E. T., Study strategies in a computer assisted study environ-
ment, Learning and Instruction 9 (1999) 281-301.
3. Biggs J., Student approaches to learning and studying, Australian Council for Educational
Research, Hawthorn Victoria, 1987.
4. Grigoriadou M., Mitropoulos D., Samarakou M., Solomonidou C., Stavridou E. (1999).
Methodology for the Design of Educational Software in Mathematics and Physics for
Secondary Education. Computer Based Learning in Science, Conf. Proc. 1999 pB3.
5. Stathacopoulou R, Magoulas GD, Grigoriadou M., Samarakou M. (2004). Neuro-Fuzzy
Knowledge Processing in Intelligent Learning Environments for Improved Student Diag-
nosis. Information Sciences, in press, DOI information 10.1016/j.ins.2004.02.026
6. Vanlehn K., Niu Z. (2001). Bayesian student modeling, user interfaces and feedback: A
sensitivity analysis. Inter. Journal of Artificial Intelligence in Education 12 154-184.
7. Webb G. I., Pazzani M. J., Billsus D. (2001) Machine Learning for User Modeling. User Modeling and User-Adapted Interaction 11, 19-29.
Towards an Analysis of How Shared Representations Are
Manipulated to Mediate Online Synchronous
Collaboration
Daniel D. Suthers
Dept. of Information and Computer Sciences, University of Hawaii, 1680 East West Road
POST 317, Honolulu, HI 96822, USA
[email protected]
http://lilt.ics.hawaii.edu/
1 Introduction
The author is studying how software tools that support learners’ construction of
knowledge representations (e.g., concept maps, evidence maps) are used by collabo-
rating learners, and consequently how to design such tools to more effectively support
collaboration. A previous study [6] found that online collaborators treated a graphical
evidence map as a medium through which collaboration took place, proposing new
ideas by entering them directly in the graph before engaging in (usually brief) con-
firmation dialogues in a textual chat tool. In general, actions in the graph appeared to
be an important part of participants' conversations with each other, and were in fact at times the sole means of interaction. These observations led to the questions of whether and in what sense we can say that participants are having a conversation through the graph, and whether knowledge building is taking place. To answer these
questions, the author identified interactions from the previous study that appeared to
constitute collaboration through the nonverbal as well as verbal media, and is en-
gaged in a qualitative analysis of these examples. The purpose of this analysis is to
understand how participants made use of the structured graph representation to medi-
ate meaning making activity, by examining how participants use actions on the repre-
sentations to build on each others’ ideas. The larger goal is to identify affordances of
shared representations for face-to-face and online collaboration and their implications
for the design of representational support for collaborative knowledge building.
2 The Study
The participants’ task was to propose and evaluate hypotheses concerning the cause
of ALS-PD, a neurological disease with an unusually high occurrence on Guam that
has been studied by the medical community for over 50 years. The experimental
software provided a graphical tool for constructing representations of the data, hy-
potheses, and evidential relations that participants gleaned from information pages.
An information window enabled participants to advance through a series of textual pages presenting information on ALS-PD. The sequence was designed such that later pages sometimes bore upon the interpretation of information seen several pages earlier, making the use of an external memory important. In the study from which this analysis derives its data [6], the software was modified for synchronous online collaboration with the addition of a chat tool. Transcripts of the online sessions were automatically logged.
3 The Analysis
In order to “see” how participants were interacting with each other, the author and his
student (Ravikiran Vatrapu) began by identifying “information uptake” relations
between actions. Information uptake is said to hold between action A1 and action A2
if A2 builds on the information in A1. Examples include editing or linking to prior information, or cross-modal references such as a chat comment about an item in the graph. The uptake must be plausibly based on the informational content of, or attitude towards, the information in the uptaken act or representation, and there must be evidence that the uptaker is responding to one of these. (For example, merely moving things around to make the graph pretty is not counted.)
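A hedged sketch (hypothetical log format and heuristics, not the authors' coding scheme) of how candidate uptake pairs might be flagged automatically before the qualitative interpretation step:

```python
# Hedged sketch: flag candidate "information uptake" pairs in a combined graph/chat log.
# The event format and heuristics are hypothetical, not the authors' coding scheme;
# actual uptake still requires a human judgment about informational content.
events = [
    {"id": 1, "actor": "A", "medium": "graph", "action": "create", "object": "hypothesis-3", "text": "aluminum in water"},
    {"id": 2, "actor": "B", "medium": "graph", "action": "link",   "object": "link-7", "refers_to": ["hypothesis-3", "data-12"]},
    {"id": 3, "actor": "B", "medium": "chat",  "action": "say",    "text": "I linked hypothesis-3 to the well-water data"},
]

def uptake_candidates(log):
    candidates = []
    created_by = {e["object"]: e for e in log if e.get("action") == "create"}
    for e in log:
        # linking to, or editing, an object introduced earlier by someone else
        for ref in e.get("refers_to", []):
            origin = created_by.get(ref)
            if origin and origin["actor"] != e["actor"]:
                candidates.append((origin["id"], e["id"]))
        # cross-modal reference: a chat turn mentioning a graph object by name
        if e.get("medium") == "chat":
            for obj, origin in created_by.items():
                if obj in e.get("text", "") and origin["actor"] != e["actor"]:
                    candidates.append((origin["id"], e["id"]))
    return candidates

print(uptake_candidates(events))   # e.g. [(1, 2), (1, 3)]
```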
The analysis then proceeds in a bottom-up manner, working from the referential
level to the intentional level, similar to [5]. After having identified ways in which
information “flows” between participants, as evidenced by their references to infor-
mation in the graph, interpretations of the intentions behind these references are then
made.
The analysis seeks evidence of knowledge building, using a working definition of
knowledge building as the accretion of interpretations on an information base that is
simultaneously expanded by information seeking. Collaborative knowledge building
takes place when multiple participants contribute to this accretion of interpretations
by building, commenting on, transforming and integrating an information base. In
defining what counts as evidence for knowledge building, the analysis draws upon
several theoretical perspectives. Interaction via a graphical representation can be
understood as similar to interaction via language in terms of Clark’s model of
grounding [4] if grounding is restated in terms of actions on a representation: a par-
ticipant expresses an idea in the representation; another participant acts on that repre-
sentation in a manner that provides evidence of understanding the first participant’s
intent in a certain way; the first participant can choose to accept this action as evi-
dence of sufficient understanding, or, if the evidence is insufficient, initiate repair.
Under the grounding perspective, one would look for sequences of actions in which
References
1. Bertelsen, Olav W. and Bødker, Susanne (2003). Activity Theory. In J. M. Carroll (Ed.),
HCI Models, Theories and Frameworks: Toward a Multidisciplinary Science. San Francisco, Morgan Kaufmann: 290-315.
2. Doise, W., and Mugny, G. (1984) The Social Development of the Intellect, International
Series in Experimental Social Psychology, vol. 10, Pergamon Press
3. Hollan, J., E. Hutchins, & Kirsh, D. (2002). Distributed Cognition: Toward a New Founda-
tion for Human-Computer Interaction Research. Human-Computer Interaction in the New
Millennium. J. M. Carroll. New York, ACM Press Addison Wesley: 75-94.
4. Monk, A. (2003). Common Ground in Electronically Mediated Communication: Clark’s
Theory of Language Use. In J. M. Carroll (Ed.), HCI Models, Theories and Frameworks: Toward a Multidisciplinary Science. San Francisco, Morgan Kaufmann: 265-289.
5. Mühlenbrock, M., & Hoppe, U. (1999). Computer Supported Interaction Analysis of Group
Problem Solving. In Proceedings of the Computer Support for Collaborative Learning
(CSCL) 1999 Conference, C. Hoadley & J. Roschelle (Eds.) Dec. 12-15, Stanford Univer-
sity, Palo Alto, California. Mahwah, NJ: Lawrence Erlbaum Associates.
6. Suthers, D., Girardeau, L. and Hundhausen, C. (2003). Deictic Roles of External Repre-
sentations in Face-to-face and Online Collaboration. Designing for Change in Networked
Learning Environments, Proceedings of the International Conference on Computer Support
for Collaborative Learning 2003, B. Wasson, S. Ludvigsen & U. Hoppe (Eds), Dordrecht:
Kluwer Academic Publishers, pp. 173-182.
7. Suthers, D., and Hundhausen, C. (2003). An Empirical Study of the Effects of Representa-
tional Guidance on Collaborative Learning. Journal of the Learning Sciences, 12(2), 183-
219
A Methodology for the Construction of Learning
Companions
Paula Torreão, Marcus Aquino, Patrícia Tedesco, Juliana Sá, and Anderson Correia
One of the essential factors for the success of any software is the use of a
methodology. This increases the probability of the final system being complete,
functional and accessible. Furthermore, such practice reduces risks, time and cost.
However, there is no clear description of a methodology for the construction of LCs.
This paper presents a proposal for a methodology for the construction of LCs, which was used to build a collaborator/simulated-peer LC [1], VICTOR1, applied to a web-based virtual learning environment, the PMK Learning Environment2 (or PMK), which teaches Project Management (PM). PMK covers the content of the PMBOK®3, which provides a basic reference of knowledge and practices for PM and is a worldwide standard. The construction of an intelligent virtual environment using a LC to teach PM is a pioneering proposal. The application of the methodology described in this paper permitted a better identification of the problem and of the learning bottlenecks. Furthermore, it also helped us to decide on a proper choice of domain concepts to be represented, as well as clarifying the necessary requirements for the design of a more effective LC.
Several authors describe methodologies for the construction of Expert Systems (ES) (e.g. [2]). A Learning Companion is a type of ES used for instruction, which diagnoses the student's behavior and cooperates with him/her in learning [3,4]. The methodology presented here is based on Levine et al. [4] and has the following six stages: (1)
identifying the problem; (2) eliciting relevant domain concepts; (3) conceptualizing
the pedagogical tasks; (4) building the LC’s architecture; (5) implementing the LC;
and (6) evaluating and refining the LC.
Identifying the Problem: At this stage, a preliminary investigation of the main domain
characteristics should be conducted, with a view to formalizing the knowledge. Next, one should identify in which subjects there are learning problems and of what type they are. This facilitates the conception of adequate pedagogical strategies
and tactics that the LC should use for the student’s learning. At the end of this stage,
two artifacts should be produced: (1) a document relating the most relevant domain
characteristics; and (2) a document enumerating the main learning problems found.
Eliciting Relevant Domain Concepts: After defining the task at hand (what are the
domain characteristics? Which type of LC is needed?), one should choose which are
the most important concepts to represent in the domain. Firstly, one should define the
1 Virtual Intelligent Companion for TutOring and Reflection
2 http://www.cin.ufpe.br/~pmk – Official Project Site
3 Project Management Body of Knowledge – http://www.pmi.org
domain ontology and choose how to represent domain knowledge. At the end of this
stage, two artifacts should be produced: (1) a model of the domain ontology and; (2) a
document containing the ontology constraints.
Conceptualizing the Pedagogical Task: After modeling the domain ontology, it is
necessary to define the LC’s goals, pedagogical tactics and strategies. In order to
define the LC’s behaviour three questions should be answered: What to do? When?
And How? The understanding of the learning process and of any factors (e.g.
reflection, aims) relevant to the success of this process facilitates this specification.
There are various ways of selecting a pedagogical strategy and tactics, one of them
being the choice of the teaching strategy according to the domain and the learning
goal [5]. For instance, agent LUCY [1] uses the explanation-based teaching strategy
to teach students physical phenomena about satellites. Choosing an adequate teaching
strategy depends on the following criteria: size and type of the domain, learning goals,
and safety and economical questions (e.g. training firefighters would qualify as a non-
safe domain). At the end of this stage, two artifacts should be produced: (1) a
document specifying the LC’s goal, pedagogical strategies and tactics; and (2) a
document specifying the LC’s actions and behaviors.
Building the LC’s Architecture: At this stage, the documents previously constructed
are used as a basis for the detailed project of the LC’s architecture. This project
should include the LC's Behavior Model and the Knowledge Base (KB). The LC's behavior should be modeled according to the tactics, strategies, and goals defined previously. This behavior determines how, when and what the LC perceives and in
what way it responds to the student’s actions. The KB stores the contents defined
during the elicitation of domain concepts. It should contain all domain concepts,
terms and relationships among them. The representation technique chosen for the KB
will also determine which Inference Engine will be used by the LC. The LC’s
Architecture contains four main components: the student’s model, the pedagogical
module, the domain knowledge and the communication modules [6]. The student’s
model stores the individual information about each student and provides information
to the pedagogical module. The pedagogical module provides a model of the learning process and contains the LC's behavior model. The domain knowledge module
contains information about what the tutor should teach. The communication module
mediates the LC’s interaction with the environment. It captures the student’s actions
in the interface and sends the actions suggested by the pedagogical module to the
interface. At the end of this stage, a document containing the detailed project of the
architecture should be produced.
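As a hedged sketch (class and method names are ours, not VICTOR's actual implementation), the four modules and their interactions could be wired together roughly as follows:

```python
# Hedged sketch of the four-module LC architecture (illustrative names, not VICTOR's code).
class StudentModel:
    def __init__(self):
        self.history = []               # individual information about the student
    def update(self, action):
        self.history.append(action)

class DomainKnowledge:
    def lookup(self, topic):
        return f"explanation of {topic}"  # what the tutor should teach

class PedagogicalModule:
    def __init__(self, student_model, domain):
        self.student_model, self.domain = student_model, domain
    def decide(self, action):
        # model of the learning process / LC behavior: choose a reaction to the student's action
        if action.get("outcome") == "error":
            return {"type": "hint", "content": self.domain.lookup(action["topic"])}
        return {"type": "encourage", "content": "Well done, keep going!"}

class CommunicationModule:
    def __init__(self, pedagogy, student_model):
        self.pedagogy, self.student_model = pedagogy, student_model
    def on_student_action(self, action):
        # captures the student's action in the interface and returns the LC's response
        self.student_model.update(action)
        return self.pedagogy.decide(action)

student, domain = StudentModel(), DomainKnowledge()
lc = CommunicationModule(PedagogicalModule(student, domain), student)
print(lc.on_student_action({"topic": "critical path", "outcome": "error"}))
```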
Implementing the LC: At this stage, all the components of the architecture of LC
should be implemented, as well as the character’s animation, if any, according to the
emotional states defined in the behavior of the LC. A good implementation practice is
to construct first a prototype of the LC with the minimum set of functionalities
necessary for the system to run. This prototype is then used to validate the designed
LC’s behavior. At the end of this stage, the artifact produced is the prototype itself.
Evaluating and Refining the LC: The tests will validate the architecture of LC and
point out any necessary adjustments. At the end of this stage, two artifacts should be
produced: (1) a test and evaluation report, with any changes performed; (2) the final
version of the LC.
This paper proposes a novel methodology for the construction of LCs. The methodology allowed a better organization, structuring and shaping of the system, and a common understanding within the development team of the fundamental details for the construction of VICTOR. The benefits of using the methodology could be observed mainly at the implementation stage, where all the requisites had been clearly elicited and modeled at previous stages and the nuances had been perceived. Some risks could be eliminated or mitigated at the beginning of this work, allowing us to cut costs and save time. The definition of a methodology before starting the construction of VICTOR greatly facilitated the achievement of the purposes of this work.
Integrating VICTOR into the PMK environment has enabled us to gather evidence of the greater efficiency of an intelligent learning environment and of the various behaviors of the LC in different situations. This type of system helps to overcome the main difficulties of Distance Learning systems: discouragement and dropout. The LC proposed here aims at meeting the student's needs in a motivational, dynamic and intelligent way. VICTOR's integration with the PMK resulted in an Open Source Software research project4.
In the future, VICTOR will also be able to use the Case-based Teaching Strategy as a pedagogical strategy, presenting real-world PM project scenarios. Another researcher in our group is working on improving VICTOR's interaction through the use of natural language. In the near future, we intend to carry out more comprehensive tests with users aspiring to PMP certification, comparing the performance of those who used the LC with that of those who did not.
References
1. Goodman, B., Soller, A., Linton, F., Gaimari, R.: Encouraging Student Reflection and
Articulation Using a Learning Companion. International Journal of Artificial Intelligence in
Education, Vol. 9 (1998) 237-255
2. Schreiber, A., Akkermans, J., Anjewierden, A., Hoog, R., Shadbolt, N., Velde, W.,
Wielinga, B.: Knowledge Engineering and Management: The CommonKADS
Methodology. MIT Press (2000)
3. Chou, Chih-Yueh, Chan, Tak-Wai, Lin, Chi-Jen: Redefining the Learning Companion: The
Past, Present, and Future of Educational Agents Source. Computers & Education. Elsevier
Science Ltd., Vol. 40, Issue 3, Oxford UK (2003) 255-269
4. Levine, R., Drang, D., Edelson, B.: Inteligência Artificial e Sistemas Especialistas.
McGraw-Hill, São Paulo Brazil (1988)
5. Giraffa, L. M. M.: Uma Arquitetura de Tutor Utilizando Estados Mentais. Doctorate Thesis
in Computer Science. Instituto de Informática/UFRGS, Porto Alegre, Brazil (1999)
6. Self, J.: The Defining Characteristics of Intelligent Tutoring Systems Research: ITSs Care,
Precisely. International Journal of Artificial Intelligence in Education, Vol 10 (1999) 350-
364
4 Project PMBOK-CVA approved by CNPq in November/2003, in the call for the Program of
Research and Technological Development in Open Source Software.
Intelligent Learning Environment for Software
Engineering Processes
Abstract. The great number of software engineering processes and their deep granularity constitute important obstacles to teaching them properly. Teachers generally teach what they master best and focus on adherence to high-level representation formalisms. Consequently, it is up to the learner to go into depth. An alternative to this situation is to build tools that allow learners to become quickly qualified. Existing tools are generally “mono-process” and developer oriented. In this article, we propose a “multi-process” intelligent learning environment that is open to several software engineering processes. This environment facilitates the learning of processes compliant with SPEM (the Software Process Engineering Metamodel).
1 Introduction
The mastery of software engineering processes is more and more important for the success of computer science projects. However, using a software development process is not an obvious task. At least two levels of complexity are identifiable. The first is related to the problem to be solved, and the second is related to the method itself, which presents a large panel of solutions. With the numerous technological orientations, these processes vary, merge or simply disappear, together with the corresponding tools.
Another concern is the number of design approaches. Their great number has highlighted the need for standardization. Based on this, a recommendation of the Object Management Group (OMG) defined a common description language that resulted in the SPEM metamodel [1]. The mastery of their production strategy requires a lot of knowledge and experience that the learner's memory cannot retain without practice.
This work presents an approach of learning by doing, or training through practice. We suggest an open tool facilitating the acquisition of knowledge about several processes. To this end, we have developed a set of intelligent agents that guide the learner through the life cycle of a project, and particularly during the production of artifacts. We focus on the stability and consistency of the artifacts produced, through a verification approach linked to the constraints of the process in use. This research is conducted within the framework of the ICAD-IS project [2].
2 System Modeling
Building training tools is not a novelty. The literature review shows that efforts have also been oriented toward automated solutions aiming to help developers. It should be noted that the existing tools are as numerous as the processes. Every vendor comes with its own approach and a tool that teaches it. Therefore, to master many processes, one would have to acquire each of the corresponding tools. These limitations led to numerous initiatives aiming at process-independent software engineering teaching environments.
Armarego proposes an online training system that allows students to exercise and evaluate their experience in acquiring knowledge of a given domain [3]. The SETLAS (Software Engineering Teaching and Learning System) and Webworlds environments are further experiences that permitted improvements in learners' performance and motivation [4]. However, there are no generic tools for teaching knowledge about several existing processes.
Our system model has been built taking into consideration the ontology of the process and the rules on artifacts. Figure 1 shows the system architecture. Our ontologies are centered on the SPEM concepts and on the artifacts of the process in use. As stated by the model we have built, processes specify realization guides for the different activities and artifacts. They also identify the checkpoints for verifications on artifacts. Depending on the process, the validity of artifacts must respect the implemented rules.
The architecture of our environment is built on four components: the multiagent system (SMA), the training interface (TI), the learner profile (LP), and the knowledge and rule base (KRB). The multiagent system is made up of six agents interacting through a blackboard. They use data from the knowledge base to assess the rules to be applied to a project. The training interface interacts with the agents of the system. All activities concerning the learner are sent to or captured from this interface, which unifies all the elements of the system. The learner profile records the learning activities of the student.
It contains all the management elements concerning the learner and is used by the Tutor-Agent. The knowledge base includes ontologies of tasks (tasks and links between tasks) and ontologies of the domain (concepts and links between concepts) of the process. It constitutes the knowledge of the agents associated with Workflow, Activity and Role.
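As a hedged sketch (checkpoint names and artifact fields are illustrative, not the actual ICAD-IS knowledge and rule base), verifying an artifact against the checkpoints defined by the process in use could look like this:

```python
# Hedged sketch: checking an artifact against the checkpoints of the process in use.
# Rule contents and artifact fields are illustrative, not the ICAD-IS knowledge base.
process_checkpoints = {
    "use-case-model": [
        ("has_actor", lambda art: len(art.get("actors", [])) > 0),
        ("every_use_case_named", lambda art: all(uc.get("name") for uc in art.get("use_cases", []))),
    ],
}

def verify_artifact(kind, artifact):
    """Return the list of violated checkpoints for an artifact of the given kind."""
    return [name for name, rule in process_checkpoints.get(kind, []) if not rule(artifact)]

draft = {"actors": ["Learner"], "use_cases": [{"name": ""}]}
print(verify_artifact("use-case-model", draft))   # -> ['every_use_case_named']
```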
References
1. OMG: Software Process Engineering Metamodel Specification. Object Management Group (OMG) specification (2002)
2. Bevo, V., Nkambou, R., Donfack, H.,: Toward A Tool for Software Development Knowl-
edge Capitalization. In Proceedings of the 2nd international conference on information and
knowledge sharing. ACTA press, Anaheim, (2002) pp. 69-74
3. Ratcliffe, M., Thomas, L., Woodbury, J.: A Learning Environment for First Year Software Engineers. pp. 268-275, 14th Conference on Software Engineering Education and Training, February 19-21, 2001, Charlotte, North Carolina
4. Armarego, J., Fowler, L., Geoffrey, G.: Constructing Software Engineering Knowledge: Development of an Online Learning Environment. pp. 258-267, 14th Conference on Software Engineering Education and Training, February 19-21, 2001, Charlotte, North Carolina
Opportunities for Model-Based Learning Systems in the
Human Exploration of Space
Bill Clancey
Toward Comprehensive Student Models: Modeling
Meta-cognitive Skills and Affective States in ITS
Cristina Conati
University of British Columbia, Canada
[email protected]
Student modeling has played a key role in the success of ITS by allowing computer-
based tutors to dynamically adapt to a student’s knowledge and problem solving
behaviour. In this talk, I will discuss how the scope and effectiveness of ITS can be
further increased by extending the range of features captured in a student model to
include domain independent, meta-cognitive skills and affective states. In particular, I
will illustrate how we are applying this research to improve the effectiveness of
exploratory learning environments and educational games designed to support open
ended, student-led pedagogical interactions.
Having a Genuine Impact on Teaching and Learning –
Today and Tomorrow
Education – especially for those in the primary and secondary grades – is in desperate
need of an upgrade. Children are bored in class; teachers still use 19th century
materials. And, as for the content, well, we still teach children about Mendel’s darn
peas. We are failing to prepare our children to be productive and effective in the 21st
century.
Interactively Building a Knowledge Base for a Virtual
Tutor
Liane Tarouco
This talk will report the lessons learned from several experiments we have performed on the process of building the knowledge base for a virtual tutor that helps remote students and network operators to learn about networking and network management. It will describe how contextualization, through access to cases, animations and network management tools, is implemented, allowing the tutor to become more than a FAQ robot that answers using only static data.
Ontological Engineering and ITS Research
Riichiro Mizoguchi
Ontology has attracted much attention recently, and the Semantic Web (SW) is accelerating this further. In the author's view, however, ontology, as well as ontological engineering, is not well understood. There exist two types of ontology: one is computer-understandable vocabulary for the SW, and the other is something related to deep conceptual structure, closer to philosophical ontology. In this talk, I would like to explain the essentials of ontological engineering, laying much stress on the latter type of ontology.
The talk will be composed of two parts. The first part is rather introductory and
includes: (1) how ontological engineering is different from knowledge engineering,
(2) what is ontology and what is not, (3) what benefits it brings to ITS research, (4)
state of the art of ontology development, etc. The second part is an advanced course
and includes (1) what is an ontology-aware system, (2) knowledge systematization by
ontological engineering, (3) a successful deployment of ontological framework of
functional knowledge of artifacts, etc. To conclude the talk, I will envision the future
of ontological engineering in ITS research.
Agents Serving Human Learning
Stefano A. Cerri
The nature of Intelligent Tutoring Systems research has evolved over the years to become one of the major conceptual as well as practical sources of innovation for the wider area of supporting Human Learning through advances in Informatics. In the invited paper we present our view on the synergic support between Informatics research and the Human Learning context – in the tradition started by Alan Kay with Smalltalk and the Dynabook more than 30 years ago – down to the most concrete plans and results around Agents, the GRID, and Human Learning as a result of conversations between Human and Artificial Agents. The paper will be organised around three questions: what?, why?, how?.
What: Current research priorities in the domain: the historical shift from a product
oriented view of the Web to a service oriented view of the Semantic Grid, with its
potential implications for Agent’s research and Intelligent Tutoring, and its
consequent methodological proposal for a different life cycle in service research
embodied in the concept of Scenarios for Service Elicitation, Exploitation and
Evaluation (SEES).
Why: Current motivation for research on service oriented models, experiments, tools, applications and finally theories: the impressive emerging growth of demand for technologies supporting Human Learning as well as human ubiquitous bidirectional
access to Information and collaboration among Virtual Communities, with examples
ranging from empowerment of human Communities for their durable development
(the Virtual Institute for Alphabetisation for Development), to communities of top
scientists remotely collaborating for an Encyclopedia of Organic Chemistry, to
Continuing Education and dynamic qualification of learning services as well as their
concrete effects on human learners - the three being ongoing subprojects of ELEGI, a
long term EU project recently started - finally to the necessary integration scenario of
digital Information and biological Information supporting human collaboration and
Learning in the two most promising areas of competence for the years to come.
How: Our research approach and results for the integration of the above-described themes, consisting of a model – STROBE –, a set of prototypical experiments, an emerging architecture for integrating the components of the solution, and the expected results both within ELEGI and independently of it.
Panels
Workshop on Modeling Human Teaching Tactics and
Strategies
The purpose of this workshop is to explore the issues concerned with capturing
human teaching tactics and strategies as well as attempts to model and evaluate those
and other tactics and strategies in Intelligent Tutoring Systems (ITSs) and Intelligent
Learning Environments (ILEs). The former topic covers studies of both expert and “ordinary” teachers. The latter includes issues of modeling motivation, timing, conversation, and learning, as well as simple knowledge traversal.
One of the promises of ITSs and ILEs is that they will teach and assist learning in an intelligent manner. While ITSs have historically concentrated on representing the domain knowledge and skills to be learned and on modeling the student’s knowledge in order to guide instructional actions, reflecting a more teacher-centred view of AI in Education, ILEs have explored a more learner-centred perspective in which the system plays a facilitatory role, providing appropriate situations and conditions that can lead learners to experience their own knowledge construction processes. One of the aims of this workshop is to explore the implications of this change in perspective for the issue of modeling human teaching tactics and strategies.
The issue of teaching expertise has been central to AI in Education since the start.
What the system should say or do, when to say or do it, how best to present its action
or express its comment have always been questions at the heart of the enterprise. Note
that this is intended to be a broad notion of teaching that includes issues of help
provision, choice of activity, provision of support and feedback, introduction and
fading of scaffolding, taking charge or relinquishing control to the learner(s) and so
on.
The workshop’s theme is modeling teaching tactics and strategies addressing the
following issues:
Workshop on Analyzing Student-Tutor Interaction Logs
to Improve Educational Outcomes
The goal of this workshop is to better understand how and what we can learn from
data recorded when students interact with educational software. Several researchers
have been working in these areas, largely independently of one another. The
time is ripe to exchange information about what we’ve learned.
1. Learn about existing techniques and tools for storing and analyzing data.
Although there are many efforts in the ITS community to record and analyze tutorial
logs, there is little agreement on good approaches for storing and analyzing such data.
Our goal is to create a list of “best practices” that others in the community can use,
and to create a list of existing software that is helpful for analyzing such data.
2. Discover new possibilities in what we can learn from log files. Currently,
researchers are frequently faced with a large quantity of data but are uncertain about
what they can learn. Looking at the data in the proper way can uncover a variety of
information ranging from student motivation to the efficacy of tutorial actions.
4. Create sharable resources. Currently the only way to test a theory about how
students interact with educational software, or a theory about how to model such data,
is to construct the software, gather a large number of students, and collect their
interaction data.
Workshop on Grid Learning Services
The historical domain of ITS is currently confronted with a double challenge. On the one hand, the worldwide availability of the Internet and globalisation have tremendously amplified the demand for distance learning (tutoring, training, bidirectional access to Information, ubiquitous and lifelong education, learning as a side effect of interaction). On the other, technologies, together with their corresponding computational theories, models, tools, and applications, evolve at unprecedented speed. One of the most important current evolutions in networking is GRID computing. Not only does the concept promise that the availability of important computing resources will be significantly enhanced by GRID services, but it also identifies an even more crucial roadmap for fundamental research in Computing around the notion of Semantic GRID services, as opposed or complementary to the traditional notion of Web-accessible products and, more recently, Web services. We do not discuss the two alternative viewpoints here; we simply anticipate their co-existence, the scientific debate about them, and this workshop’s choice of the Grid Service approach.
Assuming a service view of e-Learning, adaptation to the service user – be it a human, a community of humans, or a set of artificial Agents operating on the GRID – entails the dynamic construction of models of the service user by the service provider. Services need to be adapted to users; they therefore have to compose their configuration according to their understanding of the user. When the user is a learner – as is the case in e-Learning – the corresponding formal model has to learn during its life cycle. Machine learning necessarily meets human learning, in the sense that it becomes a necessary precondition for composing adaptive services for human needs.
The workshop addresses the issues of integrating human and machine learning into models, systems, and applications, and abstracting them into theories for advanced Human Learning, based on the dynamic generation of GRID services.
Workshop on Distance Learning Environments for
Digital Graphic Representation
New technologies open up new directions in architecture and related areas, not only in terms of the kinds of objects they produce, but also in redefining the role of architects and designers in society. Recently, cyberspace, or the virtual world, a global networked environment supported by Information and Communication Technologies (ICT), has become a field of study and work for Architects and Designers, offering an excellent approach to building virtual environments and using them for educational purposes.
Workshop on Applications of Semantic Web
Technologies for E-learning
SW-EL’04 will focus on issues related to using concepts, ontologies and semantic
web technologies to build e-learning applications. It follows the successful workshop
on Concepts and Ontologies in Web-based Educational Systems, held in conjunction
with ICCE’2002 in Auckland, New Zealand. Due to the great interest, the 2004
edition of the workshop will be organized in three sessions held at three different
conferences. The aim is to discuss the current problems in e-learning from different
perspectives, including those of web-based intelligent tutoring systems and adaptive
hypermedia courseware, and the implications of applying semantic web standards and
technologies for solving them.
Workshop on Social and Emotional Intelligence in
Learning Environments
It has long been recognised in education that teaching and learning are highly social and emotional activities. Students’ cognitive progress depends on psychological predispositions such as their interest, confidence, and sense of progress and achievement, as well as on social interactions with the teachers and peers who provide them (or not) with both cognitive and emotional support. Until recently, the ability to recognise students’ socio-affective needs fell exclusively within the realm of human tutors’ social competence. In recent years, however, with the development of more sophisticated computer-aided learning environments, the need for those environments to take into account the student’s affective states and traits, and to place them within the context of the social activity of learning, has become an important issue in building intelligent and effective learning environments. More recently, the notion of emotional intelligence has attracted increasing attention as one of the prerequisites for tutors to improve students’ learning.
Workshop on Dialog-Based Intelligent Tutoring Systems:
State of the Art and New Research Directions
Workshop on Designing Computational Models of
Collaborative Learning Interaction
Author Index