
6, NOVEMBER 2010 601

Educational Data Mining: A Review of the State of the Art
Cristóbal Romero, Member, IEEE, and Sebastián Ventura, Senior Member, IEEE

Abstract—Educational data mining (EDM) is an emerging interdisciplinary research area that deals with the development of methods to explore data originating in an educational context. EDM uses computational approaches to analyze educational data in order to study educational questions. This paper surveys the most relevant studies carried out in this field to date. First, it introduces EDM and describes the different groups of users, types of educational environments, and the data they provide. It then goes on to list the most typical/common tasks in the educational environment that have been resolved through data-mining techniques, and finally, some of the most promising future lines of research are discussed.

Index Terms—Data mining (DM), educational data mining (EDM), educational systems, knowledge discovery.

I. INTRODUCTION

EDUCATIONAL data mining (EDM) is a field that exploits statistical, machine-learning, and data-mining (DM) algorithms over the different types of educational data. Its main objective is to analyze these types of data in order to resolve educational research issues [27]. EDM is concerned with developing methods to explore the unique types of data in educational settings and, using these methods, to better understand students and the settings in which they learn [21]. On one hand, the increase in both instrumental educational software and state databases of student information has created large repositories of data reflecting how students learn [143]. On the other hand, the use of the Internet in education has created a new context known as e-learning or web-based education in which large amounts of information about teaching–learning interaction are endlessly generated and ubiquitously available [60]. All this information provides a gold mine of educational data [186]. EDM seeks to use these data repositories to better understand learners and learning, and to develop computational approaches that combine data and theory to transform practice to benefit learners. EDM has emerged as a research area in recent years for researchers all over the world from different and related research areas, which are as follows.

1) Offline education tries to transmit knowledge and skills based on face-to-face contact and also studies psychologically how humans learn. Psychometrics and statistical techniques have been applied to data, like students’ behavior/performance, curriculum, etc., that were gathered in classroom environments.
2) E-learning and learning management system (LMS). E-learning provides online instruction, and LMS also provides communication, collaboration, administration, and reporting tools. Web mining (WM) techniques have been applied to students’ data stored by these systems in log files and databases.
3) Intelligent tutoring system (ITS) and adaptive educational hypermedia system (AEHS) are an alternative to the just-put-it-on-the-web approach, trying to adapt teaching to the needs of each particular student. DM has been applied to data picked up by these systems, such as log files, user models, etc.

The EDM process converts raw data coming from educational systems into useful information that could potentially have a great impact on educational research and practice. This process does not differ much from other application areas of DM, like business, genetics, medicine, etc., because it follows the same steps as the general DM process [219]: preprocessing, DM, and postprocessing. However, it is important to note that in this paper, the term DM is used in a larger sense than the original/traditional DM definition, i.e., we are going to describe not only EDM studies that use typical DM techniques, such as classification, clustering, association-rule mining, sequential mining, text mining, etc., but also other approaches, such as regression, correlation, visualization, etc., which are not considered to be DM in a strict sense. Furthermore, some methodological innovations and trends in EDM, such as discovery with models and the integration of psychometric modeling frameworks, are unusual DM categories or are not necessarily seen universally as being DM [20].

From a practical point of view, EDM makes it possible, for example, to discover new knowledge based on students’ usage data in order to help to validate/evaluate educational systems, to potentially improve some aspects of the quality of education, and to lay the
[...] some important issues that differentiate the application of DM, specifically to education, from how it is applied in other domains:
1) Objective: The objective of DM in each application area is
different. For example, in EDM, there are both applied re-
search objectives, such as improving the learning process
and guiding students’ learning, as well as pure research
objectives, such as achieving a deeper understanding of
educational phenomena. These goals are sometimes diffi-
cult to quantify and require their own special set of mea-
surement techniques.
2) Data: In educational environments, there are many dif-
ferent types of data available for mining. These data are
specific to the educational area, and therefore have intrin-
sic semantic information, relationships with other data,
and multiple levels of meaningful hierarchy. Some ex-
amples are the domain model, used in ITS and AEHS,
which represents the relationships among the concepts of
a specific subject in a graph or hierarchy format (e.g., a
course consists of several chapters that are organized in
lessons and each lesson includes several concepts); and the
Q-matrix that shows relationships between items/
questions of a test/quiz system and the concepts evaluated
by the test. Furthermore, it is also necessary to take
pedagogical aspects of the learner and the system into account.
3) Techniques: Educational data and problems have some
special characteristics that require the issue of mining to be
treated in a different way. Although most of the traditional
DM techniques can be applied directly, others cannot and
have to be adapted to the specific educational problem at
hand. Furthermore, specific DM techniques can be used
for specific educational problems.
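
To make the Q-matrix mentioned under item 2) more concrete, the following is a minimal sketch, assuming a binary item-by-concept matrix. The concept names, the four quiz items, the student responses, and the helper concept_success_rates are all hypothetical and do not come from any of the surveyed studies; the per-concept proportion of correct answers is only one simple way such a matrix might be used.

```python
# Hypothetical example data: 4 quiz items, 3 concepts (not from the survey).
CONCEPTS = ["fractions", "decimals", "percentages"]

# Q-matrix: one row per item, one column per concept.
# Q[i][k] == 1 means item i evaluates concept k.
Q = [
    [1, 0, 0],  # item 0: fractions
    [1, 1, 0],  # item 1: fractions and decimals
    [0, 1, 1],  # item 2: decimals and percentages
    [0, 0, 1],  # item 3: percentages
]


def concept_success_rates(q_matrix, responses):
    """Return, per concept, the fraction of related items answered correctly.

    responses[i] is 1 if the student answered item i correctly, else 0.
    A concept that no item evaluates gets None instead of a rate.
    """
    n_concepts = len(q_matrix[0])
    rates = []
    for k in range(n_concepts):
        # Items whose Q-matrix row marks concept k as evaluated.
        related = [i for i, row in enumerate(q_matrix) if row[k] == 1]
        if not related:
            rates.append(None)
            continue
        correct = sum(responses[i] for i in related)
        rates.append(correct / len(related))
    return rates


student_responses = [1, 0, 1, 1]  # hypothetical answers of one student
rates = concept_success_rates(Q, student_responses)
for name, rate in zip(CONCEPTS, rates):
    print(name, rate)
```

Keeping the matrix as plain nested lists keeps the sketch free of dependencies; a real system would more likely store it alongside the domain model and weight items rather than count them equally.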
EDM involves different groups of users or participants. Different groups look at educational information from different angles, according to their own mission, vision, and objectives for using DM [104]. For example, knowledge discovered by EDM algorithms can be used not only to help teachers to manage their classes, understand their students’ learning processes, and reflect on their own teaching methods, but also to support a learner’s reflections on the situation and provide feedback to learners [177]. Although an initial consideration seems to involve only two main groups, the learners and the instructors, there are actually more groups involved with many more objectives, as can be seen in Table I.

Nowadays, there is a great variety of educational systems/environments such as: the traditional classroom, e-learning, LMS, adaptive hypermedia (AH) educational systems, ITS, tests/quizzes, texts/contents, and others such as: learning object (LO) repositories, concept maps, social networks, forums, educational game environments, virtual environments, ubiquitous computing environments, etc. All data provided by each of the aforementioned educational environments are different, thus enabling different problems and tasks to be resolved using DM techniques (see Section II). Table II shows a list of the most important studies on EDM grouped according to the type of data/environment involved.

On the other hand, the International Working Group in EDM (http://www.educationaldatamining.org) has achieved the establishment of an annual International Conference on EDM in 2008, EDM’08 [19], EDM’09 [27], and EDM’10 [22]. This conference has evolved from previous EDM workshops at the AIED’07 [112], the EC-TEL’07 [222], the ICALT’07 [35], the UM’07 [17], the AAII’06 [34], the ITS’06 [111], the AAAI’05 [33], the AIED’05 [62], the ITS’04 [32], and the ITS’00 [30] conferences.

The number of publications about EDM has grown exponentially in the past few years (see Fig. 1). A clear sign of this tendency is the appearance of the peer-reviewed Journal of Educational Data Mining (JEDM) and two specific books on EDM edited by Romero and Ventura entitled: Data Mining in E-learning [220] and The Handbook of Educational Data Mining [228] co-edited by Baker and Pechenizkiy. There were also two surveys carried out previously about EDM. The first one [221] is a former review of Romero and Ventura with 81 references until 2005 in which papers were classified by the DM techniques used. In fact, this survey is an improved, updated, and much extended version of this previous one with 306 references in which papers are classified by educational categories/tasks and the types of data used. It also shows some examples of new categories that have appeared since the 2005 survey, such as social network analysis and constructing courseware. The other survey [20] is a recent review by Baker and Yacef with 46 references encompassing up to 2009. This survey uses mainly the top eight most cited papers in the first 2005 review and the Proceedings of the EDM’08 and the EDM’09 conferences; it also groups papers according to EDM methods and applications, as we describe in Section II.

Finally, it is important to highlight that most of the pioneer and older research (from 1993 to 1999) deals with predicting student’s performance (see Task D in Section II). In fact, there is a huge body of studies on this topic in educational journals and conferences; and although seminal works date back to decades ago, new developments are highly relevant.

This survey is organized as follows. Section II lists the most common tasks in education that have been resolved by using DM techniques. Section III describes some of the most prominent future research lines. Finally, conclusions are outlined in Section IV.

Fig. 1. Number of published papers until 2009 grouped according to the year. Note that we have counted only 300 papers in our reference section and not the total number of papers that were really published about EDM.

Fig. 2. Number of published papers until 2009 grouped by task/category. Note that we have counted only 300 papers in our reference section and not the total number of papers actually published about EDM.

II. EDUCATIONAL TASKS AND DM TECHNIQUES

There are many applications or tasks in educational environments that have been resolved through DM. For example, Baker [20], [21] suggests four key areas of application for EDM: improving student models, improving domain models, studying the pedagogical support provided by learning software, and scientific research into learning and learners; and five approaches/methods: prediction, clustering, relationship mining, distillation of data for human judgment, and discovery with models. Castro et al. [60] suggest the following EDM subjects/tasks: applications dealing with the assessment of the student’s learning performance, applications that provide course adaptation and learning recommendations based on the student’s learning behavior, approaches dealing with the evaluation of learning material and educational web-based courses, applications that involve feedback to both teacher and students in e-learning courses, and developments for detection of atypical students’ learning behaviors. However, as we think that there are even more possible applications, we have established our own categories (see Fig. 2) for the main educational tasks that have employed DM techniques. These categories come from

different research communities (as we have previously described in Section I), and they also use different DM tasks and techniques. On one hand, we can see in Table II that the most active communities are e-learning/LMS and ITS/AEHS. On the other hand, we will see in the following sections that the most commonly applied DM tasks are regression, clustering, classification, and association-rule mining; and the most used DM techniques/methods are decision trees, neural networks, and Bayesian networks.

As we can see in Fig. 2, the categories or research lines that have the most papers published are the first eight ones (from A to G with 23 or more references each), and the categories that have the fewest papers published are the last four (from H to K with fewer than 15 references). We think that this may be mainly due to the fact that the first eight categories are older than the last four (and so more authors have worked on these tasks), but it could also be because of the special interest in each one. For example, although social network analysis is one of the newest tasks, it has more papers than the other three. We also want to point out that we have organized these categories by grouping them near the most closely related ones, which in our opinion are the following: tasks A and B provide information to instructors and C to the students; D, E, F, and G tasks reveal students’ characteristics; H and I study graphs and relationships between students and concepts, respectively; and J and K help in creating/planning courseware and the course, respectively.

Next, we are going to describe in detail these tasks/categories and the most relevant studies. But, as there are closely related areas, some references could be located in a different category or in several.

A. Analysis and Visualization of Data

The objective of the analysis and visualization of data is to highlight useful information and support decision making. In the educational environment, for example, it can help educators and course administrators to analyze the students’ course activities and usage information to get a general view of a student’s learning. Statistics and visualization information are the two main techniques that have been most widely used for this task.

Statistics is a mathematical science concerning the collection, analysis, interpretation or explanation, and presentation of data [86]. It is relatively easy to get basic descriptive statistics from statistical software, such as SPSS. Used with educational data, this descriptive analysis can provide such global data characteristics as summaries and reports about learner’s behavior [282]. It is not surprising that teachers prefer pedagogically oriented statistics (overall success rate, mastery levels, typical misconceptions, percentage of exercises tackled, and material read) that are easy to interpret [301]. On the other hand, teachers find the fine-grained statistics in log data too cumbersome to inspect or too time-consuming to interpret. Statistical analysis of educational data (log files/databases) can tell us things such as: where students enter and exit, the most popular pages, the browsers students tend to use, and patterns of use over time [130]; the number of visits, origin of visitors, number of hits, and patterns of use throughout various time periods [95]; number of visits and duration per quarter, top search terms, and number of downloads of e-learning resources [99]; number of different pages browsed and total time for browsing different pages [127]; usage summaries and reports on weekly and monthly user trends and activities [183]; session statistics and session patterns [199]; statistical indicators on the learner’s interactions in forums [5]; the amount of material students might go through and the order in which students study topics [212]; resources used by students and resources valued by students [241]; the overall averages of contributions to discussion forums, the amount of posting versus replies, and the amount of learner-to-learner interaction versus learner-to-teacher interaction [110]; the time a student dedicates to the course or a particular part of it [199]; the learners’ behavior and time distribution and the distribution of network traffic over time [303]; and the frequency of studying events, patterns of studying activity, timing and sequencing of events, and the content analysis of students’ notes and summaries [103]. Statistical analysis is also very useful to obtain reports assessing [81] how many minutes the student has worked, how many minutes he has worked today, how many problems he has resolved, and his correct percentage, our prediction of his score, and his performance level.

Information visualization uses graphic techniques to help people to understand and analyze data [172]. Visual representations and interaction techniques take advantage of the human eye’s broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. There are several studies oriented toward visualizing different educational data such as: patterns of annual, seasonal, daily, and hourly user behavior on online forums [40]; the complete educational (assessment) process [205]; mean values of attributes analyzed in data to measure mathematical skills [302]; tutor–student interaction data from an automated reading tutor [185]; statistical graphs about assignments complement, questions admitted, exam score, etc. [242]; student tracking data regarding social, cognitive, and behavioral aspects of students [170]; student’s attendance, access to resources, overview of discussions, and results on assignments and quizzes [171]; weekly information regarding students’ and groups’ activity [135]; student’s progression per question as a transition between the types of questions [38]; fingertip actions in collaborative learning activities [11]; deficiencies in a student’s basic understanding of individual concepts [286] and higher education student-evaluation data [131]; student’s interactions with online learning environments [132]; the students’ online exercise work, including students’ interactions and answers, mistakes, teachers’ comments, etc. [176]; questions and suggestions in an adaptive tutorial [39]; navigational behavior and the performance of the learner [37]; educational trails of Web pages visited and activities done [225]; and the sequence of LOs and educational trails [238].

B. Providing Feedback for Supporting Instructors

The objective is to provide feedback to support course authors/teachers/administrators in decision making (about how to improve students’ learning, organize instructional resources

more efficiently, etc.) and enable them to take appropriate proactive and/or remedial action. It is important to point out that this task is different from data analyzing and visualizing tasks, which only provide basic information directly from data (reports, statistics, etc.). Moreover, providing feedback divulges completely new, hidden, and interesting information found in data. Several DM techniques have been used in this task, although association-rule mining has been the most common. Association-rule mining reveals interesting relationships among variables in large databases and presents them in the form of strong rules, according to the different degrees of interest they might present [296].

There are many studies that apply/compare several DM models that provide feedback. Association rules, clustering, classification, sequential pattern analysis, dependency modeling, and prediction have been used to enhance web-based learning environments to improve the degree to which the educator can evaluate the learning process [292]. Association analysis, clustering analysis, and case-based reasoning have also been used to organize course material and assign homework at different levels of difficulty [243]. Clustering, classification, and association-rule mining have been applied to develop a service to allow the evaluator to gather feedback from the learning progress automatically, and thus, appraise online course effectiveness [232]. Decision trees, Bayesian models, and other prediction techniques have been proposed to address the admission and counseling process in order to assist in improving the quality of education and student’s performance [215]. Several classifier algorithms have been applied to predict whether the teacher will recommend an intervention strategy for motivational profiles [124]. Clustering and association rules have been used in the academic community to potentially improve some qualitative teaching aspects [271].

Association-rule mining has been used to confront the problem of continuous feedback in the educational process [208]; to analyze learning data and to figure out whether students use resources and possibly whether their use has any (positive) impact on marks [178]; to determine the relationship between each learning-behavior pattern so that the teacher can promote collaborative learning behavior on the Web [289]; to find embedded information, which can be provided to teachers to further analyze, refine, or reorganize teaching materials and tests in adaptive learning environments [260]; to optimize the content of the university e-learning portal [214]; to discover interesting associations between student attributes, problem attributes, and solution strategies in order to improve online education systems for both teachers and students [181]; to analyze rule-evaluation measures in order to discover the most interesting rules [267]; to identify interesting and unexpected learning patterns, which in turn may provide decision lines, enabling teachers to more efficiently organize their teaching structure [272]; to provide feedback to the course author about how to improve courseware [219]; to analyze the user’s access log in Moodle to improve e-learning and to support the analysis of trends [28]; to find relationships between students’ LMS access behavior and overall performances in order to understand student’s web-usage patterns [46]; to improve an adaptive course design in order to show recommendations on how to enhance the course structure and contents [268]; to find interesting relationships between attributes, solution strategies adopted by learners, etc., from a web-based mobile learning system [299]; to help the teacher to discover beneficial or detrimental relationships between the use of web-based educational resources and student’s learning [226]; to reveal information about university students’ enrollment [236]; to help organizations to determine the thinking styles of learners and the effectiveness of a website structure [101]; to evaluate educational website design [164]; and to mine open answers in questionnaire data in order to analyze surveys [283].

Other DM techniques have been applied to provide feedback such as: domain-specific interactive DM to find the relationships between log data and student’s behavior in an educational hypermedia system [123]; temporal DM to describe, interpret, and predict student’s behavior, and to evaluate progress in relation to learning outcomes in ITSs [29]; learning decomposition and logistic regression to compare the impact of different educational interventions on learning [84]; timely alerts to detect critical teaching and learning patterns and to help teachers to make sense of what is happening in their classrooms [246]; and usage data analysis to improve the effectiveness of the learning process in e-learning systems [182].

A special type of feedback is when data come specifically from tests, questions, assessments, etc. In this case, the objective is to analyze them in order to improve the questionnaires and to answer questions such as: what items/questions test the same information, and which are of the most use for predicting course/test results, etc. Several DM approaches and techniques (clustering, classification, and association analysis) have been proposed for joint use in the mining of student’s assessment data [204]. A group of DM techniques, i.e., statistic correlation analysis, fuzzy clustering analysis, grey relational analysis, K-means clustering, and fuzzy-association-rule mining, have been applied to support mobile formative assessment in order to help teachers to understand the main factors influencing learner’s performance [55]. Several clustering algorithms (K-means, agglomerative clustering, and spectral clustering) have been applied to extract underlying relationships from a score matrix in order to help instructors to generate a large unit test [248]. Hierarchical clustering has been used for mining multiple-choice assessment data for similarity of the concepts represented by the responses [165]. Common-factor analysis and collaborative filtering have been used to discover the fundamental topics of a course from item-level grades [281]. Association-rule mining has been applied to analyze questionnaire data by discovering rule patterns in questionnaire data [54].

Finally, another special type of feedback involves the use of text data. In this case, the objective of applying text/DM to educational data is to analyze educational contents, to summarize/analyze the learner’s discussion process, etc., in order to provide instructor feedback. Automatic text analysis, content analysis, and text mining have been used to extract and identify the opinions found on Web pages in e-learning systems [247]; to mine free-form spoken responses given to tutor prompts by estimating the probability that a response has of mentioning a given target or set of targets [297]; to facilitate the automatic

coding process of an online discussion forum [158]; for collaborative learning prompted by learners’ comments on discussion boards [264]; to assess asynchronous discussion forums in order to evaluate the progress of a thread discussion [73]; and to identify patterns of interaction and their sequential organization in computer-supported collaborative environments like chats [44].

C. Recommendations for Students

The objective is to be able to make recommendations directly to the students with respect to their personalized activities, links to visit, the next task or problem to be done, etc., and also to be able to adapt learning contents, interfaces, and sequences to each particular student. Several DM techniques have been used for this task, but the most common are association-rule mining, clustering, and sequential pattern mining. Sequence/sequential pattern mining aims to discover the relationships between occurrences of sequential events to find if there exists any specific order in the occurrences [70].

Sequential pattern mining has been developed to personalize recommendations on learning content based on learning style and web-usage habits [298]; to study eye movements (of students reading concept maps) in order to detect when focal actions overlap unrelated actions [192]; for developing personalized learning scenarios in which the learners are assisted by the system based on patterns and preferred learning styles [23]; to identify significant sequences of activity indicative of problems/success in order to assist student teams by early recognition of problems [137]; to generate personalized activities for learners [277]; for personalizing based on itineraries and long-term navigational behavior [184]; to recommend the most appropriate future links for a student to visit in a web-based adaptive educational system [227]; to include the concept of recommended itinerary in Sharable Content Object Reference Model (SCORM) standard by combining teachers’ expertise with learned experience [184]; to select different LOs for different learners based on learner’s profiles and the internal relation of concepts [244]; for personalizing activity trees according to learning portfolios in a SCORM compliant environment [277]; for recommending lessons (LOs or concepts) that a student should study next while using an AH system [148]; to discover LO relationship patterns to recommend related LOs to learners [198]; and for adapting learning resource sequencing [136].

Association-rule mining has been used to recommend online learning activities or shortcuts on a course website [293]; to produce recommendations for learning material in e-learning systems [166]; for content recommendation based on educationally contextualized browsing events for web-based personalized learning [274]; for recommending relevant discussions to the students [2]; to provide students with personalized learning suggestions by analyzing their test results and test-related concepts [57]; for making recommendations to courseware authors about how to improve adaptive courses [92]; for building a personalized e-learning material-recommender system to help students to find learning materials [160]; for course recommendation with respect to optimal elective courses [253]; and for designing a material recommendation system based on the learning actions of previous learners [159].

Clustering has been developed to establish a recommendation model for students in similar situations in the future [276]; for grouping Web documents using clustering methods in order to personalize e-learning based on maximal frequent item sets [251]; for providing personalized course material recommendations based on learner’s ability [161]; and to recommend to students those resources they have not yet visited, but would find most helpful [96].

Other DM techniques used are: neural networks and decision trees to provide adaptive and personalized learning support [100]; production rules to help students to make decisions about their academic itineraries [269]; decision tree analysis to recommend optimal learning sequences to facilitate the students’ learning process and maximize their learning outcome [279]; learning factor transfers and Q-matrixes to generate domain models that will sequence item types to maximize learning [203]; an item-order effect model to suggest the most effective item sequences to facilitate learning [202]; a fuzzy item-response theory to recommend appropriate courseware for learners [50]; intelligent agent technology and SCORM-based course objects to build an agent-based recommender system for lesson plan sequencing in web-based learning [284]; DM and text mining to recommend books related to the books that the target pupil has consulted [189]; case-based reasoning to offer contextual help to learners, providing them with an adapted link structure for the course [114]; Markov decision process to automatically generate adaptive hints in ITS (to identify the action that will lead to the next state with the highest value) [249]; and an extended serial blog article composition particle swarm optimization (SBACPSO) algorithm to provide optimal recommended materials to users in blog-assisted learning [122].

D. Predicting Student’s Performance

The objective of prediction is to estimate the unknown value of a variable that describes the student. In education, the values normally predicted are performance, knowledge, score, or mark. This value can be a numerical/continuous value (regression task) or a categorical/discrete value (classification task). Regression analysis finds the relationship between a dependent variable and one or more independent variables [72]. Classification is a procedure in which individual items are placed into groups based on quantitative information regarding one or more characteristics inherent in the items and based on a training set of previously labeled items [75]. Prediction of a student’s performance is one of the oldest and most popular applications of DM in education, and different techniques and models have been applied (neural networks, Bayesian networks, rule-based systems, regression, and correlation analysis).

A comparison of machine-learning methods has been carried out to predict success in a course (either passed or failed) in ITSs [106]. Other comparisons of different DM algorithms are made to classify students (predict final marks) based on Moodle usage data [224]; to predict student’s performance (final grade) based

on features extracted from logged data [180]; and to predict sion algorithm) [61]; for predicting end-of-year accountability
university students’ academic performance [128]. assessment scores (using linear regression) [7]; to predict a stu-
Different types of neural-network models have been used to dent’s test score (using stepwise regression) [79]; and to predict
predict final student grades (using back-propagation and feed- the probability that the student’s next response has of being
forward neural networks) [94]; to predict the number of er- correct (using linear regression) [31].
rors a student will make (using feedforward and backpropa- Finally, correlation analyses have been applied together to
gation) [280]; to predict performance from test scores (using predict web-student performance in online classes [275]; to pre-
backpropagation and counter propagation) [78]; to predict stu- dict a student’s final exam score in online tutoring [207]; and
dents’ marks (pass or fail) from Moodle logs (using radial basis for predicting high school students’ probabilities of success in
functions) [67]; and for predicting the likely performance of university [173].
a candidate being considered for admission into the university
(using multilayer perceptron topology) [196].
Bayesian networks have been used to predict student– E. Student Modeling
applicant performance [102]; to model user knowledge and The objective of student modeling is to develop cognitive
predict student’s performance within a tutoring system [200]; models of human users/students, including a modeling of their
to predict a future graduate’s cumulative grade point average skills and declarative knowledge. DM has been applied to auto-
based on applicant background at the time of admission [117]; matically consider user characteristics (motivation, satisfaction,
to model two different approaches to determine the probability learning styles, affective status, etc.) and learning behavior in
a multiskill question has of being corrected [201] and to predict order to automate the construction of student models [89]. Dif-
future group performance in face-to-face collaborative learn- ferent DM techniques and algorithms have been used for this
ing [250]; to predict end-of-year exam performance through task (mainly, Bayesian networks).
student’s activity with online tutors [12]; and to predict item Several DM algorithms (naı̈ve Bayes, Bayes net, support vec-
response outcome [69]. tor machines, logistic regression, and decision trees) have been
Different types of rule-based systems have been applied to compared to detect student mental models in ITSs [234]. Un-
predict student’s performance (mark prediction) in an e-learning supervised (clustering) and supervised (classification) machine
environment (using fuzzy-association rules) [191]; to predict learning have been proposed to reduce development costs in
learner’s performance based on the learning portfolios compiled building user models and to facilitate transferability in intelli-
(using key-formative assessment rules) [51]; for prediction, gent learning environments [4]. Clustering and classification of
monitoring, and evaluation of student’s academic performance learning variables have been used to measure the online learner’s
(using rule induction) [195]; to predict final grades based on motivation [115].
features extracted from logged data in an education web-based Bayesian networks have been used to make predictions about
system (using genetic algorithm to find association rules) [240]; student’s knowledge, i.e., the probability that student has of
to predict student’s grades in LMSs (using grammar-guided ge- knowing a skill at a given time through cognitive tutors [18]; to
netic programming) [291]; to predict student’s performance and detect students’ learning styles in a web-based education sys-
provide timely lessons in web-based e-learning systems (using tem [91]; to predict whether a student will answer a problem
decision tree) [45]; and to predict online students’ marks (using correctly [134]; to model a student’s changing state of knowl-
an orthogonal search-based rule extraction algorithm) [76]. edge during skill acquisition in ITS [47]; to infer unobservable
Several regression techniques have been used to predict stu- learning variables from students’ help-seeking behavior in a
dents’ marks in an open university (using model trees, neural web-based tutoring system [10]; and for knowledge tracing in
networks, linear regression, locally weighed linear regression, order to verify the impact of self-discipline on students’ knowl-
and support vector machines) [146]; for predicting end-of-year edge and learning [98].
accountability assessment scores (using linear regression pre- Sequential pattern mining has been used to automatically ac-
diction models) [7]; to predict student’s performance from log quire the knowledge to construct student models [9]; to identify
and test scores in web-based instruction (using a multivariable meaningful user characteristics and to update the user model
regression model) [288]; for predicting student’s academic per- to reflect newly gained knowledge [6]; and for predicting stu-
formance (using stepwise linear regression) [97]; for predicting dents’ intermediate mental steps in sequences of actions stored
time to be spent on a learning page (using multiple linear re- by—learning environments based on problem solving [218].
gression) [8]; for identifying variables that could predict success Association-rule algorithms have been applied for personality
in colleges courses (using multiple regression) [167]; for pre- mining based on web-based education models in order to de-
dicting university students’ satisfaction (using regression and duce learners’ personality characteristics [120] and for student
decision trees analysis) [258]; for predicting exam results in modeling in ITSs [168].
distance education courses (using linear regression) [188]; for Other DM techniques and models have also been used for
predicting when a student will get a question correct and asso- student modeling. A logistic regression model has been used
ciation rules to guide a search process to find transfer models to construct transfer models (to accurately predict the level at
to predict a student’s success (using logistic regression) [88]; which a student represents knowledge) [83]. A learning agent
to predict the probability a student has of giving the correct that models student behaviors using linear regression has been
answer to a problem in an ITS (using a robust ridge regres- constructed in order to predict the probability that the student’s

next response has of being correct [31]. Inductive logic program- of irregularities and deviations in the learners’ actions in an in-
ming and a profile extractor system (using numeric algorithms) teractive learning environment [187]; and the J48 decision tree
have been developed to induce student profiles in e-learning sys- algorithm and farthest-first clustering algorithm for predicting,
tems [155]. The Markov decision process has been proposed to understanding, and preventing academic failure (exam failure)
automatically create student models by generating hints for an IT among university students [42].
that learns [26]. Fuzzy techniques have used student models in Different types of clustering also used to carry out this task
web-based learning environments in order to generate advice for are: Kohonen nets to detect students that cheat in online as-
the teachers [144]. A dynamic learning-response model has been sessments [43]; outlier detection to uncover atypical student
developed for inferring, testing, and verifying student’s learn- behavior [265]; an outlier detection method using Bayesian pre-
ing models on an adaptive learning website [125]. Bootstrapping dictive distribution to detect learners’ irregular learning [263];
novice data can create an initial skeletal model of a tutor from a constrained mixture of student t-distribution and generative
log data collected from actual use of the tool by students [174]. topographic mapping to detect atypical student behavior (out-
A collaborative-based DM approach has been developed for di- liers) [59]; and an augmented version of the Levenshtein dis-
agnostic and predictive student modeling purposes in integrated tance algorithm to identify novice errors and error paths [265].
learning environments [151]. Multiple correspondence analysis Finally, other DM techniques and models used for this task
and cross validation by correlation analysis have been applied to are, for example, association-rule mining for selecting weak
identify learning styles in Index of Learning Styles (ILS) ques- students for remedial classes [163], to send warning messages
tionnaires [270]. The Q-matrix method has been used to create to students with unusual learning behavior in an AEHS [133],
concept models that represent relationships between concepts and to construct concept-effect relationships for diagnosing stu-
and questions, and to group student’s test question responses dent’s learning problems [126]; a latent response model to iden-
according to concepts [25]. An algorithm to estimate Dirichlet tify if students are playing with the system (to detect student
priors has been developed to produce model parameters that misuse) in a way that would lead to poor learning [15] and to
provide a more plausible picture of student’s knowledge [213]. automatically detect when a student is off-task in a cognitive
Self-organizing maps and principal component analysis have tutor [16]; Bayesian networks to predict the need for help in an
been applied for predictive and compositional modeling of the interactive learning environment [169]; stepwise regression to
student’s profile [150]. A clustering algorithm (K-means) has detect misplay and look for sources of error in the prediction of
been developed to model student’s behavior with a very small student’s test scores [79]; human reliability analysis to infer the
set of parameters without compromising the behavior of the underlying causes that lead to the production of trainee errors in
system [217]. a virtual environment [74]; and Markov chain analysis to iden-
tify and classify common student errors and technical problems
in order to prevent them from occurring in the future [109].
F. Detecting Undesirable Student Behaviors
The objective of detecting undesirable student behavior is to
discover/detect those students who have some type of problem G. Grouping Students
or unusual behavior such as: erroneous actions, low motiva- The objective is to create groups of students according to
tion, playing games, misuse, cheating, dropping out, academic their customized features, personal characteristics, etc. Then,
failure, etc. Several DM techniques (mainly, classification, and the clusters/groups of students obtained can be used by the
clustering) have been used to reveal these types of students in instructor/developer to build a personalized learning system,
order to provide them with appropriate help in plenty of time. to promote effective group learning, to provide adaptive con-
Several of the classification algorithms that have been used tents, etc. The DM techniques used in this task are classification
to detect problematic student’s behavior are decision tree neu- (supervised learning) and clustering (unsupervised learning).
ral networks, naı̈ve Bayes, instance-based learning, logistic re- Cluster analysis or clustering is the assignment of a set of ob-
gression, and support vector machines for predicting/preventing servations into subsets (called clusters) so that observations in
student drop out [145]; feed-forward neural networks, support the same cluster have some points in common [229].
vector machines, and a probabilistic ensemble simplified fuzzy Different clustering algorithms have been used to group stu-
ARTMAP algorithm to predict dropouts in e-learning courses dents such as: hierarchical agglomerative clustering, K-means
[156]; Bayesian nets, logistic regression, simple logic classifi- and model-based clustering to identify groups of students with
cation, instance-based classification, attribute-selected classifi- similar skill profiles [14]; a clustering algorithm based on large
cation, bagging, classification via regression, and decision trees generalized sequences to find groups of students with similar
for engagement prediction [64]; decision tree, Bayesian classi- learning characteristics based on their traversal path patterns
fiers, logistic models, the rule-based learner, and random forest and the content of each page they have visited [256]; model-
to detect/predict first-year student drop out [66]; paired t-test based clustering to automatically discover useful groups from
for grouping students by common misconceptions (hint-driven LMS data to obtain profiles of student’s behavior [254]; a hier-
learners and failure-driven learners) [287]; C4.5 decision tree archical clustering algorithm for user modeling (learning styles)
algorithm for detecting any potential symptoms of low per- in intelligent e-learning systems in order to group students ac-
formance in e-learning courses [41]; decision trees to identify cording to their individual learning style preferences [294]; dis-
students with little motivation [63]; decision trees for detection criminating features and external profiling features (pass/fail) to

support teachers in collaborative student modeling [90]; an im- vidual attributes or properties. A social network is considered
provement in the matrix-based clustering method for grouping to be a group of people, an organization or social individu-
learners by characteristics in e-learning [295]; a fuzzy cluster- als who are connected by social relationships like friendship,
ing algorithm to find interested groups of learners according to cooperative relations, or informative exchange [87]. Differ-
their personality and learning strategy data collected from an on- ent DM techniques have been used to mine social networks
line course [259]; a hybrid method of clustering and Bayesian in educational environments, but collaborative filtering is the
networks to group students according to their skills [105]; a most common. Collaborative filtering or social filtering is a
K-means clustering algorithm for effectively grouping students method of making automatic predictions (filtering) about the
who demonstrate similar learning portfolios (students’ assign- interests of a user by collecting taste preferences from many
ment scores, exam scores, and online learning records) [51]; users (collaborating) [113]. Collaborative filtering systems can
an expectation–maximization algorithm to form heterogeneous produce personal recommendations by computing the similarity
groups according to student’s skills [188]; a K-means cluster- between students’ preferences; therefore, this task is directly re-
ing algorithm to discover interesting patterns that characterize lated to the previous task of recommendations for students (see
the work of stronger and weaker students [209]; a conditional Section II-F).
subspace clustering algorithm to identify skills that differentiate Collaborative filtering has been used for context-aware LO
students [194]; a two-step cluster analysis to classify how stu- recommendation lists [154]; to make a recommendation for
dents organize personal information spaces (piling, one-folder, a learner about what he/she should learn before taking the
small-folders, and big-folder filing) [108]; hierarchical cluster next step [300]; for developing a personal recommender sys-
analysis to establish the proportion of students who get an ex- tem for learners in lifelong learning networks [71]; to build a
ercise wrong or right [24]; and a genetic clustering algorithm to resource recommendation system based on connecting to sim-
solve the problem of allocating new students (which places new ilar e-learning [285]; for recommending relevant links to the
students into classes so that the gaps between learning levels in active learner [147]; to develop an e-learning recommendation
each class is minimum and the number of students in each class service system [157]; and to find relevant content on the Web,
does not exceed the limit) [304]. personalizing and adapting this content to learners [257].
Several classification algorithms have been applied in order to There are some other DM techniques that have been applied to
group students such as: discriminant analysis, neural networks, analyze social networks. Mining interactive social networks has
random forests, and decision trees for classifying university been proposed for recommending appropriate learning partners
students into three groups (low-risk, medium-risk, and high-risk in a web-based cooperative learning environment [53]. Social
of failing) [252]; classification and regression tree, chi-squared navigation support and various machine-learning methods have
automatic interaction detection, and C4.5 algorithm for the been used in a course recommendation system in order to make
automatic identification of the students’ cognitive styles [153]; relevant course choices based on students’ assessment of course
a classification and regression tree to create a decision tree relevance for their career goals [77]. Social network analysis
model to illustrate a user’s learning behavior, in order to analyze techniques and mining data produced by students involved in
it according to different cognitive style groups [151]; a hidden- communication through forum-like tools have been suggested
Markov-model-based classification approach to characterize to help in revealing aspects of their communication [233]. DM
different types of users through their navigation or content ac- and social networks have been used to analyze the structure and
cess patterns [85]; decision trees for classifying students accord- content of educative online communities [216]. Social network
ing to their accumulated knowledge in e-learning systems [179]; analysis has been proposed to detect patterns of academic col-
C4.5 decision tree algorithm for discovering potential student laboration in order to aid decision makers in organizations to
groups with similar characteristics who will react to a particular take specific actions depending on the patterns [190]. Analy-
strategy [49]; naı̈ve Bayes classifier to classify learning styles sis of social communicative categories has been suggested to
that describe learning behavior and educational content [138]; distinguish between a variety of speech acts (informing belief,
genetic algorithms for grouping students according to their disagreeing with concepts, offering collaborative acts, and in-
profiles in a peer review content [65]; classification trees and sulting) [206]. Visualizing and clustering on discussion forum
multivariate adaptive regression to identify those students who graphs have been applied as social network analysis to measure
tend to take online courses and those who do not [290]; decision the cohesion of small groups in collaborative distance learn-
tree and support vector machine for assessing an activity by ing [231].
more than one lecturer using a pairwise learning model [210];
a classification algorithm for speech act patterns to assess
participants’ roles and identify discussion threads [141]; and K- I. Developing Concept Maps
nearest neighbor (K-NN) classification combined with genetic The objective of constructing concept maps is to help in-
algorithms to identify and classify student learning styles [48]. structors/educators in the automatic process of developing/
constructing concept maps. A concept map is a conceptual graph
that shows relationships between concepts and expresses the hi-
H. Social Network Analysis
erarchal structure of knowledge [193]. Some DM techniques
Social networks analysis (SNA), or structural analysis, aims (mainly, association rules, and text mining) have been used to
at studying relationships between individuals, instead of indi- construct concept maps.
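As a rough illustration of how association rules might yield concept-map edges from test records, the following minimal Python sketch keeps a rule A -> B when enough students failed both concepts (support) and most students who failed A also failed B (confidence). It is not the method of any study cited here; the function name concept_effect_rules, the thresholds, and the toy records are all hypothetical.

```python
from itertools import permutations


def concept_effect_rules(failed_by_student, min_support=0.3, min_confidence=0.7):
    """Return (a, b, support, confidence) tuples for rules 'failing a -> failing b'.

    failed_by_student maps a student id to the set of concepts that student
    failed in past tests. Names and thresholds are illustrative assumptions.
    """
    n = len(failed_by_student)
    if n == 0:
        return []
    concepts = sorted(set().union(*failed_by_student.values()))
    rules = []
    for a, b in permutations(concepts, 2):
        # Count students who failed a, and those who failed both a and b.
        fail_a = sum(1 for failed in failed_by_student.values() if a in failed)
        fail_both = sum(
            1 for failed in failed_by_student.values() if a in failed and b in failed
        )
        if fail_a == 0:
            continue
        support = fail_both / n          # share of all students failing both
        confidence = fail_both / fail_a  # share of a-failers who also fail b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical toy data: which concepts each student failed.
records = {
    "s1": {"fractions", "ratios"},
    "s2": {"fractions", "ratios"},
    "s3": {"fractions"},
    "s4": {"ratios", "fractions"},
    "s5": set(),
}
rules = concept_effect_rules(records)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Each retained rule could be drawn as a directed edge between two concepts, weighted by its confidence. A practical system would likely add further filtering and a review step by the instructor before the map is used.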

Association-rule mining has been used to automatically con- and to help to predict interactive properties in the multimedia
struct concept maps guided by learners’ historical testing records presentations [13].
[262]; to discover concept-effect relationships for diagnosing
the learning problems of students [126]; and for conceptual di- K. Planning and Scheduling
agnosis of e-learning through automatically constructed concept
The objective of planning and scheduling is to enhance the tra-
maps that enable teachers to overcome the learning barrier and
ditional educational process by planning future courses, helping
misconceptions of learners [152].
with student course scheduling, planning resource allocation,
Text mining has been applied to automatically construct con-
helping in the admission and counseling processes, developing
cept maps from academic articles in the e-learning domain [52];
curriculum, etc. Different DM techniques have been used for
to formulate concept maps from online discussion boards us-
this task (mainly, association rules).
ing fuzzy ontology [149]; to find relationships between text
Classification, categorization, estimation, and visualization
documents and construct document index graphs [107]; and to
have been compared in higher education for different objectives,
explore cognitive concept-map differences in instructional out-
such as academic planning, predicting alumni pledges, and cre-
comes [119].
ating meaningful learning outcome typologies [162]. Decision
Finally, a specific concept-map algorithm has been created to
trees, link analysis, and decision forests have been used in course
automatically organize knowledge points and map them [243];
planning to analyze enrollees’ course preferences and course
a method of automatic concept relationship discovery for an
completion rates in extension education courses [118]. Classi-
adaptive e-course has been developed to help teachers to author
fication, prediction, association-rule analysis, clustering, etc.,
overall automation [245]; and a multiexpert e-training course
have been compared to discover new explicit knowledge that
design model has been developed by concept-map generation
could be useful in the decision-making process in higher learn-
in order to help the experts to organize their domain knowledge
ing institutions [68]. Educational training courses have been
planned through the use of cluster analysis, decision trees, and
back-propagation neural networks in order to find the correlation
J. Constructing Courseware between the course classifications of educational training [121].
Decision trees and Bayesian models have been proposed to help
The objective of constructing courseware is to help instruc-
management institutes to explore the probable effects of changes
tors and developers to carry out the construction/development
in recruitments, admissions, and courses [215].
process of courseware and learning contents automatically. On
Association-rule mining has been used to provide new, im-
the other hand, it also tries to promote the reuse/exchange of
portant, and therefore, demand-oriented impulses for the de-
existing learning resources among different users and systems.
velopment of new bachelor and master courses [237]. Curricu-
Different DM techniques and models have been used to de-
lum revision has been done by association-rule mining in order
velop courseware. The clustering of students and naı̈ve algo-
to identify and understand whether curriculum revisions can
rithms have been proposed to construct personalized course-
affect students in a university [36]. A decisional tool (based
ware by building a personalized Web tutor tree [255]. Rough set
on association-rule mining) has been constructed to help in
theory and clustering concept hierarchy have been used to con-
making decisions on how to improve the quality of the service
struct e-learning frequently asked questions (FAQ) retrieval in-
provided by the university based on students’ success and fail-
frastructures [56].Multilingual knowledge-discovery technique
ure rates [239]. Association-rule mining and genetic algorithms
processing has been combined with AH techniques to automat-
have been applied to an automatic course-scheduling system to
ically create online information systems from linear texts in
produce the course timetables that best suit student and teacher
electronic format, such as textbooks [3]. Argument mining has
needs [278].
been proposed to support argument construction for agents and
Finally, a regression model has been developed to predict
ITSs using different mining techniques [1].
the likelihood a specific undergraduate applicant has of matric-
Several DM techniques have been applied to reuse learn-
ulation, if admitted [139]; several clustering algorithms (self-
ing resources. Hybrid unsupervised DM techniques have been
organizing map networks, K-means, and Kth-nearest neighbor)
employed to facilitate LO reuse and retrieval from the Web
have been used as a decision support in selecting Associa-
or from different LO repositories [142]. Valuable informa-
tion of Advance Collegiate Schools of Business (AACSB) peer
tion can be found by mining metadata from educational re-
schools [140].
sources (ontology of pedagogical objects), which helps DM
to retrieve more precise information for content reuse and ex-
change [175]. The automatic classification of Web documents III. FUTURE WORK AND RESEARCH LINES
in a hierarchy of concepts based on naı̈ve Bayes has been sug- Although there is a lot of future work to be considered in
gested for the indexing and reuse of learning resources [235]. EDM, we indicate in continuation what arguably are the most
Profile analysis based on collaborative filtering has been used interesting and influential among them. In fact, a few initial
to search LOs and rank search results according to the pre- studies on some of these points have already begun to appear.
dicted level of user interest [197]. Mining educational multi- 1) EDM tools have to be designed to be easier for educators
media presentations has been used to establish explicit relation- or nonexpert users in DM. DM tools are normally designed
ships among the data related to interactivity (links and actions) more for power and flexibility than for simplicity. Most

of the current DM tools are too complex for educators to This shows the need for more effective mining tools that
use and their features go well beyond the scope of what an integrate educational domain knowledge into DM algo-
educator may want to do. For example, on one hand, users rithms. For example, Iksal and Choquet [129] have pro-
have to select the specific DM method/algorithm they want posed specific usage tracking language (UTL) to describe
to apply/use from the wide range of methods/algorithms the track semantics recorded by an LMS and to link them
available on DM. On the other hand, most of the DM al- to the need for observation defined in a predictive sce-
gorithms need to be configured before they are executed. nario. Education-specific mining techniques can greatly
Users have to provide appropriate values for the parame- improve instructional design and pedagogical decisions,
ters in advance in order to obtain good results/models, and and the aim of the semantic Web is to facilitate data man-
therefore, the user must possess a certain amount of exper- agement in educational environments.
tise in order to find the right settings. One possible solution
is the development of wizard tools that use a default al- IV. CONCLUSION
gorithm for each task and parameter-free DM algorithms
to simplify the configuration and execution for nonexpert This paper is a review of the state of the art with respect to
users. EDM tools must also have a more intuitive interface EDM and surveys the most relevant work in this area to date.
that is easy to use and with good visualization facilities to In fact, after first collecting and consulting all the published
make their results meaningful to educators and e-learning bibliography in EDM area, we have selected each author’s most
designers [93]. It is also very important to develop spe- important studies. Then, we have classified each study not only
cific preprocessing tools in order to automate and facilitate by the type of data and DM techniques used, but also and more
all the preprocessing functions or tasks that EDM users importantly, by the type of educational task that they resolve.
currently must do manually. EDM has been introduced as an upcoming research area re-
2) Integration with the e-learning system. The DM tool has to lated to several well-established areas of research, including
be integrated into the e-learning environment as one more e-learning, AH, ITSs, WM, DM, etc. We have seen how fast
traditional authoring tool (course creator, test creator, re- EDM is growing as reflected in the increasing number of con-
port tools, etc.). All DM tasks (preprocessing, DM, and tributions published every year in international conferences and
postprocessing) must be carried out in a single application journals and the number of specific tools specially developed
with a similar interface. In this way, EDM tools will be for applying DM algorithms in educational data/environments.
more widely used by educators, and feedback and results Therefore, it could be said that EDM is now approaching its
obtained with DM techniques could be easily and directly adolescence, i.e., it is no longer in its early days, but is not yet a
applied to the e-learning environment using an iterative evaluation
process [224].
3) Standardization of data and models. Current tools for mining data
pertaining to a specific course/framework may be useful only to their
developers. There are no general-purpose or reusable tools that can be
applied to any educational system. Therefore, standardization of input
data and output models is needed, along with standardization of
preprocessing, discovery, and postprocessing tasks. Shen et al. [243]
proposed using Extensible Markup Language (XML) as a data
specification. Ventura et al. [267] used Predictive Model Markup
Language (PMML), which is the leading standard for statistical and DM
models. However, it is also necessary to incorporate domain knowledge
and semantics using ontology-specification languages, such as Web
Ontology Language (OWL) and Resource Description Framework (RDF), and
standard metadata for e-learning, such as SCORM. Along these lines,
there is currently only one public educational data repository, the
PSLC DataShop [143], which provides many educational datasets and also
facilitates analysis. However, all of these log data come from ITSs;
therefore, more public datasets from other types of educational
environments are needed as well. In this way, specific educational
benchmark datasets could be used to compare/evaluate different DM
algorithms.
4) Traditional mining algorithms need to be tuned to take into account
the educational context. DM techniques must use semantic information
when applied to educational data.

mature area. In fact, we have described some interesting future lines,
but for it to become a more mature area, researchers also need to
develop more unified and collaborative studies instead of the current
plethora of individual proposals and lines. In this way, the full
integration of DM in the educational environment can become a reality,
and fully operative implementations (both commercial and free) could be
made available not only to researchers and developers, but also to
external users.

REFERENCES
[1] S. Abbas and H. Sawamura, “A first step towards argument mining and its use in arguing agents and ITS,” in Proc. Int. Conf. Knowl.-Based Intell. Inf. Eng. Syst., Zagreb, Croatia, 2008, pp. 149–157.
[2] F. Abel, I. I. Bittencourt, N. Henze, D. Krause, and J. Vassileva, “A rule-based recommender system for online discussion forums,” in Proc. Int. Conf. Adaptive Hypermedia Adaptive Web-Based Syst., Hannover, Germany, 2008, pp. 12–21.
[3] E. Alfonseca, P. Rodriguez, and D. Perez, “An approach for automatic generation of adaptive hypermedia in education with multilingual knowledge discovery techniques,” Comput. Educ. J., vol. 49, no. 2, pp. 495–513, 2007.
[4] S. Amershi and C. Conati, “Combining unsupervised and supervised classification to build user models for exploratory learning environments,” J. Educ. Data Mining, vol. 1, no. 1, pp. 18–71, 2009.
[5] A. Anaya and J. Boticario, “A data mining approach to reveal representative collaboration indicators in open collaboration frameworks,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 210–218.
[6] A. Andrejko, M. Barla, M. Bielikova, and M. Tvarozek, “User characteristics acquisition from logs with semantics,” in Proc. Int. Conf. Inf. Syst. Implementation Model., Czech Republic, 2007, pp. 103–110.
[7] N. Anozie and B. W. Junker, “Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring
system,” in Proc. AAAI Workshop Educ. Data Mining, Menlo Park, CA, 2006, pp. 1–6.
[8] A. Arnold, R. Scheines, J. E. Beck, and B. Jerome, “Time and attention: Students, sessions, and tasks,” in Proc. AAAI Workshop Educ. Data Mining, Pittsburgh, PA, 2005, pp. 62–66.
[9] C. Antunes, “Acquiring background knowledge for intelligent tutoring systems,” in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 18–27.
[10] I. Arroyo, T. Murray, and B. P. Woolf, “Inferring unobservable learning variables from students’ help seeking behavior,” in Proc. Int. Conf. Intell. Tutoring Syst., Brazil, 2004, pp. 782–784.
[11] N. Avouris, V. Komis, G. Fiotakis, M. Margaritis, and E. Voyiatzaki, “Why logging of fingertip actions is not enough for analysis of learning activities,” in Proc. AIED Conf. Workshop Usage Anal. Learn. Syst., Amsterdam, The Netherlands, 2005, pp. 1–8.
[12] E. Ayers and B. W. Junker, “Do skills combine additively to predict task difficulty in eighth grade mathematics?” in Proc. AAAI Workshop Educ. Data Mining, Menlo Park, CA: AAAI Press, 2006, pp. 14–20.
[13] M. Bari and B. Lavoie, “Predicting interactive properties by mining educational multimedia presentations,” in Proc. Int. Conf. Inf. Commun. Technol., 2007, pp. 231–234.
[14] E. Ayers, R. Nugent, and N. Dean, “A comparison of student skill knowledge estimates,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 1–10.
[15] R. Baker, A. Corbett, and K. Koedinger, “Detecting student misuse of intelligent tutoring systems,” in Proc. Int. Conf. Intell. Tutoring Syst., Alagoas, Brazil, 2004, pp. 531–540.
[16] R. Baker, “Modeling and understanding students’ off-task behavior in intelligent tutoring systems,” in Proc. Conf. Hum. Factors Comput. Syst., San Jose, CA, 2007, pp. 1059–1068.
[17] R. Baker, J. E. Beck, B. Berendt, A. Kroner, E. Menasalvas, and S. Weibelzahl, “Track on educational data mining,” presented at the 11th Int. Conf. User Model. Workshop Data Mining User Model., Corfu, Greece, 2007.
[18] R. Baker, A. T. Corbett, and V. Aleven, “Improving contextual models of guessing and slipping with a truncated training set,” in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 67–76.
[19] R. Baker, T. Barnes, and J. E. Beck, presented at the 1st Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008.
[20] R. Baker and K. Yacef, “The state of educational data mining in 2009: A review and future visions,” J. Educ. Data Mining, vol. 1, no. 1, pp. 3–17, 2009.
[21] R. Baker, “Data mining for education,” in International Encyclopedia of Education, B. McGaw, P. Peterson, and E. Baker, Eds., 3rd ed. Oxford, U.K.: Elsevier, 2010.
[22] R. Baker, A. Merceron, and P. I. Pavlik, presented at the 3rd Int. Conf. Educ. Data Mining, Pittsburgh, PA, 2010.
[23] H. Ba-Omar, I. Petrounias, and F. Anwar, “A framework for using web usage mining for personalise e-learning,” in Proc. Int. Conf. Adv. Learn. Technol., Niigata, Japan, 2007, pp. 937–938.
[24] D. Barker-Plummer, R. Cox, and R. Dale, “Dimensions of difficulty in translating natural language into first order logic,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 220–228.
[25] T. Barnes, “The q-matrix method: Mining student response data for knowledge,” in Proc. AAAI Workshop Educ. Data Mining, Pittsburgh, PA, 2005, pp. 1–8.
[26] T. Barnes and J. Stamper, “Toward automatic hint generation for logic proof tutoring using historical student data,” in Proc. Int. Conf. Intell. Tutoring Syst., Montreal, QC, Canada, 2008, pp. 373–382.
[27] T. Barnes, M. Desmarais, C. Romero, and S. Ventura, presented at the 2nd Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009.
[28] C. B. Baruque, M. A. Amaral, A. Barcellos, J. C. Da Silva Freitas, and C. J. Longo, “Analysing users’ access logs in Moodle to improve e-learning,” in Proc. Euro Amer. Conf. Telematics Inf. Syst., Faro, Portugal, 2007, pp. 1–4.
[29] C. R. Beal and P. R. Cohen, “Temporal data mining for educational applications,” in Proc. 10th Pacific Rim Int. Conf. Artif. Intell.: Trends Artif. Intell., Hanoi, Vietnam, 2008, pp. 66–77.
[30] J. E. Beck, presented at the 5th Int. Conf. Intell. Tutoring Syst. (ITS) Workshop Applying Mach. Learning to ITS Design/Construction, Montreal, QC, Canada, 2000.
[31] J. E. Beck and B. P. Woolf, “High-level student modeling with machine learning,” in Proc. 5th Int. Conf. Intell. Tutoring Syst., Alagoas, Brazil, 2000, pp. 584–593.
[32] J. E. Beck, R. Baker, A. T. Corbett, J. Kay, D. J. Litman, T. Mitrovic, and S. Ritter, presented at the 7th Int. Conf. Workshop Analyzing Student-Tutor Interaction Logs Improve Educ. Outcomes, Alagoas, Brazil, 2004.
[33] J. E. Beck, presented at the 20th Nat. Conf. Artif. Intell. (AAAI) Workshop Educ. Data Mining, Pittsburgh, PA, 2005.
[34] J. E. Beck, E. Aimeur, and T. Barnes, presented at the 21st Nat. Conf. Artif. Intell. (AAAI) Workshop Educ. Data Mining, Boston, MA, 2006.
[35] J. E. Beck, M. Pechenizkiy, T. Calders, and S. R. Viola, presented at the 7th IEEE Int. Conf. Adv. Learn. Technol. Workshop Educ. Data Mining, Niigata, Japan, 2007.
[36] K. Becker, C. Ghedini, and E. Terra, “Using kdd to analyze the impact of curriculum revisions in a Brazilian university,” in Proc. 11th Int. Conf. Data Eng., Orlando, FL, 2000, pp. 412–419.
[37] A. Bellaachia and E. Vommina, “MINEL: A framework for mining e-learning logs,” in Proc. Fifth IASTED Int. Conf. Web-Based Educ., Mexico, 2006, pp. 259–263.
[38] D. Ben-naim, N. Marcus, and M. Bain, “Visualization and analysis of student interaction in an adaptive exploratory learning environment,” in Proc. Int. Workshop Intell. Support Exploratory Environ. Eur. Conf. Technol. Enhanced Learn., Maastricht, The Netherlands, 2008, pp. 1–10.
[39] D. Ben-naim, M. Bain, and N. Marcus, “A user-driven and data-driven approach for supporting teachers in reflection and adaptation of adaptive tutorials,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 21–30.
[40] L. Burr and D. H. Spennemann, “Pattern of user behavior in university online forums,” Int. J. Instruct. Technol. Distance Learn., vol. 1, no. 10, pp. 11–28, 2004.
[41] J. Bravo and A. Ortigosa, “Detecting symptoms of low performance using production rules,” presented at the Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009.
[42] V. P. Bresfelean, M. Bresfelean, and N. Ghisoiu, “Determining students’ academic failure profile founded on data mining methods,” in Proc. Int. Conf. Inf. Technol. Interfaces, Croatia, 2008, pp. 317–322.
[43] G. Burlak, J. Muñoz, A. Ochoa, and J. A. Hernández, “Detecting cheats in online student assessments using data mining,” in Proc. Int. Conf. Data Mining, Las Vegas, NV, 2006, pp. 204–210.
[44] M. Cakir, F. Xhafa, N. Zhou, and G. Stahl, “Thread-based analysis of patterns of collaborative interaction in chat,” in Proc. Int. Conf. AI Educ., Amsterdam, The Netherlands, 2005, pp. 121–127.
[45] C. C. Chan, “A framework for assessing usage of web-based e-learning systems,” in Proc. Int. Conf. Innovative Comput., Inf. Control, Washington, DC, 2007, pp. 147–151.
[46] F. H. Chanchary, I. Haque, and M. S. Khalid, “Web usage mining to evaluate the transfer of learning in a web-based learning environment,” in Proc. Int. Workshop Knowl. Discov. Data Mining, Washington, DC, 2008, pp. 249–253.
[47] K. M. Chang, J. E. Beck, J. Mostow, and A. Corbett, “A Bayes net toolkit for student modeling in intelligent tutoring systems,” in Proc. Int. Conf. Intell. Tutoring Syst., Jhongli, Taiwan, 2006, pp. 104–113.
[48] Y. C. Chang, W. Y. Kao, C. P. Chu, and C. H. Chiu, “A learning style classification mechanism for e-learning,” Comput. Educ. J., vol. 53, no. 2, pp. 273–285, 2009.
[49] G. Chen, C. Liu, K. Ou, and B. Liu, “Discovering decision knowledge from web log portfolio for managing classroom processes by applying decision tree and data cube technology,” J. Educ. Comput. Res., vol. 23, no. 3, pp. 305–332, 2000.
[50] C. Chen, L. Duh, and C. Liu, “A personalized courseware recommendation system based on fuzzy item response theory,” in Proc. IEEE Int. Conf. E-Technol., E-Commerce E-Service, Washington, DC, 2004, pp. 305–308.
[51] C. Chen, M. Chen, and Y. Li, “Mining key formative assessment rules based on learner portfiles for web-based learning systems,” in Proc. IEEE Int. Conf. Adv. Learn. Technol., Niigata, Japan, 2007, pp. 1–5.
[52] N. S. Chen, Kinshuk, C. W. Wei, and H. J. Chen, “Mining e-learning domain concept map from academic articles,” Comput. Educ. J., vol. 50, pp. 1009–1021, 2008.
[53] C. H. Chen, C. M. Hong, and C. C. Chang, “Mining interactive social network for recommending appropriate learning partners in a Web-based cooperative learning environment,” in Proc. IEEE Conf. Cybern. Intell. Syst., Chengdu, China, 2008, pp. 642–647.
[54] Y. Chen and C. Weng, “Mining fuzzy association rules from questionnaire data,” Knowl.-Based Syst. J., vol. 22, no. 1, pp. 46–56, 2009.
[55] C. Chen and M. Chen, “Mobile formative assessment tool based on data mining techniques for supporting web-based learning,” Comput. Educ. J., vol. 52, no. 1, pp. 256–273, 2009.

[56] D. Y. Chiu, Y. C. Pan, and W. C. Chang, “Using rough set theory to construct e-learning faq retrieval infrastructure,” in Proc. IEEE Ubi-Media Comput. Conf., Lanzhou, China, 2008, pp. 547–552.
[57] H. C. Chu, G. J. Hwang, J. C. R. Tseng, and G. H. Hwang, “A computerized approach to diagnosing student learning problems in health education,” Asian J. Health Inf. Sci., vol. 1, no. 1, pp. 43–60, 2006.
[58] H. C. Chu, G. J. Hwang, P. H. Wu, and J. M. Chen, “A computer-assisted collaborative approach for E-training course design,” in Proc. IEEE Conf. Adv. Learn. Technol., Niigata, Japan, 2007, pp. 36–40.
[59] F. Castro, A. Vellido, A. Nebot, and J. Minguillon, “Detecting atypical student behaviour on an e-learning system,” in Proc. Simposio Nacional de Tecnologías de la Información y las Comunicaciones en la Educación, Granada, Spain, 2005, pp. 153–160.
[60] F. Castro, A. Vellido, A. Nebot, and F. Mugica, “Applying data mining techniques to e-learning problems,” in Evolution of Teaching and Learning Paradigms in Intelligent Environment (Studies in Computational Intelligence), vol. 62, L. C. Jain, R. Tedman, and D. Tedman, Eds. New York: Springer-Verlag, 2007, pp. 183–221.
[61] A. Cetintas, L. Si, Y. P. Xin, and C. Hord, “Predicting correctness of problem solving from low-level log data in intelligent tutoring systems,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 230–238.
[62] C. Choquet, V. Luengo, and K. Yacef, presented at the Artif. Intell. Educ. Conf. (AIED) Workshop Usage Analysis Learning Syst., Amsterdam, The Netherlands, 2005.
[63] M. Cocea and S. Weibelzahl, “Can log files analysis estimate learners’ level of motivation?” in Proc. Workshop week Lernen—Wissensentdeckung—Adaptivität, Hildesheim, Germany, 2006, pp. 32–35.
[64] M. Cocea and S. Weibelzahl, “Cross-system validation of engagement prediction from log files,” in Proc. Int. Conf. Technol. Enhanced Learn., Crete, Greece, 2007, pp. 14–25.
[65] R. M. Crespo, A. Pardo, J. P. Pérez, and C. D. Kloos, “An algorithm for peer review matching using student profiles based on fuzzy classification and genetic algorithms,” in Proc. Int. Conf. Innov. Appl. Artif. Intell., Bari, Italy, 2005, pp. 685–694.
[66] G. W. Dekker, M. Pechenizkiy, and J. M. Vleeshouwers, “Predicting students drop out: A case study,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 41–50.
[67] M. Delgado, E. Gibaja, M. C. Pegalajar, and O. Pérez, “Predicting students’ marks from Moodle logs using neural network models,” in Proc. Int. Conf. Current Dev. Technol.-Assist. Educ., Sevilla, Spain, 2006, pp. 586–590.
[68] N. Delavari, S. Phon-amnuaisuk, and M. Beikzadeh, “Data mining application in higher learning institutions,” Inf. Educ. J., vol. 7, no. 1, pp. 31–54, 2008.
[69] M. C. Desmarais, M. Gagnon, and P. Meshkinfram, “Bayesian student models based on item to item knowledge structures,” in Proc. Conf. Technol. Enhanced Learn., Crete, Greece, 2006, pp. 1–10.
[70] G. Dong and J. Pei, Sequence Data Mining. New York: Springer-Verlag, 2007.
[71] H. Drachsler, H. G. Hummel, and R. Koper, “Personal recommender systems for learners in lifelong learning networks: The requirements, techniques and model,” Int. J. Learn. Technol., vol. 3, no. 4, pp. 404–423, 2008.
[72] N. R. Draper and H. Smith, Applied Regression Analysis. New York: Wiley, 1998.
[73] L. P. Dringus and T. Ellis, “Using data mining as a strategy for assessing asynchronous discussion forums,” Comput. Educ. J., vol. 45, no. 1, pp. 141–160, 2005.
[74] N. El-Kechaï and C. Després, “Proposing the underlying causes that lead to the trainee’s erroneous actions to the trainer,” in Proc. Eur. Conf. Technol. Enhanced Learn., Crete, Greece, 2007, pp. 41–55.
[75] P. Espejo, S. Ventura, and F. Herrera, “A survey on the application of genetic programming to classification,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 2, pp. 121–144, 2010.
[76] T. A. Etchells, A. Nebot, A. Vellido, P. J. G. Lisboa, and F. Mugica, “Learning what is important: Feature selection and rule extraction in a virtual course,” in Proc. Eur. Symp. Artif. Neural Netw., Bruseles, Belgium, 2006, pp. 401–406.
[77] R. Farzan and P. Brusilovsky, “Social navigation support in a course recommendation system,” in Proc. 4th Int. Conf. Adaptive Hypermedia Adaptive Web-based Syst., Dublin, 2006, pp. 91–100.
[78] L. V. Fausett and W. Elwasif, “Predicting performance from test scores using backpropagation and counterpropagation,” in Proc. IEEE World Congr. Comput. Intell., Paris, France, 1994, pp. 3398–3402.
[79] M. Feng, N. Heffernan, and K. Koedinger, “Looking for sources of error in predicting student’s knowledge,” in Proc. AAAI Workshop Educ. Data Mining, 2005, pp. 1–8.
[80] M. Feng and N. T. Heffernan, “Informing teachers live about student learning: Reporting in the assistment system,” in Proc. Conf. Artif. Intell. Educ. Workshop Usage Anal. Learn. Syst., Amsterdam, The Netherlands, 2005, pp. 1–8.
[81] M. Feng and N. Heffernan, “Informing teachers live about student learning: Reporting in the assistment system,” Technol., Instruction, Cognition, Learn. J., vol. 3, pp. 1–8, 2006.
[82] M. Feng, N. T. Heffernan, M. Mani, and C. Heffernan, “Using mixed-effects modeling to compare different grain-sized skill models,” in Proc. Workshop Educ. Data Mining, Menlo Park, CA, 2006, pp. 57–66.
[83] M. Feng and J. Beck, “Back to the future: A non-automated method of constructing transfer models,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 240–248.
[84] M. Feng, J. E. Beck, and N. T. Heffernan, “Using learning decomposition and bootstrapping with randomization to compare the impact of different educational interventions on learning,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 51–60.
[85] A. W. P. Fok, H. S. Wong, and Y. S. Chen, “Hidden Markov model based characterization of content access patterns in an e-learning environment,” in Proc. IEEE Int. Conf. Multimedia Expo., Amsterdam, The Netherlands, 2005, pp. 201–204.
[86] D. Freedman, R. Purves, and R. Pisani, Statistics, 4th ed. New York: Norton, 2007.
[87] L. Freeman, The Development of Social Network Analysis. Vancouver, BC, Canada: Empirical Press, 2006.
[88] J. Freyberger, N. T. Heffernan, and C. Ruiz, “Using association rules to guide a search for best fitting transfer models of student learning,” in Proc. Workshop Analyzing Student-Tutor Interaction Logs Improve Educ. Outcomes, Alagoas, Brazil, 2004, pp. 1–10.
[89] E. Frias-Martinez, S. Chen, and X. Liu, “Survey of data mining approaches to user modeling for adaptive hypermedia,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 6, pp. 734–749.
[90] E. Gaudioso, M. Montero, L. Talavera, and F. Hernandez-Del-Olmo, “Supporting teachers in collaborative student modeling: A framework and an implementation,” Exp. Syst. Appl., vol. 36, pp. 2260–2265, 2009.
[91] P. Garcia, A. Amandi, S. Schiaffino, and M. Campo, “Evaluating Bayesian networks’ precision for detecting student’s learning styles,” Comput. Educ. J., vol. 49, pp. 794–808, 2007.
[92] E. Garcia, C. Romero, S. Ventura, and C. Castro, “An architecture for making recommendations to courseware authors using association rule mining and collaborative filtering,” User Model. User-Adapted Interaction: J. Personalization Res., vol. 19, pp. 99–132, 2009.
[93] E. Garcia, C. Romero, S. Ventura, and C. Castro, “Collaborative data mining tool for education,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 299–306.
[94] T. D. Gedeon and H. S. Turner, “Explaining student grades predicted by a neural network,” in Proc. Int. Conf. Neural Netw., Nagoya, Japan, 1993, pp. 609–612.
[95] J. Gibbs and M. Rice, “Evaluating student behavior on instructional Web sites using web server logs,” in Proc. Ninth Sloan-C Int. Conf. Online Learn., Orlando, FL, 2003, pp. 1–3.
[96] M. Girones and T. A. Fernandez, “Ariadne, a guiding thread in the learning process’s labyrinth,” in Proc. Int. Conf. Current Dev. Technol.-Assist. Educ., Sevilla, Spain, 2006, pp. 287–290.
[97] P. Golding and O. Donalson, “Predicting academic performance,” in Proc. Frontiers Educ. Conf., San Diego, CA, 2006, pp. 21–26.
[98] Y. Gong, D. Rai, J. E. Beck, and N. T. Heffernan, “Does self-discipline impact students’ knowledge and learning?,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 61–70.
[99] H. L. Grob, F. Bensberg, and F. Kaderali, “Controlling open source intermediaries—A web log mining approach,” in Proc. Int. Conf. Inf. Technol. Interfaces, Zagreb, Croatia, 2004, pp. 233–242.
[100] Q. Guo and M. Zhang, “Implement web learning environment based on data mining,” Knowl.-Based Syst. J., vol. 22, pp. 439–442, 2009.
[101] S. Ha, S. Bae, and S. Park, “Web mining for distance education,” in Proc. IEEE Int. Conf. Manage. Innovation Technol., Singapore, 2000, pp. 715–719.
[102] P. Haddawy, N. Thi, and T. N. Hien, “A decision support system for evaluating international student applications,” in Proc. Frontiers Educ. Conf., Milwaukee, WI, 2007, pp. 1–4.
[103] A. F. Hadwin, J. C. Nesbit, D. Jamieson-Noel, J. Code, and P. H. Winne, “Examining trace data to explore self-regulated learning,” Metacognition Learn. J., vol. 2, no. 2/3, pp. 107–124, 2007.

[104] M. Hanna, “Data mining in the e-learning domain,” Campus-Wide Inf. Syst., vol. 21, no. 1, pp. 29–34, 2004.
[105] W. Hämäläinen, J. Suhonen, E. Sutinen, and H. Toivonen, “Data mining in personalizing distance education courses,” in Proc. World Conf. Open Learn. Distance Educ., Hong Kong, 2004, pp. 1–11.
[106] W. Hämäläinen and M. Vinni, “Comparison of machine learning methods for intelligent tutoring systems,” in Proc. Int. Conf. Intell. Tutoring Syst., Taiwan, 2006, pp. 525–534.
[107] K. Hammouda and M. Kamel, “Data Mining in e-learning,” in E-Learning Networked Environments and Architectures: A Knowledge Processing Perspective (Advanced Information and Knowledge Processing), S. Pierre, Ed. Springer, 2006, pp. 1–28.
[108] S. Hardof-Jaffe, A. Hershkovitz, H. Abu-Kishk, O. Bergman, and R. Nachmias, “How do students organize personal information spaces?” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 250–258.
[109] E. Heathcote and S. Prakash, “What your learning management system is telling you about supporting your teachers: Monitoring system information to improve support for teachers using educational technologies at Queensland University of Technology,” in Proc. Int. Conf. Inf. Commun. Technol. Educ., Samos Island, Greece, 2007, pp. 1–6.
[110] E. Heathcote and S. Dawson, “Data mining for evaluation, benchmarking and reflective practice in a LMS,” in Proc. World Conf. E-Learn. Corp., Gov., Healthcare Higher Educ., Vancouver, BC, Canada, 2005, pp. 326–333.
[111] C. Heiner, R. Baker, and K. Yacef, presented at the 8th Int. Conf. Intell. Tutoring Syst. (ITS) Workshop Educ. Data Mining, Jhongli, Taiwan, 2006.
[112] C. Heiner, N. Heffernan, and T. Barnes, presented at the 13th Int. Conf. Artif. Intell. Educ. Workshop Educ. Data Mining, Los Angeles, CA, 2007.
[113] J. Herlocker, J. Konstan, L. G. Terveen, and J. Riedl, “Evaluating collaborative filtering recommender systems,” ACM Trans. Inf. Syst. J., vol. 22, no. 1, pp. 5–53, 2004.
[114] J. M. Heraud, L. France, and A. Mille, “Pixed: An ITS that guides students with the help of learners’ interaction log,” in Proc. Int. Conf. Intell. Tutoring Syst., Maceio, Brazil, 2004, pp. 57–64.
[115] A. Hershkovitz and R. Nachmias, “Developing a log-based motivation measuring tool,” in Proc. 1st Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 226–233.
[116] A. Hershkovitz and R. Nachmias, “Consistency of students’ pace in online learning,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 71–80.
[117] N. T. N. Hien and P. Haddawy, “A decision support system for evaluating international student applications,” in Proc. Frontiers Educ. Conf., Milwaukee, WI, 2007, pp. 1–6.
[118] T. Hsia, A. Shie, and L. Chen, “Course planning of extension education to meet market demand by using data mining techniques—An example of Chinkuo technology university in Taiwan,” Expert Syst. Appl. J., vol. 34, no. 1, pp. 596–602, 2008.
[119] C. Huang, P. Tsai, C. Hsu, and R. Pan, “Exploring cognitive difference in instructional outcomes using text mining technology,” in Proc. IEEE Conf. Syst., Man, Cybern., Taipei, Taiwan, 2006, pp. 2116–2120.
[120] J. Huang, A. Zhu, and Q. Luo, “Personality mining method in web based education system using data mining,” in Proc. IEEE Int. Conf. Grey Syst. Intell. Services, Nanjing, China, 2007, pp. 155–158.
[121] C. Huang, W. Lin, W. Wang, and W. Wang, “Planning of educational training courses by data mining: Using China Motor Corporation as an example,” Expert Syst. Appl. J., vol. 36, no. 3, pp. 7199–7209, 2009.
[122] T. Huang, S. Cheng, and Y. Huang, “A blog article recommendation generating mechanism using an SBACPSO algorithm,” Expert Syst. Appl. J., vol. 36, no. 7, pp. 10388–10396, 2009.
[123] R. Hübscher, S. Puntambekar, and A. Nye, “Domain specific interactive data mining,” in Proc. 11th Int. Conf. User Model. Workshop Data Mining User Model., Corfu, Greece, 2007, pp. 81–90.
[124] T. Hurley and S. Weibelzahl, “Using MotSaRT to support on-line teachers in student motivation,” in Proc. Eur. Conf. Technol. Enhanced Learn., Crete, Greece, 2007, pp. 101–111.
[125] W. Y. Hwang, C. B. Chang, and G. J. Chen, “The relationship of learning traits, motivation and performance-learning response dynamics,” Comput. Educ. J., vol. 42, pp. 267–287, 2004.
[126] G. J. Hwang, “A data mining approach to diagnosing student learning problems in science courses,” J. Distance Educ. Technol., vol. 3, no. 4, pp. 35–50, 2005.
[127] G. J. Hwang, P. S. Tsai, C. C. Tsai, and J. C. R. Tseng, “A novel approach for assisting teachers in analyzing student web-searching behaviors,” Comput. Educ. J., vol. 51, pp. 926–938, 2008.
[128] Z. Ibrahim and D. Rusli, “Predicting students’ academic performance: Comparing artificial neural network, decision tree and linear regression,” in Proc. Annu. SAS Malaysia Forum, Kuala Lumpur, Malaysia, 2007, pp. 1–6.
[129] S. Iksal and C. Choquet, “Usage analysis driven by models in a pedagogical context,” in Proc. 12th Int. Conf. Artif. Intell. Workshop Usage Analysis Learning Syst., Amsterdam, The Netherlands, 2005, pp. 1–8.
[130] A. Ingram, “Using web server logs in evaluating instructional web sites,” J. Educ. Technol. Syst., vol. 28, no. 2, pp. 137–157, 1999.
[131] H. Jin, T. Wu, Z. Liu, and J. Yan, “Application of visual data mining in higher-education evaluation system,” in Proc. Int. Workshop Educ. Technol. Comput. Sci., Washington, DC, 2009, pp. 101–104.
[132] J. Jovanovic, D. Gasevic, C. Brooks, V. Devedzic, and M. Hatala, “LOCO-Analyst: A tool for raising teacher’s awareness in online learning environments,” in Proc. Eur. Conf. Technol. Enhanced Learn., Crete, Greece, 2007, pp. 112–126.
[133] B. S. Jong, T. Y. Chan, and Y. L. Wu, “Learning log explorer in e-learning diagnosis,” IEEE Trans. Educ. J., vol. 50, no. 3, pp. 216–228, Aug. 2007.
[134] A. Jonsson, M. Hasmik, J. Johns, H. Mehranian, I. Arroyo, B. Woolf, A. Barto, D. Fisher, and S. Mahadevan, “Evaluating the feasibility of learning student models from data,” in Proc. AAAI Workshop Educ. Data Mining, Pittsburgh, PA, 2005, pp. 1–6.
[135] A. Juan, T. Daradoumis, J. Faulin, and F. Xhafa, “SAMOS: A model for monitoring students’ and groups’ activities in collaborative e-learning,” Int. J. Learning Technol., vol. 4, no. 1–2, pp. 53–72, 2009.
[136] P. Karampiperis and D. Sampson, “Adaptive learning resources sequencing in educational hypermedia systems,” Educ. Technol. Soc. J., vol. 8, no. 4, pp. 128–147, 2005.
[137] J. Kay, N. Maisonneuve, K. Yacef, and O. R. Zaiane, “Mining patterns of events in students’ teamwork data,” in Proc. Workshop Educ. Data Mining, Taiwan, 2006, pp. 1–8.
[138] D. Kelly and B. Tangney, “First aid for you: Getting to know your learning style using machine learning,” in Proc. IEEE Int. Conf. Adv. Learning Technol., Washington, DC, 2005, pp. 1–3.
[139] S. Khajuria, “A model to predict student matriculation from admissions data,” M.S. thesis, Ind. Manufact. Syst. Eng., Ohio Univ., Athens, OH, 2007.
[140] M. Y. Kiang, D. M. Fisher, J. V. Chen, S. A. Fisher, and R. T. Chi, “The application of SOM as a decision support tool to identify AACSB peer schools,” Decision Support Syst. J., vol. 47, no. 1, pp. 51–59, 2009.
[141] J. Kim, G. Chern, D. Feng, E. Shaw, and E. Hovy, “Mining and assessing discussions on the web through speech act analysis,” in Proc. AAAI Workshop Web Content Mining Human Lang. Technol., Athens, GA, 2006, pp. 1–8.
[142] C. C. Kiu and C. S. Lee, “Learning objects reusability and retrieval through ontological sharing: A hybrid unsupervised data mining approach,” in Proc. IEEE Conf. Adv. Learning Technol., Niigata, Japan, 2007, pp. 548–550.
[143] K. Koedinger, K. Cunningham, A. Skogsholm, and B. Leber, “An open repository and analysis tools for fine-grained, longitudinal learner data,” in Proc. 1st Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 157–166.
[144] E. M. Kosba, V. Dimitrova, and R. Boyle, “Using student and group models to support teachers in web-based distance education,” in Proc. Int. Conf. User Model., Edinburgh, U.K., 2005, pp. 124–133.
[145] S. Kotsiantis, C. Pierrakeas, and P. Pintelas, “Preventing student dropout in distance learning systems using machine learning techniques,” in Proc. Int. Conf. Knowl.-Based Intell. Inf. Eng. Syst., Oxford, U.K., 2003, pp. 3–5.
[146] S. B. Kotsiantis and P. E. Pintelas, “Predicting students’ marks in Hellenic Open University,” in Proc. IEEE Int. Conf. Adv. Learning Technol., Washington, DC, 2005, pp. 664–668.
[147] M. K. Khribi, M. Jemni, and O. Nasraoui, “Automatic recommendations for e-learning personalization based on web usage mining techniques and information retrieval,” in Proc. IEEE Int. Conf. Adv. Learning Technol., Washington, DC, 2008, pp. 241–245.
[148] A. Kristofic, “Recommender system for adaptive hypermedia applications,” in Proc. Stud. Res. Conf. Informat. Inf. Technol., Bratislava, Slovakia, 2005, pp. 229–234.
[149] R. Lau, A. Chung, D. Song, and Q. Huang, “Towards fuzzy domain ontology based concept map generation for e-learning,” in Proc. Int. Conf. Web-Based Learning, Edinburgh, U.K., 2007, pp. 90–101.

[150] C. S. Lee, “Diagnostic, predictive and compositional modeling with data mining in integrated learning environments,” Comput. Educ. J., vol. 49, pp. 562–580, 2007.
[151] M. W. Lee, S. Y. Chen, and X. Liu, “Mining learners’ behavior in accessing web-based interface,” in Proc. Int. Conf. Edutainment, Hong Kong, China, 2007, pp. 226–346.
[152] C. H. Lee, G. Lee, and Y. Leu, “Application of automatically constructed concept map of learning to conceptual diagnosis of e-learning,” Expert Syst. Appl. J., vol. 36, pp. 1675–1684, 2009.
[153] M. W. Lee, S. Y. Chen, K. Chrysostomou, and X. Liu, “Mining student’s behavior in web-based learning programs,” Expert Syst. Appl. J., vol. 36, pp. 3459–3464, 2009.
[154] D. Lemire, H. Boley, S. McGrath, and M. Ball, “Collaborative filtering and inference rules for context-aware learning object recommendation,” Int. J. Interactive Technol. Smart Educ., vol. 2, no. 3, pp. 1–11, 2005.
[155] O. Licchelli, T. M. Basile, D. N. Mauro, F. Esposito, G. Semeraro, and S. Ferilli, “Machine learning approaches for inducing student models,” in Proc. Int. Conf. Innov. Appl. Artif. Intell., Ottawa, Canada, 2004, pp. 935–944.
[156] I. Lykourentzou, I. Giannoukos, V. Nikolopoulos, G. Mpardis, and V. Loumos, “Dropout prediction in e-learning courses through the combination of machine learning techniques,” Comput. Educ. J., vol. 53, no. 3, pp. 950–965, 2009.
[157] X. Li, Q. Luo, and J. Yuan, “Personalized recommendation service system in e-learning using web intelligence,” in Proc. 7th Int. Conf. Comput. Sci., Beijing, China, 2007, pp. 531–538.
[158] F. Lin, L. Hsieh, and F. Chuang, “Discovering genres of online discussion threads via text mining,” Comput. Educ. J., vol. 52, no. 2, pp. 481–495, 2009.
[159] F. Liu and B. Shih, “Learning activity-based e-learning material recommendation system,” in Proc. Int. Symp. Multimedia, Taichung, Taiwan, 2007, pp. 343–348.
[160] J. Lu, “A personalized e-learning material recommender system,” in Proc. Int. Conf. Inf. Technol. Appl., Harbin, China, 2004, pp. 374–379.
[161] F. Lu, X. Li, Q. Liu, Z. Yang, G. Tan, and T. He, “Research on personalized e-learning system using fuzzy set based clustering algorithm,” in Proc. Int. Conf. Comput. Sci., Beijing, China, 2007, pp. 587–590.
[162] J. Luan, “Data mining, knowledge management in higher education, potential applications,” in Proc. Workshop Assoc. Inst. Res. Int. Conf., Toronto, ON, Canada, 2002, pp. 1–18.
[163] Y. Ma, B. Liu, C. Wong, P. S. Yu, and S. M. Lee, “Targeting the right students using data mining,” in Proc. 6th ACM SIGKDD Int. Conf. Knowl.

dent log files,” in Proc. Workshop Analyzing Student-Tutor Interaction Logs Improve Educ. Outcomes, Alagoas, Brazil, 2004, pp. 1–10.
[175] A. Merceron, C. Oliveira, M. Scholl, and C. Ullrich, “Mining for content re-use and exchange - solutions and problems,” in Proc. Int. Conf. Semantic Web Conf., Hiroshima, Japan, 2004, pp. 1–2.
[176] A. Merceron and K. Yacef, “Mining student data captured from a web-based tutoring tool: Initial exploration and results,” J. Interactive Learning Res., vol. 15, no. 4, pp. 319–346, 2004.
[177] A. Merceron and K. Yacef, “Educational data mining: A case study,” in Proc. Int. Conf. Artif. Intell. Educ., Amsterdam, The Netherlands, 2005, pp. 1–8.
[178] A. Merceron and K. Yacef, “Interestingness measures for association rules in educational data,” in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 57–66.
[179] C. Mihaescu and D. Burdescu, “Testing attribute selection algorithms for classification performance on real data,” in Proc. Int. IEEE Conf. Intell. Syst., Varna, Bulgaria, 2006, pp. 581–586.
[180] B. Minaei-bidgoli, D. A. Kashy, G. Kortmeyer, and W. F. Punch, “Predicting student performance: An application of data mining methods with an educational Web-based system,” in Proc. Int. Conf. Frontiers Educ., 2003, pp. 13–18.
[181] B. Minaei-bidgoli, P. Tan, and W. Punch, “Mining interesting contrast rules for a web-based educational system,” in Proc. Int. Conf. Mach. Learning Appl., Los Angeles, CA, 2004, pp. 1–8.
[182] P. B. Myszkowski, H. Kwasnicka, and U. Markowska-Kaczmar, “Data mining techniques in e-learning celgrid system,” in Proc. Int. Conf. Comput. Inf. Syst. Ind. Manage. Appl., Ostrava, The Czech Republic, 2008, pp. 315–319.
[183] D. Monk, “Using data mining for e-learning decision making,” Electron. J. E-Learning, vol. 3, no. 1, pp. 41–54, 2005.
[184] E. Mor and J. Minguillón, “E-learning personalization based on itineraries and long-term navigational behavior,” in Proc. 13th World Wide Web Conf., New York, 2004, pp. 264–265.
[185] J. Mostow, J. Beck, H. Cen, A. Cuneo, E. Gouvea, and C. Heiner, “An educational data mining tool to browse tutor-student interactions: Time will tell!,” in Proc. Workshop Educ. Data Mining, 2005, pp. 15–22.
[186] J. Mostow and J. Beck, “Some useful tactics to modify, map and mine data from intelligent tutors,” J. Nat. Lang. Eng., vol. 12, no. 2, pp. 195–208, 2006.
[187] M. Muehlenbrock, “Automatic action analysis in an interactive learning environment,” in Proc. 12th Int. Conf. Artif. Intell. Educ. Workshop Usage Anal. Learn. Syst., Amsterdam, The Netherlands, 2005, pp. 73–
Discov. Data Mining (KDD), 2000, pp. 457–464. 80.
[164] L. Machado and K. Becker, “Distance education: A web usage mining [188] N. Myller, J. Suhonen, and E. Sutinen, “Using data mining for improving
case study for the evaluation of learning sites,” in Proc. Int. Conf. Adv. web-based course design,” in Proc. Int. Conf. Comput. Educ., Washing-
Learning Technol., Athens, Greece, 2003, pp. 360–361. ton, DC, 2002, pp. 959–964.
[165] T. Madhyastha and E. Hunt, “Mining diagnostic assessment data for [189] R. Nagata, K. Takeda, K. Suda, J. Kakegawa, and K. Morihiro, “Edu-
concept similarity,” J. Educ. Data Mining, vol. 1, no. 1, pp. 72–91, mining for book recommendation for pupils,” in Proc. Int. Conf. Educ.
2009. Data Mining, Cordoba, Spain, 2009, pp. 91–100.
[166] P. Markellou, I. Mousourouli, S. Spiros, and A. Tsakalidis, “Using [190] E. Nankani, S. Simoff, S. Denize, and L. Young, “Supporting strategic
semantic web mining technologies for personalized e-learning expe- decision making in an enterprise university through detecting patterns of
riences,” in Proc. Web-Based Educ., Grindelwald, Switzerland, 2005, academic collaboration,” in Proc. Int. United Inf. Syst. Conf., Sydney,
pp. 461–826. Australia, 2009, pp. 496–507.
[167] D. Martinez, “Predicting student outcomes using discriminant function [191] A. Nebot, F. Castro, A. Vellido, and F. Mugica, “Identification of fuzzy
analysis,” in Proc. Meeting Res. Plann. Group, Lake Arrowhead, CA, models to predict students performance in an e-learning environment,”
2001, pp. 1–22. in Proc. Int. Conf. Web-Based Educ., Puerto Vallarta, Mexico, 2006,
[168] N. Matsuda, W. Cohen, J. Sewall, G. Lacerda, and K. R. Koedinger, “Pre- pp. 74–79.
dicting students performance with SimStudent that learns cognitive skills [192] J. C. Nesbit, Y. Xu, P. H. Winne, and M. Zhou, “Sequential pattern
from observation,” in Proc. Int. Conf. Artif. Intell. Educ., Amsterdam, analysis software for educational event data,” in Proc. Int. Conf. Methods
The Netherlands, 2007, pp. 467–476. Tech. Behav. Res., Netherlands, 2008.
[169] M. Mavrikis, “Data-driven prediction of the necessity of help requests [193] J. D. Novak and A. J. Cañas, “The theory underlying concept maps and
in ILEs,” in Proc. Int. Conf. Adaptive Hypermedia, Hannover, Germany, how to construct and use them,” Inst. Human Mach. Cogn., Pensacola,
2008, pp. 316–319. FL, Tech. Rep. IHMC CmapTools 2006-01, 2006.
[170] R. Mazza and D. Vania, “The design of a course data visualizator: [194] R. Nugent, E. Ayers, and N. Dean, “Conditional subspace clustering of
An empirical study,” in Proc. Int. Conf. New Educ. Environ., Lucerne, skill mastery: Identifying skills that separate students,” in Proc. Int. Conf.
Switzerland, 2003, pp. 215–220. Educ. Data Mining, Cordoba, Spain, 2009, pp. 101–110.
[171] R. Mazza and C. Milani, “GISMO: A graphical interactive student moni- [195] E. N. Ogor, “Student academic performance monitoring and evaluation
toring tool for course management systems,” in Proc. Int. Conf. Technol. using data mining techniques,” in Proc. Electron., Robot. Automotive
Enhanced Learn., Milan, Italy, 2004, pp. 1–8. Mech. Conf., Washington, DC, 2007, pp. 354–359.
[172] R. Mazza, Introduction to Information Visualization. New York: [196] V. O. Oladokun, A. T. Adebanjo, and O. E. Charles-Owaba, “Predicting
Springer-Verlag, 2009. student’s academic performance using artificial neural network: A case
[173] B. Mcdonald, “Predicting student success,” J. Math. Teaching Learning, study of an engineering course,” Proc. Pacific J. Sci. Technol., vol. 9,
vol. 1, pp. 1–14, 2004. no. 1, pp. 72–79, 2008.
[174] B. M. Mclaren, K. R. Koedinger, M. Schneider, A. Harrer, and L. Lollen, [197] T. Orzechowski, S. Ernst, and A. Dziech, “Profiled search methods for
“Bootstrapping novice data: Semi-automated tutor authoring using stu- e-learning systems,” in Proc. Int. Workshop Learning Object Discov.

Exchange Eur. Conf. Technol. Enhanced Learn., Crete, Greece, 2007, [220] C. Romero and S. Ventura, Data Mining in E-Learning. Ashurst,
pp. 1–10. Southampton: WIT Press, 2006.
[198] Y. Ouyang and M. Zhu, “eLORM: Learning object relationship min- [221] C. Romero and S. Ventura, “Educational data mining: A survey from
ing based repository,” in Proc. IEEE Int. Conf. Enterprise Comput., 1995 to 2005,” Expert Syst. Appl., vol. 33, no. 1, pp. 135–146, 2007.
E-Commerce, E-Service, Tokyo, Japan, 2007, pp. 691–698. [222] C. Romero, M. Pechenizkiy, T. Calders, and S. R. Viola, in Proc. Int.
[199] C. Pahl and C. Donnellan, “Data mining technology for the evaluation of Workshop Applying Data Mining E-Learning (ADML): 2nd Eur. Conf.
web-based teaching and learning systems,” in Proc. Congr. E-Learning, Technol. Enhanced Learn. (EC-TEL’07), Crete, Greece, 2007.
Montreal, Canada, 2003, pp. 1–7. [223] C. Romero, S. Ventura, and E. Salcines, “Data mining in course manage-
[200] Z. Pardos, N. Heffernan, B. Anderson, and C. Heffernan, “The effect ment systems: Moodle case study and tutorial,” Comput. Educ., vol. 51,
of model granularity on student performance prediction using bayesian no. 1, pp. 368–384, 2008.
networks,” in Proc. Int. Conf. User Model., Corfu, Greece, 2007, pp. 435– [224] C. Romero, S. Ventura, C. Hervás, and P. Gonzalez, “Data mining al-
439. gorithms to classify students,” in Proc. Int. Conf. Educ. Data Mining,
[201] Z. Pardos, J. E. Beck, C. Ruiz, and N. Heffernan, “The composition Montreal, Canada, 2008, pp. 8–17.
effect: Conjunctive or compensatory? An analysis of multi-skill math [225] C. Romero, S. Gutierrez, M. Freire, and S. Ventura, “Mining and visual-
questions in ITS,” in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, izing visited trails in web-based educational systems,” in Proc. Int. Conf.
Canada, 2008, pp. 147–156. Educ. Data Mining, Montreal, Canada, 2008, pp. 182–185.
[202] Z. Pardos, J. E. Beck, and N. Heffernan, “Determining the significance of [226] C. Romero, P. Gonzalez, S. Ventura, M. J. del Jesus, and F. Herrera,
item order in randomized problem sets,” in Proc. Int. Conf. Educ. Data “Evolutionary algorithms for subgroup discovery in e-learning: A prac-
Mining, Cordoba, Spain, 2009, pp. 111–120. tical application using Moodle data,” Expert Syst. Appl. J., vol. 36,
[203] P. Pavlik, H. Cen, and K. Koedinger, “Learning factors transfer analysis: pp. 1632–1644, 2009.
Using learning curve analysis to automatically generate domain models,” [227] C. Romero, S. Ventura, A. Zafra, and P. De Bra, “Applying web usage min-
in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 121– ing for personalizing hyperlinks in web-based adaptive educational systems,”
130. Comput. Educ., vol. 53, no. 3, pp. 828–840, 2009.
[204] M. Pechenizkiy, T. Calders, E. Vasilyeva, and P. De Bra, “Mining the [228] C. Romero, S. Ventura, M. Pechenizkiy, and R. Baker, Handbook of
student assessment data: Lessons drawn from a small scale case study,” Educational Data Mining. New York: Taylor & Francis, 2010.
in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, [229] H. C. Romesburg, Cluster Analysis for Researchers. Melbourne, FL:
pp. 187–191. Krieger, 2004.
[205] M. Pechenizkiy, N. Trcka, E. Vasilyeva, W. Aalst, and P. De Bra, “Process [230] F. Rosta and P. Brusilovsky, “Social navigation support in a course recom-
mining online assessment data,” in Proc. Int. Conf. Educ. Data Mining, mendation system,” in Proc. Int. Conf. Adaptive Hypermedia Adaptive
Cordoba, Spain, 2009, pp. 279–288. Web-Based Syst., Dublin, Ireland, 2006, pp. 1–10.
[206] D. Prata, R. Baker, E. Costa, C. Rose, and Y. Cui, “Detecting and under- [231] C. Reffay and T. Chanier, “How social network analysis can help to
standing the impact of cognitive and interpersonal conflict in computer measure cohesion in collaborative distance-learning,” in Proc. Int. Conf.
supported collaborative learning environments,” in Proc. Int. Conf. Educ. Comput. Supported Collaborative Learning, Bergen, Norway, 2003,
Data Mining, Cordoba, Spain, 2009, pp. 131–140. pp. 1–6.
[207] D. Pritchard and R. Warnakulasooriya, “Data from a web-based home- [232] S. Retalis, A. Papasalouros, Y. Psaromilogkos, S. Siscos, and T. Kargidis,
work tutor can predict student’s final exam score,” in Proc. World Conf. “Towards networked learning analytics—A concept and a tool,” in Proc.
Educ. Multimedia, Hypermedia Telecommun., Chesapeake, VA, 2005, 5th Int. Conf. Netw. Learning, 2006, pp. 1–8.
pp. 2523–2529. [233] P. Reyes and P. Tchounikine, “Mining learning groups’ activities in
[208] Y. Psaromiligkos, M. Orfanidou, C. Kytagias, and E. Zafiri, “Mining forum-type tools,” in Proc. Conf. Comput. Support Collaborative Learn-
log data for the analysis of learners’ behaviour in web-based learning ing: Learning, Taipei, Taiwan, 2005, pp. 509–513.
management systems,” Oper. Res. J., vol. 11, pp. 1–14, 2009. [234] V. Rus, M. Lintean, and R. Azevedo, “Automatic detection of student
[209] D. Perera, J. Kay, I. Koprinska, K. Yacef, and O. R. Zaïane, “Clus- mental models during prior knowledge activation in MetaTutor,” in Proc.
tering and sequential pattern mining of online collaborative learning Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 161–170.
data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 6, pp. 759–772, Jun. [235] P. S. Saini, D. Sona, S. Veeramachaneni, and M. Ronchetti, “Making
2009. e-learning better through machine learning,” in Proc. Int. Conf. Methods
[210] J. R. Quevedo and E. Montañes, “Obtaining rubric weights for assess- e-learning better through machine learning,” in Proc. Int. Conf. Methods
ments by more than one lecturer using a pairwise learning model,” in [236] A. P. Sanjeev and J. M. Zytkow, “Discovering enrollment knowledge in
Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 289–298. university databases,” in Proc. Int. Conf. Knowl. Discov. Data Mining,
[211] S. N. R. Raghavan, “Data mining in e-commerce: A survey,” Sadhana Montreal, Canada, 1995, pp. 246–251.
J., vol. 30, no. 2/3, pp. 275–289, 2005. [237] K. Schönbrunn and A. Hilbert, “Data mining in higher education,” in
[212] M. Rahkila and M. Karjalainen, “Evaluation of learning in computer Proc. Annu. Conf. Gesellschaft für Klassifikation e.V., Freie Universität
based education using log systems,” in Proc. ASEE/IEEE Front. Educ. Berlin, 2007, pp. 489–496.
Conf., San Juan, Puerto Rico, 1999, pp. 16–21. [238] J. Schoonenboom, K. Heller, M. Neenoy, and M. Levene, Trails in Edu-
[213] D. Rai, Y. Gong, and J. E. Beck, “Using Dirichlet priors to improve cation: Technologies That Support Navigational Learning. Rotterdam,
model parameter plausibility,” in Proc. Int. Conf. Educ. Data Mining, The Netherlands: Sense Publisher, 2007.
Cordoba, Spain, 2009, pp. 141–150. [239] N. Selmoune and Z. Alimazighi, “A decisional tool for quality improve-
[214] A. A. Ramli, “Web usage mining using Apriori algorithm: UUM learning ment in higher education,” in Proc. Int. Conf. Inf. Commun. Technol.,
care portal case,” in Proc. Int. Conf. Knowl. Manage., Malaysia, 2005, Damascus, Syria, 2008, pp. 1–6.
pp. 1–19. [240] D. Shangping and Z. Ping, “A data mining algorithm in distance learn-
[215] J. Ranjan and S. Khalil, “Conceptual framework of data mining process ing,” in Proc. Int. Conf. Comput. Supported Cooperative Work in Design,
in management education in India: An institutional perspective,” Inf. Xian, China, 2008, pp. 1014–1017.
Technol. J., vol. 7, no. 1, pp. 16–23, 2008. [241] J. Sheard, J. Ceddia, J. Hurst, and J. Tuovinen, “Inferring student learning
[216] R. Rallo, M. Gisbert, and J. Salinas, “Using data mining and social behaviour from website interactions: A usage analysis,” J. Educ. Inf.
networks to analyze the structure and content of educative online com- Technol., vol. 8, no. 3, pp. 245–266, 2003.
munities,” in Proc. Int. Conf. Multimedia ICTs Educ., Caceres, Spain, [242] R. Shen, F. Yang, and P. Han, “Data analysis center based on e-learning
2005, pp. 1–10. platform,” in Proc. Workshop Internet Challenge: Technol. Appl., Berlin,
[217] S. Ritter, T. Harris, T. Nixon, D. Dickison, R. Murray, and B. Towle, Germany, 2002, pp. 19–28.
“Reducing the knowledge tracing space,” in Proc. Int. Conf. Educ. Data [243] R. Shen, P. Han, F. Yang, Q. Yang, and J. Huang, “Data mining and
Mining, Cordoba, Spain, 2009, pp. 151–160. case-based reasoning for distance learning,” J. Distance Educ. Technol.,
[218] V. Robinet, G. Bisson, M. Gordon, and B. Lemaire, “Searching for stu- vol. 1, no. 3, pp. 46–58, 2003.
dent intermediate mental steps,” in Proc. Int. Conf. User Model. Work- [244] L. Shen and R. Shen, “Learning content recommendation service based-
shop Data Mining User Model., Corfu, Greece, 2007, pp. 101–105. on simple sequencing specification,” in Proc. Int. Conf. Web-based
[219] C. Romero, S. Ventura, and P. De Bra, “Knowledge discovery with ge- Learning, Beijing, China, 2004, pp. 363–370.
netic programming for providing feedback to courseware author,” User [245] M. Simko and M. Bielikova, “Automatic concept relationships discov-
Model. User-Adapted Interaction: J. Personalization Res., vol. 14, no. 5, ery for an adaptive e-course,” in Proc. Int. Conf. Educ. Data Mining,
pp. 425–464, 2004. Cordoba, Spain, 2009, pp. 171–178.

[246] M. K. Singley and R. B. Lam, “The classroom sentinel: Supporting data- [269] C. Vialardi, J. Bravo, L. Shafti, and A. Ortigosa, “Recommendation in
driven decision-making in the classroom,” in Proc. 13th World Wide Web higher education using data mining techniques,” in Proc. Int. Conf. Educ.
Conf., Chiba, Japan, 2005, pp. 315–322. Data Mining, Cordoba, Spain, 2009, pp. 190–198.
[247] D. Song, H. Lin, and Z. Yang, “Opinion mining in e-learning system,” [270] S. R. Viola, S. Graf, Kinshuk, and T. Leo, “Analysis of Felder-Silverman
in Proc. Int. Conf. Netw. Parallel Comput. Workshops, Washington, DC, index of learning styles by a data-driven statistical approach,” in Proc.
2007, pp. 788–792. 8th IEEE Int. Symp. Multimedia, Washington, DC, 2006, pp. 959–964.
[248] J. Spacco, T. Winters, and T. Payne, “Inferring use cases from unit [271] M. Vranic, D. Pintar, and Z. Skocir, “The use of data mining in education
testing,” in Proc. Workshop Educ. Data Mining, New York, 2006, pp. 1– environment,” in Proc. Int. Conf. Telecommun., Zagreb, Croatia, 2007,
7. pp. 243–250.
[249] J. Stamper and T. Barnes, “Unsupervised MDP value selection for [272] F. H. Wang, “On using data-mining technology for browsing log file
automating ITS capabilities,” in Proc. Int. Conf. Educ. Data Mining, analysis in asynchronous learning environment,” in Proc. World Conf.
Cordoba, Spain, 2009, pp. 180–188. Educ. Multimedia, Hypermedia Telecommun., Chesapeake, VA, 2002,
[250] R. Stevens, A. Giordani, M. Cooper, A. Soller, L. Gerosa, and C. Cox, pp. 2005–2006.
“Developing a framework for integrating prior problem solving and [273] F. H. Wang, “A fuzzy neural network for item sequencing in personalized
knowledge sharing histories of a group to predict future group perfor- cognitive scaffolding with adaptive formative assessment,” Expert Syst.
mance,” in Proc. Int. Conf. Collaborative Comput.: Netw., Appl. Work- Appl. J., vol. 27, pp. 11–25, 2004.
sharing, Boston, MA, 2005, pp. 1–9. [274] F. H. Wang, “Content recommendation based on education-
[251] Z. Su, W. Song, M. Lin, and J. Li, “Web text clustering for personalized contextualized browsing events for web-based personalized learning,”
e-learning based on maximal frequent item sets,” in Proc. Int. Conf. Educ. Technol. Soc. J., vol. 11, no. 4, pp. 94–112, 2008.
Comput. Sci. Softw. Eng., Washington, DC, 2008, pp. 452–455. [275] A. Y. Wang and M. H. Newlin, “Predictors of web-based performance: The
[252] J. F. Superby, J. P. Vandamme, and N. Meskens, “Determination of role of self-efficacy and reasons for taking an on-line class,” Comput.
factors influencing the achievement of the first-year university students Human Behav. J., vol. 18, pp. 151–163, 2002.
using data mining methods,” in Proc. Int. Conf. Intell. Tutoring Syst. [276] F. H. Wang and H. M. Shao, “Effective personalized recommendation
Workshop Educ. Data Mining, Taiwan, 2006, pp. 1–8. based on time-framed navigation clustering and association mining,”
[253] D. W. Tai, H. J. Wu, and P. H. Li, “Effective e-learning recommendation Expert Syst. Appl. J., vol. 27, pp. 365–377, 2004.
system based on self-organizing maps and association mining,” Electron. [277] W. Wang, J. Weng, J. Su, and S. Tseng, “Learning portfolio analysis and
Library J., vol. 26, no. 3, pp. 329–344, 2008. mining in SCORM compliant environment,” in Proc. ASEE/IEEE Front.
[254] L. Talavera and E. Gaudioso, “Mining student data to characterize similar Educ. Conf., Savannah, GA, 2004, pp. 17–24.
behavior groups in unstructured collaboration spaces,” in Proc. Workshop [278] Y. Wang, Y. Cheng, T. Chang, and S. M. Jen, “On the application of
Artif. Intell. CSCL, Valencia, Spain, 2004, pp. 17–23. data mining technique and genetic algorithm to an automatic course
[255] C. Tang, R. W. H. Lau, Q. Li, H. Yin, T. Li, and D. Kilis, “Personalized scheduling system,” in Proc. IEEE Conf. Cybern. Intell. Syst., Chengdu,
courseware construction based on web data mining,” in Proc. 1st Int. China, 2008, pp. 400–405.
Conf. Web Inf. Syst. Eng., Hong Kong, China, 2000, pp. 204–211. [279] Y. Wang, M. Tseng, and H. Liao, “Data mining for adaptive learning
[256] T. Y. Tang and G. Mccalla, “Student modeling for a web-based learning sequence in English language instruction,” Expert Syst. Appl. J., vol. 36,
environment: A data mining approach,” in Proc. Conf. Artif. Intell., pp. 7681–7686, 2009.
Edmonton, Canada, 2002, pp. 967–968. [280] T. Wang and A. Mitrovic, “Using neural networks to predict student’s
[257] T. Tang and G. Mccalla, “Smart recommendation for an evolving e- performance,” in Proc. Int. Conf. Comput. Educ., Washington, DC, 2002,
learning system,” Int. J. E-Learning, vol. 4, no. 1, pp. 105–129, 2005. pp. 1–5.
[258] E. H. Thomas and N. Galambos, “What satisfies students? Mining [281] T. Winters, C. R. Shelton, T. Payne, and G. Mei, “Topic extraction from
student-opinion data with regression and decision tree analysis,” Res. item-level grades,” in Proc. AAAI Workshop Educ. Data Mining, Pitts-
Higher Educ. J., vol. 45, no. 3, pp. 251–269, 2004. burgh, PA, 2005, pp. 7–14.
[259] F. Tian, S. Wang, C. Zheng, and Q. Zheng, “Research on e-learning [282] A. Wu and C. Leung, “Evaluating learning behavior of web-based train-
personality group based on fuzzy clustering analysis,” in Proc. Int. ing (WBT) using web log,” in Proc. Int. Conf. Comput. Educ., New
Conf. Comput. Supported Cooperative Work Design, Xian, China, 2008, Zealand, 2002, pp. 736–737.
pp. 1035–1040. [283] K. Yamanishi and H. Li, “Mining from open answers in questionnaire
[260] C. J. Tsai, S. S. Tseng, and C. Y. Lin, “A two-phase fuzzy mining and data,” in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining,
learning algorithm for adaptive learning environment,” in Proc. Int. Conf. San Francisco, CA, 2001, pp. 443–449.
Comput. Sci., San Francisco, 2001, pp. 429–438. [284] T. D. Yang, T. Lin, and K. Wu, “An agent-based recommender system
[261] L. Tsantis and J. Castellani, “Enhancing learning environments through for lesson plan sequencing,” in Proc. Int. Conf. Adv. Learning Technol.,
solution-based knowledge discovery tools,” J. Spec. Educ. Technol., Kazan, Russia, 2002, pp. 14–20.
vol. 16, no. 4, pp. 39–52, 2001. [285] F. Yang, P. Han, R. Shen, and Z. Hu, “A novel resource recommendation
[262] S. S. Tseng, P. C. Sue, J. M. Su, J. F. Weng, and W. N. Tsai, “A new system based on connecting to similar e-learners,” in Proc. Int. Conf.
approach for constructing the concept map,” Comput. Educ. J., vol. 49, Web-Based Learning, Hong Kong, China, 2005, pp. 122–130.
pp. 691–707, 2007. [286] J. Yoo, S. Yoo, C. Lance, and J. Hankins, “Student progress monitoring
[263] M. Ueno and K. Nagaoka, “Learning log database and data mining system tool using treeview,” in Proc. Tech. Symp. Comput. Sci. Educ., ACM-
for e-learning—on line statistical outlier detection of irregular learning SIGCSE, 2006, pp. 373–377.
processes,” in Proc. Int. Conf. Adv. Learning Technol., Tatarstan, Russia, [287] M. V. Yudelson, O. Medvedeva, E. Legowski, M. Castine, D. Jukic,
2002, pp. 436–438. and D. Rebecca, “Mining student learning data to develop high level
[264] M. Ueno, “Data mining and text mining technologies for collaborative pedagogic strategy in a medical ITS,” in Proc. AAAI Workshop Educ.
learning in an ILMS ‘Samurai’,” in Proc. IEEE Int. Conf. Adv. Learning Data Mining, Boston, MA, 2006, pp. 1–8.
Technol., Washington, DC, 2004, pp. 1052–1053. [288] C. H. Yu, A. Jannasch-Pennell, S. Digangi, and B. Wasson, “Using on-
[265] M. N. Vee, B. Meyer, and K. L. Mannock, “Understanding novice errors line interactive statistics for evaluating web-based instruction,” J. Educ.
and error paths in Object-oriented programming through log analysis,” Media Int., vol. 35, pp. 157–161, 1999.
in Proc. Workshop Educ. Data Mining, Taiwan, 2006, pp. 13–20. [289] P. Yu, C. Own, and L. Lin, “On learning behavior analysis of web
[266] A. Vellido, F. Castro, T. A. Etchells, A. Nebot, and F. Mugica, “Data based interactive environment,” in Proc. Int. Conf. Comput. Electr. Eng.,
mining of virtual campus data,” in Evolution of Teaching and Learning Oslo/Bergen, Norway, 2001, pp. 1–9.
Paradigms in Intelligent Environment. Studies in Computational Intelli- [290] C. H. Yu, S. Digangi, A. K. Jannasch-Pennell, and C. Kaprolet, “Pro-
gence (SCI) (Advanced Information and Knowledge Processing), vol. 62. filing students who take online courses using data mining methods,”
New York: Springer-Verlag, 2007, pp. 223–254. Online J. Distance Learning Administ., vol. 11, no. 2, pp. 1–14,
[267] S. Ventura, C. Romero, and C. Hervas, “Analyzing rule evaluation mea- 2008.
sures with educational datasets: A framework to help the teacher,” in [291] A. Zafra and S. Ventura, “Predicting student grades in learning manage-
Proc. Int. Conf. Educ. Data Mining, Montreal, Canada, 2008, pp. 177– ment systems with multiple instance programming,” in Proc. Int. Conf.
181. Educ. Data Mining, Cordoba, Spain, 2009, pp. 307–314.
[268] C. Vialardi, J. Bravo, and A. Ortigosa, “Improving AEH courses through [292] O. Zaïane and J. Luo, “Web usage mining for a better web-based learning
log analysis,” J. Universal Comput. Sci., vol. 14, no. 17, pp. 2777–2798, environment,” in Proc. Conf. Adv. Technol. Educ., Banff, AB, Canada,
2008. 2001, pp. 60–64.

[293] O. Zaïane, “Building a recommender agent for e-learning systems,” in Proc. Int. Conf. Educ., Auckland, New Zealand, 2002, pp. 55–59.
[294] D. Zakrzewska, “Cluster analysis for user’s modeling in intelligent e-learning systems,” in Proc. Int. Conf. Ind., Eng. Other Appl. Appl. Intell. Syst., Poland, 2008, pp. 209–214.
[295] K. Zhang, L. Cui, H. Wang, and Q. Sui, “An improvement of matrix-based clustering method for grouping learners in e-learning,” in Proc. Int. Conf. Comput. Supported Cooperative Work Design, Melbourne, Australia, 2007, pp. 1010–1015.
[296] C. Zhang and S. Zhang, Association Rule Mining: Models and Algorithms (Lecture Notes in Artificial Intelligence). New York: Springer-Verlag, 2002.
[297] X. Zhang, J. Mostow, N. Duke, C. Trotochaud, J. Valeri, and A. Corbett, “Mining free-form spoken responses to tutor prompts,” in Proc. Int. Conf. Educ. Data Mining, Montreal, QC, Canada, 2008, pp. 234–241.
[298] L. Zhang, X. Liu, and X. Liu, “Personalized instructing recommendation system based on web mining,” in Proc. Int. Conf. Young Comput. Sci., Hunan, China, 2008, pp. 2517–2521.
[299] S. Zheng, S. Xiong, Y. Huang, and S. Wu, “Using methods of association rules mining optimization in web-based mobile learning system,” in Proc. Int. Symp. Electron. Commerce Security, Guangzhou, China, 2008, pp. 967–970.
[300] F. Zhu, H. Ip, A. Fok, and J. Cao, “PeRES: A personalized recommendation education system based on multi-agents & SCORM,” in Proc. Int. Conf. Web-Based Learning, Edinburgh, U.K., 2007, pp. 31–42.
[301] C. Zinn and O. Scheuer, “Getting to know your students in distance-learning contexts,” in Proc. 1st Eur. Conf. Technol. Enhanced Learn., 2006, pp. 437–451.
[302] L. Zoubek and M. Burda, “Visualization of differences in data measuring mathematical skills,” in Proc. Int. Conf. Educ. Data Mining, Cordoba, Spain, 2009, pp. 315–324.
[303] M. E. Zorrilla, E. Menasalvas, D. Marin, E. Mora, and J. Segovia, “Web usage mining project for improving web-based learning sites,” in Proc. Int. Conf. Comput. Aided Syst. Theory, Las Palmas de Gran Canaria, Spain, 2005, pp. 205–210.
[304] Z. Zukhri and K. Omar, “Solving new student allocation problem with genetic algorithms: A hard problem for partition based approach,” J. Zhejiang Univ., vol. 2, no. 1, pp. 6–15, 2007.

Cristóbal Romero (M’09) received the B.Sc. and Ph.D. degrees in computer science from the University of Granada, Granada, Spain, in 1996 and 2003, respectively.
He is currently an Associate Professor in the Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain. He has authored or coauthored more than 30 papers about educational data mining in international journals and conferences. His research interests include applying data mining in e-learning systems.
Dr. Romero is a member of the IEEE Computer Society, the International Educational Data Mining (EDM) Working Group, and the steering committee of the EDM Conferences.

Sebastián Ventura (SM’07) received the B.Sc. and Ph.D. degrees in sciences from the University of Córdoba, Córdoba, Spain, in 1989 and 1996, respectively.
He is currently an Associate Professor in the Department of Computer Science and Numerical Analysis, University of Córdoba, where he is the Head of the Knowledge Discovery and Intelligent Systems Research Laboratory. He has authored or coauthored more than 70 international publications, 20 of them published in international journals. He has also worked on eleven research projects (being the Coordinator of two of them) that were supported by the Spanish and Andalusian governments and the European Union. His research interests include soft computing, machine learning, data mining, and their applications.
Dr. Ventura is a Senior Member of the IEEE Computer Society, the IEEE Computational Intelligence Society, the IEEE Systems, Man, and Cybernetics Society, and the Association for Computing Machinery.
