Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications

Volume: 05 Issue: 02 December 2016, Page No.167-171

ISSN: 2278-2419

A Survey on Educational Data Mining Techniques

A.S. Arunachalam¹, T. Velmurugan²
¹Research Scholar, Department of Computer Science, Vels University, Chennai, India.
²Associate Professor, PG and Research Department of Computer Science, D.G. Vaishnav College, Chennai, India.

Abstract - Educational data mining (EDM) has a high impact on the academic domain. The methods used in this area play a key role in increasing students' knowledge. EDM explores students' behavioral patterns and offers insights that help them choose the correct path for their careers. This survey focuses on this area and discusses the various techniques used in educational data mining for knowledge improvement. It also discusses different types of EDM tools and techniques, and among them, the best categories are suggested for real-world usage.

Key words: Educational Data Mining, Web Mining, E-Learning, Data Mining Techniques

I. INTRODUCTION

Collecting relevant student records and analyzing them from a huge record set has always been a difficult task for researchers. Data mining, the process of extracting hidden information from large databases, provides a meaningful solution for educational data mining. Researchers also face many problems in deploying the systems developed for educational data mining on different platforms. The huge number of educational courses available also makes choosing the best course a difficult task for students. Current web-based course applications provide static learning materials that are not adapted to an understanding of the students' mentality. A user-friendly web-based educational system remains a good way to provide a richer learning environment. In the traditional education system, students share their learning experiences through one-to-one interaction and a continual evaluation process [1]. Classroom evaluation proceeds by observing students' attitudes, analyzing records, and appraising students within teaching strategies. Such supervision is not possible when students work in an IT-based setting, so the pedagogue must choose other techniques to obtain classroom data. Institutions that run websites for distance learning automatically collect huge amounts of data, for example through web server access logs. Web-based learning analysis tools available online increase the interaction data between academicians and students [2]. The most effective learning environment can be achieved by following data mining techniques, whose stages run from preprocessing to post-processing and follow the KDD (knowledge discovery in databases) process for identifying the necessary educational data. The web-based domain of e-commerce uses data mining techniques that also help advance educational data mining, and the e-learning process offers an optimal setting for improving the educational data mining process. Some differences between e-learning and e-commerce systems are discussed below [3].

E-commerce technology is used to connect clients with servers for commercial purposes, and web-based applications carry out this communication. Web access log files store e-commerce transaction data for money transfers: the commercial website passes the transaction data to the bank's website application, and the purchase takes place. E-learning technology, by contrast, is used for learning from websites. Users gather the necessary knowledge through a web-based learning process, and the web applications used on such sites are very useful in providing knowledge to learners. These web-based applications store information in web access log files, which record how users work on the websites.

In an educational system, knowledge assessment techniques are applied to improve the students' learning process. Formative assessment evaluates the continuous improvement of students' learning capacity and helps the educator improve instructional materials. Data mining techniques help the educator make academic decisions when designing or revising the teaching methodology. Educational data mining follows the common data mining methods: the extracted information should re-enter the loop of the system and guide the fine-tuning and refinement of learning [4]. In this way the data not only becomes knowledge but also improves the mined knowledge used for decision making.

The rest of this survey is organized as follows. Section II discusses the basic concepts of educational data mining. Section III discusses the tools used for educational data mining. The applications and techniques of educational data mining are illustrated in Section IV. Finally, Section V concludes the survey.

II. EDUCATIONAL DATA MINING

Educational data mining plays a major role in society and in the educational area. The data mining sequence applied in an educational system is represented in Figure 1.

Figure 1: Data mining sequence applied in the educational system
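The introduction notes that distance-learning platforms record student activity automatically in web access logs, and that the mining process starts with preprocessing in the KDD sequence. As an illustration only, the minimal Python sketch below performs one common preprocessing step: grouping log records into per-student sessions. The comma-separated record layout (student ID, ISO timestamp, URL), the function names and the 30-minute inactivity timeout are all assumptions made for this example; none of them come from the surveyed systems.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity timeout: a longer gap starts a new session (hypothetical value).
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse an assumed 'student_id,ISO-timestamp,url' log record."""
    student_id, timestamp, url = line.strip().split(",")
    return student_id, datetime.fromisoformat(timestamp), url


def sessionize(lines, timeout=SESSION_TIMEOUT):
    """Group log records into per-student sessions (lists of visited URLs)."""
    by_student = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # drop empty records, a trivial cleaning step
        student_id, ts, url = parse_line(line)
        by_student[student_id].append((ts, url))

    sessions = defaultdict(list)
    for student_id, events in by_student.items():
        events.sort()  # order each student's events by timestamp
        current, last_ts = [], None
        for ts, url in events:
            # Close the current session when the gap exceeds the timeout.
            if last_ts is not None and ts - last_ts > timeout:
                sessions[student_id].append(current)
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions[student_id].append(current)
    return dict(sessions)


log = [
    "s1,2016-03-01T10:00:00,/course/intro",
    "s1,2016-03-01T10:05:00,/course/quiz1",
    "s1,2016-03-01T11:00:00,/course/forum",
    "s2,2016-03-01T10:02:00,/course/intro",
    "",
]
print(sessionize(log))
# {'s1': [['/course/intro', '/course/quiz1'], ['/course/forum']], 's2': [['/course/intro']]}
```

With these sample records, student s1 gets two sessions because the 55-minute gap exceeds the assumed timeout. Real server logs, such as Apache's Common Log Format, would need a different parser.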
a. Academician
Academicians play a major role in designing the educational system, which should be constructed to benefit students. This effort may bring dramatic changes to the educational environment and to society. Academicians and teachers should analyze students' records to construct a better educational system, and data mining techniques give academicians the additional benefit of analyzing students' behavior based on historical data.

b. Educational System
A proper educational system provides a healthy environment and society. Students are one of the major factors in the educational system that eventually make society healthy. Its rules and regulations are made by experienced academicians.

c. Data Mining Techniques
Historical student records are collected from various localities and data mining techniques are applied to them. Unwanted data are removed by a preprocessing step. Multiple data mining techniques have been used in the EDM process, including classification, association, prediction, clustering, sequential patterns and decision trees.

d. Analyzing for Recommendation
The goal is to have measures of how to improve the efficiency of the educational system and adapt it to the performance of its students, how to better organize institutional resources (human and material) and the educational offer, how to enhance the educational programs offered, and how to determine the effectiveness of new computer-mediated distance learning approaches.

e. Students
The objective is to capture students' learning experiences, resources, activities and learning interests, based on the tasks already completed by the student, their successes, and the tasks completed by other similar learners.

III. TOOLS FOR EDUCATIONAL DATA MINING

Zaiane and Luo discuss the Web Utilization Miner (WUM) for a student feedback system. Recording specific events on the e-learning side adds value, giving the clickstreams and the discovered patterns a better meaning and interpretation [5]. Silva and Vieira use a web usage ranking technique to identify the web pages that students use for information. Their tool, MultiStar, presents the patterns it finds textually, and the patterns resulting from the classification task are expressed as rules [6]. Shen visualizes students' academic performance through statistical graphs [7]. Romero discovers interesting prediction rules from student usage information to improve adaptive web courses such as AHA! (Adaptive Hypermedia for All). He also uses a visual tool, EPRules, which discovers prediction rules and is intended to be used by the teacher [8]. Tane proposes an ontology-based tool to manage the majority of the resources available on the web; it uses text mining and text clustering techniques to group papers according to their topics and similarities [9]. Merceron and Yacef used traditional SQL queries to mine student data captured from a web-based tutoring tool, together with association rules and symbolic data analysis. Their objective is to find mistakes that often occur together [10]. Becker showed that sequential patterns can expose which content has provoked access to other content, or how tools and content are intertwined in the learning process [11]. Avouris, Komis, Fiotakis, Margaritis and Voyiatzaki extend automatically generated log files by introducing contextual information as additional events and by associating comments and static files [12]. Mazza and Milani introduced GISMO/CourseVis, which uses information visualization techniques to graphically render the multidimensional, complex student tracking data collected by web-based educational systems; these techniques help analyze large amounts of information by representing the data in a visual display [13]. Mostow used the Listen tool to help instructors understand their learners and become aware of what is happening in distance classes [14]. Damez, Dang, Marsala and Bouchon-Meunier used a fuzzy decision tree for user modeling, automatically discriminating a learner from an experienced user. They use an agent to learn the cognitive characteristics of a user's interactions and classify users as experienced or not [15]. Bari and Benzater retrieve data from PDF hypermedia productions to support the assessment of multimedia presentations, for statistical purposes and for extracting relevant data. They recognize the major blocks of multimedia presentations and recover their internal properties [16]. Qasem A. Al-Radaideh introduced a CRISP framework for mining students' academic data and used a decision tree as the classification technique for rule mining [17]. Cristobal Romero also introduced KEEL, a software tool to assess evolutionary algorithms for data mining problems in regression, classification and unsupervised learning [18]. C. Marquez-Vera used the SMOTE algorithm to rebalance student data when classifying student success, evaluated with 10-fold cross-validation; the rules were implemented and tested in WEKA [19]. Dragan Gašević identifies the critical topics that require immediate research attention for learning analytics to make a sustainable impact on the research and practice of learning and teaching, using an online learning tool and a video annotation tool [20].

IV. TECHNIQUES FOR EDUCATIONAL DATA MINING

Cristobal Romero compares different data mining classification techniques using Moodle usage data from Cordoba University. He modeled a rebalancing preprocessing technique for classifying the original numerical data [21]. Ramaswami developed a predictive data mining model for identifying slow learners and for studying the influence of the dominating factors on their academic performance. He also introduced a CHAID model for predicting slow learners accurately [22]. Edin Osmanbegovic successfully applied data mining techniques in higher education, analyzing socio-demographic variables, high school entrance exam results and related attributes. He also uses several supervised algorithms for analyzing the student data [23]. Surjeet Kumar Yadav used three decision tree learning algorithms (ID3, C4.5 and CART) to obtain a predictive model of student performance. He also reports the classification accuracy and timing, and identifies the students' success and failure ratio [24].
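Yadav [24] reports class-wise results, such as a true positive rate of 0.786 for the FAIL class. The true positive rate of a class is TP / (TP + FN): the fraction of students who actually belong to that class and whom the model assigns to it. The short Python sketch below computes it per class; the function name and the PASS/FAIL labels are hypothetical and are not the data of [24].

```python
from collections import Counter


def true_positive_rates(actual, predicted):
    """Per-class true positive rate: TP / (TP + FN) for each actual class label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    totals = Counter(actual)  # TP + FN for each class
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)  # TP for each class
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for seven students, not data from the surveyed study.
actual = ["PASS", "FAIL", "FAIL", "PASS", "FAIL", "PASS", "FAIL"]
predicted = ["PASS", "FAIL", "PASS", "PASS", "FAIL", "FAIL", "FAIL"]
rates = true_positive_rates(actual, predicted)
print(rates["FAIL"])  # 0.75
```

Here three of the four failing students are detected, so the FAIL rate is 0.75, while two of the three passing students are recognized.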
Table 1: Educational Data Mining Tools

Authors | Tool | Mining task | Findings
Zaiane and Luo (2001) | WUM | Association and patterns | Recording specific events on the e-learning side gives the clickstreams and the discovered patterns a better meaning and interpretation.
Silva and Vieira (2002) | MultiStar | Association and classification | MultiStar textually presents the patterns it finds. The patterns resulting from the classification task are expressed as rules.
Shen et al. (2002) | Data Analysis Center | Association and classification | E-learning system for solving problems such as student-teacher interaction in assignments, among others.
Romero et al. (2003) | EPRules | Association | Grammar-based genetic programming with multi-objective optimization techniques was developed to provide feedback to courseware authors; it identifies interesting relationships in students' usage data.
Tane et al. (2004) | Courseware Watchdog | Text mining and clustering | The Courseware Watchdog addresses the different needs of teachers and students. It integrates the Semantic Web vision by using ontologies and a peer-to-peer network of semantically annotated learning material.
Merceron and Yacef (2005) | TADA-ED | Classification and association | Traditional SQL queries are used to mine student data from a web-based tutoring tool. The objective of finding frequently occurring mistakes is achieved with accurate rating.
Becker et al. (2005) | O3R | Sequential patterns | The proposed filtering functionality: 1) relies on the ontology to understand the domain and establish interesting filters, and 2) direct manipulation of domain concepts and structural operators minimizes the skills required for defining ...
Mostow et al. (2005) | Listen tool | Visualization | Log each distinct type of tutorial event in its own table. Include student ID, computer, start time and end time as fields of each such table so as to identify its records as events.
Avouris et al. (2005) | Synergo/ColAT | Statistics and visualization | Main features of two tools that facilitate the analysis of complex field data from technology-mediated learning activities: the Synergo Analysis Tool and ColAT.
Mazza and Milani | GISMO/CourseVis | Visualization | GISMO has been implemented based on the authors' previous experience with the CourseVis research, and proposes graphical representations that can be useful for gaining insights into the students of the course.
Damez et al. (2005) | TAFPA | Classification | Four new steps were expected because four questions were asked, but it was noticed that one user skipped a question by accidentally double-clicking the "Show next question" button. This can lead to mistakes when using the LCS.
Qasem A. Al-Radaideh (2006) | CRISP | Classification | The percentage of correctly classified instances for the ID3, C4.5 and Naive Bayes classifiers is not very high.
Cristobal Romero (2009) | KEEL | Classification and ... | Association rule mining has been used to provide new, important and therefore demand-oriented impulses for the development of new bachelor and master courses.
Marquez-Vera (2010) | 10-fold cross-validation using the WEKA tool | Classification | Rule induction algorithms such as JRip, NNge, OneR, Prism and Ridor, and decision tree algorithms such as J48, SimpleCart, ADTree, RandomTree and REPTree, are used in the experiment because these algorithms can be used directly for decision making and provide detailed classification results.
Dragan Gasevic (2015) | Online learning tools and video annotation tool | Classification and ... | Transition graphs are constructed from a contingency matrix whose rows and columns are all the events logged by the video annotation tool.
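Several rows of Table 1, such as the Marquez-Vera experiments in WEKA, evaluate classifiers with 10-fold cross-validation. The plain-Python sketch below illustrates the procedure only: the data are shuffled and split into k folds, a model is trained on k - 1 folds and tested on the remaining one, and the k accuracies are averaged. The majority-class model is a hypothetical stand-in for the rule induction and decision tree algorithms listed in the table, and the labels are invented.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(labels):
    """Hypothetical stand-in model: always predicts the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k=10, seed=0):
    """Mean accuracy of the majority-class model over k folds."""
    if k < 2 or k > len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for test_fold in folds:
        test_set = set(test_fold)
        train_labels = [labels[i] for i in range(len(labels)) if i not in test_set]
        model = train_majority(train_labels)  # "train" on the other k - 1 folds
        correct = sum(1 for i in test_fold if labels[i] == model)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)


labels = ["PASS"] * 70 + ["FAIL"] * 30  # hypothetical class labels
print(round(cross_validate(labels, k=10), 3))  # 0.7
```

Because every training split here is dominated by PASS, the stand-in model always predicts PASS and the mean accuracy equals the PASS share, 0.7. This shows why class imbalance matters for student-failure data, which is one reason rebalancing such as SMOTE is applied in [19].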

Table 2: Techniques for Educational Data Mining

Authors | Techniques | Mining task | Findings
Romero | Moodle | Classification | C4.5 and CART algorithms are simple for instructors to understand and interpret. GGP algorithms have a higher expressive power, allowing the user to determine the specific format of the rules.
M. Ramaswami (2010) | CHAID model | Association | Features whose chi-square values were greater than 100 were given due consideration, and these highly influencing variables with high chi-square values were used to construct the CHAID prediction model.
Edin Osmanbegovic (2012) | L86 Classifier | Classification and association | The results show that the performance of the naive Bayes algorithm and the accuracy of the decision tree are much higher than those of the neural network method.
Surjeet Kumar Yadav (2012) | C4.5, ID3, CART algorithms | Algorithm-based tool | ID3, C4.5 and CART are machine learning algorithms that produce predictive models with the best class-wise accuracy. The classifiers' true positive rate for the FAIL class is 0.786 for ID3 and C4.5. The created model classified successfully.
Kabakchieva | CRISP-DM | Classification | The J48 classifier correctly classifies about 66% of the instances with 10-fold cross-validation testing and 66.59% with percentage-split testing. The achieved results are slightly better for the percentage-split testing option.
Abeer Badr El Din Ahmed (2014) | Decision tree (ID3) | Classification | The information gain of an attribute A, relative to a collection of samples S, is used to measure the best attribute for a particular node in the tree.
Xing Wanli (2015) | GP-ICRM for rule generation and work | Classification | Implemented EDM theory and application to solve the problem of predicting students' performance in a CSCL learning environment with small datasets. The performance prediction model is evaluated using a GP algorithm.
Satyanarayana | Bootstrap Averaging | Classification and clustering | Single model: a decision tree (J48) was used as the single filtering base model. Online bagging: implemented with Naive Bayes as the base model. Ensemble filtering: the proposed algorithm uses the J48, RandomForest and Naive Bayes classifiers.
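The Abeer Badr El Din Ahmed row of Table 2 chooses the attribute for each tree node by information gain. For a collection of samples S and an attribute A, the standard ID3 definition is Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v| / |S|) * Entropy(S_v), where S_v is the subset of S whose value of A is v. The minimal Python sketch below computes it; the student records, attribute names and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v| / |S| * Entropy(S_v)."""
    base = entropy([row[target] for row in rows])
    partitions = defaultdict(list)  # value of A -> class labels of S_v
    for row in rows:
        partitions[row[attribute]].append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return base - remainder


# Hypothetical student records; attribute names are illustrative only.
students = [
    {"attendance": "high", "assignments": "done", "result": "PASS"},
    {"attendance": "high", "assignments": "missing", "result": "PASS"},
    {"attendance": "low", "assignments": "done", "result": "FAIL"},
    {"attendance": "low", "assignments": "missing", "result": "FAIL"},
]
for attr in ("attendance", "assignments"):
    print(attr, round(information_gain(students, attr, "result"), 3))
# attendance 1.0
# assignments 0.0
```

ID3 would split first on attendance in this example, since it separates PASS from FAIL perfectly (gain 1.0), while assignments carries no information about the result (gain 0.0).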

Dorina Kabakchieva applied the CRISP-DM (Cross-Industry Standard Process for Data Mining) approach, a non-proprietary, freely available and application-neutral standard for data mining projects. The author also compares the J48 decision tree classifier with the NaiveBayes and BayesNet classifiers under 10-fold cross-validation and percentage-split testing, and reports weighted averages. The same 10-fold cross-validation and percentage-split comparison was carried out for the k-NN classifier (with k = 100 and k = 250) and for the OneR and JRip classifiers [25]. Abeer Badr El Din Ahmed uses the decision tree method, specifically the ID3 algorithm, to predict students' performance [26]. Xing Wanli introduces student performance prediction measures based on different rules, and uses genetic operators for classification and for evaluating offspring when analyzing student participation [27]. Ashwin Satyanarayana uses multiple classifiers, such as J48, NaiveBayes and Random Forest, to predict student performance. He also uses the K-means clustering algorithm, which computes each cluster centroid as the average of the students assigned to that cluster [28].

V. CONCLUSION

Educational data mining is a valuable research area that benefits society by providing useful prediction techniques for academicians, teachers and students. The papers discussed in this survey give a detailed view of educational data mining and the core paths of EDM. The techniques and tools discussed here should give young educational data mining researchers a clear idea of how to carry out their work in this field. This work also covers the areas in which the data mining process supports educational data mining more effectively. Finally, the survey finds that most of the classification algorithms perform well in helping students and academicians understand the current trends of EDM.

References

[1] Sheard, J., Ceddia, J., Hurst, J. and Tuovinen, J., "Inferring student learning behaviour from website interactions: A usage analysis", Journal of Education and Information Technologies, 2003, Vol. 8(3), pp. 245-266.
[2] Sheard, J., Ceddia, J., Hurst, J. and Tuovinen, J., "Determining website usage time from interactions: Data preparation and analysis", Journal of Educational Technology Systems, 2003, Vol. 32(1), pp. 101-121.
[3] Muehlenbrock, M., "Automatic action analysis in an interactive learning environment", Proceedings of the 12th International Conference on Artificial Intelligence in Education, 2005, pp. 452-455.
[4] Anuradha, C. and Velmurugan, T., "A Data Mining based Survey on Student Performance Evaluation System", IEEE International Conference on Computational Intelligence and Computing Research, 2014, pp. 452-455.
[5] Zaiane, O. and Luo, J., "Web usage mining for a better web-based learning environment", Proceedings of the Conference on Advanced Technology for Education, Banff, Alberta, 2001, pp. 60-64.
[6] Silva, D. and Vieira, M., "Using data warehouse and data mining resources for ongoing assessment in distance learning", IEEE International Conference on Advanced Learning Technologies, Kazan, Russia, 2002, pp. 40-45.
[7] Shen, R., Yang, F. and Han, P., "Data analysis center based on e-learning platform", The Internet Challenge: Technology and Applications, Springer Netherlands, 2002, pp. 19-28.
[8] Romero, C., Ventura, S., de Bra, P. and de Castro, C., "Discovering prediction rules in AHA! courses", International Conference on User Modeling, Springer Berlin Heidelberg, 2003, pp. 25-34.
[9] Tane, J., Schmitz, C. and Stumme, G., "Semantic resource management for the web: an e-learning application", Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, ACM, 2004, pp. 1-10.
[10] Merceron, A. and Yacef, K., "TADA-Ed for educational data mining", Interactive Multimedia Electronic Journal of Computer-Enhanced Learning, 2005, Vol. 7(1), pp. 267-287.
[11] Vanzin, M., Becker, K. and Ruiz, D. D. A., "Ontology-based filtering mechanisms for web usage patterns retrieval", International Conference on Electronic Commerce and Web Technologies, Springer Berlin Heidelberg, 2005, pp. 267-277.
[12] Avouris, N., Komis, V., Fiotakis, G., Margaritis, M. and Voyiatzaki, E., "Logging of fingertip actions is not enough for analysis of learning activities", 12th International Conference on Artificial Intelligence in Education, AIED 05 Workshop 1: Usage Analysis in Learning Systems, 2005, pp. 1-8.
[13] Mazza, R. and Milani, C., "Exploring usage analysis in learning systems: Gaining insights from visualizations", Workshop on Usage Analysis in Learning Systems at the 12th International Conference on Artificial Intelligence in Education, 2005, pp. 65-72.
[14] Mostow, J., Beck, J., Cen, H., Cuneo, A., Gouvea, E. and Heiner, C., "An educational data mining tool to browse tutor-student interactions: Time will tell", Proceedings of the Workshop on Educational Data Mining, National Conference on Artificial Intelligence, AAAI Press, 2005, pp. 15-
[15] Damez, M., Marsala, C., Dang, T. and Bouchon-Meunier, B., "Fuzzy decision tree for user modeling from human-computer interactions", International Conference on Human System Learning: Who is in Control?, 2005, pp. 287-302.
[16] Bari, M. and Benzater, B., "Retrieving data from pdf interactive multimedia productions", International Conference on Human System Learning: Who is in Control?, 2005.
[17] Al-Radaideh, Q. A., Al-Shawakfa, E. M. and Al-Najjar, M. I., "Mining Student Data Using Decision Trees", The 2006 International Arab Conference on Information Technology, Jordan, 2006, pp. 1-5.
[18] Romero, C., Alcala-Fdez, J., Sanchez, L., Garcia, S., del Jesus, M. J., Ventura, S., Garrell, J. M. and Fernandez, J., "KEEL: a software tool to assess evolutionary algorithms for data mining problems", Soft Computing, 2009, Vol. 13(3), pp. 307-318.
[19] Marquez-Vera, C. and Romero, C., "Predicting School Failure Using Data Mining", Educational Data Mining, 2010, 2011.
[20] Gasevic, D., "Let's not forget: Learning analytics are about learning", Association for Educational Communications and Technology, 2015, Vol. 59(1), pp. 64-71.
[21] Romero, C., Ventura, S., Espejo, P. G. and Hervas, C., "Data mining algorithms to classify students", Educational Data Mining 2007, 2007, pp. 1-10.
[22] Ramaswami, M. and Bhaskaran, R., "A CHAID based performance prediction model in educational data mining", arXiv preprint arXiv:1002.1144, 2010.
[23] Osmanbegovic, E. and Suljic, M., "Data mining approach for predicting student performance", Journal of Economics and Business, 2012, Vol. 10(1), pp. 3-12.
[24] Yadav, S. K. and Pal, S., "Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification", World of Computer Science and Information Technology Journal (WCSIT), 2012, Vol. 2(2), pp. 51-56.
[25] Kabakchieva, D., "Predicting Student Performance by Using Data Mining Methods for Classification", Cybernetics and Information Technologies, 2013, Vol. 13(1), pp. 61-72.
[26] Ahmed, A. B. E. D. and Elaraby, I. S., "Data Mining: A prediction for Student's Performance Using Classification Method", World Journal of Computer Application and Technology, 2014, Vol. 2(2), pp. 43-47.
[27] Xing, W., Guo, R., Petakovic, E. and Goggins, S., "Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory", Computers in Human Behavior, 2015, Vol. 47, pp. 168-181.
[28] Satyanarayana, A. and Ravichandran, G., "Mining Student Data by Ensemble Classification and Clustering for Profiling and Prediction of Student Academic Performance", 2016 ASEE Mid-Atlantic Section Conference, 2016.

