V. Vijayalakshmi
Research Scholar
Computer Science and Engineering
Annamalai University
Tamilnadu, India
[email protected]

K. Venkatachalapathy
Professor
Division of Computer and Information Science
Annamalai University
Tamilnadu, India
[email protected]

V. Ohmprakash
B.Tech - IV Yr / ECE
Panimalar Engg. College
Chennai, India
[email protected]

Abstract – Educational Data Mining (EDM) is the application of data mining techniques to educational data. Predicting student performance is complicated by the large amount of data in the educational field, and at present there is a lack of surveys that give a clear view of the available prediction approaches. Two factors are involved in this process: the attributes used for prediction and the prediction methods. The main objective of this paper is to give an overview of the data mining techniques most often used to predict students' performance. We compare the reported prediction accuracy of different data mining methods: Decision Tree, Neural Network, Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. Among these techniques, Decision Tree and Neural Network provide the best accuracy.

Keywords – Classification Technique; Educational Data Mining; Decision Tree; Neural Network.

1. INTRODUCTION

Predicting students' performance is an important task in the educational field, because it helps students achieve an excellent academic record. Usamah et al. (2013) stated that students' performance can be improved by measuring learning assessment and co-curriculum [23]. This measurement is necessary to predict the learning level of the students. Final grades are used to evaluate students' performance. They are based on course structure, assessment marks, final exam score and extracurricular activities.

Evaluation is important for maintaining students' performance and the effectiveness of the learning process. Data mining is one of the most popular techniques for analyzing students' performance and has been broadly applied in education in recent times [24], where it is called educational data mining. Educational data mining is a process used to mine useful information and patterns from a large educational database. Predicting performance is very important for improving the quality of students' learning.

The next section describes the attributes and methods used for this comparison. Section 3 describes the factors involved in prediction. Section 4 discusses the existing prediction methods and their prediction accuracy. Lastly, the conclusion is given in section 5.

2. ATTRIBUTES AND METHODS

For this comparison we searched the following databases: IEEE Xplore, Springer Link, ScienceDirect and ACM Digital Library. The search items were journal articles, workshop papers and conference papers. The attributes and the specific methods used with them in previous work on predicting students' performance are listed in Table I.

TABLE I. COMMON ATTRIBUTES AND METHODS USED TO PREDICT STUDENTS PERFORMANCE

S. No. | Attributes | Methods
1 | Internal assessments | Decision Tree, Neural Network, K-Nearest Neighbor
2 | Internal assessments, | Support Vector Machine
3 | Internal assessments, CGPA, Extra-curricular activities | Decision Tree, Naive Bayes, K-Nearest Neighbor, Support Vector Machine
4 | Internal assessments, CGPA, Student Demographic | Decision Tree, Naive Bayes, K-Nearest Neighbor
5 | Internal assessments, External assessment | Neural Network
6 | External assessment, Student Demographic, High school background | Neural Network, Naive Bayes
7 | Psychometric factors | Decision Tree, Neural Network, K-Nearest Neighbor, Support Vector Machine
8 | External assessments | Decision Tree, Naive Bayes, Neural Network
9 |  | Decision Tree, Neural Network, Naive Bayes
10 | CGPA, Student Demographic, High school background, Scholarship, Social network interaction | Decision Tree, Neural Network, Naive Bayes
11 | CGPA, Student Demographic, High school background, Scholarship, Social network interaction, Internal assessments, Extra-curricular activities | Decision Tree
12 | Student Demographic, High school background | Neural Network
13 | External assessment, CGPA, Student Demographic, Extra-curricular activities | Decision Tree
14 | Psychometric factors, Extra-curricular activities, soft skills | Decision Tree
15 | Student Demographic, High school background, Internal assessment, Student Demographic, Extra-curricular activities | Decision Tree
16 | Internal assessments, External assessment, Demographic, Extra-curricular activities | Decision Tree, Neural Network

3. IMPORTANT FACTORS ON PREDICTING STUDENTS PERFORMANCE

Two factors matter when predicting students' performance: attributes and methods. Table 1 gives a detailed list of the common attributes and the methods used in predicting students' performance. The first step focuses on the important attributes used in prediction, and the next step focuses on the prediction methods. Figure 1 shows the attributes used for prediction.

3.1 The important attributes used in predicting student's performance

Figure 1. Classification of attributes

Grouping the common classifications yields eight attributes:

1. CGPA (Cumulative Grade Point Average), the attribute most often used to predict performance
2. Internal assessments
   - Assignment mark, quizzes, lab work, class test and attendance
3. Demographic
   - Gender, age, family background, and disability
4. External assessments
   - Mark obtained in final exam for a particular subject
5. Extra-curricular activities
6. High school background
7. Social interaction network
8. Psychometric factor
   - Student interest, study behavior, engage time, and family support

3.2 The important prediction methods used for student performance

In educational data mining, many classification algorithms have been applied to predict students' performance. The most commonly used are Decision Tree, Neural Networks, Naive Bayes, K-Nearest Neighbor and Support Vector Machine.

3.2.1 Decision Tree

Decision Tree is one of the most popular techniques for prediction. Most researchers have used it because of its simplicity and comprehensibility in uncovering small or large data structures and predicting values [21, 5, 9]. Decision tree models are easy to understand because their reasoning process can be converted directly into a set of IF-THEN rules [1]. As shown in Table 1, thirteen (13) papers have used Decision Tree as their method to predict students' performance. Previous studies using the Decision Tree method include predicting drop-out features of student data for academic performance [21], predicting a suitable career for a student from behavioural patterns [8], and predicting the third-semester performance of MCA students [10]. The sample datasets are students' final grades [15], cumulative grade point average (CGPA) [22] and marks obtained in particular courses [9]. All these datasets were analyzed to discover the main attributes or factors that may affect students' performance [3, 9], and the appropriate data mining algorithm was then investigated to predict students' performance [7]. Other work compares classification techniques for predicting students' performance [6]. Gray et al. (2014) investigated the accuracy of classification models for predicting learner progression in tertiary education [2]. Another model focused on predicting the academic performance of students using several classification algorithms [18], and [19] compares the accuracy of data mining algorithms for predicting student performance.
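The conversion of a decision tree into IF-THEN rules mentioned in section 3.2.1 can be illustrated with a minimal sketch. The tree below, its attribute names (`internal_assessment`, `attendance`), its thresholds and its class labels are all hypothetical and are not taken from any of the surveyed papers; the sketch only shows that each root-to-leaf path reads as one rule.

```python
def tree_to_rules(node, conditions=()):
    """Walk a decision tree and return one IF-THEN rule per leaf."""
    if "label" in node:  # leaf node: emit the rule for the path taken so far
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN performance = {node['label']}"]
    attr, threshold = node["attribute"], node["threshold"]
    # Left branch: attribute below the threshold; right branch: at or above it.
    rules = tree_to_rules(node["low"], conditions + (f"{attr} < {threshold}",))
    rules += tree_to_rules(node["high"], conditions + (f"{attr} >= {threshold}",))
    return rules


# Hypothetical hand-built tree (illustrative values only).
tree = {
    "attribute": "internal_assessment",
    "threshold": 50,
    "low": {"label": "at risk"},
    "high": {
        "attribute": "attendance",
        "threshold": 75,
        "low": {"label": "average"},
        "high": {"label": "good"},
    },
}

rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

Each printed line corresponds to one leaf of the tree, which is what makes such models easy for educators to read and check by hand.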
3.2.2 Neural Networks

Neural network is another popular technique in educational data mining. Its main advantage is the ability to detect all possible interactions between predictor variables [2]. A neural network can also detect complex nonlinear relationships between dependent and independent variables, and for this reason it is considered one of the best prediction methods [12]. Nine (9) papers have been published using the Neural Network method. Some of these papers present an Artificial Neural Network model to predict students' performance [11] [12]. The attributes analyzed by Neural Network include admission data [13] and students' attitude towards self-regulated learning and academic performance [14]. One paper shows how data can be preprocessed to improve the accuracy of a final grade prediction model for a particular course [4]. The remaining papers also use Decision Tree [5, 7, 18]. The prediction accuracy results are summarized in Table 1.

3.2.3 Naive Bayes

Naive Bayes is another option for prediction. Six (6) papers have used Naive Bayes algorithms to estimate students' performance. The objective of all these papers is to find the most effective prediction technique by making comparisons [5, 6, 7, 4, 19, 20]. The results are shown in Table 1.

3.2.4 K-Nearest Neighbor

The four papers studied show that K-Nearest Neighbor achieves good accuracy. The K-Nearest Neighbor method took less time to identify students' performance levels, such as slow learner, average learner, good learner and excellent learner [15, 6]. This method gives good accuracy in estimating the detailed pattern of learner progression in tertiary education [2, 19].

3.2.5 Support Vector Machine

Support Vector Machine is a supervised learning method used for classification. Three papers have used Support Vector Machine to predict students' performance. Hämäläinen et al. (2006) chose Support Vector Machine as their prediction technique because it is well suited to small datasets [17]. Sembiring et al. (2011) stated that Support Vector Machine has good generalization ability and is faster than other methods [16]. Support Vector Machine also achieved the highest prediction accuracy in identifying students at risk of failing [6].

Methods | Attributes | Prediction Accuracy | Authors
Decision Tree | Internal assessments | 76% | Romero et al. (2008) [1]
Decision Tree | Psychometric factors | 65% | Gray et al. (2014) [2]
Decision Tree | External assessment | 85% | Bunkar et al. (2012) [3]
Decision Tree | CGPA | 91% | Jishan et al. (2015) [4]
Decision Tree | CGPA, Student Demographic, High school background, Scholarship, Social network interaction | 73% | Osmanbegovic and Suljic (2008) [5]
Decision Tree | Internal assessments, CGPA, Extra-curricular activities | 66% | Mayilvaganan and Kalpanadevi (2014) [6]
Decision Tree | Student Demographic, High school background | 65% | Ramesh et al. (2013) [7]
Decision Tree | Internal assessment, Student Demographic, Extra-curricular activities | 90% | Elakia et al. (2014) [8]
Decision Tree | External assessment, CGPA, Student Demographic, Extra-curricular activities | 90% | Natek and Zwilling (2014) [9]
Decision Tree | Psychometric factors, Extra-curricular activities, soft skills | 88% | Mishra et al. (2014) [10]
Decision Tree | CGPA | 98% | Sharma et al. (2011) [20]
Decision Tree | Internal assessments, External assessment, Demographic, Extra-curricular activities | 73% | Ruby, Jai et al. (2014) [18]
Decision Tree | Internal assessments, CGPA, Student Demographic | 69% | Nghe, Nguyen Thai et al. (2007) [19]
Neural Network | Internal assessments | 81% | Wang and Mitrovic (2002) [11]
Neural Network | Psychometric factors | 69% | Gray et al. (2014) [2]
Neural Network | External assessment | 97% | Arsad et al. (2013) [12]
Neural Network | CGPA | 75% | Jishan et al. (2015) [4]
Neural Network | CGPA, Student Demographic, High school background, Scholarship, Social network interaction | 71% | Osmanbegovic and Suljic (2008) [5]
Neural Network | Student Demographic, High school background |  | Ramesh et al. (2013) [7]
Neural Network | External assessment, Student Demographic, High school background | 74% | Oladokun et al. (2008) [13]
Neural Network | Internal assessments, External assessment | 98% | Anupama and Vijayalakshmi (2012) [14]
Neural Network | Internal assessments, External assessment, Demographic, Extra-curricular activities | 74% | Ruby, Jai et al. (2014) [18]
Naive Bayes | CGPA, Student Demographic, High school background, Scholarship, Social network interaction | 76% | Osmanbegovic and Suljic (2008) [5]
Naive Bayes | Student Demographic, High school background | 50% | Ramesh et al. (2013) [7]
Naive Bayes | CGPA | 75% | Jishan et al. (2015) [4]
Naive Bayes | Internal assessments, CGPA, Extra-curricular activities | 73% | Mayilvaganan and Kalpanadevi (2014) [6]
Naive Bayes | CGPA | 94% | Sharma et al. (2011) [20]
Naive Bayes | Internal assessments, CGPA, Student Demographic | 71% | Nghe, Nguyen Thai et al. (2007) [19]
K-Nearest Neighbor | Psychometric factors |  | Gray et al. (2014) [2]
K-Nearest Neighbor | Internal assessments, CGPA, Extra-curricular activities | 83% | Mayilvaganan and Kalpanadevi (2014) [6]
K-Nearest Neighbor | Internal assessment |  | Bidgoli et al. (2003) [15]
K-Nearest Neighbor | Internal assessments, CGPA, Student Demographic | 62% | Nghe, Nguyen Thai et al. (2007) [19]
Support Vector Machine | Psychometric factors | 83% | Sembiring et al. (2011) [16]
Support Vector Machine | Internal assessment, CGPA, Extra-curricular activities | 80% | Mayilvaganan and Kalpanadevi (2014) [6]
Support Vector Machine | Internal assessment, CGPA |  | Hämäläinen et al. (2006) [17]

4. DISCUSSIONS

This comparison is based on the highest accuracy of the prediction methods and on the important factors that may influence students' performance. Figure 2 shows the prediction accuracy of the classification methods, grouped by algorithm, for predicting students' performance from 2002 to 2016. Accuracy is the overall correctness of the model and is calculated as the number of correct classifications divided by the total number of classifications. As the graph in Figure 2 shows, Neural Network and Decision Tree have the highest prediction accuracy (98%), followed by Naive Bayes (94%). Lastly, Support Vector Machine and K-Nearest Neighbor gave the same accuracy (83%). The prediction accuracy depends on the attributes and on the prediction method used during the prediction process.

Neural Network gave high accuracy (98%) with a combination of internal and external assessments. This method reached 97% with external assessment, 81% with internal assessments and a low accuracy (69%) with psychometric factors.

Decision Tree gave its highest prediction accuracy (98%) for CGPA and its lowest accuracy (65%) for Student Demographic, High school background and Psychometric factors.

Next is Naive Bayes, with its highest prediction accuracy (94%) for CGPA and its lowest (50%) for Student Demographic, High school background.

K-Nearest Neighbor gave high prediction accuracy (83%) for internal assessment, CGPA, Extra-curricular activities and low accuracy (62%) for internal assessments, CGPA, Student Demographic.

Support Vector Machine provided high prediction accuracy (83%) for Psychometric factors.

Figure 2. Prediction accuracy grouped by algorithms
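The accuracy measure defined in the discussion, and the per-algorithm grouping that Figure 2 summarizes, can be sketched as follows. This is only an illustration: the function names `accuracy` and `best_accuracy_by_method` are our own, the pass/fail labels are hypothetical, and the list of reported percentages contains only the highest and lowest values quoted in the discussion text, not the full table.

```python
from collections import defaultdict


def accuracy(actual, predicted):
    """Correct classifications divided by the total number of classifications."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("at least one classification is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def best_accuracy_by_method(reported):
    """Highest reported accuracy (in percent) for each prediction method."""
    best = defaultdict(int)
    for method, percent in reported:
        best[method] = max(best[method], percent)
    return dict(best)


# Hypothetical labels for eight students (not data from any surveyed paper).
actual = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
predicted = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(accuracy(actual, predicted))  # 6 of 8 predictions match

# Highest and lowest accuracies quoted in the discussion, as (method, percent).
reported = [
    ("Decision Tree", 98), ("Decision Tree", 65),
    ("Neural Network", 98), ("Neural Network", 69),
    ("Naive Bayes", 94), ("Naive Bayes", 50),
    ("K-Nearest Neighbor", 83), ("K-Nearest Neighbor", 62),
    ("Support Vector Machine", 83),
]
best = best_accuracy_by_method(reported)
print(best)
```

Taking the maximum per method mirrors the basis of the comparison stated above, which ranks each algorithm by its highest reported accuracy.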
5. CONCLUSION

Predicting students' performance is mainly helpful to educators and learners for improving their learning and teaching process. This paper has reviewed earlier studies on predicting students' performance with various data mining methods. Most of the researchers have used cumulative grade point average (CGPA) and internal assessment as data sets for prediction. Among prediction techniques, classification is the one most often used in educational data mining. Among the classification techniques, Neural Network and Decision Tree are the two methods most used by researchers for predicting students' performance. This comparison can help the educational system monitor students' performance and improve it.

REFERENCES

[1] C. Romero, S. Ventura, P. G. Espejo, C. Hervás, Data mining algorithms to classify students, in: Educational Data Mining 2008.
[2] G. Gray, C. McGuinness, P. Owende, An application of classification models to predict learner progression in tertiary education, in: Advance Computing Conference (IACC), 2014 IEEE International, IEEE, 2014, pp. 549–554.
[3] K. Bunkar, U. K. Singh, B. Pandya, R. Bunkar, Data mining: Prediction for performance improvement of graduate students using classification, in: Wireless and Optical Communications Networks (WOCN), 2012 Ninth International Conference on, IEEE, 2012, pp. 1–5.
[4] S. T. Jishan, R. I. Rashu, N. Haque, R. M. Rahman, Improving accuracy of students final grade prediction model using optimal equal width binning and synthetic minority over-sampling technique, Decision Analytics 2 (1) (2015) 1–25.
[5] E. Osmanbegović, M. Suljić, Data mining approach for predicting student performance, Economic Review 10 (1).
[6] M. Mayilvaganan, D. Kalpanadevi, Comparison of classification techniques for predicting the performance of students academic environment, in: Communication and Network Technologies (ICCNT), 2014 International Conference on, IEEE, 2014, pp. 113–118.
[7] V. Ramesh, P. Parkavi, K. Ramar, Predicting student performance: a statistical and data mining approach, International Journal of Computer Applications 63 (8) (2013) 35–39.
[8] G. Elakia, N. J. Aarthi, Application of data mining in educational database for predicting behavioural patterns of the students, International Journal of Computer Science and Information Technologies (IJCSIT) 5 (3) (2014) 4649–4652.
[9] S. Natek, M. Zwilling, Student data mining solution–knowledge management system related to higher education institutions, Expert Systems with Applications 41 (14) (2014) 6400–6407.
[10] T. Mishra, D. Kumar, S. Gupta, Mining students' data for prediction performance, in: Proceedings of the 2014 Fourth International Conference on Advanced Computing & Communication Technologies, ACCT '14, IEEE Computer Society, Washington, DC, USA, 2014, pp.
[11] T. Wang, A. Mitrovic, Using neural networks to predict student's performance, in: Computers in Education, 2002. Proceedings. International Conference on, IEEE, 2002, pp. 969–973.
[12] P. M. Arsad, N. Buniyamin, J.-l. A. Manan, A neural network students' performance prediction model (NNSPPM), in: Smart Instrumentation, Measurement and Applications (ICSIMA), 2013 IEEE International Conference on, IEEE, 2013, pp. 1–5.
[13] V. Oladokun, A. Adebanjo, O. Charles-Owaba, Predicting students academic performance using artificial neural network: A case study of an engineering course, The Pacific Journal of Science and Technology 9 (1) (2008) 72–79.
[14] D. M. S. Anupama Kumar, Appraising the significance of self regulated learning in higher education using neural networks, International Journal of Engineering Research and Development 1 (1) (2012) 09–15.
[15] B. M. Bidgoli, D. Kashy, G. Kortemeyer, W. Punch, Predicting student performance: An application of data mining methods with the educational web-based system LON-CAPA, in: Proceedings of ASEE/IEEE Frontiers in Education Conference, 2003.
[16] S. Sembiring, M. Zarlis, D. Hartama, S. Ramliana, E. Wani, Prediction of student academic performance by an application of data mining techniques, in: International Conference on Management and Artificial Intelligence, IPEDR, Vol. 6, 2011, pp. 110–114.
[17] W. Hämäläinen, M. Vinni, Comparison of machine learning methods for intelligent tutoring systems, in: Intelligent Tutoring Systems, Springer, 2006, pp. 525–534.
[18] J. Ruby, K. David, Predicting the performance of students in higher education using data mining classification algorithms: a case study, International Journal for Research in Applied Science & Engineering Technology (IJRASET) 2 (2014).
[19] N. T. Nghe, P. Janecek, P. Haddawy, A comparative analysis of techniques for predicting academic performance, in: Frontiers in Education Conference - Global Engineering: Knowledge Without Borders, Opportunities Without Passports, 2007. FIE '07. 37th Annual, IEEE, 2007.
[20] M. Sharma, M. Mavani, Accuracy comparison of predictive algorithms of data mining: Application in education sector, in: Advances in Computing, Communication and Control, Springer Berlin Heidelberg, 2011, pp. 189–194.
[21] M. M. Quadri, N. Kalyankar, Drop out feature of student data for academic performance using decision tree techniques, Global Journal of Computer Science and Technology 10 (2).
[22] Z. Ibrahim, D. Rusli, Predicting students academic performance: comparing artificial neural network, decision tree and linear regression, in: 21st Annual SAS Malaysia Forum, 5th September, 2007.
[23] U. bin Mat, N. Buniyamin, P. M. Arsad, R. Kassim, An overview of using academic analytics to predict and improve students' achievement: A proposed proactive intelligent intervention, in: Engineering Education (ICEED), 2013 IEEE 5th Conference on, IEEE, 2013, pp. 126–130.
[24] C. Romero, S. Ventura, Educational data mining: A review of the state of the art, Trans. Sys. Man Cyber Part C 40 (6) (2010) 601–618.

