
2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (Com-IT-Con), India, 14th-16th Feb 2019

Analysis of Educational Data Mining using Classification
Chitra Jalota, Rashmi Agrawal
Faculty of Computer Applications
Manav Rachna International Institute of Research and Studies
Faridabad, India
[email protected], [email protected]

Abstract— Higher education institutions are often very curious to know the success rate of their students throughout their studies. For this reason, they use several methods to predict students' performance, such as physical examination, statistical methods and, currently, prevailing data mining techniques. An emerging area of research that uses data mining techniques is known as Educational Data Mining. It involves machine learning algorithms and statistical techniques that help the user interpret students' learning habits and academic performance, and identify further improvement where required. In this paper we discuss various data mining techniques that are useful for predicting the performance level of students. For this purpose, we used the Kalboard 360 dataset and applied it in WEKA to analyze the data mining techniques.

Keywords— Data Mining, Error Measurement, Accuracy, Naïve Bayes, J48, Multilayer Perceptron

I. INTRODUCTION

In the present scenario, data mining/machine learning is a very important field of research. It plays an indispensable role in educational institutions and is one of the most important areas of exploration, with the aim of finding relevant facts in historical data stored in huge datasets. Data mining for education, i.e. Educational Data Mining (EDM), is the discipline that applies data mining techniques in the environment of education. It is a very important research area that helps to extract useful information from educational databases in order to improve educational achievement and to better assess the students' learning process. Educational Data Mining can be considered both a part of the science of learning and a branch of data mining [1][2][3]. Educational Data Mining can be useful for creating models of user perception, action and trial [4]. Data mining, or knowledge discovery, has gained popularity to such an extent that it has become a field of emerging relevance, because it is very helpful for examining data from divergent approaches and summarizing it into functional information [5]. Educational data mining relies on many data mining techniques, such as k-nearest neighbor, neural networks, decision trees, support vector machines, naive Bayes and many more [6]. For quick analysis of data with data mining techniques, there are many freely available software tools, such as WEKA, RapidMiner, Orange, KNIME and SSDT (SQL Server Data Tools), designed for data investigation and for obtaining an understandable structure for future use. In this paper, we use WEKA (Waikato Environment for Knowledge Analysis), which is well suited to analyzing data and building a model to obtain predictive outcomes.

II. LITERATURE SURVEY

Most researchers have studied data mining for educational purposes to predict students' achievement. In [8], the performance of engineering students was judged with the help of the Decision Tree (DT) algorithm. Data on around 340 students was collected to predict their achievement in the first-year exams. The built model achieved only 60% accuracy on the training set. In [9], WEKA was used to predict the marks of final-year students based on the parameters of two different datasets. The two datasets had one piece of information in common, namely the variety of students, taken from one college course over the last four semesters. In [10], the author reviewed past research on predicting students' performance and on its analysis and assessment using different data mining techniques. In [11], the authors measured student performance using Decision Tree classification techniques and used an artificial neural network to build classifier models. The outcome was based on various traits used to foresee students' results, and analyzing each student's weaknesses and strengths may help to improve future performance. This study shows the efficacy of applying data mining methods to course rating data, and that such data can be mined for higher education. In [12], the authors present a study intended to help students and teachers improve the results of students who are at greater risk of not succeeding. Parameters such as attendance, seminar and assignment marks were collected from a very important resource, i.e. the previous database of students, to predict the students' performance at the end of the semester. The authors used the Naïve Bayes classification algorithm, which showed the highest accuracy compared to the other classification algorithms. The researchers in [13] carried out a comparative study of various decision tree algorithms and their influence on the data set chosen for education, in order to categorize education-related predictions about stakeholders, i.e. students. It mainly focuses on choosing the top-priority decision tree algorithms and explains each of them in detail. The results show that regression as well as classification methods are best because they are better suited to producing good results on a dataset that has already been tested. The researchers in [14] concluded with an idea for the better use of data mining techniques in predicting students' performance. They also provided the strong interpretation that, among data mining prediction algorithms, Decision Tree and Neural Network are the two prime methods highly recommended by researchers for predicting students' performance. The authors in [15] applied data mining techniques to find and evaluate future results and the factors that affect them.
978-1-7281-0211-5/19/$31.00 2019 ©IEEE 243

Authorized licensed use limited to: SVKM's NMIMS Mukesh Patel School of Technology Management & Engineering. Downloaded on August 26,2020 at 09:17:11 UTC from IEEE Xplore. Restrictions apply.
The author in [16] discussed the k-Nearest Neighbor (k-NN) algorithm, which plays an effective role in the accuracy of the classifier.

III. CLASSIFICATION TECHNIQUES
Data mining is a very favorable and constructive method for the decision-making process. Classification is a very simple and widely used data mining technique. Knowledge of the training data is essential to classification. The classification procedure has two phases:
• Development of a model using training data
• Evaluation of the model using testing data

On the basis of the algorithms used, the different methods of classification are:
• Statistical based algorithms: Statistical procedures normally have an explicit underlying probability model, which provides the probability of belonging to each class rather than just a simple classification.
  ◦ Correlation Analysis: A statistical method used to find the degree to which two numerically measured, continuous variables (e.g. age and weight) are associated with each other.
  ◦ Regression Analysis: This method describes how a dependent variable is numerically associated with an independent variable.
  ◦ Bayesian Model: This method differs from the frequentist technique, whose essence is to apply probability to data. Bayesian calculations instead go straight for the probability of the hypothesis.
• Distance based algorithms: Each item that maps to a particular class is considered similar to the other items already present in that class and can be differentiated from the items of other classes. There are two approaches to distance-based classification:
  ◦ Simple Approach: This method assumes that each class is represented by its center. A new item is assigned to the class with which it has the largest similarity value.
  ◦ K nearest neighbors: A non-parametric method that depends on a distance measure. All available cases are stored, and whenever a new case arrives, it is classified based on the distance function.
• Decision tree based algorithms: This method requires the construction of a tree to model the classification process. It involves two steps:
  a. Building the decision tree
  b. Applying the decision tree to the database
• Neural Network based algorithms: In this method, a model is created that provides a format for data representation. When a tuple is classified, all attributes of that tuple are fed into the graph.
• Rule based algorithms: In this method, classification is done on the basis of if-then-else rules.

Fig 1 Different Classification Methods (CLASSIFICATION divided into Statistical, Distance, Decision Tree, Neural Network and Rule Based methods)

IV. DATA SET DESCRIPTION
In this paper, we use the Kalboard 360 dataset, which lies in the domain of education and was gathered using a learning management system (LMS). Such a system gives users modern access to education-related resources through a device with an Internet connection.

The data is collected through a learner activity tracker tool called the experience API (xAPI). The xAPI is a major part of the training and learning architecture (TLA), which makes it possible to monitor learning progress and learners' actions, such as reading an article or watching a training video. The experience API helps learning activity providers to determine the learner, the activity and the objects that describe a learning experience. The dataset contains 16 features and 480 student records. The features fall into three main categories:
• Demographic features, such as gender and nationality
• Educational features, such as educational stage, grade level and section
• Psychological features, such as raised hands in class, opening resources, answering surveys by parents, and school satisfaction

V. IMPLEMENTATION OF CLASSIFIERS IN WEKA
A. Using J48 algorithm
J48 is an extended version of ID3. It adds features such as accounting for missing values, decision tree pruning and derivation of rules. It is an open source Java implementation of the C4.5 algorithm.
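C4.5, and therefore J48, chooses the attribute to split on by its gain ratio. The gain ratio is the information gain of the split divided by the entropy of the split itself. The sketch below is written in Python rather than WEKA's Java and computes that criterion for categorical attributes only. J48's handling of numeric attributes, missing values and pruning is left out, and the attribute names (`absence`, `gender`) and records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting rows on a categorical attribute."""
    total = len(rows)
    # Group the class labels by the value each row takes for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes that have many distinct values.
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Attribute a C4.5-style tree would test at the root (highest gain ratio)."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical student records and outcomes.
rows = [
    {"absence": "low", "gender": "M"},
    {"absence": "low", "gender": "F"},
    {"absence": "low", "gender": "M"},
    {"absence": "high", "gender": "F"},
    {"absence": "high", "gender": "M"},
    {"absence": "high", "gender": "F"},
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]
print(best_attribute(rows, labels, ["absence", "gender"]))
```

Here `absence` separates the two outcomes perfectly, so it wins over `gender`. A full tree would repeat the same choice recursively inside each partition.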

B. Using Support Vector Machine algorithm
The support vector machine (SVM) is a supervised machine learning approach that can be used for both classification and regression analysis, although it is mostly applied to classification problems. It separates the classes with the hyperplane that leaves the largest margin between them, and it can analyze large amounts of data to find hidden patterns.

C. Using Naïve Bayes algorithm
Naïve Bayes is a well-established algorithm for classification tasks. It achieves good results when used for text-based data analysis such as Natural Language Processing (NLP). It assumes that, given the class, the value of a particular feature is independent of the values of every other feature.

D. Using Random Forest algorithm
Random Forest is a flexible, easy-to-use supervised classification algorithm. As its name suggests, it creates a forest of many decision trees. Adding more trees generally makes the forest more robust and its accuracy more stable. In simple words, the algorithm builds multiple decision trees and merges their predictions to obtain a more stable and accurate result.

E. Using Multilayer Perceptron
The Multilayer Perceptron is a class of feedforward artificial neural network. It consists of at least three layers of nodes: an input layer, one or more hidden layers and an output layer. It generates a set of outputs from a set of inputs. Its layers of nodes are connected as a directed graph running from the input layer to the output layer, with each layer fully connected to the next. Networks of this kind with many hidden layers form the basis of deep learning techniques used in speech recognition, image recognition and machine translation.
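The max-margin idea in subsection B can be sketched as a linear SVM, assuming two classes labelled +1 and -1 and numeric features. WEKA's built-in SVM learner, SMO, is trained with sequential minimal optimization. This sketch instead swaps in Pegasos-style stochastic sub-gradient descent on the hinge loss because it is shorter to write. No kernel is used, and the points are hypothetical.

```python
import random

def train_linear_svm(points, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularised hinge loss.

    labels must be +1 or -1. A constant 1.0 is appended to every point so the
    last weight acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # shrink: gradient of the regulariser
            if margin < 1.0:
                # Inside the margin or misclassified: step towards y * x.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Side of the learned hyperplane on which x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical, linearly separable training data.
points = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print(w, predict(w, [5.0, 4.0]))
```

Only points that fall inside the margin change the weights. This is why the final hyperplane is determined by the points closest to it, the support vectors.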
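The conditional-independence assumption of subsection C lets the score of each class be computed as its prior probability times a product of per-feature probabilities. The following is a minimal sketch for categorical features with Laplace (add-one) smoothing, summing logarithms to avoid underflow. The feature names `hands` and `survey` are hypothetical, loosely modelled on the raised-hands and parent-survey features of Section IV, and the records are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature, class) -> Counter of values
    values_seen = defaultdict(set)        # feature -> distinct values seen in training
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[(feature, label)][value] += 1
            values_seen[feature].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, row):
    """Class maximising log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for feature, value in row.items():
            count = value_counts[(feature, cls)][value]
            # Add-one smoothing so an unseen value does not zero out the product.
            score += math.log((count + 1) / (n_cls + len(values_seen[feature])))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical records: H = high performer, L = low performer.
rows = [
    {"hands": "high", "survey": "yes"}, {"hands": "high", "survey": "yes"},
    {"hands": "high", "survey": "no"},  {"hands": "low", "survey": "no"},
    {"hands": "low", "survey": "no"},   {"hands": "low", "survey": "yes"},
]
labels = ["H", "H", "H", "L", "L", "L"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, {"hands": "high", "survey": "yes"}))
```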
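The two ingredients of subsection D are many trees and the merging of their outputs. They can be shown with a toy random forest in which every tree is a depth-one stump. Each tree is trained on a bootstrap sample and considers only a random subset of the features, and the forest predicts by majority vote. With a single split per tree, drawing the feature subset per tree is the same as drawing it per split. This is a sketch under these simplifications, not WEKA's RandomForest, and the feature names and records are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Depth-one tree: the (feature == value) test with the fewest training errors."""
    best = None
    for f in features:
        for v in sorted({row[f] for row in rows}):
            yes = [lab for row, lab in zip(rows, labels) if row[f] == v]
            no = [lab for row, lab in zip(rows, labels) if row[f] != v]
            yes_label = Counter(yes).most_common(1)[0][0]
            no_label = Counter(no).most_common(1)[0][0] if no else yes_label
            errors = (sum(lab != yes_label for lab in yes)
                      + sum(lab != no_label for lab in no))
            if best is None or errors < best[0]:
                best = (errors, f, v, yes_label, no_label)
    return best[1:]   # (feature, value, label_if_equal, label_otherwise)

def stump_predict(stump, row):
    f, v, yes_label, no_label = stump
    return yes_label if row[f] == v else no_label

def train_forest(rows, labels, n_trees=51, n_features=1, seed=0):
    rng = random.Random(seed)
    features = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Feature randomness: each tree considers only a random subset of features.
        subset = rng.sample(features, n_features)
        forest.append(train_stump(sample_rows, sample_labels, subset))
    return forest

def predict_forest(forest, row):
    # Merge the trees' predictions by majority vote.
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical records: H = high performer, L = low performer.
rows = [
    {"hands": "high", "visits": "high", "gender": "M"},
    {"hands": "high", "visits": "high", "gender": "F"},
    {"hands": "high", "visits": "high", "gender": "M"},
    {"hands": "low", "visits": "low", "gender": "F"},
    {"hands": "low", "visits": "low", "gender": "M"},
    {"hands": "low", "visits": "low", "gender": "F"},
]
labels = ["H", "H", "H", "L", "L", "L"]
forest = train_forest(rows, labels)
print(predict_forest(forest, {"hands": "high", "visits": "high", "gender": "M"}))
```

Some individual stumps are weak, because they only see the `gender` feature or a lopsided bootstrap sample. The vote over many of them is nonetheless stable.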
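The feedforward computation of subsection E is sketched below for a network with two inputs, two hidden nodes and one output, using sigmoid activations. Each layer's outputs become the next layer's inputs. The weights are hypothetical and hand-set so that the network computes XOR, a function that no single-layer perceptron can represent. WEKA's MultilayerPerceptron instead learns its weights by backpropagation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Fully connected layer: each node applies a sigmoid to a weighted sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-set (not trained) weights that make the network compute XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]   # hidden node 1 ~ OR, hidden node 2 ~ NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                   # output node ~ AND of the two hidden nodes
OUTPUT_B = [-30.0]

def mlp_forward(x):
    hidden = layer(x, HIDDEN_W, HIDDEN_B)      # input layer -> hidden layer
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]  # hidden layer -> output layer

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(mlp_forward(x), 3))
```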
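Section VI compares the classifiers on accuracy and on per-class TP rate, FP rate, precision, recall and F-measure. All of these are derived from a confusion matrix. The following is a minimal sketch, assuming a hypothetical three-class matrix (classes L, M and H) rather than any result from this paper. TP rate and recall are the same quantity, and accuracy is the share of instances on the diagonal, i.e. correctly classified instances divided by all instances.

```python
def per_class_metrics(matrix, classes):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # actually k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp   # predicted as k, actually another class
        tn = total - tp - fn - fp
        tp_rate = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                     if precision + tp_rate else 0.0)
        results[name] = {"TP rate": tp_rate, "FP rate": fp_rate, "Precision": precision,
                         "Recall": tp_rate, "F-measure": f_measure}
    accuracy = sum(matrix[k][k] for k in range(len(classes))) / total
    return accuracy, results

# Hypothetical confusion matrix (rows = actual class, columns = predicted class).
classes = ["L", "M", "H"]
matrix = [[8, 2, 0],
          [1, 6, 3],
          [0, 2, 8]]
accuracy, results = per_class_metrics(matrix, classes)
print(round(accuracy, 4), results["M"])
```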

VI. RESULTS AND DISCUSSION

The experiments were carried out on a selection of 163 instances. Five classification algorithms were used: Random Forest, Naïve Bayes, Multilayer Perceptron, Support Vector Machine and J48. Each has its own characteristics for classifying the data set. Table I shows the performance results of all classifiers obtained with WEKA, and Fig. 2 shows the accuracy of the classification techniques.

TABLE I. PERFORMANCE RESULT

Criteria | Random Forest | Naïve Bayes | Multilayer Perceptron | Support Vector Machine | DT-J48
Accuracy | 67.40% | 64.40% | 76.07% | 75.40% | 73.60%
Correctly Classified Instances | 110 | 105 | 124 | 123 | 120
Incorrectly Classified Instances | 53 | 58 | 39 | 40 | 43

Fig.2 Classifiers Accuracy Performance

In Table I, the Multilayer Perceptron classifier has more correctly classified instances than the other classifiers, which usually indicates the most accurate model. The graphical representation in Fig. 2 shows that the best classifier of students' performance on this dataset is the Multilayer Perceptron, which classifies more efficiently than the other classifiers.

Table I shows the accuracy of the five classifiers. The classifiers are also compared on further classification metrics: TP rate, FP rate, Precision, Recall and F-measure. These metrics are very important for judging the classifiers' accuracy, and they show that the Multilayer Perceptron classifier performs better than the other classifiers.

Fig.3. Classifiers Performance Metrics

VII. CONCLUSIONS

Data mining is of significant importance in educational institutions. The knowledge acquired through data mining techniques can be used to make successful and effective decisions that improve students' performance in education. The data set contains 163 instances and sixteen attributes. Five classifiers are applied in WEKA and compared on accuracy, and different error measures are used to determine the best classifier. The experimental results show that the Multilayer Perceptron performs best among the classifiers. In future work, more dataset instances will be collected and compared and analyzed with other data mining techniques, such as association and clustering.

VIII. SUGGESTIONS AND FUTURE PLAN

In future, related research could use classification and clustering applications to improve prediction results in terms of speed and accuracy in the field of education.

REFERENCES
[1] M. Goyal and R. Vohra, “Applications of Data Mining in Higher Education”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, March 2012.

[2] R. Huebner, “A survey of educational data mining research”, Research in Higher Education Journal, 2012.
[3] M. S. Mythili and A. R. Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering, Vol. 16, Issue 1, January 2014.
[4] S. Lakshmi Prabha and A. R. Mohamed Shanavas, “Educational data mining applications”, Operations Research and Applications: An International Journal (ORAJ), Vol. 1, No. 1, August 2014.
[5] C. Romero, S. Ventura and E. Garcia, “Data mining in course management systems: Moodle case study and tutorial”, Computers & Education, Vol. 51, No. 1, pp. 368-384, 2008.
[6] S. Ayesha, T. Mustafa, A. Sattar and M. Khan, “Data mining model for higher education system”, European Journal of Scientific Research, Vol. 43, No. 1, pp. 24-29, 2010.
[7] Weka: Data Mining Software in Java, University of Waikato. [Online]. Available: http://www.cs.waikato.ac.nz/ml/index.html.
[8] Z. J. Kovacic, “Early prediction of student success: Mining student enrollment data”, Proceedings of Informing Science & IT Education Conference (InSITE), 2010.
[9] I. Milos, S. Petar, V. Mladen and A. Wejdan, “Students’ success prediction using Weka tool”, INFOTEH-JAHORINA, Vol. 15, March 2016.
[10] P. Kavipriya, “A Review on Predicting Students’ Academic Performance Earlier, Using Data Mining Techniques”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 6, Issue 12, December 2016, ISSN: 2277-128X.
[11] N. Ankita and R. Anjali, “Analysis of Student Performance Using Data Mining Technique”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, Issue 1, January 2017.
[12] P. Shruthi and B. Chaitra, “Student Performance Prediction in Education Sector Using Data Mining”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 6, Issue 3, March 2016.
[13] S. K. Yadav, B. Bharadwaj and S. Pal, “Data Mining Applications: A Comparative Study for Predicting Student’s Performance”, International Journal of Innovative Technology & Creative Engineering (ISSN: 2045-711), Vol. 1, No. 12, December 2012.
[14] A. Mohamed Shahiri, W. Husain and N. Abdul Rashid, “A Review on Predicting Student’s Performance using Data Mining Techniques”, Procedia Computer Science, Vol. 72, pp. 414-422, Elsevier, 2015.
[15] K. Kohli and S. Birla, “Data Mining on Student Database to Improve Future Performance”, International Journal of Computer Applications, Vol. 146, No. 15, pp. 0975-8887, July 2016.
[16] Rashmi Agrawal, “Integrated Effect of k Nearest Neighbors and Distance Measures in k-NN Algorithms”, International Journal of Advances in Intelligent Systems and Soft Computing, Vol. 654, pp. 759-765, Springer, 2017.
[17] Rashmi Agrawal and Neha Gupta, “Educational Data Mining Review: Teaching Enhancement”, Privacy and Security Policies in Big Data, pp. 149-165, IGI Global, 2017.
