The Utilization of Naive Bayes
Abstract
One measure of a college's success is whether its students graduate on time and whether the
graduation rate is high every year. The timeliness of students' graduation can be influenced by
several factors. This study aims to determine the profile of students who graduated on time and
not on time under the graduation predicates set by the institution, and to identify the factors
influencing students' graduation. The study uses the Naive Bayes Classifier (NBC) to determine
the graduation pattern and a decision tree to determine the influencing factors. The NBC
calculation, carried out in RapidMiner, produced profiles of students who graduated on time and
late with the predicates less satisfactory, satisfactory, very satisfactory, and cum laude. In the
decision tree calculation, the highest gain values were obtained for the IPK3, IPS1, and IPK2
attributes. This research should be developed further by increasing the number of attributes and
data records, and a system should be built to predict the timeliness of students' graduation from
the patterns produced, so that it can help universities raise their graduation rate every year.
Keywords: Information Systems, Data Mining, UNW.
1. INTRODUCTION
Students are central to evaluating how successfully study programs are implemented in
tertiary institutions. The quality of a tertiary institution can be improved in various
ways, including increasing the quota of new students, improving students' academic and
non-academic achievement, and increasing the graduation rate each year. Ngudi Waluyo
University, for example, has a low graduation rate in the Pharmacy study program. The
ratio of new students to graduating students is very high because many students do not
graduate on time. Table 1 compares the number of graduates with the intake of new
students over the last four years in the Pharmacy study program.
Some previous studies which examined the graduation rate of students are shown in
Table 2.
Many factors contribute to high or low graduation rates, and they pose a problem for
colleges. In this research, data from the Pharmacy study program at Ngudi Waluyo
University for the years 2012, 2013, 2014, and 2015 are analyzed. The analysis measures
students' graduation by predicting whether graduation is on time or not on time, based
on the cumulative achievement index in the second, third, and fourth semesters,
classified according to the graduation predicates satisfactory, very satisfactory, and
cum laude. Several other attributes are also used: NIM (student ID number), name,
gender, scores in mathematics courses, PMB (school enrollment) test scores, students'
origin and place of birth, previous school, and parents' or guardians' occupations. The
C4.5 and Naive Bayes algorithms are applied not only to compare the accuracy of the two
algorithms but also to answer two questions: what profile do students who graduate on
time have, and what factors influence the timeliness of students' graduation? The aim
is to provide information for the institution so that it can
In a study conducted by Romadhona [2], the highest information gain value, 0.340, was
found for the 4th-semester achievement index (IPS-4), which made this attribute
eligible as the root. The study recommended increasing the number of training records
in subsequent studies to obtain better accuracy [2]. Subsequent research by Indah Puji
Astuti [3] found the highest information gain value for the parents' occupation
attribute, which was used as the root, followed by the attributes for region and type
of school of origin. In that study, the C4.5 algorithm achieved an accuracy of 82%. The
author recommended looking for other factors in addition to students' personal data,
for example academic factors, family economic conditions, and psychological factors, in
determining students' graduation [3].
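The root-selection rule described in these studies (choose the attribute with the highest information gain) can be illustrated with a minimal sketch. The records, attribute names (`ips4_band`, `gender`), and labels below are hypothetical toy values, not data from [2], [3], or this study. The sketch computes plain information gain as the text describes it; the full C4.5 algorithm additionally normalizes this value by the split information (gain ratio), which is omitted here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets produced by the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy records (assumed for illustration only)
ips4_band = ["high", "high", "low", "low", "high", "low"]
gender = ["F", "M", "F", "M", "F", "M"]
graduated = ["on_time", "on_time", "late", "late", "on_time", "late"]

gains = {
    "ips4_band": information_gain(ips4_band, graduated),
    "gender": information_gain(gender, graduated),
}
# The attribute with the largest gain becomes the root of the tree
root = max(gains, key=gains.get)
```

In this toy example the achievement-index band separates the two classes completely, so it is selected as the root, while gender yields only a small gain.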
This research differs from the previous studies above in the attributes used, which
extend those of the earlier studies. It was also conducted at a different institution
and determines graduation differently: here the graduation attribute is combined with
the graduation predicates less satisfactory, quite satisfactory, satisfactory, very
satisfactory, and cum laude. The data are processed with Naïve Bayes and the C4.5
decision tree. The two algorithms are combined according to their characteristics:
Naïve Bayes can predict future outcomes from the graduation pattern of students, while
the C4.5 decision tree can identify the most significant attributes in determining
graduation.
2.2.4. The Attributes of Semester Achievement Index 1-4 and GPA 2 - GPA 4
According to [1], the semester achievement index and GPA are the most influential values
in determining students' graduation; in NBC data processing, the achievement index has
the highest value. The higher the semester achievement index and GPA, the greater the
students' chance of graduating.
Graduates are categorized by predicate, including unsatisfactory, quite satisfactory,
satisfactory, very satisfactory, and cum laude. The graduation predicates are
categorized as in Table 3.
A data record X whose class was not yet known is shown in Table 7.
The Naive Bayes algorithm works as follows [10]:
1) First, read the training data in Table 6.
2) Second, calculate the mean and standard deviation of the predictor attributes in each
class. The result of this stage is shown in Table 8.
3) Third, compute the information gain value for each attribute and choose the largest
value from the calculation results. The entropy and gain calculation results are shown
in Table 11.
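Steps 1 and 2 above correspond to fitting a Gaussian Naive Bayes model on numeric attributes. The following is a minimal sketch under stated assumptions, not the RapidMiner procedure used in the study: the training rows, the attribute order (IPS1, IPK2, IPK3), the two class labels, and the helper names `fit_gaussian_nb`, `gaussian_pdf`, and `predict` are all hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Return, per class, the prior and a (mean, std) pair for every attribute."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    model = {}
    for y, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Sample variance; guard against a single-member class
            var = sum((x - mean) ** 2 for x in column) / max(len(column) - 1, 1)
            # Fall back to a small std so the density stays defined
            stats.append((mean, math.sqrt(var) or 1e-6))
        model[y] = (len(members) / len(rows), stats)
    return model

def gaussian_pdf(x, mean, std):
    """Normal density of x for the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def predict(model, row):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    scores = {}
    for y, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, std) in zip(row, stats):
            # Tiny constant avoids log(0) when the density underflows
            score += math.log(gaussian_pdf(x, mean, std) + 1e-300)
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical training rows in the order (IPS1, IPK2, IPK3)
train_rows = [(3.6, 3.5, 3.6), (3.4, 3.4, 3.5), (3.7, 3.6, 3.7),
              (2.6, 2.5, 2.6), (2.8, 2.7, 2.6), (2.5, 2.6, 2.7)]
train_labels = ["on_time", "on_time", "on_time", "late", "late", "late"]

model = fit_gaussian_nb(train_rows, train_labels)
prediction = predict(model, (3.5, 3.5, 3.6))
```

Working in log space keeps the product of many small densities from underflowing, which matters once the number of predictor attributes grows.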
3.1.5. Evaluation
After the data mining process was completed, an accurate model/pattern for predicting
the timeliness of students' graduation would be obtained using two algorithms, NBC and
C4.5. The Naive Bayes calculation produces graduation patterns based on the attributes
used, while the C4.5 decision tree identifies the most significant attributes in
determining students' graduation.
4. RESULT AND DISCUSSION
The data used in this study covered 147 students who graduated on time and 81 who
graduated late. The training data therefore comprised 228 graduates, and five graduates
were used as testing data to determine the accuracy of the NBC and C4.5 algorithms.
Fifteen attributes were used as parameters, of which 14 were predictors and 1 was the
result (class) attribute.
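Accuracy on the testing data is the share of test records whose predicted label matches the actual label. The sketch below shows the calculation; the helper name `accuracy` and both label lists are hypothetical and do not reproduce results from this study.

```python
def accuracy(predicted, actual):
    """Fraction of test records whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical labels for five test records (assumed for illustration only)
actual = ["on_time", "on_time", "late", "on_time", "late"]
predicted = ["on_time", "late", "late", "on_time", "late"]
score = accuracy(predicted, actual)  # 4 of 5 labels match
```

With only five test records, accuracy can change only in steps of 20 percentage points, so a single misclassification shifts the reported figure substantially.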
The results of the NBC experiment (Table 12) show that the label on-time C (graduated
on time with a predicate of very satisfactory) scored higher than the other labels.
Table 13 shows students' profiles based on the NBC calculation.
The Naïve Bayes calculation in RapidMiner produced the students' profiles shown in
Table 14. These profiles can be used as a pattern to find out whether students will
graduate on time or late with predicate A, B, C, or D. In this study, using the two
algorithms, Naïve Bayes and decision tree, gave more complete results than previous
studies. Naïve Bayes is used for future predictions from the patterns generated for
graduates, and the C4.5 decision tree is used to determine the attributes that play the
largest role in students' graduation.
6. REFERENCES
[1] Nasution, N., Djahara, K., & Zamsuri, A. (2015). Evaluasi Kinerja Akademik
Mahasiswa Menggunakan Algoritma Naïve Bayes (Studi Kasus: Fasilkom
Unilak). Digital Zone: Jurnal Teknologi Informasi dan Komunikasi, 6(2), 1-11.
[2] Romadhona, A., Suprapedi, & Himawan, H. (2017). Prediksi Kelulusan
Mahasiswa Tepat Waktu Berdasarkan Usia, Jenis Kelamin, Dan Indeks Prestasi
Menggunakan Algoritma Decision Tree. Jurnal Teknologi Informasi, 13(1),
69-83.
[3] Astuti, I. P. (2017). Prediksi Ketepatan Waktu Kelulusan Dengan Algoritma Data
Mining C4.5. Fountain of Informatics, 2(2), 41-45.
[4] Andri, Kunang, Y. N., & Murniati, S. (2013). Implementasi Teknik Data Mining
Untuk Memprediksi Tingkat Kelulusan Mahasiswa Pada Universitas Bina Darma
Palembang. Seminar Nasional Informatika. Yogyakarta.
[5] Erdogan, S. M. (2005). A Data Mining Application In A Student Database.
Journal Of Aeronautics And Space Technologies, 2(2), 53-57.
[6] Sidik, M., Rasminto, H., Iriani, A., & Manongga, D. (2017). Implementasi Data
Mining Untuk Prediksi Kelulusan Menggunakan Metode Klasifikasi Naive
Bayes. Jurnal Teknologi Informasi dan Komunikasi, 8(2), 13-20.
[7] Daniel, L. T. (2006). Data Mining Methods and Models. John Wiley & Sons, Inc
Publication.
[8] Cahyaningtyas, C., Purnomo, H. D., & Kristianto, B. (2019). The Use of Naive
Bayes for Broiler Digestive Tract Disease Detection. JITCE (Journal of
Information Technology and Computer Engineering), 03, 1-7.
[9] Kusrini, &. E. (2009). Algoritma Data Mining. Yogyakarta: Andi Publishing.
[10] Saputra, M. F., Widiyaningtyas, T., & Wibawa, A. P. (2018). Illiteracy
Classification Using K Means - Naive Bayes Algorithm. International Journal On
Informatics Visualization, 2(3), 153.
[11] Lungu, I., & Pirjan, A. (2010). Research Issues Concerning Algorithms Used For
Optimizing The Data Mining Process. IDEAS.