

Journal of Information Processing Vol.23 No.2 192–201 (Mar. 2015)
[DOI: 10.2197/ipsjjip.23.192]

Regular Paper

A Predictive Model to Evaluate Student Performance

Shaymaa E. Sorour1,2,a) Tsunenori Mine3,b) Kazumasa Goda4,c) Sachio Hirokawa5,d)


Received: June 27, 2014, Accepted: December 3, 2014

Abstract: In this paper we propose a new approach to predicting student performance based on text mining techniques, using LSA (latent semantic analysis) and K-means clustering methods. The present study uses free-style comments written by students after each lesson. Since these comments can reflect student learning attitudes, understanding of subjects, and difficulties with the lessons, they enable teachers to grasp the tendencies of student learning activities. To improve our basic approach using LSA and K-means, overlap and similarity measuring methods are proposed. We conducted experiments to validate the proposed methods. The experimental results yielded a model of student academic performance that uses the comment data as predictor variables. Our proposed methods achieved an average 66.4% prediction accuracy with the K-means clustering method alone, and 73.6% and 78.5% after adding the overlap method and the similarity measuring method, respectively.

Keywords: comments data mining, latent semantic analysis (LSA), similarity measuring method, overlap method

1 Faculty of Specific Education, Kafr Elsheik University, Kafr Elsheikh, Egypt
2 Graduate School of Information Science and Electrical Engineering, 744 Motooka Nishiku, Fukuoka, Japan
3 Faculty of Information Science and Electrical Engineering, Kyushu University, 744 Motooka Nishiku, Fukuoka, Japan
4 Kyushu Institute of Information Science, 6–3–1 Saifu, Dazaifu, Fukuoka, Japan
5 Research Institute for Information Technology, Kyushu University, 6–10–1 Hakozaki Higashi-ku, Fukuoka, Japan
a) [email protected]   b) [email protected]   c) [email protected]   d) [email protected]

1. Introduction

Recently, many researchers have turned their attention to explaining and predicting learners' performance, and have contributed to the related literature. By and large, researchers in this field advocate novel and smart solutions to improve performance [15]. Learners' performance assessment should therefore not be viewed as separate from the learning process; it is a continuous and integral part of learning processes [10]. By revealing what students already know and what they need to learn, assessment enables teachers to build on existing knowledge and provide appropriate scaffolding [25]. If such information is timely and specific, it can serve as valuable feedback to both teachers and students and thereby improve student performance.

Yet interpreting assessment in the learning environment remains a challenge for many reasons. Most teachers lack training in the assessment of understanding beyond the established testing culture. Externally designed tests offer limited information due to less varied and frequent assessment, as well as delayed and coarse-grained feedback [27]. The solution to these problems is to grasp all the class members' learning attitudes and tendencies of learning activities. Teachers can give advice through careful observation, but it is a hard task to grasp all the class members' learning attitudes over all the periods in the semester.

Goda et al. [12], [13] proposed the PCN method to estimate learning situations from comments freely written by students. The PCN method categorizes the comments into three items: P (Previous activity), C (Current activity), and N (Next activity). Item P indicates the learning activity before the class time, item C shows the understanding and achievements of class subjects during the class time, and item N expresses the learning activity plan until the next class. These comments have vital roles in educational environments. For example, the comments help students communicate with their teacher indirectly, and provide many clues or hints to the teacher for improving his/her lessons. Each student writes his/her comments after a lesson; the student looks back upon his/her learning behavior and situation, and can express his/her attitudes, difficulties, and any other information that might help a teacher estimate his/her learning activities. However, Refs. [12], [13] did not discuss the prediction of final student grades. In this paper we first propose a basic method for predicting student grades using comment data with item C (C-comment for short) from the PCN method. The basic method uses the LSA technique to extract semantic information from student comments by using statistically derived conceptual indices instead of individual words, then classifies the obtained results into 5 groups according to their grades by using the K-means clustering method. The basic method achieves an average 66.4% prediction accuracy of student grades. To improve the prediction accuracy, we additionally propose the overlap and similarity measuring methods.

Experiments were conducted to validate our newly proposed methods; the results illustrated that the proposed methods achieved 73.6% and 78.5% prediction accuracy of student grades by the overlap and the similarity measuring methods, respectively. The contributions of our work are the following:
• The LSA technique is adopted to analyze patterns and relationships between the extracted words and latent concepts
c 2015 Information Processing Society of Japan 192
contained in an unstructured collection of texts (student comments); we classify the results obtained after applying LSA into 5 groups according to student grades by using the K-means clustering method.
• The similarity measuring method is proposed to calculate the similarity between a new comment and the comments in the nearest cluster, which is created in the training phase.
• The overlap method is introduced for a stable evaluation that allows accepting the grade adjacent to the original grade corresponding to the 5-grade categories. To this end, we classify student marks into 9 grades.
• Experiments were conducted to validate the proposed methods (basic, overlap, and similarity measuring) by calculating the F-measure and the accuracy of each method in estimating final student grades. The experimental results illustrate the validity of the proposed methods.

The rest of the paper is organized as follows. Section 2 gives an overview of related literature. Section 3 introduces the overview of our research and the procedures of the proposed methods. Section 4 describes the methodology of our proposed methods. Section 5 discusses the highlighted experimental results. Finally, Section 6 concludes the paper and describes our future work.

2. Related Work

The ability to predict student performance is very important in educational environments. Increasing students' success in their learning environment is a long-term goal in all academic institutions. In recent years, there has been growing interest in employing educational data mining (EDM) techniques to conduct the automatic analysis and prediction of learner performance [2], [5], [8], [14], [24]. An emerging trend in EDM is the use of text mining, an extension of data mining to text data [19], [20], [26]. Many researchers have successfully used text mining techniques to analyze large amounts of textual data in business, health science, and educational domains [11], [21], [28], [29]. In our research we focus on using text mining in the education field to predict student grades.

2.1 Educational Text Mining

Text mining focuses on finding and extracting useful or interesting patterns, models, directions, trends, or rules from unstructured text such as text documents, HTML files, chat messages, and emails. The major applications of text mining include automatic classification (clustering), information extraction, text summarization, and link analysis [3]. As an automated technique, text mining can be used to efficiently and systematically identify, extract, manage, integrate, and exploit knowledge for research and education [1].

Currently, there are only a few studies about how to use text mining techniques to analyze learning-related data. For example, Tane et al. [28] used text mining (text clustering techniques) to group e-learning resources and documents according to their topics and similarities. Antai et al. [4] classified a set of documents according to document topic areas by using the CLUTO program with and without LSA. The results showed that the internal cluster similarity with LSA was much higher than that without LSA. In addition, Hung [16] used clustering analysis as an exploratory technique to examine the e-learning literature and visualized patterns by grouping sources that shared similar words and attribute values. Minami et al. [23] analyzed student attitudes towards learning and investigated how the attitudes affect final student evaluation; they pursued a case study of lecture data analysis of the correlations between student attitudes to learning, such as attendance and homework (effort), and student examination scores (achievement); they analyzed the students' own evaluation of themselves and of lectures based on a questionnaire. Through this study, Minami et al. showed that a lecturer could give feedback to students who tended to over-evaluate themselves, and let the students recognize their real positions in the class.

Previous studies show that we need to understand individual students more deeply and recognize student learning status and attitude in order to give them feedback. We need to comprehend student characteristics by letting students describe their own educational situations, such as understanding of subjects, difficulties in learning, learning activities in the classroom, and their attitude toward the lesson. Researchers have used various classification methods and various data in their studies to predict student academic performance.

Different from the above studies, Goda et al. [13] proposed the PCN method to estimate student learning situations on the basis of their freestyle comments written just after the lesson. The PCN method categorizes their comments into three items, P (Previous), C (Current), and N (Next), so that it can analyze the comments from the point of view of their time-oriented learning situations. Goda et al. [12] also conducted another study on using PCN scores to determine the level of validity of assessment based on student comments, and showed strong correlations between the PCN scores and the prediction accuracy of final student grades. They employed multiple regression analysis to calculate PCN scores. Their results indicated that students who wrote comments with high PCN scores were considered to be those who described their learning attitude appropriately. In addition, applying a machine learning method, the support vector machine (SVM), they illustrated that as student comments got higher PCN scores, the prediction performance of their grades became higher. Goda et al., however, did not discuss the prediction performance of final student grades.

The current study is an extension of Goda et al. [12]; we focus on the accuracy of prediction of final student grades. Using C-comments from the PCN method, we try to predict student grades in each lesson and discuss the changes in accuracy and F-measure over a sequence of lessons.

3. Overview of the Prediction Method

3.1 PCN Method and Student Grade

Grasping student lesson attitudes and learning situations and giving feedback to each student are educational foundations. Goda et al. [13] proposed the PCN method to estimate learning situations from comments freely written by students. Each student described his/her learning tendency, attitudes, and understanding



Table 1 Viewpoint categories of student comment [13].
  P (Previous): The learning activity before the class time, such as review of the previous class and preparation for the coming class.
    Example: "I read chapter 3 of the textbook."
  C (Current): The understanding and achievements of class subjects during the class time.
    Example: "I was completely able to understand the subject of this lesson and have the confidence to make others similar to the ones I learned in this lesson."
  N (Next): The learning activity plan until the next class.
    Example: "I will make preparation by next class."
  O (Other): Other descriptions.

for each lesson according to four items: P, C, N and O. The explanations and examples of the items are shown in Table 1.

Comment data were collected from 123 students in two classes: 60 in Class A and 63 in Class B. They took Goda's courses, which consisted of 15 lessons. In this research, we use the student comments collected for the last half, from lessons 7 to 15. The main subject and the contents differ between lessons 1 to 6 and lessons 7 to 15: the main subject of lessons 1 to 6 is computer literacy, while from lessons 7 to 15 students begin to learn the basics of programming.

To predict student grades from comment data, 5-grade categories are used to classify student marks. The assessment of each student was done by considering the average mark of three assigned reports and his/her attendance rate. Figure 1 displays the number of students in each grade according to the range of the marks. For example, the number of students in grade A is 41 and their marks are between 80 and 89.

Fig. 1 The relation between the grades and the range of the marks.

3.2 Procedures of the Basic Prediction Method

This research aims to predict student performance by analyzing C-comment data. In order to generate a correlation between the comment data and the student grade, Fig. 2 displays the overall procedures of the proposed method, which we call the Basic Prediction Method. The procedures of the method are based on five phases: (1) Comment data collection, (2) Data preparation, (3) Training phase, (4) Noisy data detection, (5) Test phase. The details of these phases are as follows:

Fig. 2 Procedures of the proposed method.

Table 2 Number of comments.
  Lesson:   7    8    9   10   11   12   13   14   15
  Number:  104  103  107  111  107  109  107  111  121

( 1 ) Comment Data Collection: This phase focuses on collecting student comments after each lesson. In our research, we collected C-comments from the PCN method. The C-comment indicates student understanding and achievements of class subjects during the class time. In addition, it has a stronger correlation with the prediction accuracy than the P- and N-comments [12]. Although we have two class data sets in each lesson, we combined them to increase the number of comments in each grade; some students did not submit comments because they did not write any or were absent. Table 2 displays the real number of comments in each lesson. The number of words appearing in the comments is about 1,400 in each lesson, and the number of distinct words in each lesson is over 430.

( 2 ) Data Preparation: The data preparation phase covers all the activities required to construct the final data set from the initial raw data. This phase includes the following steps:
( a ) Analyze the C-comments and extract words and parts of speech with the MeCab program (http://sourceforge.net/projects/mecab/), a Japanese morphological analyzer designed to extract words and identify their part of speech (verb, noun, adjective, and adverb).
( b ) Calculate the occurrence frequencies of words in comments and apply a log-entropy term weighting method so as to balance the effect of the occurrence frequency of words in all the comments. The detail is described in Section 4.1.
( c ) Employ LSA to analyze patterns and relationships between the extracted words and latent concepts contained in an unstructured collection of texts (student comments). We call the obtained results LSA results. The details are described in Section 4.2.

( 3 ) Training Phase: This phase builds the prediction models of student grades by classifying LSA results into 5 clusters. The model identifies the center of each cluster and the grade



that most frequently appears in the cluster. We call that grade the dominant grade of the cluster.

( 4 ) Test Phase: The test phase evaluates the performance of the prediction models by calculating Accuracy and F-measure. This phase first extracts words from a new comment, and transforms the extracted word vector of the comment into a set of K-dimensional vectors by using LSA. To evaluate the prediction performance, 10-fold cross validation was used: 90% of the comments were used as training data to construct a model, the model was then applied to the remaining 10% of the comments as test data, and the predicted values were compared with the original data. The procedure was repeated 10 times and the results were averaged. To improve the prediction accuracy of student grades, we employed the similarity measuring and overlap methods in addition to our basic method. The details of the two methods are described in Sections 4.4 and 4.5, respectively.

( 5 ) Noisy Data Detection: Outlier analysis can be used to detect data that adversely affect the results. In this paper, we detect outliers in two phases: the training phase and the test phase. As a threshold to check for outliers in a cluster, we use the standard deviation (Sd) of the distances between each member and the center of the cluster in the training phase, and the average of the distances between each member and the center of the cluster in the test phase. The detail is described in Section 4.3.

4. Methodology

This section describes our methodology for predicting student performance from free-style comments.

4.1 Term Weighting of Comments

In preparing for LSA, the C-comments are transformed into a standard word-by-comment matrix [6] by extracting words from them. This word-by-comment matrix, say A, is comprised of m words w_1, w_2, ..., w_m in n comments c_1, c_2, ..., c_n, where the value of each cell a_ij indicates the total occurrence frequency of word w_i in comment c_j.

To balance the effect of word frequencies in all the comments, a log-entropy term weighting method is applied to the original word-by-comment matrix, which is the basis for all subsequent analyses [17]. We apply a global weighting function to each nonzero element a_ij of A. The global weighting function transforms each cell a_ij of A by a global term weight g_i of w_i for the entire collection of comments. Here g_i is calculated as follows:

  g_i = 1 + \left( \sum_{j=1}^{n} p_{ij} \log(p_{ij}) \right) / \log(n)    (1)

where p_{ij} = L_{ij} / gf_i, L_{ij} = \log(tf_{ij} + 1), tf_{ij} is the number of occurrences of w_i in c_j, gf_i is the number of occurrences of word w_i in all comments, and n is the number of all comments.

4.2 Latent Semantic Analysis

Latent semantic analysis (LSA) is a computational technique that contains a mathematical representation of language. During the last twenty years its capacity to simulate aspects of human semantics has been widely demonstrated [18]. LSA is based on three fundamental ideas: (1) to begin to simulate the human semantics of language, we first obtain an occurrence matrix of the terms contained in a comment; (2) the dimensionality of this matrix is reduced using singular value decomposition, a mathematical technique that effectively represents abstract concepts; and (3) any word or text is represented by a vector in this new latent semantic space [7], [18].

4.2.1 Singular Value Decomposition

LSA works through singular value decomposition (SVD), a form of factor analysis. The singular value decomposition of A is defined as:

  A = U S V^T    (2)

where U and V are the matrices of the term vectors and document vectors, and S = diag(r_1, ..., r_n) is the diagonal matrix of singular values. To reduce the dimensions, we can simply choose the k largest singular values and the corresponding left and right singular vectors; the best approximation of A with a rank-k matrix is given by

  A_k = U_k S_k V_k^T    (3)

where U_k is comprised of the first k columns of the matrix U, V_k^T is the first k rows of the matrix V^T, and S_k = diag(r_1, ..., r_k) contains the first k factors. The matrix A_k captures most of the important underlying structure in the association of terms and documents while ignoring noise due to word choice [30].

When LSA is applied to a new comment, a query (a set of words, like the new comment) is represented as a vector in the k-dimensional space. The new comment query can be represented by

  \hat{q} = q^T U_k S_k^{-1}    (4)

where q and \hat{q} are, respectively, the vector of words in the new comment multiplied by the appropriate word weights, and the k-dimensional vector transformed from q. The sum of these k-dimensional word vectors is reflected in the term q^T U_k in the above equation. The right multiplication by S_k^{-1} differentially weights the separate dimensions. Thus the query vector is located at the weighted sum of its constituent term vectors [6].

4.2.2 Feature Selection and Semantic Feature Space

Choosing the number of dimensions k for the matrix A is an interesting problem. While a reduction in k can remove much of the noise, keeping too few dimensions or factors may lose important information [7]. In our study, we propose a method based on analyzing the first four dimensions of U, S and V from the comment data. We evaluated the first four columns of the U results and confirmed that they showed the relation between the meaning of each column and the higher-weight words. Therefore, we can predict student performance with more accuracy by employing the K-means clustering method [31]. Tables 3 and 4 show the meaning and the higher-weight words of the first four columns after analyzing the U results, taking lesson 7 as an example. Words in the first column include the subject of lesson 7, entitled "An



Table 3 Meaning of dimensions.
  First:  Main subject and learning status
  Second: Students' learning attitudes
  Third:  Topics in the lesson
  Fourth: Learning rate and student's behavior

Table 4 Standard words for lesson 7.
  First                 Second               Third                Fourth
  Word         Weight   Word        Weight   Word        Weight   Word        Weight
  Procedure     0.353   Be able to   0.732   Symbol       0.504   Early        0.448
  Language      0.334   Make         0.721   Meaning      0.504   First        0.441
  Symbol        0.346   Learning     0.438   Various      0.503   Training     0.411
  Programming   0.321   Procedure    0.363   Class        0.441   Myself       0.387
  Learning      0.287   Myself       0.346   Time         0.373   Beginning    0.338
  Difficulty    0.284   Study        0.340   Terminal     0.37    Good         0.332
  Use           0.274   High         0.333   Sufficient   0.358   Make         0.323
  Easy          0.265   Interest     0.323   Compare      0.357   End          0.32
  Treatment     0.248   Theory       0.322   Program      0.338   High         0.303
  Knowledge     0.237   Good         0.304   Save         0.337   Finish       0.209

Introduction to C programming language," and learning status words such as "understand" or "difficult." In the second column, words related to student learning attitudes for the lesson take higher weight. In the third column, the higher-weight words are topics in the lesson, such as "symbol," "compare," "save," or "function." In the fourth column, the higher-weight words are related to the learning time or rate, such as "early," "first," "full," and "take time," or to circumstances and behaviors performed, such as "first time," "practical training," or "follow." According to the previous analysis, we can conclude that the first four dimensions have strong enough context to predict student grades with high accuracy.

4.3 Noisy Data Detection

Outlier detection discovers data points that are significantly different from the rest of the data [22]. In this paper, outliers are detected in two phases: the training phase and the test phase. We call such outliers noisy data from the point of view of grade prediction.

4.3.1 Noisy Data Detection in the i-th Cluster in the Training Phase

We define the noisy data of the i-th cluster in the training phase. We calculate Sd for each cluster as a threshold to check for noisy data.
( 1 ) Sd for each cluster
( a ) For each cluster, say the i-th cluster, calculate the centroid C_i of the cluster by finding the average value of the K-dimensional vectors (KDV) transformed from the comments in the cluster:

  C_i = \frac{\sum_{k=1}^{n_i} s_{k,i}}{n_i}    (5)

where s_{k,i} and n_i are the k-th singular vector representing a comment and the number of comments in the i-th cluster, respectively.
( b ) Calculate the standard deviation Sd_i for the cluster. The higher Sd_i is, the lower the semantic coherence is [9]:

  Sd_i = \sqrt{\frac{\sum_{k=1}^{n_i} (s_{k,i} - C_i)^2}{n_i}}    (6)

( 2 ) Noisy data detection
Let s_{k,i} be the k-th member of the i-th cluster; if |s_{k,i} - C_i| > Sd_i, then s_{k,i} is noisy data of the cluster; otherwise s_{k,i} is not noisy data of the cluster.

4.3.2 Noisy Data Detection in the i-th Cluster in the Test Phase

In the test phase, instead of the standard deviation, we use the average distance between each comment and the cluster center as a threshold to detect noisy data. This is because, in our preliminary experiments, the prediction accuracy with the average distance became higher than that with the standard deviation.

We define the noisy data of the i-th cluster in the test phase as follows:
• Let C_i, s_{k,i} and d_{i,ave} be the centroid of the i-th cluster, the k-th member of the cluster, and the average distance between the members of the cluster and C_i, respectively; if |s_{k,i} - C_i| > d_{i,ave}, then s_{k,i} is noisy data for the i-th cluster; otherwise s_{k,i} is not noisy data for the i-th cluster.

We separated off about 10% to 15% of the comment data as noisy data.

4.4 Similarity Measuring Method

The similarity measuring method is proposed to refine the basic prediction method and improve the prediction accuracy of final student grades. We measure the similarity by calculating cosine values between a new comment and each member of the identified cluster with the following equation:

  Similarity = \frac{S_{new} \cdot S_k}{\|S_{new}\| \, \|S_k\|} = \frac{S_{new} \cdot S_k}{\sqrt{\sum_{i=1}^{k} S_{new,i}^2} \sqrt{\sum_{i=1}^{k} S_{k,i}^2}}    (7)

where S_k is the k-th member of the cluster and S_{new} is the new comment.

After identifying the nearest cluster center to the new comment, we measure the similarity by calculating cosine values between the new comment S_{new} and each member S_k of the identified cluster, and then return, as the estimated grade of S_{new}, the grade of the S_k that gets the maximum cosine value among all members of the cluster. This similarity measuring method is used in the Test Phase.

4.5 Overlap Method

To predict student grades from comment data, 5-grade categories are used to classify student marks. The method considers a prediction correct only if a grade estimated within the 5-grade categories is the actual grade of the student. We call this method the 5-grade prediction method.

In this paper, in addition to the 5-grade categories, we use 9-grade categories so that we can allow the acceptance of a different grade adjacent to the original grade in the 5-grade categories of a mark range, i.e., we make one mark range correspond to two grades in the 5-grade categories. We call this method the overlap method, or the 9-grade prediction method by contrast with the 5-grade prediction method. Tables 5 and 6 show the correspondence between the 5- and 9-grade categories and the ranges of student marks. For example, assume a student's mark is 87; the grade of the mark in the 5-grade categories is A, and in the 9-grade categories


Table 5 5-grades.
  Grade:    S       A      B      C      D
  Mark:     90–100  80–89  70–79  60–69  0–59
  #Student: 21      41     23     17     21

Table 6 9-grades.
  Grade:    S       AS     AB     BA     BC     CB     CD     DC     D
  Mark:     90–100  85–89  80–84  75–79  70–74  65–69  60–64  55–59  0–54
  #Student: 21      35     6      10     13     9      8      2      19

is AS; AS corresponds to two grades, A and S, in the 5-grade categories.

In the 9-grade prediction method, we consider the prediction correct if the estimated grade is either A or S. The reasons why we adopt the overlap method are the following: the learning status of students with an upper mark in a grade and that of others with a lower mark in the grade one above it are not so different from the point of view of the observing teacher. Therefore it is worth noting that handling the two adjacent grades as one grade sometimes helps a teacher grasp students' real learning situations and give stable evaluations to students. For example, the mark range of grade AS is from 85 to 89, which is closer to the lowest mark (90) of grade S than to the lowest mark (80) of grade A. The overlap method is used in the Test Phase.

5. Experimental Results

5.1 Measures of Student Grade Prediction

In our experiment, we evaluated the prediction results by 10-fold cross validation and ran evaluation experiments according to 4 values, TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative), and calculated Precision, Recall, F-measure, and Accuracy in each lesson as follows.

Let G be the 5-grade categories (S, A, B, C and D), and X be a subset of G; let obs(s_i, X) be a function that returns 1 if the grade of student s_i is included in X, and 0 otherwise, where 1 ≤ i ≤ n and n is the number of students; let pred(s_i) be a function that returns a set of grade categories only including the predicted grade for student s_i; !pred(s_i) returns the complement of pred(s_i).

  TP = {s_i | obs(s_i, pred(s_i)) = 1}
  FP = {s_i | obs(s_i, pred(s_i)) = 0}
  TN = {s_i | obs(s_i, !pred(s_i)) = 1}
  FN = {s_i | obs(s_i, !pred(s_i)) = 0}

  Precision = \frac{TP}{TP + FP}
  Recall = \frac{TP}{TP + TN}
  F\text{-}measure = 2 * \frac{Precision * Recall}{Precision + Recall}
  Accuracy = \frac{TP + FN}{TP + TN + FP + FN}

Actually, FP and TN are important values and affect the prediction results. FP has a strong relation with Precision, and TN with Recall. As FP increases, we may more often pick up students of another grade, say (S) or (A), as students of a target grade, say (D). We often want to take care of low-level students; in that case, we need to detect all of them. As the value of TN becomes higher, we may miss more of them. Our study therefore reports the prediction results by calculating Precision, Recall, Accuracy, F-measure, and standard deviation for the 3 methods: the basic prediction method, the overlap method, and the similarity measuring method.

5.2 Effect of the Basic Prediction Method

5.2.1 Training Phase

According to the training phase of the basic prediction method described in Section 3.2 (3), we built a prediction model.

Fig. 3 Results of classifying students.

Figure 3 (a) displays the results for lesson 7. Grade S accounts for about 54% in Cluster 1, grade A about 61% in Cluster 2, grade B about 43% in Cluster 3, grade C about 45% in Cluster 4, and, finally, grade D about 53% in Cluster 5. Grades S, A, B, C, and D are the dominant grades of Clusters 1, 2, 3, 4, and 5, respectively. We analyzed each lesson from 8 to 15 as well.

5.2.2 Test Phase

We conducted student grade prediction according to the steps described in Section 3.2 (4) and evaluated the prediction performance by 10-fold cross validation. Figure 3 (b) presents the results of student grade prediction: (Cluster 1, S=53%), (Cluster 2, A=54%), (Cluster 3, B=52%), (Cluster 4, C=63%), (Cluster 5, D=47%).


© 2015 Information Processing Society of Japan
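Read per target grade, the counts above reduce to familiar one-vs-rest confusion counts; note that the paper's TN set collects the students whose true grade falls outside the predicted set (the missed detections), so Recall = TP/(TP + TN) plays the role of the conventional TP/(TP + FN). The following sketch (hypothetical data and names, not the authors' code) computes the metrics for one target grade under that reading:

```python
def grade_metrics(true_grades, pred_grades, target):
    """One-vs-rest confusion counts and metrics for a single target grade.

    true_grades / pred_grades: equal-length lists such as ["S", "A", "D", ...]
    target: the grade category being evaluated, e.g. "D".
    """
    pairs = list(zip(true_grades, pred_grades))
    tp = sum(1 for t, p in pairs if t == target and p == target)   # correct detections
    fp = sum(1 for t, p in pairs if t != target and p == target)   # other-grade students picked up
    miss = sum(1 for t, p in pairs if t == target and p != target) # target students missed
                                                                   # (the paper's TN set)
    rest = sum(1 for t, p in pairs if t != target and p != target)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + miss) if tp + miss else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + rest) / len(pairs)
    return precision, recall, f_measure, accuracy
```

For example, with true grades ["D", "D", "S", "A"], predictions ["D", "S", "S", "A"], and target "D", one D student is detected and one is missed, giving Precision 1.0 and Recall 0.5.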

5.3 Effect of LSA
We checked the effect of LSA from lessons 7 to 15, using all the comment data. As shown in Fig. 4, the average prediction accuracy of the basic prediction method without LSA was between 19.0% and 26.4%, much lower than with LSA, where it was between 59.0% and 71.0%. In addition, adding the overlap method to the basic prediction method with LSA raised the average prediction accuracy to between 71.0% and 76.0%.

Fig. 4 The effect of LSA.

5.4 Effect of Noisy Data Detection
To examine the effect of filtering noisy data, we calculated the average prediction accuracy and F-measure of student grades before and after detecting noisy data. The results are shown in Fig. 5. The prediction accuracy was between 59.0% and 71.0% for all data, and rose to between 63.5% and 74.0% after detecting noisy data, as shown in Fig. 5 (a). Also, the average F-measure over all lessons was 48.1%, and after removing noisy data it became 55.8%, as shown in Fig. 5 (b). The highest accuracy results were obtained in lessons 7 and 12, and the lowest in lessons 8 and 14.

Fig. 5 Prediction results (with and without noisy data detection) from lessons 7 to 15.

In addition, Fig. 6 focuses on the prediction results of each grade after detecting noisy data. We can see that grade A obtained the highest prediction accuracy and F-measure, and grade D the lowest.

Fig. 6 The results of accuracy and F-measure in each grade after detecting noisy data.

5.5 Effect of Overlap and Similarity Measuring Methods
We now illustrate the difference between the basic method and the two additional methods: the overlap and similarity measuring methods. The average overall accuracy and F-measure results are reported in Table 7, which also shows the effect of noisy data detection by comparing the prediction results across all lessons for the basic prediction, overlap, and similarity measuring methods.

Table 7 Overall prediction results (After = after detecting noisy data; Before = all data).

Method                   | Precision     | Recall        | F-measure     | Accuracy
                         | After(Before) | After(Before) | After(Before) | After(Before)
Basic prediction method  | 0.530(0.452)  | 0.589(0.536)  | 0.554(0.480)  | 0.696(0.664)
Overlap method           | 0.682(0.662)  | 0.642(0.545)  | 0.622(0.596)  | 0.771(0.736)
Similarity + 5 grades    | 0.645(0.631)  | 0.697(0.695)  | 0.680(0.661)  | 0.822(0.785)
Similarity + 9 grades    | 0.787(0.765)  | 0.735(0.721)  | 0.762(0.743)  | 0.864(0.842)

Compared with the basic prediction method, the similarity measuring method improved the accuracy from 66.4% to 78.5% on all data, and from 69.6% to 82.2% after detecting noisy data. Moreover, the overlap method had a strong effect, increasing the prediction accuracy from 66.4% to 73.6% on all data and from 69.6% to 77.1% after detecting noisy data. Combining the similarity measuring method with the overlap method increased the prediction accuracy further, from 73.6% to 84.2% on all data and from 77.1% to 86.4% after detecting noisy data.

To evaluate the effect of the similarity measuring method more clearly, Fig. 7 compares the accuracy of student grade prediction with and without it. Its effect is evident in the fact that the prediction accuracy with the similarity measuring method but without the overlap method is better than that of the overlap method without the similarity measuring method.


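The LSA-plus-clustering representation whose effect Section 5.3 evaluates can be sketched with scikit-learn as follows. This is an illustrative reconstruction, not the authors' implementation: the comments, the SVD dimensionality, and the use of cosine-style normalization are placeholder assumptions; only the pattern (TF-IDF, truncated SVD, K-means with 5 clusters) follows the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.pipeline import make_pipeline
from sklearn.cluster import KMeans

# Hypothetical stand-ins for the students' free-style C-comments.
comments = [
    "I understood today's sorting algorithm well",
    "The recursion part was difficult for me",
    "I could not finish the exercise in time",
    "Today's lesson was easy and interesting",
    "I still do not understand pointers at all",
    "The lecture pace was too fast to follow",
    "I enjoyed writing the small programs",
    "I need to review variables and loops again",
]

# LSA: a TF-IDF term-document matrix reduced by truncated SVD; the vectors
# are then L2-normalized so Euclidean K-means behaves like cosine clustering.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),  # dimensionality is a placeholder
    Normalizer(copy=False),
)
vectors = lsa.fit_transform(comments)

# K-means with 5 clusters, mirroring the five grade categories S/A/B/C/D.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(vectors)
print(kmeans.labels_)
```

In the paper's setting, each resulting cluster is associated with its dominant grade (cf. Fig. 3), and a new comment is graded by the cluster it falls into.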

Fig. 7 The prediction accuracy results with and without the similarity measuring method.

5.6 Correlation between Standard Deviation and Prediction Accuracy
Table 8 displays the standard deviation (Sd) of the prediction accuracy from lessons 7 to 15 for the basic prediction, overlap, and similarity measuring methods after detecting noisy data. We can see that the lessons with higher Sd (8, 11, and 14) tend to show lower prediction accuracy and F-measure.

Table 8 Sd of prediction accuracy for the proposed methods.

Lesson                      | 7    | 8    | 9    | 10   | 11   | 12   | 13   | 14   | 15
Basic Prediction Method     | 2.76 | 9.19 | 3.73 | 6.02 | 7.89 | 5.19 | 4.57 | 8.07 | 5.63
Overlap Method              | 2.99 | 6.41 | 3.44 | 5.11 | 6.20 | 4.45 | 6.00 | 5.12 | 4.24
Similarity Measuring Method | 2.16 | 5.95 | 4.05 | 4.76 | 7.22 | 4.03 | 4.56 | 6.35 | 4.03

Table 9 shows the correlation coefficients between the Sd and the prediction accuracy.

Table 9 Correlation coefficient of Sd and accuracy.

Basic Prediction Method     | -0.89485
Overlap Method              | -0.5152
Similarity Measuring Method | -0.91978

Figure 8 (a) and (c) show that there are strong correlations between the Sd and the prediction accuracy of the basic prediction method, and between the Sd and that of the similarity measuring method. On the other hand, the correlation coefficient between the Sd and the accuracy of the overlap method shows only a weak correlation. We believe this indicates that the overlap method gives a stable evaluation of student grades.

Fig. 8 The correlation between Sd and prediction accuracy.

5.7 Prediction Accuracy Differences between 5 Grades
Finally, Fig. 9 displays the relationship between the C-comments data from lessons 7 to 15 and the prediction accuracy of each grade after applying the similarity measuring method and detecting noisy data. As shown in Fig. 9, the average prediction accuracy was between 0.65 and 0.93. We can clearly distinguish the higher grade group (S, A, and B) from the lower one (C and D). One reason why the prediction accuracy of grades C and D was lower is the smaller number of comments in those grades.

Fig. 9 The prediction accuracy by grade after applying the similarity measuring method.

6. Conclusion and Future Work
Learning comments are valuable sources for interpreting student behavior during a class. The present study discussed student grade prediction methods based on free-style comments. We used C-comments data from the PCN method, which present student attitudes, understanding, and difficulties concerning each lesson.

The main contributions of this work are twofold. First, we discussed the basic prediction method, which analyzed the C-comments data with the LSA technique and classified the LSA results into 5 groups by the K-means clustering method. Second, we proposed two new approaches, the overlap and similarity measuring methods, to improve the basic method, and conducted experiments to validate them. The overlap method allows the acceptance of two grades for one mark to capture the correct relation between LSA results and student grades. We confirmed that the overlap method with 9-grade categories enabled a more stable evaluation than with 5-grade categories: the overall average prediction accuracy became better than that of classifying student marks into 5-grade categories.

The similarity measuring method calculates the similarity between a new comment and the comments in the nearest cluster. The prediction accuracy with the similarity measuring method became much better than without it, and combining it with the overlap method (the 9-grade prediction method) made the prediction accuracy higher still.
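The similarity measuring step just summarized can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 2-D vectors stand in for LSA-reduced comments, cosine similarity is one natural choice of similarity function, and the average-similarity decision rule is a simplifying assumption.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_grade(new_vec, clusters):
    """Pick the grade of the cluster whose member comments are, on average,
    most similar to the new comment's vector.

    clusters: mapping grade -> list of LSA vectors of the training comments
              that fell into that grade's cluster.
    """
    def avg_sim(members):
        return sum(cosine(new_vec, m) for m in members) / len(members)
    return max(clusters, key=lambda g: avg_sim(clusters[g]))

# Toy 2-D example (real vectors would come from the LSA step).
clusters = {
    "S": [(0.9, 0.1), (0.8, 0.2)],
    "D": [(0.1, 0.9), (0.2, 0.8)],
}
print(predict_grade((0.85, 0.15), clusters))  # closest to the "S" cluster
```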
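The correlation figures of Section 5.6 (Table 9) are Pearson correlation coefficients between the per-lesson Sd values and prediction accuracies. A minimal sketch of that computation follows; the Sd series is the basic-method row of Table 8, while the accuracy series is hypothetical (the per-lesson accuracies are only shown graphically in Fig. 5).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Sd of the basic prediction method, lessons 7-15 (Table 8).
sd = [2.76, 9.19, 3.73, 6.02, 7.89, 5.19, 4.57, 8.07, 5.63]
# Hypothetical per-lesson accuracies: lessons with high Sd get low accuracy.
accuracy = [0.71, 0.59, 0.70, 0.65, 0.61, 0.67, 0.68, 0.60, 0.66]
print(round(pearson(sd, accuracy), 3))  # strongly negative, as in Table 9
```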
To sum up, there are still quite a few considerations that would surely add even more value to the results obtained. It is necessary to improve the prediction results of student performance from their comments; other machine learning techniques, such as neural networks and support vector machines, are candidates for this improvement and can be compared with the present method (K-means clustering).

Another interesting issue is expanding the problem to improving student performance by providing advice to students in different classes according to the estimated performance of each student. Measuring motivation after each lesson can help in giving feedback to students and encourage them to improve their writing skills; they can describe their attitudes and situations, understanding of subjects, difficulties in learning, and learning activities in the classroom. This will help a teacher give advice and improve student performance.

Acknowledgments This work was supported in part by PEARL, enPiT of Project for Establishing a Nationwide Practical Education Network for IT Human Resources Development under the MEXT, Japan, and JSPS KAKENHI Grant Numbers 24500176, 25350311, 26350357 and 26540183.

Shaymaa E. Sorour received her B.S. degree in Education Technology at the Faculty of Specific Education, Kafr Elsheikh University, Egypt, and an M.S. degree in Computer Education at the Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Egypt, in 2004 and 2010, respectively. Since 2005 she has been working as an assistant lecturer at the Department of Education Technology, Faculty of Specific Education, Kafr Elsheikh University, Egypt. She is currently a Ph.D. student in the Graduate School of Information Science and Electrical Engineering, Department of Advanced Information Technology, Kyushu University, Japan. She is a member of IEEE and IPSJ.



Tsunenori Mine received his B.E. degree in Computer Science and Computer Engineering in 1987, and his M.E. and D.E. degrees in Information Systems in 1989 and 1993, respectively, all from Kyushu University. He was a lecturer at the College of Education, Kyushu University, from 1992 to 1994, and at the Department of Physics, Faculty of Science, Kyushu University, from 1994 to 1996. He was a visiting researcher at DFKI, Saarbruecken, Germany, from 1998 to 1999, and at the Language Technologies Institute of CMU, Pittsburgh, PA, USA, in 1999. He is currently an Associate Professor at the Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University. His current research interests include Natural Language Processing, Information Retrieval, Information Extraction, Collaborative Filtering, Personalization and Multi-Agent Systems. He is a member of IPSJ, IEICE, JSAI, NLPSJ and ACM.

Kazumasa Goda received his B.E. and M.E. degrees from Kyushu University in 1994 and 1996, respectively. He has been an Associate Professor at Kyushu Institute of Information Science since 2008. His research interests include Programming Theory, Programming Education, and Computer Education. He is a member of JSiSE, JAEiS, and IPSJ.

Sachio Hirokawa received his B.S. and M.S. degrees in Mathematics and his Ph.D. degree from the Interdisciplinary Graduate School of Engineering Sciences, Kyushu University, in 1977, 1979, and 1992, respectively. He founded the search engine company Lafla (http://www.LafLa.co.jp) in 2008 as a university venture company and has been working as an outside director of the company. He has been a Professor in the Research Institute for Information Technology of Kyushu University since 1997. His research interests include Search Engines, Text Mining, and Computational Logic. He is a member of JSAI, IPSJ, IEICE and IEEE. He has been serving as a general chair of the international conference AAI (Advanced Applied Informatics) since 2012.

