Student Performance Prediction
https://www.scirp.org/journal/jcc
ISSN Online: 2327-5227
ISSN Print: 2327-5219
Yanqing Xie
Keywords
Data Science Applications in Education, Distance Education and Online
Learning, LSTM
1. Introduction
Online education is a new form of education in the Internet era [1]. Online education platforms, e.g., MOOCs, offer massive high-quality learning resources, including classroom videos, exercises, and assessments of many
formance into four categories: withdrawn, fail, pass, and distinction. We separately train and test the two classification methods and record their model evaluation results. The main contributions of this work are as follows.
• We propose an Attention-based Multi-LSTM model to predict students’ final performance. The model utilizes both students’ demographic data and students’ clickstream data, which enables it to make predictions even in cold-start situations.
• We do not distinguish between course types when training the model, which allows the model to transfer well across courses.
This paper is organized as follows. Section 2 introduces the related work of
student performance prediction methods. Section 3 introduces some mathemat-
ical notations and formally defines the given problem. Section 4 introduces the
model we propose. Section 5 introduces the experiments and results of our work.
Section 6 introduces the conclusions of this paper.
2. Related Work
With the development of the online education industry, more and more students
have poured into online education platforms [16]. Many educators began to
consider how to ensure the quality of online learning for each student when
there are a large number of students in a course [17]. Therefore, the concept of
student performance prediction system came into being. Most of the input data
of the student performance prediction model comes from the back-end data of
various online education platforms, which is private.
Many scholars at home and abroad have built student performance prediction systems for online education platforms, using the platforms’ private data to build student performance prediction models. Reference [18] uses student event stream sequences, such as whether the student submits an assignment, asks a question, or completes an exam at a certain time, to build a GritNet model that predicts the student’s final performance. Modern data mining and machine learning techniques [19] are used to predict student performance in small student cohorts. Reference [20] compares the effects of supervised learning algorithms for student performance prediction. Reference [21] builds a decision tree-based algorithm, Logistic Model Trees (LMT), to learn the intrinsic relationship between identified academic and socio-economic features and students’ academic grades. Reference [22] applies a transfer learning methodology using deep learning and traditional modeling techniques to study high and low representations of unproductive persistence. Reference [23] extends the deep knowledge tracing model, a state-of-the-art sequential model for knowledge tracing, to account for forgetting by incorporating multiple types of forgetting-related information. Reference [24] proposes an attention-based graph convolutional networks model for students’ performance prediction. Reference [25] de-
3. Problem Statement
In this section, we introduce some mathematical notations and formally define
the given problem.
Since we need to make a timely assessment of the learning status of each stu-
dent in the online course, we propose an Attention-based Multi-layer LSTM
model for real-time student performance prediction. The mathematical defini-
tions of some concepts involved in the model are as follows.
Suppose that we have m courses; the jth course is denoted as c_j, and the set of courses is denoted as C = {c_1, c_2, ..., c_j, ..., c_m}. Suppose there are n students enrolled in at least one course; the ith student is denoted as s_i, and the set of students is denoted as S = {s_1, s_2, ..., s_i, ..., s_n}. For each student s_i, the online education platform collects his or her gender, age, highest education level, and other background information as demographic data. There are eight items of background information. The demographic data of student s_i is denoted as the vector d_i. We encode the categorical data in d_i, and the encoded demographic vector of student s_i is d_i. Thus, the demographic dataset of all students is denoted as D = {d_1, d_2, ..., d_i, ..., d_n}. Suppose that the course c_j lasts a total of K weeks; the clickstream data sequence vector of student s_i in the kth week of the course c_j is denoted as q_ij^k. Thus, the clickstream dataset of student s_i in the course c_j is denoted as Q_ij = {q_ij^1, q_ij^2, ..., q_ij^k, ..., q_ij^K}. The actual outcome of student s_i in the course c_j is denoted as o_ij, which has p possibilities. When we perform a binary prediction, the possible values of o_ij are pass and fail. When we perform a four-class prediction, the possible values of o_ij are distinction, pass, fail, and withdrawn.
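As a minimal sketch of encoding the categorical data in d_i into a numeric vector, the following illustrates a simple one-hot scheme. The feature names and category sets here are hypothetical placeholders for illustration, not the actual OULA schema of eight background items:

```python
# Minimal sketch: encode a categorical demographic record into a numeric
# vector d_i. Feature names and category sets are illustrative only.

def one_hot(value, categories):
    """Return a one-hot list marking `value` within `categories`."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical demographic record for one student s_i.
demo = {"gender": "F", "age_band": "0-35", "highest_education": "HE"}

# Hypothetical category sets for each feature.
schema = {
    "gender": ["M", "F"],
    "age_band": ["0-35", "35-55", "55<="],
    "highest_education": ["No formal", "Lower than A", "A level", "HE"],
}

# Concatenate the one-hot blocks into the encoded vector d_i.
d_i = []
for feature, categories in schema.items():
    d_i += one_hot(demo[feature], categories)

# Vector length is the sum of the category set sizes: 2 + 3 + 4 = 9.
print(d_i)
```

In practice a library encoder would be used instead, but the resulting fixed-length vector is what the model consumes as d_i.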
According to the definitions given above, we build a model f(·) to predict student performance, and the obtained prediction result is denoted as ô_ij. The model obtains the best parameters θ through learning, and then substitutes θ into the model to obtain the predicted outcomes. The learning process of the model is shown as Equation (1):

T(D, Q, O, f(·)) → θ  (1)

where T(·) denotes the learning process of the model, D the demographic data of students, Q the clickstream data of students, O the actual outcomes of students, f(·) the proposed model, and θ the trained model parameters.

The prediction process of the model is shown as Equation (2):

f(D, Q | θ) → Ô  (2)

where Ô denotes the set of predicted outcomes ô_ij.
4. Proposed Model
Our goal is to build a model that can predict the performance of any student in
any period of any course. We hope that this model has universal applicability
and can be transferred to any course instead of only predicting a single course.
We hope that this model can make predictions at any time from before the course starts, that is, week 0, to the end of the course, not only after the course has started. Especially in the early and middle stages of a course, we hope to obtain accurate forecasts as soon as possible so that the online education platform can issue early warnings in time and urge students to adjust their learning status. We hope that this model can predict the individual performance of any student in the course, not just make predictions over all students in the entire course. In
order to achieve the above-mentioned purpose, we propose an Attention-based
Multi-layer LSTM (AML) model, whose structure is shown in Figure 1, and its
specific description is as follows:
In order to obtain a reliable prediction of student results, we consider using
student clickstream data, which is inherently a time sequence. A time sequence is an input sequence whose data has a contextual relationship along the time axis; that is, the output state at the current time point depends not only on the input at the current time point but also on earlier inputs, and it in turn affects the output states at subsequent time points. Text, voice, etc. are all time sequence data. Student clickstream data is divided into many different categories according to the content of the interaction between students and the VLE platform. If we simply recorded the number of interactions between each student and the VLE platform by day or week, we would ignore the fact that different types of interactions have different effects on student performance. Therefore, we keep the students’ clickstream data types and feed the data into our model on a weekly basis. We utilize the LSTM structure to process the input student clickstream data. LSTM is an effective structure for processing time sequences, shown as Equation (3). LSTM selects and memorizes input information through three gating units, so that the model only remembers key information, thereby reducing the memory burden; this allows it to handle long-term dependencies.
I_t = σ(X_t W_i + H_{t−1} W_i + b_i)
F_t = σ(X_t W_f + H_{t−1} W_f + b_f)
O_t = σ(X_t W_o + H_{t−1} W_o + b_o)  (3)
C̃_t = tanh(X_t W_c + H_{t−1} W_c + b_c)
C_t = F_t ⊙ C_{t−1} + I_t ⊙ C̃_t

u_it = tanh(W_w h_it + b_w)  (4)
α_it = exp(u_it^T u_w) / Σ_t exp(u_it^T u_w)  (5)
s_i = Σ_t α_it h_it

where h_it denotes the hidden vector of student s_i at time t, W_w and b_w denote a weight matrix and a bias term, which are initialized randomly, and u_w is the attention context vector.
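A minimal NumPy sketch of the LSTM step of Equation (3) and the attention pooling of Equation (5) might look like the following. The dimensions, the random initialization, and the use of separate input and recurrent weight matrices (the printed equations reuse one symbol per gate) are illustrative assumptions, not the paper’s exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 5  # input size, hidden size, number of weeks (assumed)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# One set of weights per gate: input (i), forget (f), output (o), cell (c).
W = {g: rng.normal(size=(d_in, d_h)) * 0.1 for g in "ifoc"}
U = {g: rng.normal(size=(d_h, d_h)) * 0.1 for g in "ifoc"}
b = {g: np.zeros(d_h) for g in "ifoc"}

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM cell update following Equation (3)."""
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate
    c_tilde = np.tanh(x_t @ W["c"] + h_prev @ U["c"] + b["c"])
    c = f * c_prev + i * c_tilde                            # new cell state
    h = o * np.tanh(c)                                      # new hidden state
    return h, c

# Run the cell over T weekly clickstream vectors.
X = rng.normal(size=(T, d_in))
h, c = np.zeros(d_h), np.zeros(d_h)
hiddens = []
for t in range(T):
    h, c = lstm_step(X[t], h, c)
    hiddens.append(h)
H = np.stack(hiddens)                          # (T, d_h) hidden vectors h_it

# Attention pooling over the weekly hidden states (Equation (5)).
W_w = rng.normal(size=(d_h, d_h)) * 0.1
b_w = np.zeros(d_h)
u_w = rng.normal(size=d_h)                     # attention context vector

u = np.tanh(H @ W_w + b_w)                     # projected hidden states u_it
scores = u @ u_w
alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
s_i = alpha @ H                                # attended student representation
print(alpha.sum(), s_i.shape)
```

A production model would use a deep-learning framework’s LSTM layer; this sketch only makes the gating and pooling arithmetic concrete.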
As the number of weeks of each course varies, we uniformly take the student
data of the first 25 weeks of the course as the input data of the model. We output
and record the prediction results every five weeks. Next, we will introduce the
experimental process of this article.
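The weekly per-type aggregation described above and the fixed 25-week input window can be sketched together as follows. The click-log field layout, the activity-type vocabulary, and the zero-vector padding choice are illustrative assumptions, not the actual VLE schema:

```python
from collections import defaultdict

# Each raw click record: (week, activity_type, n_clicks); this layout is a
# hypothetical simplification of a VLE click log.
clicks = [
    (1, "forum", 3), (1, "resource", 5),
    (2, "forum", 1), (2, "quiz", 4),
]

ACTIVITY_TYPES = ["forum", "resource", "quiz"]  # assumed type vocabulary
MAX_WEEKS = 25                                  # fixed input window

# Keep interaction types separate: totals[week][type] = summed clicks.
totals = defaultdict(lambda: defaultdict(int))
for week, act, n in clicks:
    totals[week][act] += n

observed_weeks = 2  # weeks elapsed so far in this course run
weekly_vectors = [
    [totals[w][t] for t in ACTIVITY_TYPES]
    for w in range(1, observed_weeks + 1)
]

# Truncate longer courses and zero-pad shorter ones to 25 weeks.
window = weekly_vectors[:MAX_WEEKS]
window += [[0] * len(ACTIVITY_TYPES)] * (MAX_WEEKS - len(window))

print(weekly_vectors)  # [[3, 5, 0], [1, 0, 4]]
print(len(window))     # 25
```

Each row of the padded window plays the role of one q_ij^k vector from Section 3.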
5. Experiments
In this section, we conduct experiments to verify the effectiveness of our proposed model. First, we introduce the dataset used in the paper and our dataset
processing scheme. Second, we describe the experimental settings of the pro-
posed model. Finally, we show the experimental comparison results of the pro-
posed model and the baseline model on two classification tasks and a student
performance prediction task for the specific course, as well as perform corres-
ponding analysis.
5.1. Dataset
The Open University Learning Analytics (OULA) [14] dataset contains a series of online-education-related data provided by online education platforms, such as student demographic data, student clickstream data, and course data. Student demographic data is background information such as the student’s gender, age, and highest education level; it is unique to each student and is collected by the online education platform when the student registers. Student clickstream data records the type and frequency of students’ interactions with the Virtual Learning Environment (VLE) platform in a course, including accessing resources, web-page clicks, forum clicks, and so on; it reflects how actively students participate in the course. The OULA dataset includes 22 courses, 32,593 students, and 10,655,280 interaction records between students and the VLE platform. Student outcomes are divided into four categories: Distinction (D), Pass (P), Fail (F), and Withdrawn (W). When the student’s score is higher than 75 points, the outcome is D. When the score is higher than 40 but lower than 75, the outcome is P. When a student completes the course but the score is less than 40 points, the outcome is F. When the student does not complete the course, the outcome is W. We use this dataset to train and test our model, and compare the model’s output with the actual results. From this, we obtain the accuracy, precision, recall, and F1 score of the model in different situations.
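The outcome rule above can be sketched as a simple mapping. The handling of the exact boundary scores 40 and 75, where the text is silent, is an assumption noted in the comments:

```python
def outcome(score, completed):
    """Map a final score to the four OULA outcome categories, following
    the thresholds in the text. Boundary scores (exactly 40 or 75) are
    assigned to the lower band here, an assumption where the text is
    ambiguous."""
    if not completed:
        return "W"   # did not complete the course
    if score > 75:
        return "D"   # distinction
    if score > 40:
        return "P"   # pass (between 40 and 75)
    return "F"       # completed but scored too low

print(outcome(80, True), outcome(60, True), outcome(30, True), outcome(0, False))
```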
When we use the OULA dataset, we divide the dataset differently according to
different prediction tasks. When we perform the four-class classification predic-
tion task, we retain the original four-class classification division in the dataset,
namely D, P, F, and W. In the binary classification prediction task of our general experiment, that is, the dropout prediction task, we merge D and P into P, keep W, and discard F: students who pass the course form one category, and students who drop out form the other. When we perform the binary classification prediction task on a specific course, we merge D and P into P and merge W and F into F: students who pass the course and those who do not are divided into two opposite categories.
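The two regrouping schemes can be sketched as follows (the function names are illustrative; `None` marks records discarded from the dropout task):

```python
def dropout_label(outcome):
    """General binary (dropout) task: D and P become P, W is kept,
    F is discarded (None signals removal from the dataset)."""
    return {"D": "P", "P": "P", "W": "W", "F": None}[outcome]

def specific_course_label(outcome):
    """Specific-course binary task: D and P become P, W and F become F."""
    return {"D": "P", "P": "P", "W": "F", "F": "F"}[outcome]

labels = ["D", "P", "F", "W"]
print([dropout_label(o) for o in labels])          # ['P', 'P', None, 'W']
print([specific_course_label(o) for o in labels])  # ['P', 'P', 'F', 'F']
```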
The above is a general test of the model. Next, to test the prediction effect of
the proposed model on a specific course, we randomly select some courses as the
test set, and use the proposed model for training and testing. This article shows
the effect of one case as an example. Our prediction tasks for specific courses are
still divided into a four-category task and a two-category task. The grouping of the prediction results in the four-category task is the same as that described above. For the two-category task, D and P are considered as P, and F and W are considered as F; that is, students who pass the course are classified into one category, and students who fail the course are classified into the other. On the specific-course prediction task, the proposed model still uses the same model parameters as in the general test.
model can identify students’ performance in a specific course, and more accurate student performance predictions can be made.
• Adding demographic data helps improve the student performance prediction effect of the model. The learning status of students is easily affected
by the surrounding environment, which brings an inspiration that online
education platforms can provide more personalized teaching programs based
on the background information of different students. The influence of de-
mographic data on the final prediction results is more obvious when the
number of weeks is small, because the amount of student clickstream data
entered at this time is low, and the model is more dependent on demographic
data when making predictions. Especially when the number of weeks is 0, the
prediction of the model is completely dependent on demographic data,
which is also the key to solving the cold start problem in the task of predict-
ing student performance.
• Compared with other baseline models, the AML model has better predictive
performance. This is because the AML model adds an attention mechanism
to the DOPP model. The attention mechanism allows the model to focus on
factors that have a greater impact on the model’s student performance pre-
diction effect, thereby improving the model’s prediction accuracy, precision,
recall and F1 score.
Next, we will use the DOPP model as the baseline model to test the effect of both
the proposed model and the baseline model on the specific course classification
task. We display one case to show the effect of both the models. The specific
course classification task uses the data of the BBB course offered in the two semesters 2014B and 2014J as the test set, and the rest of the data as the training set; the data divided in this way serves as the input for training and testing. Using the experimental process given in [13] as a reference, we conducted experiments both when only student clickstream data was used and when student demographic data was added. We denote student clickstream data as cl and student demographic data as de. The results we obtain are shown in Table 4 & Table 5.
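A minimal sketch of this course-based train/test split follows; the record layout is a hypothetical simplification of the dataset’s enrollment table:

```python
# Each record: (course, semester, student_id). Layout is illustrative only.
records = [
    ("BBB", "2014B", 1), ("BBB", "2014J", 2),
    ("AAA", "2014J", 3), ("CCC", "2013B", 4),
]

# The BBB runs from semesters 2014B and 2014J form the test set;
# all other course runs form the training set.
TEST_RUNS = {("BBB", "2014B"), ("BBB", "2014J")}

test_set = [r for r in records if (r[0], r[1]) in TEST_RUNS]
train_set = [r for r in records if (r[0], r[1]) not in TEST_RUNS]

print(len(train_set), len(test_set))  # 2 2
```

Splitting by course run, rather than by random student sampling, is what makes this a test of course transfer: no data from the held-out course appears in training.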
By observing Table 4 & Table 5, we can draw the conclusions as follows:
• In the four-class classification task and binary classification task, as the
number of weeks increases, the prediction effects of the baseline model and
the proposed model both improve, which is consistent with the results de-
scribed above.
• In the same situation, the AML model has better predictive performance than
the DOPP model, which shows that the AML model still has an advantage in
predicting performance in a specific course.
6. Conclusions
Different from the traditional face-to-face teaching method, online education relies on powerful Internet technology to free students from the time and place constraints of the learning process, truly bringing high-quality education to everyone. Online education has attracted a large number of students, and the number of students in each course far exceeds the number in traditional classrooms. Because of this, we need a method, namely a student performance prediction system, to ensure the quality of online education for students. The online education platform collects
student demographic data and student clickstream data to use student perfor-
mance prediction models for tracking and analyzing student learning status in
real time. Once a student’s predicted final performance is a failure or withdrawal, we can intervene in time to help the student adjust his or her learning status and better master the course.
This article uses the Open University Learning Analytics (OULA) dataset for
analysis and proposes an Attention-based Multi-layer LSTM (AML) model. We
use student demographic data and student clickstream data to predict student
performance at the end of the period. The results show that the proposed model
is always better than other models. In other words, the AML model can predict
the student’s final performance earlier and more accurately than other models.
The reasons for the results are as follows. First, the AML model combines stu-
dents’ background information and interaction information with the online
learning platform. Second, it adds an attention layer into multi-layer LSTM
model, which helps the model pay more attention to those data that impact the
prediction effect more deeply. Therefore, it can be used to intervene in the stu-
dent’s learning state earlier to reduce the dropout rate and failure rate of the
course.
In the future, we will consider adding unused data in the OULA dataset to the
model, such as course information, students’ pre-course learning conditions,
and the times when students submit classroom tests. We will also try to further improve the model’s accuracy, precision, recall, and F1 score when facing different students and different courses, especially in the initial stage of the course.
Acknowledgements
The author is grateful to Jinan University for its encouragement and support of this research.
Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this pa-
per.
References
[1] Volery, T. and Lord, D. (2000) Critical Success Factors in Online Education. Inter-
national Journal of Educational Management, 14, 216-223.
https://doi.org/10.1108/09513540010344731
[2] Christensen, G., Steinmetz, A., Alcorn, B., Bennett, A., Woods, D. and Emanuel, E.
(2013) The MOOC Phenomenon: Who Takes Massive Open Online Courses and
Why? SSRN, Article ID: 2350964.
[3] Bettinger, E. and Loeb, S. (2017) Promises and Pitfalls of Online Education. Evi-
dence Speaks Reports, 2, 1-4.
[4] Mackness, J., Mak, S. and Williams, R. (2010) The Ideals and Reality of Participat-
ing in a MOOC. Proceedings of the 7th International Conference on Networked
Learning, Aalborg, 3-4 May 2010, 266-275.
[5] Salal, Y., Abdullaev, S. and Kumar, M. (2019) Educational Data Mining: Student
Performance Prediction in Academic. International Journal of Engineering and
Advanced Technology, 8, 54-59.
[6] Arasaratnam-Smith, L.A. and Northcote, M. (2017) Community in Online Higher
Education: Challenges and Opportunities. Electronic Journal of e-Learning, 15,
188-198.
[7] Larreamendy-Joerns, J. and Leinhardt, G. (2006) Going the Distance with Online
Education. Review of Educational Research, 76, 567-605.
https://doi.org/10.3102%2F00346543076004567
[8] Gargano, T. and Throop, J. (2017) Logging on: Using online Learning to Support
the Academic Nomad. Journal of International Students, 7, 918-924.
https://doi.org/10.32674/jis.v7i3.308
[9] Bao, W. (2020) Covid-19 and Online Teaching in Higher Education: A Case Study
of Peking University. Human Behavior and Emerging Technologies, 2, 113-115.
https://doi.org/10.1002/hbe2.191
[10] Korkmaz, G. and Toraman, C. (2020) Are We Ready for the Post-Covid-19 Educa-
tional Practice? An Investigation into What Educators Think as to Online Learning.
International Journal of Technology in Education and Science (IJTES), 4, 293-309.
https://doi.org/10.46328/ijtes.v4i4.110
[11] Liu, D., Zhang, Y., Zhang, J., Li, Q., Zhang, C. and Yin, Y. (2020) Multiple Features
Fusion Attention Mechanism Enhanced Deep Knowledge Tracing for Student Per-
formance Prediction. IEEE Access, 8, 194894-194903.
https://doi.org/10.1109/ACCESS.2020.3033200
[12] Pandey, M. and Taruna, S. (2016) Towards the Integration of Multiple Classifier
Pertaining to the Student’s Performance Prediction. Perspectives in Science, 8, 364-366.
https://doi.org/10.1016/j.pisc.2016.04.076
[13] Karimi, H., Huang, J. and Derr, T. (2020) A Deep Model for Predicting Online
Course Performance. 34th AAAI Conference on Artificial Intelligence, New York,
7-12 January 2020.
[14] Kuzilek, J., Hlosta, M. and Zdrahal, Z. (2017) Open University Learning Analytics
Dataset. Scientific Data, 4, Article No. 170171.
https://doi.org/10.1038/sdata.2017.171
[15] Halawa, S., Greene, D. and Mitchell, J. (2014) Dropout Prediction in Moocs Using
Learner Activity Features. Proceedings of the Second European MOOC Stake-Holder
Summit, Lausanne, 10-12 February 2014, 58-65.
[16] Sun, A. and Chen, X. (2016) Online Education and Its Effective Practice: A Research
Review. Journal of Information Technology Education: Research, 15, 157-190.
[17] Kebritchi, M., Lipschuetz, A. and Santiague, L. (2017) Issues and Challenges for
Teaching Successful Online Courses in Higher Education: A Literature Review.
Journal of Educational Technology Systems, 46, 4-29.
https://doi.org/10.1177%2F0047239516661713
[18] Kim, B.-H., Vizitei, E. and Ganapathi, V. (2018) GritNet: Student Performance Pre-
diction with Deep Learning. arXiv:1804.07405. https://doi.org/10.1111/bjet.12836
[19] Wakelam, E., Jefferies, A., Davey, N. and Sun, Y. (2020) The Potential for Student
Performance Prediction in Small Cohorts with Minimal Available Attributes. Brit-
ish Journal of Educational Technology, 51, 347-370.
https://doi.org/10.1111/bjet.12836
[20] Mohammadi, M., Dawodi, M., Tomohisa, W. and Ahmadi, N. (2019) Comparative
Study of Supervised Learning Algorithms for Student Performance Prediction. 2019
International Conference on Artificial Intelligence in Information and Communi-
cation (ICAIIC), Okinawa, 11-13 February 2019, 124-127.
https://doi.org/10.1109/ICAIIC.2019.8669085
[21] Aman, F., Rauf, A., Ali, R., Iqbal, F. and Khattak, A.M. (2019) A Predictive Model
for Predicting Students Academic Performance. 2019 10th International Conference
on Information, Intelligence, Systems and Applications (IISA), Patras, 15-17 July
2019, 1-4. https://doi.org/10.1109/IISA.2019.8900760
[22] Botelho, A.F., Varatharaj, A., Patikorn, T., Doherty, D., Adjei, S.A. and Beck, J.E.
(2019) Developing Early Detectors of Student Attrition and Wheel Spinning Using
Deep Learning. IEEE Transactions on Learning Technologies, 12, 158-170.
https://doi.org/10.1109/TLT.2019.2912162
[23] Nagatani, K., Zhang, Q., Sato, M., Chen, Y.-Y., Chen, F. and Ohkuma, T. (2019)
Augmenting Knowledge Tracing by Considering Forgetting Behavior. The World
Wide Web Conference, San Francisco, 13-17 May 2019, 3101-3107.
https://doi.org/10.1145/3308558.3313565
[24] Hu, Q. and Rangwala, H. (2019) Reliable Deep Grade Prediction with Uncertainty
Estimation. 9th International Conference on Learning Analytics & Knowledge, 4-8
March 2019, 76-85. https://doi.org/10.1145/3303772.3303802
[25] Su, Y., Liu, Q., Liu, Q., Huang, Z., Yin, Y., Chen, E., et al. (2018) Exercise-Enhanced
Sequential Modeling for Student Performance Prediction. 32nd AAAI Conference
on Artificial Intelligence, New Orleans, 2-7 February 2018, 2435-2443.
[26] Sood, S. and Saini, M. (2021) Hybridization of Cluster-Based LDA and ANN for
Student Performance Prediction and Comments Evaluation. Education and Infor-
mation Technologies, 26, 2863-2878. https://doi.org/10.1007/s10639-020-10381-3
[27] Lu, H. and Yuan, J. (2018) Student Performance Prediction Model Based on Dis-
criminative Feature Selection. International Journal of Emerging Technologies in
Learning, 13, 55-68. https://doi.org/10.3991/ijet.v13i10.9451
[28] Rizvi, S., Rienties, B. and Khoja, S.A. (2019) The Role of Demographics in Online
Learning; A Decision Tree Based Approach. Computers & Education, 137, 32-47.
https://doi.org/10.1016/j.compedu.2019.04.001
[29] Kloft, M., Stiehler, F., Zheng, Z. and Pinkwart, N. (2014) Predicting MOOC Dro-
pout over Weeks Using Machine Learning Methods. Proceedings of the EMNLP
2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, Doha,
October 2014, 60-65. http://dx.doi.org/10.3115/v1/W14-4111
[30] Waheed, H., Hassan, S.-U., Aljohani, N.R., Hardman, J., Alelyani, S. and Nawaz, R.
(2020) Predicting Academic Performance of Students from VLE Big Data Using
Deep Learning Models. Computers in Human Behavior, 104, Article ID: 106189.
https://doi.org/10.1016/j.chb.2019.106189
[31] Huang, F., Zhang, X. and Li, Z. (2018) Learning Joint Multimodal Representation
with Adversarial Attention Networks. Proceedings of the 26th ACM International
Conference on Multimedia, New York, 22-26 October 2018, 1874-1882.
https://doi.org/10.1145/3240508.3240614
[32] Huang, F., Xu, J. and Weng, J. (2020) Multi-Task Travel Route Planning with a
Flexible Deep Learning Framework. IEEE Transactions on Intelligent Transporta-
tion Systems, 22, 3907-3918. https://doi.org/10.1109/TITS.2020.2987645
[33] Huang, F., Wei, K., Weng, J. and Li, Z. (2020) Attention-Based Modality-Gated
Networks for Image-Text Sentiment Analysis. ACM Transactions on Multimedia
Computing, Communications, and Applications (TOMM), 16, Article No. 79.
https://doi.org/10.1145/3388861
[34] Huang, F., Jolfaei, A. and Bashir, A.K. (2021) Robust Multimodal Representation
Learning with Evolutionary Adversarial Attention Networks. IEEE Transactions on
Evolutionary Computation, 17 March 2021, p. 1.
https://doi.org/10.1109/TEVC.2021.3066285
[35] Huang, F., Li, C., Gao, B., Liu, Y., Alotaibi, S. and Chen, H. (2021) Deep Attentive
Multimodal Network Representation Learning for Social Media Images. ACM Trans-
actions on Internet Technology (TOIT), 21, Article No. 69.
https://doi.org/10.1145/3417295
[36] Aljohani, N.R., Fayoumi, A. and Hassan, S.-U. (2019) Predicting At-Risk Students
Using Clickstream Data in the Virtual Learning Environment. Sustainability, 11,
Article No. 7238. https://doi.org/10.3390/su11247238
[37] Hassan, S.-U., Waheed, H., Aljohani, N.R., Ali, M., Ventura, S. and Herrera, F.
(2019) Virtual Learning Environment to Predict Withdrawal by Leveraging Deep
Learning. International Journal of Intelligent Systems, 34, 1935-1952.
https://doi.org/10.1002/int.22129
[38] Jha, N.I., Ghergulescu, I. and Moldovan, A.-N. (2019) OULAD MOOC Dropout
and Result Prediction Using Ensemble, Deep Learning and Regression Techniques.
Proceedings of the 11th International Conference on Computer Supported Educa-
tion: CSEDU, Vol. 2, Heraklion, 2-4 May 2019, 154-164.
https://doi.org/10.5220/0007767901540164
[39] Hlosta, M., Herrmannova, D., Vachova, L., Kuzilek, J., Zdrahal, Z. and Wolff, A.
(2018) Modelling Student Online Behaviour in a Virtual Learning Environment.
arXiv: 1811.06369.