Student achievement prediction using deep neural network from multi-source campus data
Complex & Intelligent Systems
https://doi.org/10.1007/s40747-022-00731-8
ORIGINAL ARTICLE
Xiaoyong Li1,2 · Yong Zhang1 · Huimin Cheng2 · Mengran Li1 · Baocai Yin1
Received: 1 November 2021 / Accepted: 27 March 2022
© The Author(s) 2022
Abstract
Identifying students at high risk of poor academic performance as early as possible plays an important role in improving
education quality. To do so, most existing studies have used traditional machine learning algorithms to predict students’
achievement from their behavior data, with behavior features extracted manually on the basis of expert experience
and knowledge. However, as the variety and overall volume of behavioral data have increased, it has become increasingly
challenging to identify high-quality handcrafted features. In this paper, we propose an end-to-end deep learning model
that automatically extracts features from students’ multi-source heterogeneous behavior data to predict academic performance.
The key innovation of this model is that it uses long short-term memory (LSTM) networks to capture the inherent time-series features of
each type of behavior, and two-dimensional convolutional networks (2DCNN) to extract correlation features among different
behaviors. We conducted experiments with four types of daily behavior data from students of a university in Beijing. The
experimental results demonstrate that the proposed deep model outperforms several machine learning algorithms.
Keywords Academic performance prediction · Time-series features · Correlation features · LSTM · 2DCNN
Introduction
Students’ performance is a key indicator in measuring the
quality of academic education and is also closely related
to students’ mental health. Related studies have shown that
students with poor academic performance are prone to anxiety and depression [1], and their risk of suicide is much
higher than that of students with excellent performance [2,3].
Achievement prediction aims to identify students with high
academic risk in advance, which prompts administrators,
teachers, and students themselves to take timely, targeted
intervention actions to avoid poor outcomes such as failing courses, dropping out, staying out, and so on. Therefore,
student achievement prediction has been receiving extensive
research attention.
✉ Yong Zhang
Xiaoyong Li
Huimin Cheng
1 Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China
2 Information Technology Support Center, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China
Factors affecting academic achievement are complex and
diverse. To explore related factors, researchers in various
fields have done extensive work. For example, Ref. [4]
explored the relationship between cognitive abilities and
academic performance. Refs. [5,6] examined the correlation between the “Big Five” personality traits (openness to experience,
conscientiousness, extraversion, agreeableness, and neuroticism) and academic achievement. Studies [7–9] show that
good sleep habits help improve academic performance. Studies [10–13] conclude that moderate physical
activity can improve academic achievement. Ref. [14] shows that binge eating and purging
behaviors lead to relatively poor academic performance, and
Ref. [15] shows that the influence is stronger in girls
than in boys. Studies [16–19] show that poor habits in
using social software and electronic devices affect academic achievement. These studies demonstrate a strong
correlation between various behavior-related factors and academic performance, and provide guidance and suggestions
for managers and teachers to improve students’ academic
achievement. However, most data used in these studies were
collected from questionnaires or self-reports, which usually
suffer from small sample sizes and social desirability bias.
With the rapid development of digital campuses in recent
years, many information systems have been deployed on campus,
such as learning management systems (LMS), smart card systems, gateway systems, access control systems, and so on, which
faithfully record various behavior data of students as they study and live on campus. Compared with data obtained from
questionnaires, these data objectively reflect students’ behavior patterns and cover a large number of samples, which
provides a great opportunity for performance prediction.
Because there is an obvious correlation between learning
behavior and academic achievement, many studies [20–22]
create predictive models by analyzing students’ learning
behavior patterns from LMS log files, such as video watching,
homework submitting, and BBS discussion. Unfortunately,
these learning behavior data are limited to specific courses,
so the models trained on a specific course cannot be well
generalized to other courses. Furthermore, many courses in
traditional face-to-face education are not taught through
an LMS, so little learning data is available to predict achievement.
Daily living behavior data are another important data
source that describes students’ campus behavior patterns.
They include dining behavior, shopping behavior, library
entry behavior, web page browsing behavior, and so on.
Unlike learning behavior on an LMS, living behavior can be recorded for every student living on campus,
which provides a much broader and more readily available data source for
performance prediction. Based on these data, related studies [23–28]
manually extracted features from raw behavioral data
relying on expert knowledge, such as breakfast frequency,
Internet time, orderliness, diligence, sleep pattern, and so
on, and then constructed prediction models using machine
learning algorithms. However, the following challenges are
encountered when manually extracting features from massive multi-source living data: (1) the quality and number of
features are directly influenced by expert knowledge, and it is
difficult to extract high-quality features by understanding the
overall distribution of massive data; (2) although some features such as orderliness express the regularity of behavior,
they still cannot fully represent the temporal characteristics of
time-series behavior data; (3) the correlation between multi-source behavior data needs to be further mined.
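To make the idea of handcrafted features concrete, the following minimal Python sketch computes two of the features named above from raw card-swipe timestamps. It is an illustration only, not the procedure used in [23–28]: the function names, the 06:00–09:00 breakfast window, the 30-minute bins, and the definition of orderliness as one minus the normalized Shannon entropy of time-of-day bins are all assumptions made here for the example.

```python
import math
from collections import Counter
from datetime import datetime


def breakfast_frequency(meal_times, n_days, start_hour=6, end_hour=9):
    """Fraction of observed days with at least one swipe in the breakfast window.

    The 06:00-09:00 window is an assumed default, not taken from the paper.
    """
    days = {t.date() for t in meal_times if start_hour <= t.hour < end_hour}
    return len(days) / n_days


def orderliness(event_times, bin_minutes=30):
    """One minus the normalized Shannon entropy of time-of-day bins.

    Returns 1.0 when every event falls in the same bin (fully regular) and
    approaches 0.0 when events spread evenly over the day. This entropy-based
    definition is an assumption chosen for illustration.
    """
    if not event_times:
        return 0.0
    bins = Counter((t.hour * 60 + t.minute) // bin_minutes for t in event_times)
    total = sum(bins.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in bins.values())
    max_entropy = math.log2(24 * 60 // bin_minutes)
    return 1.0 - entropy / max_entropy


# Hypothetical swipe records for one student over four days.
swipes = [
    datetime(2021, 3, 1, 7, 40),
    datetime(2021, 3, 2, 7, 50),
    datetime(2021, 3, 3, 12, 5),
    datetime(2021, 3, 4, 7, 45),
]
freq = breakfast_frequency(swipes, n_days=4)
order = orderliness(swipes)
print(f"breakfast frequency = {freq:.2f}, orderliness = {order:.3f}")
```

Every threshold in this sketch (window, bin width, entropy normalization) is a design decision made by hand, and a single scalar such as orderliness discards the order in which events occurred. These are the limitations described in challenges (1) and (2).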
To address the aforementioned challenges, we put forward
a novel academic performance prediction method based on
a deep neural network (DNN), in which behavioral features are
automatically learned instead of being extracted manually,
long short-term memory (LSTM) networks are applied
to model the temporal characteristics of behavior data, and
two-dimensional convolu (...truncated)