Student achievement prediction using deep neural network from multi-source campus data

Complex & Intelligent Systems, May 2022

Identifying students at high risk of poor academic performance as early as possible plays an important role in improving education quality. To do so, most existing studies have used traditional machine learning algorithms to predict students’ achievement from their behavior data, with behavior features extracted manually on the basis of expert experience and knowledge. However, as the variety and overall volume of behavioral data have grown, it has become increasingly difficult to identify high-quality handcrafted features. In this paper, we propose an end-to-end deep learning model that automatically extracts features from students’ multi-source heterogeneous behavior data to predict academic performance. The key innovation of this model is that it uses long short-term memory networks to capture the inherent time-series features of each type of behavior, and two-dimensional convolutional networks to extract correlation features among different behaviors. We conducted experiments with four types of daily behavior data from students of a university in Beijing. The experimental results demonstrate that the proposed deep model outperforms several machine learning algorithms.
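
The architecture described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation. It assumes one possible wiring: each behavior type has its own LSTM, the final hidden states are stacked into a behaviors × hidden-units matrix, and a single 2 × 2 convolution over that matrix mixes information across adjacent behaviors. The behavior names, the daily counts, the hidden size, the kernel size, and the random weights are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over a scalar time series (random, untrained weights)."""

    def __init__(self, hidden, rng):
        self.hidden = hidden
        width = 1 + hidden + 1  # scalar input, previous hidden state, bias
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(hidden)]
            for gate in ("input", "forget", "cand", "output")
        }

    def run(self, series):
        """Return the final hidden state after reading the whole series."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in series:
            z = [x] + h + [1.0]
            pre = {
                gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
                for gate, rows in self.weights.items()
            }
            i = [sigmoid(p) for p in pre["input"]]
            f = [sigmoid(p) for p in pre["forget"]]
            g = [math.tanh(p) for p in pre["cand"]]
            o = [sigmoid(p) for p in pre["output"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def conv2d_valid(matrix, kernel):
    """2D cross-correlation without padding (the usual deep-learning 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * matrix[r + a][col + b] for a in range(kh) for b in range(kw))
            for col in range(len(matrix[0]) - kw + 1)
        ]
        for r in range(len(matrix) - kh + 1)
    ]


def predict_risk(behaviors, hidden=4, seed=0):
    """Map per-behavior time series to a probability-like score in (0, 1)."""
    rng = random.Random(seed)
    # One LSTM per behavior type; rows of feature_map are behaviors.
    feature_map = [TinyLSTM(hidden, rng).run(series) for series in behaviors.values()]
    kernel = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
    conv = conv2d_valid(feature_map, kernel)
    flat = [max(0.0, v) for row in conv for v in row]  # ReLU, then flatten
    out_w = [rng.uniform(-0.5, 0.5) for _ in flat]
    return sigmoid(sum(w * v for w, v in zip(out_w, flat)))


# Hypothetical daily event counts for one student over one week.
demo = {
    "dining": [3.0, 2.0, 3.0, 3.0, 1.0, 2.0, 3.0],
    "shopping": [1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0],
    "library_entry": [0.0, 1.0, 1.0, 0.0, 2.0, 0.0, 0.0],
    "web_browsing": [4.0, 5.0, 3.0, 4.0, 5.0, 2.0, 4.0],
}
score = predict_risk(demo)
```

Because the weights are random, `score` carries no meaning. The sketch only shows how the data flows: sequence modelling happens separately per behavior, and the convolution is the only place where features from different behaviors are combined.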

Complex & Intelligent Systems https://doi.org/10.1007/s40747-022-00731-8

ORIGINAL ARTICLE

Xiaoyong Li1,2 · Yong Zhang1 · Huimin Cheng2 · Mengran Li1 · Baocai Yin1

Received: 1 November 2021 / Accepted: 27 March 2022
© The Author(s) 2022

Corresponding author: Yong Zhang

1 Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing Artificial Intelligence Institute, Faculty of Information Technology, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China
2 Information Technology Support Center, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China

Keywords Academic performance prediction · Time-series features · Correlation features · LSTM · 2DCNN

Introduction

Students’ performance is a key indicator of the quality of academic education and is also closely related to students’ mental health. Related studies have shown that students with poor academic performance are prone to anxiety and depression [1], and that their risk of suicide is much higher than that of students with excellent performance [2,3]. Achievement prediction aims to identify students at high academic risk in advance, so that administrators, teachers, and the students themselves can take timely, targeted intervention to avoid poor outcomes such as failing courses, dropping out, staying out, and so on. Student achievement prediction has therefore received extensive research attention.

Factors affecting academic achievement are complex and diverse, and researchers in various fields have done a great deal of work to explore them. For example, [4] explored the relationship between cognitive abilities and academic performance. Studies [5,6] examined the correlation between the “Big Five” traits (openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism) and academic achievement. Studies [7–9] show that good sleep habits help improve academic performance. Studies [10–13] conclude that moderate physical activity can improve academic achievement. Study [14] shows that binge eating and purging behaviors lead to relatively poor academic performance, and [15] shows that the influence is stronger in girls than in boys. Studies [16–19] show that bad habits in using social software and electronic devices affect academic achievement.
These studies demonstrate the strong correlation between various behavior-related factors and academic performance, and provide guidance and suggestions for managers and teachers to improve students’ academic achievement. However, most data used in these studies were collected from questionnaires or self-reports, which usually suffer from small sample sizes and social desirability bias.

With the rapid development of the digital campus in recent years, many information systems have been deployed on campus, such as learning management systems (LMS), smart card systems, gateway systems, access control systems, and so on, which faithfully record students’ behavior data as they study and live on campus. Compared with data obtained from questionnaires, these data objectively reflect students’ behavior patterns and cover a large number of samples, which provides a great opportunity for performance prediction. Because there is an obvious correlation between learning behavior and academic achievement, many studies [20–22] create predictive models by analyzing students’ learning behavior patterns from LMS log files, such as video watching, homework submission, and BBS discussion. Unfortunately, these learning behavior data are limited to specific courses, so models trained on one course do not generalize well to other courses. Furthermore, many courses in traditional face-to-face education are not taught through an LMS, so little learning data is available for predicting achievement in them. Daily living behavior data are another important data source that describes students’ campus behavior patterns; they include dining behavior, shopping behavior, library entry behavior, web page browsing behavior, and so on. Unlike learning behavior on an LMS, living behavior can be recorded for every student living on campus, which provides a much broader and more readily available data source for performance prediction.
Based on these data, related studies [23–28] manually extracted features from raw behavioral data by relying on expert knowledge, such as breakfast frequency, Internet time, orderliness, diligence, sleep pattern, and so on, and then constructed prediction models using machine learning algorithms. However, the following challenges are encountered when manually extracting features from massive multi-source living data: (1) the quality and number of features are directly influenced by expert knowledge, and it is difficult to extract high-quality features by understanding the overall distribution of massive data; (2) although some features such as orderliness express the regularity of behavior, they still cannot fully represent the temporal characteristics of time-series behavior data; (3) the correlation between multi-source behavior data needs to be further mined. To address the aforementioned challenges, we put forward a novel academic performance prediction method based on a deep neural network (DNN), in which behavioral features are automatically learned instead of being extracted manually, long short-term memory (LSTM) networks are applied to model the temporal characteristics of behavior data, and two-dimensional convolu (...truncated)
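
To make the notion of handcrafted features concrete, the following sketch computes two of the features named above, breakfast frequency and orderliness, from hypothetical smart card timestamps. The definitions are assumptions chosen for illustration and are not taken from the cited studies: breakfast is any dining record between 06:00 and 09:00, and orderliness is approximated by the Shannon entropy of the time-of-day distribution, where lower entropy means more regular behavior.

```python
import math
from collections import Counter
from datetime import datetime


def breakfast_frequency(timestamps, start_hour=6, end_hour=9):
    """Fraction of observed days with a dining record in [start_hour, end_hour)."""
    all_days = {t.date() for t in timestamps}
    breakfast_days = {t.date() for t in timestamps if start_hour <= t.hour < end_hour}
    return len(breakfast_days) / len(all_days) if all_days else 0.0


def time_entropy(timestamps, bin_minutes=30):
    """Shannon entropy in bits of the time-of-day bins; lower means more regular."""
    bins = Counter((t.hour * 60 + t.minute) // bin_minutes for t in timestamps)
    total = sum(bins.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in bins.values())


# Hypothetical dining records for one student over four days.
records = [
    datetime(2021, 3, 1, 7, 10),
    datetime(2021, 3, 1, 12, 5),
    datetime(2021, 3, 2, 7, 20),
    datetime(2021, 3, 3, 12, 0),
    datetime(2021, 3, 4, 7, 15),
]

freq = breakfast_frequency(records)   # 3 of 4 days have a breakfast record: 0.75
regularity = time_entropy(records)    # two bins with 3 and 2 records
```

Each such feature compresses a whole time series into one number, and its thresholds (the breakfast window, the bin width) must be chosen by hand, which illustrates challenges (1) and (2).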


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s40747-022-00731-8.pdf
Article home page: https://link.springer.com/article/10.1007/s40747-022-00731-8

Li, Xiaoyong, Zhang, Yong, Cheng, Huimin, Li, Mengran, Yin, Baocai. Student achievement prediction using deep neural network from multi-source campus data, Complex & Intelligent Systems, 2022, pp. 1-14, DOI: 10.1007/s40747-022-00731-8