Sentiment analysis of MOOC reviews via ALBERT-BiLSTM model
MATEC Web of Conferences 336, 05008 (2021)
CSCNS2020
https://doi.org/10.1051/matecconf/202133605008
Cheng Wang1, Sirui Huang2, and Ya Zhou1,*
1 Guangxi Key Lab of Trusted Software, Guilin University of Electronic Technology, 541004 Guilin, Guangxi, China
2 Electronic and Electrical Engineering Department, University College London, WC1E 6BT London, UK
Abstract. Accurately mining the sentiment information in comments on
Massive Open Online Courses (MOOCs) plays an important role in
improving course quality and promoting the sustainable development of
MOOC platforms. At present, most sentiment analyses of MOOC course
comments are coarse-grained, and relatively little attention is paid to
finer-grained issues such as polysemous words and familiar words that
have taken on new meanings. As a result, sentiment analysis models
identify the genuine sentiment tendency of course comments with low
accuracy. For this reason, this paper proposes an ALBERT-BiLSTM model
for sentiment analysis of MOOC course comments. Firstly, ALBERT is used
to generate word vectors dynamically. Secondly, contextual feature
vectors are obtained from the forward and backward passes of a BiLSTM,
combined with an attention mechanism that calculates the weight of each
word in a sentence. Finally, the BiLSTM output vectors are fed into
Softmax to classify sentiments and predict the sentiment tendency. The
experiment was performed on a real data set of MOOC course comments.
The results show that the proposed model achieved higher accuracy than
existing models.
1 Introduction
With the rapid development of Internet technology, Massive Open Online Course (MOOC)
platforms for online learning have attracted wide attention. Many learners leave
comments in the comment section while taking courses. These comments not only
evaluate course quality but also give direct feedback on technical problems with
the MOOC platform. Because it is extremely difficult to tally and analyze such a
large volume of comments manually, it is better to apply sentiment analysis to
obtain the sentiment tendency of comment texts and to extract and explore the
information from
*
Corresponding author:
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
massive amounts of comment data, which helps learners choose courses and helps
platform administrators identify problems.
The existing sentiment analyses of comments for MOOC courses can be roughly
divided into three categories: methods based on a sentiment dictionary, traditional
machine learning, and deep learning. The sentiment dictionary method is an
important way of analyzing textual sentiment. Araque et al. [1] showed that results
are affected to some extent by the data set and domain from which the vocabulary is
drawn. Kim et al. [2] improved the accuracy of textual sentiment analysis by
extending an existing sentiment dictionary with manually collected sentiment
vocabulary. However, sentiment dictionaries generalize poorly across domains and
are slow to keep up with new vocabulary. In traditional machine learning, the
sentiment analysis task is accomplished through feature engineering. Cai et al. [3]
first structured the text to be processed with a sentiment dictionary and then used
a Gradient Boosting Decision Tree (GBDT) model for training and prediction,
achieving a better result than a single model. However, extracting features
manually with traditional machine learning methods inevitably incurs a high labor
cost, so these methods are not suitable for mining and analyzing the massive
volume of course comments available today. Currently, deep learning has become the
mainstream technology for textual sentiment analysis. Long et al. [4] analyzed
comments on social media, overcame the deficiencies of traditional sentiment
analysis models and achieved good results on numerous corpora from different
fields. Devlin et al. [5] proposed the BERT pre-training model for text, which
performs well on text classification tasks. It abandons the traditional
convolutional and recurrent neural network (RNN) structures and uses the
Transformer structure to build the overall network model. During pre-training, it
learns the grammar and semantics of the language incrementally. However, when
encoding contextual information, the BERT pre-training model relies only on the
attention mechanism and does not consider part of speech, which can lead to
misjudgment. Google released the ALBERT model in 2019. Compared with BERT, it uses
fewer parameters and takes up less memory, which greatly improves training speed
and accuracy. At present, few studies have used the ALBERT pre-training model for
sentiment analysis of comments for MOOC courses. On this basis, we propose the
ALBERT-BiLSTM model for the sentiment analysis of comments for MOOC courses.
2 Related work
BERT applies a Transformer encoder with a self-attention mechanism throughout the
pre-training process. As a multi-task model, it can capture the bi-directional
relationships in sentences more thoroughly and learn bi-directional linguistic
representations in all layers. However, BERT requires a large number of parameters
and consumes considerable resources, so ALBERT, a lightweight version of BERT, was
adopted in this paper. To a certain extent, ALBERT addresses BERT's large parameter
count and heavy resource usage through three techniques: factorized embedding
parameterization, cross-layer parameter sharing and an inter-sentence coherence
loss.
In sequence modeling, an RNN can capture long-term dependencies and produce word
vectors that carry global contextual information. LSTM, a variant of the RNN,
alleviates the vanishing gradient problem of the RNN to a certain extent, but it
can only process sequence data in the forward direction, whereas processing the
sequence in the backward direction is also very important for textual sentiment
classification. BiLSTM is built from LSTM units: it combines a forward LSTM and a
backward LSTM, so it can use two mutually independent hidden layers to process the
data in the forward and backward directions simultaneously and thus obtain
complete semantic information.
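The forward and backward processing described above can be sketched in plain Python as follows. This is a toy illustration with random weights and stand-in word vectors, not the implementation used in this paper; all function names and sizes are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random weights for the four gates: input, forget, candidate, output."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: {"W": matrix(hidden_size, input_size + hidden_size),
                "b": [0.0] * hidden_size} for g in "ifgo"}


def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the input with the previous hidden state

    def gate(name, act):
        p = params[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(p["W"], p["b"])]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    """Run an LSTM over a sequence and return the hidden state at every step."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd_params, bwd_params, sequence, hidden_size):
    """Concatenate forward and backward hidden states at each time step."""
    forward = run_lstm(fwd_params, sequence, hidden_size)
    # The backward LSTM reads the reversed sequence; its outputs are
    # reversed again so that position t lines up with the forward pass.
    backward = run_lstm(bwd_params, sequence[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
input_size, hidden_size = 4, 3
fwd = make_lstm_params(input_size, hidden_size, rng)
bwd = make_lstm_params(input_size, hidden_size, rng)
# Five stand-in word vectors; in the proposed model these would come from ALBERT.
sentence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(5)]
features = bilstm(fwd, bwd, sentence, hidden_size)
print(len(features), len(features[0]))
```

Each position receives a vector of length 2 × hidden_size: the first half summarizes the words up to that position, and the second half summarizes the words from that position to the end of the sentence.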