Dependency-based Siamese long short-term memory network for learning sentence representations
Wenhao Zhu, Tengjun Yao, Jianyue Ni, Baogang Wei, Zhiguo Lu

School of Computer Engineering and Science, Shanghai University, Shanghai, China; College of Computer Science and Technology, Zhejiang University, Zhejiang, China; Library of Shanghai University, Shanghai University, Shanghai, China

Editor: Xuchu Weng, Hangzhou Normal University, China
Abstract

Textual representations play an important role in the field of natural language processing (NLP). The performance of NLP tasks, such as text comprehension and information extraction, can be significantly improved with proper textual representations. As neural networks have gradually been applied to learning the representations of words and phrases, fairly efficient models for learning short-text representations have been developed, such as the continuous bag-of-words (CBOW) and skip-gram models, and they have been extensively employed in a variety of NLP tasks. Because longer texts, such as sentences, have a more complex structure, algorithms appropriate for learning short textual representations are not applicable to learning long textual representations. One method of learning long textual representations is the Long Short-Term Memory (LSTM) network, which is suitable for processing sequences. However, the standard LSTM does not adequately address the primary sentence structure (subject, predicate and object), which is an important factor for producing appropriate sentence representations. To resolve this issue, this paper proposes the dependency-based LSTM model (D-LSTM). The D-LSTM divides a sentence representation into two parts: a basic component and a supporting component. The D-LSTM uses a pre-trained dependency parser to obtain the primary sentence information and generate the supporting component, and it uses a standard LSTM model to generate the basic sentence component. A weight factor that can adjust the ratio of the basic and supporting components in a sentence is introduced to generate the sentence representation. Compared with the representation learned by the standard LSTM, the sentence representation learned by the D-LSTM contains a greater amount of useful information. The experimental results show that the D-LSTM is superior to the standard LSTM on the Sentences Involving Compositional Knowledge (SICK) dataset.
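As a minimal illustration of the decomposition described above, the sketch below (Python/NumPy) mixes a basic and a supporting component with a scalar weight factor. The vector dimension, the names basic, supporting and alpha, and the weighted-sum form are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def combine_components(basic: np.ndarray, supporting: np.ndarray, alpha: float) -> np.ndarray:
        """Mix the basic and supporting sentence components.

        `basic` and `supporting` are assumed to be equal-length vectors
        produced by the standard-LSTM path and the dependency-based path,
        respectively; `alpha` is the weight factor that adjusts their ratio.
        The weighted-sum form is an illustrative assumption only.
        """
        return alpha * basic + (1.0 - alpha) * supporting

    # Toy usage with random 50-dimensional components.
    rng = np.random.default_rng(0)
    basic = rng.normal(size=50)
    supporting = rng.normal(size=50)
    sentence_vec = combine_components(basic, supporting, alpha=0.7)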
Funding: This work was supported by the National Natural Science Foundation of China (No. 61572434 and No. 61303097) to WZ. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Learning textual representations is a vital part of natural language processing (NLP) and important for subsequent NLP tasks. Recently, the study of representations of phrases and sentences has attracted the attention of many researchers, who have achieved a degree of success [1].
Studies of short textual representations have attained a number of achievements, and Mikolov's continuous bag-of-words (CBOW) and continuous skip-gram models are among the most famous. The word representations learned by these models perform relatively well in many NLP tasks, including word analogies [2, 3]. Recently, interest has shifted towards extending these ideas beyond the individual word level to larger bodies of text, such as sentences. Researchers have attempted to learn sentence representations directly as the sum or average of word representations, and they have achieved satisfactory results for certain simple NLP tasks [4]. Because of the variable length and complex structure of sentences, such simple algorithms cannot handle complex tasks (such as evaluating the similarity between two sentences). To resolve this problem, Kiros, Tai and Le have proposed methods of learning fixed-length sentence representations [5–7].
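As a point of reference for the simple composition strategy mentioned above, the following is a minimal sketch of the averaging baseline. The toy vocabulary and the randomly initialized 8-dimensional vectors merely stand in for pre-trained CBOW/skip-gram embeddings and are assumptions for illustration only.

    import numpy as np

    # A toy word-vector table; in practice these vectors would come from
    # CBOW/skip-gram training. The words and dimensionality are illustrative.
    rng = np.random.default_rng(1)
    vocab = ["a", "dog", "chases", "the", "cat"]
    word_vectors = {w: rng.normal(size=8) for w in vocab}

    def average_sentence_vector(sentence: str) -> np.ndarray:
        """Represent a sentence as the average of its word vectors.

        Words missing from the vocabulary are skipped; an empty sentence
        maps to the zero vector.
        """
        vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
        if not vecs:
            return np.zeros(8)
        return np.mean(vecs, axis=0)

    print(average_sentence_vector("A dog chases the cat"))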
Among all models for learning sentence representations, recurrent neural network (RNN) models, especially the Long Short-Term Memory (LSTM) model [8], are among the most appropriate for processing sentences, and they have achieved substantial success in text categorization [9] and machine translation [10]. Therefore, this paper also builds on LSTM networks and introduces a dependency-based Siamese LSTM model (D-LSTM) for better performance.
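To illustrate how an LSTM maps a variable-length sentence to a fixed-length vector, the sketch below shows a generic standard-LSTM sentence encoder in PyTorch. It is not the D-LSTM architecture itself, and all sizes (vocab_size, embed_dim, hidden_dim) are placeholder assumptions rather than values from this paper.

    import torch
    import torch.nn as nn

    class LSTMSentenceEncoder(nn.Module):
        """Minimal LSTM encoder: maps a sequence of token ids to a
        fixed-length sentence vector (the final hidden state).

        All hyper-parameters are placeholders, not values from the paper.
        """

        def __init__(self, vocab_size: int = 1000, embed_dim: int = 64, hidden_dim: int = 50):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> sentence vectors: (batch, hidden_dim)
            embedded = self.embedding(token_ids)
            _, (h_n, _) = self.lstm(embedded)
            return h_n[-1]

    # Toy usage: two token-id sequences of length 6, encoded to 50-dimensional vectors.
    encoder = LSTMSentenceEncoder()
    batch = torch.randint(0, 1000, (2, 6))
    print(encoder(batch).shape)  # torch.Size([2, 50])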
In this paper, a sentence representation is composed of two parts, namely, the basic component and the supporting component. We improve upon the traditional method, which employs a standard LSTM to learn sentence representations, and propose the D-LSTM, which uses sentence dependency structure to learn sentence representations. The D-LSTM can read sentences of different lengths and generate fixed-length representations. The basic component, which contains (...truncated)