Tibetan interrogative sentence recognition and classification based on phrase features
MATEC Web of Conferences 336, 06017 (2021)
CSCNS2020
https://doi.org/10.1051/matecconf/202133606017
Mabao Ban1,3,4, Zhijie Cai1,2,3,4*, Rangzhuoma Cai1,2,3,4, and Rangjia Cai1,3,4
1College of Computer Science and Technology, Qinghai Normal University, Qinghai Xining 810016, China
2School of Computer Science and Technology, Southwest Minzu University, Sichuan Chengdu 610041, China
3Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Qinghai Xining 810008, China
4Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Xining 810008, China
Abstract. Recognizing Tibetan interrogative sentences is a basic task in
natural language processing, with wide application value in Tibetan
syntactic analysis, semantic analysis, intelligent question answering,
search engines and other research fields. Taking interrogative pronouns as
an entry point and analyzing the phrase features before and after them,
this paper proposes a phrase-feature-based method for recognizing and
classifying Tibetan interrogative sentences and designs a corresponding
recognition and classification model. Experimental results show that the
recognition accuracy, recall rate and F value of this method are 98.21%,
100.00% and 99.10%, respectively, and the average classification accuracy,
recall rate and F value are 96.98%, 100.00% and 98.39%, respectively.
1 Introduction
With the development of computer technology, research on Tibetan natural language
processing has gradually moved from the word level to the sentence level. The interrogative
sentence is a common Tibetan sentence pattern, and its recognition and classification is one
of the key technologies in Tibetan syntactic analysis, semantic analysis, intelligent question
answering, search engines and other tasks.
Commonly used methods for recognizing sentences and sentence patterns are rule-based
methods, statistical methods, and combinations of the two. There is a substantial literature
on Chinese sentence pattern recognition. References [1-4] employ different methods to
identify and classify Chinese subjective sentences, explanatory opinion sentences, opinion
sentences, and graceful sentences, all of which have achieved good experimental results. In
Tibetan sentence and sentence pattern recognition, because Tibetan sentences have no
obvious boundary symbol, current research mainly focuses on sentence boundary
recognition technology [5-14], which provides a theoretical basis for the study of Tibetan
sentence boundary recognition. Research on Tibetan sentence pattern recognition and
classification technology has not yet been reported. Research shows that identifying
different sentence patterns and classifying them can improve the performance of question
answering systems. This paper therefore analyzes the phrase features before and after
interrogative pronouns.
* Corresponding author:
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons
Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
2 Tibetan interrogative sentence recognition and classification
based on phrase features
2.1 Tibetan interrogative sentence recognition and classification model
In written Tibetan, each interrogative sentence contains at least one interrogative
pronoun with distinct structural features. Taking interrogative pronouns as the starting point,
this paper designs a Tibetan interrogative sentence recognition and classification model
based on phrase features, as shown in Fig.1.
[Figure 1: block diagram. Sentences from the Tibetan sentence bank enter the phrase feature
analysis module, which performs interrogative word recognition (ry1, ry2, ry3) and analysis
of phrase features (Feature1, Feature2, ..., Feature8). The interrogative sentence recognition
module then outputs TIS1, TIS2, ..., TIS8.]
Fig.1. Tibetan interrogative sentence recognition and classification model based on phrase features.
The model consists of a phrase feature analysis module and an interrogative sentence
recognition module. The phrase feature analysis module has two parts: interrogative word
recognition and phrase feature analysis. The interrogative word recognition part identifies
interrogative pronouns of types ry1, ry2 and ry3. The phrase feature analysis part obtains
the phrase feature of the corresponding interrogative sentence (one of Feature1, Feature2,
..., Feature8) by analyzing the phrases before and after ry. The interrogative sentence
recognition module then uses these phrase features to recognize and classify Tibetan
interrogative sentences.
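The two-module flow described above can be outlined as a small function. This is only a sketch under stated assumptions, not the authors' implementation: the names `recognize_and_classify`, `find_pronoun` and `analyze_feature` are hypothetical, the two analysis steps are passed in as callables because their rules are not specified at this point, and the mapping from Feature k to class TIS k is a reading of Fig.1.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical signatures: neither callable is defined in the paper.
# A pronoun finder returns (token index, ry type) or None.
PronounFinder = Callable[[Sequence[str]], Optional[Tuple[int, str]]]
# A feature analyzer returns a feature number 1..8 or None.
FeatureAnalyzer = Callable[[Sequence[str], int, str], Optional[int]]


def recognize_and_classify(
    tokens: Sequence[str],
    find_pronoun: PronounFinder,
    analyze_feature: FeatureAnalyzer,
) -> Optional[str]:
    """Return 'TIS1'..'TIS8' for an interrogative sentence, else None."""
    # Module 1, part 1: interrogative word recognition.
    hit = find_pronoun(tokens)
    if hit is None:
        # Every interrogative sentence contains an interrogative pronoun,
        # so a sentence without one is not treated as interrogative.
        return None
    index, ry_type = hit

    # Module 1, part 2: phrase feature analysis around the pronoun.
    feature_id = analyze_feature(tokens, index, ry_type)
    if feature_id is None or not 1 <= feature_id <= 8:
        return None  # no known phrase feature matched

    # Module 2: assumed one-to-one mapping from Feature k to TIS k.
    return f"TIS{feature_id}"
```

Passing the two steps in as callables keeps the outline independent of any particular pronoun list or feature rule set, so either part can be replaced without touching the control flow.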
2.2 An analysis of the features of Tibetan interrogative sentences
The Tibetan interrogative sentence is a sentence pattern classified according to the mood of
the sentence. It is a sentence that asks others about the type and nature of the things in
question [15-18]. Compared with declarative, imperative and exclamatory sentences,
Tibetan interrogative sentences differ markedly in mood and emotional color. However,
current technology cannot identify interrogative sentences according to mood and
emotional color. By analyzing the structural features of Tibetan interrogative sentences, we
find that each interrogative sentence contains at least one interrogative word (tagged ry,
interrogative pronoun, in the part-of-speech tag set, and referred to below as an
interrogative pronoun). Tibetan interrogative pronouns are clearly defined and limited in
number. In order to analyze the features of interrogative sentences, we divide
interrogative pronouns into three categories. The classification of Tibetan interrogative
pronouns is shown in Table 1.
Table 1. Classification of Tibetan interrogative pronouns.
Serial number    Type    Interrogative pronouns
1                ry1     གམ་ངམ་དམ་ནམ་བམ་མམ་འམ་རམ་ལམ་སམ
2                ry2     ཅི་ཇི་�་གང་�་ནམ
3                ry3     ཨེ
In Table 1, every interrogative pronoun except "ནམ" belongs to exactly one type, so only
"ནམ" is ambiguous between categories. Its type can be judged from its position and
context: when it appears after a verb, adjective or auxiliary verb, it belongs to ry1;
otherwise it belongs to ry2.
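The lookup in Table 1 and the position rule for "ནམ" can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `pronoun_type`, the `prev_pos` parameter and the tag strings "v", "a" and "aux" are assumptions, and the two ry2 entries that are unreadable in Table 1 are left out rather than guessed.

```python
from typing import Optional

# Pronoun lists copied from Table 1. Two ry2 entries are unreadable in the
# source text and are deliberately omitted here.
RY1 = {"གམ", "ངམ", "དམ", "ནམ", "བམ", "མམ", "འམ", "རམ", "ལམ", "སམ"}
RY2 = {"ཅི", "ཇི", "གང", "ནམ"}
RY3 = {"ཨེ"}

# The only pronoun listed under two types.
AMBIGUOUS = "ནམ"

# Hypothetical POS tags for verb, adjective and auxiliary verb.
PREDICATE_TAGS = {"v", "a", "aux"}


def pronoun_type(token: str, prev_pos: Optional[str] = None) -> Optional[str]:
    """Return 'ry1', 'ry2' or 'ry3', or None if token is not in Table 1.

    prev_pos is the POS tag of the word immediately before the token;
    it only matters for the ambiguous pronoun.
    """
    if token == AMBIGUOUS:
        # After a verb, adjective or auxiliary verb it is ry1, otherwise ry2.
        return "ry1" if prev_pos in PREDICATE_TAGS else "ry2"
    if token in RY1:
        return "ry1"
    if token in RY2:
        return "ry2"
    if token in RY3:
        return "ry3"
    return None
```

For example, `pronoun_type("ནམ", "v")` yields "ry1", while the same token at the start of a sentence (no preceding tag) yields "ry2".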
Taking interrogative pronouns as the entry point, we analyze the grammatical structure
and structural characteristics of Tibetan interrogative sentences. According to the different
combination characteristics of interrogative pronouns and their contexts, we can divide
them into general interrogative sentence (TIS1), emphatic interrogative sentence (TIS2),
specific interrogative sentence (...truncated)