Semantic relation identification for consecutive predicative constituents in Chinese
Huang et al. Lingua Sinica
Shu-Ling Huang 1
Keh-Jiann Chen 0
Wei-Yun Ma 0
Su-Chu Lin 0
Yu-Ming Hsieh 0
0 Institute of Information Science , Academia Sinica , Taiwan
1 Research Center for Translation, Compilation and Language Education, National Academy for Educational Research , Taiwan
In this paper, we propose a general methodology for designing a semantic role/relation system. Based on this methodology, we establish a succinct semantic relation system for consecutive predicative constituents in Chinese, which include serial verb construction, discourse construction, and other constructions describing serial events. In contrast to conventional, more complex classification schemes, this system has 13 middle-level classes and 24 fine-grained sub-classes, and it meets the uniqueness and completeness criteria of semantic relation identification. We evaluate the system by training four annotators for one hour and having them label 200 sentences extracted from Sinica Treebank and HIT-CDTB. With the help of our predesigned feature-based decision tree and a checklist of connective markers, the annotators attain 73% consistency with the reference standard annotation and substantial agreement, as measured by Cohen's kappa coefficient, for middle-level labeling. Based on an analysis of the labeling error types, we slightly revise our classification scheme and propose six methods to improve the classification and labeling system, with the aim of achieving better agreement in the future.
Semantic relation identification; Semantic roles; Feature-based semantic relation system; Serial verb construction; Discourse construction; Discourse relation recognition
1 Introduction
Essential to natural language understanding are the processes of part-of-speech
tagging, parsing, and semantic relation identification. In this paper, our objective is to
clarify the relations between consecutive predicative constituents (abbreviated CPCs),
which include serial verb construction (abbreviated SVC), discourse construction, and
other constructions describing serial events in Chinese text, and to find a
workable semantic relation system for semantic role/relation labeling tasks. In the
course of this work, we also established a general methodology for designing
semantic role/relation systems.
Chinese has various constructions to juxtapose two predicative constituents, such as
compounding, coordinate constructions, serial verb constructions, and discourse
constructions. A CPC may occur with or without overt syntactic marking of the semantic
relation between the described events, for example, 戰敗投降 zhànbài tóuxiáng
'defeated and surrendered'. Whereas in English the conjunction and is used to mark
a simple coordination or temporal succession between VPs, in Chinese, the two VP
constituents are simply adjoined. CPCs that occur in a simple sentence, as shown
in (1a) and (1b), are termed serial verb constructions (Aikhenvald 2006; Lin et al. 2012;
Tao 2009); those that occur across coherent sequences of sentences, as
in (1c), are called discourse constructions (Hovy and Maier 1992; Prasad et al. 2008;
Wolf and Gibson 2005).
(1) a. 大人們趕著上山(V1)打虎(V2) means-purpose
dàrénmen__gǎnzhe__shàngshān__dǎhǔ
the-adults__hurried__go-uphill__hunt-the-tiger
The adults hurried to go uphill (V1) to hunt the tiger (V2).
b. 一大早上山(V1)累壞(V2)學生了 cause-result
yīdàzǎo__shàngshān__lèihuài__xuésheng__le
early-in-the-morning__go-up-hill__tire-out__student__LE
It tired the students out (V2) to go uphill (V1) early in the morning.
c. 如遇下雪(V1), 一般車輛避免(V2)上山, 以免發生(V3)危險
condition-result between V1 and V2; event-avoidance between V2 and V3
rú__yù__xiàxuě__yībān__chēliàng__bìmiǎn__shàngshān,__yǐmiǎn__
fāshēng__wéixiǎn
if__encounter__snow__ordinary__vehicle__avoid__go-up-hill,__lest__
occur__danger
If it snows (V1), it is better to avoid (V2) driving uphill lest danger occurs (V3).
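The annotations in (1) can be read as labeled links between pairs of predicates. The following is a minimal sketch of one way to store them, not a format used by the authors: the class names, field names, and the helper `relations_for` are hypothetical, and the predicate strings are simply the characters that immediately precede each V marker in the examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    """One labeled link between two predicates of the same example."""
    label: str   # relation label as printed beside the example
    first: str   # marker of the earlier predicate, e.g. "V1"
    second: str  # marker of the later predicate, e.g. "V2"


@dataclass(frozen=True)
class AnnotatedExample:
    """Hypothetical container for one example sentence from (1)."""
    example_id: str
    predicates: Dict[str, str]        # marker -> predicate string
    relations: Tuple[Relation, ...]   # one entry per labeled pair


# Labels and markers are copied from examples (1a)-(1c).
EXAMPLES = [
    AnnotatedExample(
        "1a",
        {"V1": "上山", "V2": "打虎"},
        (Relation("means-purpose", "V1", "V2"),),
    ),
    AnnotatedExample(
        "1b",
        {"V1": "上山", "V2": "累壞"},
        (Relation("cause-result", "V1", "V2"),),
    ),
    AnnotatedExample(
        "1c",
        {"V1": "下雪", "V2": "避免", "V3": "發生"},
        (
            Relation("condition-result", "V1", "V2"),
            Relation("event-avoidance", "V2", "V3"),
        ),
    ),
]


def relations_for(example_id: str) -> List[Tuple[str, str, str]]:
    """Return (label, earlier predicate, later predicate) triples."""
    for ex in EXAMPLES:
        if ex.example_id == example_id:
            return [
                (r.label, ex.predicates[r.first], ex.predicates[r.second])
                for r in ex.relations
            ]
    raise KeyError(example_id)
```

In this sketch a serial verb construction such as (1a) and a discourse construction such as (1c) share one representation; they differ only in how many labeled pairs the example carries.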
Most studies discuss different constructions separately. However, when studying
semantic relations between CPCs in different constructions, it is not necessary to
regard them as distinct phenomena.
Zhou and Xue (2012) described four
characteristics that blur the boundary between discourse construction and serial
verb construction: (i) a semicolon is not always used to separate
sentences; (ii) in most cases, no explicit discourse connective is used to
denote the discourse relation; (iii) there are no inflectional clues to differentiate free
adjuncts from main clauses; and (iv) both subject and object can be dropped in Chinese.
For instance, there is no essential reason to assign (2a) and (2b) to the
different categories of serial verb construction and discourse construction.
(2) a. 她丈夫車禍過世了。 SVC
tā__zhàngfu__chēhuò__guòshì__le
she__husband__car-accident__died
Her husband died in a car accident.
b. 她丈夫車禍, 所以過世了。 Discourse Construction
tā__zhàngfu__chēhuò,__suǒyǐ__guòshì__le
she__husband__car-acci (...truncated)