Semantic relation identification for consecutive predicative constituents in Chinese

Lingua Sinica, Oct 2017

In this paper, we propose a general methodology for designing a semantic role/relation system. Based on this methodology, we establish a succinct semantic relation system for consecutive predicative constituents in Chinese, which include serial verb constructions, discourse constructions, and other constructions describing serial events. This semantic relation system has 13 middle-level classes and 24 fine-grained sub-classes, in contrast to conventional complex classification schemes, and meets the uniqueness and completeness criteria of semantic relation identification. We conduct experiments on our system by training four annotators for 1 h to label 200 sentences extracted from Sinica Treebank and HIT-CDTB. With the help of our predesigned feature-based decision tree and a checklist of connective markers, the annotators attain 73% consistency with the reference standard annotation and substantial agreement, as measured by Cohen’s kappa coefficient, for middle-level labeling. By analyzing the labeling error types, we slightly revise our classification scheme and propose six methods to improve the classification and labeling system, hoping to achieve even better agreement in the future.
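The agreement measure named in the abstract can be illustrated with a minimal sketch of Cohen's kappa for one annotator compared against a reference annotation. The function name `cohen_kappa` and the toy label sequences are assumptions made for illustration. The labels borrow relation names used in the paper's examples, but the data are invented and do not reproduce the paper's experiment or its way of aggregating the four annotators.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not from the paper): six items, three relation labels.
reference = ["cause-result", "means-purpose", "condition-result",
             "cause-result", "means-purpose", "cause-result"]
annotator = ["cause-result", "means-purpose", "cause-result",
             "cause-result", "condition-result", "cause-result"]

kappa = cohen_kappa(reference, annotator)
print(f"kappa = {kappa:.3f}")
```

Kappa discounts the raw percentage of matching labels by the agreement expected from the two label distributions alone, which is why it is reported alongside the consistency figure. The word "substantial" is conventionally applied to kappa values between 0.61 and 0.80.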



Authors: Shu-Ling Huang (Research Center for Translation, Compilation and Language Education, National Academy for Educational Research, Taiwan); Keh-Jiann Chen, Wei-Yun Ma, Su-Chu Lin, and Yu-Ming Hsieh (Institute of Information Science, Academia Sinica, Taiwan)

Keywords: Semantic relation identification; Semantic roles; Feature-based semantic relation system; Serial verb construction; Discourse construction; Discourse relation recognition

1 Introduction

Essential to natural language understanding are the processes of part-of-speech tagging, parsing, and semantic relation identification.
In this paper, our objective is to clarify the relations between consecutive predicative constituents (abbreviated CPCs), which include serial verb construction (abbreviated SVC), discourse construction, and other constructions describing serial events in Chinese text, and to find a good and workable semantic relation system for semantic role/relation labeling tasks. As a consequence, a methodology for designing semantic role/relation systems was also established.

Chinese has various constructions for juxtaposing two predicative constituents, such as compounding, coordinate constructions, serial verb constructions, and discourse constructions. A CPC may occur with or without overt syntactic marking of the semantic relation between the described events, for example, 戰敗投降 zhànbài tóuxiáng 'defeated and surrendered'. Whereas English uses the conjunction "and" to mark simple coordination or temporal succession between VPs, in Chinese the two VP constituents are simply adjoined. CPCs that occur in a simple sentence, as shown in (1a) and (1b), are termed serial verb constructions (Aikhenvald 2006; Lin et al. 2012; Tao 2009); those that occur across coherent sequences of sentences, as in (1c), are called discourse constructions (Hovy and Maier 1992; Prasad et al. 2008; Wolf and Gibson 2005).

(1) a. 大人們趕著上山(V1)打虎(V2) [means-purpose]
       dàrénmen gǎnzhe shàngshān dǎhǔ
       the-adults hurried go-uphill hunt-the-tiger
       'The adults hurried to go uphill (V1) to hunt the tiger (V2).'
    b. 一大早上山(V1)累壞(V2)學生了 [cause-result]
       yīdàzǎo shàngshān lèihuài xuésheng le
       early-in-the-morning go-up-hill tire-out student LE
       'It tired the students out (V2) to go uphill (V1) early in the morning.'
    c. 如遇下雪(V1), 一般車輛避免(V2)上山, 以免發生(V3)危險 [condition-result between V1 and V2; event-avoidance between V2 and V3]
       rú yù xiàxuě, yībān chēliàng bìmiǎn shàngshān, yǐmiǎn fāshēng wéixiǎn
       if encounter snow, ordinary vehicle avoid go-up-hill, lest occur danger
       'If it snows (V1), it is better to avoid (V2) driving uphill lest danger occurs (V3).'

Most studies discuss different constructions separately. However, when studying semantic relations between CPCs in different constructions, it is not necessary to regard them as distinct phenomena. Zhou and Xue (2012) described four characteristics that blur the boundary between discourse construction and serial verb construction: (i) a semicolon is not always used to separate the sentences; (ii) in most cases, no explicit discourse connectives are used to denote the discourse relations; (iii) there are no inflectional clues to differentiate free adjuncts from main clauses; and (iv) both subject and object can be dropped in Chinese. For instance, in (2) there is no essential reason to assign (a) and (b) to the different categories of serial verb construction and discourse construction.

(2) a. 她丈夫車禍過世了。 [SVC]
       tā zhàngfu chēhuò guòshì le
       she husband car-accident died
       'Her husband died in a car accident.'
    b. 她丈夫車禍, 所以過世了。 [Discourse Construction]
       tā zhàngfu chēhuò, suǒyǐ guòshì le
       she husband car-acci (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1186%2Fs40655-017-0028-1.pdf

Shu-Ling Huang, Keh-Jiann Chen, Wei-Yun Ma, Su-Chu Lin, Yu-Ming Hsieh. Semantic relation identification for consecutive predicative constituents in Chinese, Lingua Sinica, 2017, pp. 9, Volume 3, Issue 1, DOI: 10.1186/s40655-017-0028-1