World Wide Web
https://doi.org/10.1007/s11280-023-01216-5

KC-GEE: knowledge-based conditioning for generative event extraction

Tongtong Wu¹,² · Fatemeh Shiri² · Jingqi Kang¹ · Guilin Qi¹ · Gholamreza Haffari² · Yuan-Fang Li²

Received: 21 October 2022 / Revised: 4 September 2023 / Accepted: 1 October 2023
© The Author(s) 2023

Corresponding author: Yuan-Fang Li. Extended author information is available on the last page of the article.

Abstract
Event extraction is an important but challenging task. Many existing techniques decompose it into event and argument detection/classification subtasks, which are complex structured prediction problems. Generation-based extraction techniques lessen the complexity of the problem formulation and are able to leverage the reasoning capabilities of large pretrained language models. However, they still suffer from poor zero-shot generalizability and are ineffective at handling long contexts such as documents. We propose a generative event extraction model, KC-GEE, that addresses these limitations. A key contribution of KC-GEE is a novel knowledge-based conditioning technique that injects the schema of candidate event types as a prefix into each layer of an encoder-decoder language model. This enables effective zero-shot learning and improves supervised learning. Our experiments on two benchmark datasets demonstrate the strong performance of KC-GEE. It achieves particularly strong results on the challenging document-level extraction task and in the zero-shot learning setting, outperforming state-of-the-art models by up to 5.4 absolute F1 points.

Keywords: Event extraction · Information extraction · Zero-shot learning · Document-level event extraction

1 Introduction

Event extraction [1] aims to extract structured event records from unstructured text. For example, as shown in Figure 1, the goal of event extraction is to map the document “Two homemade pressure-cooker bombs are detonated remotely by the Tsarnaevs near the finish line of the Boston Marathon, killing three and injuring some 260 others. Seventeen people lost limbs.” to four predefined event types (highlighted in celeste), such as <event type: Attack, trigger word: detonated, role:Attacker: Tsarnaevs, . . . , role:ExplosiveDevice: bombs, role:Place: Boston Marathon>, as well as other events triggered by the words “killing” and “injuring”.

Figure 1: The event extraction task. Each event schema specifies an event type together with its associated roles; for instance, the "Attack" event schema includes the roles "Attacker", "ExplosiveDevice", and "Place".

Event extraction is challenging due to the diversity of natural language expressions and the complexity of event structures. These challenges are amplified in document-level event extraction, where the input is a full document that typically contains more events. Currently, most event extraction methods take a decomposition-based approach [2], which breaks the structured prediction of a complex event down into classification of substructures through subtasks such as trigger detection, entity recognition, and argument classification. Many of these methods tackle the subproblems separately, which requires additional annotations for each stage [3].

Natural language generation techniques have been successfully applied to a number of NLP tasks [4–6], and they have inspired the use of controlled event generation for event extraction. These approaches wrap input sentences in manually designed templates and train a model to fill them in cloze style. The study by [7] proposes generating linearised event records with a pretrained encoder-decoder architecture combined with a constrained decoding mechanism, which alleviates the complexity of combining templates when extracting multiple events.
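Generating linearised event records means the model emits a string that must be parsed back into structured records. The sketch below illustrates that round trip for the Attack event from Figure 1. The bracketed target format and the names `EventRecord`, `linearise`, `parse`, and `schema_violations` are hypothetical illustrations, not the exact format used by [7] or by KC-GEE. Note also that [7] enforces the schema during decoding, whereas this sketch only checks the parsed output against a schema afterwards.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    """One structured event: its type, trigger span, and (role, argument) pairs."""
    event_type: str
    trigger: str
    arguments: list[tuple[str, str]] = field(default_factory=list)


def linearise(records: list[EventRecord]) -> str:
    """Flatten records into one bracketed target string (hypothetical format)."""
    events = []
    for rec in records:
        args = "".join(f" ({role} {arg})" for role, arg in rec.arguments)
        events.append(f"({rec.event_type} {rec.trigger}{args})")
    return "(" + " ".join(events) + ")"


def parse(text: str) -> list[EventRecord]:
    """Invert linearise(); raise ValueError if the generated string is malformed."""
    tokens = re.findall(r"\(|\)|[^()\s]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r} at token {pos}, got {tok!r}")
        pos += 1
        return tok

    def take_label():
        # Event type or role name: exactly one non-bracket token.
        tok = take()
        if tok in ("(", ")"):
            raise ValueError(f"expected a label at token {pos - 1}, got {tok!r}")
        return tok

    def take_span():
        # Trigger or argument text: one or more consecutive word tokens.
        nonlocal pos
        start = pos
        while peek() not in (None, "(", ")"):
            pos += 1
        if pos == start:
            raise ValueError(f"empty span at token {pos}")
        return " ".join(tokens[start:pos])

    records = []
    take("(")
    while peek() == "(":
        take("(")
        event_type, trigger = take_label(), take_span()
        arguments = []
        while peek() == "(":
            take("(")
            role = take_label()
            arguments.append((role, take_span()))
            take(")")
        take(")")
        records.append(EventRecord(event_type, trigger, arguments))
    take(")")
    if peek() is not None:
        raise ValueError("trailing tokens after the outer bracket")
    return records


def schema_violations(records, schema):
    """Return (event_type, role) pairs the schema does not allow; role is None for an unknown type."""
    problems = []
    for rec in records:
        allowed = schema.get(rec.event_type)
        if allowed is None:
            problems.append((rec.event_type, None))
        else:
            problems.extend((rec.event_type, role) for role, _ in rec.arguments if role not in allowed)
    return problems


if __name__ == "__main__":
    attack = EventRecord("Attack", "detonated", [("Attacker", "Tsarnaevs"),
                         ("ExplosiveDevice", "bombs"), ("Place", "Boston Marathon")])
    target = linearise([attack])
    print(target)
    # ((Attack detonated (Attacker Tsarnaevs) (ExplosiveDevice bombs) (Place Boston Marathon)))
    assert parse(target) == [attack]
    schema = {"Attack": {"Attacker", "ExplosiveDevice", "Place"}}
    print(schema_violations(parse(target), schema))  # []
```

Because only the output string is supervised, this formulation needs no token-level labels; the price is that malformed or off-schema generations must be detected, which is what `parse` raising `ValueError` and `schema_violations` stand in for here.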
The extraction-as-generation approach removes the need for the fine-grained token-level annotations typically required by previous event extraction approaches [8], which makes it more practical to apply. Although generation-based methods generalize well on other tasks, we have observed a significant drop in performance when they are applied to event extraction over documents or over unseen event types. Structured prediction tasks such as event extraction often rely on an external schema to format the output, whereas natural language generation tasks do not. To bridge this gap, we introduce a novel technique called knowledge-based conditioning, which injects event type information as prefixes into different layers of the underlying pretrained language model, so that generation is conditioned on the event schema, with the aim of improving event extraction performance. Additionally, to address the challenge of adapting to new scenarios, we approach event extraction from the perspective of zero-shot learning [9, 10]. Our model, KC-GEE, is capable of document-level event extraction and generalizes to the zero-shot setting.

Our main contributions are as follows.

• We propose a novel knowledge-based conditioning technique that injects event type information into the model, enabling zero-shot learning.
• We carefully design a prefix-based injection mechanism that incorporates cross-attention to improve document-level event extraction.
• We conduct extensive experiments on two benchmark datasets in both fully supervised and zero-shot settings. Our evaluation consistently shows strong performance across all settings. In particular, our model performs substantially better in the challenging settings of document-level event extraction and zero-shot transfer, outperforming state-of-the-art models by up to 5.4 absolute F1 points.
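Knowledge-based conditioning can be pictured in the style of prefix-tuning: at every layer, attention sees a few extra key/value vectors that encode the candidate event schema. The following is a minimal sketch under that assumption, not the KC-GEE implementation. This section does not describe how the schema is encoded, so `schema_prefix` is a hypothetical stand-in that derives deterministic pseudo-random vectors from the schema text and layer index; a real model would compute them with learned parameters. All function names are illustrative, and the toy omits learned projections, multiple heads, layer normalisation, feed-forward sublayers, and the decoder's cross-attention.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def prefixed_attention(query, keys, values, prefix_keys, prefix_values):
    """Prefix-style conditioning: the query also attends to extra key/value
    slots carrying the schema, prepended to the real keys/values."""
    return attend(query, prefix_keys + keys, prefix_values + values)


def schema_prefix(schema, layer, dim, n_slots=2):
    """HYPOTHETICAL stand-in for a learned schema encoder: deterministic
    pseudo-random key/value vectors seeded by the schema text and layer index,
    so every layer gets its own prefix for the same schema."""
    text = "; ".join(f"{etype}: {', '.join(roles)}" for etype, roles in sorted(schema.items()))
    rng = random.Random(f"{text}#layer{layer}")
    keys = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_slots)]
    values = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_slots)]
    return keys, values


def encode(hidden, schema, n_layers=2):
    """Toy encoder: each layer is single-head self-attention over the token
    states plus that layer's schema prefix, with a residual connection."""
    dim = len(hidden[0])
    for layer in range(n_layers):
        prefix_keys, prefix_values = schema_prefix(schema, layer, dim)
        hidden = [
            [h + a for h, a in zip(x, prefixed_attention(x, hidden, hidden, prefix_keys, prefix_values))]
            for x in hidden
        ]
    return hidden


if __name__ == "__main__":
    # Toy 4-dimensional states for a two-token input.
    states = [[0.1, 0.0, 0.2, 0.3], [0.0, 0.5, 0.1, 0.0]]
    attack_schema = {"Attack": ["Attacker", "ExplosiveDevice", "Place"]}
    # Made-up schema standing in for an event type unseen during training.
    new_schema = {"NewEventType": ["RoleA", "RoleB"]}
    print(encode(states, attack_schema))
    print(encode(states, new_schema))
```

In this toy, the prefix depends only on the schema text, so supplying the schema of a new event type changes the conditioning without modifying anything else, which illustrates why schema-based conditioning could support unseen event types. The same `prefixed_attention` call could equally be placed in decoder cross-attention.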
2 Related work

Document-level event extraction
Event extraction is the task of extracting structured event records from unstructured text [5]. Many approaches have been proposed for sentence-level event extraction [11, 12], ranging from hand-designed features [13] to learned neural features [14, 15]. Yet many real-world applications require document-level event extraction [14–18], in which the information about an event may be spread across multiple sentences [19, 20]. Moreover, most existing work adopts decomposition strategies for event extraction [2], which involve trigger detection [13], entity recognition [21, 22], and argument classification [23]. These decomposition strategies have shown high performance but require more detailed annotations for model training [5, 7].

Zero-shot event extraction
Several previous supervised event extraction methods rely on features derived from manual annotations, which limits their applicability to new event types without additional annotation effort [9, 24, 25]. These methods often struggle to generalize effectively to new label taxonomies and domains. In contrast, [26] proposes a zero-shot event extraction approach. They first utilize existing tools, such as (...truncated)