Entity alignment via graph neural networks: a component-level study

World Wide Web
https://doi.org/10.1007/s11280-023-01221-8

Yanfeng Shu1 · Ji Zhang2 · Guangyan Huang3 · Chi-Hung Chi4 · Jing He5

Received: 10 July 2023 / Revised: 3 November 2023 / Accepted: 15 November 2023
© The Author(s) 2023

Abstract

Entity alignment plays an essential role in the integration of knowledge graphs (KGs) as it seeks to identify entities that refer to the same real-world objects across different KGs. Recent research has primarily centred on embedding-based approaches. Among these approaches, there is a growing interest in graph neural networks (GNNs) due to their ability to capture complex relationships and incorporate node attributes within KGs. Despite the presence of several surveys in this area, they often lack comprehensive investigations specifically targeting GNN-based approaches. Moreover, they tend to evaluate overall performance without analysing the impact of individual components and methods. To bridge these gaps, this paper presents a framework for GNN-based entity alignment that captures the key characteristics of these approaches. We conduct a fine-grained analysis of individual components and assess their influences on alignment results. Our findings highlight specific module options that significantly affect the alignment outcomes. By carefully selecting suitable methods for combination, even basic GNN networks can achieve competitive alignment results.
Keywords Knowledge graph · Entity alignment · Graph neural network · Experimental study

1 Data61, CSIRO, Hobart, Australia
2 University of Southern Queensland, Brisbane, Australia
3 Deakin University, Melbourne, Australia
4 Nanyang Technological University, Singapore, Singapore
5 Oxford University, Oxford, England

1 Introduction

Knowledge graphs (KGs) serve as structured, graph-based representations of knowledge, capturing real-world entities, their attributes, and the relationships between them. They are indispensable tools, facilitating sophisticated data analysis, inference, and decision-making processes. KGs come in various forms, including general KGs like DBpedia [1] and YAGO [2], as well as domain-specific KGs like BioKG [3] and FoodKG [4], catering to a wide range of applications.

However, a common challenge with standalone KGs is their incompleteness, lacking comprehensive domain coverage. To overcome this limitation, KG integration becomes essential. By combining KGs from diverse sources, integration enables the presentation of different perspectives and complementary information. One crucial step in KG integration is entity alignment, which involves identifying entities across KGs that refer to the same real-world objects. Aligning entities allows the development of advanced applications that offer a holistic view of information, enhancing the quality of knowledge-based systems.

Recent research in entity alignment has primarily focused on embedding-based approaches. These approaches represent entities as low-dimensional vectors, capturing semantic relatedness by computing distances in the vector space. Among them, graph neural networks (GNNs) [5, 6] have gained popularity for embedding learning. GNNs effectively learn node representations by aggregating information from neighboring nodes recursively.
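The recursive neighbourhood aggregation described above can be illustrated with a minimal sketch. It is not the architecture of any approach studied in the paper: it uses plain mean aggregation with no learnable weights or non-linearity, and the names `aggregate_once`, `aggregate`, and the toy graph are hypothetical, introduced only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_once(
    embeddings: Dict[str, Vector],
    neighbours: Dict[str, List[str]],
) -> Dict[str, Vector]:
    """One round of mean aggregation over each node and its neighbours."""
    updated = {}
    for node, vec in embeddings.items():
        # Include the node itself so an isolated node keeps its embedding.
        group = [vec] + [embeddings[n] for n in neighbours.get(node, [])]
        updated[node] = [
            sum(v[i] for v in group) / len(group) for i in range(len(vec))
        ]
    return updated


def aggregate(
    embeddings: Dict[str, Vector],
    neighbours: Dict[str, List[str]],
    num_layers: int = 2,
) -> Dict[str, Vector]:
    """Repeat the aggregation so each node sees its num_layers-hop neighbourhood."""
    for _ in range(num_layers):
        embeddings = aggregate_once(embeddings, neighbours)
    return embeddings


# Toy undirected path graph a - b - c (an assumption for illustration).
toy_neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}

one_hop = aggregate(toy_embeddings, toy_neighbours, num_layers=1)
two_hop = aggregate(toy_embeddings, toy_neighbours, num_layers=2)
```

After one round, node `a` has seen only `b`, so its second coordinate is still zero. After two rounds, information from `c` has reached `a` through `b`, which is the sense in which stacking layers widens the neighbourhood a node's embedding reflects.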
The underlying assumption behind using GNNs for entity alignment is that similar entities tend to have similar neighborhoods, as supported by the expressiveness of GNNs in identifying isomorphic subgraphs, akin to the Weisfeiler-Lehman (WL) algorithms [7]. Moreover, GNNs naturally excel at handling complex graph structures and incorporating node attributes, making them promising for entity alignment tasks. However, the introduction of GNNs into entity alignment has led to more intricate embedding architectures, which complicates the interpretation of an approach's effectiveness: it becomes hard to discern whether the effectiveness is due to the embedding itself or to other components of the alignment process.

Despite several surveys on embedding-based entity alignment approaches [8–11], they often fail to specifically examine GNN-based approaches, overlooking key characteristics of GNNs that are crucial for entity alignment. Additionally, while these surveys assess the overall effectiveness of the approaches, they typically overlook the impact of individual components and methods on performance. To fill this gap, our work offers a fine-grained analysis of individual components and their impacts. We contribute to the field by providing:

• A general framework that encompasses the fundamental components of GNN-based entity alignment approaches, along with a categorisation of these approaches based on the key characteristics associated with these components.
• A comprehensive component-level experimental study conducted on representative datasets, evaluating the impact of different components and their combinations on the overall performance.

Our analysis reveals that certain module options have a significant impact on performance, such as combining entity name initialisation with skip connections for embedding and employing iterative training with CSLS as the enhanced distance metric.
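To make the role of CSLS concrete, the sketch below assumes its standard definition (cross-domain similarity local scaling): the score of a source/target pair is twice their cosine similarity minus each side's mean similarity to its k nearest neighbours on the other side. The text above only names CSLS, so the function names, the choice k = 1, and the toy vectors are assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_mean(values: List[float], k: int) -> float:
    """Mean of the k largest values (the k nearest neighbours by similarity)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls_scores(source: List[Vector], target: List[Vector], k: int = 1) -> Tuple[Matrix, Matrix]:
    """Return (cosine matrix, CSLS matrix) with rows = source, columns = target."""
    sim = [[cosine(s, t) for t in target] for s in source]
    r_src = [top_k_mean(row, k) for row in sim]  # per source entity
    r_tgt = [top_k_mean([row[j] for row in sim], k) for j in range(len(target))]
    scores = [
        [2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(len(target))]
        for i in range(len(source))
    ]
    return sim, scores


def nearest(matrix: Matrix) -> List[int]:
    """Index of the best-scoring target for each source row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


def unit(degrees: float) -> Vector:
    """2-D unit vector at the given angle (toy embeddings only)."""
    return [math.cos(math.radians(degrees)), math.sin(math.radians(degrees))]


# Toy embeddings: target 0 is a "hub" that lies close to both source entities.
source = [unit(0), unit(35)]
target = [unit(8), unit(66)]

sim, scores = csls_scores(source, target, k=1)
cosine_match = nearest(sim)     # [0, 0]: both sources pick the hub
csls_match = nearest(scores)    # [0, 1]: the hub is penalised for source 1
```

In this toy case plain cosine similarity sends both source entities to the same target, whereas CSLS discounts the target that is close to everything and recovers a one-to-one matching.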
We demonstrate that, by selecting suitable methods for combination, even basic GNN networks can achieve competitive results. This study provides valuable insights into the design and optimisation of GNN-based approaches for entity alignment, advancing the understanding and applicability of these methods in knowledge graph integration tasks.

The rest of the paper is organised as follows. Section 2 provides preliminaries, including problem definition and a summary of related work. Section 3 presents a general framework for GNN-based entity alignment approaches. Section 4 discusses the importance of component-level analysis and Section 5 reports analysis results. Finally, Section 6 concludes the paper.

2 Preliminaries

2.1 Problem definition

We define a KG as $G = (E, R, A, V, T)$, where $E$, $R$, $A$, $V$ and $T$ are sets of entities, relations, attributes, values, and triples respectively. $T$ consists of relation triples $T^r$ and attribute triples $T^a$, where $T^r \subseteq E \times R \times E$ and $T^a \subseteq E \times A \times V$. Given two KGs, $G_1 = (E_1, R_1, A_1, V_1, T_1)$ and $G_2 = (E_2, R_2, A_2, V_2, T_2)$, the goal of entity alignment is to find the set of aligned entity pairs $\{(e_1, e_2) \mid e_1 \in E_1, e_2 \in E_2\}$, where $e_1$ and $e_2$ refer to the same real-world object. In many cases, a small subset of these pairs, i.e., pre-aligned entities, is provided and used as training data for finding new alignments.

2.2 Related work

GNNs. Many learning tasks involve complex relationships and dependencies within graph data, which cannot be effectively handled by standard neural networks like convolutional neural networks (CNNs) [12] and recurrent neural networks (RNNs) [13]. These networks are speci (...truncated)