Entity alignment via graph neural networks: a component-level study
World Wide Web
https://doi.org/10.1007/s11280-023-01221-8
Yanfeng Shu¹ · Ji Zhang² · Guangyan Huang³ · Chi-Hung Chi⁴ · Jing He⁵
Received: 10 July 2023 / Revised: 3 November 2023 / Accepted: 15 November 2023
© The Author(s) 2023
Abstract
Entity alignment plays an essential role in the integration of knowledge graphs (KGs) as it
seeks to identify entities that refer to the same real-world objects across different KGs. Recent
research has primarily centred on embedding-based approaches. Among these approaches,
there is a growing interest in graph neural networks (GNNs) due to their ability to capture
complex relationships and incorporate node attributes within KGs. Although several surveys exist in this area, they often lack comprehensive investigations specifically targeting GNN-based approaches. Moreover, they tend to evaluate overall performance without
analysing the impact of individual components and methods. To bridge these gaps, this paper
presents a framework for GNN-based entity alignment that captures the key characteristics of these approaches. We conduct a fine-grained analysis of individual components and
assess their influence on alignment results. Our findings highlight specific module options
that significantly affect the alignment outcomes. By carefully selecting suitable methods for
combination, even basic GNNs can achieve competitive alignment results.
Keywords Knowledge graph · Entity alignment · Graph neural network ·
Experimental study
Corresponding author: Yanfeng Shu

1 Data61, CSIRO, Hobart, Australia
2 University of Southern Queensland, Brisbane, Australia
3 Deakin University, Melbourne, Australia
4 Nanyang Technological University, Singapore, Singapore
5 Oxford University, Oxford, England
1 Introduction
Knowledge graphs (KGs) serve as structured, graph-based representations of knowledge,
capturing real-world entities, their attributes, and the relationships between them. They
are indispensable tools, facilitating sophisticated data analysis, inference, and decision-making processes. KGs come in various forms, including general KGs like DBpedia [1]
and YAGO [2], as well as domain-specific KGs like BioKG [3] and FoodKG [4], catering
to a wide range of applications. However, a common challenge with standalone KGs is their
incompleteness: a single KG rarely covers its domain comprehensively. To overcome this limitation, KG
integration becomes essential. By combining KGs from diverse sources, integration brings together
different perspectives and complementary information. One crucial step in
KG integration is entity alignment, which involves identifying entities across KGs that refer
to the same real-world objects. Aligning entities enables advanced applications that offer a holistic view of information, enhancing the quality of knowledge-based
systems.
Recent research in entity alignment has primarily focused on embedding-based approaches.
These approaches represent entities as low-dimensional vectors, capturing semantic relatedness by computing distances in the vector space. Among them, graph neural networks
(GNNs) [5, 6] have gained popularity for embedding learning. GNNs effectively learn node
representations by recursively aggregating information from neighboring nodes. The underlying assumption behind using GNNs for entity alignment is that similar entities tend to
have similar neighborhoods. This assumption is supported by the expressiveness of GNNs in identifying isomorphic subgraphs, which is akin to the Weisfeiler-Lehman (WL) algorithm [7]. Moreover, GNNs
naturally excel at handling complex graph structures and incorporating node attributes, making them promising for entity alignment tasks. However, the introduction of GNNs into entity
alignment has led to more intricate embedding architectures, which complicates the interpretation
of an approach's effectiveness: it becomes hard to discern whether the gains stem from the
embedding model itself or from other components of the alignment process.
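To make the neighborhood-aggregation idea concrete, the following is a minimal Python sketch of a single propagation step. It assumes a simplified GCN-style layer that only averages each node's vector with its neighbors' vectors; real GNN layers also apply learned weight matrices and non-linearities. The toy graphs, node names (`a`, `x`, ...) and the helpers `mean_aggregate`, `cosine` and `undirected` are hypothetical. Two hub entities start with unknown (zero) features, but because their neighbors are pre-aligned pairs sharing initial vectors, one round of aggregation gives the hubs identical embeddings, so cosine similarity aligns them.

```python
import math


def mean_aggregate(adj, feats):
    """One propagation step: each node averages its own and its neighbors' vectors.
    Simplified GCN-style layer (assumption): no weight matrix, no non-linearity."""
    out = {}
    for v, h in feats.items():
        group = [h] + [feats[u] for u in adj.get(v, ())]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def undirected(edges):
    """Build an undirected adjacency list from (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


# Hypothetical toy KGs: hubs "a" (G1) and "x" (G2) have unknown (zero) features,
# while their neighbors are pre-aligned pairs (b~y, c~z) sharing initial vectors.
adj1 = undirected([("a", "b"), ("a", "c")])
adj2 = undirected([("x", "y"), ("x", "z")])
feats1 = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
feats2 = {"x": [0.0, 0.0], "y": [1.0, 0.0], "z": [0.0, 1.0]}

h1 = mean_aggregate(adj1, feats1)
h2 = mean_aggregate(adj2, feats2)

# Align "a" to the G2 entity with the most similar aggregated embedding.
best = max(h2, key=lambda e2: cosine(h1["a"], h2[e2]))
print(best)  # x
```

Stacking further layers would propagate seed information across multi-hop neighborhoods, which is the property that GNN-based aligners build on.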
Although several surveys on embedding-based entity alignment approaches exist [8–11], they
often do not specifically examine GNN-based approaches, overlooking key characteristics
of GNNs that are crucial for entity alignment. Additionally, while these surveys assess the
overall effectiveness of the approaches, they typically ignore the impact of individual
components and methods on performance. To fill these gaps, our work offers a fine-grained
analysis of individual components and their impacts. We contribute to the field by providing:
• A general framework that encompasses the fundamental components of GNN-based entity
alignment approaches, along with a categorisation of these approaches based on the key
characteristics associated with these components.
• A comprehensive component-level experimental study conducted on representative
datasets, evaluating the impact of different components and their combinations on the
overall performance.
Our analysis reveals that certain module options have a significant impact on performance,
such as combining entity name initialisation with skip connections for embedding, and
employing iterative training with CSLS (cross-domain similarity local scaling) as the enhanced
distance metric. We demonstrate that, by selecting suitable methods for combination, even
basic GNNs can achieve competitive results. This study provides valuable insights into the
design and optimisation of GNN-based approaches for entity alignment, advancing the
understanding and applicability of these methods in knowledge graph integration tasks.
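The CSLS metric and iterative training mentioned above can be illustrated with a minimal pure-Python sketch; it is not the implementation evaluated in this study. CSLS rescores a pair as 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity between a source embedding x and its k nearest target embeddings (r_S(y) is defined symmetrically). This penalises "hub" entities that are close to many others. For iterative training, we assume one common strategy: in each round, mutually nearest pairs are selected as new pseudo-seeds and added to the training data before retraining. The function names and the toy 2-D embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def csls_matrix(src, tgt, k=10):
    """CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y), where r_T / r_S are the mean
    cosine similarities to the k nearest neighbors in the other KG's space."""
    sim = [[cosine(x, y) for y in tgt] for x in src]

    def mean_top_k(scores):
        top = sorted(scores, reverse=True)[:k]
        return sum(top) / len(top)

    r_t = [mean_top_k(row) for row in sim]        # one value per source entity
    r_s = [mean_top_k(col) for col in zip(*sim)]  # one value per target entity
    return [[2 * sim[i][j] - r_t[i] - r_s[j] for j in range(len(tgt))]
            for i in range(len(src))]


def mutual_nearest_pairs(scores):
    """(i, j) pairs where j is i's best target and i is j's best source.
    Assumed pseudo-seed selection rule for one round of iterative training."""
    best_tgt = [max(range(len(row)), key=row.__getitem__) for row in scores]
    best_src = [max(range(len(col)), key=col.__getitem__) for col in zip(*scores)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


def unit(deg):
    """Hypothetical 2-D embedding: a unit vector at the given angle in degrees."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]


# Hypothetical embeddings; the intended alignment is s0<->t0 and s1<->t1.
src = [unit(0), unit(14)]
tgt = [unit(8), unit(24)]  # t0 lies between both sources and acts as a hub

plain = [[cosine(x, y) for y in tgt] for x in src]
print(mutual_nearest_pairs(plain))                       # [(1, 0)]
print(mutual_nearest_pairs(csls_matrix(src, tgt, k=2)))  # [(0, 0), (1, 1)]
```

In this toy case, target t0 is the plain-cosine nearest neighbor of both sources, so plain cosine yields only the wrong pair (1, 0). CSLS with k=2 discounts the hub and recovers both intended pairs, which would then serve as pseudo-seeds in the next training round.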
The rest of the paper is organised as follows. Section 2 provides preliminaries, including
problem definition and a summary of related work. Section 3 presents a general framework for
GNN-based entity alignment approaches. Section 4 discusses the importance of component-level analysis, and Section 5 reports the analysis results. Finally, Section 6 concludes the paper.
2 Preliminaries
2.1 Problem definition
We define a KG as $G = (E, R, A, V, T)$, where $E$, $R$, $A$, $V$ and $T$ are the sets of entities,
relations, attributes, values, and triples, respectively. $T$ consists of relation triples $T^r$ and
attribute triples $T^a$, where $T^r \subseteq E \times R \times E$ and $T^a \subseteq E \times A \times V$. Given two KGs,
$G_1 = (E_1, R_1, A_1, V_1, T_1)$ and $G_2 = (E_2, R_2, A_2, V_2, T_2)$, the goal of entity alignment is
to find the set of aligned entities $\{(e_1, e_2) \mid e_1 \in E_1, e_2 \in E_2\}$, where $e_1$ and $e_2$ refer to the same
real-world object. In many cases, a small subset of these aligned pairs, i.e., pre-aligned entities, is provided
and used as training data for finding new alignments.
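These definitions map directly onto a simple data structure. The Python sketch below shows one possible encoding and assumes string identifiers for entities, relations and attributes. The class `KG`, the helper `split_seed_alignments`, and the toy entities (`en:Paris`, `fr:Paris`, ...) are hypothetical and are not taken from any dataset used in this study. The sketch stores T as separate relation and attribute triple sets, checks that every gold pair lies in E1 × E2, and holds out part of the gold alignments so that only a subset serves as pre-aligned training data.

```python
import random
from dataclasses import dataclass, field


@dataclass
class KG:
    """A KG G = (E, R, A, V, T); T is kept as relation and attribute triples."""
    entities: set = field(default_factory=set)      # E
    relations: set = field(default_factory=set)     # R
    attributes: set = field(default_factory=set)    # A
    values: set = field(default_factory=set)        # V
    rel_triples: set = field(default_factory=set)   # T^r, subset of E x R x E
    attr_triples: set = field(default_factory=set)  # T^a, subset of E x A x V

    def add_relation(self, head, rel, tail):
        self.entities.update((head, tail))
        self.relations.add(rel)
        self.rel_triples.add((head, rel, tail))

    def add_attribute(self, ent, attr, value):
        self.entities.add(ent)
        self.attributes.add(attr)
        self.values.add(value)
        self.attr_triples.add((ent, attr, value))


def split_seed_alignments(kg1, kg2, alignments, train_ratio, seed=0):
    """Split gold pairs (e1, e2) into pre-aligned training seeds and test pairs."""
    for e1, e2 in alignments:
        if e1 not in kg1.entities or e2 not in kg2.entities:
            raise ValueError(f"pair {(e1, e2)} is not in E1 x E2")
    pairs = sorted(alignments)  # deterministic order before shuffling
    random.Random(seed).shuffle(pairs)
    n_train = int(len(pairs) * train_ratio)
    return pairs[:n_train], pairs[n_train:]


# Toy example with hypothetical entity identifiers.
g1, g2 = KG(), KG()
g1.add_relation("en:Paris", "capitalOf", "en:France")
g1.add_relation("en:Lyon", "locatedIn", "en:France")
g1.add_attribute("en:Paris", "name", "Paris")
g2.add_relation("fr:Paris", "capitale_de", "fr:France")
g2.add_relation("fr:Lyon", "situe_dans", "fr:France")
g2.add_attribute("fr:Paris", "nom", "Paris")

gold = {("en:Paris", "fr:Paris"), ("en:France", "fr:France"), ("en:Lyon", "fr:Lyon")}
train, test = split_seed_alignments(g1, g2, gold, train_ratio=0.5)
print(len(train), len(test))  # 1 2
```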
2.2 Related work
GNNs Many learning tasks involve complex relationships and dependencies within graph
data, which cannot be effectively handled by standard neural networks like convolutional
neural networks (CNNs) [12] and recurrent neural networks (RNNs) [13]. These networks
are speci (...truncated)