Causal integration in graph neural networks toward enhanced classification: benchmarking and advancements for robust performance
World Wide Web
(2025) 28:30
https://doi.org/10.1007/s11280-025-01343-1
Simi Job1 · Xiaohui Tao1 · Taotao Cai1 · Lin Li2 · Quan Z. Sheng3 · Haoran Xie4 ·
Jianming Yong5
Received: 3 October 2024 / Revised: 3 February 2025 / Accepted: 27 March 2025
© The Author(s) 2025
Abstract
The expansion of Graph Neural Networks (GNNs) has highlighted the importance of evaluating their performance in real-world scenarios. However, existing evaluation frameworks
often overlook the integration of causality, a component that is essential for more
robust evaluation of GNNs. To address this gap, we present a benchmark study that systematically compares standard and causal GNN models with a focus on classification tasks. Our
analysis encompasses a careful selection of nine GNN models across seven diverse datasets
that span three distinct domains. The results reveal the following: I) Causality-enhanced
GNNs consistently outperform their traditional counterparts in graph classification tasks;
II) Models integrating causal features exhibit greater generalizability across varied datasets;
and III) Incorporation of causal elements significantly improves the predictive accuracy of
GNNs. These findings highlight the importance of embedding causality in the evaluation and
development of GNNs for improved performance and application.
Keywords Graph neural networks · GCN · GAT · Causality · Graph classification ·
GraphSAGE
✉ Simi Job
1 School of Mathematics, Physics, and Computing, University of Southern Queensland, Toowoomba, Australia
2 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, China
3 School of Computing, Macquarie University, Sydney, Australia
4 School of Data Science, Lingnan University, Hong Kong Special Administrative Region, Hong Kong, China
5 School of Business, University of Southern Queensland, Springfield, Australia
1 Introduction
Graph Neural Networks (GNNs) have emerged as a powerful tool for processing graph-structured data, demonstrating remarkable performance in various tasks such as node
classification [1], link prediction [2] and graph classification [1–3]. GNNs have found applications in various domains including recommendation [4], urban intelligence [5], medicine
[6], community detection [7] and fraud detection [8]. Despite their success, GNNs face
several limitations, including over-smoothing, limited interpretability and generalizability,
sensitivity to graph structure and a limited ability to capture long-range dependencies.
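Over-smoothing, the first limitation listed above, can be illustrated with a toy sketch. The snippet is not taken from the benchmark: it assumes a hypothetical five-node path graph with one scalar feature per node and uses plain mean aggregation without learned weights, which simplifies what actual GNN layers do. Under these assumptions, the gap between the largest and smallest node feature shrinks as more aggregation rounds are stacked, so nodes become harder to distinguish.

```python
# Toy illustration of over-smoothing via repeated mean aggregation.
# The graph, features and number of rounds are hypothetical choices;
# no learned weights or nonlinearities are involved.

def mean_aggregate(features, neighbors):
    """One message-passing round: each node averages itself and its neighbors."""
    updated = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        updated.append(sum(features[j] for j in group) / len(group))
    return updated


def feature_spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Path graph 0 - 1 - 2 - 3 - 4, given as adjacency lists.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
# Only the last node starts with a non-zero feature.
features = [0.0, 0.0, 0.0, 0.0, 1.0]

# spreads[k] is the feature spread after k aggregation rounds.
spreads = [feature_spread(features)]
for _ in range(50):
    features = mean_aggregate(features, neighbors)
    spreads.append(feature_spread(features))

for k in (0, 2, 10, 50):
    print(f"spread after {k:2d} rounds: {spreads[k]:.6f}")
```

Because each round replaces a node's value with an average over its neighborhood, the spread cannot grow from one round to the next, and on a connected graph it decays toward zero.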
Causality, the understanding of cause and effect relationships among various factors,
extends beyond mere correlations. It focuses on comprehending the interactions between
elements that result in specific outcomes and explores how changes in one aspect can impact
another element within a system. Recently, there has been an increased focus on exploring
causality, with researchers acknowledging the importance of incorporating causal knowledge
into data modelling. Causality has been applied in several domains, including economics [9], social sciences [10], medicine [11], healthcare [12], environmental
science [13] and recommendation [14, 15]. For instance, in medicine, causal analysis can identify
factors that impact treatment outcomes and those that increase the risk of medical conditions.
In social sciences, it can reveal the causal factors that contribute to economic inequalities. In
recommendation systems, causality can uncover factors that influence user preferences and
engagement. In these contexts, causal analysis can uncover factors that influence outcomes,
improve model interpretability and enhance predictive accuracy. Integrating causality into
GNN architecture can significantly mitigate the aforementioned limitations by prioritizing
relevant information, capturing long-range dependencies, and promoting the extraction of
transferable features, thereby improving generalizability. By examining inherent causal relationships within the data, it becomes possible to enhance GNN performance and application.
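The difference between correlation and causation described above can be made concrete with a small simulation. The sketch below is an illustrative assumption rather than part of the study: it uses a hypothetical model in which a hidden variable Z drives both X and Y, while X has no effect on Y. Comparing outcomes across observed values of X suggests a strong association, whereas assigning X at random (an intervention that cuts the link from Z to X) shows an effect close to zero.

```python
import random


def simulate(n, intervene, seed=0):
    """Sample (x, y) pairs from a toy model Z -> X and Z -> Y.

    X has no causal effect on Y. If `intervene` is True, X is assigned
    by a fair coin flip, which removes the influence of Z on X.
    All probabilities and noise levels are hypothetical.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.random() < 0.5  # hidden confounder
        if intervene:
            x = rng.random() < 0.5
        else:
            x = rng.random() < (0.8 if z else 0.2)
        y = (1.0 if z else 0.0) + rng.gauss(0.0, 0.1)  # depends on Z only
        samples.append((x, y))
    return samples


def mean_difference(samples):
    """Mean of Y where X is true minus mean of Y where X is false."""
    treated = [y for x, y in samples if x]
    control = [y for x, y in samples if not x]
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = mean_difference(simulate(20000, intervene=False))
interventional = mean_difference(simulate(20000, intervene=True))

print(f"observational difference:  {observational:.3f}")
print(f"interventional difference: {interventional:.3f}")
```

A model that relies on the observational association would treat X as predictive of Y even though changing X leaves Y unchanged; this is the kind of spurious pattern that causal approaches aim to discount.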
In this study, we aim to thoroughly investigate the application of graph neural networks
for classification tasks and demonstrate the significance of causally enabled GNNs in identifying true interactions within data. Few studies have systematically benchmarked GNNs with
a focus on causal classification. Existing benchmark studies such as [16], which examined
graph positional encoding in GNNs, and [17], which explored the use of GNNs for fault
diagnosis, have provided foundational insights into these areas. Kosan et al. [18] conducted a
benchmark study that focused on GNN explainers, while [19] performed an extensive investigation into deep GNN architectures, experimenting with different model settings across
various citation network datasets. All of these studies primarily evaluate GNN performance
based on traditional metrics without integrating causal analysis. To address this gap, our
research aims to analyse the significance of causality in generalizable graph prediction models. Specifically, we conduct a comprehensive study of the most representative models
used in GNN classification tasks and examine the potential of incorporating causal
elements into the respective frameworks. This empirical study contributes the following
findings to the research community:
• The attention-based causal model (CAL framework) consistently outperformed baseline
GNN models in larger graph classification tasks, demonstrating its ability to capture
complex global patterns and dependencies across networks.
• Baseline GNN models excelled in smaller node classification tasks, highlighting their
efficiency in scenarios with limited data and simpler relationships.
• Hyperparameter tuning plays a crucial role in improving model performance, and our
research emphasizes the adaptability of causal models to multi-class datasets in graph
classification.
These findings highlight the importance of embedding causality in the evaluation and
development of GNNs for enhanced performance and application. The remainder of the
paper is structured as follows: Section 2 provides an overview of research studies centered
on GNNs and causality. Section 3 outlines the study design. Section 4 presents the results
of the empirical study. Finally, Section 5 concludes the paper with a brief summary of our
findings.
2 Where GNN meets causality
This section reviews existing literature focusing on graph neural networks (GNNs) and their
variants, causality and their applications in classification tasks.
2.1 Graph neural networks
Graph Neural Networks (GNNs) are designed to process graph-structured data, wher (...truncated)