FPGN: follower prediction framework for infectious disease prevention
World Wide Web
https://doi.org/10.1007/s11280-023-01205-8
Jianke Yu1 · Xianhang Zhang2 · Hanchen Wang1 · Xiaoyang Wang2 ·
Wenjie Zhang2 · Ying Zhang1,3
Received: 25 June 2023 / Revised: 15 August 2023 / Accepted: 26 August 2023
© The Author(s) 2023
Abstract
In recent years, preventing the widespread transmission of infectious diseases in communities has been a research hot spot. Tracing the close contacts of infected individuals is one of
the most pressing problems. In this work, we present a model called Follower Prediction Graph
Network (FPGN) to identify high-risk visitors, a task known as follower prediction. The
model is designed to identify visitors who may be infected with a disease by tracking their
activities at the same locations as infected visitors. FPGN is inspired by the state-of-the-art
temporal graph edge prediction algorithm TGN and is designed to address the shortcomings of existing
algorithms. It utilizes graph structure information based on the (α, β)-core, time-interval statistics derived from timestamp information, and a GAT-based prediction module
to achieve high accuracy in follower prediction. Extensive experiments are conducted on
two real datasets, demonstrating the effectiveness of FPGN. The experimental results show that
FPGN achieves the best results among the compared SOTA baselines: its AP scores
are higher than 0.46, and its AUC scores are higher than 0.62.
Keywords Follower prediction · Temporal bipartite graphs · Graph neural networks
Corresponding author: Hanchen Wang
1 University of Technology Sydney, Sydney, Australia
2 University of New South Wales, Sydney, Australia
3 Zhejiang Gongshang University, Hangzhou, China
1 Introduction
In recent years, outbreaks of infectious diseases have caused great harm to human health
and social stability. To prevent and control the spread of infectious diseases, it is necessary
to accurately identify the high-risk population. In this regard, graph analysis has
become an essential tool for epidemiologists [14, 32]. Temporal bipartite graphs represent the
relationships between two sets of vertices while tracking their evolution over time. As a result,
this structure is particularly useful in analyzing infectious disease transmission patterns. By
analyzing these temporal bipartite graphs, we can identify gaps and clusters that may indicate
potential transmission.
This paper focuses on identifying high-risk visitors who may be infected with a disease by
tracking their activities at the same location as infected visitors. The identification allows public health authorities to take proactive measures to prevent the spread of infectious diseases.
The objective is to predict the likelihood that two visitors (i.e., a follower and a leader) will
visit the same location within a specific time window. This challenge is called the follower
prediction problem.
Example An example of following behavior is shown in Figure 1. We assume that the
time window used to identify following behavior in this figure is half an hour, i.e., if two visitors
visit the same venue within half an hour, the second visitor is identified as the follower of
the first visitor. We also assume that Bob (u1) is an infected visitor. We can see that Bob
went to the cafe (v1) at 8 o'clock in the morning, and another visitor, Sam (u2), arrived within
ten minutes after Bob's arrival. Therefore, we consider Sam to be one of Bob's followers.
Although Lisa (u3) and Bob also arrived at the gym (v2) within ten minutes of each other, Lisa
arrived at the venue before Bob did. Jack (u4) arrived at the restaurant (v4) three hours after
Bob did. Therefore, neither Lisa nor Jack is a follower of Bob.
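The follower definition used in this example can be sketched in a few lines of Python. This is a minimal illustration and not part of FPGN itself: the function name `find_followers`, the tuple layout of a visit, and the restaurant arrival times (the text only states a three-hour gap) are assumptions made here, as is the choice to treat simultaneous arrivals as non-following.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_followers(visits, window):
    """Return a set of (leader, follower, venue) triples.

    visits: iterable of (visitor, venue, arrival_time) tuples.
    window: timedelta; a strictly later arrival at the same venue
            within this window makes that visitor a follower.
    """
    # Group arrivals by venue (the shared vertex in the bipartite graph).
    by_venue = defaultdict(list)
    for visitor, venue, t in visits:
        by_venue[venue].append((t, visitor))

    followers = set()
    for venue, arrivals in by_venue.items():
        arrivals.sort()  # chronological order within the venue
        for i, (t_lead, leader) in enumerate(arrivals):
            for t_follow, follower in arrivals[i + 1:]:
                if t_follow - t_lead > window:
                    break  # later arrivals are even further away in time
                if follower != leader and t_follow > t_lead:
                    followers.add((leader, follower, venue))
    return followers


def at(hour, minute=0):
    # Hypothetical helper: all visits are placed on one arbitrary day.
    return datetime(2023, 1, 1, hour, minute)


visits = [
    ("Bob", "cafe", at(8, 0)),
    ("Sam", "cafe", at(8, 10)),
    ("Lisa", "gym", at(9, 50)),
    ("Bob", "gym", at(10, 0)),
    ("Bob", "restaurant", at(17, 0)),   # assumed time
    ("Jack", "restaurant", at(20, 0)),  # three hours after Bob
]

result = find_followers(visits, timedelta(minutes=30))
bob_followers = {f for leader, f, _ in result if leader == "Bob"}
```

With these inputs, `bob_followers` contains only Sam. Lisa does not follow Bob because she arrived first (under this definition Bob is instead a follower of Lisa at the gym), and Jack falls outside the half-hour window.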
Figure 1 An Example of Following Behavior
Challenge The difficulty in solving this problem lies in two aspects: on the one hand, due
to the special nature of the bipartite graph structure, there will be no connections between
vertices of the same type in the dataset, while the goal of follower prediction is precisely
to predict connections between vertices of the same type. On the other hand, the conditions
for constituting following behavior are strict. They require considering simultaneously
whether two vertices are connected to the same vertex and the time difference between
their connections.
To the best of our knowledge, no existing method has been proposed to solve the follower
prediction problem. One possible solution is to use a high-performance GNN model [4, 15, 19,
42, 49] trained on datasets for this problem. However, these generic models do not effectively
utilize timestamp information and have limited accuracy. A related problem
is edge prediction in temporal graphs, which has been addressed by
various algorithms [3, 20, 33, 40, 47]. These methods accurately predict edges on temporal
graphs but either rely solely on timestamps as sampling criteria or only learn individual
activity information for each vertex. Consequently, models designed for other purposes may
struggle with the follower prediction problem and may even perform worse than general
GNN models because they are optimized for their original objectives.
To better solve the problem of follower prediction, inspired by the state-of-the-art
temporal graph edge prediction algorithm TGN [33] and learning from the shortcomings of
existing algorithms, we propose a model called Follower Prediction Graph Network (FPGN).
TGN significantly contributes to preserving temporal information by learning the memory
of vertices. Therefore, the vertex memory obtained by TGN can help us solve the problem
of predicting connections between same-type vertices on bipartite graphs. However, TGN
cannot be directly applied to this problem. When TGN updates the memory of a vertex, it
refers to that vertex's previous memories, which allows training sets to be sampled in chronological order.
In other words, TGN updates vertex memories while sampling according to downstream task
requirements. To predict followers accurately, however, it is necessary to anticipate
the connections between same-type vertices in the bipartite graph whose memories have been updated
after all timestamps in the training set. This makes it impossible for TGN to be used directly
for the follower prediction problem. Motivated by TGN and learning from its shortcomings, we
design our model by restructuring TGN so that it updates the memories of all vertices
beforehand and then trains according to downstream needs.
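The ordering described above (first update every vertex memory, then use the frozen memories for the downstream task) can be illustrated with a toy sketch. It is not the update rule of TGN or FPGN: the memory here is a hypothetical two-element state (a decayed visit count and the last timestamp), and the names `build_memories`, `pair_features`, and `decay` are assumptions introduced only to show the two phases.

```python
from collections import defaultdict


def build_memories(events, decay=0.5):
    """Phase 1: one chronological pass over ALL events.

    events: iterable of (visitor, venue, timestamp), numeric timestamps.
    Each vertex keeps a toy memory [decayed_visit_count, last_timestamp];
    this stands in for a learned memory module.
    """
    memory = defaultdict(lambda: [0.0, None])
    for visitor, venue, t in sorted(events, key=lambda e: e[2]):
        # Both endpoints of the bipartite edge are updated.
        for vertex in (("u", visitor), ("v", venue)):
            state = memory[vertex]
            state[0] = decay * state[0] + 1.0
            state[1] = t
    return dict(memory)


def pair_features(memory, leader, follower):
    """Phase 2: read the frozen memories of two same-type vertices."""
    a = memory[("u", leader)]
    b = memory[("u", follower)]
    return [a[0], b[0], abs(a[1] - b[1])]


# Timestamps are minutes since midnight (illustrative values).
events = [("Bob", "cafe", 480), ("Sam", "cafe", 490), ("Bob", "gym", 600)]
memory = build_memories(events)
features = pair_features(memory, "Bob", "Sam")
```

In a real model, the features from phase 2 would feed a trainable predictor for visitor pairs; the point of the sketch is only that no pair is scored until every event has contributed to the memories.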
Besides, FPGN can also consider the correlation information between vertex activities to
enhance the model’s ability to solve follower prediction problems. Specifically, in addition to
learning which vertices will be followers of their leaders, FPGN will further analyze the time (...truncated)