FPGN: follower prediction framework for infectious disease prevention

World Wide Web
https://doi.org/10.1007/s11280-023-01205-8

Jianke Yu¹ · Xianhang Zhang² · Hanchen Wang¹ · Xiaoyang Wang² · Wenjie Zhang² · Ying Zhang¹,³

Received: 25 June 2023 / Revised: 15 August 2023 / Accepted: 26 August 2023
© The Author(s) 2023

Abstract
In recent years, preventing the widespread transmission of infectious diseases in communities has become a research hotspot. Tracing the close contacts of infected individuals is one of the most pressing problems. In this work, we present a model called Follower Prediction Graph Network (FPGN) to identify high-risk visitors, a task we call follower prediction. The model is designed to identify visitors who may be infected with a disease by tracking their activities at the same locations as infected visitors. FPGN is inspired by TGN, a state-of-the-art temporal graph edge prediction algorithm, and addresses the shortcomings of existing algorithms. It combines graph structure information based on the (α, β)-core, time-interval statistics derived from timestamp information, and a GAT-based prediction module to achieve high accuracy in follower prediction. Extensive experiments on two real datasets demonstrate the improvement achieved by FPGN. The results show that FPGN outperforms other state-of-the-art (SOTA) baselines, with AP scores above 0.46 and AUC scores above 0.62.

Keywords: Follower prediction · Temporal bipartite graphs · Graph neural networks

Corresponding author: Hanchen Wang

¹ University of Technology Sydney, Sydney, Australia
² University of New South Wales, Sydney, Australia
³ Zhejiang Gongshang University, Hangzhou, China

1 Introduction
In recent years, outbreaks of infectious diseases have caused great harm to human health and social stability.
To prevent and control the spread of infectious diseases, high-risk populations must be accurately predicted and controlled. In this regard, graph analysis has become an essential tool for epidemiologists [14, 32]. Temporal bipartite graphs represent the relationships between two sets of vertices while tracking how those relationships evolve over time, which makes them particularly useful for analyzing infectious disease transmission patterns. By analyzing these temporal bipartite graphs, we can identify gaps and clusters that may indicate potential transmission. This paper focuses on identifying high-risk visitors who may be infected with a disease by tracking their activities at the same locations as infected visitors. Such identification allows public health authorities to take proactive measures against the spread of infectious diseases. The objective is to predict the likelihood that two visitors (a leader and a follower) will visit the same location within a specific time window. We call this challenge the follower prediction problem.

Example. Figure 1 shows an example of following behavior. We assume that the time window for identifying following behavior in this figure is half an hour: if two visitors visit the same venue within half an hour of each other, the second visitor is identified as a follower of the first. We also assume that Bob (u1) is an infected visitor. Bob went to the cafe (v1) at 8:00 in the morning, and another visitor, Sam (u2), arrived within ten minutes of Bob's arrival. Therefore, Sam is one of Bob's followers. Although Lisa (u3) and Bob also arrived at the gym (v2) within ten minutes of each other, Lisa arrived at the venue before Bob did. Jack (u4) arrived at the restaurant (v4) three hours after Bob did, well outside the window. Therefore, neither Lisa nor Jack is a follower of Bob.
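The follower relation in this example can be made concrete with a short sketch that derives every (leader, follower) pair from timestamped visits, i.e., exactly the same-type connections that the bipartite graph itself never stores. It assumes visits arrive as (visitor, venue, minutes since midnight) triples, treats the window as inclusive, and does not count a same-minute arrival as following; these choices, and the names `Visit`, `follower_pairs`, and `hm`, are hypothetical. The times are illustrative values chosen to be consistent with the Example text, not data from the paper, and this is a labeling rule rather than FPGN's predictor.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record type: one timestamped edge of the temporal bipartite
# graph, (visitor, venue, minutes since midnight).
Visit = Tuple[str, str, int]


def follower_pairs(visits: List[Visit], window: int) -> Set[Tuple[str, str]]:
    """Return (leader, follower) pairs: both visited the same venue and the
    follower arrived strictly later, at most `window` minutes after the leader."""
    by_venue: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for visitor, venue, t in visits:
        by_venue[venue].append((t, visitor))

    pairs: Set[Tuple[str, str]] = set()
    for arrivals in by_venue.values():
        arrivals.sort()  # chronological order within one venue
        for i, (t_lead, leader) in enumerate(arrivals):
            for t_fol, follower in arrivals[i + 1:]:
                if t_fol - t_lead > window:
                    break  # arrivals are sorted, so later ones are outside too
                if t_fol > t_lead and follower != leader:
                    pairs.add((leader, follower))
    return pairs


def hm(hour: int, minute: int = 0) -> int:
    """Convert a clock time to minutes since midnight."""
    return 60 * hour + minute


# Illustrative times consistent with the Example (not data from the paper).
VISITS: List[Visit] = [
    ("Bob", "cafe", hm(8)), ("Sam", "cafe", hm(8, 10)),
    ("Lisa", "gym", hm(9, 50)), ("Bob", "gym", hm(10)),
    ("Bob", "restaurant", hm(17)), ("Jack", "restaurant", hm(20)),
]

pairs = follower_pairs(VISITS, window=30)
print(sorted(pairs))  # [('Bob', 'Sam'), ('Lisa', 'Bob')]
print(sorted(f for leader, f in pairs if leader == "Bob"))  # ['Sam']
```

Note that Bob comes out as a follower of Lisa at the gym even though Lisa is not a follower of Bob: the relation is directed by arrival order, which is why both the shared venue and the time difference must be checked together.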
[Figure 1: An example of following behavior]

Challenge. The difficulty of this problem is twofold. First, by the nature of a bipartite graph, the dataset contains no connections between vertices of the same type, yet the goal of follower prediction is precisely to predict connections between same-type vertices. Second, the conditions that constitute following behavior are strict: they require considering simultaneously whether two vertices are connected to the same vertex and the time difference between their connections. To the best of our knowledge, no existing method has been proposed to solve the follower prediction problem. One possible solution is to train a high-performance GNN model [4, 15, 19, 42, 49] on datasets for this problem. However, these generic models do not effectively utilize timestamp information and achieve limited accuracy. A problem related to follower prediction is edge prediction in temporal graphs, which has been addressed by various algorithms [3, 20, 33, 40, 47]. These methods accurately predict edges on temporal graphs, but they either rely solely on timestamps as sampling criteria or learn only the individual activity information of each vertex. Consequently, models designed for other purposes may struggle with this specific problem and may even perform worse than general GNN models, because they are misled by their original objectives. To better solve the follower prediction problem, we propose a model called Follower Prediction Graph Network (FPGN), inspired by the state-of-the-art temporal graph edge prediction algorithm TGN [33] and designed to address the shortcomings of existing algorithms. TGN effectively preserves temporal information by learning a memory for each vertex.
Therefore, the vertex memory obtained by TGN can help solve the problem of predicting connections between same-type vertices in bipartite graphs. However, TGN cannot be directly applied to this problem. When TGN updates the memory of a vertex, it builds on that vertex's previous memory, so training data are sampled in chronological order. In other words, TGN updates vertex memories while sampling according to the requirements of the downstream task. Accurate follower prediction, by contrast, requires anticipating connections between same-type vertices whose memories have already been updated with all timestamps in the training set. This makes TGN unsuitable for direct use on the follower prediction problem. Motivated by TGN and learning from its shortcomings, we modify the structure of TGN so that our model updates the memories of all vertices beforehand and then trains according to downstream needs. In addition, FPGN considers the correlation information between vertex activities to strengthen its ability to predict followers. Specifically, in addition to learning which vertices will be followers of their leaders, FPGN further analyzes the time (...truncated)
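The two-phase ordering described above (update the memories of all vertices first, then train for the downstream task) can be illustrated with a deliberately simple sketch. Everything in it is an assumption for illustration: each visitor's memory is a vector over venues updated by exponential time decay plus a one-hot visit indicator, not TGN's learned message function and memory updater (e.g., a GRU), and cosine similarity stands in for FPGN's GAT-based prediction module. `build_memories`, `score_pairs`, `cosine`, and `tau` are hypothetical names, and the events are illustrative, not data from the paper.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical event type: (visitor, venue, time in hours).
Event = Tuple[str, str, float]


def build_memories(events: List[Event], venues: List[str],
                   tau: float) -> Dict[str, List[float]]:
    """Phase 1: replay ALL training events chronologically, updating every
    visitor's memory before any visitor pair is scored.

    Toy update rule (an assumption, not TGN's learned memory updater):
        memory <- exp(-dt / tau) * memory + one_hot(venue)
    where dt is the time since the visitor's previous event.
    """
    index = {venue: i for i, venue in enumerate(venues)}
    memory: Dict[str, List[float]] = {}
    last_seen: Dict[str, float] = {}
    for visitor, venue, t in sorted(events, key=lambda e: e[2]):
        mem = memory.setdefault(visitor, [0.0] * len(venues))
        decay = math.exp(-(t - last_seen.get(visitor, t)) / tau)
        mem[:] = [decay * x for x in mem]  # older activity fades with elapsed time
        mem[index[venue]] += 1.0           # record the current visit
        last_seen[visitor] = t
    return memory


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_pairs(memory: Dict[str, List[float]],
                pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Phase 2: score same-type (visitor, visitor) pairs from the final
    memories. Cosine similarity is only a placeholder for a learned predictor."""
    return {(a, b): cosine(memory[a], memory[b]) for a, b in pairs}


VENUES = ["cafe", "gym", "restaurant"]
EVENTS: List[Event] = [  # illustrative events, not data from the paper
    ("Bob", "cafe", 8.0), ("Sam", "cafe", 8.2),
    ("Lisa", "gym", 9.8), ("Bob", "gym", 10.0),
    ("Jack", "restaurant", 13.0),
]

memory = build_memories(EVENTS, VENUES, tau=2.0)
scores = score_pairs(memory, [("Bob", "Sam"), ("Bob", "Lisa"), ("Bob", "Jack")])
for pair, s in scores.items():
    print(pair, round(s, 3))
```

Because cosine similarity is symmetric, this toy scorer cannot tell which visitor arrived first, which the follower definition requires; it only shows that every memory is finalized before any same-type pair is scored, whereas TGN's original training interleaves memory updates with batch sampling.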