Efficient maintenance of highway cover labelling for distance queries on large dynamic graphs

World Wide Web
https://doi.org/10.1007/s11280-023-01146-2

Muhammad Farhan¹ · Qing Wang¹

Received: 26 August 2022 / Revised: 14 December 2022 / Accepted: 29 January 2023
© The Author(s) 2023

Abstract

Graphs in real-world applications are typically dynamic, undergoing rapid changes in their topological structure over time as edges or vertices are added or deleted. However, it is challenging to design algorithms that support updates efficiently on dynamic graphs. In this article, we devise a parallel fully dynamic labelling method that reflects rapid changes on graphs when answering shortest-path distance queries, a fundamental problem in graph theory. At its core, our solution accelerates query processing through a fully dynamic distance labelling of limited size, which provides a good approximation to bound online searches on dynamic graphs. Our parallel fully dynamic labelling method leverages two sources of efficiency gains: landmark parallelism and anchor parallelism. Furthermore, it handles both incremental and decremental updates efficiently using a unified search approach and a bounded repairing inference mechanism. We theoretically analyze the correctness, labelling minimality, and time complexity of our method, and conduct extensive experiments to empirically verify its efficiency and scalability on 10 real-world large networks.

Keywords Graph algorithms · Highway cover · Shortest-path distance queries · Distance labelling · Dynamic graphs

1 Introduction

Given a graph G, a distance query on G asks for the distance between two vertices of G.
As a fundamental primitive, distance queries are widely applied in modern network-oriented systems, such as communication networks, context-aware search in web graphs [1, 2], social network analysis [3, 4], route planning in road networks [5, 6], management of resources in computer networks [7], and so on. Traditionally, a distance query can be answered using Dijkstra's algorithm [12] on graphs with non-negative weights or the breadth-first search (BFS) algorithm on unweighted graphs. However, these algorithms may end up traversing the entire network when two vertices are far apart from each other, thus becoming too slow for applications that require low latency.

Figure 1 Performance overview of our proposed method PARDHL and the state-of-the-art methods DECM [8], DECPLL [9], DECFD [10] and DECHL [11], where the update time is calculated by processing 1,000 edge deletions over complex networks with sizes varying from 20 million edges to 3 billion edges

This article belongs to the Topical Collection: Special Issue on Knowledge-Graph-Enabled Methods and Applications for the Future Web. Guest Editors: Xin Wang, Jeff Pan, Qingpeng Zhang, Yuan-Fang Li

Muhammad Farhan · Qing Wang
¹ School of Computing, Australian National University, Canberra, Australia

To speed up query response time, a plethora of methods have been proposed in past years [5, 10, 13–21]. Among these methods, precomputing a distance labelling is typically considered a promising solution. However, most existing distance labelling methods were designed for static networks. Real-world networks are typically dynamic and undergo rapid changes in their topological structure over time, i.e., edge additions and deletions. For example, people friend/unfriend or follow/unfollow others in social networks, web links become valid/invalid in web graphs, and communication networks may have faults being detected and recovered [7, 22, 23].
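The BFS baseline mentioned above, and the effect of a single edge deletion on a distance, can be illustrated with a minimal sketch. This is not the article's method: the function names (`bfs_distance`, `add_edge`, `delete_edge`), the adjacency-set representation, and the assumption of an undirected, unweighted graph are all illustrative choices made here.

```python
from collections import deque


def bfs_distance(adj, source, target):
    """Return the number of edges on a shortest path, or None if unreachable.

    `adj` maps each vertex to the set of its neighbours (undirected,
    unweighted graph; an assumption made for this sketch).
    """
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]  # stop as soon as the target is reached
                queue.append(v)
    return None  # search exhausted the component without finding the target


def add_edge(adj, u, v):
    """Insert an undirected edge (u, v)."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def delete_edge(adj, u, v):
    """Delete the undirected edge (u, v) if present."""
    adj[u].discard(v)
    adj[v].discard(u)


# Toy graph: a cycle 0-1-2-3-4-0.
adj = {}
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:
    add_edge(adj, u, v)

before = bfs_distance(adj, 0, 4)  # uses the direct edge (0, 4)
delete_edge(adj, 0, 4)
after = bfs_distance(adj, 0, 4)   # must now go around the cycle
print(before, after)
```

In the worst case the search visits every vertex reachable from the source, which is why a plain BFS per query becomes too slow on large graphs. The deletion in the toy example changes the distance between two vertices, and this is the kind of change that any precomputed labelling has to be repaired to reflect.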
It is imperative to design dynamic algorithms that can efficiently update a distance labelling to reflect graph changes for fast and accurate responses to distance queries. So far, only limited attempts have been made to maintain a distance labelling for dynamic graphs [8, 10, 24–26]. Among them, the methods considering incremental updates (i.e., edge additions) [10, 24, 25] are relatively efficient, e.g., an incremental update can be processed on graphs with billions of vertices in less than one second [25]. Unfortunately, the methods considering decremental updates still suffer from long update times [8–10]. As shown in Figure 1, the average update time of one edge deletion on graphs with around 20 million edges is 135 seconds for DECM [8] and 19 seconds for DECPLL [9], both of which are very inefficient. Moreover, these methods all consider the single-update setting, i.e., performing a single edge insertion or edge deletion at a time.

Unlike existing works, in this article we aim to explore the following research questions:

Q1: Is it possible to design a dynamic labelling algorithm which can efficiently reflect both incremental and decremental updates on graphs for fast and accurate distance computation?

Q2: Can such a dynamic labelling algorithm handle multiple updates in parallel in order to offer performance gains over existing dynamic labelling algorithms in the single-update setting?

Figure 2 An illustration of our parallel framework, which dynamically maintains highway cover labelling for graphs undergoing rapid updates

To answer these research questions, we propose a parallel solution for answering distance queries on dynamic graphs undergoing rapid changes in their topological structure. Our method is efficient both in time and space, and can scale to large graphs with billions of edges. There are several design considerations.
First and foremost, we combine offline labelling and online searching in our proposed solution so as to leverage the advantages of both: query processing is accelerated through a distance labelling that has a limited size but provides a good approximation to bound online searches. Then, we proceed to design a fully dynamic distance labelling algorithm, which dynamizes a distance labelling to efficiently reflect updates on the underlying graph. This algorithm consists of three stages:

(1) Finding affected vertices: to precisely identify vertices that are affected by updates;
(2) Finding boundary vertices: to bound the traversal space that is needed for repairing;
(3) Repairing affected vertices: to change the labels of affected vertices via an inference process based on their new distances.

Figure 2 illustrates the high-level overview of our solution. At its core, we abide by the following principles:

Parallel searches: We exploit interactions between updates and design a novel parallel approach to find affected vertices, which involves both landmark parallelism and anchor parallelism.

Bounded space: We bound search spaces for updates to only the small portions of graphs that are affected, which is achieved by identifying boundary vertices with respect to updates.

Repair inference: We develop a repairing approach that can efficiently infer the new distances of affected vertices to repair the (...truncated)