Efficient maintenance of highway cover labelling for distance queries on large dynamic graphs
World Wide Web
https://doi.org/10.1007/s11280-023-01146-2
Muhammad Farhan¹ · Qing Wang¹
Received: 26 August 2022 / Revised: 14 December 2022 / Accepted: 29 January 2023
© The Author(s) 2023
Abstract
Graphs in real-world applications are typically dynamic: their topological structure changes
rapidly over time as edges or vertices are added or deleted. However, it is challenging to design algorithms that support updates efficiently on
dynamic graphs. In this article, we devise a parallel fully dynamic labelling method to
reflect rapid changes on graphs when answering shortest-path distance queries, a fundamental problem in graph theory. At its core, our solution accelerates query processing through
a fully dynamic distance labelling of a limited size, which provides a good approximation
to bound online searches on dynamic graphs. Our parallel fully dynamic labelling method
leverages two sources of efficiency gains: landmark parallelism and anchor parallelism. Furthermore, it can handle both incremental and decremental updates efficiently using a unified
search approach and a bounded repairing inference mechanism. We theoretically analyze
the correctness, labelling minimality, and time complexity of our method, and also conduct
extensive experiments to empirically verify its efficiency and scalability on 10 real-world
large networks.
Keywords Graph algorithms · Highway cover · Shortest-path distance queries ·
Distance labelling · Dynamic graphs
1 Introduction
Given a graph G, a distance query on G asks for the distance between any two vertices of G. As a fundamental primitive, distance queries are widely applied in
modern network-oriented systems, such as communication networks, context-aware search
in web graphs [1, 2], social network analysis [3, 4], route-planning in road networks [5, 6],
management of resources in computer networks [7], and so on.
This article belongs to the Topical Collection: Special Issue on Knowledge-Graph-Enabled Methods
and Applications for the Future Web
Guest Editors: Xin Wang, Jeff Pan, Qingpeng Zhang, Yuan-Fang Li
Muhammad Farhan
Qing Wang
¹ School of Computing, Australian National University, Canberra, Australia
Figure 1 Performance overview of our proposed method PARDHL and the state-of-the-art methods DECM
[8], DECPLL [9], DECFD [10] and DECHL [11], where the update time is measured by processing 1,000
edge deletions over complex networks with sizes ranging from 20 million to 3 billion edges
Traditionally, a distance query can be answered using Dijkstra’s algorithm [12] on graphs with non-negative edge weights, or using breadth-first search (BFS) on unweighted graphs.
However, these algorithms may end up traversing the entire network when two vertices are
far apart from each other, thus becoming too slow for applications that require low latency.
To speed up query response time, many methods have been proposed in recent
years [5, 10, 13–21]. Among these methods, precomputing a distance labelling is typically
considered a promising solution. However, most existing distance labelling methods
were designed for static networks.
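The BFS baseline mentioned above can be sketched as follows. This is a minimal illustration only, not the method proposed in this article: the function name bfs_distance, the adjacency-dictionary representation and the toy graph are assumptions made for the example.

```python
from collections import deque


def bfs_distance(adj, source, target):
    """Return the hop distance from source to target in an unweighted
    graph, or None if target is unreachable.

    adj is assumed to map each vertex to an iterable of its neighbours.
    """
    if source == target:
        return 0
    dist = {source: 0}          # vertices discovered so far and their distances
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:   # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]
                queue.append(v)
    return None                 # the search exhausted the component of source


# Hypothetical toy graph: the path 0-1-2-3 plus an isolated vertex 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
```

In the worst case the loop visits every vertex and edge reachable from the source, which is the cost that a precomputed distance labelling is meant to avoid.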
Real-world networks are typically dynamic and undergo rapid changes, i.e., edge
additions and deletions, in their topological structure over time. For example, people
befriend/unfriend or follow/unfollow others in social networks, web links become valid/invalid
in web graphs, and communication networks may have faults detected and recovered
[7, 22, 23]. It is imperative to design dynamic algorithms that can efficiently update a distance
labelling to reflect graph changes for fast and accurate responses to distance queries. So
far, only limited attempts have been made to maintain a distance labelling for dynamic
graphs [8, 10, 24–26]. Among them, the methods considering incremental updates (i.e., edge
additions) [10, 24, 25] are relatively efficient, e.g., an incremental update can be processed
on graphs with billions of vertices in less than one second [25]. Unfortunately, the methods
considering decremental updates (i.e., edge deletions) still suffer from long update times
[8–10]. As shown in Figure 1, the average update time of one edge deletion on graphs with
around 20 million edges is 135 seconds for DECM [8] and 19 seconds for DECPLL [9],
which is very inefficient. Moreover, these methods all consider the single-update setting,
i.e., performing a single edge insertion or deletion at a time. Unlike existing works,
in this article we aim to explore the following research questions:
Q1: Is it possible to design a dynamic labelling algorithm which can efficiently reflect
both incremental and decremental updates on graphs for fast and accurate distance
computation?
Q2: Can such a dynamic labelling algorithm handle multiple updates in parallel in
order to offer performance gains over existing dynamic labelling algorithms in the
single-update setting?
Figure 2 An illustration of our parallel framework, which dynamically maintains highway cover labelling
for graphs undergoing rapid updates
To answer these research questions, we propose a parallel solution for answering distance queries on dynamic graphs undergoing rapid changes in their topological structure.
Our method is efficient in both time and space, and can scale to large graphs with billions of edges. There are several design considerations. First and foremost, we combine
offline labelling and online searching in our proposed solution so as to leverage the advantages of both: query processing is accelerated through a distance labelling that has
a limited size but provides a good approximation to bound online searches. Then, we design a fully dynamic distance labelling algorithm, which dynamizes a distance
labelling to efficiently reflect updates on the underlying graph. This algorithm consists of
three stages: (1) Finding affected vertices, to precisely identify vertices that are affected
by updates; (2) Finding boundary vertices, to bound the traversal space that is needed for
repairing; (3) Repairing affected vertices, to change the labels of affected vertices via an
inference process based on their new distances. Figure 2 gives a high-level overview
of our solution. At its core, we abide by the following principles:
Parallel searches: We exploit interactions between updates and design a novel parallel
approach to find affected vertices, which involves both landmark parallelism and anchor
parallelism.
Bounded space: We bound the search space for updates to only the small portions of the graph
that are affected, which is achieved by identifying boundary vertices with respect to
updates.
Repair inference: We develop a repairing approach that can efficiently infer the new
distances of affected vertices to repair the (...truncated)