Broadcast and Minimum Spanning Tree with o(m) Messages in the Asynchronous CONGEST Model
D I S C
Broadcast and Minimum Spanning Tree with o(m) Messages in the Asynchronous CONGEST Model
Valerie King 0 1
0 Department of Computer Science, University of Victoria , BC , Canada
1 Ali Mashreghi
We provide the first asynchronous distributed algorithms to compute broadcast and minimum spanning tree with o(m) bits of communication, in a sufficiently dense graph with n nodes and m edges. For decades, it was believed that Ω(m) bits of communication are required for any algorithm that constructs a broadcast tree. In 2015, King, Kutten and Thorup showed that in the KT1 model where nodes have initial knowledge of their neighbors' identities it is possible to construct MST in O˜(n) messages in the synchronous CONGEST model. In the CONGEST model messages are of size O(log n). However, no algorithm with o(m) messages were known for the asynchronous case. Here, we provide an algorithm that uses O(n3/2 log3/2 n) messages to find MST in the asynchronous CONGEST model. Our algorithm is randomized Monte Carlo and outputs MST with high probability. We will provide an algorithm for computing a spanning tree with O(n3/2 log3/2 n) messages. Given a spanning tree, we can compute MST with O˜(n) messages. 2012 ACM Subject Classification Computing methodologies → Distributed algorithms, Mathematics of computing → Graph algorithms
and phrases Distributed Computing; Minimum Spanning Tree; Broadcast Tree

1
Introduction
We consider a distributed network as an undirected graph with n nodes and m edges, and
the problem of finding a spanning tree and a minimum spanning tree (MST) with efficient
communication. That is, we require that every node in the graph learns exactly the subset
of its incident edges which are in the spanning tree or MST, resp. A spanning tree enables a
message to be broadcast from one node to all other nodes with only n − 1 edge traversals. In
a sensor or ad hoc network where the weight of a link between nodes reflects the amount of
energy required to transmit a message along the link [19], the minimum spanning tree (MST)
provides an energy efficient means of broadcasting. The problem of finding a spanning tree in
a network has been studied for more than three decades, since it is the building block of many
other fundamental problems such as counting, leader election, and deadlock resolution [3].
A spanning tree can be constructed by a simple breadthfirst search from a single node
using m bits of communication. The tightness of this communication bound was a “folk
theorem”, according to Awerbuch, Goldreich, Peleg and Vainish [4]. Their 1990 paper defined
the KT1 model where nodes have unique IDs and know only their neighbors. It showed,
for a limited class of algorithms, a lower bound of Ω(m) messages in a synchronous KT1
network. In 2015, Kutten et al. [19] proved a lower bound for general randomized algorithms
with O(log n) bit messages, in the KT0 model, where nodes do not know their neighbors.
In 2015, King, Kutten, and Thorup gave the first distributed algorithm (“KKT”) with
o(m) communication to build a broadcast tree and MST in the KT1 model. They devised
Monte Carlo algorithms in the synchronous KT1 model with O˜(n) communication [18]. This
paper and a followup paper [21] left open the problem of whether a o(m) bit communication
algorithm in the asynchronous model was possible, for either the spanning tree or MST
problem, when nodes know their neighbors’ IDs.
In an asynchronous network, there is no global clock. All processors may wake up at the
start and send messages, but further actions by a node are eventdriven, i.e., in response to
messages received. The pioneer work of Gallager, Humblet, and Spira [14] (“GHS”) presented
an asynchronous protocol for finding the MST in the CONGEST model, where messages are
of size O(log n). GHS requires O(m + n log n) messages and O(n log n) time if all nodes are
awakened simultaneously. Afterwards, researchers worked on improving the time complexity
of MST algorithms in the CONGEST model but the message complexity remained Ω(m).
In this paper, we provide the first algorithm in the KT1 model which uses o(m) bits of
communication for finding a spanning tree in an asynchronous network, specifically we show
the following:
I Theorem 1. Given any network of n nodes where all nodes awake at the start, a
spanning tree and a minimum spanning tree can be built with O(n3/2 log3/2 n) messages in the
asynchronous KT1 CONGEST model, with high probability.
1.1
Techniques
Many distributed algorithms to find an MST use the Boruvka method: Starting from the
set of isolated nodes, a forest of edge disjoint rooted trees which are subtrees of the MST
are maintained. The algorithms runs in phases: In a phase, in parallel, each tree A finds a
minimum weight outgoing edge, that is, one with exactly one endpoint in A and its other
endpoint in some other tree B. Then the outgoing edge is inserted to create the “merged”
tree containing the nodes of A and B. In what seems an inherently synchronous process,
every tree (or a constant fraction of the trees) participates in some merge, the number of
trees is reduced by a constant factor per phase, and O(log n) phases suffice to form a single
tree. [14, 3, 18, 21].
The KKT paper introduced procedures F indAny and F indM in which can find any or
the minimum outgoing edge leaving the tree, respectively. These require O(T ) messages
and O˜(T ), resp., where T  is the number of nodes in the tree T or a total of O˜(n) per
phase. As this is done synchronously in KKT, only O(log n) phases are needed, for a total
number of only O(n log n) messages to build a spanning tree.
While F indAny and F indM in are asynchronous procedures, the Boruvka approach of
[18] does not seem to work in an asynchronous model with o(m) messages, as it does not
seem possible to prevent only one tree from growing, one node at a time, while the other
nodes are delayed, for a cost of O(n2) messages. The asynchronous GHS also uses O(log n)
phases to merge trees in parallel, but it is able to synchronize the growth of the trees by
assigning a rank to each tree. A tree which finds a minimum outgoing edge waits to merge
until the tree it is merging with is of equal or higher rank. The GHS algorithm subtly avoids
traversing the whole tree until a minimum weight outgoing edge to an appropriately ranked
tree is found. This method seems to require communication over all edges in the worst case.
Asynchrony precludes approaches that can be used in the synchronous model. For example,
in the synchronous model, if nodes of low degree send messages to all their neighbors, in one
round all nodes learn which of their neighbors do not have low degree, and therefore they
can construct the subgraph of higher degree nodes. In the asynchronous model, a node, not
hearing from its neighbor, does not know when to conclude that its neighbor is of higher
degree.
The technique for building a spanning tree in our paper is very different from the technique
in [18] or [14]. We grow one tree T rooted at one preselected leader in phases. (If there is
no preselected leader, then this may be done from a small number of randomly selfselected
nodes.) Initially, each node selects itself with probability 1/√n log n as a star node. (We use
log n to denote log2 n.) This technique is inspired from [
10
], and provides a useful property
that every node whose degree is at least √n log3/2 n is adjacent to a star node with high
probability. Initially, star nodes (and lowdegree nodes) send out messages to all of their
neighbors. Each highdegree node which joins T waits until it hears from a star node and
then invites it to join T . In addition, when lowdegree and star nodes join T , they invite
their neighbors to link to T via their incident edges. Therefore, with high probability, the
following invariant for T is maintained as T grows:
Invariant: T includes all neighbors of any star or lowdegree node in T , as well. Each
highdegree node in T is adjacent to a star node in T .
The challenge is for highdegree nodes in T to find neighbors outside T . If in each phase,
an outgoing edge from a highdegree node in T to a highdegree node x (not in T ) is found
and x is invited to join T , then x’s adjacent star node (which must lie outside T by the
Invariant) is also found and invited to join. As the number of star nodes is O(√n/ log1/2 n),
this number also bounds the number of such phases. The difficulty is that there is no obvious
way to find an outgoing edge to a high degree node because, as mentioned above, in an
asynchronous network, a high degree node has no apparent way to determine if its neighbor
has high degree without receiving a message from its neighbor.
Instead, we relax our requirement for a phase. With each phase either (A) A highdegree
node (and star node) is added to T or (B) T is expanded so that the number of outgoing
edges to lowdegree nodes is reduced by a constant factor. As there are no more than
O(√n/ log1/2 n) phases of type A and no more than O(log n) phases of type B between each
type A phase, there are a total of O(√n log1/2 n) phases before all nodes are in T . The
key idea for implementing a phase of type B is that the tree T waits until its nodes have
heard enough messages passed by lowdegree nodes over outgoing edges before initiating an
expansion. The efficient implementation of a phase, which uses only O(n log n) messages,
requires a number of tools which are described in the preliminaries section.
Once a spanning tree is built, we use it as a communication network while finding the
MST. This enables us to “synchronize” a modified GHS which uses F indM in for finding
minimum outgoing edges, using a total of O˜(n) messages.
Note: If we do not assume the existence of a preselected leader, or the graph is not
connected, then a variant of the algorithm described in the arxiv version [22] is needed.
1.2
Related work
The Awerbuch, Goldreich, Peleg and Vainish [4] lower bound on the number of messages
holds only for (randomized) algorithms where messages may contain a constant number of
IDs, and IDs are processed by comparison only and for general deterministic algorithms,
where ID’s are drawn from a very large size universe.
Time to build an MST in the CONGEST model has been explored in several papers.
Algorithms include, in the asynchronous KT0 model, [14, 3, 13, 26], and in the synchronous
KT0 model, [20, 15, 7, 17]. Recently, in the synchronous KT0 model, Pandurangan gave a
[23] O˜(D + √n) time and O˜(m) message randomized algorithm, which Elkin improved by
logarithmic factors with a deterministic algorithm [11]. The time complexity to compute
spanning tree in the algorithm of [18] is O(n log n) which was improved to O(n) in [21].
Lower bounds on time for approximating the minimum spanning tree has been proved in
the synchronous KT0 model In [8, 25] . Kutten et al. [19] show an Ω(m) lower bound on
message complexity for randomized general algorithms in the KT0 model.
F indAny and F indM in which appear in the KKT algorithms build on ideas for sequential
dynamic connectivity in [16]. A sequential dynamic ApproxCut also appeared in that paper
[16]. Solutions to the sequential linear sketching problem for connectivity [1] share similar
techniques but require a more complex step to verify when a candidate edge name is an
actual edge in the graph, as the edges names are no longer accessible once the sketch is made
(See Subsection 2.3).
The threshold detection problem was introduced by Emek and Korman [12]. It assumes
that there is a rooted spanning tree T where events arrive online at T ’s nodes. Given some
threshold k, a termination signal is broadcast by the root if and only if the number of events
exceeds k. We use a naive solution of a simple version of the problem here.
A synchronizer, introduced by Awerbuch [2] and studied in [6, 5, 24, 9], is a general
technique for simulating a synchronous algorithm on an asynchronous network using
communications along a spanning tree. To do this, the spanning tree must be built first. Using a
general synchronizer imposes an overhead of messages that affect every single step of the
synchronous algorithm that one wants to simulate, and would require more communication
than our special purpose method of using our spanning tree to synchronize the modified GHS.
1.3
Organization
Section 2 describes the model. Section 3 gives the spanning tree algorithm for the case
of a connected network and a single known leader. Finally, Section 4 provides the MST
algorithm. (Due to lack of space the algorithm for computing a minimum spanning forest in
disconnected graphs or minimum spanning tree for dealing with the case of no preselected
leader is available on the arxiv version [22]. This variant of the algorithm has the same
message complexity.)
2
2.1
Preliminaries
Model
Let c ≥ 1 be any constant. The communications network is the undirected graph G = (V, E)
over which a spanning tree or MST will be found. Edge weights are integers in [1, nc]. IDs
are assigned uniquely by the adversary from [1, nc]. All nodes have knowledge of c and n
which is an upper bound on V  (number of nodes in the network) within a constant factor.
All nodes know their own ID along with the ID of their neighbors (KT1 model) and the
weights of their incident edges. Nodes have no other information about the network. e.g.,
they do not know E or the maximum degree of the nodes in the network. Nodes can only
send direct messages to the nodes that are adjacent to them in the network. If the edge
weights are not unique they can be made unique by appending the ID of the endpoints to its
weight, so that the MST is unique. Nodes can only send direct messages to the nodes that
are adjacent to them in the network. Our algorithm is described in the CONGEST model in
which each message has size O(log n). Its time is trivially bounded by the total number of
messages. The KT1 CONGEST model has been referred to as the “standard model” [4].
Message cost is the sum over all edges of the number of messages sent over each edge
during the execution of the algorithm. If a message is sent it is eventually received, but the
adversary controls the length of the delays and there is no guarantee that messages sent by
the same node will be received in the order they are sent. There is no global clock. All nodes
awake at the start of the protocol simultaneously. After awaking and possibly sending its
initial messages, a processor acts only in response to receiving messages.
We say a network “finds” a subgraph if at the end of the distributed algorithm, every
node knows exactly which of its incident edges in the network are part of the subgraph.
The algorithm here is Monte Carlo, in that it succeeds with probability 1 − n−c00 for any
constant c00 (“w.h.p.”).
We initially assume there is a special node (called leader ) at the start and the graph is
connected. These assumptions are dropped in the algorithm we provide for disconnected
graphs in the full version of the paper.
2.2
Definitions and Subroutines
T is initially a tree containing only the leader node. Thereafter, T is a tree rooted at the
leader node. We use the term outgoing edge from T to mean an edge with exactly one
endpoint in T . An outgoing edge is described as if it is directed; it is from a node in T and
to a node not in T (the “external” endpoint).
The algorithm uses the following subroutines and definitions:
Broadcast(M ): Procedure whereby the node v in T sends message M to its children and
its children broadcast to their subtrees.
Expand: A procedure for adding nodes to T and preserving the Invariant after doing so.
F indAny: Returns to the leader an outgoing edge chosen uniformly at random with
probability 1/16, or else it returns ∅. The leader then broadcasts the result. F indAny
requires O(n) messages. We specify F indAny(E0) when we mean that the outgoing edge
must be an outgoing edge in a particular subset E0 ⊆ E.
F indM in: is similarly defined except the edge is the (unique) minimum cost outgoing
edge. This is used only in the minimum spanning tree algorithm. F indM in requires
O(n log2 n/ log log n) messages.
ApproxCut: A function which w.h.p. returns an estimate in [k/32, k] where k is the
number of outgoing edges from T and k > c log n for c a constant. It requires O(n log n)
messages.
F indAny and F indM in are described in [18] (The F indAny we use is called FindAnyC
there.) FindAnyC was used to find any outgoing edge in the previous paper. It is not
hard to see that the edge found is a random edge from the set of outgoing edges; we use
that fact here. The relationships among F indAny, F indM in and ApproxCut below are
described in the next subsection.
F oundL(v), F oundO(v): Two lists of edges incident to node v, over which v will send
invitations to join T the next time v participates in Expand. After this, the list is emptied.
Edges are added to F oundL(v) when v receives hLowdegreei message or the edge is found
by the leader by sampling and its external endpoint is lowdegree. Otherwise, an edge
is added to F oundO(v) when v receives a hStari message over an edge or if the edge is
found by the leader by sampling and its external endpoint is highdegree. Note that star
nodes that are lowdegree send both hLowdegreei and hStari. This may cause an edge to
be in both lists which is handled properly in the algorithm.
Tneighbor(v): A list of neighbors of v in T . This list, except perhaps during the execution
of Expand, includes all lowdegree neighbors of v in T . This list is used to exclude from
F oundL(v) any nonoutgoing edges.
T hresholdDetection(k): A procedure which is initiated by the leader of T . The nodes in
T experience no more than k < n2 events w.h.p. The leader is informed w.h.p. when
the number of events experienced by the nodes in T reaches the threshold k/4. Here,
an event is the receipt of hLowdegreei over an outgoing edge. Following the completion
of Expand, all edges (u, v) in F oundL(u) are events if v ∈/ Tneighbor(u). O(T  log n)
messages suffice.
2.3
Implementation of F indAny, F indM in and ApproxCut
We briefly review F indAny in [18] and explain its connection with ApproxCut. The key
insight is that an outgoing edge is incident to exactly one endpoint in T while other edges
are incident to zero or two endpoints. If there were exactly one outgoing edge, the parity
of the sum of all degrees in T would be 1, and the parity of bitwise XOR of the binary
representation of the names of all incident edges would be the name of the one outgoing edge.
To deal with possibility of more than one outgoing edge, the leader creates an efficient
means of sampling edges at different rates: Let l = d2 log ne. The leader selects and broadcasts
one pairwise independent hash function h : [edge_names] → [1, 2l], where edge_name of an
edge is a unique binary string computable by both its endpoints, e.g., {x, y} = x · y for x < y.
Each node y forms the vector h−−(→y) whose ith bit is the parity of its incident edges that hash
to [0, 2i], i = 0, . . . , l. Starting with the leaves, a node in T computes the bitwise XOR of
the vectors from its children and itself and then passes this up the tree, until the leader
has computed →−b = XORy∈T h−−(→y). The key insight implies that for each index i, →−bi equals
the parity of just the outgoing edges mapped to [0, 2i]. Let min be the smallest index i s.t.
→−i = 1. With constant probability, exactly one edge of the outgoing edges has been mapped
b
to [1, 2min]. The leader broadcasts min. Nodes send back up the XOR of the edge_names
of incident edges which are mapped by h to this range. If exactly one outgoing edge has
been indeed mapped to that range, the leader will find it by again determining the XOR of
the edge_names sent up. One more broadcast from the leader can be used to verify that
this edge exists and is incident to exactly one node in T .
Since each edge has the same probability of failing in [0, 2min], this procedure gives a
randomly selected edge. Note also that the leader can instruct the nodes to exclude certain
edges from the XOR, say incident edges of weight greater than some w. In this way the leader
can binary search for the minimal weight outgoing edge to carry out F indM in. Similarly,
the leader can select random edges without replacement.
Observe that if the number of outgoing edges is close to 2j, we’d expect min to be l − j
with constant probability. Here we introduce distributed asynchronous ApproxCut which
uses the sampling technique from F indM in but repeats it O(log n) times with O(log n)
randomly chosen hash functions. Let min_sum be the minimum i for which the sum of
→−bi s exceeds c log n for some constant c. We show 2min_sum approximates the number of
outgoing edges within a constant factor from the actual number. ApproxCut pseudocode is
given in Algorithm 5.
We show:
I Lemma 2. With probability 1 − 1/nc, ApproxCut returns an estimate in [k/32, k] where
k is the number of outgoing edges and k > c0 log n, c0 a constant depending on c. It uses
O(n log n) messages.
The proof is given in Section 3.2.
3
Asynchronous ST construction with o(m) messages
In this section we explain how to construct a spanning tree when there is a preselected leader
and the graph is connected.
Initially, each node selects itself with probability 1/√n log n as a star node. Lowdegree
and star nodes initially send out hLowdegreei and hStari messages to all of their neighbors,
respectively. (We will be using the hM i notation to show a message with content M .) A
lowdegree node which is a star node sends both types of messages. At any point during
the algorithm, if a node v receives a hLowdegreei or hStari message through some edge e, it
adds e to F oundL(v) or F oundO(v) resp.
The algorithm FindSTLeader runs in phases. Each phase has three parts: 1) Expansion
of T over found edges since the previous phase and restoration of the Invariant; 2) Search
for an outgoing edge to a highdegree node; 3) Wait until messages to nodes in T have been
received over a constant fraction of the outgoing edges whose external endpoint is lowdegree.
1) Expansion. Each phase is started with Expand. Expand adds to T any nodes which
are external endpoints of outgoing edges placed on a F ound list of any node in T since the
last time that node executed Expand. In addition, it restores the Invariant for T .
Implementation. Expand is initiated by the leader and broadcast down the tree. When a
node v receives hExpandi message for the first time (it is not in T ), it joins T and makes
the sender its parent. If it is a highdegree node and is not a star, it has to wait until it
receives a hStari message over some edge e, and then adds e to F oundO(v). It then forwards
hExpandi over the edges in F oundL(v) or F oundO(v) and empties these lists. Otherwise, if
it is a lowdegree node or a star node, it forwards hExpandi to all of its neighbors.
On the other hand, if v is already in T , it forwards hExpandi message to its children in T
and along any edges in F oundL(v) or F oundO(v), i.e. outgoing edges which were “found”
since the previous phase, and empties these lists. All hExpandi requests received by v are
answered, and their sender is added to Tneighbor(v). The procedure ends in a bottomup
way and ensures that each node has heard from all the nodes it sent hExpandi requests to
before it contacts its parent.
Let T i denote T after the execution of Expand in phase i. Initially T 0 consists of the
leader node and as its Found lists contain all its neighbors, after the first execution of
Expand, if the leader is highdegree, T1 satisfies the invariant. An easy inductive argument
on T i shows:
I Observation 1. For all i > 0, upon completion of Expand, all the nodes reachable by
edges in the F ound lists of any node in T i−1 are in T i, and for all v ∈ T , Tneighbor(v)
contains all the lowdegree neighbors of v in T .
Expand is called in line 6 of the main algorithm 1. The pseudocode is given in Expand
Algorithm 1.
2) Search for an outgoing edge to a high degree node. A sampling of the outgoing edges
without replacement is done using F indAny multiple times. The sampling either (1) finds
an outgoing edge to a high degree node, or (2) finds all outgoing edges, or (3) determines
w.h.p. that at least half the outgoing edges are to lowdegree nodes and there are at least
2c log n such edges. If the first two cases occur, the phase ends.
Implementation. Endpoints of sampled edges in T communicate over the outgoing edge
to determine if the external endpoint is highdegree. If at least one is, that edge is added
to the F oundO list of its endpoint in T and the phase ends. If there are fewer than 2 log n
outgoing edges, all these edges are added to F oundO and the phase ends. If there are no
outgoing edges, the algorithm ends. If all 2 log n edges go to lowdegree nodes, then the phase
continues with Step 3) below. This is implemented in the while loop of FindSTLeader.
Throughout this section we will be using the following fact from Chernoff bounds:
Assume X1, X2, . . . , XT are independent Bernoulli trials where each trial’s outcome is 1
with probability 0 < p < 1. Chernoff bounds imply that given constants c, c1 > 1 and
c2 < 1 there is a constant c00 such that if there are T ≥ c00 log n independent trials, then
P r(X > c1 · E[X]) < 1/nc and P r(X < c2 · E[X]) < 1/nc, where X is sum of the X1, . . . , XT .
We show:
I Lemma 3. After Search, at least one of the following must be true with probability 1−1/nc0 ,
where c0 is a constant depending on c: 1) there are fewer than 2c log n outgoing edges and the
leader learns them all; 2) an outgoing edge is to a highdegree node is found, or 3) there are
at least 2c log n outgoing edges and at least half the outgoing edges are to lowdegree nodes.
Proof. Each F indAny has a probability of 1/16 of returning an outgoing edge and if it
returns an edge, it is always outgoing. After 48c log n repetitions without replacement, the
expected number of edges returned is 3c log n. As these trials are independent, Chernoff
bounds imply that at least 2/3 of trials will be successful with probability at least 1 − 1/nc,
i.e., 2c log n edges are returned if there are that many, and if there are fewer, all will be
returned.
The edges are picked uniformly at random by independent repetitions of F indAny. If
more than half the outgoing edges are to highdegree nodes, the probability that all edges
returned are to lowdegree nodes is 1/22c log n < 1/n2c. J
3) Wait to hear from outgoing edges to lowdegree external nodes. This step forces the
leader to wait until T has been contacted over a constant fraction of the outgoing edges to
(external) lowdegree nodes. Note that we do not know how to give a good estimate on the
number of lowdegree nodes which are neighbors of T . Instead we count outgoing edges.
Implementation. This step occurs only if the 2c log n randomly sampled outgoing edges all
go to lowdegree nodes and therefore the number of outgoing edges to lowdegree nodes is
at least this number. In this case, the leader waits until T has been contacted through a
constant fraction of these edges.
If this step occurs, then w.h.p., at least half the outgoing edges go to lowdegree nodes.
Let k be the number of outgoing edges; k ≥ 2c log n. The leader calls ApproxCut to return
an estimate q ∈ [k/32, k] w.h.p. It follows that w.h.p. the number of outgoing edges to
lowdegree nodes is k/2. Let r = q/2. Then r ∈ [k/64, k/2].
The nodes v ∈ T will eventually receive at least k/2 messages over outgoing edges of the
form hLowdegreei. Note that these messages must have been received by v after v executed
Expand and added to F oundL(v), for otherwise, these would not be outgoing edges.
The leader initiates a ThresholdDetection procedure whereby there is an event for a
node v for each outgoing edge v has received a hLowdegreei message over since the last
time v executed Expand. As the ThresholdDetection procedure is initiated after the leader
finishes Expand, the Tneighbor(v) includes any lowdegree neighbor of v that is in T . Using
Tneighbor(v), v can determine which edges in F oundL(v) are outgoing.
Each event experienced by a node causes it to flip a coin with probability min{c log n/r, 1}.
If the coin is heads, then a trigger message labelled with the phase number is sent up to the
leader. The leader is triggered if it receives at least (c/2) log n trigger messages for that phase.
When the leader is triggered, it begins a new phase. Since there are k/2 triggering events,
the expected number of trigger messages eventually generated is (c log n/r)(k/2) ≥ c log n.
Chernoff bounds imply that at least (c/2) log n trigger messages will be generated w.h.p.
Alternatively, w.h.p., the number of trigger messages received by the leader will not exceed
(c/2) log n until at least k/8 events have occurred, as this would imply twice the expected
number. We can conclude that w.h.p. the leader will trigger the next phase after 1/4 of the
outgoing edges to lowdegree nodes have been found.
I Lemma 4. When the leader receives (c/2) log n messages with the current phase number,
w.h.p, at least 1/4 of the outgoing edges to lowdegree nodes have been added to F oundL lists.
3.1
Proof of the main theorem
Here we prove Theorem 1 as it applies to computing the spanning tree of a connected network
with a preselected leader.
I Lemma 5. W.h.p., after each phase except perhaps the first, either (A) A highdegree
node (and star node) is added to T or (B) T is expanded so that the number of outgoing
edges to lowdegree nodes is reduced by a 1/4 factor (or the algorithm terminates with a
spanning tree).
Proof. By Lemma 3 there are three possible results from the Search phase. If a sampled
outgoing edge to a highdegree node is found, this edge will be added to the F oundO list of
its endpoint in T . If the Search phase ends in fewer than 2c log n edges found and none of
them are to high degree nodes, then w.h.p. these are all the outgoing edges to lowdegree
nodes, these edges will all be added to some F oundL. If there are no outgoing edges, the
algorithm terminates and a spanning tree has been found. If the third possible result occurs,
then there are at least 2 log n outgoing edges, half of which go to lowdegree nodes. By
Lemma 4, the leader will trigger the next phase and it will do so after at least 1/4 of the
outgoing edges to lowdegree nodes have been added to F oundL lists.
By Observation 1, all the endpoints of the edges on the F ound lists will be added to T in
the next phase, and there is at least one such edge or there are no outgoing edges and the
spanning tree has been found. When Expand is called in the next phase, T acquires a new
high degree node in two possible ways, either because an outgoing edge on a F ound list is to
a highdegree node or because the recursive Expand on outgoing edges to lowdegree edges
eventually leads to an external highdegree node. In either case, by the Invariant, T will
acquire a new star node as well as a highdegree node. Also by the Invariant, all outgoing
edges must come from highdegree nodes. Therefore, if no highdegree nodes are added to
T by Expand, then no new outgoing edges are added to T . On the other hand, 1/4 of the
outgoing edges to lowdegree nodes have become nonoutgoing edges as their endpoints have
been added to T . So we can conclude that the number of outgoing edges to lowdegree nodes
have been decreased by 1/4 factor. J
It is not hard to see:
I Lemma 6. The number of phases is bounded by O(√n log1/2 n).
Proof. By Lemma 5, every phase except perhaps the first, is of type A or type B. Chernoff
bounds imply that w.h.p., the number of star nodes does not exceed its expected number
(√n/ log1/2 n) by more than a constant factor, hence there are no more than O(√n/ log1/2 n)
phases of type A. Before and after each such phase, the number of outgoing edges to
lowdegree nodes is reduced by at least a fraction of 1/4; hence, there are no more than
log4/3 n2 = O(log n) phases of type B between phases of type A. J
Finally, we count the number of messages needed to compute the spanning tree.
I Lemma 7. The overall number of messages is O(n3/2 log3/2 n).
Proof. The initialization requires O(√n log3/2 n) messages from O(n) lowdegree nodes and
O(n) messages from each of O(√n/ log1/2 n) stars. In each phase, Expand requires a number
of messages which is linear in the size of T or O(n), except that newly added lowdegree
and star nodes send to their neighbors when they are added to T , but this adds just a
constant factor to the initialization cost. F indAny is repeated O(log n) times for a total
cost of O(n log n) messages. ApproxCut requires the same number. The Threshold Detector
requires only O(log n) messages to be passed up T or O(n log n) messages overall. Therefore,
by Lemma 6 the number of messages over all phases is O(n log3/2 n). J
Theorem 1 for spanning trees in connected networks with a preselected leader follows
from Lemmas 7 and 6.
3.2
Proof of ApproxCut Lemma
Proof. Let W be the set of the outgoing edges. For a fixed z and i, we have:
P r(hz,i(T ) = 1) = P r(an odd number of edges in W hash to [2i]) ≥
P r(∃! e ∈ W hashed to [2i]).
This probability is at least 1/16 for i = l − dlog W e − 2 (Lemma 5 of [18]). Therefore,
since Xj = Pc log n hz,j (from pseudocode), E[Xj] = P E[hz,jl] ≥ c log n/16, where j =
l − dlog W e −z2=.1Note that j = l − dlog W e − 2 means that 2j2+3 < W  < 2j2+l 1 . Consider
j − 4. Since the probability of an edge being hashed to [2j−4] is 2j2−l 4 , we have
2j−4 1 1
P r(hz,j−4(T ) = 1) ≤ P r(∃e ∈ W hashed to [2j−4]) = W  2l ≤ 25 ≤ 32 .
Thus, E[Xj−4] ≤ c log n/32. Since an edge that is hashed to [2j−k] (for k > 4) is already
hashed to [2j−4], we have:
P r(hz,j−4(T ) = 1 ∨ . . . ∨ hz,0(T ) = 1) ≤ P r(∃e ∈ W hashed to [2j−4]or . . . or[20])) =
P r(∃e ∈ W hashed to [2j−4]) =
Let yz be 1 if hz,j−4(T ) = 1 ∨ . . . ∨ hz,0(T ) = 1, and 0 otherwise. Also, let Y = Pcz =lo1g n yz.
We haveE[Y ] ≤ c log n/32. Also, for any positive integer a,
From Chernoff bounds: and,
P r(Xj−4 > a ∨ . . . ∨ X0 > a) ≤ P r(Y > a).
P r(Xj < (3/4)c log n/16) = P r(Xj < (3/4)E[Xj]) < 1/nc0
P r(Xj−4 > (3/2)c log n/16 ∨ . . . ∨ X0 > (3/2)c log n/16) ≤ P r(Y > (3/2)c log n/16) =
P r(Y > (3/2)c log n/32) < P r(Y > (3/2)E[Y ]) < 1/nc0 .
Therefore, by finding the smallest i (called min in pseudocode) for which Xi > (3/2)c log n/16,
w.h.p. min is in [j − 3, j]. As a result, 2W  ≤ 2l−min ≤ 64W . Therefore,
W /32 ≤ 2l−min/64 ≤ W .
Furthermore, broadcasting each of the O(log n) hash functions and computing the
corresponding vector takes O(n) messages; so, the lemma follows. J
3.3
Pseudocode
Algorithm 1 Initialization of the spanning tree algorithm.
1: procedure Initialization
2: Every node selects itself to be a star node with probability of 1/√n log n.
3: Nodes that have degree < √n log3/2 n are lowdegree nodes. Otherwise, they are
highdegree nodes. (Note that they may also be star nodes at the same time.)
4: Star nodes send hStari messages to all of their neighbors.
5: Lowdegree nodes send hLowdegreei messages to all of their neighbors (even if they
are star nodes too).
6: end procedure
Algorithm 3 Given r at phase i, this procedure detects when nodes in T receive at least
r/4 hLow − degreei messages over outgoing edges. c is a constant.
1: procedure ThresholdDetection
2: Leader calls Broadcast(hSendtrigger, r, ii).
3: When a node u ∈ T receives hSendtrigger, r, ii, it first participates in the broadcast.
Then, for every event, i.e. every edge (u, v) ∈ F ound(u)L such that v ∈/ Tneighbor(u),
u sends to its parent a hT rigger, ii message with probability of c log n/r.
4: A node that receives hT rigger, ii from a child keeps sending up the message until it
reaches the leader. If a node receives an hExpandi before it sends up a hTrigger, ii, it
discards the hTrigger, ii messages as an Expand has already been triggered.
5: Once the leader receives at least c log n/2 hTrigger, ii messages, the procedure
terminates and the control is returned to the calling procedure.
6: end procedure
Algorithm 4 Leader initiates Expand by sending hExpandi to all of its children. If this is
the first Expand, leader sends to all of its neighbors. Here, x is any nonleader node.
1: procedure Expand
2: When node x receives an hExpandi message over an edge (x, y):
3: x adds y to Tneighbor(x).
4: if x is not in T then
5: The first node that x receives hExpandi from becomes x’s parent. //x joins T
6: if x is a highdegree node and x is not a star node then
7: It waits to receive a hStari over some edge e, then adds e to F oundO(x).
8: It forwards hExpandi over edges in F oundL(x) and F oundO(x) (only once in
case an edge is in both lists), then removes those edges from the Found lists.
9: else (x is a lowdegree or star node)
10: It forwards the hExpandi message to all of its neighbors.
11: end if
12: else (x is already in T )
13: If the sender is not its parent, it sends back hDonebyrejecti. Else, it forwards
hExpandi to its children in T , over the edges in F oundL(x) and F oundO(x),
then removes those edges from the Found lists.
14: end if
// Note that if x added more edges to its Found list after forward of
hExpandi, the new edges will be dealt with in the next Expand.
15: When a node receives hDonei messages (either hDonebyaccepti or hDonebyrejecti)
from all of the nodes it has sent to, it considers all nodes that have sent
hDonebyaccepti as its children. Then, it sends up hDonebyaccepti to its parent.
16: The algorithm terminates when the leader receives hDonei from all of its children.
17: end procedure
3:
Algorithm 5 Approximates the number of outgoing edges within a constant factor. c is a
constant.
1: procedure ApproxCut(T )
2: Leader broadcasts c log n random 2wise independent hash functions defined from
[1, n2c] → [2l].
For node y, and hash function hz vector h−→z(y) is computed where hz,i(y) is the parity
of incident edges that hash to [2i], i = 0, . . . , l.
4: FFoorr ehaacshh ifu=nc0t,io..n. ,hlz,,Xh−→iz(=T )P=c l⊕ogyn∈hTzh−→,iz((Ty)).is computed in the leader.
5: z=1
6: Let min be the smallest i s.t. Xi ≥ (3/4)c log n/16.
7: Return 2l−min/64.
8: end procedure
4
Finding MST with o(m) asynchronous communication
The MST algorithm implements a version of the GHS algorithm which grows a forest of
disjoint subtrees (“fragments”) of the MST in parallel. We reduce the message complexity
of GHS by using F indM in to find minimum weight outgoing edges without having to send
messages across every edge. But, by doing this, we require the use of a spanning tree to help
synchronize the growth of the fragments.
Note that GHS nodes send messages along their incident edges for two main purposes: (1)
to see whether the edge is outgoing, and (2) to make sure that fragments with higher rank
are slowed down and do not impose a lot of time and message complexity. Therefore, if we
use F indM in instead of having nodes to send messages to their neighbors, we cannot make
sure that higher ranked fragments are slowed down. Our protocol works in phases where
in each phase only fragments with smallest ranks continue to grow while other fragments
wait. A spanning tree is used to control the fragments based on their rank. (See [14] for the
original GHS.)
Implementation of FindMST. Initially, each node forms a fragment containing only that
node which is also the leader of the fragment and fragments all have rank zero. A fragment
identity is the node ID of the fragment’s leader; all nodes in a fragment know its identity and
its current rank. Let the precomputed spanning tree T be rooted at a node r, All fragment
leaders wait for instructions that are broadcast by r over T .
The algorithm runs in phases. At the start of each phase, r broadcasts the message
hRankrequesti to learn the current minimum rank among all fragments after this broadcast.
Leaves of T send up their fragment rank. Once an internal node in T receives the rank
from all of its children (in T ) the node sends up the minimum fragment rank it has received
including its own. This kind of computation is also referred to as a convergecast.
Then, r broadcasts the message hP roceed, minRanki where minRank is the current
minimum rank among all fragments. Any fragment leader that has rank equal to minRank,
proceeds to finding minimum weight outgoing edges by calling FindMin on its own fragment
tree. These fragments then send a hConnecti message over their minimum weight outgoing
edges. When a node v in fragment F (at rank R) sends a hConnecti message over an edge e
to a node v0 in fragment F 0 (at rank R0), since R is the current minimum rank, two cases
may happen: (Ranks and identities are updated here.)
1. R < R0: In this case, v0 answers immediately to v by sending back an hAccepti message,
indicating that F can merge with F 0. Then, v initiates the merge by changing its fragment
identity to the identity of F 0, making v0 its parent, and broadcasting F 0’s identity over
fragment F so that all nodes in F update their fragment identity as well. Also, the new
fragment (containing F and F 0) has rank R0.
2. R = R0: v0 responds hAccepti immediately to v if the minimum outgoing edge of F 0 is e,
as well. In this case, F merges with F 0 as mentioned in rule 1, and the new fragment will
have F 0’s identity. Also, both fragments increase their rank to R0 + 1.
Otherwise, v0 does not respond to the message until F 0’s rank increases. Once F 0 increased
its rank, it responds via an hAccepti message, fragments merge, and the new fragment
will update its rank to R0.
The key point here is that fragments at minimum rank are not kept waiting. Also, the
intuition behind rule 2 is as follows. Imagine we have fragments F1, F2, ..., Fk which all have
the same rank and Fi’s minimum outgoing edge goes to Fi+1 for i ≤ k − 1. Now, it is either
the case that Fk’s minimum outgoing edge goes to a fragment with higher rank or it goes to
Fk. In either case, rule 2 allows the fragments Fk−1, Fk−2, . . . to update their identities in a
cascading manner right after Fk increased its rank.
When all fragments finish their merge at this phase they have increased their rank by
at least one. Now, it is time for r to star a new phase. However, since communication is
asynchronous we need a way to tell whether all fragments have finished. In order to do
this, hDonei messages are convergecast in T . Nodes that were at minimum rank send up to
their parent in T a hDonei message only after they increased their rank and received hDonei
messages from all of their children in T .
Algorithm 6 MST construction with O˜(n) messages. T is a spanning tree rooted at r.
1: procedure FindMST
2: All nodes are initialized as fragments at rank 0.
// Start of a phase
3: r calls Broadcast(hRankrequesti), and minRank is computed via a convergecast.
4: r calls Broadcast(hP roceed, minRanki).
5: Fragment leaders at rank minRank that have received the hP roceed, minRanki
message, call FindMin. Then, these fragments merge by sending Connect messages
over their minimum outgoing edges. If there is no outgoing edge the fragment leader
terminates the algorithm.
6: Upon receipt of hP roceed, minRanki, a node v does the following:
If it is a leaf in T at rank minRank, sends up hDonei after increasing its rank.
If it is a leaf in T with a rank higher than minRank, it immediately sends up hDonei.
If it is not a leaf in T , waits for hDonei from its children in T . Then, sends up the
hDonei message after increasing its rank.
7: r waits to receive hDonei from all of its children, and starts a new phase at step 3.
8: end procedure
As proved in Lemma 8, this algorithm uses O˜(n) messages.
I Lemma 8. FindMST uses O(n log3 n/ log log n) messages and finds the MST w.h.p.
Proof. All fragments start at rank zero. Before a phase begins, two broadcasts and
convergecasts are performed to only allow fragments at minimum rank to proceed. This requires O(n)
messages. In each phase, finding the minimum weight outgoing edges using FindMin takes
O(n log2 n/ log log n) over all fragments. Also, it takes O(n) for the fragments to update
their identity since they just have to send the identity of the higher ranked fragment over
their own fragment. As a result, each phase takes O(n log2 n/ log log n) messages.
A fragment at rank R must contain at least two fragments with rank R − 1; therefore, a
fragment with rank R must have at least 2R nodes. So, the rank of a fragment never exceeds
log n. Also, each phase increases the minimum rank by at least one. Hence, there are at
most log n phases. As a result, message complexity is O(n log3 n/ log log n). J
From Lemma 8, Theorem 1 for minimum spanning trees follows.
5
Conclusion
We presented the first asynchronous algorithm for computing the MST in the CONGEST
model with O˜(n3/2) communication when nodes have initial knowledge of their neighbors’
identities. This shows that the KT1 model is significantly more communication efficient than
KT0 even in the asynchronous model. Open problems that are raised by these results are:
(1) Does the asynchronous KT1 model require substantially more communication that the
synchronous KT1 model? (2) Can we improve the time complexity of the algorithm while
maintaining the message complexity?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Kook Jin Ahn , Sudipto Guha, and Andrew McGregor . Graph sketches: sparsification, spanners, and subgraphs . In Proceedings of the 31st ACM SIGMODSIGACTSIGAI symposium on Principles of Database Systems , pages 5  14 . ACM, 2012 .
Baruch Awerbuch . Complexity of network synchronization . Journal of the ACM (JACM) , 32 ( 4 ): 804  823 , 1985 .
Baruch Awerbuch . Optimal distributed algorithms for minimum weight spanning tree, counting, leader election, and related problems . In Proceedings of the nineteenth annual ACM symposium on Theory of computing , pages 230  240 . ACM, 1987 .
Baruch Awerbuch , Oded Goldreich, Ronen Vainish, and David Peleg . A tradeoff between information and communication in broadcast protocols . Journal of the ACM (JACM) , 37 ( 2 ): 238  256 , 1990 .
A timeoptimal selfstabilizing synchronizer using a phase clock . IEEE Transactions on Dependable and Secure Computing , 4 ( 3 ), 2007 .
Baruch Awerbuch and David Peleg . Network synchronization with polylogarithmic overhead . In Foundations of Computer Science , 1990 . Proceedings., 31st Annual Symposium on , pages 514  522 . IEEE, 1990 .
Michael Elkin . A faster distributed protocol for constructing a minimum spanning tree . In Proceedings of the fifteenth annual ACMSIAM symposium on Discrete algorithms , pages 359  368 . Society for Industrial and Applied Mathematics, 2004 .
Michael Elkin . An unconditional lower bound on the timeapproximation tradeoff for the distributed minimum spanning tree problem . SIAM Journal on Computing , 36 ( 2 ): 433  456 , 2006 .
Michael Elkin . Synchronizers, spanners . In Encyclopedia of Algorithms , pages 1  99 .
Springer , 2008 .
Michael Elkin . Distributed exact shortest paths in sublinear time . In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017 , pages 757  770 , New York, NY, USA, 2017 . ACM. doi: 10 .1145/3055399.3055452.
Michael Elkin . A simple deterministic distributed mst algorithm, with nearoptimal time and message complexities . arXiv preprint arXiv:1703.02411 , 2017 .
In Proceedings of the 29th ACM SIGACTSIGOPS symposium on Principles of distributed computing , pages 183  191 . ACM, 2010 .
Michalis Faloutsos and Mart Molle . Optimal distributed algorithm for minimum spanning trees revisited . In Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing , pages 231  237 . ACM, 1995 .
Robert G. Gallager , Pierre A. Humblet , and Philip M. Spira . A distributed algorithm for minimumweight spanning trees . ACM Transactions on Programming Languages and systems (TOPLAS) , 5 ( 1 ): 66  77 , 1983 .
Juan A Garay , Shay Kutten , and David Peleg . A sublinear time distributed algorithm for minimumweight spanning trees . SIAM Journal on Computing , 27 ( 1 ): 302  316 , 1998 .
Bruce M Kapron , Valerie King , and Ben Mountjoy . Dynamic graph connectivity in polylogarithmic worst case time . In Proceedings of the twentyfourth annual ACMSIAM symposium on Discrete algorithms , pages 1131  1142 . Society for Industrial and Applied Mathematics, 2013 .
Maleq Khan and Gopal Pandurangan . A fast distributed approximation algorithm for minimum spanning trees . In Proceedings of the 20th International Conference on Distributed Computing, DISC'06 , pages 355  369 , Berlin, Heidelberg, 2006 . SpringerVerlag.
doi:10 .1007/11864219_ 25 .
Valerie King , Shay Kutten , and Mikkel Thorup . Construction and impromptu repair of an mst in a distributed network with o (m) communication . In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing , pages 71  80 . ACM, 2015 .
Shay Kutten , Gopal Pandurangan, David Peleg, Peter Robinson , and Amitabh Trehan . On the complexity of leader election . Journal of the ACM (JACM) , 62 ( 1 ): 7 , 2015 .
Shay Kutten and David Peleg . Fast distributed construction of kdominating sets and applications . In Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing , pages 238  251 . ACM, 1995 .
Ali Mashreghi and Valerie King . Timecommunication tradeoffs for minimum spanning tree construction . In Proceedings of the 18th International Conference on Distributed Computing and Networking, page 8. ACM , 2017 .
Ali Mashreghi and Valerie King . Broadcast and minimum spanning tree with o(m) messages in the asynchronous congest model . arXiv, 2018 . arXiv: 1806 .04328.
Gopal Pandurangan , Peter Robinson , and Michele Scquizzato . A timeand messageoptimal distributed algorithm for minimum spanning trees . In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing , pages 743  756 . ACM, 2017 .
David Peleg and Jeffrey D Ullman . An optimal synchronizer for the hypercube . In Proceedings of the sixth annual ACM Symposium on Principles of distributed computing , pages 77  85 . ACM, 1987 .
Atish Das Sarma , Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer . Distributed verification and hardness of distributed approximation . SIAM Journal on Computing , 41 ( 5 ): 1235  1265 , 2012 .
Gurdip Singh and Arthur J Bernstein . A highly asynchronous minimum spanning tree protocol . Distributed Computing , 8 ( 3 ): 151  161 , 1995 .