Log Diameter Rounds Algorithms for 2-Vertex and 2-Edge Connectivity
ICALP, Track A: Algorithms, Complexity and Games
Alexandr Andoni, Clifford Stein, Peilin Zhong
Columbia University, New York City, NY, USA
Many modern parallel systems, such as MapReduce, Hadoop and Spark, can be modeled well by the MPC model. The MPC model captures well coarse-grained computation on large data: data is distributed to processors, each of which has a sublinear (in the input data) amount of memory, and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. This model is stronger than the classical PRAM model, and it is an intriguing question to design algorithms whose running time is smaller than in the PRAM model.
Keywords and phrases: parallel algorithms; biconnectivity; 2-edge connectivity; the MPC model

1 Introduction
The success of modern parallel and distributed systems such as MapReduce [16, 17], Spark [41],
Hadoop [39], Dryad [23], together with the need to solve problems on massive data, is
driving the development of new algorithms which are more efficient and scalable in these
largescale systems. An important theoretical problem is to develop models which are good
abstractions of these computational frameworks. The Massively Parallel Computation (MPC)
model [25, 21, 11, 3, 9, 15, 4] captures the capabilities of these computational systems while
keeping the description of the model itself simple. In the MPC model, there are machines
(processors), each with Θ(N^δ) local memory, where N denotes the size of the input and
δ ∈ (0, 1). The computation proceeds in rounds, where each machine can perform unlimited
local computation in a round and exchange O(N^δ) data at the end of the round. The parallel
time of an algorithm is measured by the total number of computation-communication rounds.
The MPC model is a variant of the Bulk Synchronous Parallel (BSP) model [38]. It is also a
more powerful model than the PRAM, since any PRAM algorithm can be simulated in the
MPC model [25, 21], while some problems can be solved in faster parallel time in the MPC
model. For example, computing the XOR of N bits takes O(1/δ) parallel time in the MPC
model but needs near-logarithmic parallel time on the most powerful CRCW PRAM [10].
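For intuition, the O(1/δ) XOR bound can be reproduced in a toy sequential simulation (our own sketch, not from the paper; `mpc_xor` and the fan-in rule s = ⌈N^δ⌉ are illustrative choices): each round, every machine XORs the at most s values it holds, so the aggregation tree has about ⌈1/δ⌉ levels.

```python
import math
from functools import reduce
from operator import xor

def mpc_xor(bits, delta):
    """Simulate XOR aggregation in the MPC model: in each round, every
    machine XORs the at most s = ceil(N**delta) values it received."""
    n = len(bits)
    s = max(2, math.ceil(n ** delta))  # fan-in = local memory size
    vals, rounds = list(bits), 0
    while len(vals) > 1:
        # one communication round: group values into machines of capacity s
        vals = [reduce(xor, vals[i:i + s]) for i in range(0, len(vals), s)]
        rounds += 1
    return vals[0], rounds
```

For N = 64 and δ = 0.5 this finishes in 2 rounds, matching ⌈1/δ⌉, while any PRAM simulation of the same tree would use near-logarithmic depth.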
A natural question to ask is: which problems can be solved in faster parallel time in
the MPC model than on a PRAM? This question has been studied by a line of recent
papers [25, 19, 29, 3, 1, 6, 22, 15, 7, 14, 13, 32, 20]. Most of these results study graph
problems, which are the usual benchmarks of parallel/distributed models. Many graph
problems, such as graph connectivity [35, 33, 30], graph biconnectivity [37, 36], maximal
matching [26], minimum spanning tree [27] and maximal independent set [31, 2], can be
solved in standard logarithmic time in the PRAM model, but these problems have been
shown to admit better parallel time in the MPC model.
In addition, we hope to develop fully scalable algorithms for these graph problems, i.e.,
algorithms that work for any constant δ > 0. The previous literature shows that a
graph problem in the MPC model with a large local memory size may be much easier than
the same problem in the MPC model with a smaller local memory size. In particular,
when the local memory size per machine is close to the number of vertices n, many graph
problems have efficient algorithms. For example, if the local memory size per machine is
n/log^{O(1)} n, the connectivity problem [7] and the approximate matching problem [5] can
be solved in O(log log n) parallel time. If the local memory size per machine is Θ(n), then
the MPC model meets the congested clique model [12]. In this setting, the connectivity
problem and the minimum spanning tree problem can be solved in O(1) parallel time [24].
If the local memory size per machine is n^{1+Ω(1)}, many graph problems, such as maximal
matching, approximate weighted matchings, approximate vertex and edge covers, minimum
cuts, and the biconnectivity problem, can be solved in O(1) parallel time [29, 8]. The
landscape of graph algorithms in the MPC model with small local memory is more nuanced
and challenging for algorithm designers. If the local memory size per machine is n^{1−Ω(1)},
then the best connectivity algorithm takes parallel time O(log D · log log n), where D is the
diameter of the graph [4], and the best approximate maximum matching algorithm takes
parallel time Õ(√(log n)) [32].
Therefore, the main open question is: which graph problems admit faster
fully scalable MPC algorithms than the standard logarithmic-time PRAM algorithms?
Two fundamental problems in graph theory are 2-edge connectivity and 2-vertex
connectivity (biconnectivity). In this work, we study these two problems in the MPC model.
Consider an n-vertex, m-edge undirected graph G. A bridge of G is an edge whose removal
increases the number of connected components of G. In the 2-edge connectivity problem, the
goal is to find all the bridges of G. For any two different edges e, e′ of G, e and e′ are in the same
biconnected component (block) of G if and only if there is a simple cycle which contains
both e and e′. If we define a relation R such that e R e′ if and only if e = e′ or e, e′ are contained
in a common simple cycle, then R is an equivalence relation [18]. Thus, a biconnected component is
the induced graph of an equivalence class of R. In the biconnectivity problem, the goal is to
output all the biconnected components of G. We propose faster, fully scalable algorithms
for both the 2-edge connectivity problem and the biconnectivity problem by parameterizing
the running time as a function of the diameter and the bidiameter of the graph. The
diameter D of G is the largest diameter of its connected components. The definition of
bidiameter is a natural generalization of the definition of diameter. If vertices u, v are in
the same biconnected component, then the cycle length of (u, v) is defined as the minimum
length of a simple cycle which contains both u and v. The bidiameter D′ of G is the largest
cycle length over all vertex pairs (u, v) where both u and v are in the same biconnected
component. Our main results are 1) a fully scalable O(log D · log log_{m/n} n) parallel time
2-edge connectivity algorithm, and 2) a fully scalable O(log D · log² log_{m/n} n + log D′ · log log_{m/n} n)
parallel time biconnectivity algorithm. Our 2-edge connectivity algorithm achieves the same
parallel time as the connectivity algorithm of [4]. We also show an Ω(log D′) conditional
lower bound for the biconnectivity problem.
1.1 The Model
Our model of computation is the Massively Parallel Computation (MPC) model [25, 21, 11].
Consider two parameters γ ≥ 0 and δ > 0. In the (γ, δ)-MPC model [4], there
are p machines (processors), each with local memory size s, where p · s = Θ(N^{1+γ}), s = Θ(N^δ),
and N denotes the size of the input data. Thus, the space per machine is sublinear in N, and
the total space is only an O(N^γ) factor more than the input size. In particular, if γ = 0, the
total space available in the system is linear in the input size N. Space is measured
in words, each containing Θ(log(s · p)) bits. Before the computation starts, the input data is
distributed on Θ(N/s) input machines. The computation proceeds in rounds. In each round,
each machine can perform local computation on its local data, and send messages to other
machines at the end of the round. In a round, the total size of messages sent/received by a
machine must be bounded by its local memory size s = Θ(N^δ). For example, a machine can
send s messages of size 1 to s machines, or send one message of size s to a single machine, in a single round.
However, it cannot broadcast a size-s message to every machine. In the next round, each
machine only holds the received messages in its local memory. At the end of the computation,
the output data is distributed on the output machines. An algorithm in this model is called
a (γ, δ)-MPC algorithm. The parallel time of an algorithm is the total number of rounds
needed to finish its computation. In this paper, we consider δ to be an arbitrary constant in (0, 1).
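The per-round communication constraint can be made concrete with a toy round scheduler (our own illustrative sketch; `ToyMPC` and its callback interface are not part of the model definition). It delivers messages between machines and asserts that no machine sends or receives more than s words in a round:

```python
class ToyMPC:
    """Toy (gamma, delta)-MPC round scheduler with p machines of memory s.
    Messages are lists of words; inbox[i] holds what machine i received."""

    def __init__(self, p, s):
        self.p, self.s = p, s
        self.inbox = {i: [] for i in range(p)}

    def round(self, step):
        """Run one round. step(i, msgs) -> list of (dest, msg) pairs."""
        outgoing = {i: step(i, msgs) for i, msgs in self.inbox.items()}
        for i, out in outgoing.items():
            # a machine may emit at most s words per round
            assert sum(len(m) for _, m in out) <= self.s, f"machine {i} sends too much"
        self.inbox = {i: [] for i in range(self.p)}
        for i, out in outgoing.items():
            for dest, msg in out:
                self.inbox[dest].append(msg)
        for i, msgs in self.inbox.items():
            # ...and may receive at most s words per round
            assert sum(len(m) for m in msgs) <= self.s, f"machine {i} receives too much"
```

With p = 4 and s = 4, four values can be summed in two rounds (everyone sends to machine 0, then machine 0 adds locally), whereas machine 0 could not receive s·p words in one round.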
1.2 Our Results
Our main results are efficient MPC algorithms for the 2-edge connectivity and biconnectivity
problems. An important subroutine in our algorithms is computing the Depth-First-Search
(DFS) sequence [4], which is a variant of the Euler tour representation proposed by [37, 36] in
1984. We show how to efficiently compute the DFS sequence in the MPC model with linear
total space. Conditioned on the hardness of the connectivity problem in the MPC model, we
prove a hardness result for the biconnectivity problem.
For 2-edge connectivity and biconnectivity, the input is an undirected graph G = (V, E)
with n = |V| vertices and m = |E| edges. N = n + m denotes the size of the representation
of G, D denotes the diameter of G, and D′ denotes the bidiameter of G. We state our
results in the following.
Biconnectivity. In the biconnectivity problem, we want to find all the biconnected
components (blocks) of the input graph G. Since the biconnected components of G define a partition
of E, we just need to color each edge, i.e., at the end of the computation, ∀e ∈ E, there is a
unique tuple (x, c) with x = e stored on an output machine, where c is called the color of e,
such that two edges e₁, e₂ are in the same biconnected component if and only if they have
the same color.
▶ Theorem 1 (Biconnectivity in MPC). For any γ ∈ [0, 2] and any constant δ ∈ (0, 1), there
is a randomized (γ, δ)-MPC algorithm which outputs all the biconnected components of the
graph G in O(log D · log² log_{N^{1+γ}/n} n + log D′ · log log_{N^{1+γ}/n} n) parallel time. The success
probability is at least 0.95. If the algorithm fails, then it returns FAIL.
The worst case is when the input graph is sparse and the total space available is linear in the
input size, i.e., N = n + m = O(n) and γ = 0. In this case, the parallel running time of our
algorithm is O(log D · log² log n + log D′ · log log n). If the graph is slightly denser (m = n^{1+c}
for some constant c > 0), or the total space is slightly larger (γ > 0 is a constant), then we
obtain O(log D + log D′) time.
A cut vertex (articulation point) of the graph G is a vertex whose removal increases the
number of connected components of G. Since a vertex v is a cut vertex if and only if there
are two edges e₁, e₂ which share the endpoint v and are not in the same biconnected
component, our algorithm can also find all the cut vertices of G.
2-Edge connectivity. In the 2-edge connectivity problem, we want to output all the bridges
of the input graph G. Since an edge is a bridge if and only if each of its endpoints is either a
cut vertex or a vertex of degree 1, the 2-edge connectivity problem should be easier than
the biconnectivity problem. We show how to solve 2-edge connectivity in the same parallel
time as the algorithm proposed by [4] for solving connectivity.
▶ Theorem 2 (2-Edge connectivity in MPC). For any γ ∈ [0, 2] and any constant δ ∈ (0, 1),
there is a randomized (γ, δ)-MPC algorithm which outputs all the bridges of the graph G
in O(log D · log log_{N^{1+γ}/n} n) parallel time. The success probability is at least 0.97. If the
algorithm fails, then it returns FAIL.
DFS sequence. A rooted tree with a vertex set V can be represented by n = |V| pairs
(v₁, par(v₁)), (v₂, par(v₂)), …, (v_n, par(v_n)), where par : V → V is a set of parent pointers,
i.e., for a non-root vertex v, par(v) denotes the parent of v, and for the root vertex v,
par(v) = v. We show an algorithm which can compute the DFS sequence (Definition 6) of
the rooted tree in the MPC model with linear total space.
▶ Theorem 3 (DFS sequence of a tree in MPC). Given a rooted tree represented by a set
of parent pointers par : V → V, there is a randomized (0, δ)-MPC algorithm which outputs
the DFS sequence in O(log D) parallel time, where δ ∈ (0, 1) is an arbitrary constant and D is
the depth of the tree. The success probability is at least 0.99. If the algorithm fails, then it
returns FAIL.
Conditional hardness for biconnectivity. A conjectured hardness for the connectivity
problem is the one cycle vs. two cycles conjecture: for any γ ≥ 0 and any constant δ ∈ (0, 1), any
(γ, δ)-MPC algorithm requires Ω(log n) parallel time to determine whether the input n-vertex
graph is a single cycle or contains two disjoint cycles of length n/2. This conjectured hardness
is widely used in the MPC literature [25, 11, 28, 34, 40]. Under this conjecture, we
show that Ω(log D′) parallel time is necessary for the biconnectivity problem, and this is
true even when D = O(1), i.e., the diameter of the graph is a constant.
▶ Theorem 4 (Hardness of biconnectivity in MPC). For any γ ≥ 0 and any constant δ ∈ (0, 1),
unless there is a (γ, δ)-MPC algorithm which can distinguish the following two instances in
o(log n) parallel time: 1) a single cycle with n vertices, 2) two disjoint cycles each containing
n/2 vertices, any (γ, δ)-MPC algorithm requires Ω(log D′) parallel time for testing whether
a graph G with constant diameter is biconnected.
1.3 Our Techniques
Biconnectivity. At a high level, our biconnectivity algorithm is based on a framework
proposed by [36]. The main idea is to construct a new graph G′ and reduce the problem of
finding the biconnected components of G to the problem of finding the connected components of
G′. At first glance, this should be efficiently solvable by the connectivity algorithm of [4].
However, there are two main issues: 1) since the parallel time of the MPC connectivity
algorithm of [4] depends on the diameter of the input graph, we need to make the diameter
of G′ small, and 2) we need to construct G′ efficiently. Let us first consider the first issue;
we will discuss the second issue later.
We give an analysis of the diameter of the graph G′ = (V′, E′) constructed by [36]. Without loss of
generality, we can suppose the input G = (V, E) is connected. Each vertex in G′ corresponds
to an edge of G. Let T be an arbitrary spanning tree of G with depth d. Each non-tree
edge e defines a simple cycle C_e which consists of the edge e and the unique path between
the endpoints of e in the tree T. Thus, the length of C_e is at most 2d + 1. If such a
cycle contains two tree edges (u, v), (v, w), then the vertices (u, v) and (v, w) are connected in
G′. For each non-tree edge e, we connect the vertex e to the vertex e′ in graph G′, where
e′ is an arbitrary tree edge on the cycle C_e. By the construction of G′, any e, e′ from the
same connected component of G′ must be in the same biconnected component of G. Now
consider two arbitrary edges e, e′ in the same biconnected component of G. There must be
a simple cycle C which contains both edges e, e′ in G. Since the simple cycles defined
by the non-tree edges form a cycle basis of G [18], the edge set of C can be represented as
the XOR sum of the edge sets of k basis cycles C₁, C₂, …, C_k, where C_i is the simple cycle
defined by a non-tree edge e_i on the cycle C; k is upper bounded by the bidiameter of G.
Furthermore, we can assume C_i intersects C_{i+1}. There is then a path between e and e′ in G′,
and the length of the path is at most ∑_{i=1}^k |C_i| ≤ O(k · d). So the diameter of G′ is upper
bounded by O(k · d). Thus, according to [4], we can find the connected components of G′ in
roughly (log k + log d) parallel time, where d and k are upper bounded by the diameter and the
bidiameter of G respectively.
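The construction can be prototyped offline. The sketch below is our own sequential illustration (the BFS spanning tree rooted at vertex 0 and the union-find over edges are implementation choices not fixed by the paper): since every edge on a basis cycle C_e lies in one block, it simply unions each tree edge on the cycle with the non-tree edge e, and the resulting components are the blocks.

```python
from collections import defaultdict

def blocks(n, edges):
    """Biconnected components via the auxiliary-graph idea: components of
    the edge graph G' (edges of G unioned along basis cycles) are blocks."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    par, dep, order = {0: 0}, {0: 0}, [0]  # BFS spanning tree rooted at 0
    for u in order:
        for w in adj[u]:
            if w not in par:
                par[w], dep[w] = u, dep[u] + 1; order.append(w)
    key = lambda a, b: tuple(sorted((a, b)))
    parent = {}                            # union-find over the edges of G
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]; x = parent[x]
        return x
    tree = {key(v, par[v]) for v in par if v != 0}
    for u, v in edges:
        e = key(u, v)
        if e in tree:
            continue
        # the tree path u..v plus e is the basis cycle C_e: union its edges
        x, y = u, v
        while dep[x] > dep[y]:
            parent[find(key(x, par[x]))] = find(e); x = par[x]
        while dep[y] > dep[x]:
            parent[find(key(y, par[y]))] = find(e); y = par[y]
        while x != y:
            parent[find(key(x, par[x]))] = find(e)
            parent[find(key(y, par[y]))] = find(e)
            x, y = par[x], par[y]
    return {key(u, v): find(key(u, v)) for u, v in edges}
```

On a 4-cycle with a pendant edge, the four cycle edges get one representative and the pendant edge (a bridge) stays in its own block.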
Now let us consider how to construct G′ efficiently. The bottleneck is determining
whether the tree edges (u, v), (v, w) should be connected in G′ or not. Suppose w is the
parent of v and v is the parent of u. The vertex (u, v) should connect to the vertex (v, w) in
G′ if and only if there is a non-tree edge that connects a vertex x in the subtree of u to
a vertex y outside the subtree of v. For each vertex x, let lev(x) be the
minimum depth of the least common ancestor (LCA) of (x, y) over all the non-tree edges
(x, y). Then (u, v) should be connected to (v, w) in G′ if and only if there is a vertex x in
the subtree of u in G such that lev(x) is smaller than the depth of v. Since the vertices of a
subtree appear consecutively in the DFS sequence, this question can be answered by
range queries over the DFS sequence. Next, we discuss how to compute the DFS
sequence of a tree.
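The lev labels themselves are easy to compute offline. The sketch below (our own naive version; the walk-up LCA stands in for the paper's parallel LCA subroutine) mirrors the definition in the text:

```python
def lev_values(adj, par, dep):
    """lev(v) = min(dep(v), min depth of LCA(v, w) over all neighbours w
    of v other than par(v)), as used by the range-query construction."""
    def lca(x, y):
        # naive LCA: lift the deeper endpoint, then both, until they meet
        while dep[x] > dep[y]: x = par[x]
        while dep[y] > dep[x]: y = par[y]
        while x != y: x, y = par[x], par[y]
        return x
    return {v: min([dep[v]] + [dep[lca(v, w)] for w in adj[v] if w != par[v]])
            for v in par}
```

A vertex whose lev equals its own depth has no non-tree edge escaping above it, which is exactly what the subtree range queries test.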
DFS sequence. The DFS sequence of a tree is a variant of the Euler tour representation of
the tree. For an n-vertex tree T, [36] gives an O(log n) parallel time PRAM algorithm for the
Euler tour representation of T. However, since their construction method destroys the
tree structure, it is hard to obtain a faster MPC algorithm based on this framework. Instead, we
follow the leaf sampling framework proposed by [4]. Although the DFS sequence algorithm
proposed by [4] takes O(log d) time, where d is the depth of T, it needs Θ(n log d) total
space. The bottleneck is the subroutine which solves the least common ancestor
problem and generates multiple path sequences. The previous algorithm uses the doubling
algorithm for this subroutine, i.e., for each vertex v, it stores the 2^i-th ancestor of v for
every i ∈ [⌈log d⌉]. This is the reason why [4] cannot achieve linear total space. We show
how to compress the tree T into a new tree T′ which contains at most n/⌈log d⌉ vertices.
We argue that applying the doubling algorithm on T′ is sufficient for us to find the DFS
sequence of T.
2-Edge connectivity. Without loss of generality, we can assume the input graph G is
connected. Consider a rooted spanning tree T and an edge e = (u, v) in G. Suppose the
depth of u is at least the depth of v in T, i.e., v cannot be a child of u. The edge e is not a
bridge if and only if either e is a non-tree edge or there is a non-tree edge (x, y) connecting
the subtree of u to a vertex outside the subtree of u. As before, the second case
can be handled by range queries over the DFS sequence of T.
Conditional hardness for biconnectivity. We want to reduce the connectivity problem to
the biconnectivity problem. For an undirected graph G, if we add an additional vertex
v* and connect v* to every vertex of G, then the diameter of the resulting graph G′ is
at most 2, and each biconnected component of G′ corresponds to a connected component
of G. Furthermore, the bidiameter of G′ is upper bounded by the diameter of G plus 2.
Therefore, if the parallel time of an algorithm A′ for finding the biconnected components
of G′ depends on the bidiameter of G′, there exists an algorithm A which can find all the
connected components of G in parallel time with the same dependence on the
diameter of G.
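A minimal sketch of this reduction (our own code; `apex_reduction` and the BFS helper are illustrative names not from the paper): attach the universal vertex and check that every vertex of the result is within distance 2 of every other, even when the input is disconnected.

```python
from collections import deque

def apex_reduction(n, edges):
    """Attach a universal vertex v* = n to all of 0..n-1. Biconnected
    components of the result correspond to connected components of the input."""
    return n + 1, edges + [(v, n) for v in range(n)]

def bfs_dist(n, edges, src):
    """Plain BFS distances from src; unreachable vertices are omitted."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1; q.append(w)
    return dist
```

Starting from two disjoint edges, the apex makes the graph connected with diameter 2, while each original component becomes its own block hanging off v*.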
1.4 A Roadmap
Section 2 introduces the notation and some useful definitions. Section 3 describes the offline
algorithms for 2-edge connectivity and biconnectivity, together with some crucial properties
of these algorithms. In Section 4, we show a linear-space offline algorithm to find the DFS
sequence of a tree. All of these offline algorithms can be implemented in the MPC model
efficiently. Section 5 contains the conditional hardness result for the biconnectivity problem
in the MPC model. For the MPC implementations and all the missing technical proofs, we
refer readers to the full version of the paper.
2 Preliminaries
2.1 Notation
We follow the notation of [4]. [n] denotes the set of integers {1, 2, …, n}.
Diameter and bidiameter. Consider an undirected graph G with a vertex set V and an
edge set E. For any two vertices u, v, we use dist_G(u, v) to denote the distance between u and
v in graph G. If u, v are not in the same (connected) component of G, then dist_G(u, v) = ∞.
The diameter diam(G) of G is the largest diameter of its connected components, i.e.,
diam(G) = max_{u,v∈V : dist_G(u,v)≠∞} dist_G(u, v). (v₁, v₂, …, v_k) ∈ V^k is a cycle of length k − 1
if v₁ = v_k and ∀i ∈ [k − 1], (v_i, v_{i+1}) ∈ E. We say a cycle (v₁, v₂, …, v_k) is simple if k ≥ 4
and each vertex appears only once in the cycle, except v₁ (= v_k). Consider two different vertices
u, v ∈ V. We use cyclen_G(u, v) to denote the minimum length of a simple cycle which
contains both vertices u and v. If there is no simple cycle which contains both u and v,
cyclen_G(u, v) = ∞. cyclen_G(u, u) is defined as 0. The bidiameter of G, bidiam(G), is
defined as max_{u,v∈V : cyclen_G(u,v)≠∞} cyclen_G(u, v).
Representation of a rooted forest. Let V denote a set of vertices. We represent a rooted
forest in the same manner as [4]. Consider a mapping par : V → V. For i ∈ ℕ_{>0} and v ∈ V,
we define par^(i)(v) as par(par^(i−1)(v)), and par^(0)(v) is defined as v itself. If ∀v ∈ V, ∃i > 0
such that par^(i)(v) = par^(i+1)(v), then we call par a set of parent pointers on V. For v ∈ V,
if par(v) = v, then we say v is a root of par. Notice that par can represent a rooted
forest, so par can have more than one root. The depth of v ∈ V, dep_par(v), is the smallest
i ∈ ℕ such that par^(i)(v) is the same as par^(i+1)(v). The root of v ∈ V, par^(∞)(v), is defined
as par^(dep_par(v))(v). The depth of par, dep(par), is defined as max_{v∈V} dep_par(v).
Ancestor and path. For two vertices u, v ∈ V, if ∃i ∈ ℕ such that u = par^(i)(v), then u is
an ancestor of v (in par). If u is an ancestor of v, then the path P(v, u) (in par) from v to u
is the sequence (v, par(v), par^(2)(v), …, u), and the path P(u, v) is the reverse of P(v, u), i.e.,
P(u, v) = (u, …, par^(2)(v), par(v), v). If an ancestor u of v is also an ancestor of w, then
u is a common ancestor of (v, w). Furthermore, if a common ancestor u of (v, w) satisfies
dep_par(u) ≥ dep_par(x) for every common ancestor x of (v, w), then u is the lowest common
ancestor (LCA) of (v, w).
Children and leaves. For any non-root vertex u of par, u is a child of par(u). For any
vertex v ∈ V, child_par(v) denotes the set of all the children of v, i.e., child_par(v) = {u ∈
V | u ≠ v, par(u) = v}. If u is the k-th smallest vertex in the set child_par(v), then we define
rank_par(u) = k, or in other words, u is the k-th child of v. If v is a root vertex of par, then
rank_par(v) is defined as 1. child_par(v, k) denotes the k-th child of v. For simplicity, when par
is clear from context, we write child(v), rank(v) and child(v, k) for child_par(v),
rank_par(v) and child_par(v, k). If child(v) = ∅, then v is a leaf of par. We denote by
leaves(par) the set of all the leaves of par, i.e., leaves(par) = {v | child(v) = ∅}.
2.2 Depth-First-Search Sequence
The Euler tour representation of a tree was proposed by [37, 36]. It is a crucial building block
in many graph algorithms, including biconnectivity algorithms. The Depth-First-Search
(DFS) sequence [4] of a rooted tree is a variant of the Euler tour representation. Let us first
introduce some relevant concepts.
▶ Definition 5 (Subtree [4]). Consider a set of parent pointers par : V → V on a vertex set
V. Let v be a vertex in V, and let V′ = {u ∈ V | v is an ancestor of u}. Let par′ : V′ → V′ be a
set of parent pointers on V′. If ∀u ∈ V′ \ {v}, par′(u) = par(u), and par′(v) = v, then par′
is the subtree of v in par. For u ∈ V′, we say u is in the subtree of v.
The definition of the DFS sequence is the following:
▶ Definition 6 (DFS sequence [4]). Consider a set of parent pointers par : V → V on a
vertex set V. Let v be a vertex in V. If v is a leaf in par, then the DFS sequence of the
subtree of v is (v). Otherwise, the DFS sequence of the subtree of v is defined recursively as
(v, a_{1,1}, a_{1,2}, …, a_{1,n₁}, v, a_{2,1}, a_{2,2}, …, a_{2,n₂}, v, …, a_{k,1}, a_{k,2}, …, a_{k,n_k}, v),
where k = |child(v)| and ∀i ∈ [k], (a_{i,1}, a_{i,2}, …, a_{i,n_i}) is the DFS sequence of the subtree of
child(v, i), i.e., the i-th child of v.
If par : V → V has a unique root v, then we define the DFS sequence of par as the DFS
sequence of the subtree of v. By the definition of the DFS sequence, for any two consecutive
elements a_i and a_{i+1} of the sequence, a_i is either the parent of a_{i+1} or a child of a_{i+1}.
Furthermore, for any vertex v, if the elements a_i and a_j (i < j) of the DFS sequence A both equal
v, then every element a_k between them (i.e., i ≤ k ≤ j) is a vertex in the subtree of v.
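Definition 6 translates directly into a short recursive routine (our own offline sketch; visiting children in sorted order fixes the child(v, i) ordering):

```python
from collections import defaultdict

def dfs_sequence(par):
    """DFS sequence of Definition 6: a vertex with k children occurs
    k + 1 times in the sequence; a leaf occurs exactly once."""
    children, root = defaultdict(list), None
    for v, p in par.items():
        if p == v:
            root = v          # the root points to itself
        else:
            children[p].append(v)
    def seq(v):
        if not children[v]:
            return [v]
        out = [v]
        for c in sorted(children[v]):   # child(v, i) = i-th smallest child
            out += seq(c) + [v]
        return out
    return seq(root)
```

For the tree rooted at 1 with children {2, 3} and 4 a child of 2, the sequence returns to each internal vertex between child subtrees, as the definition requires.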
3 2-Edge Connectivity and Biconnectivity
Consider a connected undirected graph G with a vertex set V and an edge set E. In the
2-edge connectivity problem, the goal is to find all the bridges of G, where an edge e ∈ E is
called a bridge if its removal disconnects G. In the biconnectivity problem, the goal is to
partition the edges into groups E₁, E₂, …, E_k, i.e., E = ⋃_{i=1}^k E_i and ∀i ≠ j, E_i ∩ E_j = ∅,
such that ∀e ≠ e′ ∈ E, e and e′ are in the same group if and only if there is a simple cycle
in G which contains both e and e′. A subgraph induced by an edge group E_i is called a
biconnected component (block). In other words, the goal of the biconnectivity problem is to
find all the blocks of G.
In this section, we describe the algorithms for both the 2-edge connectivity problem and
the biconnectivity problem in the offline setting.
3.1 2-Edge Connectivity
The 2-edge connectivity problem is much simpler than the biconnectivity problem. We first
compute a spanning tree of the graph; only a tree edge can be a bridge. Then, for any
non-root vertex v, if there is no non-tree edge crossing between the subtree of v and the
outside of the subtree of v, the tree edge which connects v to its parent is a bridge.
▶ Lemma 7 (2-Edge connectivity). Consider an undirected graph G = (V, E). Let B be the
output of Bridges(G). Then B is the set of all the bridges of G.
3.2 Biconnectivity
In this section, we show a biconnectivity algorithm. It is a modification of the algorithm
proposed by [36]. The high-level idea is to construct a new graph G′ based on the input
graph G, and reduce the biconnectivity problem on G to the connectivity problem on G′.
Since the running time of the connectivity algorithm [4] depends on the diameter of the
graph, we also give an analysis of the diameter of the graph G′.
Algorithm 1 2-Edge Connectivity Algorithm.
Input:
A connected undirected graph G = (V, E).
Output:
A subset of edges B ⊆ E.
Finding bridges (Bridges(G = (V, E))):
1. Compute a rooted spanning tree of G. The spanning tree is represented by a set of
parent pointers par : V → V.
2. Compute lev : V → ℤ_{≥0}: for each v ∈ V,
lev(v) ← min( dep_par(v), min_{w ∈ V\{par(v)} : (v,w) ∈ E} dep_par(the LCA of (v, w)) ).
3. Compute the DFS sequence A of par.
4. Initialize B ← ∅. For each non-root vertex v, let a_i, a_j be the first and the last
appearance of v in A respectively. If min_{k : i ≤ k ≤ j} lev(a_k) ≥ dep_par(v), then B ← B ∪
{(v, par(v))}. Output B.
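For intuition, Algorithm 1 can be run offline with naive subroutines. The sketch below is our own sequential Python rendering (the BFS spanning tree from vertex 0, the walk-up LCA, and the linear range minimum are placeholders for the MPC subroutines); it follows steps 1-4 literally and assumes a connected graph containing vertex 0.

```python
from collections import defaultdict

def find_bridges(n, edges):
    """Offline rendering of Bridges(G); assumes G is connected with vertex 0."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    # Step 1: rooted spanning tree via BFS from vertex 0.
    par, dep, order = {0: 0}, {0: 0}, [0]
    for u in order:
        for w in adj[u]:
            if w not in par:
                par[w], dep[w] = u, dep[u] + 1; order.append(w)
    # Step 2: lev(v), using a naive walk-up LCA.
    def lca(x, y):
        while dep[x] > dep[y]: x = par[x]
        while dep[y] > dep[x]: y = par[y]
        while x != y: x, y = par[x], par[y]
        return x
    lev = {v: min([dep[v]] + [dep[lca(v, w)] for w in adj[v] if w != par[v]])
           for v in par}
    # Step 3: DFS sequence of the spanning tree.
    children = defaultdict(list)
    for v in par:
        if v != 0:
            children[par[v]].append(v)
    def seq(v):
        if not children[v]:
            return [v]
        out = [v]
        for c in sorted(children[v]):
            out += seq(c) + [v]
        return out
    A = seq(0)
    first, last = {}, {}
    for k, v in enumerate(A):
        first.setdefault(v, k); last[v] = k
    # Step 4: (v, par(v)) is a bridge iff no vertex in v's subtree has low lev.
    return {tuple(sorted((v, par[v]))) for v in par
            if v != 0
            and min(lev[A[k]] for k in range(first[v], last[v] + 1)) >= dep[v]}
```

On a 4-cycle with a pendant edge, only the pendant edge is reported; on a path, every edge is a bridge.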
Algorithm 2 Biconnectivity Algorithm.
Input:
A connected undirected graph G = (V, E).
Output:
A coloring col : E → V of the edges.
Finding blocks (Biconn(G = (V, E))):
1. Compute a rooted spanning tree of G. The spanning tree is represented by a set of
parent pointers par : V → V.
2. Compute lev : V → ℤ_{≥0}: for each v ∈ V,
lev(v) ← min( dep_par(v), min_{w ∈ V\{par(v)} : (v,w) ∈ E} dep_par(the LCA of (v, w)) ).
3. Compute the DFS sequence A of par.
4. Let r be the root of par. Initialize V′ ← V \ {r}, E′ ← ∅.
5. For each v ∈ V′, let a_i, a_j be the first and the last appearance of v in A respectively.
If min_{k ∈ {i, i+1, …, j}} lev(a_k) < dep_par(par(v)), then E′ ← E′ ∪ {(v, par(v))}.
6. For each (u, v) ∈ E, if neither u nor v is the LCA of (u, v) in par, then E′ ← E′ ∪ {(u, v)}.
7. Compute the connected components of G′ = (V′, E′). Let col′ : V′ → V′ be the
coloring of the vertices in V′ such that ∀u′, v′ ∈ V′: u′, v′ are in the same connected
component of G′ ⇔ col′(u′) = col′(v′).
8. Initialize col : E → V. For each e = (u, v) ∈ E, if dep_par(u) ≥ dep_par(v), set
col(e) ← col′(u); otherwise, set col(e) ← col′(v). Output col : E → V.
▶ Lemma 8 (Biconnectivity). Consider an undirected graph G = (V, E). Let col : E → V be
the output of Biconn(G). Then ∀e, e′ ∈ E with e ≠ e′, col satisfies col(e) = col(e′) ⇔ there is a
simple cycle in G which contains both e and e′. Furthermore, the diameter of the graph G′
constructed by Biconn(G) is at most O(dep(par) · bidiam(G)), the number of vertices of G′
is at most |V|, and the number of edges of G′ is at most |E|.
Algorithm 3 Leaf Sampling Algorithm for DFS Sequence.
Predetermined:
A threshold value s. //s will be the local memory size in the MPC model.
Input:
A rooted tree represented by a set of parent pointers par : V → V on a set V of n
vertices (i.e., par has a unique root r).
Output:
The DFS sequence of the rooted tree represented by par.
Leaf sampling algorithm (LeafSampling(s, par : V → V)):
1. If n ≤ s, return the DFS sequence of par directly.
2. Set t ← Θ(s^{1/3} log n), L ← leaves(par).
3. Each v ∈ L is independently chosen with probability p = min(1, t/|L|); let
S = {l₁, l₂, …, l_k} be the set of samples. If |S|² > s, output FAIL.
4. For every pair of sampled leaves x, y ∈ S with x ≠ y, find the least common ancestor
p_{x,y} of (x, y), and let p_{xy,x}, p_{xy,y} be the two children of p_{x,y} such that p_{xy,x} is an
ancestor of x and p_{xy,y} is an ancestor of y.
5. Sort l₁, l₂, …, l_k ∈ S such that ∀i < j ∈ [k], rank(p_{l_i l_j, l_i}) < rank(p_{l_i l_j, l_j}).
6. Find the paths A′₁ = P(r, l₁), A′₂ = P(par(l₁), p_{l₁,l₂}), A′₃ = P(p_{l₁l₂,l₂}, l₂), …, A′_{2k−2} =
P(par(l_{k−1}), p_{l_{k−1},l_k}), A′_{2k−1} = P(p_{l_{k−1}l_k,l_k}, l_k), A′_{2k} = P(par(l_k), r), i.e., the paths: r →
l₁ → the LCA of (l₁, l₂) → l₂ → … → l_{k−1} → the LCA of (l_{k−1}, l_k) → l_k → r.
7. Set A′ ← A′₁A′₂ ⋯ A′_{2k}, i.e., A′ is the concatenation of A′₁, A′₂, …, A′_{2k}.
8. For each element a′_i in the i-th (i > 1) position of the sequence A′:
if the vertex a′_i is a leaf, keep a′_i as a single copy;
otherwise,
- if a′_{i−1} = par(a′_i), i.e., i is the first position where the vertex a′_i appears in A′, split
a′_i into rank(a′_{i+1}) copies; //a′_{i+1} is a child of a′_i.
- if a′_{i−1}, a′_{i+1} ∈ child(a′_i), split a′_i into rank(a′_{i+1}) − rank(a′_{i−1}) copies;
- if a′_{i+1} = par(a′_i), i.e., i is the last position where the vertex a′_i appears in A′, split
a′_i into |child(a′_i)| − rank(a′_{i−1}) copies. //a′_{i−1} is a child of a′_i.
Let A′′ be the resulting sequence.
9. For each v ∈ V, if par(v) appears in A′′ but v does not, recursively find
the DFS sequence of the subtree of v, and insert that sequence into the position
after the rank(v)-th appearance of par(v) in A′′. Output the final result sequence A.
4 An Offline DFS Sequence Algorithm in Linear Space
In Section 4.1, we review an algorithmic framework proposed by [4] for the DFS sequence.
In Sections 4.2, 4.3, and 4.4, we discuss the subroutines needed for our DFS sequence algorithm
in the offline setting.
4.1 DFS Sequence via Leaf Sampling
In the following, we review the leaf sampling algorithmic framework proposed by [4] for
finding the DFS sequence of a rooted tree.
▶ Theorem 9 (Leaf sampling algorithm [4]). Consider a set of parent pointers par : V → V
on a set V of n vertices. Suppose par has a unique root. For any γ ≥ 0 and any constant
δ ∈ (0, 1), if both step 4 and step 6 of LeafSampling(n^δ, par) can be implemented in
the (γ, δ)-MPC model in O(log(dep(par))) parallel time, then the leaf sampling algorithm
with parameter s = n^δ on input par : V → V can be implemented in the (γ, δ)-MPC model.
Furthermore, with probability at least 0.99, LeafSampling(n^δ, par) outputs the DFS
sequence of par in O(log(dep(par))) parallel time. If the algorithm fails, then it returns FAIL.
By Theorem 9, to design an efficient DFS sequence algorithm in the (0, δ)-MPC model, we
only need to give a linear total space MPC algorithm for the LCA problem and the path
generation problem.
In [4], doubling algorithms are used to compute the LCAs and generate the
paths. Since they need to store the 2^i-th ancestor of every vertex for each i, the total space
needed is Θ(n · log(dep(par))). We show that it suffices to apply the doubling
algorithm to a compressed tree, instead of applying it to the original tree.
Algorithm 4 Construction of a Compressed Rooted Tree.
Input:
A rooted tree represented by a set of parent pointers par : V → V on a set V of n
vertices (par has a unique root r).
Output:
A vertex set V' ⊆ V and a set of parent pointers par' : V' → V' on V'.
Tree compression (Compress(par : V → V)):
1. Compute the depth of par and the depth of each vertex, and set d ← dep(par), t ← ⌈log d⌉.
2. V' ← {v ∈ V | dep_par(v) mod t = 0, dep_par(v) + t ≤ d}.
3. Initialize par' : V' → V'. For each v ∈ V', par'(v) ← par^(t)(v).
4. Output V', par'.
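As a sanity check, Algorithm 4 can be simulated sequentially. The helper names `depths` and `compress` below are ours, not from the paper, and the MPC version computes depths in O(log(dep(par))) rounds rather than by this sequential walk; this is only a minimal sketch of steps 1-3.

```python
from math import ceil, log2

def depths(par, root):
    # dep_par(v): number of par-steps from v up to the root.
    dep = {root: 0}
    for v in par:
        path = []
        while v not in dep:
            path.append(v)
            v = par[v]
        base = dep[v]
        for u in reversed(path):
            base += 1
            dep[u] = base
    return dep

def compress(par, root):
    # Step 2: keep every vertex whose depth is a multiple of t = ceil(log d),
    # excluding the bottom t levels.  Step 3: link each kept vertex to its
    # t-th ancestor, so each par'-step is a jump of t par-steps.
    dep = depths(par, root)
    d = max(dep.values())             # d = dep(par)
    t = max(1, ceil(log2(d)))         # t = ceil(log d); guard tiny trees
    Vp = {v for v in par if dep[v] % t == 0 and dep[v] + t <= d}
    parp = {}
    for v in Vp:
        u = v
        for _ in range(t):            # par'(v) = par^(t)(v)
            u = par[u]
        parp[v] = u
    return Vp, parp
```

On a path of depth 8 this keeps only the vertices at depths 0 and 3, matching property 1 of Lemma 10 below (roughly a 1/log(dep(par)) fraction survives).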
4.2
Compressed Rooted Tree
Given a set of parent pointers par : V ? V , we will show how to compress the rooted tree
represented by par.
▶ Lemma 10 (Properties of a compressed rooted tree). Let par : V → V be a set of parent
pointers on a vertex set V with |V| > 1, such that par has a unique root. Let t = ⌈log(dep(par))⌉
and let (V', par') = Compress(par). Then the following properties hold:
1. |V'| ≤ |V| / log(dep(par)).
2. ∀v ∈ V', i ∈ N, par'^(i)(v) = par^(i·t)(v) ∈ V'.
3. ∀v ∈ V, ∃i ∈ {0, 1, ..., 2t} such that par^(i)(v) ∈ V'.
4.3
Least Common Ancestor
Given a rooted tree represented by a set of parent pointers par : V → V on a vertex set
V, and a set of q queries Q = {(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)} where ∀i ∈ [q], u_i ≠ v_i and u_i, v_i ∈
leaves(par), we show a space efficient algorithm which can output the LCA of each queried
Algorithm 5 Lowest Common Ancestor.
Input:
A rooted tree represented by a set of parent pointers par : V → V on a set V of n vertices
(par has a unique root r), and a set of q queries Q = {(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)}
where ∀i ∈ [q], u_i ≠ v_i and u_i, v_i ∈ leaves(par).
Output:
lca : Q → V × V × V.
Finding LCA (LCA(par : V → V, Q)):
1. (V', par') ← Compress(par). // See Lemma 10.
2. Set d ← dep(par), t ← ⌈log d⌉ and compute mappings g_0, g_1, ..., g_t : V' → V' such that
∀v ∈ V', j ∈ {0, 1, ..., t}, g_j(v) = par'^(2^j)(v).
3. For each query (u_i, v_i) ∈ Q: // Suppose dep_par(u_i) ≥ dep_par(v_i).
a. If dep_par(u_i) > dep_par(v_i) + 2t, find an ancestor û_i of u_i in par such that dep_par(û_i) ≤
dep_par(v_i) + 2t and dep_par(û_i) ≥ dep_par(v_i). Otherwise, û_i ← u_i.
b. If ∃j ∈ [4t] such that par^(j)(û_i) is the LCA of (u_i, v_i) in par, set lca(u_i, v_i) = (par^(j)(û_i), x, y)
where x and y are children of par^(j)(û_i) and are ancestors of u_i and v_i respectively. The
query (u_i, v_i) is then finished.
c. Find an ancestor u'_i of û_i in par such that u'_i is the closest vertex to û_i in V', i.e.,
dep_par(û_i) - dep_par(u'_i) is minimized. Similarly, find an ancestor v'_i of v_i in par such
that v'_i is the closest vertex to v_i in V', i.e., dep_par(v_i) - dep_par(v'_i) is minimized.
d. Find u''_i ≠ v''_i ∈ V' such that they are ancestors of u'_i and v'_i respectively, and
par'(u''_i) = par'(v''_i) is the LCA of (u'_i, v'_i) in par'.
e. Find the smallest j ∈ [2t] such that par^(j)(u''_i) = par^(j)(v''_i). Set lca(u_i, v_i) =
(par^(j)(u''_i), par^(j-1)(u''_i), par^(j-1)(v''_i)).
pair of vertices. Notice that the assumption that queries only contain leaves is without loss
of generality: we can attach an additional child vertex v to each non-leaf vertex u, so that v
is a leaf. When a query contains u, we can use v in place of u in the query, and the
result will not change.
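This reduction to leaf-only queries is a one-line transformation. A minimal sketch (the `attach_leaves` name and the `('shadow', u)` vertex ids are our own conventions for illustration):

```python
def attach_leaves(par, root):
    # Give every non-leaf vertex u a fresh child leaf ('shadow', u).
    # The new leaf has exactly the ancestors of u (plus itself), so for
    # any w != u, the LCA of (('shadow', u), w) equals the LCA of (u, w).
    non_leaves = {par[v] for v in par if v != root} | {root}
    new_par = dict(par)
    for u in non_leaves:
        new_par[('shadow', u)] = u
    return new_par
```

A query (u, v) with internal vertices is then rewritten to use the shadow leaves; the original parent pointers are untouched.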
Before we analyze the algorithm LCA(par, Q), let us discuss some details of the algorithm.
1. We precompute dep_par(v) and dep_par'(u) for every v ∈ V and u ∈ V'.
2. To implement step 3a, we first check whether dep_par(u_i) > dep_par(v_i) + 2t. If it is
not true, we can set û_i to be u_i directly. Otherwise, according to Lemma 10, there
is a j ∈ {0, 1, ..., 2t} such that par^(j)(u_i) ∈ V'. Since dep_par(u_i) > dep_par(v_i) + 2t,
we have dep_par(par^(j)(u_i)) > dep_par(v_i). We initialize û_i to be par^(j)(u_i) ∈ V'. For k = t → 0, if
dep_par(g_k(û_i)) > dep_par(v_i) (i.e., dep_par(par'^(2^k)(û_i)) > dep_par(v_i)), we set û_i ← g_k(û_i) =
par'^(2^k)(û_i). By Lemma 10 again, the final û_i must satisfy dep_par(û_i) ≥ dep_par(v_i)
and dep_par(û_i) ≤ dep_par(v_i) + 2t. This step takes time O(t).
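The doubling descent of step 3a (reused in step 3b of Algorithm 6 below) can be sketched sequentially. The names `lift_tables` and `descend` are ours, and the par-depths of the compressed vertices are assumed precomputed:

```python
def lift_tables(parp, t):
    # g[j][v] = par'^(2^j)(v): standard doubling tables, but built only on
    # the compressed tree, so total space is O(|V'| * t) = O(|V|).
    g = [dict(parp)]
    for j in range(1, t + 1):
        g.append({v: g[j - 1][g[j - 1][v]] for v in parp})
    return g

def descend(v, target_depth, g, dep):
    # For k = t -> 0: take the 2^k-jump whenever the resulting vertex is
    # still strictly deeper than target_depth.  The result is the
    # shallowest ancestor of v in V' with dep > target_depth.
    for k in range(len(g) - 1, -1, -1):
        if dep[g[k][v]] > target_depth:
            v = g[k][v]
    return v
```

Since each par'-step covers t par-steps, one more correction of at most 2t ordinary steps lands û_i in the window [dep_par(v_i), dep_par(v_i) + 2t] required by step 3a.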
▶ Lemma 11 (LCA algorithm). Let par : V → V be a set of parent pointers on a vertex set
V such that par has a unique root. Let Q = {(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)} be a set of q pairs of
vertices where ∀i ∈ [q], u_i ≠ v_i and u_i, v_i ∈ leaves(par). Let lca : Q → V × V × V be the output
of LCA(par, Q). For (u_i, v_i) ∈ Q, (p_i, p_{i,u_i}, p_{i,v_i}) = lca(u_i, v_i) satisfies that p_i is the LCA
of (u_i, v_i), p_{i,u_i}, p_{i,v_i} are ancestors of u_i, v_i respectively, and p_{i,u_i}, p_{i,v_i} are children of p_i.
Furthermore, the space used by the algorithm is at most O(|Q| + |V|).
4.4
Multi-Paths Generation
Consider a rooted tree represented by a set of parent pointers par : V → V on a vertex set
V and a set of q vertex-ancestor pairs Q = {(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)} where ∀i ∈ [q], v_i
is an ancestor of u_i. We show a space efficient algorithm MultiPaths(par, Q) which can
generate all the paths P(u_1, v_1), P(u_2, v_2), ..., P(u_q, v_q).
Algorithm 6 Multi-Paths Generation.
Input:
A rooted tree represented by a set of parent pointers par : V → V on a set V
of n vertices (par has a unique root r), and a set of q vertex-ancestor pairs Q =
{(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)} where ∀i ∈ [q], v_i is an ancestor of u_i.
Output:
P_1, P_2, ..., P_q.
Generating multiple path sequences (MultiPaths(par : V → V, Q)):
1. (V', par') ← Compress(par). // See Lemma 10.
2. Set d ← dep(par), t ← ⌈log d⌉ and compute mappings g_0, g_1, ..., g_t : V' → V' such that
∀v ∈ V', j ∈ {0, 1, ..., t}, g_j(v) = par'^(2^j)(v).
3. For each vertex-ancestor pair (u_i, v_i) ∈ Q:
a. If dep_par(u_i) - dep_par(v_i) ≤ 2t, generate the path sequence
P_i = (u_i, par^(1)(u_i), par^(2)(u_i), ..., v_i) directly.
b. Otherwise, find the minimum j ∈ [2t] such that par^(j)(u_i) ∈ V'. Set u'_i ← par^(j)(u_i).
Find an ancestor v'_i of u'_i in par' such that dep_par(v'_i) ≥ dep_par(v_i) and dep_par(v'_i) -
2t ≤ dep_par(v_i).
c. Generate the path P'(u'_i, v'_i) in par'.
d. Initialize a sequence A as the concatenation of (u_i), P'(u'_i, v'_i) and (v_i).
e. Repeat: for each element a_i in A, if a_i is not the last element and a_{i+1} ≠ par(a_i),
insert par(a_i) between a_i and a_{i+1}; until A does not change. Output the final
sequence A as the path sequence P_i.
Before we analyze the correctness of the algorithm, let us discuss some details.
1. In step 3a, if the length of the path is at most 2t, then we can generate the path in O(t)
rounds: in the j-th round, we find the vertex par^(j)(u_i) = par(par^(j-1)(u_i)).
2. In step 3b, we want to find v'_i. We initialize v'_i as u'_i. For k = t → 0, if dep_par(g_k(v'_i)) >
dep_par(v_i) (i.e., dep_par(par'^(2^k)(v'_i)) > dep_par(v_i)), we set v'_i ← g_k(v'_i) = par'^(2^k)(v'_i).
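Step 3e can be simulated sequentially. In each round every gap between consecutive known vertices gains one vertex, and consecutive vertices of P'(u'_i, v'_i) are only t par-steps apart, so O(t) = O(log d) rounds suffice. A sketch (the name `fill_path` is ours):

```python
def fill_path(A, par):
    # One round of step 3e: after every element a whose successor is not
    # par(a), insert par(a).  Each round closes every gap by one vertex.
    while True:
        B, changed = [], False
        for i, a in enumerate(A):
            B.append(a)
            if i + 1 < len(A) and A[i + 1] != par[a]:
                B.append(par[a])
                changed = True
        if not changed:
            return A
        A = B
```

In the MPC setting all gaps are filled in parallel, one round of communication per iteration of the while loop.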
▶ Lemma 12 (Generation of multiple paths). Let par : V → V be a set of parent pointers on
a vertex set V such that par has a unique root. Let Q = {(u_1, v_1), (u_2, v_2), ..., (u_q, v_q)} ⊆ V × V be
a set of pairs of vertices where ∀j ∈ [q], v_j is an ancestor of u_j in par. Let P_1, P_2, ..., P_q
be the output of MultiPaths(par, Q). Then ∀j ∈ [q], P_j = P(u_j, v_j), i.e., P_j is a sequence
which denotes the path from u_j to v_j in par. Furthermore, the space used by the algorithm is
at most O(|V| + Σ_{j∈[q]} |P_j|).
5
Hardness of Biconnectivity in MPC
There is a conjectured hardness result which is widely used in the MPC literature [25, 11, 28, 34, 40].
▷ Conjecture 1 (One cycle vs. two cycles). For any γ ≥ 0 and any constant δ ∈ (0, 1), distinguishing
the following two instances in the (γ, δ)-MPC model requires Ω(log n) parallel time:
1. a single cycle containing n vertices,
2. two disjoint cycles, each containing n/2 vertices.
Under the above conjecture, we show that Ω(log bidiam(G)) parallel time is necessary to
compute the biconnected components of G. This claim holds even for graphs G of constant
diameter, i.e., diam(G) = O(1).
▶ Theorem 13 (Hardness of biconnectivity in MPC). For any γ ≥ 0 and any constant
δ ∈ (0, 1), unless the one cycle vs. two cycles conjecture (Conjecture 1) is false, any (γ,
δ)-MPC algorithm requires Ω(log bidiam(G)) parallel time to test whether a graph G with
constant diameter is biconnected.
Proof. For γ ≥ 0 and an arbitrary constant δ ∈ (0, 1), suppose there is a (γ, δ)-MPC
algorithm A which can determine whether an arbitrary constant diameter graph G is
biconnected in o(log bidiam(G)) parallel time. Then we obtain a (γ, δ)-MPC algorithm for
the one cycle vs. two cycles problem as follows:
1. Given a one cycle vs. two cycles instance, an n-vertex graph G' = (V', E'), construct a new
graph G = (V, E): V = V' ∪ {v*}, E = E' ∪ {(v, v*) | v ∈ V'}.
2. Run A on G. If G is not biconnected, G' consists of two cycles. Otherwise, G' is a single cycle.
It is easy to see that the diameter of G is 2. If G' is a single cycle, then G is biconnected and
bidiam(G) = Θ(n). If G' contains two cycles, then G contains two biconnected components
and bidiam(G) = Θ(n).
The first step of the above algorithm takes O(1) parallel time and only requires linear
total space. The graph G has n + 1 vertices and 2n edges. Thus, the above algorithm is also
a (γ, δ)-MPC algorithm. Its parallel time is the same as the time needed to run A
on G, which is o(log bidiam(G)) = o(log n). Thus the existence of the
algorithm A implies that the one cycle vs. two cycles conjecture (Conjecture 1) is false. ◀
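The reduction in the proof is just an apex-vertex construction, and it can be sketched directly. A brute-force biconnectivity check stands in below for the hypothetical MPC algorithm A; all function names are ours:

```python
def add_apex(n, edges):
    # Build G from G': add a new vertex n adjacent to every vertex 0..n-1.
    # diam(G) <= 2; when G' is a disjoint union of cycles, G is biconnected
    # iff G' is a single cycle (otherwise the apex is a cut vertex).
    return n + 1, edges + [(v, n) for v in range(n)]

def connected(n, edges, removed=None):
    # DFS over the graph with one vertex (optionally) deleted.
    adj = {v: [] for v in range(n) if v != removed}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].append(b)
            adj[b].append(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def is_biconnected(n, edges):
    # Brute force (n >= 3): connected after deleting any single vertex.
    return all(connected(n, edges, removed=v) for v in range(n))
```

The brute-force check is only here to make the reduction testable; the point of the proof is that any o(log n)-round biconnectivity algorithm A would decide the conjectured-hard one cycle vs. two cycles instance too quickly.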
Kook Jin Ahn and Sudipto Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. ACM Transactions on Parallel Computing (TOPC), 4(4):17, 2018.
Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms, 7(4):567-583, 1986.
Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 574-583. ACM, 2014.
Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. Parallel graph connectivity in log diameter rounds. In FOCS 2018. arXiv:1805.03055.
Coresets meet EDCS: algorithms for matching and vertex cover on massive graphs. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1616-1635. SIAM, 2019.
Sepehr Assadi and Sanjeev Khanna. Randomized composable coresets for matching and vertex cover. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 3-12. ACM, 2017.
Sepehr Assadi, Xiaorui Sun, and Omri Weinstein. Massively parallel algorithms for finding well-connected components in sparse graphs. arXiv preprint, 2018. arXiv:1805.02974.
Giorgio Ausiello, Donatella Firmani, Luigi Laura, and Emanuele Paracone. Large-scale graph biconnectivity in MapReduce. Department of Computer and System Sciences Antonio Ruberti Technical Reports, 4(4), 2012.
Boaz Barak, Jonathan A. Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing (STOC), pages 143-151. ACM, 2015. arXiv:1407.1543.
Journal of the ACM (JACM), 36(3):643-670, 1989.
Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems, pages 273-284. ACM, 2013.
Soheil Behnezhad, Mahsa Derakhshan, and MohammadTaghi Hajiaghayi. Brief announcement: Semi-mapreduce meets congested clique. arXiv preprint, 2018. arXiv:1802.10297.
Massively parallel symmetry breaking on sparse graphs: MIS and maximal matching. arXiv preprint, 2018. arXiv:1807.06701.
Sebastian Brandt, Manuela Fischer, and Jara Uitto. Matching and MIS for uniformly sparse graphs in the low-memory MPC model. arXiv preprint, 2018. arXiv:1807.05374.
Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. Round compression for parallel matching algorithms. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 471-484. ACM, 2018.
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
Communications of the ACM, 51(1):107-113, 2008.
Reinhard Diestel. Graph Theory. Springer Publishing Company, Incorporated, 2018.
Alina Ene, Sungjin Im, and Benjamin Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 681-689. ACM, 2011.
Manuela Fischer, Mohsen Ghaffari, and Jara Uitto. Simple graph coloring algorithms for congested clique and massively parallel computation. arXiv preprint, 2018. arXiv:1808.08419.
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the MapReduce framework. In ISAAC, volume 7074, pages 374-383. Springer, 2011.
Sungjin Im, Benjamin Moseley, and Xiaorui Sun. Efficient massively parallel methods for dynamic programming. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 798-811. ACM, 2017.
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In ACM SIGOPS Operating Systems Review, volume 41(3), pages 59-72. ACM, 2007.
Tomasz Jurdziński and Krzysztof Nowicki. MST in O(1) rounds of congested clique. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2620-2632. SIAM, 2018.
Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 938-948. Society for Industrial and Applied Mathematics, 2010.
Richard M. Karp, Eli Upfal, and Avi Wigderson. Constructing a perfect matching is in random NC. Combinatorica, 6(1):35-48, 1986.
Valerie King, Chung Keung Poon, Vijaya Ramachandran, and Santanu Sinha. An optimal EREW PRAM algorithm for minimum spanning tree verification. Information Processing Letters, 62(3):153-159, 1997.
Connected components in MapReduce and beyond. In Proceedings of the ACM Symposium on Cloud Computing, pages 1-13. ACM, 2014.
Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, pages 85-94. ACM, 2011.
Sixue Liu and Robert E. Tarjan. Simple concurrent labeling algorithms for connected components. arXiv preprint, 2018. arXiv:1812.06177.
Michael Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing, 15(4):1036-1053, 1986.
arXiv preprint, 2018. arXiv:1807.08745.
Technical report, Harvard University, Cambridge, MA, Aiken Computation Lab, 1985.
Tim Roughgarden, Sergei Vassilvitskii, and Joshua R. Wang. Shuffles and circuits (on lower bounds for modern parallel computation). In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pages 1-12. ACM, 2016.
Yossi Shiloach and Uzi Vishkin. An O(log n) parallel connectivity algorithm. Technical report, Computer Science Department, Technion, 1980.
Robert E. Tarjan and Uzi Vishkin. An efficient parallel biconnectivity algorithm. SIAM Journal on Computing, 14(4):862-874, 1985.
Robert Endre Tarjan and Uzi Vishkin. Finding biconnected components and computing tree functions in logarithmic parallel time. In 25th Annual Symposium on Foundations of Computer Science, pages 12-20. IEEE, 1984.
Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103-111, 1990.
Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing (STOC), pages 887-898. ACM, 2012.
Grigory Yaroslavtsev and Adithya Vadapalli. Massively parallel algorithms and hardness for single-linkage clustering under Lp distances. In International Conference on Machine Learning, pages 5596-5605, 2018.
Spark: Cluster computing with working sets. HotCloud, 10(10):95, 2010.