#### Reconstructing Phylogenetic Level-1 Networks from Nondense Binet and Trinet Sets

Algorithmica
Katharina T. Huber 0 1 2 3
Leo van Iersel 0 1 2 3
Vincent Moulton 0 1 2 3
Celine Scornavacca 0 1 2 3
Taoyang Wu 0 1 2 3
0 ISEM, CNRS - Université Montpellier , Montpellier , France
1 Delft Institute of Applied Mathematics, Delft University of Technology , Delft , The Netherlands
2 School of Computing Sciences, University of East Anglia , Norwich , United Kingdom
3 Institut de Biologie Computationnelle , Montpellier , France
Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set T of binary binets or trinets over a taxon set X, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an O(3^{|X|} poly(|X|)) time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
1 Introduction
A key problem in biology is to reconstruct the evolutionary history of a set of taxa using
data such as DNA sequences or morphological features. These histories are commonly
represented by phylogenetic trees, and can be used, for example, to inform genomics
studies, analyse virus epidemics and understand the origins of humans [23]. Even so,
in case evolutionary processes such as recombination and hybridization are involved,
it can be more appropriate to represent histories using phylogenetic networks instead
of trees [2].
Generally speaking, a phylogenetic network is any type of graph with a subset
of its vertices labelled by the set of taxa that in some way represents evolutionary
relationships between the taxa [13]. Here, however, we focus on a special type of
phylogenetic network called a level-1 network. We present the formal definition for
this type of network in the next section but, essentially, it is a binary, directed acyclic
graph with a single root, whose leaves correspond to the taxa, and in which any
two cycles are disjoint (for example, see Fig. 2 below). This type of network was
first considered in [20] and is closely related to so-called galled-trees [3,7].
Level-1 networks have been used, for example, to analyse virus evolution [10], and are
of practical importance since their simple structure allows for efficient construction
[7,10,15] and comparison [17].
One of the main approaches that have been used to construct level-1 networks
is from triplet sets, that is, sets of rooted binary trees with three leaves (see e.g.
[10,18,19,22]). Even so, it has been observed that the set of triplets displayed by a
level-1 network does not necessarily provide all of the information required to uniquely
define or encode the network [5]. Motivated by this observation, in [11] an algorithm
was developed for constructing level-1 networks from a network analogue of triplets:
rooted binary networks with three leaves, or trinets. This algorithm relies on the fact
that the trinets displayed by a level-1 network do indeed encode the network [11].
Even so, the algorithm was developed for dense trinet sets only, i.e. sets in which there
is a trinet associated to each combination of three taxa.
In this paper, we consider the problem of constructing level-1 networks from
arbitrary sets of level-1 trinets and binets, where a binet is an even simpler building block
than a trinet, consisting of a rooted binary network with just two leaves. We consider
binets as well as trinets since they can provide important information to help piece
together sets of trinets. Our approach can be regarded as a generalisation of the
well-known supertree algorithm called Build [1,23] for checking whether or not a set of
triplets is displayed by a phylogenetic tree and constructing such a tree if this is the case.
In particular, the algorithm we present in Sect. 4 is one of the first supernetwork
algorithms for constructing a phylogenetic network from a set of networks. Note that some
algorithms have already been developed for computing unrooted supernetworks—see,
for example [8,12].
We expect that our algorithm could be useful in practice as there are programs which
can be used to compute trinets from biological data [14,25] (and also binets as subnets
of the computed trinets). Some of these programs use optimisation criteria such as
likelihood which can be very computationally expensive for large datasets, but which
are much more practical for small datasets. Note that a similar strategy was used in
the quartet puzzling [24] approach for computing phylogenetic trees from four-leaved
trees or quartets based on likelihood, before likelihood became more practical for larger
data sets. It should be noted, however, that most of the current programs for computing
phylogenetic networks are based on the trees embedded within a network, and so they
might not be able to distinguish between different types of trinets [21]. Hopefully the
development of new models will make it possible to deal with potential difficulties in
this respect. Also, it could be of interest to build networks from networks on slightly
larger subsets (such as size-four and size-five subsets) and try to merge these instead
of or as well as trinets, as such subsets may be more informative than size-three ones.
We now summarize the contents of the paper. After introducing some basic notation
in the next section, in Sect. 3 we begin by presenting a polynomial-time algorithm for
deciding whether or not there exists some level-1 network that displays a given set of
level-1 binets, and for constructing such a network if it actually exists (see Theorem 1).
Then, in Sect. 4, we present an exponential-time algorithm for an arbitrary set
consisting of binets and trinets (see Theorem 2). This algorithm uses a top-down approach
that is somewhat similar in nature to the Build algorithm [1,23] but it is
considerably more intricate. The algorithm can be generalised to instances containing level-1
networks with arbitrarily many leaves since trinets encode level-1 networks [11].
In Sect. 5 we show that for the special instance where each cycle in the input
trinets has size three our exponential-time algorithm is actually guaranteed to work in
polynomial time. This is still the case when the input consists of binary level-1 networks
with arbitrarily many leaves as long as all their cycles have length three. However, in
Sect. 6 we prove that in general it is NP-hard to decide whether or not there exists a
binary level-1 network that displays an arbitrary set of trinets (see Theorem 4). We also
show that this problem remains NP-hard if we insist that the network contains only one
cycle. Our proof is similar to the proof that it is NP-hard to decide the same question
for an arbitrary set of triplets given in [18], but the reduction is more complicated. In
Sect. 7, we conclude with a discussion of some directions for future work.
2 Preliminaries
Let X be some finite set of labels. We will refer to the elements of X as taxa. A rooted
phylogenetic network N on X is a simple directed acyclic graph which has a single
indegree-0 vertex (the root, denoted by ρ(N )), no indegree-1 outdegree-1 vertices
and its outdegree-0 vertices (the leaves) bijectively labelled by the elements of X . We
will refer to rooted phylogenetic networks as networks for short. In addition, we will
identify each leaf with its label and denote the set of leaves of a network N by L(N ).
For a set 𝒩 of networks, L(𝒩) is defined to be ∪_{N∈𝒩} L(N). A network is called binary
if all vertices have indegree and outdegree at most two and all vertices with indegree
two (the reticulations) have outdegree one.
Fig. 1 The two binary level-1 binets and the eight binary level-1 trinets
Refining a vertex with outdegree d > 2 means replacing the vertex by a path of d − 1 vertices of outdegree 2. A cycle of a
network is the union of two non-identical, internally-vertex-disjoint, directed s-t paths,
for any two distinct vertices s, t . The size of the cycle is the number of vertices that are
on at least one of these paths. A cycle is tiny if it has size three and large otherwise.
A network is said to be a tiny cycle network if all its cycles are tiny. A binary network
is said to be a binary level-1 network if all its cycles are disjoint. We only consider
binary level-1 networks in this paper, see Fig. 2 for an example containing one tiny
and two large cycles. Note that, when |X | = 1, there exists a unique binary level-1
network on X consisting of a single vertex labelled by the only element of X .
If N is a network on X and X′ ⊆ X is nonempty, then a vertex v of N is a stable
ancestor of X′ (in N) if every directed path from the root of N to a leaf in X′ contains v.
The lowest stable ancestor of X′ is the unique vertex LSA(X′) that is a stable ancestor
of X′ and such that there is no directed path from LSA(X′) to any of the other stable
ancestors of X′.
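The definition of the lowest stable ancestor can be made concrete with a small sketch. It assumes a network is given as a list of arcs (tail, head) together with its root; the function name and the example network are hypothetical and only serve as illustration. A vertex is treated as a stable ancestor exactly when deleting it makes every leaf of the chosen subset unreachable from the root, which is a direct restatement of the definition.

```python
def lowest_stable_ancestor(arcs, root, targets):
    """Return LSA(targets) in the DAG given by `arcs` (pairs (tail, head))."""
    targets = set(targets)
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
    vertices = {root} | {v for arc in arcs for v in arc}

    def reachable(start, removed=None):
        # Vertices reachable from `start` once vertex `removed` is deleted.
        seen, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u == removed or u in seen:
                continue
            seen.add(u)
            stack.extend(children.get(u, []))
        return seen

    # v is a stable ancestor iff deleting v cuts the root off from every target.
    stable = {v for v in vertices if not (reachable(root, removed=v) & targets)}
    # The lowest stable ancestor has no directed path to another stable ancestor.
    for v in stable:
        if not ((reachable(v) - {v}) & stable):
            return v


# Hypothetical example: a tiny cycle rho -> v -> r, rho -> r,
# with leaves a, b below v (via u) and leaf c below the reticulation r.
arcs = [("rho", "v"), ("rho", "r"), ("v", "r"), ("v", "u"),
        ("u", "a"), ("u", "b"), ("r", "c")]
```

In this example the leaves a and b share the stable ancestor u, whereas any subset containing both a high leaf and the low leaf c is only separated from the rest of the network at the root.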
A binet is a network with exactly two leaves and a trinet is a network with exactly
three leaves. In this paper, we only consider binary level-1 binets and trinets. There
exist two binary level-1 binets and eight binary level-1 trinets (up to relabelling) [11],
all presented in Fig. 1. In the following, we will use the names of the trinets and
binets indicated in that figure. For example, T1(x , y; z) denotes the only rooted tree
on {x , y, z} where {x , y} is a cluster, where a cluster is the entire set of leaf descendants
of a node. Trinet T1(x , y; z) is also called a triplet.
A set B of binets on a set X of taxa is called dense if for each pair of taxa from X
there is at least one binet in B on those taxa. A set T of (binets and) trinets on X is
dense if for each combination of three taxa from X there is at least one trinet in T on
those taxa.
Given a phylogenetic network N on X and a subset X′ ⊆ X, we define the
network N|X′ as the network obtained from N by deleting all vertices and arcs that are not
on a directed path from the lowest stable ancestor of X′ to a leaf in X′ and repeatedly
suppressing indegree-1 outdegree-1 vertices and replacing parallel arcs by single arcs
until neither operation is applicable.
Two networks N, N′ on X are said to be equivalent if there exists an isomorphism
between N and N′ that maps each leaf of N to the leaf of N′ with the same label.
Given two networks N, N′ with L(N′) ⊆ L(N), we say that N displays N′
if N|L(N′) is equivalent to N′. Note that this definition in particular applies to the
case where N′ is a binet or trinet. In addition, we say that N displays a set 𝒩 of networks
if N displays each network in 𝒩.
Given a network N, we use the notation T(N) to denote the set of all trinets and
binets displayed by N. For a set 𝒩 of networks, T(𝒩) denotes ∪_{N∈𝒩} T(N). Given
a set T of trinets and/or binets on X and a nonempty subset X′ ⊆ X, we define the
restriction of T to X′ as
$$\mathcal{T}|X' := \{\, T|(L(T) \cap X') \;:\; T \in \mathcal{T} \text{ and } |L(T) \cap X'| \in \{2, 3\} \,\}.$$
The following observation will be useful.
Observation 1 Let T be a set of trinets and binets on X and suppose that there exists a
binary level-1 network N on X such that T ⊆ T(N). Then, for any nonempty subset X′
of X, N|X′ is a binary level-1 network displaying T|X′.
Proof Let X′ be a nonempty subset of X and consider a trinet or binet T′ ∈ T|X′.
Then there exists a binet or trinet T ∈ T such that T′ = T|(L(T) ∩ X′). Since T ∈
T ⊆ T(N), T is displayed by N. When restricting N to N|X′, first the vertices and
arcs that are not on a directed path from the lowest stable ancestor of X′ to a leaf in X′
are deleted. Hence, all vertices and arcs on directed paths from LSA(X′) to leaves
in L(T) ∩ X′ are kept. Thus, T′ = T|(L(T) ∩ X′) is still displayed. Suppressing
indegree-1 outdegree-1 vertices and replacing parallel arcs by single arcs does not
change this. Hence, T′ is displayed by N|X′.
We call a network cycle-rooted if its root is contained in a cycle. A cycle-rooted
network is called tiny-cycle rooted if its root is in a tiny cycle and large-cycle rooted
otherwise. If N is a cycle-rooted binary level-1 network whose root ρ (N ) is in cycle C ,
then there exists a unique reticulation r that is contained in C . We say that a leaf x is
high in N if there exists a path from ρ (N ) to x that does not pass through r , otherwise
we say that x is low in N . If N is not cycle-rooted, then we define all leaves to be
high in N . We say that two leaves are at the same elevation in N if they are either
both high or both low in N . Two leaves x , y that are both high in N are said to be on
the same side in N if x and y are both reachable from the same child of ρ (N ) by two
directed paths. A bipartition {L, R} of the set H of high leaves in N is the bipartition
of H induced by N if all leaves in L are on the same side SL in N and all the leaves
in R are on the same side SR in N, with SL ≠ SR. (Note that one of SL and
SR could be empty.) Finally, if N is cycle-rooted, we say that a subnetwork N′ of N
is a pendant subnetwork if there exists in N some arc (u, ρ′) that is a cut-arc, i.e., an
arc whose removal disconnects the graph, with ρ′ the root of N′ and u a vertex of the
cycle containing the root of N. If, in addition, u is not a reticulation, then N′ is said
to be a pendant sidenetwork of N. If, in addition, L(N′) ⊆ S with S a part of the
bipartition of the high leaves of N induced by N, then we say that N′ is a pendant
sidenetwork on side S. See Fig. 2 for an illustration of these definitions.
We end this section by giving a short description of the Build algorithm [1, 23],
which decides if there exists a rooted tree (i.e. a network without reticulations)
displaying a given set of triplets L. The Build algorithm constructs a graph RL (L) with a
vertex for each taxon and an edge {x , y} precisely if there exists a triplet T1(x , y; z) ∈ L
for some z. If RL (L) is connected, the algorithm halts and reports that there exists no
rooted tree displaying L. Otherwise, let X1, . . . , Xk be the vertex sets of the connected
components of RL (L). The algorithm recursively tries to construct trees displaying
L|X1, . . . , L|Xk. If such trees exist, Build outputs a rooted tree consisting of a new
root with arcs to the roots of the recursively computed trees. Otherwise, the algorithm
reports that there exists no solution.
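The description of Build above translates almost line by line into code. The following is a minimal sketch under stated assumptions: a triplet T1(x, y; z) is encoded as the tuple (x, y, z), trees are returned as nested tuples, and a single taxon is handled as an explicit base case. The function names `components` and `build` are hypothetical helpers, not taken from [1,23].

```python
def components(vertices, edges):
    """Connected components of an undirected graph, in order of first vertex."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


def build(taxa, triplets):
    """Return a nested-tuple tree displaying all triplets, or None.

    A triplet (x, y, z) stands for T1(x, y; z), i.e. {x, y} is a cluster.
    """
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    # Restrict to triplets whose three leaves all lie in the current taxon set.
    inside = [t for t in triplets if set(t) <= set(taxa)]
    comps = components(taxa, [(x, y) for x, y, _ in inside])
    if len(comps) == 1:
        return None  # connected graph: no tree displays the triplets
    subtrees = [build(comp, inside) for comp in comps]
    if any(s is None for s in subtrees):
        return None
    return tuple(subtrees)  # new root with one child per component
```

For instance, the triplets T1(a, b; c) and T1(c, d; a) split the taxa into the components {a, b} and {c, d}, whereas T1(a, b; c) together with T1(b, c; a) makes the graph connected, so no tree exists.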
Our algorithm for trinets described in Sect. 4 can be seen as a generalization of
the Build algorithm. It uses a graph R which generalizes the RL graph, but also has
three additional steps which use different graphs. Our algorithm for binets described
in the next section also uses a similar recursive approach. Finally, we note that our
algorithms always construct binary networks, but this is just for convenience. These
algorithms could be adapted to construct nonbinary networks, just as Build constructs
nonbinary trees.
3 Constructing a Network from a Set of Binets
In this section we describe a polynomial-time algorithm for deciding if there exists
some binary level-1 network displaying a given set B of binets, and constructing such
a network if it exists. We treat this case separately because it is much simpler than the
trinet algorithms and gives an introduction to the techniques we use.
The first step of the algorithm is to construct the graph Rb(B) (see footnote 1), which has a vertex
for each taxon and an edge {x, y} if (at least) one of N(x; y) and N(y; x) is contained
in B.
Footnote 1: The superscript b indicates that this definition is only used for binets. In Sect. 4, we will introduce a graph
R(T) which will be used for general sets of binets and trinets and is a generalisation of Rb(B) in the sense
that Rb(B) = R(B) if B contains only binets.
Fig. 3 Example of a step of the algorithm for constructing a network N from the set B of binets. The
graph Rb(B) has connected components X1 = {a, b, c, d, e} and X2 = { f, g}. Hence, network N is
obtained by combining recursively computed networks N (X1) and N (X2) by hanging them below a new
root. See Fig. 4 for the first recursive step
If the graph Rb(B) is disconnected and has connected components X1, . . . , X p,
then the algorithm constructs a network N by recursively computing networks
N (X1), . . . , N (X p) displaying B|X1, . . . , B|X p respectively, creating a new root
node ρ and adding arcs from ρ to the roots of N (X1), . . . , N (X p), and
refining arbitrarily the root ρ in order to make the network binary. See Fig. 3 for an
example.
If the graph Rb(B) is connected, then the algorithm constructs the graph Kb(B),
which has a vertex for each taxon and an edge {x , y} precisely if T (x , y) ∈ B. In
addition, the algorithm constructs the directed graph Ωb(B), which has a vertex for
each connected component of Kb(B) and an arc (π1, π2) precisely if there exists a
binet N (y; x ) ∈ B with x ∈ V (π1) and y ∈ V (π2) (with V (π ) denoting the vertex
set of a given connected component π ).
The algorithm searches for a nonempty strict subset U of the vertices of Ωb(B)
such that there is no arc (π1, π2) with π1 ∉ U and π2 ∈ U. This can be done in
polynomial time by collapsing directed cycles until an acyclic digraph is obtained
and then searching for an indegree-0 vertex. If there exists no such set U then the
algorithm halts and outputs that there exists no solution. Otherwise, let H be the
union of the vertex sets of the connected components of Kb(B) that correspond to
elements of U and define Low = X \H . The algorithm recursively constructs networks
N (H ) displaying B|H and N (Low) displaying B|Low. Subsequently, the algorithm
constructs a network N consisting of vertices ρ , v, r , arcs (ρ , v), (v, r ), (ρ , r ),
networks N (Low), N (H ) and an arc from v to the root of N (H ) and an arc from r to the
root of N (Low). See Fig. 4 for an example of this case.
Finally, when |X | ≤ 2 (in some recursive step), the problem can be solved trivially.
When |X | = 1, the algorithm outputs a network consisting of a single vertex labelled by
the only element of X , which is the root as well as the leaf of the network. When |X | = 2
and there is a single binet remaining, the algorithm outputs that binet. When |X | = 2
and there are at least two binets remaining, then the algorithm halts and outputs that
there exists no solution.
This completes the description of the algorithm for binets. Clearly, it is a
polynomial-time algorithm and its correctness is shown in the following theorem.
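The whole binet procedure fits in a short recursive sketch. The encoding is an assumption made for illustration: ("T", x, y) stands for the tree binet T(x, y) and ("N", x, y) for N(x; y), read as "x low, y high" in line with the proof of Theorem 1. Networks are returned as nested tuples, where ("tree", A, B) is a new root above A and B and ("cycle", H, Low) is a root cycle with H hanging from its side and Low below its reticulation. Instead of collapsing directed cycles, this sketch looks for a set with no incoming arc by collecting all vertices that can reach a chosen vertex, which yields a valid set whenever one exists. With this encoding the case |X| = 2 needs no special treatment.

```python
def components(vertices, edges):
    """Connected components (as frozensets) of an undirected graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def binet_network(taxa, binets):
    """Return a nested-tuple network displaying all binets, or None."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    inside = [b for b in binets if b[1] in taxa and b[2] in taxa]
    # R^b(B): an edge for every cycle-rooted binet.
    r_comps = components(taxa, [(x, y) for kind, x, y in inside if kind == "N"])
    if len(r_comps) > 1:
        parts = [binet_network(comp, inside) for comp in r_comps]
        if any(p is None for p in parts):
            return None
        net = parts[0]
        for p in parts[1:]:
            net = ("tree", net, p)  # binary refinement of the new root
        return net
    # K^b(B): an edge for every tree binet T(x, y).
    k_comps = components(taxa, [(x, y) for kind, x, y in inside if kind == "T"])
    comp_of = {x: comp for comp in k_comps for x in comp}
    # Omega^b(B), stored as predecessor sets: arcs run from high to low.
    preds = {comp: set() for comp in k_comps}
    for kind, low, high in inside:
        if kind == "N":
            preds[comp_of[low]].add(comp_of[high])
    for start in k_comps:
        # Everything that can reach `start` forms a set with no incoming arc.
        closed, stack = set(), [start]
        while stack:
            comp = stack.pop()
            if comp not in closed:
                closed.add(comp)
                stack.extend(preds[comp])
        if len(closed) < len(k_comps):
            high = set().union(*closed)
            low = set(taxa) - high
            n_high = binet_network(high, inside)
            n_low = binet_network(low, inside)
            if n_high is None or n_low is None:
                return None
            return ("cycle", n_high, n_low)
    return None
```

The sketch returns the first valid high set it finds; as the caption of Fig. 4 notes, other valid choices lead to alternative solutions.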
Fig. 4 Example of the recursive step, which constructs a network N (X1) from binet set B|X1, with B
and X1 as in Fig. 3. Network N (X1) is cycle-rooted because graph Rb is connected. One possible strict
subset of the vertices of Ωb(B|X1) with no incoming arcs is {a, b}. Hence, H = {a, b} can be made
the high leaves of the network, and Low = {c, d, e} the low leaves. Combining recursively computed
networks N (H ) and N (Low) by hanging them below a new cycle as described by the algorithm then gives
network N (X1). Note that other valid subsets of the vertices of Ωb(B|X1) are {a, b, d}, {a, b, c}, {a, b, c, d}
and {d}, which lead to alternative solutions
Theorem 1 Given a set B of binets on a set X of taxa, there exists a polynomial-time
algorithm that decides if there exists a binary level-1 network on X that displays all
binets in B, and that constructs such a network if it exists.
Proof We prove by induction on | X | that the algorithm described above produces a
binary level-1 network on X displaying B if such a network exists. The induction basis
for | X | ≤ 2 is clearly true. Now let | X | ≥ 3, B a set of binets on X and assume that
there exists some binary level-1 network on X displaying B. There are two cases.
First assume that the graph Rb(B) is disconnected and has connected components
C1, . . . , C p. Then the algorithm recursively computes networks N |C1, . . . , N |C p
displaying the sets B|C1, . . ., B|C p respectively. Such networks exist by Observation 1
and can be found by the algorithm by induction. It follows that the network N which
is constructed by the algorithm displays all binets in B of which both taxa are in
the same connected component of Rb(B). Each other binet is of the form T (x , y)
by the definition of graph Rb(B). Hence, those binets are also displayed by N , by
construction.
Now assume that the graph Rb(B) is connected. Then we claim that there exists
no binary level-1 network displaying B that is not cycle-rooted. To see this, assume
that there exists such a network, let v1, v2 be the two children of its root and Xi the
leaves reachable by a directed path from vi , for i = 1, 2. Then there is no edge {a, b}
in Rb(B) for any a ∈ X1 and b ∈ X2. Since X1 ∪ X2 = X , it follows that Rb(B) is
disconnected, which is a contradiction. Hence, any network that is a valid solution is
cycle-rooted.
The algorithm then searches for a nonempty strict subset U of the vertices of Ωb(B)
with no incoming arc, i.e., for which there is no arc (π1, π2) with π1 ∉ U and π2 ∈ U.
First assume that there exists no such set U. Then the algorithm reports that there
exists no solution. To prove that this is correct, assume that N is some binary level-1
network on X displaying B and let H be the set of leaves that are high in N. The graph
Kb(B) contains no edges between taxa that are high in N and taxa that are low in N
(because such taxa x, y cannot be together in a T(x, y) binet). Hence, the set H is a
union of vertex sets of connected components of Kb(B) and their representing vertices
of Ωb(B) form a subset U. If there were an arc (π1, π2) in Ωb(B) with π1 ∉ U and
π2 ∈ U, then there would be a binet N(y; x) ∈ B with x ∈ V(π1) and y ∈ V(π2). This
binet N(y; x) would not be displayed by N because y ∈ H and x ∉ H. Therefore,
we conclude that there is no arc (π1, π2) in Ωb(B) with π1 ∉ U and π2 ∈ U. Hence,
we have obtained a contradiction to the assumption that there is no such set U.
Now assume that there exists such a set U . Then the algorithm recursively
constructs networks N (H ) displaying B|H and N (Low) displaying B|Low, with H the
union of the vertex sets of the connected components of Kb(B) corresponding to the
elements of U , and with Low = X \H . The algorithm then constructs a network N
consisting of a cycle with network N(H) hanging from the side of the cycle and
network N (Low) hanging below the cycle, as in Fig. 4. Networks N (H ) and N (Low)
exist by Observation 1 and can be found by the algorithm by induction. Because these
networks display B|H and B|Low respectively, each binet from B that has both its
leaves high or both its leaves low in N is displayed by N . Each other binet is of the
form N (x ; y) with x low and y high in N , because otherwise there would exist an
element in U which would have an incoming arc in Ωb(B). Hence, such binets are
also displayed by N .
4 Constructing a Network from a Nondense Set of Binets and Trinets
In this section we present an algorithm to construct a binary level-1 network displaying
a given nondense set of binets and trinets, if any exists. This algorithm can be regarded
as a generalisation of the Build algorithm [1,23] for checking whether or not there
exists a rooted phylogenetic tree displaying a set of triplets.
4.1 Outline
Let T be a set of binary level-1 trinets and binets on a set X of taxa. In this section we
will describe an exponential-time algorithm for deciding whether there exists a binary
level-1 network N on X with T ⊆ T(N ). Note that, if T contains trinets or binets
that are not level-1, we know that such a network cannot exist because all binets and
trinets displayed by a binary level-1 network are binary level-1 networks.
Throughout this section, we will assume that there exists some binary level-1
network on X that displays T and we will show that in this case we can reconstruct such
a network N .
Our approach aims at constructing the network N recursively; the recursive steps
that are used depend on the structure of N . The main steps of our approach are the
following:
1. We determine whether the network N is cycle-rooted (see Sect. 4.2);
2. If this is the case, we guess the high and low leaves of N (see Sect. 4.3);
3. Then, we guess how to partition the high leaves into the “left” and “right” leaves (see Sect. 4.4);
4. Finally, we determine how to partition the leaves on each side into the leaves of the different sidenetworks on that side (see Sect. 4.5).
Fig. 5 The set T of trinets that we use to illustrate the inner workings of our algorithm
Although we could do Steps 2 and 3 in a purely brute force way, we present several
structural lemmas which restrict the search space and will be useful in Sect. 5.
Once we have found a correct partition of the leaves (i.e., after Step 4), we
recursively compute networks for each block of the partition and combine them into a single
network. In the case that the network is not cycle-rooted, we do this by creating a root
and adding arcs from this root to the recursively computed networks. Otherwise, the
network is cycle-rooted. In this case, we construct a cycle with outgoing cut-arcs to
the roots of the recursively computed networks, as illustrated in Fig. 2.
The fact that we can recursively compute networks for each block of the computed
partition follows from Observation 1.
In the next sections we present a detailed description of our algorithm to reconstruct
N . We will illustrate the procedure by applying it to the example set of trinets depicted
in Fig. 5. The pseudocode is presented in Algorithm 1 and Table 1 gives an overview
of the different graphs used by the algorithm.
4.2 Is the Network Cycle-Rooted?
To determine whether or not N is cycle-rooted, we define a graph R(T) as follows.
The vertex set of R(T) is the set X of taxa and the edge set has an edge {a, b} if
there exists a trinet or binet T ∈ T with a, b ∈ L(T ) that is cycle-rooted or contains
a common ancestor of a and b different from the root of T (or both). For an example,
see Fig. 6.
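The construction of R(T) can be sketched directly from this definition. The sketch below assumes that every input binet or trinet is given as a list of arcs (tail, head) whose outdegree-0 vertices are the taxon labels; internal vertex names are local to each network and hypothetical. It tests whether a network is cycle-rooted by checking whether the two children of the root share a descendant, which is equivalent to the root lying on a cycle.

```python
from itertools import combinations


def descendants(arcs, v):
    """All vertices reachable from v (including v) along directed arcs."""
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(children.get(u, []))
    return seen


def r_graph_edges(networks):
    """Edge set of R(T) for a collection of binets/trinets given as arc lists."""
    edges = set()
    for arcs in networks:
        tails = {u for u, _ in arcs}
        heads = {w for _, w in arcs}
        root = next(iter(tails - heads))   # the unique indegree-0 vertex
        leaves = heads - tails             # outdegree-0 vertices
        below = {v: descendants(arcs, v) for v in tails | heads}
        kids = [w for u, w in arcs if u == root]
        # The root lies on a cycle iff its two children share a descendant.
        cycle_rooted = len(kids) == 2 and bool(below[kids[0]] & below[kids[1]])
        for a, b in combinations(sorted(leaves), 2):
            # A common ancestor of a and b other than the root.
            shared = any(a in below[v] and b in below[v]
                         for v in tails if v != root)
            if cycle_rooted or shared:
                edges.add(frozenset((a, b)))
    return edges


# Hypothetical inputs: the triplet T1(x, y; z) and a cycle-rooted binet on {y, z}.
triplet = [("r", "u"), ("r", "z"), ("u", "x"), ("u", "y")]
binet = [("r", "v"), ("r", "t"), ("v", "t"), ("v", "z"), ("t", "y")]
```

Here the triplet contributes only the edge {x, y}, while the cycle-rooted binet contributes the edge {y, z}, so R(T) is connected on {x, y, z}.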
Lemma 1 Let N be a binary level-1 network and T ⊆ T(N). If R(T) is disconnected
and has connected components C1, . . . , Cp, then T is displayed by the binary level-1
network N′ obtained by creating a new root ρ and adding arcs from ρ to the roots
of N|C1, . . . , N|Cp, and refining arbitrarily the root ρ in order to make the resulting
network binary.
Proof By Observation 1, N′ displays each binet and each trinet of T whose leaves
are all in the same connected component of R(T). Consider a binet B ∈ T on {a, b}
with a and b in different components. Then there is no edge {a, b} in R(T) and
hence B is not cycle-rooted, i.e. B = T(a, b), and B is clearly displayed by N′.
Now consider a trinet T ∈ T on {a, b, c} with a, b, c in three different
components. Then, none of {a, b}, {b, c} and {a, c} is an edge in R(T). Hence, none of the
pairs {a, b}, {b, c}, {a, c} has a common ancestor other than the root of T. By inspection of
Fig. 1, this is impossible and so T cannot exist. Finally, consider a trinet T ∈ T
on {a, b, c} with a, b ∈ Ci and c ∈ Cj with i ≠ j. Then there is no edge {a, c}
and no edge {b, c} in R(T). Consequently, T is not cycle-rooted and the pairs {a, c}
and {b, c} do not have a common ancestor in T other than the root of T. Hence,
T ∈ {T1(a, b; c), N3(a; b; c), N3(b; a; c)}. If T = T1(a, b; c), then N|Ci
displays T|Ci = T(a, b) and hence N′ displays T. If T = N3(a; b; c), then N|Ci
displays T|Ci = N(a; b) and, so, N′ displays T. Symmetrically, if T = N3(b; a; c),
then N|Ci displays T|Ci = N(b; a) and, so, N′ displays T. We conclude that N′
displays T.
Hence, if R(T) is disconnected, we can recursively reconstruct a network for each
of its connected components and combine the solutions to the subproblems in the way
detailed in Lemma 1. If all input trinets are of the form T1(x , y; z), then this simulates
the Build algorithm [1,23].
If R(T) is connected, then we can apply the following lemma:
Lemma 2 Let N be a binary level-1 network on X and T ⊆ T(N ). If R(T) is
connected and |X | ≥ 2, then N is cycle-rooted.
Proof Suppose to the contrary that R(T) is connected and that N is not cycle-rooted.
Let v1, v2 be the two children of the root of N and Xi the leaves of N reachable by
a directed path from vi , for i = 1, 2. Note that X1 ∩ X2 = ∅ and X1 ∪ X2 = X .
Let a ∈ X1 and b ∈ X2 and let T be any trinet or binet displayed by N that contains a
and b. Then we have that T is not cycle-rooted and that the only common ancestor
of a and b in T is the root of T . Hence, there is no edge {a, b} in R(T) for any a ∈ X1
and b ∈ X2, which implies that R(T) is disconnected; a contradiction.
In the remainder of this section, we assume that R(T) is connected and thus that N
is cycle-rooted.
4.3 Separating the High and the Low Leaves
We define a graph K(T) whose purpose is to help decide which leaves are at the same
elevation in N . The vertex set of K(T) is the set of taxa X and the edge set contains
an edge {a, b} if there exists a trinet or binet T ∈ T with a, b ∈ L(T ) and in which a
and b are at the same elevation in T .
Lemma 3 Let N be a cycle-rooted binary level-1 network and T ⊆ T(N). If C is a
connected component of K(T), then all leaves in C are at the same elevation in N.
Proof We prove the lemma by showing that there is no edge in K(T) between any
two leaves that are not at the same elevation in N. Let h and ℓ be leaves that are,
respectively, high and low in N. Then, in any trinet or binet T displayed by N that
contains h and ℓ, we have that T is cycle-rooted and that h is high in T and ℓ is low
in T. Hence, there is no edge {h, ℓ} in K(T).
We now define a directed graph Ω(T) whose purpose is to help decide which leaves
are high and which ones are low in N. The vertex set of Ω(T) is the set of connected
components of K(T) and the arc set contains an arc (π, π′) precisely if there is a
cycle-rooted binet or trinet T ∈ T with h, ℓ ∈ L(T) such that h ∈ V(π) is high in T and ℓ ∈ V(π′)
is low in T. See Fig. 7 for an example for both graphs.
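Both graphs can be computed in one pass over the input. The sketch below assumes, purely for illustration, that every input binet or trinet has already been annotated as a pair (high, low) of leaf sets, with `low` empty for a network that is not cycle-rooted (all of its leaves are then high by definition). The function name `elevation_graphs` is hypothetical.

```python
from itertools import combinations


def elevation_graphs(taxa, networks):
    """Return (components of K(T), arcs of Omega(T)).

    Each input network is a pair (high, low) of leaf sets; `low` is empty
    for a network that is not cycle-rooted.
    """
    parent = {x: x for x in taxa}

    def find(x):
        # Representative of the block containing x (simple union-find).
        while parent[x] != x:
            x = parent[x]
        return x

    # K(T): join two leaves whenever some input network has them
    # at the same elevation.
    for high, low in networks:
        for group in (high, low):
            for a, b in combinations(sorted(group), 2):
                parent[find(a)] = find(b)

    blocks = {}
    for x in taxa:
        blocks.setdefault(find(x), set()).add(x)
    comp_of = {x: frozenset(blocks[find(x)]) for x in taxa}

    # Omega(T): an arc from the component of a high leaf
    # to the component of a low leaf of the same input network.
    arcs = {(comp_of[h], comp_of[l])
            for high, low in networks for h in high for l in low}
    return set(comp_of.values()), arcs
```

For example, a trinet with a and b high and c low, together with a tree binet on {c, d}, gives the components {a, b} and {c, d} and a single arc from the first to the second.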
Lemma 4 Let T be a set of binets and trinets on a set X of taxa. Let N be a cycle-rooted
binary level-1 network displaying T. Then there exists a nonempty strict subset U of
the vertices of Ω(T) for which there is no arc (π, π′) with π′ ∈ U, π ∉ U such that
the set of leaves that are high in N equals ∪π∈U V(π).
Proof Let H be the set of leaves that are high in N. Note that H ≠ ∅. By Lemma 3, H
is the union of connected components of K(T) and hence the union of a set U of
vertices of Ω(T) that represent those components. We need to show that there is
no arc (π, π′) with π′ ∈ U, π ∉ U in Ω(T). To see this, notice that if there were
such an arc, there would be a trinet or binet T ∈ T that is cycle-rooted and has
leaves h, ℓ ∈ L(T) with h ∈ V(π) high in T and ℓ ∈ V(π′) low in T. However, such
a trinet can only be displayed by N if either h is high in N and ℓ is low in N or h
and ℓ are at the same elevation in N. This leads to a contradiction because h ∈ V(π)
with π ∉ U and, hence, h is low in N, while ℓ ∈ V(π′) with π′ ∈ U and, hence, ℓ is high in N.
Fig. 7 The graph K(T) in solid lines and the directed graph Ω(T) in dashed lines, for the set T of trinets in Fig. 5
We now distinguish two cases. The first case is that the root of the network is in a
cycle with size at least four, i.e., the network is large-cycle rooted. The second case
is that the root of the network is in a cycle with size three, i.e., that the network is
tiny-cycle rooted. To construct a network from a given set of binets and trinets, the
algorithm explores both options.
4.3.1 The Network is Large-Cycle Rooted
In this case, we can simply try all subsets of vertices of Ω(T) with no incoming arcs
(i.e. arcs that begin outside and end inside the subset). For at least one such set U,
the union ⋃π∈U V(π) is the set of leaves that are high in the network, by Lemma 4.
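As an illustration of this exhaustive search, the following sketch enumerates all non-empty strict subsets of the vertices that have no incoming arc and returns the corresponding unions of leaf sets. The representation (components as a list of leaf sets, arcs as pairs of component indices) and the name `candidate_high_sets` are assumptions made for the example; the enumeration is exponential in the number of components.

```python
from itertools import combinations


def candidate_high_sets(components, arcs):
    """Yield the leaf set of every non-empty strict subset U of the vertices
    such that no arc (p, q) starts outside U and ends inside U."""
    n = len(components)
    for size in range(1, n):                      # non-empty and strict
        for chosen in combinations(range(n), size):
            U = set(chosen)
            # An arc is incoming when p is outside U and q is inside U.
            if all(p in U or q not in U for p, q in arcs):
                yield frozenset().union(*(components[i] for i in U))


components = [{"a", "b"}, {"c"}, {"d"}]
arcs = {(0, 1), (1, 2)}
candidates = list(candidate_high_sets(components, arcs))
```

In this small example only the sets {a, b} and {a, b, c} survive, since every other subset has an arc entering it.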
A set T of binets and trinets on a set X of taxa is called semi-dense if for each pair
of taxa from X there is at least one binet or trinet that contains both of them. If T is
semi-dense, then we can identify the set of high leaves by the following lemma.
Lemma 5 Let T be a semi-dense set of binets and trinets on a set X of taxa. Let N be a
binary large-cycle rooted level-1 network displaying T. Let H be the set of leaves that
are high in N . Then there is a unique indegree-0 vertex π0 of Ω(T) and H = V (π0).
Proof Since T is semi-dense, for any two leaves h, h′ ∈ H that are below different
cut-arcs leaving the cycle C containing the root of N, there exists a binet or a trinet T
in T containing both h and h′. Then, since T is displayed by N, T has to be a binet or a
trinet where h and h′ are at the same elevation. This implies that there is an edge {h, h′}
in K(T). Then, since there exist at least two different cut-arcs leaving C, the leaves
in H are all in the same connected component of K(T). Then, by Lemma 3, H forms
a connected component of K(T). Hence, H is a vertex of Ω(T). This vertex has
indegree-0 because no trinet or binet T displayed by N has a leaf ℓ ∉ H that is high
in T and a leaf h ∈ H that is low in T. Therefore, H is an indegree-0 vertex of Ω(T).
Moreover, by construction, there is an arc from H to each other vertex of Ω(T).
Hence, H is the unique indegree-0 vertex of Ω(T).
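The shortcut given by Lemma 5 can be sketched as two small checks: whether the input is semi-dense, and whether the digraph has a unique indegree-0 vertex. The sketch assumes that each binet or trinet is represented only by its leaf set and that arcs are pairs of vertex labels; both function names are hypothetical.

```python
from itertools import combinations


def is_semi_dense(leaf_sets, taxa):
    """True if every pair of taxa occurs together in at least one leaf set."""
    covered = set()
    for leaves in leaf_sets:
        covered.update(frozenset(p) for p in combinations(sorted(leaves), 2))
    return all(frozenset(p) in covered for p in combinations(sorted(taxa), 2))


def unique_indegree0(vertices, arcs):
    """Return the only vertex without an incoming arc, or None if there is
    no such vertex or more than one."""
    targets = {q for _, q in arcs}
    sources = [v for v in vertices if v not in targets]
    return sources[0] if len(sources) == 1 else None


dense = is_semi_dense([{"a", "b", "c"}, {"a", "d"}, {"b", "d"}, {"c", "d"}], "abcd")
sparse = is_semi_dense([{"a", "b", "c"}, {"a", "d"}, {"b", "d"}], "abcd")
top = unique_indegree0(["A", "B", "C"], [("A", "B"), ("A", "C")])
```

If `unique_indegree0` returns None on a semi-dense input, Lemma 5 implies that no large-cycle rooted solution exists.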
For the case that the network is tiny-cycle rooted, we define a modified graph K†(T), which is the graph obtained from K(T)
as follows. For each pair of leaves a, b ∈ X , we add an edge {a, b} if there is no such
edge yet and there exists a large-cycle rooted trinet T ∈ T with a, b ∈ L(T ) (i.e. T
is of type S1(x , y; z) or S2(x ; y; z)). The idea behind these extra edges is that if the
network is tiny-cycle rooted and it displays a large-cycle rooted trinet, then all leaves
of this trinet must be in the same pendant subnetwork and hence at the same elevation.
The directed graph Ω†(T) is defined in a similar way as Ω(T) but its vertex set is
the set of connected components of K†(T). Its arc set has, as in Ω(T), an arc (π, π′)
if there is a binet or trinet T ∈ T that is cycle-rooted and has leaves h, ℓ ∈ L(T)
with h ∈ V(π) high in T, ℓ ∈ V(π′) low in T.
Fig. 8 Example for the case that the network is tiny-cycle rooted. From left to right are depicted a set of
trinets T, its graphs K†(T) (solid) and Ω†(T) (dashed) and the resulting network N , obtained by combining
networks N (H ) and N (Low). Note that the two edges labelled † are in K†(T) but not in K(T)
Our approach for this case is to take a non-empty strict subset U of the vertices
of Ω†(T) that has no incoming arcs and to take H to be the union of the elements
of U . Then, a network displaying T can be constructed by combining a network N (H )
displaying T|H and a network N (Low) displaying T|Low, with Low = X \H . (An
example is depicted in Fig. 8). The next lemma shows the correctness of this step.
Lemma 6 Let T be a set of binets and trinets on a set X of taxa. Let N be a binary
tiny-cycle rooted level-1 network displaying T. Then there is a non-empty strict subset U of
the vertices of Ω†(T) such that there is no arc (π, π′) with π′ ∈ U, π ∉ U. Moreover,
if U is any such set of vertices, then there exists a binary tiny-cycle rooted level-1
network N′ displaying T in which ⋃π∈U V(π) is the set of leaves that are high in N′.
Proof Let H denote the set of leaves that are high in N. Then, H is the union of
the vertex sets of one or more connected components of K(T) by Lemma 3. Any
large-cycle rooted trinet which contains a leaf in H and a leaf not in H cannot be
displayed by N because N is tiny-cycle rooted. Hence, H is also the union of one
or more connected components of K†(T). These components form a subset U of the
vertices of Ω†(T). Furthermore, there is no arc (π, π′) with π′ ∈ U, π ∉ U since no
trinet or binet T displayed by N has a leaf that is in H and low in T and a leaf that is
not in H and high in T.
Now consider any nonempty strict subset U of the vertices of Ω†(T) with no
incoming arcs, let H be the union of the vertex sets of the corresponding connected
components of K†(T) and let Low = X\H. Let N(H) be a binary level-1 network
displaying T|H and let N(Low) be a binary level-1 network displaying T|Low. Such
networks exist by Observation 1. Let N′ be the network consisting of vertices ρ, v, r,
arcs (ρ, v), (v, r), (ρ, r), networks N(Low), N(H) and an arc from v to the root
of N(H) and an arc from r to the root of N(Low). Clearly, N′ is a tiny-cycle rooted
level-1 network and H is the set of leaves that are high in N′.
It remains to prove that N′ displays T. First observe that for any h ∈ H and ℓ ∈
X\H, there is no edge {h, ℓ} in K†(T) (because otherwise h and ℓ would lie in the
same connected component). Hence, by construction of K†(T), any binet or trinet
containing h and ℓ cannot be large-cycle rooted and cannot have h and ℓ at the same
elevation, so it must be tiny-cycle rooted. Moreover, in any such binet or trinet, h must
be high and ℓ must be low, because otherwise there would be an arc entering U in Ω†(T).
Consider any trinet or binet T ∈ T. If the leaves of T are all in H or all in Low
then T ∈ T|H or T ∈ T|Low and so T is clearly displayed by N′. If T is a binet
containing one leaf h ∈ H and one leaf ℓ ∈ X\H, then T must be N(ℓ; h) (by the
previous paragraph) and, again, T is clearly displayed by N′. Now suppose that T
contains one leaf h ∈ H and two leaves ℓ, ℓ′ ∈ X\H. Since we have argued in the
previous paragraph that T is tiny-cycle rooted, T must be of the form N2(ℓ, ℓ′; h),
N5(ℓ; ℓ′; h) or N5(ℓ′; ℓ; h). Moreover, since N(Low) displays the binet on ℓ and ℓ′,
and since h is high in T and ℓ, ℓ′ are low, it again follows that T is displayed by N′.
Finally, assume that T contains two leaves h, h′ ∈ H and a single leaf ℓ ∈ X\H.
Then (since T is tiny-cycle rooted) T must be of the form N1(h, h′; ℓ) or N4(h; h′; ℓ).
Since N(H) displays the binet on h and h′, it follows that N′ again displays T.
Note that the proof of Lemma 6 describes how to build a tiny-cycle rooted level-1
network displaying T if such a network exists. Therefore, we assume from now on
that the network to be constructed is large-cycle rooted.
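The construction in the proof of Lemma 6 can be sketched as follows. A network is represented here as a pair (root, list of arcs), with a single leaf represented as (leaf, []); this encoding, the vertex names "rho", "v" and "r", and the function name are assumptions for illustration only. The sketch also assumes that the two input networks use disjoint vertex names that do not clash with the three new vertices.

```python
def combine_tiny_cycle(net_high, net_low):
    """Join N(H) and N(Low) under a root cycle of size three.

    net_high, net_low -- (root, arcs) pairs; vertex names are assumed disjoint
                         and different from "rho", "v" and "r".
    """
    root_high, arcs_high = net_high
    root_low, arcs_low = net_low
    rho, v, r = "rho", "v", "r"
    # The cycle rho -> v -> r and rho -> r; r is the reticulation.
    arcs = [(rho, v), (v, r), (rho, r)]
    arcs.append((v, root_high))   # N(H) hangs off the side of the cycle
    arcs.append((r, root_low))    # N(Low) hangs below the reticulation
    return rho, arcs + list(arcs_high) + list(arcs_low)


root, arcs = combine_tiny_cycle(("a", []), ("b", []))
```

With single leaves a and b as input, the result is the binet in which a is high and b is low.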
The next step is to divide the set H of leaves that are high in N into the leaves that
are “on the left” and the leaves that are “on the right” of the cycle containing the root
or, more precisely, to find the bipartition of H induced by some network displaying a
given set of binets and trinets. We use the following definition.
Definition 1 A bipartition of some set H ⊆ X is called feasible with respect to a set
of binets and trinets T if the following holds:
(F1) If there is a binet or trinet T ∈ T containing leaves a, b ∈ H that have a common
ancestor in T that is not the root of T, then a and b are in the same part of the
bipartition and
(F2) If there is a trinet S1(x , y; z) ∈ T with x , y ∈ H and z ∈ X \H , then x and y
are in different parts of the bipartition.
Note that one part of a feasible bipartition may be empty. The next lemma shows that
the bipartition of H induced by N must be feasible. Hence, to find the right bipartition
we only need to consider feasible ones.
Lemma 7 Let N be a cycle-rooted binary level-1 network, let T ⊆ T(N ), let H be
the set of leaves that are high in N and let {L , R} be the bipartition of H induced
by N . Then {L , R} is feasible with respect to T.
Proof First consider a binet or trinet T ∈ T containing leaves a, b ∈ H that have a
common ancestor in T that is not the root of T . Since N displays T , it follows that a
and b have a common ancestor in N that is not the root of N . Hence, a and b are on
the same side in N . Since {L , R} is the bipartition of H induced by N , it now follows
that a and b are in the same part of the bipartition, as required.
Now consider a trinet S1(x , y; z) ∈ T with x , y ∈ H and z ∈ X \H . Since N
displays T , we have that x and y are not on the same side in N . Since {L , R} is the
bipartition of H induced by N , it follows that x and y are contained in different parts
of the bipartition, as required.
Fig. 9 The graph M(T, H) in solid lines and the graph W(T, H) in dashed lines, for the
set T of trinets in Fig. 5 and H = {a, b, c, d, e, f, g, h, i, m}. A proper 2-colouring of
W(T, H) is to colour {a} and {b} in red and {c, d, e, f, g, h, i, m} in blue (Color figure
online)
We now show how a feasible bipartition of a set H ⊆ X can be found in polynomial
time. We define a graph M(T, H ) = (H, E (M)) with an edge {a, b} ∈ E (M) if
there is a trinet or binet T ∈ T with a, b ∈ L(T ) distinct and in which there is a
common ancestor of a and b that is not the root of T . The idea behind this graph is
that leaves that are in the same connected component of this graph have to be in the
same part of the bipartition.
Now define a graph W(T, H ) = (V (W), E (W)) as follows. The vertex set V (W)
is the set of connected components of M(T, H ) and there is an edge {π, π } ∈ E (W)
precisely if there exists a trinet S1(x , y; z) ∈ T with x ∈ V (π ), y ∈ V (π ) and z ∈
X \H . The purpose of this graph is to ensure that groups of leaves are in different parts
of the bipartition, whenever this is necessary. See Fig. 9 for an example.
Lemma 8 Let T be a set of binets and trinets on X and H ⊆ X . A bipartition {L , R}
of H is feasible with respect to T if and only if
(I) V (π ) ⊆ L or V (π ) ⊆ R for all π ∈ V (W) and
(II) there does not exist {π, π } ∈ E (W) with (V (π ) ∪ V (π )) ⊆ L or (V (π ) ∪
V (π )) ⊆ R.
Proof The lemma follows directly from observing that (F1) holds if and only if (I)
holds and that (F2) holds if and only if (II) holds.
By Lemma 8, all feasible bipartitions can be found by finding all 2-colourings of the
graph W(T, H ). At least one of them is the bipartition induced by a valid solution N
(if one exists) by Lemma 7.
For example, consider the input set of trinets T from Fig. 5. Since T is not
semi-dense, we have to guess which connected components of K(T) form the
set H of leaves that are high in the network (see Sect. 4.3). If we guess H =
{a, b, c, d, e, f, g, h, i, m}, then we obtain the graphs M(T, H ) and W(T, H )
as depicted in Fig. 9. The only possible 2-colouring (up to symmetry) of the
graph W(T, H) is indicated in the figure. From this we can conclude that a and b
are on the same side of the network and that all other high leaves (c, d, e, f, g, h, i, m)
are “on the other side” (i.e., none of them is on the same side as a or b).
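A minimal sketch of this step, under simplifying assumptions: the edges of M(T, H) are given as pairs of leaves that must share a part, and each trinet S1(x, y; z) with z outside H is reduced to the pair (x, y) that must be separated. All leaves in these pairs are assumed to lie in H. The 2-colourings are found by brute force over the components, which is exponential in their number; the names used are hypothetical.

```python
from itertools import product


def components(nodes, edges):
    """Connected components via a small union-find structure."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


def feasible_bipartitions(H, same_pairs, diff_pairs):
    """Return all bipartitions (L, R) of H, up to swapping L and R, that keep
    same_pairs together and separate diff_pairs."""
    comps = components(sorted(H), same_pairs)            # vertices of W
    index = {v: i for i, c in enumerate(comps) for v in c}
    w_edges = {(index[x], index[y]) for x, y in diff_pairs}
    result = []
    for colours in product((0, 1), repeat=len(comps)):
        if colours and colours[0] == 1:
            continue                                      # skip mirror images
        if all(colours[i] != colours[j] for i, j in w_edges):
            L = frozenset(v for v in H if colours[index[v]] == 0)
            result.append((L, frozenset(H) - L))
    return result


found = feasible_bipartitions("abcd", [("c", "d")], [("a", "c"), ("b", "c")])
```

If a pair that must be separated lies inside one component, the corresponding edge of W is a loop and no proper 2-colouring exists, so the function returns an empty list.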
4.5 Finding the Pendant Sidenetworks
The next step is to divide the leaves of each part of the bipartition of the set of high
leaves of the network into the leaves of the pendant sidenetworks. For this, we define
the following graph and digraph.
Let T be a set of binets and trinets on X, let H ⊊ X, let {L, R} be some
bipartition of H that is feasible with respect to T and let S′ ⊆ S ∈ {L, R}. Consider the
graph O(T, S′, H) with vertex set S′ and an edge {a, b} if
– There exists a trinet or binet T ∈ T|S′ with a, b ∈ L(T) that has a cycle that
contains the root or a common ancestor of a and b (or both) or;
– There exists a trinet T ∈ T with L(T) = {a, b, c} with c ∉ H and such that c is
low in T and a and b are high in T and both in the same pendant sidenetwork of T
or;
– T1(a, b; c) ∈ T|S′ for some c ∈ S′.
The directed graph D(T, S′, H) (possibly having loops) has a vertex for each
connected component of O(T, S′, H) and it has an arc (π1, π2) (possibly, π1 = π2)
precisely if there is a trinet in T of the form S2(x; y; z) with x ∈ V(π1), y ∈ V(π2)
and z ∉ H.
For example, Fig. 10 shows the set of trinets from Fig. 5 restricted to the set S =
R = {c, d, e, f, g, h, i, m}. The corresponding graphs O(T, R, H ) and D(T, R, H ),
with H = {a, b, c, d, e, f, g, h, i, m}, are depicted in Fig. 11.
The following lemma shows that, if the digraph D(T, S′, H) has no indegree-0
vertex, there exists no binary level-1 network displaying T in which H is the set of
high leaves and all leaves in S′ are on the same side.
Lemma 9 Let T be a set of binets and trinets on X, let H ⊊ X, let {L, R} be a
bipartition of H that is feasible with respect to T and let S′ ⊆ S ∈ {L, R}. If the graph
D(T, S′, H) has no indegree-0 vertex, then there exists no binary level-1 network N
that displays T in which H is the set of high leaves and all leaves in S′ are on the
same side.
Proof Suppose that there exists such a network N. Let {L, R} be the bipartition of H
induced by N and suppose without loss of generality that L ∩ S′ ≠ ∅. Let L1, . . . , Lq
be the partition of L induced by the pendant sidenetworks of N, ordered from the
nearest to the farthest from the root. Let i be the first index for which Li ∩ S′ ≠ ∅.
Then, by the definition of O(T, S′, H), Li ∩ S′ is the union of one or more connected
Fig. 11 The graph O(T, R, H )
in solid lines and the
digraph D(T, R, H ) in dashed
lines, with R and H as in Fig. 10
Let T, X, H, L and R be as above. We present a sidenetwork partitioning
algorithm, which proceeds as follows for each S ∈ {L , R}. Choose one indegree-0
vertex of D(T, S, H ) and call it S1. This will be the set of leaves of the first
pendant sidenetwork on side S. Then, construct the graph O(T, S\S1, H ) and digraph
D(T, S\S1, H ), select an indegree-0 vertex and call it S2. Continue like this, i.e.
let Si be an indegree-0 vertex of D(T, S\(S1 ∪ . . . ∪ Si−1), H ), until an empty graph
or a digraph with no indegree-0 vertex is obtained. In the latter case, there is no
valid solution (under the given assumptions) by Lemma 9. Otherwise, we obtain sets
L1, . . . , Lq and R1, . . . , Rq′ containing the leaves of the pendant sidenetworks on
both sides.
In the example in Fig. 11, the only indegree-0 vertex of D(T, R, H) is {c, d}. Hence,
we have R1 = {c, d}. Since O(T, R\{c, d}, H) is connected, R2 = {e, f, g, h, i, m}
follows.
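The iterative peeling can be sketched as below. To keep the example short, the edges of O(T, S′, H) are supplied by a caller-provided function of the remaining leaf set (because the third edge rule depends on S′), and each trinet S2(x; y; z) with z outside H is reduced to the pair (x, y). These input conventions and the function names are assumptions, and the sketch always picks the first indegree-0 component it finds.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, by depth-first search."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def partition_side(S, edges_of_O, s2_pairs):
    """Return the leaf sets S1, S2, ... of the pendant sidenetworks on one
    side, ordered from the root downwards, or None if some digraph D has no
    indegree-0 vertex."""
    remaining, parts = set(S), []
    while remaining:
        comps = components(sorted(remaining), edges_of_O(remaining))
        index = {v: i for i, c in enumerate(comps) for v in c}
        # Arcs of D, including loops, restricted to the remaining leaves.
        has_incoming = {index[y] for x, y in s2_pairs
                        if x in remaining and y in remaining}
        free = [c for i, c in enumerate(comps) if i not in has_incoming]
        if not free:
            return None
        parts.append(free[0])
        remaining -= free[0]
    return parts


base_edges = [("c", "d"), ("e", "f")]


def edges_of_O(rem):
    return [(a, b) for a, b in base_edges if a in rem and b in rem]


order = partition_side("cdef", edges_of_O, [("c", "e")])
```

A loop in D, arising from an S2 pair inside one component, counts as an incoming arc here, so such a component is never selected.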
4.6 Constructing the Network
We build a binary level-1 network N∗ based on the sets H, L1, . . . , Lq, R1, . . . , Rq′
as follows. Let N(Li) be a binary level-1 network displaying T|Li for i = 1, . . . , q
and let N(Ri) be a binary level-1 network displaying T|Ri for i = 1, . . . , q′ (note
that it is possible that one of q and q′ is 0). We can build these networks recursively,
and they exist by Observation 1. In addition, we recursively build a network N(Low)
displaying T|Low with Low = X\H. Now we combine these networks into a single
network N∗ as follows. We create a root ρ, a reticulation r, and two directed paths
(ρ, u1, . . . , uq, r), (ρ, v1, . . . , vq′, r) from ρ to r (if q = 0 (respectively q′ = 0)
then there are no internal vertices on the first (resp. second) path). Then we add an
arc from ui to the root of N(Li), for i = 1, . . . , q, we add an arc from vi to the root
of N(Ri) for i = 1, . . . , q′ and, finally, we add an arc from r to the root of N(Low).
This completes the construction of N∗. For an example, see Fig. 2.
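The assembly of N∗ can be sketched with networks represented as (root, list of arcs) pairs, a single leaf being (leaf, []). This encoding, the generated vertex names and the function name are assumptions for illustration. The sketch further assumes that the recursively built networks use disjoint vertex names and that at least one of the two sides is non-empty, since otherwise the two paths would collapse into parallel arcs.

```python
def build_cycle_rooted(left_nets, right_nets, low_net):
    """Combine N(L1..Lq), N(R1..Rq') and N(Low) around a root cycle.

    left_nets, right_nets -- lists of (root, arcs), ordered from the root down
    low_net               -- (root, arcs) of the network below the reticulation
    """
    rho, r = "rho", "r"
    arcs = []

    def add_path(prefix, nets):
        prev = rho
        for i, (sub_root, sub_arcs) in enumerate(nets, start=1):
            node = f"{prefix}{i}"             # u1, u2, ... or v1, v2, ...
            arcs.append((prev, node))         # next vertex on the path
            arcs.append((node, sub_root))     # pendant sidenetwork
            arcs.extend(sub_arcs)
            prev = node
        arcs.append((prev, r))                # the path ends in the reticulation

    add_path("u", left_nets)
    add_path("v", right_nets)
    low_root, low_arcs = low_net
    arcs.append((r, low_root))
    arcs.extend(low_arcs)
    return rho, arcs


root, arcs = build_cycle_rooted([("a", [])], [("b", []), ("c", [])], ("d", []))
```

In the example, a is alone on the left side, b lies above c on the right side, and d is the only low leaf.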
We now prove that the network N ∗ constructed in this way displays the input trinets,
assuming that there exists some solution that has H as its set of high leaves and {L , R}
as the bipartition of H induced by it.
Lemma 10 Let T be a set of binets and trinets, let N be a cycle-rooted binary level-1
network displaying T, let H be the set of leaves that are high in N and let {L , R}
be the feasible bipartition of H induced by N . Then the binary level-1 network N ∗
constructed above displays T.
Proof The proof is by induction on the number |L(T)| of leaves in T. The induction
basis for |L(T)| ≤ 2 is trivial. Hence, assume |L(T)| ≥ 3.
For each pendant subnetwork N′ of N∗ with leaf-set X′, there exists a binary
level-1 network displaying T|X′ by Observation 1. Hence, the network N′ that has been
computed recursively by the algorithm displays T|X′ by induction. It follows that any
trinet or binet whose leaves are all in the same pendant subnetwork of N∗ is displayed
by N∗. Hence, it remains to consider binets and trinets containing leaves in at least
two different pendant subnetworks of N∗.
Let B ∈ T be a binet on leaves that are in two different pendant subnetworks
of N ∗. If B = N (y; x ) then, because B is displayed by N , y is low in N and hence
also low in N ∗. Since x and y are in different pendant subnetworks of N ∗, it follows
that x is high in N ∗ and hence B is displayed by N ∗. If B = T (x , y) then there is an
edge {x , y} in K(T) and hence x and y are at the same elevation in N ∗. Since x and y
are in different pendant subnetworks, both must be high in N ∗ and it follows that N ∗
displays B.
Now consider a trinet T ∈ T on leaves x , y, z that are in at least two different
pendant subnetworks. At least one of x , y, z is high in N ∗ because otherwise all three
leaves would be in the same pendant subnetwork N (Low), with Low = X \H . We
now consider the different types of trinet that T can be.
First suppose that T = T1(x , y; z). Then x , y, z form a clique in K(T) and hence
all of x , y and z are high in N ∗. Moreover, by feasibility, x and y are in the same part
of the bipartition {L , R} and hence on the same side in N ∗. If x and y are in the same
pendant subnetwork then the binet T |{x , y} = T (x , y) is displayed by this pendant
subnetwork. Hence, in that case, T is clearly displayed by N ∗. Now assume that x
and y are in different pendant subnetworks and assume without loss of generality
that x , y ∈ R. If z ∈ L then, again, T is clearly displayed by N ∗. Hence assume
that x, y, z ∈ R. Then, for each set R′ ⊆ R containing x, y, z, the graph O(T, R′, H)
has an edge between x and y. Hence, either x and y are in the same pendant sidenetwork,
or z is in a pendant sidenetwork above the pendant sidenetworks that contain x and y.
Hence, T is displayed by N ∗.
Now suppose that T ∈ {N1(x , y; z), N4(x ; y; z)}. Then there is an edge {x , y}
in K(T) and hence x and y are at the same elevation in N ∗. First note that x , y and z
are not all high in N ∗ because otherwise x , y and z would all be in the same part S
of the bipartition {L , R} by feasibility and in the same pendant sidenetwork because
they form a clique in O(T, S, H ). Hence, z is not at the same elevation as x and y and
hence z is not in the same connected component of K(T) as x and y. Then there is an
arc (π, π′) in Ω(T) with π the component containing x and y and π′ the component
containing z. Hence x and y are high in N∗ and z is low in N∗ (since π′ has indegree
greater than zero). Then, x and y are in the same part S of the bipartition {L, R} by
feasibility and in the same pendant subnetwork of N∗ because there is an edge {x, y}
in O(T, S, H). Hence, since the binet T|{x, y} is displayed by the pendant subnetwork
containing x and y, we conclude that T is displayed by N ∗.
Now suppose that T = S1(x , y; z). We can argue in the same way as in the previous
case that x and y are high in N ∗ and that z is low in N ∗. By feasibility, x and y are in
different parts of the bipartition {L , R} and, hence, N ∗ displays T .
Now suppose that T ∈ {N2(x , y; z), N5(x ; y; z)}. Then we can argue as before
that x and y are at the same elevation in N ∗ and that z is not at the same elevation as x
and y and hence that z is not in the same connected component of K(T) as x and y.
Then there is an arc (π, π′) in Ω(T) with π the component containing z and π′ the
component containing x and y. Hence, z is high in N∗ and x and y are low in N∗.
Since the binet T |{x , y} is displayed by N (Low), we conclude that T is displayed
by N ∗.
Now suppose that T = N3(x ; y; z). Observe that x , y, z are all high in N3(x ; y; z)
because this trinet is not cycle-rooted. Therefore, x , y, z form a clique in K(T) and
hence all of x , y and z are high in N ∗. Moreover, by feasibility, x and y are in the
same part of the bipartition {L , R}, say in R, and hence on the same side in N ∗. First
suppose that z ∈ L. Then, T|R contains the binet T |{x , y} which is cycle-rooted.
Hence, there is an edge {x , y} in O(T, R, H ) and x and y are in the same pendant
sidenetwork. Since T |{x , y} is displayed by this pendant subnetwork, it follows that T
is displayed by N ∗. Now assume that z ∈ R. Then the trinet T is in T|R and has a
common ancestor of x and y contained in a cycle. Hence, as before, x and y are in
the same pendant sidenetwork of N ∗ and, since T |{x , y} is displayed by that pendant
sidenetwork, it follows that T is displayed by N ∗.
Finally, suppose that T = S2(x ; y; z). As in the case T ∈ {N1(x , y; z), N4(x ; y; z)},
we can argue that x and y are high in N ∗ and that z is low in N ∗. Then, by feasibility, x
and y are on the same side S in N ∗. First suppose that x and y are in the same pendant
sidenetwork of N ∗. Consider an iteration i of the sidenetwork partitioning algorithm
with x , y ∈ S\(S1 ∪ . . . ∪ Si−1). Then there is an arc (π1, π2) in D(T, S\(S1 ∪
. . . ∪ Si−1), H ) with x ∈ V (π1) and y ∈ V (π2) (possibly π1 = π2). Hence, Si does
not contain y because π2 does not have indegree-0. It follows that x and y are in
different sidenetworks and that the sidenetwork containing x is above the sidenetwork
containing y. Hence, N ∗ displays T , which concludes the proof of the lemma.
See Algorithm 1 for the pseudocode of the algorithm and Table 1 for an overview
of the definitions of the graphs used in the algorithm. Note that Lemma 5 shows
correctness of Lines 14-16, which speed up the algorithm significantly in the case that
the input is semi-dense.
Theorem 2 There exists an O(3^|X| poly(|X|)) time algorithm that constructs a binary
level-1 network N displaying a given set T of binets and trinets on a taxon set X, if
such a network exists.
Proof If the graph R(T) is disconnected and has connected components C1, . . . , Cp,
then we recursively compute binary level-1 networks N1, . . . , Np displaying T|C1,
. . . , T|Cp respectively. Then, by Lemma 1, T is displayed by the binary
level-1 network N obtained by creating a root ρ and adding arcs from ρ to the roots
of N1, . . . , Np, and refining the root ρ in order to make the network binary.
If R(T) is connected, then any binary level-1 network N displaying T is
cycle-rooted by Lemma 2. If there exists such a network that is tiny-cycle rooted, then we
can find such a network by Lemma 6.
Otherwise, we can “guess”, using Lemma 4, a set of leaves H such that there exists
some binary level-1 network N displaying T in which H is the set of leaves that are
high. Moreover, using Lemma 8, we can “guess” a feasible bipartition {L, R} of H with
respect to T by “guessing” a proper 2-colouring of the graph W(T, H). The total
number of possible guesses for the tripartition {L, R, X\H} is at most 3^|X|.
If there exists a binary level-1 network N displaying T then, by Lemma 10, there
exists some tripartition (L , R, X \H ) for which network N ∗ from Lemma 10 displays
all binets and trinets in T.
Table 1 Overview of the graphs used by Algorithm 1 (with S′ ⊆ S ∈ {L, R})
K(T): An edge {a, b} if there exists T ∈ T with a, b ∈ L(T)
and in which a and b are at the same elevation in T
M(T, H): An edge {a, b} if there exists T ∈ T with a, b ∈ L(T)
distinct and in which there is a common ancestor of a
and b that is not the root of T
O(T, S′, H): An edge {a, b} if
–There exists T ∈ T|S′ with a, b ∈ L(T) that has a cycle that
contains the root or a common ancestor of a and b (or both) or
–There exists T ∈ T with L(T) = {a, b, c} with c ∉ H
and such that c is low in T and a and b are high in T
and both in the same pendant sidenetwork of T or
–T1(a, b; c) ∈ T|S′ for some c ∈ S′
It remains to analyse the running time. Each recursive step takes O(3^|X| poly(|X|))
time and the number of recursive steps is certainly at most |X|, leading to
O(3^|X| poly(|X|)) in total since, by Observation 1, the various recursive steps are
independent of each other.
Note that the running time analysis in the proof of Theorem 2 is pessimistic since,
by Lemma 3, the set H of high leaves must be the union of a subset of the vertices
of Ω (T) with no incoming arcs. Moreover, the number of feasible bipartitions of H
is also restricted because each such bipartition must correspond to a 2-colouring of
the graph W (T, H ). Hence, the number of possible guesses is restricted (but still
exponential).
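The bound of 3^|X| on the number of guesses simply counts the ways of assigning each taxon to one of L, R or X\H. The following sketch makes this count explicit by enumerating all such assignments for a small taxon set; it is a naive illustration of the upper bound and does not use the restrictions coming from Ω(T) and W(T, H). The function name is hypothetical.

```python
from itertools import product


def tripartitions(taxa):
    """Yield every assignment of the taxa to the parts L, R and Low."""
    taxa = sorted(taxa)
    for labels in product(("L", "R", "Low"), repeat=len(taxa)):
        parts = {"L": set(), "R": set(), "Low": set()}
        for leaf, label in zip(taxa, labels):
            parts[label].add(leaf)
        yield parts["L"], parts["R"], parts["Low"]


count = sum(1 for _ in tripartitions("abc"))
```

For three taxa this gives 3^3 = 27 assignments, counting (L, R) and (R, L) separately.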
We conclude this section by extending Theorem 2 to instances containing networks
with arbitrarily many leaves.
Algorithm 1: Constructing a binary level-1 network displaying a given set T
of binets and trinets, if such a network exists
1 Step 1: Determine if the network is cycle-rooted
4 Recurse on the connected components.
9 else
10
11
return ∅.
// The network is cycle-rooted;
Step 2: Find the high leaves
if there exists a non-empty strict subset U of the vertices of Ω†(T) with no incoming arcs then
    // The network is tiny-cycle rooted;
    Set H = ⋃π∈U V(π);
    Construct a network N on X by combining a network N(H) displaying T|H and a
    network N(Low) displaying T|(X\H) as detailed in Lemma 6; return N;
// The network is large-cycle rooted;
if T is semi-dense then
    if there is a unique indegree-0 vertex π0 of Ω(T) then
        Set H = V(π0) and go to line 24;
for all non-empty strict subsets U of the vertices of Ω(T) with no incoming arcs do
    Set H = ⋃π∈U V(π);
    Step 3: Separate the left and the right leaves
    Find all feasible bipartitions of H by finding all 2-colourings of W(T, H);
    if there exists at least one feasible bipartition then
        for each such bipartition {L, R} do
            Step 4: Find the pendant sidenetworks
            Apply the sidenetwork partitioning algorithm described in Sect. 4.5;
            if the sidenetwork partitioning algorithm does not find a D(T, S′, H)
            without indegree-0 vertex then
                Construct a network N as described in Sect. 4.6; return N;
32 return ∅.
Corollary 1 There exists an O(3^|X| poly(|X|)) time algorithm that constructs a
binary level-1 network N displaying a given set 𝒩 of binary level-1 networks, if
such a network exists.
Proof We apply Theorem 2 to the set T(𝒩) of binets and trinets displayed by the
networks in 𝒩. To check that the resulting network N displays 𝒩, consider a
network N′ ∈ 𝒩. Since binary level-1 networks are encoded by their trinets [11], any
binary level-1 network on L(N′) displaying T(N′) is equivalent to N′. Hence, N|L(N′) is
equivalent to N′. Therefore, N′ is displayed by N. Since |T(𝒩)| = O(|X|^3), the running
time is O(3^|X| poly(|X|)) as in Theorem 2.
Recall that a network is a tiny-cycle network if each cycle consists of exactly three
vertices. It is easy to see that each tiny-cycle network is a level-1 network. Note that all
binary level-1 binets and trinets except for S1(x , y; z) and S2(x ; y; z) are tiny-cycle
networks. We prove the following.
Theorem 3 Given a set T of tiny-cycle binets and tiny-cycle trinets, we can decide
in polynomial time if there exists a binary level-1 network displaying T and construct
such a network if it exists.
Proof Let N be a binary level-1 network displaying T. If N is not a tiny-cycle network,
then we construct a tiny-cycle network N′ from N as follows (see Fig. 12 for an
illustration). For each cycle of N consisting of internally vertex-disjoint directed paths
(s, v1, . . . , vn, t) and (s, w1, . . . , wm, t) with n + m ≥ 2, do the following. Delete
arcs (vn, t) and (wm, t), suppress vn and wm, add vertices q and r and arcs (q, r),
(r, t), (q, t) and (r, s). Finally, if s is not the root of N, let p be the parent of s in N and
replace arc (p, s) by an arc (p, q). Let N′ be the obtained network. It is easy to verify
that any binary tiny-cycle network that is displayed by N is also displayed by N′ and
that N′ is a tiny-cycle network. Hence, we may restrict our attention to constructing
tiny-cycle networks.
The only two cases to consider are that the network to be constructed is not
cycle-rooted and that it is tiny-cycle rooted. By Lemmas 1 and 2, we can deal with the first
case in the same way as in the polynomial-time algorithm for binets from Sect. 3
with R(T) instead of Rb(B). By Lemma 6, we can deal with the second case in the
same way as in the polynomial-time algorithm for binets with Ω†(T) instead of Ωb(B)
(and hence K†(T) instead of Kb(B)).
Note that Theorem 3 applies to sets of binets and trinets that do not contain any
trinets of the form S1(x, y; z) or S2(x; y; z). The following corollary generalises this
theorem to general instances of tiny-cycle networks. It follows from Theorem 3 in the
same way as Corollary 1 follows from Theorem 2.
Fig. 12 Transformation from a binary level-1 network N to a tiny-cycle network N′, used in the proof of
Theorem 3 (with n = 3 and m = 2)
Corollary 2 Given a set N of tiny-cycle networks, we can decide in polynomial time
if there exists a binary level-1 network displaying N and construct such a network if
it exists.
In this section, we show that it is NP-hard to construct a binary level-1 network from a
nondense set of trinets. The reduction is a nontrivial adaptation of the reduction given
by Jansson, Nguyen and Sung [19] for deciding if there exists a level-1 network that is
consistent with a given set of triplets in the following sense. A network N is consistent
with a triplet T if N contains a subgraph that is a subdivision of T . The notions of
triplet consistency and trinet display are fundamentally different in networks (while
they are the same in trees). In particular, a network displays only one trinet on three
taxa but may be consistent with two distinct triplets on these three taxa (for example,
network N in Fig. 13 is consistent with triplets T1(b, x1; z1) and T1(b, z1; x1) but the
only trinet on these taxa that N displays is S1(x1, z1; b)). Consequently, Theorem 4
does not follow directly from the result in [19].
Because trinets provide more information than triplets, one might hope that
constructing a network from trinets is computationally easier than from triplets. However,
Theorem 4 shows that this is (in the considered setting) not the case. The proof uses
a reduction from SetSplitting, which is similar to the reduction in [19]. However,
with trinets it is much more difficult to enforce a simple structure (a cycle with one
leaf below it and all other leaves on its left and right sides) without enforcing on which
side each leaf is. To achieve this, the reduction uses three different types of trinets,
and it is not clear if the problem remains NP-hard if only trinets of type S1(x , y; z) or
only trinets of type S2(x ; y; z) are present.
Theorem 4 Given a set of trinets T, it is NP-hard to decide if there exists a binary
level-1 network N displaying T. In addition, it is NP-hard to decide if there exists such
a network with a single reticulation.
Proof We reduce from the NP-hard problem SetSplitting [6].
The SetSplitting problem is defined as follows. Given a set U and a collection C
of size-3 subsets of U, decide if there exists a bipartition of U into sets A and B such
that for each C ∈ C we have C ∩ A ≠ ∅ and C ∩ B ≠ ∅. If such a bipartition exists,
then we call it a set splitting for C.
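The defining condition can be checked directly, as in the following sketch; the function name and the use of Python sets are choices made for the example. It only verifies a given bipartition and does not search for one.

```python
def is_set_splitting(A, B, collection):
    """True if every set in the collection meets both A and B."""
    A, B = set(A), set(B)
    return all(bool(set(C) & A) and bool(set(C) & B) for C in collection)


collection = [{"x", "y", "z"}, {"q", "x", "z"}]
ok = is_set_splitting({"q", "z"}, {"x", "y"}, collection)
bad = is_set_splitting({"x", "y", "z"}, {"q"}, collection)
```

The first bipartition splits both sets, whereas the second leaves {x, y, z} entirely inside A.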
The reduction is as follows. For an example see Fig. 13. Assume that C =
{C1, . . . , Ck}, with k ≥ 1, and furthermore that the elements of U are totally ordered
(by an operation <). We create a taxon set X and a trinet set T on X as follows. For
each u ∈ U, put a taxon u0 in X. In addition, add a special taxon b to X. Then, for
each Ci = {u, v, w} with u < v < w, add taxa ui, ui′, vi, vi′, wi, wi′ to X and add
the following trinets to T: T1(vi, vi′; ui), T1(wi, wi′; vi), T1(ui, ui′; wi), S2(vi; vi′; b),
S2(wi; wi′; b), S2(ui; ui′; b), S1(ui, u0; b), S1(vi, v0; b), S1(wi, w0; b), so that T
comprises three different types of trinets. This completes the reduction.
Fig. 13 An example of the reduction if the input of the SetSplitting problem is C = {C1, C2} with C1 = {x, y, z} and C2 = {q, x, z}
(with q < x < y < z). A valid set splitting is {{q, z}, {x, y}} and the network N indicated in the figure
displays all trinets produced by the reduction
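The construction of X and T can be sketched in Python as follows. The encoding is an assumption made for illustration only: taxa are strings such as `u_0`, `u_2` and `u_2'` (produced by the hypothetical helper `taxon`), and a trinet is a tuple `(type, x, y, z)`.

```python
def taxon(element, index, primed=False):
    """Name a taxon, e.g. taxon("u", 2, True) gives "u_2'" (naming is illustrative)."""
    return f"{element}_{index}" + ("'" if primed else "")


def build_instance(universe, collection):
    """Return (X, T) for a SetSplitting instance whose elements are totally ordered."""
    X = {"b"} | {taxon(u, 0) for u in universe}
    T = []
    for i, constraint in enumerate(collection, start=1):
        u, v, w = sorted(constraint)  # u < v < w
        for e in (u, v, w):
            X.update({taxon(e, i), taxon(e, i, True)})
        # Tree trinets T1(v_i, v_i'; u_i), T1(w_i, w_i'; v_i), T1(u_i, u_i'; w_i)
        for cherry, outgroup in ((v, u), (w, v), (u, w)):
            T.append(("T1", taxon(cherry, i), taxon(cherry, i, True), taxon(outgroup, i)))
        # Trinets S2(e_i; e_i'; b) for each element e of the constraint
        for e in (v, w, u):
            T.append(("S2", taxon(e, i), taxon(e, i, True), "b"))
        # Trinets S1(e_i, e_0; b) for each element e of the constraint
        for e in (u, v, w):
            T.append(("S1", taxon(e, i), taxon(e, 0), "b"))
    return X, T


X, T = build_instance({"q", "x", "y", "z"}, [{"x", "y", "z"}, {"q", "x", "z"}])
print(len(X), len(T))
```

Each constraint contributes six taxa and nine trinets, so the instance has 1 + |U| + 6k taxa and 9k trinets, which is polynomial in the size of the SetSplitting input.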
First we show that, if there exists a set splitting for C, then there exists a network N
on X that displays T and has exactly one reticulation. Let {A, B} be a set splitting
for C. We construct a network N whose root is in a cycle and in which each arc leaving this
cycle ends in a leaf. Hence, N has precisely one reticulation, whose child is a leaf.
Label this leaf by taxon b. Let {L, R} be the bipartition of X\{b} induced by N. Then,
for each u ∈ A we put u0 in L and ui and ui′ for i ≥ 1 in R. Symmetrically, for
each u ∈ B we put u0 in R and ui and ui′ for i ≥ 1 in L. It remains to describe the
order of the leaves on each side of the network. For each i ∈ {1, . . . , k} and for any
two leaves xi, yi that are on the same side in N, put xi above yi and yi′ if x < y (and
put yi above xi and xi′ if y < x). In addition, put each xi above xi′. The ordering can
be completed arbitrarily. For an example, see the network in Fig. 13. To see that N
displays T, first observe that all trinets of the form S2(xi; xi′; b) are displayed by N
because we put xi above xi′ and on the same side. In addition, all trinets of the form
S1(xi, x0; b) are also displayed by N because xi and x0 are on opposite sides of N.
Now consider a constraint Ci = {u, v, w} ∈ C with u < v < w. Since {A, B} is a
set splitting, |Ci ∩ A| = 2 or |Ci ∩ B| = 2. Suppose that u and v are in the same set,
say u, v ∈ A. The other two cases can be dealt with in a similar manner. It then follows
that w ∈ B and hence that ui, ui′, vi, vi′ ∈ R and wi, wi′ ∈ L. It then follows that trinets
T1(wi, wi′; vi) and T1(ui, ui′; wi) are displayed by N. Moreover, since u < v, we have
that ui is above vi and vi′. Hence, trinet T1(vi, vi′; ui) is also displayed by N. We
conclude that T is displayed by N.
It remains to show that if there exists a network on X that displays T, then there exists
a set splitting {A, B} for C. So assume that there exists a network N on X that displays T.
From Lemma 2 it follows that N is cycle-rooted. By Lemma 3, all leaves in X\{b} are
at the same elevation in N. Consequently, by Lemma 4, the leaves in X\{b} are all high
in N and b is low in N. Let {L, R} be the bipartition of X\{b} induced by N. Define
A = {u ∈ U | u0 ∈ L} and B = {u ∈ U | u0 ∈ R}. We claim that A and B form a set
splitting for C. To show this, assume the contrary, i.e., that there exists some constraint
Ci = {u, v, w} ∈ C, with u < v < w, such that either u, v, w ∈ A or u, v, w ∈ B.
Assume without loss of generality that u, v, w ∈ A. Then u0, v0, w0 ∈ L. Hence,
ui, vi, wi ∈ R since the trinets S1(ui, u0; b), S1(vi, v0; b) and S1(wi, w0; b) are
contained in T. Then, since S2(ui; ui′; b) ∈ T, we obtain that ui′ ∈ R. Moreover, ui and ui′
are in different sidenetworks of N. Then, since T1(ui, ui′; wi) ∈ T, the sidenetwork
containing wi is strictly above the sidenetwork containing ui. However, it follows in
the same way from trinets S2(vi; vi′; b) and T1(vi, vi′; ui) that the sidenetwork of N
containing ui is strictly above the sidenetwork containing vi. Furthermore, it follows
from the trinets S2(wi; wi′; b) and T1(wi, wi′; vi) that the sidenetwork containing vi is
strictly above the sidenetwork containing wi. These three relations are cyclic, which is
impossible for sidenetworks on the same side of N. Hence, we have obtained a contradiction.
It follows that A and B form a set splitting of C, completing the proof.
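The contradiction at the end of the proof is a cyclic chain of "strictly above" requirements. As a small illustration, the following Python sketch tests whether a set of such requirements admits any linear order, using the standard-library `graphlib` module (Python 3.9 or later). The function name `is_realizable` and the pair encoding are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_realizable(above_pairs):
    """Return True if some top-to-bottom order satisfies every (upper, lower) pair."""
    graph = {}
    for upper, lower in above_pairs:
        graph.setdefault(lower, set()).add(upper)  # lower must come after upper
        graph.setdefault(upper, set())
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


# Requirements derived in the proof when u, v, w all lie in A:
# w_i above u_i, u_i above v_i, and v_i above w_i.
cyclic = [("w_i", "u_i"), ("u_i", "v_i"), ("v_i", "w_i")]
print(is_realizable(cyclic))
```

Dropping any one of the three requirements leaves a chain that can be ordered, which matches the forward direction of the proof, where one element of each constraint lies on the opposite side.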
7 Concluding Remarks
We have presented an exponential time algorithm for determining whether or not
an arbitrary set of binets and trinets is displayed by a level-1 network, shown that
this problem is NP-hard, and given some polynomial time algorithms for solving it
in certain special instances. It would be interesting to know whether other special
instances are also solvable in polynomial time (for example, when either S1(x, y; z)-type
or S2(x; y; z)-type trinets are excluded).
We note that the problem of deciding if a set of binets and trinets is displayed by
a level-1 network remains NP-hard when the input set is semi-dense, i.e. for each
combination of two taxa it contains at least one binet or trinet containing those two
taxa. Although the trinet set produced in the proof of Theorem 4 is not semi-dense,
it is not difficult to make it semi-dense without affecting the reduction, by adding a
binet T(x, y) for all x, y ∈ X\{b}.
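The padding step can be illustrated with a short Python sketch that only tracks the leaf sets of the input networks. The functions `is_semi_dense` and `pad_with_binets` are hypothetical names, and the example uses the nine trinets obtained from a single constraint {u, v, w}.

```python
from itertools import combinations


def is_semi_dense(taxa, nets):
    """Return True if every pair of taxa occurs together in some leaf set in `nets`."""
    covered = set()
    for leaves in nets:
        covered.update(frozenset(p) for p in combinations(sorted(leaves), 2))
    return all(frozenset(p) in covered for p in combinations(sorted(taxa), 2))


def pad_with_binets(taxa, nets, excluded="b"):
    """Add the leaf set {x, y} of a binet for all distinct x, y other than `excluded`."""
    others = sorted(t for t in taxa if t != excluded)
    return list(nets) + [frozenset(p) for p in combinations(others, 2)]


# Leaf sets of the nine trinets produced for one constraint {u, v, w}.
nets = [
    {"v_1", "v_1'", "u_1"}, {"w_1", "w_1'", "v_1"}, {"u_1", "u_1'", "w_1"},
    {"v_1", "v_1'", "b"}, {"w_1", "w_1'", "b"}, {"u_1", "u_1'", "b"},
    {"u_1", "u_0", "b"}, {"v_1", "v_0", "b"}, {"w_1", "w_0", "b"},
]
taxa = set().union(*nets)
print(is_semi_dense(taxa, nets), is_semi_dense(taxa, pad_with_binets(taxa, nets)))
```

Pairs involving b are already covered by the S1 and S2 trinets, so only pairs within X\{b} need the additional binets.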
As mentioned in the introduction, our algorithms can be regarded as a supernetwork
approach for constructing phylogenetic networks. It is therefore worth noting that our
main algorithms extend to the case where the input consists of a collection of binary
level-1 networks, where each input network is allowed to have any number of leaves.
Furthermore, we have shown that constructing a binary level-1 network from a set of
trinets is NP-hard in general. One could instead consider the problem of constructing
such networks from networks on (at least) m leaves, 3 ≤ m ≤ |X| (or m-nets for short).
However, it can be shown that it is also NP-hard to decide if a set of level-1 m-nets is
displayed by a level-1 network or not. This can be shown by a simple reduction from
the problem for trinets.
It would also be of interest to develop algorithms to reconstruct level-k networks
for k ≥ 2 from m-nets (a binary network is said to be level-k if every biconnected
component contains at most k reticulations [4]). Note that the trinets displayed by
a level-2 network always encode the network [16], but that in general trinets do not
encode level-k networks [9]. Therefore, in light also of the results in this paper, we
anticipate that these problems might be quite challenging in general. Even so, it could
still be very useful to develop heuristics for tackling such problems as this has proven
very useful in both supertree and phylogenetic network construction.
Acknowledgements We thank the anonymous reviewers for their helpful comments. Leo van Iersel was
partially supported by a Veni Grant of the Netherlands Organization for Scientific Research (NWO).
Katharina Huber and Celine Scornavacca thank the London Mathematical Society for partial support in the context
of their Computer Science Small Grant Scheme (SC7-1314-04). This publication is the contribution No.
2015-166 of the Institut des Sciences de l’Evolution de Montpellier (ISE-M, UMR 5554). This work has been
partially funded by the French Agence Nationale de la Recherche, Investissements d’avenir/Bioinformatique
(ANR-10-BINF-01-02, Ancestrome).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Aho, A.V., Sagiv, Y., Szymanski, T.G., Ullman, J.D.: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10(3), 405-421 (1981)
2. Bapteste, E., van Iersel, L.J.J., Janke, A., Kelchner, S., Kelk, S.M., McInerney, J.O., Morrison, D.A., Nakhleh, L., Steel, M., Stougie, L., Whitfield, J.: Networks: expanding evolutionary thinking. Trends Genet. 29(8), 439-441 (2013)
3. Cardona, G., Llabrés, M., Rosselló, F., Valiente, G.: Metrics for phylogenetic networks I: generalizations of the Robinson-Foulds metric. IEEE/ACM Trans. Comput. Biol. Bioinform. 6(1), 46-61 (2009)
4. Choy, C., Jansson, J., Sadakane, K., Sung, W.-K.: Computing the maximum agreement of phylogenetic networks. Theor. Comput. Sci. 335(1), 93-107 (2005)
5. Gambette, P., Huber, K.T.: On encodings of phylogenetic networks of bounded level. J. Math. Biol. 65(1), 157-180 (2012)
6. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co. (1979)
7. Gusfield, D., Eddhu, S., Langley, C.: Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J. Bioinform. Comput. Biol. 2, 173-213 (2004)
8. Holland, B., Conner, G., Huber, K.T., Moulton, V.: Imputing supertrees and supernetworks from quartets. Syst. Biol. 56, 57-67 (2007)
9. Huber, K.T., van Iersel, L.J.J., Moulton, V., Wu, T.: How much information is needed to infer reticulate evolutionary histories? Syst. Biol. 64, 102-111 (2014)
10. Huber, K.T., van Iersel, L.J.J., Kelk, S.M., Suchecki, R.: A practical algorithm for reconstructing level-1 phylogenetic networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(3), 635-649 (2011)
11. Huber, K.T., Moulton, V.: Encoding and constructing 1-nested phylogenetic networks with trinets. Algorithmica 66(3), 714-738 (2013)
12. Huson, D., Dezulian, T., Klopper, T., Steel, M.: Phylogenetic super-networks from partial trees. IEEE/ACM Trans. Comput. Biol. Bioinform. 1(4), 151-158 (2004)
13. Huson, D.H., Rupp, R.: Summarizing multiple gene trees using cluster networks. Workshop on Algorithms in Bioinformatics (WABI). Lecture Notes in Computer Science 5251, 296-305 (2008)
14. Huson, D.H., Scornavacca, C.: Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst. Biol. 61(6), 1061-1067 (2012)
15. Huynh, T.N.D., Jansson, J., Nguyen, N.B., Sung, W.-K.: Constructing a smallest refining galled phylogenetic network. Research in Computational Molecular Biology (RECOMB). Lecture Notes in Bioinformatics 3500, 265-280 (2005)
16. van Iersel, L.J.J., Moulton, V.: Trinets encode tree-child and level-2 phylogenetic networks. J. Math. Biol. 68(7), 1707-1729 (2014)
17. Jansson, J., Lingas, A.: Computing the rooted triplet distance between galled trees by counting triangles. J. Discret. Algorithms 25, 66-78 (2014)
18. Jansson, J., Sung, W.-K.: Inferring a level-1 phylogenetic network from a dense set of rooted triplets. Theor. Comput. Sci. 363(1), 60-68 (2006)
19. Jansson, J., Nguyen, N.B., Sung, W.-K.: Algorithms for combining rooted triplets into a galled phylogenetic network. SIAM J. Comput. 35(5), 1098-1121 (2006)
20. Ma, B., Wang, L., Li, M.: Fixed topology alignment with recombination. Combinatorial Pattern Matching (CPM 1998). Lecture Notes in Computer Science 1448, 174-188 (1998)
21. Pardi, F., Scornavacca, C.: Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS Comput. Biol. 11(4), e1004135 (2015)
22. Poormohammadi, H., Eslahchi, C., Tusserkani, R.: Constructing rooted phylogenetic networks from rooted triplets. PLoS One 9(9), e106531 (2014)
23. Semple, C., Steel, M.: Phylogenetics. Oxford University Press, Oxford (2003)
24. Strimmer, K., Von Haeseler, A.: Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13(7), 964-969 (1996)
25. Yu, Y., Dong, J., Liu, K.J., Nakhleh, L.: Maximum likelihood inference of reticulate evolutionary histories. Proc. Nat. Acad. Sci. 111(46), 16448-16453 (2014)