Throughout this section, a rooted forest always means a directed acyclic graph in which every node has indegree at most 1 and outdegree at most 2.
Let F be a rooted forest. The roots (respectively, leaves) of F are those nodes whose indegrees (respectively, outdegrees) are 0. The size of F, denoted by |F|, is the number of roots in F minus 1. A node v of F is unifurcate if it has only one child in F. If a root v of F is unifurcate, then contracting v in F is the operation that modifies F by deleting v. If a nonroot node v of F is unifurcate, then contracting v in F is the operation that modifies F by first adding an edge from the parent of v to the child of v and then deleting v.
For convenience, we view each node u of F as an ancestor and descendant of u itself. A node u is lower than another node v ≠ u in F if u is a descendant of v in F. The lowest common ancestor (LCA) of a set U of nodes in F is the lowest node v in F such that for every node u ∈ U, v is an ancestor of u in F. For a node v of F, the subtree of F rooted at v is the subgraph of F whose nodes are the descendants of v in F and whose edges are those edges connecting two descendants of v in F. If v is a root of F, then the subtree of F rooted at v is a component tree of F. F is a rooted tree if it has only one root.
A rooted binary forest is a rooted forest in which the outdegree of every nonleaf node is 2. Let F be a rooted binary forest. F is a rooted binary tree if it has only one root. If v is a nonroot node of F with parent p and sibling u, then detaching the subtree of F rooted at v is the operation that modifies F by first deleting the edge (p,v) and then contracting p. A detaching operation on F is the operation of detaching the subtree of F rooted at a nonroot node.
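The definitions of size, contraction, and detaching can be made concrete with a small sketch. The class name `Forest` and its parent/children dictionary layout are assumptions made for illustration only; they are not taken from the paper or from any existing tool.

```python
class Forest:
    """Rooted forest stored as parent/children maps (hypothetical layout)."""

    def __init__(self, edges, nodes):
        self.parent = {v: None for v in nodes}
        self.children = {v: [] for v in nodes}
        for p, c in edges:
            self.parent[c] = p
            self.children[p].append(c)

    def roots(self):
        return [v for v, p in self.parent.items() if p is None]

    def size(self):
        # |F| = number of roots minus 1
        return len(self.roots()) - 1

    def contract(self, v):
        """Contract a unifurcate node v (a node with exactly one child)."""
        (c,) = self.children[v]
        p = self.parent[v]
        self.parent[c] = p  # c becomes a root if v was a root
        if p is not None:
            # the edge (p, v) is replaced by the edge (p, c)
            self.children[p] = [c if x == v else x for x in self.children[p]]
        del self.parent[v], self.children[v]

    def detach(self, v):
        """Detach the subtree rooted at a nonroot node v of a binary forest:
        delete the edge (p, v), then contract the now unifurcate parent p."""
        p = self.parent[v]
        self.children[p].remove(v)
        self.parent[v] = None
        self.contract(p)


# The tree ((a,b),c) with internal nodes r (root) and x.
f = Forest([("r", "x"), ("r", "c"), ("x", "a"), ("x", "b")],
           ["r", "x", "a", "b", "c"])
f.detach("a")  # x is contracted, so r now has children b and c
print(f.size(), sorted(f.roots()))
```

Each detaching operation adds exactly one root, so the size of a forest obtained from a tree equals the number of detaching operations applied to it.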
Hybridization networks and phylogenetic trees
Let X be a set of existing species. A hybridization network on X is a directed acyclic graph N in which the set of nodes of outdegree 0 (still called the leaves) is X, each nonleaf node has outdegree 2, there is exactly one node of indegree 0 (called the root), and each nonroot node has indegree larger than 0. Note that the indegree of a nonroot node in N may be larger than 1. A node of indegree larger than 1 in N is called a reticulation node of N. Intuitively speaking, a reticulation node corresponds to a reticulation event. The hybridization number of a reticulation node in N is its indegree in N minus one. The hybridization number of N is the total hybridization number of reticulation nodes in N.
A phylogenetic tree on X is a rooted binary tree whose leaf set is X. A hybridization network N on X displays a phylogenetic tree T on X if N has a subgraph M such that M is a rooted tree, the root of M has exactly two children in M, and modifying M by contracting its unifurcate nodes yields T. A hybridization network of two phylogenetic trees T_{1} and T_{2} on X is a hybridization network N on X such that N displays both T_{1} and T_{2}. A hybridization network of T_{1} and T_{2} is minimum if its hybridization number is minimized among all hybridization networks of T_{1} and T_{2}. Obviously, if N is a minimum hybridization network of T_{1} and T_{2}, then the indegree of every reticulation node in N is exactly 2 and hence the hybridization number of N is equal to the number of reticulation nodes in N. For convenience, we define the hybridization number of T_{1} and T_{2} to be the minimum hybridization number of a hybridization network of T_{1} and T_{2}.
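As a small illustration of the definition, the hybridization number of a network can be computed directly from its edge list. The function name and the edge-list encoding below are assumptions made for this sketch.

```python
from collections import Counter


def hybridization_number(edges):
    """Sum of (indegree - 1) over all reticulation nodes.

    `edges` is a list of (parent, child) pairs of the network.
    """
    indegree = Counter(child for _, child in edges)
    return sum(d - 1 for d in indegree.values() if d > 1)


# A network on X = {x, y, z} in which leaf y has two parents p and q.
network = [("root", "p"), ("root", "q"),
           ("p", "x"), ("p", "y"),
           ("q", "y"), ("q", "z")]
print(hybridization_number(network))
```

For a tree, every node has indegree at most 1, so the function returns 0.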
We are now ready to define one problem studied in this paper:
Hybridization Network Construction (HNC):
Agreement forests
Throughout this subsection, let T_{1} and T_{2} be two phylogenetic trees on the same set X of species. If we can apply a sequence of detaching operations on each of T_{1} and T_{2} so that they become the same forest F, then we refer to F as an agreement forest (AF) of T_{1} and T_{2}. A maximum agreement forest (MAF) of T_{1} and T_{2} is an agreement forest of T_{1} and T_{2} whose size is minimized over all agreement forests of T_{1} and T_{2}. The size of an MAF of T_{1} and T_{2} is called the rSPR distance between T_{1} and T_{2}. The following lemma is shown in [16].
Lemma 1
[16] Given two phylogenetic trees T_{1} and T_{2}, we can compute the rSPR distance between T_{1} and T_{2} in O(2.42^{d} n) time, where n is the number of leaves in T_{1} and T_{2} and d is the rSPR distance between T_{1} and T_{2}.
Let F be an agreement forest of T_{1} and T_{2}. Obviously, for each i ∈ {1,2}, the leaves of T_{i} are in one-to-one correspondence with the leaves of F. For convenience, we hereafter identify each leaf v of F with the leaf of T_{i} corresponding to v. Similarly, for each i ∈ {1,2}, the nonleaf nodes of F correspond to distinct nonleaf nodes of T_{i}. More precisely, a nonleaf node u of F corresponds to the LCA of {v_{1}, …, v_{ℓ}} in T_{i}, where v_{1}, …, v_{ℓ} are the leaf descendants of u in F. Again for convenience, we hereafter identify each nonleaf node u of F with the nonleaf node of T_{i} corresponding to u. With these correspondences, we can use F, T_{1}, and T_{2} to construct a directed graph G_{F} as follows:

The nodes of G_{F} are the roots of F.

For every two roots r_{1} and r_{2} of F, there is an edge from r_{1} to r_{2} in G_{F} if and only if r_{1} is an ancestor of r_{2} in T_{1} or T_{2}.
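The construction of this graph can be sketched in a few lines. The sketch assumes that each tree is given as a child-to-parent dictionary and that F is given only by the leaf sets of its component trees, so that the root of each component is located in T_{1} and T_{2} as the LCA of its leaf set. All function names are hypothetical, and the code does not check that the given leaf sets really form an agreement forest.

```python
def ancestors(parent, v):
    """v followed by its ancestors, bottom-up (v counts as its own ancestor)."""
    chain = [v]
    while parent.get(v) is not None:
        v = parent[v]
        chain.append(v)
    return chain


def lca(parent, nodes):
    """Lowest common ancestor of a nonempty collection of nodes."""
    nodes = list(nodes)
    common = set(ancestors(parent, nodes[0]))
    for x in nodes[1:]:
        common &= set(ancestors(parent, x))
    # the first common node met when walking up from any of the nodes
    return next(a for a in ancestors(parent, nodes[0]) if a in common)


def decision_graph(parent1, parent2, components):
    """Edges (i, j): root of component i is an ancestor of root of j in T1 or T2."""
    roots1 = [lca(parent1, c) for c in components]
    roots2 = [lca(parent2, c) for c in components]
    edges = set()
    for i in range(len(components)):
        for j in range(len(components)):
            if i != j and (roots1[i] in ancestors(parent1, roots1[j])
                           or roots2[i] in ancestors(parent2, roots2[j])):
                edges.add((i, j))
    return edges


def has_cycle(n, edges):
    """Depth-first search for a directed cycle among nodes 0..n-1."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
    state = [0] * n  # 0 = unvisited, 1 = on the current DFS path, 2 = finished

    def visit(v):
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in range(n))


# T1 = (a,(b,(c,d))) and T2 = (c,(d,(a,b))), with internal nodes named r, x, y.
parent1 = {"r1": None, "a": "r1", "x1": "r1", "b": "x1", "y1": "x1", "c": "y1", "d": "y1"}
parent2 = {"r2": None, "c": "r2", "x2": "r2", "d": "x2", "y2": "x2", "a": "y2", "b": "y2"}
edges = decision_graph(parent1, parent2, [{"a", "b"}, {"c", "d"}])
print(edges, has_cycle(2, edges))
```

In this example the component {a,b} is rooted above {c,d} in T_{1} but below it in T_{2}, so the two edges point in opposite directions and the graph contains a cycle.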
We refer to G_{F} as the decision graph associated with F. If G_{F} is acyclic, then F is an acyclic agreement forest (AAF) of T_{1} and T_{2}; otherwise, F is a cyclic agreement forest (CAF) of T_{1} and T_{2}. If F is an AAF of T_{1} and T_{2} and its size is minimized over all AAFs of T_{1} and T_{2}, then F is a maximum acyclic agreement forest (MAAF) of T_{1} and T_{2}. Note that our definition of an AAF is the same as those in [15, 17] but is different from that in [16]. Moreover, it is known that the size of an MAAF of T_{1} and T_{2} is equal to the hybridization number of T_{1} and T_{2} [18]. The following lemma is shown in [19]:
Lemma 2
[19] Suppose that C is a cycle of G_{F} and r_{1}, …, r_{ℓ} are the nodes of C. Then, each r_{j} ∈ {r_{1}, …, r_{ℓ}} has two children u_{j} and u’_{j} in F. Moreover, for every nonroot node v of F not contained in {u_{1}, …, u_{ℓ}}, C remains a cycle in G_{F} after F is modified by detaching the subtree of F rooted at v.
Let N be a minimum hybridization network of T_{1} and T_{2}. Suppose that we modify N to obtain a forest F(N) by first removing all edges entering reticulation nodes, then removing those nodes v such that neither v nor its descendants are in X, and further contracting all unifurcate nodes. Obviously, F(N) is an AAF of T_{1} and T_{2} and the size of F(N) is exactly the hybridization number of N. So, each MAAF F’ of T_{1} and T_{2} represents the set of all minimum hybridization networks N such that F(N) is the same as F’. Thus, to enumerate a representative set of minimum hybridization networks of T_{1} and T_{2}, the idea in previous work [6] has been to enumerate all MAAFs of T_{1} and T_{2} and construct a minimum hybridization network for each enumerated MAAF. Since we can easily use an MAAF of T_{1} and T_{2} to construct a hybridization network displaying T_{1} and T_{2} [6], the difficulty is in how to enumerate all MAAFs of T_{1} and T_{2}.
We are now ready to define another problem studied in this paper:
Hybridization Network Enumeration (HNE):

Input: Two phylogenetic trees T_{1} and T_{2} on the same set X of species.

Goal: To enumerate all MAAFs of T_{1} and T_{2} and construct a minimum hybridization network of T_{1} and T_{2} from each MAAF of T_{1} and T_{2}.
Basically, HNE is the problem of enumerating a representative set of minimum hybridization networks of two given phylogenetic trees. As in previous studies [5, 6, 8], when we consider HNC and HNE, we always assume that each given phylogenetic tree has been modified by first introducing a new root and a dummy leaf and then letting the old root and the dummy leaf be the children of the new root.
The following lemma is shown in [19]:
Lemma 3
[19] The dummy leaf alone does not form a component tree of an MAAF of T_{1} and T_{2}.
Extending Whidden et al.’s Algorithm
Throughout this subsection, let T_{1} and T_{2} be two phylogenetic trees on the same set X of species. We sketch the fastest known algorithm (due to Whidden et al. [16]) for computing an MAF of T_{1} and T_{2}, and then state a slight extension of the algorithm that will be used in our algorithm for HNE.
The basic idea behind Whidden et al.’s algorithm is as follows. For k = 0, 1, 2, … (in this order), we try to find an AF of T_{1} and T_{2} of size k and stop immediately once such an AF is found. To find an AF of T_{1} and T_{2} of size k, we start by setting F_{1} = T_{1} and F_{2} = T_{2} and associating a label set {x} to each leaf x of F_{1} and F_{2}. We then repeatedly modify F_{1} and F_{2} (until either |F_{1}| > k or F_{1} becomes a forest without edges) as follows. We find two arbitrary sibling leaves u and v in F_{2}. If u and v are also siblings in F_{1}, then we modify F_{1} and F_{2} separately by merging the identical subtrees of F_{1} and F_{2} rooted at the parent of u and v each into a single leaf whose label set is the union of the label sets of u and v. On the other hand, if u and v are not siblings in F_{1}, then we distinguish three cases as follows.
Case 1: u and v are in different component trees of F_{1}. In this case, in order to transform F_{1} and F_{2} into an AF of T_{1} and T_{2}, we have two choices to modify them, namely, by either detaching the subtree rooted at u or detaching the subtree rooted at v.
Case 2: u and v are in the same component tree of F_{1} and either (1) u and the parent of v are siblings in F_{1} or (2) v and the parent of u are siblings in F_{1}. In this case, if (1) (respectively, (2)) holds, then we modify F_{1} by detaching the subtree rooted at the sibling of v (respectively, u).
Case 3: u and v are in the same component tree of F_{1} and neither (1) nor (2) in Case 2 holds. In this case, in order to transform F_{1} and F_{2} into an AF of T_{1} and T_{2}, we have three choices to modify them. The first two choices are the same as those in Case 1. In the third choice, we modify F_{1} by detaching the subtrees rooted at those nonroot nodes w such that the parent of w appears on the (not necessarily directed) path between u and v in F_{1} but w does not.
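The case distinction can be sketched as a function that inspects the positions of u and v in F_{1}. This is only an illustration of the three cases, not Whidden et al.'s implementation: the names `path_to_root` and `classify` and the dictionary encoding of F_{1} are assumptions, F_{1} is assumed to be binary, and u and v are assumed to be distinct leaves that are not siblings in F_{1}.

```python
def path_to_root(parent, v):
    """Nodes from v up to the root of its component tree."""
    path = [v]
    while parent.get(v) is not None:
        v = parent[v]
        path.append(v)
    return path


def classify(parent, children, u, v):
    """Return (case, pendant), where pendant lists the nodes w whose parent
    lies on the path between u and v in F1 while w itself does not."""
    pu, pv = path_to_root(parent, u), path_to_root(parent, v)
    if pu[-1] != pv[-1]:
        return 1, []  # Case 1: different component trees
    on_pv = set(pv)
    a = next(x for x in pu if x in on_pv)  # LCA of u and v
    path = pu[:pu.index(a) + 1] + pv[:pv.index(a)]
    on_path = set(path)
    pendant = [w for x in path for w in children.get(x, ()) if w not in on_path]

    def grandparent(x):
        return parent.get(parent.get(x))

    if parent[u] == grandparent(v) or parent[v] == grandparent(u):
        return 2, pendant  # Case 2: pendant holds the single sibling to detach
    return 3, pendant      # Case 3: pendant is the third choice


# F1 = ((u,s),(v,t)): u and v are separated by two pendant subtrees.
parent = {"r": None, "m": "r", "n": "r", "u": "m", "s": "m", "v": "n", "t": "n"}
children = {"r": ["m", "n"], "m": ["u", "s"], "n": ["v", "t"]}
print(classify(parent, children, "u", "v"))
```

In Case 2 the path between u and v has exactly one pendant node, namely the sibling named in the text, which is why a single detaching operation suffices there.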
By the above three cases, we always have the following:
 (a)
 (b) All component trees of F_{2} except at most one have no edges.
 (c) For each component tree Γ_{2} of F_{2} without edges, F_{1} has a component tree Γ_{1} without edges such that the label sets associated with the unique leaves of Γ_{1} and Γ_{2} are identical.
Once |F_{1}| becomes larger than k, we know that F_{1} and F_{2} have no AF of size k. On the other hand, once F_{1} becomes a forest without edges, we can use the label sets L(v) of the leaves v of F_{1} to obtain an AF of T_{1} and T_{2} of size |F_{1}| by modifying T_{1} as follows. For each leaf v of F_{1} such that L(v) does not contain the dummy leaf, detach the subtree of T_{1} rooted at the LCA of the leaves in L(v).
We are now ready to make a key observation in this paper. By (b) and (c) above, Whidden et al.’s MAF algorithm can actually be used to solve the following slightly more general problem in O(2.42^{k} n) time:
rSPR Distance Checking (rSPRDC):

Input: (T_{1}, T_{2}, k, F_{1}, F_{2}), where T_{1} and T_{2} are two phylogenetic trees on the same set X of species, k is an integer, F_{1} (respectively, F_{2}) is a rooted forest obtained from T_{1} (respectively, T_{2}) by performing zero or more detaching operations, and every component tree of F_{2} except at most one is identical to a component tree of F_{1}.

Goal: To decide if performing k more detaching operations on F_{1} leads to an AF of T_{1} and T_{2}.
Finally, if we want to enumerate all MAFs of T_{1} and T_{2}, then we need to modify Whidden et al.’s algorithm as follows. First, we do not distinguish Cases 2 and 3 because modifying F_{1} as in Case 2 may lose some MAF of T_{1} and T_{2}. Moreover, whenever an AF of T_{1} and T_{2} of size k is found, we do not stop immediately and instead continue to find other AFs of T_{1} and T_{2} of size k. The resulting algorithm runs more slowly, namely, in O(3^{k} n) time.
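To get a feeling for the gap between the two bounds, the following sketch compares their exponential factors, reading the bounds as O(2.42^{k} n) for finding one MAF and O(3^{k} n) for enumerating all MAFs. The function name is hypothetical, and the hidden constants of the O-notation are ignored, so the numbers only indicate growth rates, not actual running times.

```python
def bound_ratio(k, enum_base=3.0, maf_base=2.42):
    """Ratio enum_base**k / maf_base**k of the two search-tree bounds."""
    return (enum_base / maf_base) ** k


for k in (5, 10, 20):
    # how many times larger the enumeration bound is at this k
    print(k, round(bound_ratio(k), 1))
```

The ratio grows geometrically with k, which is why cutting off hopeless branches early matters more as the distance between the trees increases.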
Speeding up HybridNet
Throughout this subsection, let T_{1} and T_{2} be two phylogenetic trees on the same set X of species. We first sketch how HybridNet enumerates all MAAFs of T_{1} and T_{2}, and then explain how to speed it up.
First, we need several definitions. For a rooted forest F, consider the family of the leaf sets of the component trees of F. Let F and F’ be two forests each obtained by performing zero or more detaching operations on T_{1}. If F ≠ F’ and every set in the family of F is a subset of some set in the family of F’, then we say that F is finer than F’ and F’ is coarser than F.
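The finer/coarser relation can be checked directly on the families of leaf sets. The sketch below assumes the reading that F is finer than F’ when the two forests differ and every component leaf set of F is contained in some component leaf set of F’; it also assumes that two forests obtained from the same tree by detaching are equal exactly when their families of leaf sets are equal. The function name is hypothetical.

```python
def is_finer(leaf_sets_f, leaf_sets_g):
    """True if the forest with leaf sets `leaf_sets_f` is finer than the
    forest with leaf sets `leaf_sets_g`."""
    f = {frozenset(s) for s in leaf_sets_f}
    g = {frozenset(s) for s in leaf_sets_g}
    # every leaf set of the finer forest fits inside one of the coarser forest
    return f != g and all(any(s <= t for t in g) for s in f)


coarse = [{"a", "b"}, {"c", "d"}]
fine = [{"a"}, {"b"}, {"c", "d"}]
print(is_finer(fine, coarse), is_finer(coarse, fine))
```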
To enumerate all MAAFs of T_{1} and T_{2}, the idea behind HybridNet is to design an algorithm for the following problem:
Generalized Agreement Forest (GAF)

Input: (T_{1}, T_{2}, k, F_{1}), where T_{1} and T_{2} are two phylogenetic trees on the same set X of species, k is an integer, and F_{1} is a rooted forest obtained from T_{1} by performing zero or more detaching operations.

Goal: To find a sequence of AFs of T_{1} and T_{2} including all AFs F of T_{1} and T_{2} such that (1) F can be obtained by performing at most k detaching operations on F_{1} (or equivalently, at most |F_{1}| + k detaching operations on T_{2}) and (2) no AF of T_{1} and T_{2} is finer than F_{1} and coarser than F.
In the supplementary material of [19], an O(3^{k} n)-time algorithm for solving GAF is detailed. The algorithm differs from Whidden et al.’s algorithm for enumerating all MAFs of T_{1} and T_{2} only in that we start with F_{1} (as it is given) and F_{2} = T_{2} (instead of starting with F_{1} = T_{1} and F_{2} = T_{2}) and then repeatedly modify F_{1} and F_{2} until either |F_{1}| > k + k_{0} or F_{1} becomes a forest without edges, where k_{0} is the original size of F_{1}. We are now ready to make two other key observations in this paper. To speed up Chen and Wang’s algorithm for solving GAF, we modify it as follows:

Heuristic 1: Every time before we start to make multiple choices of modifying F_{1} and F_{2}, we call the algorithm for rSPRDC in Lemma 1 on input (T_{1}, T_{2}, k − |F_{1}| + k_{0}, F_{1}, F_{2}) to check if performing k − |F_{1}| + k_{0} more detaching operations on F_{1} leads to an AF of T_{1} and T_{2}.
As a result, if we know that performing k − |F_{1}| + k_{0} more detaching operations on F_{1} does not lead to an AF of T_{1} and T_{2}, then no more choice of modifying F_{1} and F_{2} is necessary; otherwise, we proceed to make multiple choices of modifying F_{1} and F_{2} the same as before but with the following difference:
The intuition behind Heuristic 2 is that if u and v are far apart in F_{1}, then either u and v fall into two different connected components of F_{1} so that we do not have to try Case 3 in the Extending Whidden et al.’s Algorithm section, or u and v fall into the same connected component of F_{1} and we can detach a lot of subtrees from F_{1} in Case 3.
Finally, to enumerate all MAAFs of T_{1} and T_{2}, we initialize k = 0 and then proceed as follows.
 1. Simulate the sped-up algorithm for GAF on input (T_{1}, T_{2}, k, T_{1}). During the simulation, whenever an AF F of T_{1} and T_{2} is enumerated, perform one of the following steps depending on whether F is acyclic or not:
 (a) If F is acyclic, output it.
 (b) If F is cyclic, then output all AAFs F’ of T_{1} and T_{2} such that F’ can be obtained from F by performing k − |F| detaching operations on F.
 2. If at least one AAF of T_{1} and T_{2} was output in Step 1a or 1b, then stop; otherwise, increase k by 1 and go to Step 1.
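The control flow of these steps can be sketched as a driver that takes the three nontrivial ingredients as callbacks. The callback names `enumerate_afs`, `is_acyclic`, and `acyclic_refinements` are hypothetical placeholders for the sped-up GAF algorithm, the acyclicity test on the decision graph, and the cycle-breaking step, respectively. The `max_k` guard is an addition for safety in this sketch and is not part of the procedure described above.

```python
def enumerate_maafs(enumerate_afs, is_acyclic, acyclic_refinements, max_k):
    """Iterative deepening over k, following Steps 1 and 2.

    enumerate_afs(k)              -> iterable of AFs found by the GAF algorithm
    is_acyclic(forest)            -> True if the decision graph is acyclic
    acyclic_refinements(forest,k) -> AAFs obtained from a cyclic AF by further
                                     detaching operations, up to size k
    """
    for k in range(max_k + 1):
        found = []
        for forest in enumerate_afs(k):
            if is_acyclic(forest):
                found.append(forest)                          # Step 1a
            else:
                found.extend(acyclic_refinements(forest, k))  # Step 1b
        if found:                                             # Step 2
            return k, found
    return None, []


# Toy stand-ins: forests are strings, and names starting with "a" are acyclic.
result = enumerate_maafs(
    enumerate_afs=lambda k: {1: ["c1"], 2: ["a1", "c2"]}.get(k, []),
    is_acyclic=lambda f: f.startswith("a"),
    acyclic_refinements=lambda f, k: ["a_from_" + f] if k >= 2 else [],
    max_k=5,
)
print(result)
```

The sketch does not remove duplicates, so a real driver would presumably need to deduplicate forests that are reached along several branches.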
Note that Step 1b is nontrivial. As described in the supplementary material of [19], Lemma 2 is very helpful for this purpose. More specifically, we first find a cycle C in G_{F’} in O(|F’|^{2}) time. By Lemma 2, in order to make F’ acyclic, we have to choose one node r of C and modify F’ by detaching the subtree of F’ rooted at an (arbitrary) child of r. Note that since r is a root of F’, detaching the subtree of F’ rooted at a child of r is achieved by simply deleting r from F’ and is hence independent of the choice of the child. Moreover, if the parent r’ of the dummy leaf in F’ is a node of C, then by Lemma 3, we can exclude r’ from consideration when choosing r. So, we have at most |F’| ≤ k − 1 ways to break C. After modifying F’ in this way, we again construct G_{F’} and test if it is acyclic. If it is acyclic, then we can output F’; otherwise, we again find a cycle C in G_{F’} and use it to modify F’ as before. We repeat modifying F’ in this way, until either F’ becomes acyclic, or |F’| = k and G_{F’} is still cyclic. Once F’ becomes acyclic, we output it. The total time taken by Step 1b is O(k^{2}(k−1)^{k−|F’|}), because we make a total of at most O((k−1)^{k−|F’|}) choices for breaking cycles.
Experiments show that Heuristics 1 and 2 help us speed up the algorithm substantially. However, the two heuristics may not help in the worst case. That is, we are unable to prove that the two heuristics improve the worst-case time complexity of the algorithm, which is O(3^{d}|X| + 3^{d}(k−1)^{k−d+2}) (as shown in [19]), where d is the size of an MAF of T_{1} and T_{2}. We note that k and d are usually quite close.
The new algorithm for HNE
In this subsection, we only design an algorithm for HNE. Note that it is trivial to obtain a faster algorithm for HNC by modifying the algorithm for HNE so that it stops immediately once an MAAF is found.
Throughout this subsection, let T_{1} and T_{2} be two phylogenetic trees on the same set X of species. As mentioned before, we can easily use an MAAF of T_{1} and T_{2} to construct a hybridization network displaying T_{1} and T_{2} [6]. So, we only explain how to enumerate all MAAFs of T_{1} and T_{2}.
In the last subsection, we explained how to speed up HybridNet so that it can enumerate all MAAFs of T_{1} and T_{2} in less time. Indeed, we can make HybridNet even faster. The idea is to preprocess T_{1} and T_{2} so that the given trees become smaller or the problem splits into two or more smaller independent subproblems. More specifically, we perform the following two reductions on T_{1} and T_{2} until neither of them is applicable.
Subtree reduction
Suppose that T_{1} has a nonleaf node v_{1} and T_{2} has a nonleaf node v_{2} such that the subtree of T_{1} rooted at v_{1} is identical to the subtree of T_{2} rooted at v_{2}. Then, we modify T_{1} (respectively, T_{2}) by merging the subtree of T_{1} (respectively, T_{2}) rooted at v_{1} (respectively, v_{2}) into a single leaf whose label set is the union of the label sets of the merged leaves. It is known [2] that this reduction preserves the MAAFs of T_{1} and T_{2}.
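A minimal sketch of the subtree reduction follows, assuming trees are written as nested 2-tuples of leaf labels and that a merged leaf is encoded by joining the merged labels with "+" (so labels must not contain "+"). These encodings and all function names are illustrative assumptions, not the representation used by HybridNet.

```python
def canon(t):
    """Order-independent form of a tree given as nested 2-tuples of labels."""
    return t if isinstance(t, str) else frozenset(canon(c) for c in t)


def subtrees(t):
    """All nonleaf subtrees of a tree in canonical form."""
    if isinstance(t, str):
        return set()
    out = {t}
    for c in t:
        out |= subtrees(c)
    return out


def leaves(t):
    return {t} if isinstance(t, str) else set().union(*(leaves(c) for c in t))


def merge(t, target, label):
    """Replace the subtree `target` of t by a single leaf named `label`."""
    if t == target:
        return label
    if isinstance(t, str):
        return t
    return frozenset(merge(c, target, label) for c in t)


def subtree_reduce(t1, t2):
    """Apply subtree reductions until the trees share no nonleaf subtree."""
    t1, t2 = canon(t1), canon(t2)
    while True:
        common = subtrees(t1) & subtrees(t2)
        if not common:
            return t1, t2
        s = max(common, key=lambda x: len(leaves(s_ := x)))  # a largest one
        label = "+".join(sorted(leaves(s)))  # stands for the union of labels
        t1, t2 = merge(t1, s, label), merge(t2, s, label)


r1, r2 = subtree_reduce(((("a", "b"), "c"), "d"), ((("a", "b"), "d"), "c"))
print(r1 == canon((("a+b", "c"), "d")), r2 == canon((("a+b", "d"), "c")))
```

Here the shared cherry (a,b) is merged into one leaf in both trees, after which no further common subtree remains.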
Cluster reduction
Suppose that subtree reductions on T_{1} and T_{2} are not available but T_{1} has a nonleaf node v_{1} and T_{2} has a nonleaf node v_{2} such that the subtree of T_{1} rooted at v_{1} has the same leaf set as the subtree of T_{2} rooted at v_{2}. Then, we split T_{1} (respectively, T_{2}) into two trees T’_{1} and T”_{1} (respectively, T’_{2} and T”_{2}) as follows. T’_{1} (respectively, T’_{2}) is simply the subtree of T_{1} (respectively, T_{2}) rooted at v_{1} (respectively, v_{2}), while T”_{1} (respectively, T”_{2}) is obtained by merging the subtree of T_{1} (respectively, T_{2}) rooted at v_{1} (respectively, v_{2}) into a single leaf whose label set is the union of the label sets of the merged leaves. It is known [20] that the set of MAAFs of T_{1} and T_{2} is the Cartesian product of the set of MAAFs of T’_{1} and T’_{2} and the set of MAAFs of T”_{1} and T”_{2}.
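One cluster reduction step can be sketched as follows, again assuming trees are nested 2-tuples of leaf labels and that a merged leaf is encoded by joining labels with "+". The function names, the exclusion of the root cluster, and the choice of a smallest common cluster are assumptions of this sketch; the text above does not prescribe which common cluster to split off first.

```python
def leaves(t):
    return frozenset([t]) if isinstance(t, str) else leaves(t[0]) | leaves(t[1])


def clusters(t, is_root=True):
    """Map leaf set -> subtree for every nonleaf, nonroot node of t."""
    if isinstance(t, str):
        return {}
    out = {} if is_root else {leaves(t): t}
    for c in t:
        out.update(clusters(c, False))
    return out


def replace(t, cluster, label):
    """Merge the subtree of t whose leaf set is `cluster` into one leaf."""
    if isinstance(t, str):
        return t
    if leaves(t) == cluster:
        return label
    return tuple(replace(c, cluster, label) for c in t)


def cluster_reduce(t1, t2):
    """Return ((T'_1, T'_2), (T''_1, T''_2)), or None if no common cluster."""
    c1, c2 = clusters(t1), clusters(t2)
    common = [c for c in c1 if c in c2]
    if not common:
        return None
    c = min(common, key=len)         # a smallest common cluster
    label = "+".join(sorted(c))      # stands for the union of the label sets
    return (c1[c], c2[c]), (replace(t1, c, label), replace(t2, c, label))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(cluster_reduce(t1, t2))
```

In this example the two trees share the cluster {a, b, c} but resolve it differently, so the first subproblem keeps the conflicting subtrees and the second one sees the cluster as a single leaf.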
After the preprocessing stage, if no cluster reduction has been performed, then we run the sped-up HybridNet (as described in the last subsection) on T_{1} and T_{2}; otherwise, we have obtained two or more subproblems. Suppose that we have h subproblems and the ith subproblem (1 ≤ i ≤ h) is to enumerate all MAAFs of two trees T_{1,i} and T_{2,i}. Then, for each 1 ≤ i ≤ h, we run the sped-up HybridNet to enumerate the set of MAAFs of T_{1,i} and T_{2,i}. Finally, we output the Cartesian product of these h sets.
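The final combination step can be sketched with the standard library. The function name `combine_subproblems` is hypothetical, and the sketch only forms the tuples of the Cartesian product; turning each tuple back into one forest on the original leaf set (by expanding the merged leaves) is not shown.

```python
from itertools import product


def combine_subproblems(maaf_sets):
    """One tuple per way of picking one MAAF from each of the h subproblems."""
    return list(product(*maaf_sets))


# Three subproblems with 2, 1, and 2 MAAFs (placeholder names).
combos = combine_subproblems([["A1", "A2"], ["B1"], ["C1", "C2"]])
print(len(combos))
```

The number of combined solutions is the product of the numbers of MAAFs of the subproblems, so the subproblems can be solved independently and only multiplied together at the end.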