Generating normal networks via leaf insertion and nearest neighbor interchange

Zhang, Louxin

doi:10.1186/s12859-019-3209-3

Research
Open access
Published: 17 December 2019

Louxin Zhang¹

BMC Bioinformatics volume 20, Article number: 642 (2019)

Abstract

Background

Galled trees are studied as a recombination model in theoretical population genetics. This class of phylogenetic networks has been generalized to tree-child networks and other network classes by relaxing a structural condition imposed on galled trees. Although these networks are simple, their topological structures have yet to be fully understood.

Results

It is well known that all phylogenetic trees on n taxa can be generated by inserting the n-th taxon into each edge of all the phylogenetic trees on n−1 taxa. We prove that all tree-child (resp. normal) networks with k reticulate nodes on n taxa can be uniquely generated via three operations from all the tree-child (resp. normal) networks with k−1 or k reticulate nodes on n−1 taxa. Applying this result to counting rooted phylogenetic networks, we show that there are exactly $\frac {(2n)!}{2^{n} (n-1)!}-2^{n-1} n!$ binary phylogenetic networks with one reticulate node on n taxa.

Conclusions

The work makes two contributions to understanding normal networks. One is a generalization of an enumeration procedure for phylogenetic trees into one for normal networks. The other is a pair of simple formulas for counting normal networks and phylogenetic networks that have only one reticulate node.

Background

Phylogenetic networks have been used to date both vertical and horizontal genetic transfers in evolutionary genomics and population genetics in the past two decades [1–3]. A rooted phylogenetic network (RPN) is a directed acyclic digraph in which all the sink nodes are of indegree 1 and a unique source node is designated as the root, where the former represent a set of taxa (e.g., species, genes, or individuals in a population) and the latter represents the least common ancestor of the taxa. Moreover, the other nodes in a RPN are divided into tree nodes and reticulate nodes, where reticulate nodes represent reticulate evolutionary events such as horizontal genetic transfers and genetic recombination.

The topological properties of RPNs are much more complicated than phylogenetic trees [2, 4, 5]. Therefore, different mathematical issues arise in the study of RPNs. First, phylogenetic reconstruction problems are often NP-hard even for trees [6, 7]. As such, a phylogenetic reconstruction method often uses nearest neighbor interchanges (NNIs) or other rearrangement operations to search for an optimal tree or network [8, 9]. Recently, different variants of NNI have been proposed for RPNs [10–16].

Second, to develop efficient algorithms for NP-complete problems on RPNs, simple classes of RPNs have been introduced, including galled trees [17, 18], tree-child networks (TCNs) [19], normal networks [20], reticulation-visible networks [4] and tree-based networks [21, 22] (see also [5, 23]). For instance, a RPN is a TCN if every non-leaf node has a child that is a tree node or a leaf. Although these network classes have been intensively investigated, their topological structures remain unclear [5, 24]. How to efficiently enumerate and count normal networks remains unclear [25–30].

This work makes two contributions to understanding TCNs and normal networks. It is a well-known fact that all phylogenetic trees on n taxa can be generated by inserting the n-th taxon into every edge of all the phylogenetic trees on n−1 taxa. We prove that all TCNs with k reticulate nodes on n taxa can be uniquely generated via three operations from TCNs with k−1 or k reticulate nodes on n−1 taxa (Theorem 1, “Generating TCNs and normal networks” section). Using this fact, we obtain recurrence formulas for counting TCNs and normal networks (“Counting formulas” section). In particular, simple formulas are given for the number of RPNs and normal networks with one reticulate node, respectively.

Methods

Basic notation

A RPN over a finite set of taxa X is an acyclic digraph such that:

there is a unique node of indegree 0 and outdegree 1, called the root;
there are exactly |X| nodes of outdegree 0 and indegree 1, called the leaves of the RPN, each being labeled with a unique taxon in X;
each non-leaf/non-root node is either a reticulate node that is of indegree 2 and outdegree 1, or a tree node that is of indegree 1 and outdegree 2; and
there are no parallel edges between a pair of nodes.

Two RPNs are drawn in Fig. 1, where each edge is directed away from the root and both the root and edge orientation are omitted. For a RPN N, we use ${\mathcal {V}}(N), {\mathcal {R}}(N), {\mathcal {T}}(N)$ and ${\mathcal {E}}(N)$ to denote the set of all nodes, the set of reticulate nodes, the set of tree nodes, and the set of directed edges of N, respectively.

Let $u\in {\mathcal {V}}(N)$ and $v\in {\mathcal {V}}(N)$. The node u is said to be a parent (resp. a child) of v if $(u, v)\in {\mathcal {E}}(N)$ (resp. $(v, u)\in {\mathcal {E}}(N)$). Every reticulate node r has a unique child, named c(r), whereas every tree node t has a unique parent, named p(t). Furthermore, u is an ancestor of v or, equivalently, v is below u if there is a directed path from the network root to v that contains u. We say that u and v are incomparable if neither of them is an ancestor of the other.

Let $e=(u, v)\in {\mathcal {E}}(N)$. It is a reticulate edge if v is a reticulate node and a tree edge otherwise. Hence, a tree edge leads to either a tree node or a leaf.

A phylogenetic tree is simply a RPN with no reticulate nodes.

A TCN is a RPN in which every non-leaf node has a child that is a tree node or a leaf or, equivalently, there is a path from every non-leaf node to some leaf that consists only of tree edges. Both RPNs in Fig. 1 are tree-child.

A normal network is a TCN in which every reticulate node satisfies the following condition:

(The normal condition) The two parents are incomparable.

The first RPN in Fig. 1 is not normal, as one parent of the leftmost reticulate node is an ancestor of the other in the network.
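The tree-child and normal conditions can be checked mechanically. The following is a minimal Python sketch, assuming a network is stored as a list of directed edges (parent, child). The function names and the three small example networks are hypothetical illustrations and are not the networks drawn in Fig. 1.

```python
def parents_and_children(edges):
    """Index a network given as a list of directed edges (parent, child)."""
    parents, children = {}, {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        parents.setdefault(v, []).append(u)
    return parents, children


def is_tree_child(edges):
    """Every non-leaf node has a child of indegree 1 (a tree node or a leaf)."""
    parents, children = parents_and_children(edges)
    return all(any(len(parents[c]) == 1 for c in kids)
               for kids in children.values())


def strictly_below(children, u):
    """Set of nodes reachable from u by a non-empty directed path."""
    seen, stack = set(), [u]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def is_normal(edges):
    """Tree-child, and the two parents of each reticulate node are incomparable."""
    if not is_tree_child(edges):
        return False
    parents, children = parents_and_children(edges)
    for ps in parents.values():
        if len(ps) == 2:  # reticulate node
            p, q = ps
            if q in strictly_below(children, p) or p in strictly_below(children, q):
                return False
    return True


# Hypothetical examples; "rho" is the root, integers are leaves.
normal_net = [("rho", "a"), ("a", "b"), ("a", "c"), ("b", 1), ("b", "r"),
              ("c", "r"), ("c", 3), ("r", 2)]
# Tree-child but not normal: parent a of r is an ancestor of the other parent b.
nested_net = [("rho", "a"), ("a", "b"), ("a", "r"), ("b", 1), ("b", "r"), ("r", 2)]
# Not tree-child: both children of b (and of c) are reticulate nodes.
not_tc_net = [("rho", "a"), ("a", "b"), ("a", "c"), ("b", "r1"), ("b", "r2"),
              ("c", "r1"), ("c", "r2"), ("r1", 1), ("r2", 2)]
```

The ancestor test uses a plain depth-first search, which is adequate for the small networks considered here; a larger application would likely precompute reachability once per network.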

Generating TCNs and normal networks

We define the following rearrangement operations for TCNs N on [1,n], which are illustrated in Fig. 2:

Leaf insertion For a tree edge $e=(u, v)\in {\mathcal {E}}(N)$, insert a new node w to subdivide e and attach Leaf n+1 below w as its child. The resulting network is denoted by Leaf-Insert(N,e,n+1), in which w is a tree node.
Reticulation insertion For a pair of tree edges e₁=(u₁,v₁) and e₂=(u₂,v₂) of N, which are not necessarily distinct, insert a new node w₁ to subdivide e₁ and a new node w₂ to subdivide e₂, attach a new reticulate node r as the common child of w₁ and w₂ and make Leaf (n+1) the child of r. In this case, we say that r straddles e₁ and e₂. We use Ret-Insert(N,e₁,e₂,n+1) to denote the resulting network. We simply write Ret-Insert(N,e,n+1) if e₁=e₂=e.
Child rotation Let r be a reticulate node with parents $u \in {\mathcal {T}}(N)$ and v. If u is not an ancestor of v, exchange the unique child of r and the other child of u. The resulting network is denoted by C-Rotate(N,u,r).

Fig. 2
Insertion and child rotations for tree-child networks. a Leaf 3 is attached to a tree edge. b The reticulation insertion is applied to attach a new reticulate node r onto two tree edges. The child rotation swaps the tree node w (yellow) and Leaf 4. Here, green nodes and edges are the added ones

Note that a child rotation is a special case of the rNNI rearrangement introduced by Gambette et al. in [12]. Let ${\mathcal {T}CN}_{k}(n)$ denote the set of TCNs with k reticulations on [1,n].

Proposition 1

Let $M\in {\mathcal {T}CN}_{k}(n)$ and let e₁ and e₂ be two tree edges of M. Then,

$$\begin{array}{@{}rcl@{}} \text{Ret-Insert}\left(M, e_{1}, e_{2}, n+1\right)\in {\mathcal{T}CN}_{k+1}(n+1),\\ \text{Ret-Insert}(M, e_{1}, n+1)\in {\mathcal{T}CN}_{k+1}(n+1). \end{array} $$

Proof

The second statement is a special case of the first. Let e₁=(u,v) and e₂=(x,y). Since e₁ and e₂ are tree edges, both v and y are tree nodes or leaves. Let r be the added reticulate node. Then the parents of r have v and y as their child, respectively; the nodes u and x have the parents of r as their tree node child; and Leaf n+1 is the child of r. Additionally, all the other nodes have the same children as in M. Therefore, Ret-Insert(M,e₁,e₂,n+1) is a TCN. □

Proposition 2

Let $M\in {\mathcal {T}CN}_{k+1}(n+1)$. Assume that $r\in {\mathcal {R}}(M)$ and its parents are u and v such that u is not an ancestor of v in M. Then,

$$\begin{array}{@{}rcl@{}} \text{C-Rotate}\left(M, u, r \right)\in {\mathcal{T}CN}_{k+1}(n+1). \end{array} $$

Proof

Let M′=C-Rotate(M,u,r). Since u is a parent of r and M is tree-child, u is a tree node. Let w be the other child of u and let z be the unique child of r. Since M is tree-child, z and w are tree nodes or leaves (see Fig. 2). By definition, w becomes the child of r and z becomes a child of u in M′. Therefore, every non-leaf node also has a child that is a tree node or a leaf in M′.

If M′ contains a directed cycle C, C must contain v and w, implying that u is an ancestor of v in M, a contradiction. Therefore, M′ is acyclic and $M' \in {\mathcal {T}CN}_{k+1}(n+1)$. □

Proposition 3

Let $N\in {\mathcal {T}CN}_{k+1}(n+1)$.

(i) If Leaf (n+1) is the child of a reticulate node r, N can then be obtained from an $M\in {\mathcal {T}CN}_{k}(n)$ via a reticulation insertion.

(ii) If Leaf (n+1) is the child of a tree node t and the sibling of n+1 is also a tree node, N can then be obtained from an $M\in {\mathcal {T}CN}_{k+1}(n)$ via a leaf insertion.

(iii) If Leaf (n+1) is a child of a tree node t and the sibling of n+1 is a reticulate node, N can then be obtained from an $M\in {\mathcal {T}CN}_{k+1}(n+1)$ via a child rotation.

Proof

(i) Let r have parents u₁ and u₂ in N. Since N is a TCN, u₁ and u₂ are tree nodes and so are their children other than r. Let w_i and v_i be the parent and the child of u_i such that v_i≠r, respectively, for each i=1,2. Since r is a reticulate node, v₁ and v₂ are tree nodes. Without loss of generality, we assume that u₂ is not the parent of u₁. There are two cases for consideration.

If u₁ is the parent of u₂, then u₁=w₂ and u₂=v₁ (Fig. 3a). Removing Leaf (n+1),u₁ and u₂ (together with incident edges) and adding an edge e=(w₁,v₂) produce a TCN M with k reticulations such that N=Ret-Insert(M,e,n+1).

If u₁ is not the parent of u₂, then, w₁≠u₂ (Fig. 3b). After removing Leaf n+1,u₁ and u₂ (together with incident edges) and adding two edges e_i=(w_i,v_i) (i=1,2), we obtain a TCN M such that N=Ret-Insert(M,e₁,e₂,n+1).

(ii) Let u be the parent of t and let v be the sibling of Leaf n+1 (Fig. 3c). By assumption, v is a tree node. After removing t and Leaf (n+1) (together with incident edges) and adding e=(u,v), we obtain a TCN $M\in {\mathcal {T}CN}_{k+1}(n)$ such that N=Leaf-Insert(M,(u,v),n+1).

(iii) Let y be the sibling of n+1 that is a reticulate node (Fig. 3d). Let z be the child of y and let M=C-Rotate(N,t,y). Since z is below y and y is below t in N, neither attaching the tree node z below t nor attaching Leaf (n+1) below y generates a directed cycle in M. Hence, $M\in {\mathcal {T}CN}_{k+1}(n+1)$ in which Leaf (n+1) is the child of a reticulate node y such that N=C-Rotate(M,t,y). □

Proposition 4

Let $N_{1}, N_{2} \in {\mathcal {T}CN}_{k}(n)$.

(i) Leaf-Insert(N₁,e₁,n+1) is identical to Leaf-Insert(N₂,e₂,n+1) iff N₁=N₂ and e₁=e₂.

(ii) Ret-Insert(N₁,e₁,e₁′,n+1) is identical to Ret-Insert(N₂,e₂,e₂′,n+1) iff N₁=N₂.

(iii) Assume the parent of Leaf n is a reticulate node y_i in N_i for i=1,2. C-Rotate(N₁,x₁,y₁) is identical to C-Rotate(N₂,x₂,y₂) iff N₁=N₂.

Proof

(i) Let $N_{i} \in {\mathcal {T}CN}_{k}(n)$ and $e_{i}\in {\mathcal {E}}(N_{i}), i=1, 2$. Let M₁=Leaf-Insert(N₁,e₁,n+1) and M₂=Leaf-Insert(N₂,e₂,n+1) such that M₁=M₂. Then, there exists a node mapping ϕ from M₁ to M₂ such that (i) it maps a leaf in M₁ to the same leaf and (ii) $(\phi (u), \phi (v))\in {\mathcal {E}}(M_{2})$ if and only if $(u, v)\in {\mathcal {E}}(M_{1})$. Since n+1 is inserted as a leaf, ϕ maps the parent p₁ of n+1 in M₁ to the parent p₂ of n+1 in M₂, implying that ϕ induces an isomorphic mapping from N₁ to N₂. This proves the necessity. The sufficiency is straightforward.

(ii) and (iii) Both statements can be proved similarly. The proposition is proved. □

Figure 4 shows how to generate the left TCN given in Fig. 1.

Results

Main theorems

Taken together, Propositions 1–4 imply the following theorem.

Theorem 1

Each TCN of ${\mathcal {T}CN}_{k+1}(n+1)$ can be obtained from either (i) a unique TCN of ${\mathcal {T}CN}_{k+1}(n)$ by attaching Leaf n+1 to a tree edge or (ii) a unique TCN $N\in {\mathcal {T}CN}_{k}(n)$ by applying one of the following operations:

(a) Insertion of a reticulate node r with the child Leaf (n+1) into a tree edge or straddling two tree edges;
(b) Insert r into a tree edge (u,v), as described in (a), and then conduct the child rotation to switch the child of r and the tree node child of v.
(c) Insert r straddling two tree edges e′=(u′,v′) and e′′=(u′′,v′′), as described in (a), and then conduct the child rotation to switch the child of r and the tree node child of v′′ (resp. v′) if u′′ (resp. u′) is not an ancestor of u′ (resp. u′′).

If we restrict the operations to normal networks, we obtain all the normal networks in ${\mathcal {T}CN}_{k+1}(n+1)$. However, inserting a reticulate node and then applying the child rotation may lead to a scenario in which a reticulate node no longer satisfies the normal condition (Fig. 5). Hence, when all normal networks are enumerated, the child rotation should be applied only after some verification.
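The generation scheme of Theorem 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and is not the author's C implementation: networks are stored as edge lists, the helper names (`tree_edges`, `reaches`, `extend`, `count_tcns`) are hypothetical, and the enumeration is started from the single root edge on one taxon, which is an assumption made here for convenience. Because Theorem 1 states that each network is generated exactly once, the sketch counts generated networks without any isomorphism test.

```python
from itertools import combinations, count

_fresh = count()  # supplies unique names for inserted internal nodes


def tree_edges(edges):
    """Edges whose head has indegree 1 (a tree node or a leaf)."""
    indeg = {}
    for _, v in edges:
        indeg[v] = indeg.get(v, 0) + 1
    return [(u, v) for u, v in edges if indeg[v] == 1]


def reaches(edges, a, b):
    """True if a directed path leads from a to b."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for u, v in edges:
            if u == x and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def leaf_insert(edges, e, leaf):
    u, v = e
    w = ("t", next(_fresh))
    return [x for x in edges if x != e] + [(u, w), (w, v), (w, leaf)]


def ret_insert(edges, e1, e2, leaf):
    """Return (network, w1, w2, r), where r straddles e1 and e2 (possibly e1 == e2)."""
    w1, w2, r = ("t", next(_fresh)), ("t", next(_fresh)), ("r", next(_fresh))
    rest = [x for x in edges if x != e1 and x != e2]
    if e1 == e2:
        u, v = e1
        rest += [(u, w1), (w1, w2), (w2, v)]
    else:
        rest += [(e1[0], w1), (w1, e1[1]), (e2[0], w2), (w2, e2[1])]
    return rest + [(w1, r), (w2, r), (r, leaf)], w1, w2, r


def c_rotate(edges, u, r):
    """Exchange the unique child of r with the other child of its parent u."""
    z = next(c for p, c in edges if p == r)
    w = next(c for p, c in edges if p == u and c != r)
    return [x for x in edges if x != (r, z) and x != (u, w)] + [(r, w), (u, z)]


def extend(edges, leaf):
    """Yield every network obtained from `edges` by adding `leaf` as in Theorem 1."""
    te = tree_edges(edges)
    for e in te:
        yield leaf_insert(edges, e, leaf)
    for e1, e2 in [(e, e) for e in te] + list(combinations(te, 2)):
        net, w1, w2, r = ret_insert(edges, e1, e2, leaf)
        yield net
        for u, other in ((w1, w2), (w2, w1)):
            if not reaches(net, u, other):  # u must not be an ancestor of the other parent
                yield c_rotate(net, u, r)


def count_tcns(n):
    """Return {k: number of generated networks with k reticulate nodes on n taxa}."""
    level = [[("rho", 1)]]  # assumed starting point: one root edge, one taxon
    for leaf in range(2, n + 1):
        level = [m for net in level for m in extend(net, leaf)]
    counts = {}
    for net in level:
        k = (len(net) - len(tree_edges(net))) // 2  # two reticulate edges per reticulate node
        counts[k] = counts.get(k, 0) + 1
    return counts
```

For three taxa, `count_tcns(3)` gives the counts 3, 21 and 42 for k = 0, 1, 2. Restricting the output to normal networks would additionally require testing the normal condition on each generated network.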

Theorem 2

Each normal network of ${\mathcal {T}CN}_{k+1}(n+1)$ can be obtained from either (i) a unique normal network in ${\mathcal {T}CN}_{k+1}(n)$ by attaching Leaf n+1 to a tree edge or (ii) a unique normal network $N\in {\mathcal {T}CN}_{k}(n)$ by applying one of the following operations for each pair of incomparable edges e₁=(u₁,v₁) and e₂=(u₂,v₂) in N:

(a) Insert a reticulate node r with the child (n+1) straddling e₁ and e₂. Let p_i be the tree node inserted into e_i for i=1,2.
(b) Insert r as described in (a) and then conduct the child rotation to make v₁ the child of r and n+1 the child of p₁, unless a reticulate edge (x,y) exists in N (Fig. 5) such that:
- y is below v₁;
- x is not an ancestor of v₁;
- x is an ancestor of v₂.
(c) Insert r as described in (a) and then conduct the child rotation to make v₂ the child of r and n+1 the child of p₂, unless a reticulate edge (x,y) exists in N such that:
- y is below v₂;
- x is not an ancestor of v₂;
- x is an ancestor of v₁.

Proof

The statement for normal networks is based on the fact that if N is obtained from N′ via one of the three operations given in Theorem 1, then the normality of N implies the normality of N′.

The conditions in (b) and (c) are used to exclude the child rotations that make the normal condition invalid for some existing reticulate nodes in the generated TCN. □

Counting formulas

Let N be a TCN. For a pair of edges (u₁,v₁) and (u₂,v₂) of N, they are incomparable if neither of v₁ and v₂ is an ancestor of the other. Let u(N) be the number of unordered pairs of incomparable edges in N and let:

$$\begin{array}{@{}rcl@{}} u_{n-1, k-1}=\sum_{N\in {\mathcal{T}CN}_{k-1}(n-1)} u(N). \end{array} $$

(1)

Define $a_{n,k}$ to be $|{\mathcal {T}CN}_{k}(n)|$, $0\leq k< n$, and $b_{n,k}$ to be the number of normal networks in ${\mathcal {T}CN}_{k}(n)$, $0\leq k<n$.

Theorem 3

(i) The $a_{n,k}$ can be calculated through the following recurrence formula:

$$\begin{array}{@{}rcl@{}} a_{n,k}&=& (2n+k-3)\{a_{n-1,k}+(2n+k-4)a_{n-1, k-1}\} \\ &&+u_{n-1, k-1}, \end{array} $$

(2)

where $a_{2,0}=1$ and $u_{n-1,k-1}$ is defined in Eq. (1).

(ii) The $b_{n,1}$ can be calculated through the following recurrence formula:

$$\begin{array}{@{}rcl@{}} {}b_{n, n-1}&=&0, \\ b_{n, 1}&=& (2n-2)b_{n-1, 1}+3u_{n-1, 0}\quad n>2, \end{array} $$

(3)

where $u_{n-1,0}$ is the total number of unordered pairs of incomparable edges in all the phylogenetic trees on n−1 taxa.

Proof

(i) The unique tree on two taxa is a TCN and thus $a_{2,0}=1$.

Each TCN of ${\mathcal {T}CN}_{k}(n-1)$ has 2n+k−3 tree edges and Leaf n can be attached to each of these edges. The first term of the right hand side of Eq. (2) counts the TCNs obtained by applying the leaf insertion in Theorem 1.

Consider $N\in {\mathcal {T}CN}_{k-1}(n-1)$. N has n−1 leaves, n+k−3 tree nodes, and thus 2n+k−4 tree edges. The reticulation insertion can be used on a single edge or a pair of edges in N. Thus, we can insert a reticulate node r with the child Leaf n in $2n+k-4 +{2n+k-4 \choose 2}=(2n+k-3)(2n+k-4)/2$ possible ways. After the insertion of r in a tree edge (u,v), we can apply a child rotation to exchange Leaf n with v, as u is not an ancestor of v after r was inserted. Similarly, after r is connected to a pair of edges e₁ and e₂, we can apply a child rotation once if one edge is below the other and in two possible ways if neither is an ancestor of the other.

In summary, for each unordered pair of tree edges (e₁,e₂), we can generate three different tree child networks with k reticulations on [1,n] if they are incomparable and two otherwise. Thus, we have the second and third terms of the formula.

(ii) The fact that $b_{n,n-1}=0$ was first proved by Bickner [25].

In the case that n>2 and k≤n−2, Eq. (3) for $b_{n,1}$ follows from the following two facts:

Only pairs of incomparable edges in normal networks in ${\mathcal {T}CN}_{k-1}(n-1)$ can be used to generate normal networks in ${\mathcal {T}CN}_{k}(n)$;
For each unordered pair of incomparable edges in a tree on [1,n−1], three normal networks can be obtained by applying the insertion of a reticulate node and two child rotations. □
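As a small worked check of Eq. (2), take n=3 and k=1. The unique tree on two taxa has three edges, of which only the two leaf edges are incomparable, so $u_{2,0}=1$. Together with $a_{2,0}=1$ and $a_{2,1}=2$, Eq. (2) gives:

$$\begin{aligned} a_{3,1} &= (2\cdot 3+1-3)\{a_{2,1}+(2\cdot 3+1-4)\,a_{2,0}\}+u_{2,0}\\ &= 4\,(2+3\cdot 1)+1 \\ &= 21. \end{aligned} $$

This agrees with the closed formula of Eq. (6), since $\frac{6!}{2^{3}\cdot 2!}-2^{2}\cdot 3! = 45-24 = 21$. In the same way, Eq. (3) with $b_{2,1}=0$ gives $b_{3,1}=4\cdot 0+3\cdot 1=3$.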

Unfortunately, we do not know how to obtain a simple formula for $b_{n,k}$ in general. By Theorem 3, one can still compute the number $b_{n,k}$ of normal networks with k reticulate nodes on [1,n] by enumeration. For each 1≤k≤n−2 and 3≤n≤7, $b_{n,k}$ is listed in Table 1.

Table 1 Counts of the normal networks with k reticulations on [n],1≤k≤n−2 and 3≤n≤7

It is challenging to obtain a simple formula for counting $u_{n,k}$ for arbitrary k. But we can find a closed formula for $u_{n,0}$ and thus obtain a recurrence formula for $a_{n,1}$ and $b_{n,1}$.

Lemma 1

For any n≥2, the total number of unordered pairs of incomparable edges in all the phylogenetic trees on n taxa is:

$$\begin{array}{@{}rcl@{}} u_{n, 0}=\frac{(n+1)(2n)!}{2^{n}(n)!} -2^{n}n! \end{array} $$

(4)

Proof

Let T be a phylogenetic tree on [1,n−1] and let O(T) denote the set of ordered pairs of comparable edges in T, where (x,y)∈O(T) means that the edge x is above the edge y. Then, it is not hard to verify:

$$\begin{array}{@{}rcl@{}} {}O(T)&=&\cup_{e\in {\mathcal{E}}(T)} \{(e, x) \;|\; x\in {\mathcal{E}}(T) \text{ s.t. } e \text{ is above } x\}\\ &=&\cup_{e\in {\mathcal{E}}(T)} \{(y, e) \;|\; y\in {\mathcal{E}}(T) \text{ s.t. } e \text{ is below } y\} \end{array} $$

Assume T′ is obtained from T by attaching Leaf n in an edge e=(u,v). In T′, the parent w of Leaf n is the tree node inserted in e, implying that e is subdivided into two edges of T′:

$$e_{1}=(u, w), \;\;e_{2}=(w, v),$$

and

$${\mathcal{E}}(T')=\{e_{1}, e_{2}, (w, n)\}\cup {\mathcal{E}}(T) - \{e\}. $$

Thus,

$$\begin{array}{@{}rcl@{}} O(T') &=& \{(e_{1}, e_{2}), (e_{1}, (w, n))\} \\ && \cup \{(x, y) \in O(T) \;|\; x\neq y, x\neq e \neq y \}\\ & & \cup \{ (e', e_{1}), (e', e_{2}), (e', (w, n)) \;|\; (e', e)\in O(T)\} \\ & & \cup \{(e_{1}, e^{\prime\prime}), (e_{2}, e^{\prime\prime}) \;|\; (e, e^{\prime\prime})\in O(T)\}. \end{array} $$

Hence,

$$\begin{array}{@{}rcl@{}} && |O(T')|\\ &= &|O(T)|+2+2|\{(x, e) \;|\; x\in {\mathcal{E}}(T): (x, e)\in O(T) \}|\\ && + |\{(e, y) \;|\; y\in {\mathcal{E}}(T) \text{ s.t. } y \text{ is below } e\}|. \end{array} $$

Since T has 2n−3 edges,

$$\begin{array}{@{}rcl@{}} {} \sum_{T'\in \mathcal{LI}(T, n)}|O(T')| &=&(|{\mathcal{E}}(T)|+3) |O(T)|+2 |{\mathcal{E}}(T)| \\ &=&2n\,|O(T)|+2(2n-3), \end{array} $$

(5)

where $\mathcal {LI}(T, n)$ denotes the set of 2n−3 phylogenetic trees that are obtained by a Leaf-Insertion on T.

Let c_n be the total number of unordered pairs of comparable edges in all the phylogenetic trees on n taxa. Clearly, c₂=2. Since there are $\frac {(2n-4)!}{2^{n-2}(n-2)!}$ phylogenetic trees with n−1 leaves, which each have 2n−3 edges, Eq. (5) implies:

$$\begin{array}{@{}rcl@{}} c_{n}&=&2nc_{n-1}+\frac{2(2n-2)!}{2^{n-1}(n-1)!} \end{array} $$

or, equivalently,

$$\begin{array}{@{}rcl@{}} \frac{1}{n!}c_{n}&=&2\left(\frac{1}{(n-1)!}c_{n-1}\right)+\frac{(2n-2)!}{2^{n-2}n!(n-1)!}. \end{array} $$

Therefore,

$$\begin{array}{@{}rcl@{}} c_{n} &=& {n!2^{n}} \sum^{n-1}_{k=1} \frac{1}{(k+1)!}\frac{(2k)!}{2^{2k}(k)!}\\ &=& \frac{n!2^{n-1}}{\pi} \sum^{n-1}_{k=1} \int^{4}_{0}\left(\frac{x}{4}\right)^{k}\left(\frac{4-x}{x}\right)^{1/2}dx\\ &=& \frac{n!2^{n+1}}{\pi} \int^{1}_{0} (1-x^{n-1})\left(\frac{x}{1-x}\right)^{1/2}dx\\ &=& 2^{n} n! - \frac{(2n)!}{2^{n-1}n!}, \end{array} $$

where $\frac {(2k)!}{(k+1)!k!}$ is the k-th Catalan number C_k that is equal to the integral appearing above ([31]). Since there are $\frac {(2n-2)!}{2^{n-1}(n-1)!}$ phylogenetic trees on [1,n] each having (2n−1) edges,

$$\begin{array}{@{}rcl@{}} u_{n, 0}&=&\frac{(2n-2)!}{2^{n-1}(n-1)!} {2n-1 \choose 2} -c_{n}\\ & = & \frac{(2n-2)!}{2^{n-1}(n-1)!} (n-1)(2n-1) + \frac{(2n)!}{2^{n-1}n!} - 2^{n} n! \\ & = & \frac{(n+1)(2n)!}{2^{n}n!} -2^{n} n!. \end{array} $$

□
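Lemma 1 can be checked by brute force for small n. The sketch below is an illustration only: it enumerates all phylogenetic trees by repeated leaf insertion, counts the unordered pairs of incomparable edges directly from the definition, and compares the total with Eq. (4). The edge-list representation, the root name `"rho"` and all function names are assumptions of this sketch.

```python
from itertools import combinations
from math import factorial


def all_trees(n):
    """All phylogenetic trees on taxa 1..n as edge lists, built by leaf insertion."""
    trees = [[("rho", 1)]]
    for leaf in range(2, n + 1):
        grown = []
        for tree in trees:
            for u, v in tree:
                w = ("t", leaf)  # parent of the new leaf; unique within one tree
                rest = [e for e in tree if e != (u, v)]
                grown.append(rest + [(u, w), (w, v), (w, leaf)])
        trees = grown
    return trees


def nodes_below(tree):
    """Map every node to the set of nodes strictly below it."""
    children = {}
    for u, v in tree:
        children.setdefault(u, []).append(v)
    below = {}

    def visit(x):
        if x not in below:
            below[x] = set()
            for c in children.get(x, []):
                below[x] |= {c} | visit(c)
        return below[x]

    visit("rho")
    return below


def incomparable_pairs(tree):
    """u(T): unordered edge pairs whose heads are not in an ancestor relation."""
    below = nodes_below(tree)
    return sum(1 for (_, v1), (_, v2) in combinations(tree, 2)
               if v1 not in below[v2] and v2 not in below[v1])


def u_closed(n):
    """Right-hand side of Eq. (4)."""
    return (n + 1) * factorial(2 * n) // (2 ** n * factorial(n)) - 2 ** n * factorial(n)


def u_brute(n):
    return sum(incomparable_pairs(t) for t in all_trees(n))
```

For example, each of the three trees on three taxa has five edges and four incomparable pairs, so `u_brute(3)` and `u_closed(3)` both equal 12.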

Theorem 4

For any n≥3, the numbers of TCNs and normal networks with exactly one reticulate node on n taxa are:

$$\begin{array}{@{}rcl@{}} a_{n, 1} &= & \frac{(2n)!}{2^{n} (n-1)!}-2^{n-1} n! \end{array} $$

(6)

and

$$\begin{array}{@{}rcl@{}} b_{n, 1} =\frac{(n+2)(2n)!}{2^{n}n!}-3\cdot 2^{n-1}n!, \end{array} $$

(7)

respectively.

Proof

Since $a_{n-1, 0}=\frac {(2n-4)!}{2^{n-2}(n-2)!}$, by Theorem 3,

$$\begin{array}{@{}rcl@{}} a_{n, 1}=2(n-1)a_{n-1, 1}+\frac{(3n-2)(2n-3)!}{2^{n-2}(n-2)!}-2^{n-1}(n-1)! \end{array} $$

or, equivalently,

$$\begin{array}{@{}rcl@{}} \frac{a_{n, 1}}{(n-1)!}=2\left(\frac{a_{n-1, 1}}{(n-2)!}\right)+\frac{(3n-2)(2n-2)!}{2^{n-1}((n-1)!)^{2}}-2^{n-1} \end{array} $$

Therefore, since $a_{2,1}=2$,

$$\begin{aligned} \frac{a_{n, 1}}{(n-1)!}&\\ &= 2^{n-2} \left(\frac{a_{2, 1}}{(2-1)!}\right)\\ &\quad+ \sum^{n-2}_{i=1} \frac{(3n-3i+1)(2n-2i)!}{2^{n-2i+1}((n-i)!)^{2}} - 2^{n-1} (n-2) \\ &= \sum^{n-2}_{i=1} \frac{(3n-3i+1)(2n-2i)!}{2^{n-2i+1}((n-i)!)^{2}} - 2^{n-1} (n-3) \\ &= 2^{n-1}\sum^{n-2}_{i=1} \frac{(3(n-i)+1)(2(n-i))!}{2^{2(n-i)}((n-i)!)^{2}} - 2^{n-1} (n-3) \\ &= 2^{n-1}\sum^{n-1}_{k=2} \frac{(3k+1)(2k)!}{2^{2k}(k!)^{2}} - 2^{n-1} (n-3) \\ &= 2^{n-1}\sum^{n-1}_{k=1} {2k\choose k} \frac{3k+1}{4^{k}} - 2^{n-1} (n-1). \end{aligned} $$

By induction, we can show that $\sum ^{n}_{k=0}{2k\choose k}4^{-k}=\frac {(2n+1)!}{2^{2n}(n!)^{2}}$ and $\sum ^{n}_{k=0}{2k\choose k}k4^{-k}=\frac {(2n+1)!}{3\cdot 2^{2n}n!(n-1)!}$. Continuing the above calculation, we obtain:

$$\frac{a_{n, 1}}{(n-1)!}=2^{n-1}\left\{\frac{(2n-1)!}{2^{2(n-1)}(n-1)!}\left[\frac{1}{(n-2)!}+\frac{1}{(n-1)!}\right]-n\right\}.$$

Multiplying both sides by (n−1)! and simplifying proves Eq. (6).
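The two binomial sums used in this calculation can be checked with exact rational arithmetic. The following short sketch does so for small n; the function names are hypothetical, and the second identity is evaluated only for n≥1 because its right-hand side contains (n−1)!.

```python
from fractions import Fraction
from math import comb, factorial


def plain_sum(n):
    """sum_{k=0}^{n} C(2k, k) / 4^k"""
    return sum(Fraction(comb(2 * k, k), 4 ** k) for k in range(n + 1))


def plain_closed(n):
    """(2n+1)! / (2^{2n} (n!)^2)"""
    return Fraction(factorial(2 * n + 1), 2 ** (2 * n) * factorial(n) ** 2)


def weighted_sum(n):
    """sum_{k=0}^{n} k C(2k, k) / 4^k"""
    return sum(Fraction(k * comb(2 * k, k), 4 ** k) for k in range(n + 1))


def weighted_closed(n):
    """(2n+1)! / (3 * 2^{2n} n! (n-1)!), for n >= 1"""
    return Fraction(factorial(2 * n + 1),
                    3 * 2 ** (2 * n) * factorial(n) * factorial(n - 1))
```

For n=2, both sides of the second identity equal 5/4. A numerical check of this kind does not replace the induction argument, but it guards against transcription errors in the closed forms.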

Similarly, by Theorem 3 and Lemma 1, we have:

$$\begin{array}{@{}rcl@{}} b_{n, 1}&=& 2(n-1)b_{n-1, 1} \\ & &+ 3\cdot \left(\frac{n(2n-3)!}{2^{n-2}(n-2)!}-2^{n-1}(n-1)!\right)\\ \end{array} $$

or, equivalently,

$$\begin{array}{@{}rcl@{}} \frac{b_{n, 1}}{(n-1)!}&=& 2\frac{b_{n-1, 1}}{(n-2)!} + 3 {2n-3\choose n-1}\frac{n}{2^{n-2}}-3\cdot 2^{n-1}. \end{array} $$

Since $b_{2,1}=0$,

$$\begin{array}{@{}rcl@{}} &&\frac{b_{n, 1}}{(n-1)!}\\ &=& 2^{n-2}\frac{b_{2, 1}}{(2-1)!}\\ && +3\cdot 2^{n} \cdot \sum^{n-2}_{i=1} {2n-2i-1\choose n-i}\frac{n-i+1}{2^{2n-2i}}\\ && -3\cdot 2^{n-1} (n-2)\\ &=& 3\cdot 2^{n} \cdot \sum^{n-1}_{k=1} {2k-1\choose k}\frac{k+1}{2^{2k}}-3\cdot 2^{n-1} (n-1)\\ &=& 3\cdot 2^{n-1} \cdot \sum^{n-1}_{k=1} {2k \choose k}\frac{k+1}{2^{2k}}-3\cdot 2^{n-1} (n-1)\\ &=&\frac{(n+2)(2n)!}{2^{n}n!(n-1)!}-3\cdot 2^{n-1}n. \end{array} $$

Multiplying both sides by (n−1)! gives the stated expression for $b_{n,1}$.

This proves Eq. (7). □
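The closed formulas of Theorem 4 can be cross-checked against the recurrences of Theorem 3 with exact integer arithmetic. The sketch below is one way to do this, assuming the base values $a_{2,1}=2$ and $b_{2,1}=0$ used in the proof and the closed form of $u_{n,0}$ from Lemma 1; the function names are hypothetical.

```python
from math import factorial


def num_trees(n):
    """a_{n,0}: number of phylogenetic trees on n taxa, (2n-2)! / (2^{n-1} (n-1)!)."""
    return factorial(2 * n - 2) // (2 ** (n - 1) * factorial(n - 1))


def u0(n):
    """u_{n,0} from Eq. (4)."""
    return (n + 1) * factorial(2 * n) // (2 ** n * factorial(n)) - 2 ** n * factorial(n)


def a1_recurrence(n):
    """a_{n,1} from Eq. (2) with k = 1, starting at a_{2,1} = 2."""
    a = 2
    for m in range(3, n + 1):
        a = (2 * m - 2) * (a + (2 * m - 3) * num_trees(m - 1)) + u0(m - 1)
    return a


def b1_recurrence(n):
    """b_{n,1} from Eq. (3), starting at b_{2,1} = 0."""
    b = 0
    for m in range(3, n + 1):
        b = (2 * m - 2) * b + 3 * u0(m - 1)
    return b


def a1_closed(n):
    """Eq. (6)."""
    return factorial(2 * n) // (2 ** n * factorial(n - 1)) - 2 ** (n - 1) * factorial(n)


def b1_closed(n):
    """Eq. (7)."""
    return ((n + 2) * factorial(2 * n) // (2 ** n * factorial(n))
            - 3 * 2 ** (n - 1) * factorial(n))
```

For n=3 and n=4 this gives $a_{3,1}=21$, $a_{4,1}=228$, $b_{3,1}=3$ and $b_{4,1}=54$ from both the recurrences and the closed formulas.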

Remark 1

Every RPN with exactly one reticulate node is a TCN. Therefore, $a_{n,1}$ is actually the number of RPNs with one reticulate node.

Conclusions

It is well known that all phylogenetic trees on n taxa can be generated by inserting the n-th taxon into each edge of all the phylogenetic trees on the first n−1 taxa. The main result of this work is a generalization of this fact to TCNs. This leads to a simple procedure for enumerating both normal networks and TCNs, the C code for which is available upon request. It is fast enough to count all the normal networks on eight taxa. Recently, Cardona et al. [27] introduced a novel operation to enumerate TCNs. Their program was successfully used to compute the exact number of tree-child networks on six taxa. Although our program is faster than theirs, it still cannot be used to count TCNs on eight taxa on a PC.

Another contribution of this work is Eqs. (6) and (7) for counting RPNs and normal networks with exactly one reticulate node. Semple and Steel [30] presented formulas for counting unrooted networks with one reticulate node. Since an unrooted network can be oriented into a different number of RPNs, it is not clear how to use their results to derive a formula for the count of RPNs. Bouvel et al. [26] presented a formula for counting RPNs with one reticulate node. Our formula is much simpler than the formula given in [26].

Lastly, the following problem is open:

Is there a simple formula like Eq. (6) for the count of TCNs with k reticulate nodes on n taxa for each k>1?

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Abbreviations

NNI: Nearest neighbor interchange
RPN: Rooted phylogenetic network
TCN: Tree-child network

References

1. Doolittle WF. Phylogenetic classification and the universal tree. Science. 1999; 284(5423):2124–8.
2. Gusfield D. ReCombinatorics: the Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks. Cambridge, USA: MIT Press; 2014.
3. Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci. 1999; 96(7):3801–6.
4. Huson DH, Rupp R, Scornavacca C. Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge, UK: Cambridge University Press; 2010.
5. Steel M. Phylogeny: Discrete and Random Processes in Evolution. Philadelphia, USA: SIAM; 2016.
6. Chor B, Tuller T. Finding a maximum likelihood tree is hard. J ACM (JACM). 2006; 53(5):722–44.
7. Foulds LR, Graham RL. The Steiner problem in phylogeny is NP-complete. Adv Appl Math. 1982; 3(1):43–9.
8. Felsenstein J. Inferring Phylogenies, vol. 2. Sunderland: Sinauer Associates; 2004.
9. Yu Y, Dong J, Liu KJ, Nakhleh L. Maximum likelihood inference of reticulate evolutionary histories. Proc Natl Acad Sci. 2014; 111(46):16448–53.
10. Bordewich M, Linz S, Semple C. Lost in space? Generalising subtree prune and regraft to spaces of phylogenetic networks. J Theor Biol. 2017; 423:1–12.
11. Francis A, Huber KT, Moulton V, Wu T. Bounds for phylogenetic network space metrics. J Math Biol. 2018; 76(5):1229–48.
12. Gambette P, Van Iersel L, Jones M, Lafond M, Pardi F, Scornavacca C. Rearrangement moves on rooted phylogenetic networks. PLoS Comput Biol. 2017; 13(8):1005611.
13. Huber KT, Linz S, Moulton V, Wu T. Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations. J Math Biol. 2016; 72(3):699–725.
14. Huber KT, Moulton V, Wu T. Transforming phylogenetic networks: Moving beyond tree space. J Theor Biol. 2016; 404:30–9.
15. Janssen R, Jones M, Erdős PL, Van Iersel L, Scornavacca C. Exploring the tiers of rooted phylogenetic network space using tail moves. Bull Math Biol. 2018; 80(8):2177–208.
16. Klawitter J, Linz S. On the subnet prune and regraft distance. Electron J Combin. 2019; 26:2–3.
17. Gusfield D, Eddhu S, Langley C. The fine structure of galls in phylogenetic networks. INFORMS J Comput. 2004; 16(4):459–69.
18. Wang L, Zhang K, Zhang L. Perfect phylogenetic networks with recombination. J Comput Biol. 2001; 8(1):69–78.
19. Cardona G, Rossello F, Valiente G. IEEE/ACM Trans Comput Biol Bioinforma (TCBB). 2009; 6(4):552–569.
20. Willson SJ. Unique determination of some homoplasies at hybridization events. Bull Math Biol. 2007; 69(5):1709–25.
21. Francis AR, Steel M. Which phylogenetic networks are merely trees with additional arcs? Syst Biol. 2015; 64(5):768–77.
22. Zhang L. On tree-based phylogenetic networks. J Comput Biol. 2016; 23(7):553–65.
23. Zhang L. Clusters, trees, and phylogenetic network classes. In: Bioinformatics and Phylogenetics. New York: Springer; 2019. p. 277–315.
24. Gunawan AD, Yan H, Zhang L. Compression of phylogenetic networks and algorithm for the tree containment problem. J Comput Biol. 2019; 26(3):285–94.
25. Bickner DR. On normal networks. PhD thesis, Iowa State University, Department of Mathematics. 2012.
26. Bouvel M, Gambette P, Mansouri M. Counting level-k phylogenetic networks. arXiv preprint arXiv:1909.10460. 2019.
27. Cardona G, Pons JC, Scornavacca C. Generation of Binary Tree-Child phylogenetic networks. PLoS Computat Biol. 2019; 15(9):e1007347.
28. Fuchs M, Gittenberger B, Mansouri M. Counting phylogenetic networks with few reticulation vertices: tree-child and normal networks. Australas J Comb. 2019; 73(2):385–423.
29. McDiarmid C, Semple C, Welsh D. Counting phylogenetic networks. Ann Comb. 2015; 19(1):205–24.
30. Semple C, Steel M. IEEE/ACM Trans Comput Biol Bioinforma (TCBB). 2006; 3(1):84.
31. Stanley RP. Catalan Numbers. Cambridge, UK: Cambridge University Press; 2015.

Acknowledgements

The author thanks Mike Steel for useful discussion of counting networks and comments on the manuscript and Yurui Chen for implementing the enumeration method given in this work. He also thanks anonymous reviewers and Jonathan Klawitter for useful feedback on the first draft of this paper.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 20, 2019: Proceedings of the 17th Annual Research in Computational Molecular Biology (RECOMB) Comparative Genomics Satellite Workshop: Bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-20.

Funding

Publication costs of this work are funded by Singapore’s Ministry of Education Academic Research Fund Tier-1 [grant R-146-000-238-114] and the National Research Fund [grant NRF2016NRF-NSFC001-026].

Author information

Authors and Affiliations

Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076, Singapore
Louxin Zhang

Contributions

The author conducted research and wrote the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Louxin Zhang.

Ethics declarations

Ethics approval and consent to participate

This research did not involve any human subjects, human material, or human data. The ethics approval is not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhang, L. Generating normal networks via leaf insertion and nearest neighbor interchange. BMC Bioinformatics 20 (Suppl 20), 642 (2019). https://doi.org/10.1186/s12859-019-3209-3
