A novel approach to phylogenetic tree construction using stochastic optimization and clustering

Qin, Ling; Chen, Yixin; Pan, Yi; Chen, Ling

doi:10.1186/1471-2105-7-S4-S24

Volume 7 Supplement 4

Symposium of Computations in Bioinformatics and Bioscience (SCBB06)

Research
Open access
Published: 12 December 2006

Ling Qin¹, Yixin Chen², Yi Pan³ & Ling Chen¹,⁴

BMC Bioinformatics volume 7, Article number: S24 (2006)

Abstract

Background

The problem of inferring the evolutionary history and constructing the phylogenetic tree with high performance has become one of the major problems in computational biology.

Results

A new phylogenetic tree construction method from a given set of objects (proteins, species, etc.) is presented. As an extension of ant colony optimization, this method proposes an adaptive phylogenetic clustering algorithm based on a digraph to find a tree structure that defines the ancestral relationships among the given objects.

Conclusion

Our phylogenetic tree construction method is tested by comparing its results with those of the genetic algorithm (GA). Experimental results show that our algorithm converges much faster and also achieves higher quality than GA.

Background

An evolutionary tree, or phylogenetic tree, is a model of the evolutionary history of a set of species. As more and more DNA and protein sequences have been obtained [1–3], the problem of inferring the evolutionary history and constructing the phylogenetic tree has become one of the major problems in computational biology. This is because the evolutionary relationship of species provides a great deal of information about their biochemical machinery. For example, RNA secondary structure is most accurately determined by selecting correlated mutations of a class of related species.

A phylogenetic tree is a tree showing the evolutionary interrelationships among various species or other entities that are believed to have a common ancestor. In a phylogenetic tree, each node with descendants represents the most recent common ancestor of the descendants, and the edge lengths correspond to time estimates. Each node in a phylogenetic tree is called a taxonomic unit, and the leaves usually denote a set of objects (proteins, species, etc.). Internal nodes are generally referred to as Hypothetical Taxonomic Units (HTUs) as they cannot be directly observed [3–6].

To construct a tree from a set of species, one must have a metric to decide if a tree is better than another one. Many criteria have been proposed. But in general, they turn out to be NP-hard to optimize. There is still no consensus in the biology community on how to make a good tree.

One way to counteract this problem is to run many different phylogenetic clustering methods, which yield various starting tree topologies, and then choose the best of the generated trees. Another way is to use a global optimization technique to derive the optimal topology of the tree. In this paper, the ant colony algorithm is applied both as a clustering method and as a global optimization technique, so that the optimal tree can be found even from a bad initial tree topology.

In recent years, considerable effort has been devoted to phylogenetic analyses using genomic sequences, which can be either whole genomes (complete gene sequence sets) or complete protein sequence sets [7–9]. There are three main methods for constructing phylogenetic trees: distance-based methods such as neighbour-joining, parsimony-based methods such as maximum parsimony, and character-based methods such as maximum likelihood or Bayesian inference [1, 2]. The distance-based approaches avoid the high computational complexity of multiple sequence alignment (including genome reorganization) when computing an evolutionary distance, and try to construct the phylogenetic trees efficiently. The phylogenetic clustering method in this article belongs to the distance-based category.

Ant Colony Optimization (ACO) is a new evolution simulation algorithm proposed by the Italian researchers Dorigo et al. [10]. Inspired by studies of biological ant colonies, they recognized the similarity between the ants' food-hunting activities and the travelling salesman problem (TSP), and solved the TSP using the same principle by which ants find the shortest route to a food source through communication and cooperation. ACO has since been applied to many combinatorial optimization problems [10, 11].

We note that ants release a chemical odour called pheromone when they encounter each other or while seeking their fellows. Inspired by this fact, we first use the weights of rejection and acceptance between the objects to form a complete digraph in which the vertices represent the objects and the initial weight of each edge is the acceptance weight between the objects it connects. The clustering process carried out by the artificial ants is illustrated in Figs. 1, 2, 3 and 4. During this process, the pheromone on each edge of the digraph is updated according to the artificial ants' adaptive movements, and some adaptive strategies are also presented to speed up the clustering. Finally, the clusters obtained by the ants are used to progressively construct the phylogenetic trees.

Results

Constructing a specific digraph for the objects

Ants release a chemical odour called pheromone when they encounter each other or while seeking their fellows. Based on this odour, ants naturally attract those with similar features and repel those that are different. In this paper, artificial ants are set to travel on the graph and deposit pheromone on the edges they pass. As shown in Fig. 1 and Fig. 2, in each step the artificial ant selects the next vertex according to the acceptance weight in the digraph and some heuristic information. The pheromone on each edge of the digraph is updated according to the artificial ants' adaptive movements, and some adaptive strategies are also presented to speed up the clustering.

Strong component analysis

The more similar two objects are, the more pheromone is likely to be deposited on the edge between their vertices. To make full use of the quantity of pheromone on each edge, we omit the connections whose pheromone value is less than a certain threshold to obtain a new digraph, and the strongly connected components of the new digraph form the final clusters. In this way, the initial objects are separated into a few clusters by the ant sub-colony. Finally, the clusters obtained by the ants are used to construct the phylogenetic trees progressively.

Optimizing the phylogenetic trees

Artificial ants in the same sub-colony cooperate to construct an independent phylogenetic tree as a solution of the problem, and different sub-colonies construct different trees so as to maintain the diversity of candidates. Optimizing these trees improves the quality of the solutions. Meanwhile, the pheromone on the edges of trees with high fitness values is increased to strengthen the ants' clustering process.

The phylogenetic tree construction method presented in this paper is tested by comparing its results with those of GA. Experimental results show that our algorithm is easier to implement and more efficient. Compared with GA, it converges much faster and obtains higher solution quality.

Discussion

Reconstruction of the phylogeny is one of the most important problems in evolutionary study, which is very difficult for large data sets in macromolecular databases. The number of possible phylogenetic trees is exponentially large and the space of topologies cannot be searched exhaustively. Even heuristic searches can be very slow in this case, especially when computationally intensive optimality criteria such as maximum likelihood (ML) are used.

An exhaustive search for the ML topology is usually computationally prohibitive for more than 20 taxa (species). At the same time, the clustering approach to phylogeny inference is advantageous for a number of reasons, including the ability to model a variety of factors affecting nucleotide sequence evolution, robustness to violations of model assumptions, and resistance to long branch attraction. The use of stochastic algorithms provides an opportunity to develop new, efficient and fast methods for phylogeny analysis.

Conclusion

The proposed adaptive ant colony algorithm for phylogenetic tree construction (AAPTC) consists of three components: initialization, construction of phylogenetic trees through clustering, and phylogenetic tree optimization.

In the initialization stage, a weighted digraph is built in which the vertices represent the data to be clustered and the weight of each edge is the acceptance rate between the two objects it connects.

In the course of constructing the phylogenetic trees, the ants travel in the digraph and update the pheromone on the paths they pass. At each step, an ant chooses the next vertex with a probability that depends on the pheromone and the heuristic information of the edge. The digraph is first modified by omitting the edges whose pheromone value is less than a threshold, and then the strongly connected components of the updated digraph are computed to form the clusters that are used to construct the phylogenetic trees.

After getting a group of phylogenetic trees, the ant colony and its pheromone feedback system act as a global optimization technique to derive the optimal topology of the phylogenetic tree.

The algorithm presented in this paper is tested using randomly generated sequences. Using the same sequences, we also test the GA method. Our experiments were run on a Dell Precision 380 workstation with an Intel Pentium 4 Hyper-Threading processor at 3.2 GHz and an 800 MHz front-side bus.

As in [3], the simulated data sets are generated in two different ways. The first set of simulated data consists of trees in which the topology is fixed and randomly generated branch lengths are assigned to each node-to-node connection. The resulting distance matrix is used as input for the methods to be tested. In this way four sets were generated (S11, S12, S13 and S14), which consist of distance matrices defining ancestral relationships among 24, 96, 1000 and 4000 objects, respectively. The second group of sets, S21, S22, S23 and S24, consists of stochastically generated distance matrices. For these data sets, the optimal tree is not known.
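The first kind of simulated data can be illustrated with a minimal Python sketch: a fixed topology receives random branch lengths, and the leaf-to-leaf path lengths form the input distance matrix. The function name `tree_distance_matrix`, the four-leaf topology, the node labels and the branch-length range are hypothetical choices for illustration and are not taken from the paper or from [3].

```python
import random


def tree_distance_matrix(edges, leaves):
    """Path-length distances between the leaves of a weighted tree.
    `edges` is a list of (node_a, node_b, branch_length) tuples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))

    def distances_from(src):
        # Depth-first traversal; in a tree every node is reached by one path.
        dist, stack = {src: 0.0}, [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    rows = []
    for a in leaves:
        d = distances_from(a)
        rows.append([d[b] for b in leaves])
    return rows


# Hypothetical fixed topology ((A,B),(C,D)) with internal nodes r, u, v.
topology = [("r", "u"), ("r", "v"), ("u", "A"), ("u", "B"), ("v", "C"), ("v", "D")]
rng = random.Random(0)  # seeded so the example is reproducible
edges = [(a, b, rng.uniform(0.1, 1.0)) for a, b in topology]
D = tree_distance_matrix(edges, ["A", "B", "C", "D"])
```

Because such a matrix is additive, the tree that generated it is known, which is what makes a comparison against the reconstructed tree possible for the first group of data sets.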

In fact, AAPTC not only provides a novel clustering method to obtain a group of good initial tree topologies, but also performs global optimization on these trees. In this way, AAPTC produces an ensemble of trees of nearly equal quality. The GA method, by contrast, cannot guarantee the topology quality, since its sole initial tree topology is generated by some other clustering method such as NJ or FITCH. The ensemble of high-quality solutions produced by AAPTC allows experts to decide which topology is most likely, since the quality criterion used (the fitness value) does not guarantee the optimal tree topology.

Tables 1, 2, 3 and 4 show the performance comparison of the two methods. In all the tables, the performance of a method is measured by the fitness value between the original and calculated distance matrices. The number of examined trees is given in parentheses. For AAPTC, the mean, standard deviation, highest and lowest fitness values derived from 50 independent trials are given. The basic parameters are set as m = n, ρ = 0.05, C = 10 and q0 = 0.95. We also use the vertebrate dataset [5, 12] to evaluate the performance of our algorithm. The vertebrate database contains a total of 832 mitochondrial proteins from 64 vertebrates. The neighbour-joining-based phylogeny and the taxonomy tree are shown in [8], and the ant-colony-based phylogeny is shown in Fig. 5.

Table 1 The experimental results on the first data set.

Table 2 The experimental results on the second data set.

Table 3 The average iterations.

Table 4 The comparison for fitness value between GA and AAPTC.

Methods

Ant Colony Algorithm

Here we briefly introduce the ant colony algorithm and its applications, using the TSP as an example. In the TSP, each of a given set of n cities has to be visited exactly once, and the tour ends in the initial city. We denote the edge between cities i and j as (i, j) and its distance as d_ij (i, j ∈ [1, n]). Let τ_ij(t) be the intensity of pheromone on (i, j) at time t, which simulates the pheromone of real ants. Suppose m is the total number of ants. At time t, the kth ant moves from its current city i to city j according to the following probability distribution:

Here allowed_k is the set of cities that can be chosen by the kth ant at city i for the next step, and η_ij is a heuristic function defined as the visibility of the link between cities i and j; for instance, it can be defined as 1/d_ij.
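The selection rule can be sketched in Python as follows. The sketch assumes the standard Ant System form from [10], in which the probability of moving from i to j is proportional to τ_ij^α · η_ij^β over the cities still in allowed_k. The function name `transition_probabilities`, the nested-list matrices and the uniform fallback for the all-zero case are illustrative assumptions.

```python
def transition_probabilities(tau, eta, i, allowed, alpha=1.0, beta=1.0):
    """Return {j: p_ij} for the cities in `allowed`, assuming the standard
    Ant System rule: p_ij proportional to tau[i][j]**alpha * eta[i][j]**beta."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(weights.values())
    if total == 0:
        # Degenerate case (all weights zero): fall back to a uniform choice.
        return {j: 1.0 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}


# Toy example with three cities; eta is 1/d_ij as suggested in the text.
dist = [[0, 2, 4], [2, 0, 1], [4, 1, 0]]
eta = [[0 if a == b else 1 / dist[a][b] for b in range(3)] for a in range(3)]
tau = [[1.0] * 3 for _ in range(3)]
probs = transition_probabilities(tau, eta, i=0, allowed={1, 2})
```

With uniform pheromone, the nearer city 1 (distance 2) receives twice the probability of city 2 (distance 4).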

The relative influence of the trail information τ_ij(t) and the visibility η_ij is determined by the parameters α and β. When α = 1 and β = 0, the algorithm becomes a purely pheromone-driven heuristic algorithm with positive feedback, and when α = 0 and β = 1, it is just a traditional greedy algorithm. For every ant, its path traversing all the cities forms a solution. The intensity of pheromone is updated by Eq. (2):

τ_ij(t + 1) = ρ τ_ij(t)+Δτ_ij (2)

Here 0 < ρ < 1 represents the evaporation of τ_ij(t) between time t and t+1, Δτ_ij is the increment of the pheromone on (i, j) in step t, and Δτ_ij^k is the pheromone laid on it by the kth ant, which takes a different form depending on the model used.

For example, in the most widely used model, called the "ant cycle system", it is given as Eq. (4).

where Q is a constant and L_k is the total length travelled by the kth ant.
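A minimal Python sketch of this update, following Eq. (2) as printed (the old trail is multiplied by ρ). It assumes that the total increment is the sum of the individual ants' contributions and that, in the ant cycle model, ant k adds Q/L_k to every edge of its tour, as in [10]. The function name `update_pheromone` and the list-based tour representation are illustrative assumptions.

```python
def update_pheromone(tau, tours, lengths, rho=0.5, Q=1.0):
    """Apply tau_ij <- rho * tau_ij + sum_k delta_k, where ant k adds Q / L_k
    to every directed edge on its closed tour (assumed ant-cycle rule)."""
    n = len(tau)
    delta = [[0.0] * n for _ in range(n)]
    for tour, length in zip(tours, lengths):
        # Pair each city with its successor, wrapping back to the start.
        for a, b in zip(tour, tour[1:] + tour[:1]):
            delta[a][b] += Q / length
    return [[rho * tau[a][b] + delta[a][b] for b in range(n)] for a in range(n)]


# One ant travels 0 -> 1 -> 2 -> 0 with total length 4.
tau = [[1.0] * 3 for _ in range(3)]
new_tau = update_pheromone(tau, tours=[[0, 1, 2]], lengths=[4.0], rho=0.5, Q=2.0)
```

Edges on the tour end up with more pheromone than edges that no ant used, so shorter tours are reinforced more strongly in later iterations.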

Constructing phylogenetic trees by clustering and optimization

The overall algorithm for constructing the phylogenetic trees is given below.

Begin

1 Initialization
    1.1 Initialize parameters: minC, m, ε, γ, α, β;
    1.2 Initialize the pheromone digraph;
    1.3 For each ant in each sub-colony do
            Choose an initial object to visit randomly;
        End for

2 While (not termination) do // 500 iterations
    2.1 For each sub-colony do // m sub-colonies
        2.1.1 Set a root node rt as the ancestor of all the objects, and let CO denote the current object set; the initial value of CO is the given object set of the problem;
        2.1.2 While (not termination) do // 500 iterations
                  For each ant k in the current sub-colony do // m ants
                      While (allowed_k not empty) do
                          Compute the probability function p;
                          Select the next object j to visit;
                          allowed_k = allowed_k - {j};
                      End while
                      Reset allowed_k = CO;
                  End for
                  Perform local pheromone updating on each edge in the digraph according to the evolutionary distance between objects;
                  Adaptively update the parameters α, β;
              End while
        2.1.3 Transform the pheromone digraph into another digraph by omitting the edges whose pheromone value is less than ε; find the strongly connected components of the updated digraph as clusters. Join the small clusters with the nearest cluster until two clusters are left; we denote them as clu1 and clu2;
        2.1.4 Let clu1 and clu2 be the internal nodes that denote the children of the root node rt;
        2.1.5 If the number of objects in clu1 is lower than 1, let rt = clu1, CO = {clu1}; break;
        2.1.6 If the number of objects in clu2 is lower than 1, let rt = clu2, CO = {clu2}; break;
        2.2 Obtain the phylogenetic tree constructed by each sub-colony
    End for
    2.3 Calculate the fitness value of each sub-colony
    2.4 Perform crossover and mutation operations to improve the quality of the trees
    2.5 Perform global pheromone updating according to the fitness values of the constructed phylogenetic trees
End while

3 Output the phylogenetic trees constructed by the colony

End

In the while loop of step 2, the threshold ε is defined from the average pheromone τ_ave as ε = γ*τ_ave, where γ is a constant. The population size m of the ant colony is normally equal to n/2, where n is the number of given objects. The values of the parameters α and β are adjusted adaptively as the algorithm proceeds. In step 2.4, the crossover and mutation operations are executed by the branch moving and swapping techniques introduced in [13].
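The thresholding and clustering of step 2.1.3 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: τ_ave is taken as the mean over all off-diagonal entries, edges with pheromone equal to ε are kept, and the strongly connected components are found with Kosaraju's two-pass depth-first search (the paper does not name a specific algorithm). The function name `strongly_connected_clusters` is hypothetical, and the merging of small clusters down to two is not shown.

```python
def strongly_connected_clusters(tau, gamma=1.0):
    """Drop edges with pheromone below eps = gamma * tau_ave, then return the
    strongly connected components of the remaining digraph (Kosaraju)."""
    n = len(tau)
    off_diag = [tau[i][j] for i in range(n) for j in range(n) if i != j]
    eps = gamma * sum(off_diag) / len(off_diag)  # assumed: mean over i != j
    # Forward and reversed adjacency lists of the thresholded digraph.
    fwd = [[j for j in range(n) if j != i and tau[i][j] >= eps] for i in range(n)]
    rev = [[j for j in range(n) if j != i and tau[j][i] >= eps] for i in range(n)]

    def dfs(start, graph, seen, out):
        # Iterative depth-first search appending vertices in post-order.
        stack = [(start, iter(graph[start]))]
        seen.add(start)
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for v in range(n):
        if v not in seen:
            dfs(v, fwd, seen, order)
    clusters, seen = [], set()
    for v in reversed(order):  # decreasing finish time, on the reversed graph
        if v not in seen:
            component = []
            dfs(v, rev, seen, component)
            clusters.append(sorted(component))
    return clusters


tau = [[0.0, 0.9, 0.1, 0.1],
       [0.8, 0.0, 0.1, 0.1],
       [0.1, 0.1, 0.0, 0.7],
       [0.1, 0.1, 0.9, 0.0]]
clusters = strongly_connected_clusters(tau, gamma=1.0)
```

In the toy matrix only the edges 0↔1 and 2↔3 survive the threshold, so the objects split into the two clusters {0, 1} and {2, 3}.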

Initialization of the Pheromone Digraph

The initialization stage of the algorithm constructs a weighted digraph in which the vertices represent the given objects and each weighted edge represents the acceptance weight between the two objects it connects. The acceptance weight between two objects can be calculated from the evolutionary distance between the objects.

Definition 1: The set of objects

A set of n objects is defined as S = (CO, rt), where CO = {object₁, object₂, ..., object_n} represents the object set and rt is the ancestor of all the objects in CO.

A similarity or evolutionary distance is often obtained by pairwise comparisons of DNA or protein sequences. Measures of evolutionary distance can be classified into three categories. In the first, the number of homologous genes divided by the total number of genes, or a variant of this ratio, is taken as the evolutionary distance [14]. In the second, regularities identified in genetic sequences by compression algorithms are used to represent biological significance for evolutionary history, but these data-compression-based methods often suffer from aggregated errors [15]. In the third, the evolutionary distance is measured by string composition, based on the singular value decomposition (SVD) of a string frequency matrix [13], on the composition vector of short strings of a fixed length, or on the information discrepancy of short strings of a fixed length [1, 2]. In this paper, we use the cosine distance introduced in [4, 5] as the evolutionary distance between the objects.

Definition 2: The evolutionary distance between objects

The evolutionary distance d(object_i, object_j) between object_i and object_j is defined as:

Here, C(object_i, object_j) is the cosine of the angle between vector i and vector j defined in [3, 4].

Definition 3: The mean distance and the shortest distance

We use d_mean(object_i) to denote the mean distance from object_i to all the other objects, namely

We also denote the shortest distance from object_i to all the other objects as d_min(object_i).
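Both quantities can be read directly off a distance matrix, as in the following minimal Python sketch. It assumes that the mean is taken over the n-1 other objects (the normalization is an assumption) and that the matrix is a nested list. The function name `mean_and_min_distance` is hypothetical.

```python
def mean_and_min_distance(dist, i):
    """Return (d_mean, d_min) for object i: the mean and the smallest
    distance from object i to every other object in the matrix."""
    others = [dist[i][j] for j in range(len(dist)) if j != i]
    return sum(others) / len(others), min(others)


# Toy 3 x 3 distance matrix (values chosen for illustration only).
dist = [[0.0, 0.2, 0.6],
        [0.2, 0.0, 0.4],
        [0.6, 0.4, 0.0]]
d_mean, d_min = mean_and_min_distance(dist, 0)
```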

Definition 4: The acceptance weights

For two objects object_i and object_j, the acceptance weight for object_i to object_j is defined as Eq. (8):

Similarly, the acceptance weight for object_j to object_i is defined as Eq. (9):

From the definition we can see that the more similar two objects are, the greater their acceptance weights for each other will be. We can also see that the acceptance weight between two objects is not symmetric; namely, in general

accept_j (object_i) ≠ accept_i(object_j) (10)

According to the definitions above, we can form a weighted digraph in which each vertex represents an object. Denote the weight of the directed edge from object_i to object_j as τ_ij. This value will be updated according to the pheromone deposited by the ants passing along it. Its initial value τ_ij(0) is set to the acceptance weight:

τ_ij(0) = accept_i(object_j) (11)

In the traditional ant colony algorithm, the pheromone on all edges is usually initialized to zero. This does not help the ants to choose paths in the early stages. In AAPTC, however, the proposed initial pheromone value set on the digraph is very important for the ants' later movements; that is to say, it has a great influence on the initial topology of the phylogenetic trees. Starting from this initial value, in the later stages the ants update the pheromone digraph for the construction and optimization of the phylogenetic trees.

Heuristic function

The heuristic function η_ij in Eq. (1) is a problem-dependent function that measures the "quality" of the edge (i, j), which connects the vertices i and j representing the two objects. Here the "quality" means the preference of the edge to be selected by the ants. Obviously, the smaller the distance between the two connected objects, the more the edge should be preferred. Therefore, η_ij should be associated with the distance between objects, and it is given by the following formula.

η_ij= 1/d (object_i,object_j) (12)

Different from pheromone τ_ij, η_ij is static and unidirectional heuristic information determined by the distance information.
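Since η_ij depends only on the distances, it can be computed once before the ants start moving. A minimal Python sketch of Eq. (12) follows. The handling of the diagonal and of zero distances is not specified in the paper and is an assumption made here, and the function name `heuristic_matrix` is hypothetical.

```python
def heuristic_matrix(dist):
    """eta[i][j] = 1 / d(object_i, object_j) for i != j (Eq. 12).
    Assumed conventions: the diagonal is 0 because an ant never moves to its
    own vertex, and identical objects (distance 0) get an infinite value."""
    n = len(dist)
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                eta[i][j] = 1.0 / dist[i][j] if dist[i][j] > 0 else float("inf")
    return eta


# Toy distance matrix; closer objects receive a larger heuristic value.
dist = [[0.0, 0.5, 0.25],
        [0.5, 0.0, 1.0],
        [0.25, 1.0, 0.0]]
eta = heuristic_matrix(dist)
```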

Pheromone Updating

In the algorithm, after each iteration the pheromone on each edge (i, j) of the paths the ants have just passed is updated according to the following formula.

Here the constant ρ ∈ (0, 1) is the coefficient of evaporation. At each iteration, the pheromone on each path evaporates at a rate of ρ.

In the local updating period, Δτ_ij^k is the increment of τ_ij by ant k, which is defined by Eq. (14):

Here Q is a constant. From the formulas above, it is easy to see that the more ants pass through an edge, the more pheromone is deposited on it, and the higher the probability that the two vertices connected by the edge are included in the same strongly connected component of the weighted digraph constructed in the third stage of the algorithm. In the global updating period, Δτ_ij^k is the increment of τ_ij by sub-colony k, which is defined as follows:

Here, Tree_k denotes the phylogenetic tree constructed by sub-colony_k, and fitness(sub-colony_k) is the fitness value of Tree_k. According to [9], once the topology and the branch lengths of Tree_k are determined, a new distance matrix can be deduced. By comparing this distance matrix D^k with the original distance matrix D (calculated from the given objects), the quality measure shown in Eq. (16) can be assigned to the tree as its fitness value:

The summation extends over all n(n-1)/2 distances between the n objects. If the distances are concentrated within a narrow range, a high fitness value will be assigned to the tree, and if the reconstructed distances equal the original distances, the fitness reaches its highest value C. Through global pheromone updating, the pheromone deposited on the edges of trees with high fitness values becomes much higher than on the others, so the objects connected by these edges can hardly be separated by the ants during the clustering process.

Updating Parameters

The second stage of the algorithm includes a step that updates the values of α and β, the parameters of Eq. (1), which gives the probability distribution used by an ant to select the next vertex. In Eq. (1), the parameters α and β determine the relative influence of the trail strength τ_ij and the heuristic information η_ij, respectively. At the initial stage of the algorithm, the pheromone value on each edge is relatively small. To speed up convergence, the ants should select the path mainly according to the heuristic information η_ij. Therefore, the value of β should be relatively large in this stage. After some iterations, the pheromone values on the edges have increased and their influence becomes more and more important. Therefore, the value of α should then be relatively large. Since the adjustment of α and β should be based on the strength of the pheromone on the edges, in Eq. (17) we define τ_ave as the average amount of pheromone on the pheromone digraph, and in Eq. (18) we define δ as the pheromone distributing weight, which measures the distribution of pheromone on the graph.

Using the pheromone distributing weight δ, the algorithm updates the value of α, β as follows:

α = log ^1+δ (19)

Once the pheromone digraph is updated, α and β are adaptively modified according to the pheromone distributing weight, which changes the relative effect of the pheromone and the heuristic function. By adjusting the values of α and β adaptively, the algorithm can accelerate convergence while avoiding local and premature convergence. This adaptive procedure is therefore very important for AAPTC. Furthermore, since the amount of pheromone is an important measure for tree construction, the pheromone distributing weight δ is also a critical factor in terminating the iterations of the algorithm.

References

1. Hodge T, Cope MJTV: A Myosin Family Tree. Journal of Cell Science 2000, 113: 3353–3354.
2. Saitou N, Nei M: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4: 406–425.
3. Reijmers TH, Wehrens R, Daeyaert FD, Lewi PJ, et al.: Using genetic algorithms for the construction of phylogenetic trees: application to G-protein coupled receptor sequences. Biosystems 1999, 49: 31–43.
4. Hao B, Qi J: Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. Proceedings of the 2003 IEEE Bioinformatics Conference 2003, 375–385.
5. Xiaomeng W, Xiufeng W, Gang W, Dong X, Guohui L: Phylogenetic Analysis Using Complete Signature Information of Whole Genomes and Clustered Neighbor-Joining Method. International Journal on Bioinformatics Research and Applications 2006, 2(3):219–248.
6. Wu X, Wan XF, Xu D, Lin GH: Whole genome phylogeny based on clustered signature string composition. Posters in 2005 IEEE Computational Systems Bioinformatics Conference (CSB2005) 2005, 53–54.
7. Chen X, Wong SK, Li M: A compression algorithm for DNA sequences and its applications in genome comparison. In Proceedings of the sixth annual international computing and combinatorics conference (RECOMB). ACM Press; 2000:107–117.
8. Grumbach S, Tahi F: A new challenge for compression algorithms: genetic sequences. Journal of Information Processing Management 1994, 30: 866–875.
9. Hao B: Fractals from genomes-exact solutions of a biology inspired problem. Physica 2000, A282: 225–246.
10. Dorigo M, Maniezzo V, Colorni A: Ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics-Part B 1996, 26: 29–41.
11. Ling C, Jie S, Ling Q: An Adaptive Ant Colony Algorithm Based on Equilibrium of Distribution. Journal of Software 2003, 14: 1148–1151.
12. Stuart G, Moffett K, Bozarth RF: A comprehensive vertebrate phylogeny using vector representation of protein sequences from whole genomes. Molecular Biology and Evolution 2002, 19: 554–562.
13. Kuntz P, Snyder D: New results on ant-based heuristic for highlighting the organization of large graphs. In Proceedings of the 1999 Congress on Evolutionary Computation. IEEE Press, Piscataway, NJ; 1999:1451–1458.
14. Herniou E, et al.: Use of whole genome sequence data to infer baculovirus phylogeny. Journal of Virology 2001, 75: 8117–8126.
15. Grumbach S, Tahi F: A new challenge for compression algorithms: genetic sequences. Journal of Information Processing Management 1994, 30: 875–866.

Acknowledgements

This paper is supported in part by a US Department of Energy ECPI grant, the Chinese National Natural Science Foundation under grant No. 60673060, Chinese National Foundation for Science and Technology Development under contract 2003BA614A-14, and the Natural Science Foundation of Jiangsu Province under contract BK2005047.

This article has been published as part of BMC Bioinformatics Volume 7, Supplement 4, 2006: Symposium of Computations in Bioinformatics and Bioscience (SCBB06). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/7?issue=S4.

Author information

Authors and Affiliations

Department of Computer Science, Nanjing University of Aeronautics and Astronautics, Nanjing, 210096, China
Ling Qin & Ling Chen
Department of Computer Science and Engineering, Washington University in St.Louis, USA
Yixin Chen
Department of Computer Science, Georgia State University, 34 Peachtree Street, Suite 1450, Atlanta, GA, 30302-4110, USA
Yi Pan
Department of Computer Science, Yangzhou University, Yangzhou, 225009, China
Ling Chen

Authors

Ling Qin
Yixin Chen
Yi Pan
Ling Chen

Corresponding author

Correspondence to Ling Qin.

Additional information

Authors' contributions

LQ conceived, designed and performed the study under the supervision of LC and YP; LQ and YC (Yixin Chen) collected and analyzed the data; LQ wrote the computer code; LQ and YC designed algorithms; LQ and LC wrote the manuscript; All authors have read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Qin, L., Chen, Y., Pan, Y. et al. A novel approach to phylogenetic tree construction using stochastic optimization and clustering. BMC Bioinformatics 7 (Suppl 4), S24 (2006). https://doi.org/10.1186/1471-2105-7-S4-S24
