
Table 3. Prune results for different-sized datasets and underlying alignment methods.

From: Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

  

| Method | Sub-tree size (% of nodes) | 50 leaves: Time (min) | 50 leaves: Agreement | 100 leaves: Time (min) | 100 leaves: Agreement | 500 leaves: Time (min) | 500 leaves: Agreement | 1000 leaves: Time (min) | 1000 leaves: Agreement |
|---|---|---|---|---|---|---|---|---|---|
| Pecan¹ |  | 21.9 | 0.914 | 297. | 0.879 | ᵃ | ᵃ | ᵃ | ᵃ |
| Prune w/Pecan | 60% | 7.26 | 0.880 | 39.2 | 0.862 | ᵃ | ᵃ | ᵃ | ᵃ |
|  | 30% | 3.13 | 0.909 | 19.6 | 0.839 | ᵃ | ᵃ | ᵃ | ᵃ |
|  | 15% | 7.26 | 0.912 | 13.3 | 0.878 | 125. | 0.844 | ᵃ | ᵃ |
|  | 7% | 4.24 | 0.909 | 13.5 | 0.849 | 29.1 | 0.907 | 122. | 0.877 |
| FSA² |  | 63.1 | 0.933 | 266. | 0.856 | ᵃ | ᵃ | ᵃ | ᵃ |
| Prune w/FSA | 60% | 33.8 | 0.912 | 78.9 | 0.838 | 589. | 0.871 | ᵃ | ᵃ |
|  | 30% | 10.5 | 0.893 | 23.8 | 0.838 | 142. | 0.879 | ᵃ | ᵃ |
|  | 15% | 4.25 | 0.885 | 17.1 | 0.857 | 40.8 | 0.877 | 150. | 0.861 |
|  | 7% | 3.00 | 0.866 | 4.23 | 0.842 | 12.7 | 0.903 | 34.8 | 0.887 |
| MUSCLE³ |  | 55.6 | 0.905 | 138. | 0.799 | ᵇ | ᵇ | ᵇ | ᵇ |
| Prune w/MUSCLE | 60% | 40.7 | 0.899 | 77.9 | 0.777 | 886. | 0.862 | ᵇ | ᵇ |
|  | 30% | 24.7 | 0.896 | 42.8 | 0.777 | 368. | 0.883 | ᵇ | ᵇ |
|  | 15% | 15.1 | 0.905 | 29.1 | 0.828 | 185. | 0.899 | 440. | 0.900 |
|  | 7% | 24.7 | 0.905 | 18.8 | 0.841 | 114. | 0.924 | 228 | 0.928 |
| MAFFT⁴ |  | 3.17 | 0.897 | 5.39 | 0.806 | 20.1 | 0.886 | 25.2 | 0.912 |
| SATé⁵ |  | 101. | 0.915 | 301. | 0.840 | ᵇ | ᵇ | ᵇ | ᵇ |

  1. Pecan was run with default parameters.
  2. FSA was run with the --exonerate, --anchored, --softmasked, and --fast flags.
  3. MUSCLE was run with default parameters.
  4. MAFFT was run with the --treein option.
  5. SATé was run with the -t option but was limited to two iterations; additional iterations gave almost no improvement in accuracy.

  a. The majority of these problems could not be aligned because the aligner ran out of memory.
  b. The majority of these problems took longer than 3 days and were aborted.
Run-time and average agreement score of Prune alignments on datasets of different sizes. Several sets of simulated alignment problems were generated from a 10-kilobase root sequence. The neutral evolution of each root sequence was simulated over species trees of 50, 100, 500, and 1000 leaves. Fifty problems were generated per tree size, for a total of two hundred test alignment problems; the agreement and run-time (in minutes) for each problem size is the average over the fifty simulated alignments. Each underlying alignment method (Pecan, FSA, MUSCLE) was first run on the datasets directly. Prune was then used to break the problems into sub-trees containing at most 60%, 30%, 15%, and 7% of the nodes in the entire tree, with Pecan, FSA, and MUSCLE as the underlying alignment methods. The largest number of stages was six, but most problems required no more than three. For comparison, we also aligned the datasets with MAFFT and SATé; to keep the comparison fair, the true tree topology was passed to SATé (using the -t option) and to MAFFT (using the poorly documented --treein option). Some alignment algorithms could not be applied to the large problems because of very long run-times and memory exhaustion. Using Prune, we were able to apply Pecan, FSA, and MUSCLE to alignment problems much deeper than could be solved without Prune. Prune achieved a very large speedup with little loss of accuracy, and in some cases an increase in accuracy.
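Each cell in the table above is a simple mean over the fifty replicate problems for a given tree size and configuration. The short Python sketch below illustrates that aggregation step only; the results-file name (prune_results.csv) and its column names are assumptions made for illustration and are not part of the original study.

```python
import csv
from collections import defaultdict

def average_results(path="prune_results.csv"):
    """Mean run-time and agreement per (method, sub-tree limit, tree size).

    Hypothetical input layout: one row per simulated alignment problem with
    columns 'method', 'subtree_limit', 'leaves', 'time_minutes', 'agreement'.
    """
    # key -> [run-time sum, agreement sum, replicate count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            key = (row["method"], row["subtree_limit"], int(row["leaves"]))
            totals[key][0] += float(row["time_minutes"])
            totals[key][1] += float(row["agreement"])
            totals[key][2] += 1
    # Each table cell is the average over the replicates (fifty per tree size here).
    return {key: (t / n, a / n) for key, (t, a, n) in totals.items()}

if __name__ == "__main__":
    for (method, limit, leaves), (mean_time, mean_agr) in sorted(average_results().items()):
        print(f"{method:15s} {limit:>4s} {leaves:5d} leaves: "
              f"{mean_time:7.2f} min  agreement {mean_agr:.3f}")
```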