
Table 2 Crumble results for different sized simulated datasets and underlying alignment methods.

From: Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

  

|                  |     | 60 kb |           | 150 kb |           | 500 kb |           | 1000 kb |           |
|------------------|-----|-------|-----------|--------|-----------|--------|-----------|---------|-----------|
|                  |     | Time  | Agreement | Time   | Agreement | Time   | Agreement | Time    | Agreement |
| Pecan^1          |     | 3.43  | 0.896     | 10.6   | 0.905     | 46.9   | 0.906     | 100     | 0.906     |
| Crumble w/Pecan  | 60% | 3.29  | 0.894     | 7.18   | 0.904     | 21.5   | 0.905     | 51.9    | 0.906     |
|                  | 30% | 2.56  | 0.889     | 4.66   | 0.903     | 11.9   | 0.905     | 23.5    | 0.905     |
|                  | 15% | 2.39  | 0.859     | 3.77   | 0.893     | 8.29   | 0.903     | 13.9    | 0.905     |
| FSA^2            |     | 37.4  | 0.886     | –^a    | –^a       | –^a    | –^a       | –^a     | –^a       |
| Crumble w/FSA    | 60% | 25.8  | 0.881     | 69.8   | 0.903     | –^a    | –^a       | –^a     | –^a       |
|                  | 30% | 21.0  | 0.873     | 39.2   | 0.898     | –^a    | –^a       | –^a     | –^a       |
|                  | 15% | 17.7  | 0.849     | 25.5   | 0.893     | 104    | 0.811     | –^a     | –^a       |
| MUSCLE^3         |     | –^a   | –^a       | –^a    | –^a       | –^a    | –^a       | –^a     | –^a       |
| Crumble w/MUSCLE | 60% | –^a   | –^a       | –^a    | –^a       | –^a    | –^a       | –^a     | –^a       |
|                  | 30% | 128   | 0.707     | –^a    | –^a       | –^a    | –^a       | –^a     | –^a       |
|                  | 15% | 63.1  | 0.679     | 251    | 0.705     | –^a    | –^a       | –^a     | –^a       |

  1. Pecan was run with default parameters.
  2. FSA was run with the --exonerate, --anchored, and --softmasked flags.
  3. MUSCLE was run with default parameters.
  a. The majority of these problems could not be aligned because the aligner ran out of memory.
The run-time and average agreement score of Crumble alignments for different dataset sizes. Several sets of simulated alignment problems were generated from root sequences of 60, 150, 500, and 1000 kilobases. The neutral evolution of each root sequence was simulated over a nine-species tree. Fifty problems were generated per root size, for a total of two hundred test alignment problems. The agreement and run-time (in minutes) for each problem size are averages over the fifty simulated alignments. Crumble was used to break each problem into sub-problems: the approximate core size was set to 60%, 30%, or 15% of the length of the original problem, and each block was allowed to be at most 4 kb larger as measured in any of the sequences. Pecan, FSA, and MUSCLE were used as the underlying alignment methods. PrePecan was used to generate the constraints. We were unable to apply FSA directly (without Crumble) to problems of 150 kb or larger because FSA required more than the 4 GB of memory available per cluster node; with Crumble, we were able to run FSA on problems as large as half a megabase. MUSCLE had more severe memory issues, but with Crumble we were able to use it on problems as large as 150 kb. For Pecan, Crumble achieved more than a sevenfold speedup with almost no loss of accuracy on the largest problem size.
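The caption's headline claim can be checked directly against the Table 2 values. The sketch below is not part of the paper's software; it simply recomputes the speedup and agreement change for the 1000 kb case (Pecan alone vs. Crumble w/Pecan at 15% core size), with the numbers taken verbatim from the table:

```python
# Values copied from Table 2 (times in minutes); the variable names are ours.
pecan_1000kb = {"time": 100.0, "agreement": 0.906}          # Pecan alone, 1000 kb
crumble_15pct_1000kb = {"time": 13.9, "agreement": 0.905}   # Crumble w/Pecan, 15% cores

# Speedup = full-problem time / partitioned time.
speedup = pecan_1000kb["time"] / crumble_15pct_1000kb["time"]
# Accuracy cost = drop in average agreement score.
agreement_loss = pecan_1000kb["agreement"] - crumble_15pct_1000kb["agreement"]

print(f"speedup: {speedup:.2f}x")               # ~7.19x, i.e. "more than sevenfold"
print(f"agreement loss: {agreement_loss:.3f}")  # 0.001, i.e. "almost no loss"
```

This confirms the 7.19× speedup and the 0.001 agreement drop behind the caption's "more than a sevenfold speedup with almost no loss of accuracy."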