Experimenting with database segmentation size vs time performance for mpiBLAST on an IBM HS21 blade cluster

Harris, Daniel; Jaromczyk, Jerzy W; Schardl, Christopher L

doi:10.1186/1471-2105-11-S4-P9

Volume 11 Supplement 4

UT-ORNL-KBRIN Bioinformatics Summit 2010

Poster presentation
Open access
Published: 23 July 2010

Daniel Harris¹,
Jerzy W Jaromczyk¹ &
Christopher L Schardl²

BMC Bioinformatics volume 11, Article number: P9 (2010)

Background

Large-scale genomic projects such as the Epichloë festucae Genome Project require regular use of bioinformatic tools. When BLAST is used with large databases, searches with complex query sequences often consume substantial computation time. Parallelization is a standard way to curb such computing requirements, and parallel implementations of BLAST, such as mpiBLAST, are freely available.

Materials and methods

In this experiment, the implementation (mpiBLAST) segments a database into smaller databases so that BLAST queries can be run in parallel against the smaller database segments. Because distributing tasks and merging the results from each parallel run add overhead, we investigate how the usefulness of database segmentation changes as the size and number of the database segments change. Where segmentation improves time performance, we ask: "How many segments will yield the best performance, or will adding processors always help?" Specifically, we consider three different times: a one-time preprocessing cost (segmentation of the database), queue wait-time, and CPU-time. We conducted experiments to monitor time performance as the number of database segments varies on an IBM HS21 blade cluster running mpiBLAST against fungal protein sequences from the Epichloë festucae Genome Project. The cluster has 340 compute nodes (1,360 cores, 12.8 Teraflops) whose resources are shared with other researchers; jobs are managed by the SLURM batch-job resource manager and scheduled by the Moab batch-job scheduler.
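The idea of database segmentation can be illustrated with a short Python sketch. It splits a FASTA protein database into a chosen number of segments with roughly equal residue counts by assigning each sequence, longest first, to the currently smallest segment. This greedy balancing is an illustrative assumption and is not necessarily how mpiBLAST segments a database (mpiBLAST ships its own formatting tool, mpiformatdb); the function names `parse_fasta` and `segment_database` are hypothetical.

```python
"""Hypothetical sketch of database segmentation for parallel BLAST.

Assumption: segments are balanced by total residue count using a greedy
longest-first heuristic. This is NOT mpiBLAST's own implementation."""

import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA-formatted lines into (header, sequence) pairs,
    joining sequences that span multiple lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def segment_database(records: List[Record], n_segments: int) -> List[List[Record]]:
    """Assign each record (longest first) to the segment with the fewest
    residues so far, so segments end up with similar total sizes."""
    if n_segments < 1:
        raise ValueError("n_segments must be >= 1")
    segments: List[List[Record]] = [[] for _ in range(n_segments)]
    # Min-heap of (residues assigned so far, segment index).
    heap = [(0, i) for i in range(n_segments)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        size, idx = heapq.heappop(heap)
        segments[idx].append((header, seq))
        heapq.heappush(heap, (size + len(seq), idx))
    return segments


if __name__ == "__main__":
    demo = """>p1
MKVLAAGIVG
>p2
MKT
>p3
MSTNPKPQRKTKRN
TNRRPQDVKFPGG
>p4
MAAAAA
""".splitlines()
    segs = segment_database(parse_fasta(demo), 2)
    # Total residues per segment.
    print([sum(len(s) for _, s in seg) for seg in segs])  # prints [27, 19]
```

In a real workflow each segment would then be formatted as its own BLAST database; the overhead discussed above comes from dispatching queries to these segments and merging the per-segment results.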

Results and conclusion

We observe that sharing computing resources among multiple users directly affects which database segmentation configuration to use in practice. For example, in our experiment the average CPU-time (in minutes) is 221.93 for one node, 52.30 for twelve nodes, and 26.1 for 32 nodes; the average queue wait-time (in minutes) is 1.35 for one node, 5.78 for twelve nodes, and 150.24 for 32 nodes (Figure 1). The composite time (CPU-time plus queue wait-time, in minutes) is therefore 223.28 for one node, 58.08 for twelve nodes, and 176.38 for 32 nodes (Figure 1). Thus, twelve nodes give the shortest composite time in our experiment. Additionally, preprocessing (segmenting the database) required a fixed one-time cost of approximately three days. The collected data allow us to plan and schedule our mpiBLAST experiments efficiently in an environment with uncontrollable variables such as queue wait-time.

This work is based upon research supported by the NSF under Grant No. 0814194 and NIH Research Project Grant Program (R01) from the Joint DMS/BIO/NIGMS Math/Bio Program under Grant No. 1R01GM086888-01.
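The composite-time comparison in the results can be reproduced from the reported averages with a few lines of Python, which also derive speedup and parallel efficiency relative to one node. Speedup and efficiency are not reported in the poster; computing them here assumes the reported CPU-time is the per-run computation time. Note that summing the reported 32-node values gives 176.34 rather than the reported 176.38, which suggests (an assumption) that the 32-node CPU-time was about 26.14 before being rounded to 26.1.

```python
"""Composite time (CPU-time + queue wait-time) and derived speedup and
efficiency, computed from the poster's reported averages (minutes)."""

from typing import Dict, Tuple

# Reported average CPU-time and queue wait-time, keyed by node count.
CPU_TIME: Dict[int, float] = {1: 221.93, 12: 52.30, 32: 26.1}
QUEUE_WAIT: Dict[int, float] = {1: 1.35, 12: 5.78, 32: 150.24}


def composite_times(cpu: Dict[int, float], wait: Dict[int, float]) -> Dict[int, float]:
    """Composite time per node count = CPU-time + queue wait-time."""
    return {n: round(cpu[n] + wait[n], 2) for n in sorted(cpu)}


def best_node_count(composite: Dict[int, float]) -> int:
    """Node count with the smallest composite time."""
    return min(composite, key=composite.__getitem__)


def speedup_efficiency(cpu: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Speedup S_p = T_1 / T_p and efficiency E_p = S_p / p,
    using the one-node CPU-time T_1 as the baseline."""
    t1 = cpu[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(cpu.items())}


if __name__ == "__main__":
    comp = composite_times(CPU_TIME, QUEUE_WAIT)
    print(comp)                   # {1: 223.28, 12: 58.08, 32: 176.34}
    print(best_node_count(comp))  # 12
    for p, (s, e) in speedup_efficiency(CPU_TIME).items():
        print(f"{p:>2} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

The derived efficiency falls from about 0.35 at twelve nodes to about 0.27 at 32 nodes, and the queue wait-time grows far faster than CPU-time shrinks, which is consistent with the observation that adding nodes does not always help on a shared cluster.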

Author information

Authors and Affiliations

Department of Computer Science, University of Kentucky, Lexington, KY, 40506, USA
Daniel Harris & Jerzy W Jaromczyk
Department of Plant Pathology, University of Kentucky, Lexington, KY, 40506, USA
Christopher L Schardl

Corresponding author

Correspondence to Jerzy W Jaromczyk.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Harris, D., Jaromczyk, J.W. & Schardl, C.L. Experimenting with database segmentation size vs time performance for mpiBLAST on an IBM HS21 blade cluster. BMC Bioinformatics 11 (Suppl 4), P9 (2010). https://doi.org/10.1186/1471-2105-11-S4-P9
