A distributed framework for aligning short reads to genomes
BMC Bioinformatics volume 15, Article number: P22 (2014)
Computational methods that employ next-generation sequencing technologies often depend on the alignment of short reads to genomes. In a typical workflow, such methods might require millions of independent alignment operations. Although using a high-performance cluster (HPC) to distribute these computationally independent tasks can speed up the process significantly, an HPC can be expensive and wasteful, and is sometimes not a feasible solution. We propose a distributed framework designed specifically to distribute the task of aligning short reads to genomes across multiple machines efficiently and effectively. The framework aims to be simple to set up and to scale.
Materials and methods
To accomplish this, we implemented the framework in the Go programming language, which has built-in support for concurrent computation, and used a high-performance networking library called ZeroMQ [2, 3] to distribute queries effectively. Specifically, we use the Pipeline pattern from ZeroMQ, which has three main parts: (1) a ventilator, which distributes reads to workers; (2) workers, which perform the main computation and send results to a sink; and (3) a sink, which collects results from the workers. Our design has three stages. In the listening stage, the system is set up: the ventilator sends a REQ message, along with other important information, to the workers, and the workers load the index into RAM. In the query stage, the ventilator distributes the reads to the workers, which align them against the index loaded in the listening stage. In the final stage, the system shuts down: after distributing all the reads, the ventilator sends an END message to the workers so that they can close their sockets once all reads have been processed.
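The ventilator, worker and sink roles described above can be illustrated with a minimal, single-process sketch. It is not the framework's code: Go channels stand in for the ZeroMQ sockets, closing the reads channel stands in for the END message, passing the genome string to each worker stands in for loading the index, and an exact substring search stands in for a real aligner. All names (`read`, `hit`, `worker`, `runPipeline`) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// read is one short read, tagged with an ID so results can be matched back.
type read struct {
	ID  int
	Seq string
}

// hit is a worker's answer for one read; Pos is -1 when there is no exact match.
type hit struct {
	ID  int
	Pos int
}

// worker pulls reads until the channel is closed (the stand-in for END)
// and pushes one hit per read to the sink.
func worker(genome string, reads <-chan read, hits chan<- hit, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := range reads {
		// Placeholder for a real aligner: exact substring search.
		hits <- hit{ID: r.ID, Pos: strings.Index(genome, r.Seq)}
	}
}

// runPipeline wires ventilator -> workers -> sink and returns hits sorted by read ID.
func runPipeline(genome string, seqs []string, numWorkers int) []hit {
	if numWorkers < 1 {
		numWorkers = 1 // at least one worker is needed to drain the reads
	}
	reads := make(chan read)
	hits := make(chan hit)

	// Listening stage: start the workers, each holding the "index" (here, the genome).
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(genome, reads, hits, &wg)
	}

	// Query stage: the ventilator hands out reads, then signals the end.
	go func() {
		for i, s := range seqs {
			reads <- read{ID: i, Seq: s}
		}
		close(reads)
	}()

	// Final stage: once every worker has finished, close the sink's input.
	go func() {
		wg.Wait()
		close(hits)
	}()

	// Sink: collect results in arrival order, then sort for stable output.
	out := make([]hit, 0, len(seqs))
	for h := range hits {
		out = append(out, h)
	}
	sort.Slice(out, func(a, b int) bool { return out[a].ID < out[b].ID })
	return out
}

func main() {
	genome := "ACGTACGTTAGCCGATTACA"
	reads := []string{"ACGT", "TAGC", "GATTACA", "GGGG"}
	for _, h := range runPipeline(genome, reads, 3) {
		fmt.Printf("read %d -> position %d\n", h.ID, h.Pos)
	}
}
```

Because the workers share one reads channel, each read goes to whichever worker is free, which is similar in spirit to how a pipeline spreads work. In a multi-machine deployment the channels would be replaced by network sockets, and the results would still arrive at the sink in no particular order, which is why each read carries an ID.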
Simulation showed that the running time of alignment decreased linearly as the number of workers increased. The system is easy to use and deploy.
Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics. 2008, 92 (5): 255-264.
Hintjens P: ZeroMQ: Messaging for Many Applications. 2013, Sebastopol: O'Reilly Media, Inc.
An Intro to ZeroMQ(ØMQ) On Ubuntu 12.04. [http://babounehacks.blogspot.com/]
Guo, S., Phan, V. A distributed framework for aligning short reads to genomes. BMC Bioinformatics 15 (Suppl 10), P22 (2014). https://doi.org/10.1186/1471-2105-15-S10-P22