A fast algorithm for determining the best combination of local alignments to a query sequence
 Gavin C Conant^{1} and
 Andreas Wagner^{1}
DOI: 10.1186/1471-2105-5-62
© Conant and Wagner; licensee BioMed Central Ltd. 2004
Received: 15 November 2003
Accepted: 18 May 2004
Published: 18 May 2004
Abstract
Background
Existing sequence alignment algorithms assume that similarities between DNA or amino acid sequences are linearly ordered. That is, stretches of similar nucleotides or amino acids are in the same order in both sequences. Recombination perturbs this order. An algorithm that can reconstruct sequence similarity despite rearrangement would be helpful for reconstructing the evolutionary history of recombined sequences.
Results
We propose a graph-based algorithm for combining multiple local alignments to a query sequence into the single combination of alignments that either covers the maximal portion of the query or results in the single highest alignment score to the query. This algorithm can help study the process of genome rearrangement, improve functional gene annotation, and reconstruct the evolutionary history of recombined proteins. The algorithm takes O(n^{2}) time, where n is the number of local alignments considered.
Conclusions
We discuss two example applications of the algorithm. The algorithm is able to provide useful reconstructions of the metazoan mitochondrial genome. It is also able to increase the percentage of a query sequence's amino acid residues for which similar stretches of amino acids can be found in sequence databases.
Keywords
local alignment, alignment combination
Background
The introduction of the Smith-Waterman local alignment algorithm [1] and the subsequent development of the FASTA [2] and BLAST [3, 4] database search tools have revolutionized comparative sequence studies. In the age of completely sequenced genomes, these algorithms are often used to compare a DNA or protein sequence of unknown function – a query sequence – with one or more reference sequences. Such reference sequences are often contained in databases of many thousands of DNA or protein sequences. If the query sequence is similar to one or more reference sequences with known function, an informed guess about the function of the query sequence is possible. Sequence alignment algorithms, by their nature, assume that similarities between pairs of sequences are linearly ordered (homologous residues occur in the same order in both sequences). Thus, these algorithms are not well-suited to comparing sequences that have undergone rearrangements through recombination. However, during evolution, rearrangements of genomic DNA occur frequently and on all scales, from individual genes to entire genomes. To give but a few examples: mouse chromosome 16 shows substantial regions of synteny (blocks of genes with conserved order) to six different human chromosomes [5]; a comparison of metazoan mitochondrial genomes demonstrated that the gene orders between the major metazoan phyla are often essentially random with respect to each other [6]; the genomes of three yeast species closely related to the baker's yeast S. cerevisiae show identifiable inversions and translocations in comparison to that yeast [7]; and Seoighe and Wolfe estimate that baker's yeast itself has undergone roughly 84 reciprocal translocations since a whole-genome duplication event roughly 100 Mya [8].
The question of how to deduce the number and order of rearrangements that have occurred between any two genomes has been the subject of intense research because such rearrangements can be used for phylogenetic inference (see for instance [9, 10]). Here we address an even simpler question: How can one extend existing sequence comparison algorithms such that they are no longer sensitive to the linear ordering of sequence similarity? Doing so would permit an automatic comparison of two or more sequences for which rearrangements have occurred and an identification of the rearranged sequence fragments. The algorithm we propose achieves this goal. It combines local alignments between a query and one or more reference sequences without regard to the alignments' order in the reference sequence(s). The algorithm is equally applicable to short and long (genome-scale) sequences and to nucleotide and amino-acid sequences. The algorithm is also agnostic about how local alignments are generated (e.g. using BLAST [3, 4], FASTA [2], or dynamic programming [1]).
Aside from being a starting point for reconstructing the rearrangement history of genomes [10], the algorithm also has other uses. First, it can give clues about the evolutionary origins of proteins: protein-coding genes often contain multiple functional modules or domains, and similar domains occur in various combinations in different proteins [11–15]. Such combinations of domains have come about through recombination, and the algorithm can easily and automatically identify recombined proteins in large databases of protein sequences.
Second, the algorithm may help infer a query sequence's function from database searches in cases where the query sequence shows similarity to short regions of several database sequences, but where there is no single database sequence with similarity extending over most of the query's length. For such query sequences, it may sometimes be possible to infer function by combining information from multiple partial matches. Current sequence database search tools are less than ideal for this purpose. They return lists of reference sequences ordered by the statistical significance of their similarity to the query. Such lists of matches, however, cannot be readily converted into a single combination of alignments for use in functional inference. Our algorithm solves this problem.
We refer to the combination of alignments selected under either optimality criterion (maximal coverage of the query or highest alignment score to the query) as the "Optimal Alignment Combination" (OAC). Unfortunately, as the number of local alignments to a query sequence grows, the number of possible combinations of local alignments increases very quickly. In the worst case, when no two local alignments overlap, 2^{n} such combinations are possible (although in this special case finding the OAC is trivial). Note that n can be very large in realistic applications: programs such as BLAST may return many hundreds of database matches when performing whole-genome comparisons. Clearly, an exhaustive search for the OAC can be computationally demanding. However, our algorithm can determine an OAC in O(n^{2}) time, where n is the number of initial local alignments.
Algorithm description
We represent each local alignment as a node in a directed graph (see Figure 1A). Each node m is given a name, as well as a starting position m_{start}, a length m_{len}, and the score m_{score} of the alignment corresponding to it.
We sequentially process each node m, checking all other nodes n to see if they meet the following criterion:
m_{start} + m_{len} - 1 ≤ n_{start} + overlap
Note that m_{start} + m_{len} - 1 gives the end position for the alignment represented by m. A directed edge is added from m to n in any case where the above criterion is met.
Overlap is an integer greater than or equal to zero. Its use in the above expression reflects the fact that we allow local alignments to overlap by a limited amount, because alignments may not end exactly where sequence similarity does. Nodes representing alignments with overlapping regions longer than overlap are not connected by edges: they cannot co-occur in any alignment combination. A list is kept of every node with an out-degree of 0. These nodes are potential end points for the OAC, since there are no other alignments which end after them relative to the query.
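The graph construction described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' C++ implementation; the class name `Node`, the function `build_graph`, and the example coordinates are assumptions made for this sketch. It also assumes that every alignment is longer than overlap + 1 residues, which guarantees that the resulting graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One local alignment, positioned on the query sequence (hypothetical layout)."""
    name: str
    start: int      # first query position covered by the alignment
    length: int     # number of query positions covered
    score: float    # alignment score
    out_nodes: list = field(default_factory=list)
    in_nodes: list = field(default_factory=list)

    @property
    def end(self):
        # End position: m_start + m_len - 1
        return self.start + self.length - 1


def build_graph(nodes, overlap=0):
    """Add an edge m -> n whenever m_start + m_len - 1 <= n_start + overlap.

    Returns the list of nodes with out-degree 0 (candidate end points).
    Assumes every alignment is longer than overlap + 1, so no cycles arise.
    """
    for m in nodes:
        for n in nodes:
            if m is not n and m.end <= n.start + overlap:
                m.out_nodes.append(n)
                n.in_nodes.append(m)
    return [m for m in nodes if not m.out_nodes]


# Illustrative data only: three alignments on a query, overlap of 10 allowed.
a = Node("A", start=1, length=50, score=120.0)
b = Node("B", start=45, length=60, score=150.0)
c = Node("C", start=30, length=40, score=90.0)
end_points = build_graph([a, b, c], overlap=10)
print([n.name for n in end_points])
```

In this toy input, A ends at position 50 and B starts at 45, so the two overlap by fewer than 10 positions and an edge A -> B is added. C overlaps both A and B by more than the permitted amount and therefore stays unconnected.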
The final step in our algorithm is a variant of a depth-first search [19] of the graph, starting from each of the nodes of out-degree 0. The pseudocode shown in Figure 1B describes this search. The code is shown with the longest combination as the optimality criterion. We discuss the algorithm with this optimality criterion first and then discuss the minor changes required to use alignment scores as the optimality criterion.
Our algorithm is a divide-and-conquer approach based on the observation that membership in alignment combinations that can be part of an OAC is associative. This associativity means that if nodes m and n can occur in an optimal alignment combination, and if n and o can occur in the combination, then m and o can also occur in the optimal alignment combination. Practically, this means that it is possible to recurse through the graph, picking for every node the combination of ancestors – nodes representing alignments starting before the current node's alignment on the query sequence – that gives the longest combination up to and including that node. The depth-first ordering of the nodes in the search guarantees that all of a node's ancestors will have been processed already and that their best combination of alignments will be known by the time the node is visited. In other words, as nodes are processed in the search, the OAC problem is solved up to the currently visited node.
Figure 1B describes this search, showing a recursive routine recurse_search that analyzes a node "node". The routine first steps through each node i in the list in_nodes, which contains all nodes that have outgoing edges pointed at the current node. Each node i is checked to see if it has already been processed (lines 2–5 in Figure 1B), in which case i.Done has the value TRUE. For any i where i.Done is FALSE, recurse_search is called recursively for that node (line 4).
Once this recursion is complete, the algorithm finds the best alignment combination up to the current node node as follows. First, note that the length of the best combination for a node is stored in its variable best. To determine node.best, each member i of in_nodes is examined to see if it forms the best combination when combined with node (lines 7–11). The best combination for each node i (i.best) is already known as a result of the depth-first search ordering. To create a combination including the current node, the algorithm adds this i.best value to the length of the alignment corresponding to the current node (node.len), subtracting any overlap (over). The value of over is obtained from the data structure edges{node, i}, which returns this value in O(1) time (lines 8 and 10).
For each node i, we compare i.best + node.len - over (line 8) to the best alignment combination found so far (best_so_far). If the new combination is better, it replaces best_so_far. When this loop over all nodes i has completed, best_so_far must contain the best combination; that length is assigned to node.best (line 10).
After the depth-first search from each node of out-degree 0 (see above) is complete, one simply has to examine each of these nodes with out-degree 0 to find the one associated with the combination of alignments with the largest number of residues in it. This, by definition, is the OAC.
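The search just described can be sketched in Python as shown here. Figure 1B is not reproduced in this text, so the sketch follows the prose description only and is not the authors' code. The names `recurse_search`, `in_nodes`, `done`, `best`, `len`, `over`, `best_so_far` and `edges` mirror the identifiers mentioned above; `connect`, `find_oac`, `prev` and the example coordinates are assumptions introduced for illustration. Alignments are again assumed to be longer than overlap + 1 residues.

```python
class Node:
    """A local alignment on the query (field names follow the text where possible)."""
    def __init__(self, name, start, length):
        self.name, self.start, self.len = name, start, length
        self.in_nodes = []    # nodes with an edge pointing at this node
        self.out_degree = 0
        self.done = False     # True once the node has been processed
        self.best = 0         # length of the best combination ending here
        self.prev = None      # predecessor in that combination (assumption)


def connect(nodes, overlap):
    """Build edges m -> n and record the overlap of each connected pair."""
    edges = {}
    for m in nodes:
        m_end = m.start + m.len - 1
        for n in nodes:
            if m is not n and m_end <= n.start + overlap:
                n.in_nodes.append(m)
                m.out_degree += 1
                # Query positions shared by m and n (0 if they do not touch).
                edges[(n.name, m.name)] = max(0, m_end - n.start + 1)
    return edges


def recurse_search(node, edges):
    # First make sure every predecessor has been solved.
    for i in node.in_nodes:
        if not i.done:
            recurse_search(i, edges)
    # Then pick the predecessor giving the longest combination.
    best_so_far, prev = node.len, None
    for i in node.in_nodes:
        over = edges[(node.name, i.name)]      # O(1) dictionary lookup
        candidate = i.best + node.len - over
        if candidate > best_so_far:
            best_so_far, prev = candidate, i
    node.best, node.prev, node.done = best_so_far, prev, True


def find_oac(nodes, overlap=0):
    edges = connect(nodes, overlap)
    end_points = [n for n in nodes if n.out_degree == 0]
    for n in end_points:
        recurse_search(n, edges)
    top = max(end_points, key=lambda n: n.best)
    path, node = [], top
    while node is not None:                    # walk back through predecessors
        path.append(node.name)
        node = node.prev
    return top.best, path[::-1]


# Illustrative data only.
alignments = [Node("A", 1, 50), Node("B", 45, 60), Node("C", 30, 40), Node("D", 100, 50)]
oac_length, oac_path = find_oac(alignments, overlap=10)
print(oac_length, oac_path)
```

Because each node is marked done after its first visit, every edge is examined a bounded number of times. For very long chains of alignments, a Python version like this one would need an explicit stack instead of recursion to stay within the interpreter's recursion limit.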
When looking for the largest alignment score rather than the longest alignment combination, two modifications to the above routine must be made. First, node.score replaces node.len in lines 8 and 10 of Figure 1B. This has the effect of selecting combinations with high scores rather than long combinations. The second modification comes in calculating the value of over. If over were not computed, the score of any overlapped residues would be counted twice: once in node.score and once in i.score. Thus, over must be determined using the two alignments, which are held in the data structures node.alignment and i.alignment. Knowledge of the alignment score scheme used (for instance BLOSUM62 with a gap opening penalty of 12 and an extension penalty of 2) then allows one to calculate the score of the two combined alignments with the overlapping residues counted only once. All other details remain the same for this optimality criterion.
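As a rough illustration of the score-based variant, the sketch below removes the score contribution of the overlapped positions from the later alignment so that those residues are counted once. It rests on simplifying assumptions that are not taken from the source: the alignment is treated as ungapped, with one per-column score for each query position, and the overlapped columns are dropped from the start of the later alignment. With gapped alignments the overlap would instead have to be rescored under the actual scoring scheme, as the text describes. The function and parameter names are hypothetical.

```python
def overlap_columns(i_end, node_start):
    """Number of query positions shared by alignment i (ending at i_end)
    and the later alignment node (starting at node_start)."""
    return max(0, i_end - node_start + 1)


def combined_score(i_best, i_end, node_start, node_score, node_column_scores):
    """Score of the best combination ending at i, extended by node.

    node_column_scores holds one score per query position of node
    (assumption: ungapped alignment); the overlapped prefix is counted once.
    """
    n_over = overlap_columns(i_end, node_start)
    over = sum(node_column_scores[:n_over])
    return i_best + node_score - over


# Illustrative numbers only: node starts at 100 and i ends at 102,
# so the first three columns of node overlap i.
column_scores = [4, 5, -1, 6, 36]        # sums to node_score = 50
total = combined_score(i_best=200, i_end=102, node_start=100,
                       node_score=50, node_column_scores=column_scores)
print(total)
```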
Statistical significance of OACs
To evaluate the statistical significance of an alignment, programs such as BLAST typically determine expectation values (E-values). An E-value gives the expected number of chance hits having an alignment score at least as high as that observed, given the size of the database used in the search. Analytic descriptions of the statistics of ungapped alignments and their expected scores are known [20]. Although it has not been formally shown that gapped alignments obey the same distributions as do ungapped alignments, there is considerable empirical evidence that this is the case [21]. Karlin and Altschul have given a formula for computing a P-value – a measure of significance closely related to an E-value – for the sum of the scores of an alignment combination, which we use here to compute the significance of our combinations [22] (but see also [21]). Their analysis allows the calculation of a P-value for the combination of r different alignments, given the scores of those alignments (S_{i}) and two parameters (K and λ) that characterize the alignment scoring matrix used (for example BLOSUM [18]). Specifically, it gives the probability of seeing a combination of alignments where the sum of the normalized alignment scores, T, is at least as large as some critical value t; the formula is given in [22].
If we assume that the number of cases in a random database where T exceeds the T_{obs} from our real data follows a Poisson distribution, we can use this P-value to obtain E (see [20] for details). For alignments of practical importance, P is small (<10^{-2}) and in that case P- and E-values are essentially identical.
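The Poisson relationship between the two measures can be made concrete with a short sketch. It assumes the standard conversion P = 1 - exp(-E), that is, the probability of at least one chance hit when the number of hits is Poisson-distributed with mean E; the text does not spell this formula out, so treat it as an assumption. The function names are hypothetical.

```python
import math


def e_from_p(p):
    """E-value implied by a P-value under a Poisson model: E = -ln(1 - P)."""
    return -math.log1p(-p)


def p_from_e(e):
    """P-value implied by an E-value under a Poisson model: P = 1 - exp(-E)."""
    return -math.expm1(-e)


# For small P the two measures nearly coincide.
for p in (1e-2, 1e-5, 1e-100):
    print(p, e_from_p(p))
```

`math.log1p` and `math.expm1` are used so that the conversion stays accurate for the very small values typical of significant alignments, where computing 1 - P directly would round to 1.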
We note that combinations of alignments will always have a higher (that is, less significant) net E-value than a single alignment of the same alignment score, since E-values decrease exponentially with increasing alignment score. We illustrate this point with a simple example involving the duplicate S. cerevisiae genes SSA1 and SSB1. In the first comparison, we aligned SSA1 against the complete sequence of SSB1. This resulted in an alignment of length 582, a (non-normalized) score of 1814, and an E-value of 7.0 × 10^{-189}. We next split the SSB1 sequence into two equally sized pieces and aligned those pieces to SSA1. These two non-overlapping alignments were input into our algorithm for finding OACs. The result was an alignment combination of 582 residues with an E-value of 1.8 × 10^{-182}. When considered separately, the alignment of SSA1 to the first half of SSB1 has a score of 1043 and an E-value of 3.5 × 10^{-107}. The alignment of SSA1 to the second half of SSB1 has a score of 771 and an E-value of 2.3 × 10^{-78}.
Note that the E-values of the global alignment and the OAC are different (7.0 × 10^{-189} versus 1.8 × 10^{-182}) even though the sum of the independent alignment scores (1043 + 771) was equal to the alignment score when the full-length genes were aligned (1814).
Performance
An arbitrary graph of n nodes can have at most O(n^{2}) edges (one edge between every possible pair of nodes). The above algorithm visits every edge in the graph three times: once initialising the graph, and twice in the above search procedure. Thus, the worst case of the algorithm, in running time and memory, is O(n^{2}). In other words, the running time of the algorithm has an upper bound proportional to the square of the number of alignments. Real-world performance is also acceptable: a list of 580 local alignments can be processed in less than 70 ms on an 800 MHz Pentium III, while 3100 local alignments take only 3 seconds on the same platform.
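As a quick plausibility check of the quadratic bound against the two reported timings, the following sketch extrapolates the 580-alignment timing to 3100 alignments under a pure n^{2} model. The timings are treated as approximate (the text gives "less than 70 ms" and "only 3 seconds"), and the helper names are hypothetical.

```python
def max_edges(n):
    """Upper bound on edges with one edge per pair of nodes: n(n - 1)/2."""
    return n * (n - 1) // 2


def quadratic_extrapolation(n_small, t_small, n_large):
    """Predicted time at n_large if running time scaled exactly as n^2."""
    return t_small * (n_large / n_small) ** 2


predicted = quadratic_extrapolation(580, 0.070, 3100)   # seconds
print(max_edges(580), max_edges(3100), round(predicted, 2))
```

The extrapolated value is about 2 seconds, the same order of magnitude as the roughly 3 seconds reported, which is in line with the stated quadratic bound.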
Example data
We have evaluated the performance of our algorithm for two different problems. The first regards recombination in metazoan mitochondrial genomes. We compared the human mitochondrial genome (GenBank accession number NC_001807 [23]; our query sequence) to three other mitochondrial reference genomes: the hagfish Myxine glutinosa (GenBank accession number NC_002639, [24]), the fruit fly Drosophila melanogaster (GenBank accession number NC_001709), and the nematode Caenorhabditis elegans (GenBank accession number NC_001328). We used the LALIGN package [25] to find all local alignments between each pair of genomes. Only alignments with E-values of 10^{-5} or less were included in our analysis. We input the resulting alignments into the above algorithm, searching for the combination of alignments that resulted in the highest alignment score. We allowed an overlap of 50 nucleotides between alignments.
As we have discussed, this algorithm to combine multiple local alignments to a query sequence can serve to improve functional gene annotation, the reconstruction of the evolutionary history of shuffled proteins, and the discrimination of rapidly from slowly evolving gene regions. The algorithm is thus a tool to further increase leverage in inferring functional and evolutionary information from sequences.
Algorithm implementation
Source code for our C++ implementation of this algorithm is available from our website: http://www.unm.edu/~compbio/software/find_max_cover.
Abbreviations
 OAC: Optimal Alignment Combination
Declarations
Acknowledgements
We would like to especially thank Michael Fuller for help with figure preparation. We would also like to thank Michael Gilchrist and Annette Evangelisti for helpful discussions during the preparation of this manuscript. GCC is supported by the Department of Energy's Computational Sciences Graduate Fellowship program, administered by the Krell Institute. AW would like to thank the NIH for its support through NIH grant GM06388201.
Authors’ Affiliations
References
 1. Smith TF, Waterman MS: Identification of common molecular subsequences. Journal of Molecular Biology 1981, 147: 195–197.
 2. Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, USA 1988, 85: 2444–2448.
 3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic Local Alignment Search Tool. Journal of Molecular Biology 1990, 215: 403–410. 10.1006/jmbi.1990.9999
 4. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
 5. Mural RJ, Adams MD, Myers EW, Smith HO, Gabor Miklos GL, Wides R, Halpern A, Li PW, Sutton GG, Nadeau J, Salzberg SL, Holt RA, Kodira CD, Lu F, Chen L, Deng Z, Evangelista CC, Gan W, Heiman TJ, Li J, Li Z, Merkulov GV, Milshina NV, Naik AK, Qi R, Chris Shue B, Wang A, Wang J, Wang X, Yan X, Ye J, Yooseph S, Zhao Q, Zheng L, Zhu SC, Biddick K, Bolanos R, Delcher AL, Dew IM, Fasulo D, Flanigan MJ, Huson DH, Kravitz SA, Miller JR, Mobarry CM, Reinert K, Remington KA, Zhang Q, Zheng XH, Nusskern DR, Lai Z, Lei Y, Zhong W, Yao A, Guan P, Ji RR, Gu Z, Wang ZY, Zhong F, Xiao C, Chiang CC, Yandell M, Wortman JR, Amanatides PG, Hladun SL, Pratts EC, Johnson JE, Dodson KL, Woodford KJ, Evans CA, Gropman B, Rusch DB, Venter E, Wang M, Smith TJ, Houck JT, Tompkins DE, Haynes C, Jacob D, Chin SH, Allen DR, Dahlke CE, Sanders R, Li K, Liu X, Levitsky AA, Majoros WH, Chen Q, Xia AC, Lopez JR, Donnelly MT, Newman MH, Glodek A, Kraft CL, Nodell M, Ali F, An HJ, Baldwin-Pitts D, Beeson KY, Cai S, Carnes M, Carver A, Caulk PM, Center A, Chen YH, Cheng ML, Coyne MD, Crowder M, Danaher S, Davenport LB, Desilets R, Dietz SM, Doup L, Dullaghan P, Ferriera S, Fosler CR, Gire HC, Gluecksmann A, Gocayne JD, Gray J, Hart B, Haynes J, Hoover J, Howland T, Ibegwam C, Jalali M, Johns D, Kline L, Ma DS, MacCawley S, Magoon A, Mann F, May D, McIntosh TC, Mehta S, Moy L, Moy MC, Murphy BJ, Murphy SD, Nelson KA, Nuri Z, Parker KA, Prudhomme AC, Puri VN, Qureshi H, Raley JC, Reardon MS, Regier MA, Rogers YHC, Romblad DL, Schutz J, Scott JL, Scott R, Sitter CD, Smallwood M, Sprague AC, Stewart E, Strong RV, Suh E, Sylvester K, Thomas R, Ni Ni N, Tsonis C, Wang G, Wang G, Williams MS, Williams SM, Windsor SM, Wolfe K, Wu MM, Zaveri J, Chaturvedi K, Gabrielian AE, Ke Z, Sun J, Subramanian G, Venter JC: A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 2002, 296: 1661–1671. 10.1126/science.1069193
 6. Blanchette M, Kunisawa T, Sankoff D: Gene order breakpoint evidence in animal mitochondrial phylogeny. Journal of Molecular Evolution 1999, 49: 193–203.
 7. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 2003, 423: 241–254. 10.1038/nature01644
 8. Seoighe C, Wolfe KH: Extent of genomic rearrangement after genome duplication in yeast. Proceedings of the National Academy of Sciences, USA 1998, 95: 4447–4452. 10.1073/pnas.95.8.4447
 9. Sankoff D, Blanchette M: Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology 1998, 5: 555–570.
 10. Sankoff D, El-Mabrouk N: Genome Rearrangement. Topics in Computational Biology (Edited by: Jiang T, Xu Y and Zhang M). Boston, MIT Press 2001.
 11. Teichmann SA, Park J, Chothia C: Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proceedings of the National Academy of Sciences, USA 1998, 95: 14658–14663. 10.1073/pnas.95.25.14658
 12. Apic G, Gough J, Teichmann SA: Domain combinations in archaeal, eubacterial and eukaryotic proteomes. Journal of Molecular Biology 2001, 310: 311–325. 10.1006/jmbi.2001.4776
 13. Bashton M, Chothia C: The geometry of domain combinations in proteins. Journal of Molecular Biology 2002, 315: 927–939. 10.1006/jmbi.2001.5288
 14. Li WH, Gu Z, Wang H, Nekrutenko A: Evolutionary analyses of the human genome. Nature 2001, 409: 847–849. 10.1038/35057039
 15. Müller A, MacCallum RM, Sternberg MJE: Structural characterization of the human proteome. Genome Research 2002, 12: 1625–1641. 10.1101/gr.221202
 16. Dayhoff MO, Eck RV, Park CM: A model of evolutionary change in proteins. Atlas of protein sequence and structure (Edited by: Dayhoff MO). Washington, D.C., National Biomedical Research Foundation 1972, 5: 89–99.
 17. Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. Atlas of protein sequence and structure (Edited by: Dayhoff MO). Washington, D.C., National Biomedical Research Foundation 1978, 5, suppl. 2: 345–352.
 18. Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 1992, 89: 10915–10919.
 19. Moret BME, Shapiro HD: Algorithms from P to NP: Volume I: Design and Efficiency. Redwood City, CA, The Benjamin/Cummings Publishing Company, Inc. 1991.
 20. Karlin S, Altschul SF: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences, USA 1990, 87: 2264–2268.
 21. Altschul SF, Gish W: Local alignment statistics. Methods in Enzymology 1996, 266: 460–480. 10.1016/S0076-6879(96)66029-7
 22. Karlin S, Altschul SF: Applications and statistics for multiple high-scoring segments in molecular sequences. Proceedings of the National Academy of Sciences, USA 1993, 90: 5873–5877.
 23. Ingman M, Kaessmann H, Paabo S, Gyllensten U: Mitochondrial genome variation and the origin of modern humans. Nature 2000, 408: 708–713. 10.1038/35047064
 24. Delarbre C, Rasmussen AS, Arnason U, Gachelin G: The complete mitochondrial genome of the hagfish Myxine glutinosa: Unique features of the control region. Journal of Molecular Evolution 2001, 53: 634–641. 10.1007/s002390010250
 25. Huang XQ, Miller W: A time-efficient, linear-space local similarity algorithm. Advances in Applied Mathematics 1991, 12: 337–357.
 26. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG: Life with 6000 genes. Science 1996, 274: 546–567. 10.1126/science.274.5287.546
 27. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC: The genome sequence of Drosophila melanogaster. Science 2000, 287: 2185–2195. 10.1126/science.287.5461.2185
 28. Gish W: WU-BLAST.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.