Volume 10 Supplement 13
Paired-end read length lower bounds for genome re-sequencing
© Chikhi and Lavenier; licensee BioMed Central Ltd. 2009
Published: 19 October 2009
Next-generation sequencing technology is enabling massive production of high-quality paired-end reads. Many platforms (Illumina Genome Analyzer, Applied Biosystems SOLiD, Helicos HeliScope) can currently produce "ultra-short" paired reads with lengths starting at 25 nt. An analysis by Whiteford et al. of sequencing with unpaired reads shows that ultra-short reads theoretically allow whole genome re-sequencing and de novo assembly of only small eukaryotic genomes. Chaisson, Brinza and Pevzner recently determined that the paired read length threshold for de novo assembly is ≈ 35 nt for the E. coli genome and ≈ 60 nt for the S. cerevisiae genome. The latter read length is infeasible for some next-generation technologies. Extending the results of Whiteford et al., we investigate to what extent genome re-sequencing is feasible with ultra-short paired reads. We obtain theoretical read length lower bounds for re-sequencing that are also applicable to paired-end de novo assembly.
A novel algorithm based on a suffix array has been designed specifically to compute the uniqueness of paired reads with fixed or variable mate-pair distance. The algorithm is a non-trivial extension of the RepAnalyse algorithm (Whiteford, 2007) to paired reads. Bacterial and eukaryotic genomes are analyzed to determine the uniqueness of paired reads given a fixed mate-pair distance of 300 nt. Longer mate-pair distances with high variability are also considered for the E. coli genome.
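To make the notion of paired-read uniqueness concrete, the following is a minimal brute-force sketch. It is not the suffix-array algorithm described above: it simply counts paired reads with a hash table, which is only practical for small sequences. The function name `paired_read_uniqueness` and its parameters are hypothetical, and the sketch assumes a linear genome, a single strand, a fixed mate-pair distance, and a distance measured from the start of the first read to the start of the second.

```python
from collections import Counter


def paired_read_uniqueness(genome: str, read_len: int, mate_distance: int) -> float:
    """Return the fraction of start positions whose paired read occurs exactly once.

    Assumptions (not taken from the paper): the genome is linear, only the
    forward strand is considered, and mate_distance is the fixed offset
    between the start of the first read and the start of the second.
    """
    # Number of positions at which both reads fit inside the genome.
    n_pairs = len(genome) - mate_distance - read_len + 1
    if n_pairs <= 0:
        raise ValueError("genome too short for this read length and distance")

    # A paired read is the tuple (first read, second read) at a fixed offset.
    pairs = [
        (
            genome[i:i + read_len],
            genome[i + mate_distance:i + mate_distance + read_len],
        )
        for i in range(n_pairs)
    ]

    # Count how many start positions produce each paired read.
    counts = Counter(pairs)

    # A position is unique if no other position yields the same paired read.
    unique = sum(1 for pair in pairs if counts[pair] == 1)
    return unique / n_pairs


if __name__ == "__main__":
    # Toy sequence: the pair ("AC", "AC") occurs at positions 0 and 4,
    # so 3 of the 5 paired reads are unique.
    print(paired_read_uniqueness("ACGTACGTAC", read_len=2, mate_distance=4))  # 0.6
```

Sweeping `read_len` upward and recording the smallest value at which the returned fraction exceeds a chosen threshold would give an empirical read length lower bound for a given sequence, in the spirit of the analysis described here. For genome-scale inputs a suffix-array approach is needed, since storing every paired read explicitly becomes too costly in memory.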
- Whiteford N, Haslam N, Weber G, Prugel-Bennett A, Essex JW, Roach PL, Bradley M, Neylon C: An analysis of the feasibility of short read sequencing. Nucleic Acids Research 2005, 33(19):e171. doi:10.1093/nar/gni170
- Chaisson MJ, Brinza D, Pevzner PA: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Research 2009, 19(2):336–346. doi:10.1101/gr.079053.108
- Whiteford N: String Matching in DNA Sequences: Implications for Short Read Sequencing and Repeat Visualisation. PhD thesis, University of Southampton; 2007.