Volume 13 Supplement 18
Detecting structural variants involving repetitive elements: capturing transposition events of IS elements in the genome of Escherichia coli
© Lee et al; licensee BioMed Central Ltd. 2012
Published: 14 December 2012
Discordant read pairs [1, 2], those deviating from either the expected insert size range or the correct relative orientation, have served as vital clues for identifying structural variants (SVs) in genomes. Collecting discordant read pairs is the first step in SV detection and is often done by sequence alignment. When repetitive elements are present, such as insertion sequences (IS), a class of transposable elements in bacterial genomes, discordant read pairs can map to multiple loci, which makes them harder to place and interpret. Instead of resolving such tangled mapping results, many tools simply ignore these multiply mapped read pairs, potentially missing SVs that involve repetitive elements.
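The definition of a discordant read pair given above can be illustrated with a minimal sketch. This is not the authors' code: the `ReadPair` record, the `classify` function, and the insert size bounds are hypothetical names and values chosen for illustration, and the sketch assumes uniquely mapped mates on a linear coordinate system with the usual forward/reverse paired-end orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped read pair (illustration only)."""
    seq1: str      # name of the sequence read 1 maps to
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # '+' or '-'
    seq2: str
    pos2: int
    strand2: str
    read_len: int = 90


def classify(pair: ReadPair, min_insert: int, max_insert: int) -> str:
    """Label a pair as concordant or give the reason it is discordant.

    Assumes a linear coordinate system; a circular genome would need
    wrap-around handling that is omitted here.
    """
    if pair.seq1 != pair.seq2:
        return "discordant:different_sequence"

    # Order the two mates by mapped position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    # Expected orientation: left mate forward, right mate reverse.
    if (left_strand, right_strand) != ("+", "-"):
        return "discordant:orientation"

    # Insert size measured from the start of the left mate
    # to the end of the right mate.
    insert = right_pos + pair.read_len - left_pos
    if insert < min_insert:
        return "discordant:insert_too_small"
    if insert > max_insert:
        return "discordant:insert_too_large"
    return "concordant"
```

In practice the bounds `min_insert` and `max_insert` would be estimated from the observed insert size distribution of the library rather than fixed by hand.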
We propose using approximate de Bruijn graphs (A-Bruijn graphs) [3] to identify discordant read pairs and thereby discover SVs. Repeats are easily recognized in A-Bruijn graphs, because all repetitive elements of the same kind are collapsed into a single contiguous edge. When read pairs representing repetitive elements are mapped to a reference A-Bruijn graph, only those from novel insertions are flagged as discordant; the rest, those from preexisting insertion loci, map concordantly.
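The effect of collapsing repeats can be sketched with a toy model. The code below is not an A-Bruijn graph construction and is not the authors' implementation: it assumes the reference has already been cut into labeled segments, with every copy of a repeat family sharing one label, and it only records which segments are adjacent. The function names and segment labels are hypothetical.

```python
from collections import defaultdict


def build_collapsed_graph(reference_path, circular=True):
    """Build an adjacency map from an ordered list of segment labels.

    Every copy of a repeat family carries the same label, so all copies
    collapse into one node whose neighbors are the flanks of every
    preexisting copy.
    """
    adjacency = defaultdict(set)
    for a, b in zip(reference_path, reference_path[1:]):
        adjacency[a].add(b)
        adjacency[b].add(a)
    if circular and len(reference_path) > 1:
        # Close the path for a circular chromosome.
        first, last = reference_path[0], reference_path[-1]
        adjacency[first].add(last)
        adjacency[last].add(first)
    return adjacency


def is_concordant(graph, seg_a, seg_b):
    """A pair is concordant if its mates fall in the same or adjacent segments."""
    return seg_a == seg_b or seg_b in graph.get(seg_a, set())


# Toy reference: unique segments u1..u4, two copies of IS1, one copy of IS5.
reference = ["u1", "IS1", "u2", "IS5", "u3", "IS1", "u4"]
graph = build_collapsed_graph(reference)

# Pairs spanning a preexisting IS1 copy are concordant at either locus.
print(is_concordant(graph, "u1", "IS1"))
print(is_concordant(graph, "u3", "IS1"))
# A pair linking u4 to IS5 matches no reference adjacency: candidate novel insertion.
print(is_concordant(graph, "u4", "IS5"))
```

Because the toy ignores positions within a segment, it would miss a new insertion into a segment that already flanks a copy of the same family; a fuller treatment would also check distances along the graph.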
We applied this approach to whole-genome sequencing data [4] (~100x coverage per sample, 2 x 90 bp paired-end Illumina sequencing) obtained from 38 lines of Escherichia coli PFM2, a derivative strain of E. coli K-12 MG1655, and 34 lines of a mismatch-repair-deficient (mutL deletion) derivative, which were propagated for ~3,080 and ~375 generations, respectively, via a mutation accumulation (MA) strategy. The inferred IS insertions were then tested directly by PCR.
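To give a sense of scale for the sequencing depth quoted above, the standard coverage arithmetic (total sequenced bases divided by genome size) can be written out as a short sketch. The genome size of about 4.64 Mb for E. coli K-12 MG1655 is an approximate value assumed here, not a figure stated in this abstract, and the function names are hypothetical.

```python
def coverage(n_pairs, genome_size_bp, read_len_bp, reads_per_pair=2):
    """Average depth = total sequenced bases / genome size."""
    return n_pairs * read_len_bp * reads_per_pair / genome_size_bp


def pairs_for_coverage(target_coverage, genome_size_bp, read_len_bp, reads_per_pair=2):
    """Number of read pairs needed to reach a target average depth."""
    return target_coverage * genome_size_bp / (read_len_bp * reads_per_pair)


# Assumed approximate genome size for E. coli K-12 MG1655 (not from the abstract).
GENOME_SIZE_BP = 4_640_000

pairs = pairs_for_coverage(100, GENOME_SIZE_BP, 90)
print(f"{pairs:,.0f} read pairs for ~100x")
```

Under these assumptions, ~100x depth with 2 x 90 bp reads corresponds to roughly 2.6 million read pairs per sample.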
We detected a total of 27 IS transpositions, involving 5 of the 12 IS families present in E. coli K-12. We also identified an insertion of IS186 that is fixed among all MA lines and is not present in the reference E. coli genome. Of the 27 inferred insertions, 24 were validated by PCR and 3 are currently under analysis. The fixed insertion of IS186 in the samples was also confirmed by PCR.
Our method can pinpoint SVs by identifying discordant read pairs resulting from novel insertions of repetitive elements, where many other currently available tools fail. This result serves as a first step towards inferring the neutral rate of IS transposition in bacterial genomes.
We thank Indiana University, the entire Foster Lab, and MURI award W911NF-09-1-0444 to P. L. Foster, M. Lynch, H. Tang, and S. Finkel for support.
- Raphael BJ, Volik S, Collins C, Pevzner PA: Reconstructing tumor genome architectures. Bioinformatics 2003, 19(Suppl 2):ii162–171. 10.1093/bioinformatics/btg1074
- Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, et al.: Fine-scale structural variation of the human genome. Nature Genetics 2005, 37(7):727–732. 10.1038/ng1562
- Pevzner PA, Tang H, Tesler G: De novo repeat classification and fragment assembly. Genome Research 2004, 14(9):1786–1796. 10.1101/gr.2395204
- Lee H, Popodi E, Tang H, Foster PL: Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc Natl Acad Sci 2012, 109(41):E2774–E2783. 10.1073/pnas.1210309109
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.