Fig. 1 | BMC Bioinformatics

Fig. 1

From: Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

Strategy for finding SNPs/INDELs. (1) Underlying (unknown) genotype, including an INDEL. (2) Input reads sequenced from the genotype (including sequencing errors). (3) eBWT, LCP, and the contexts preceding (LEFT) and following (RIGHT) each eBWT character. In bold: LCP minima. In gray: eBWT cluster. Note that only the eBWT column is computed explicitly (the other columns are shown for illustrative purposes only). LCP minima are computed on-the-fly, whereas the contexts LEFT and RIGHT are reconstructed using backward search and the FL mapping, respectively. (4) Output INDEL \(\mathtt {TGC \rightarrow T}\), extended by one nucleotide to the left and two to the right. Note that the output INDEL is left-shifted, whereas originally (in the unknown genotype) it was right-shifted. To call the INDEL, we (i) compute (via backward search) the two consensus sequences AT and ATGC of the two alleles’ left-contexts (i.e. the strings obtained by concatenating the symbols in LEFT and eBWT), and (ii) align them, possibly allowing an INDEL at their right end. In the figure, the best alignment is the one that deletes GC from ATGC. SNPs are computed similarly, the only difference being that the best alignment of the left-contexts introduces neither insertions nor deletions.
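
Step (ii) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Call`, `suffix_mismatches` and `align_left_contexts`, the flat `gap_open` penalty, the `max_indel` bound and the tie-breaking rule are all assumptions made for this example. The sketch tries trimming up to `max_indel` characters from the right end of either consensus string, compares what remains right-aligned, and keeps the cheapest candidate.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    """Hypothetical result record (not from the paper)."""
    mismatches: int                 # substitutions left after trimming
    indel_len: int                  # 0 means no insertion/deletion (SNP-like)
    deleted: str                    # characters trimmed from the right end
    trimmed_allele: Optional[int]   # 0 or 1; None when indel_len == 0


def suffix_mismatches(a: str, b: str) -> int:
    """Hamming distance between a and b, right-aligned on their overlap."""
    n = min(len(a), len(b))
    return sum(x != y for x, y in zip(a[len(a) - n:], b[len(b) - n:]))


def align_left_contexts(c0: str, c1: str,
                        max_indel: int = 3, gap_open: int = 1) -> Call:
    """Align two left-context consensus strings, allowing one right-end indel.

    Assumed scoring: cost = mismatches + gap_open if an indel is used;
    ties are broken in favour of the shorter indel.
    """
    alleles = (c0, c1)
    # Candidate with no indel at all (the SNP case).
    candidates = [Call(suffix_mismatches(c0, c1), 0, "", None)]
    for k in range(1, max_indel + 1):
        for i in (0, 1):
            s, other = alleles[i], alleles[1 - i]
            if len(s) <= k:
                continue  # trimming must leave at least one character
            candidates.append(
                Call(suffix_mismatches(s[:-k], other), k, s[-k:], i))

    def cost(c: Call):
        return (c.mismatches + (gap_open if c.indel_len else 0), c.indel_len)

    return min(candidates, key=cost)


# The example from the figure: the best alignment deletes GC from ATGC.
print(align_left_contexts("AT", "ATGC"))
# A SNP-like pair (illustrative strings, not from the figure): no indel.
print(align_left_contexts("ACA", "ACG"))
```

On the figure's consensus strings the sketch selects the deletion of GC from the second string with no remaining mismatches; on the SNP-like pair it keeps the indel-free alignment with a single substitution. The strings in the figure are deliberately short, so the penalty and bound chosen here should be read as placeholders rather than tuned parameters.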
