Figure 1 | BMC Bioinformatics

Figure 1

From: Identification and correction of systematic error in high-throughput sequence data

Types of errors. A screenshot from the IGV browser [21] showing three types of error in reads from an Illumina sequencing experiment: (1) A random error, likely because the position is close to the end of the read. (2) A random error likely due to sequence-specific error: in this case, a run of Cs is probably inducing errors at the end of the low-complexity repeat. (3) Systematic error: although it is likely that the GGT sequence motif and the GGC motifs before it created phasing problems leading to the errors, the extent of the error is not explained by a random error model. In this case, all the base calls in one direction are wrong, as revealed by the 11 overlapping mate-pairs. In particular, all differences from the reference genome are base-call errors, as verified by the mate-pair reads, which do not differ from the reference. Given the background error rate, the probability of observing 11 error-pairs at a single location, given that 11 mate-pair reads overlap the location, is 1.5 × 10^-26. Moreover, given the presence of such errors at a single location, the probability that all of the errors occur on the same strand (i.e., on the forward mate pair) is 1/1024 ≈ 0.00098. Note that the IGV browser made an incorrect SNP call at the systematic error site (colored bar in top panel).
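
The same-strand figure quoted in the caption can be reproduced with a short calculation. The sketch below is illustrative only and rests on an assumption the caption does not state: each of the 11 errors is taken to fall independently on the forward or reverse strand with equal probability, and "same strand" is taken to mean either strand. The function name `same_strand_probability` is hypothetical.

```python
from fractions import Fraction


def same_strand_probability(n_errors: int) -> Fraction:
    """Probability that n_errors errors all fall on one strand.

    Assumption (not from the source): each error lands on the forward or
    reverse strand independently with probability 1/2, and either strand
    counts as "the same strand", hence the factor of 2.
    """
    if n_errors < 1:
        raise ValueError("n_errors must be at least 1")
    return 2 * Fraction(1, 2) ** n_errors


p = same_strand_probability(11)
print(p, float(p))  # prints: 1/1024 0.0009765625
```

Under these assumptions the result is 2 × (1/2)^11 = 1/1024, which rounds to the 0.00098 given in the caption. The 1.5 × 10^-26 figure is not reproduced here because the caption does not give the background error rate it depends on.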
