Fig. 1 | BMC Bioinformatics

From: A haplotype-based normalization technique for the analysis and detection of allele specific expression

a Schematic of the normalization procedure. For a given heterozygous SNV, the underlying proportion of reference and alternative alleles is unknown. After mapping, the proportion of reference and alternative alleles is observed, but this observation may contain biases. To correct for this, a null dataset containing a 50:50 ratio of the two alleles is generated for the site (see panel b). These data, together with the null data from all other heterozygous sites, are mapped using the same procedure as the original alignment. The observed proportion of mapped alleles in the null dataset is then used to correct the original data.

b Generation of the null dataset. All reads and read pairs covering a heterozygous SNV are shown in the left-hand panel. From these data, read pairs are randomly selected, and the corresponding read pair from the second haplotype is generated using the individual's known SNVs. The right-hand panel shows three examples of this process. At the top, the original read pair carries the reference allele at the SNV of interest (C/T) and the reference allele at a neighbouring SNV (G/A), so the second-haplotype read pair is generated with the alternative alleles at both positions. In the middle, the original read pair carries the alternative allele at both SNV sites, so the generated read pair carries both reference alleles. At the bottom, the read pair carries the reference allele at the central SNV site and what appears to be a sequencing error upstream, at a site where no SNV has been identified. The generated read pair therefore keeps the sequencing error unchanged and carries the alternative allele at the SNV position. This process is repeated for all read pairs to generate a null dataset with 4000× coverage, and the reads are converted into FASTQ format for remapping.
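The haplotype swap in panel b can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The function names `flip_haplotype` and `build_null_pairs`, and the data layout, are hypothetical, and the sketch makes several assumptions:

- Known heterozygous SNVs are stored in a dict that maps a 0-based reference position to a `(ref, alt)` pair.
- Each mate is an ungapped, forward-orientation alignment given as `(sequence, start)`.
- Every randomly selected pair is emitted together with its swapped copy, which yields the 50:50 allele ratio.

Bases at positions that are not known SNVs are copied through untouched, so apparent sequencing errors survive, as in the bottom example of the caption.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: 0-based reference position -> (reference allele, alternative allele)
Snvs = Dict[int, Tuple[str, str]]
# Hypothetical layout: (mate1 sequence, mate1 start, mate2 sequence, mate2 start)
ReadPair = Tuple[str, int, str, int]


def flip_haplotype(seq: str, start: int, snvs: Snvs) -> str:
    """Rewrite a read as it would appear on the other haplotype.

    Assumes an ungapped alignment beginning at reference position `start`
    (no indels, clipping or splice junctions).
    """
    bases = list(seq)
    for offset, base in enumerate(bases):
        site = snvs.get(start + offset)
        if site is None:
            continue  # not a known SNV: keep the base, including sequencing errors
        ref, alt = site
        if base == ref:
            bases[offset] = alt
        elif base == alt:
            bases[offset] = ref
        # any other base at an SNV position is left unchanged (assumption)
    return "".join(bases)


def build_null_pairs(pairs: List[ReadPair], snvs: Snvs,
                     n_selected: int, seed: int = 0) -> List[ReadPair]:
    """Randomly select read pairs and emit each one plus its swapped copy."""
    rng = random.Random(seed)
    null: List[ReadPair] = []
    for _ in range(n_selected):
        s1, p1, s2, p2 = rng.choice(pairs)
        null.append((s1, p1, s2, p2))  # original haplotype
        null.append((flip_haplotype(s1, p1, snvs), p1,
                     flip_haplotype(s2, p2, snvs), p2))  # other haplotype
    return null


if __name__ == "__main__":
    # Mirrors the top example: a C/T SNV at position 10 and a G/A SNV at position 14
    example_snvs = {10: ("C", "T"), 14: ("G", "A")}
    print(flip_haplotype("TTCAAAGT", 8, example_snvs))  # prints TTTAAAAT
```

In this sketch, `n_selected` stands in for whatever sampling depth achieves the target coverage (4000× in the caption). Writing the pairs out as FASTQ for remapping is omitted.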
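Panel a says the mapped allele proportion from the null dataset is used to correct the original data. The caption does not give the exact formula. One plausible way to do it is to treat `ref_null / alt_null` as the per-site mapping bias, because the null reads were simulated at 50:50. The observed reference/alternative odds are then divided by that bias. The function name `corrected_ref_fraction` and this odds-ratio form are assumptions made for illustration, not the method from the article.

```python
def corrected_ref_fraction(ref_obs: int, alt_obs: int,
                           ref_null: int, alt_null: int) -> float:
    """Reference-allele fraction after removing the bias measured on null data.

    Assumed model: corrected odds = (ref_obs / alt_obs) / (ref_null / alt_null),
    converted back to a fraction via odds / (1 + odds).
    """
    if ref_null <= 0 or alt_null <= 0:
        raise ValueError("null counts must be positive for both alleles")
    if ref_obs < 0 or alt_obs < 0 or ref_obs + alt_obs == 0:
        raise ValueError("observed counts must be non-negative and not both zero")
    # Same quantity as odds / (1 + odds), rearranged so alt_obs may be zero
    num = ref_obs * alt_null
    return num / (num + alt_obs * ref_null)


if __name__ == "__main__":
    # Made-up counts: 60 ref / 40 alt observed; the null set maps 2200 ref / 1800 alt
    print(round(corrected_ref_fraction(60, 40, 2200, 1800), 3))  # prints 0.551
```

If the null reads map without bias (equal counts), the function returns the observed fraction unchanged. A reference-biased null pulls the estimate back towards the alternative allele.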