Fig. 1 | BMC Bioinformatics

Fig. 1

From: Detecting genomic deletions from high-throughput sequence data with unsupervised learning

High-level approach. EigenDel takes a BAM file as input. Clipped reads (CR) and discordant reads (DR) are used to obtain deletion candidates (35 candidates in total in the figure, denoted D1 to D35). Some candidates, such as D2 and D6, are then discarded by the depth filter. EigenDel extracts features (F1, F2, ...) for each remaining candidate and classifies the candidates into four clusters, C1 to C4, by unsupervised learning. Clusters C1 (blue), C2 (yellow), C3 (red) and C4 (green) contain 7, 6, 6 and 9 candidates, respectively. Finally, false deletion candidates are removed from each cluster. The 17 remaining candidates are called as true deletions: 6 in C1, 4 in C2, 4 in C3 and 3 in C4.
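
The filter-then-cluster flow described in the caption can be illustrated with a minimal Python sketch. It is not EigenDel's implementation: the `Candidate` fields, the `max_depth` threshold and its direction, the toy values, and the use of plain k-means as the unsupervised step are all assumptions made for illustration, since the caption does not name the clustering algorithm or the filter criterion. The final per-cluster removal of false candidates is left out for the same reason.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Candidate:
    name: str                  # candidate label, e.g. "D1"
    depth: float               # hypothetical mean read depth inside the region
    features: Sequence[float]  # hypothetical feature vector (F1, F2, ...)


def depth_filter(cands: List[Candidate], max_depth: float) -> List[Candidate]:
    """Keep candidates whose depth is at most max_depth (assumed criterion)."""
    return [c for c in cands if c.depth <= max_depth]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; centers start at the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_candidates(cands: List[Candidate], k: int = 4) -> Dict[int, List[str]]:
    """Group candidate names by cluster index (k-means is a stand-in here)."""
    labels = kmeans([c.features for c in cands], k)
    clusters: Dict[int, List[str]] = {j: [] for j in range(k)}
    for c, lab in zip(cands, labels):
        clusters[lab].append(c.name)
    return clusters


# Toy data, invented for this sketch only.
toy = [
    Candidate("D1", 2.0, (0.0, 0.1)),
    Candidate("D2", 30.0, (0.1, 0.0)),  # high depth, dropped by the filter
    Candidate("D3", 1.0, (5.0, 5.1)),
    Candidate("D4", 3.0, (0.1, 0.0)),
    Candidate("D5", 2.5, (5.1, 5.0)),
]
kept = depth_filter(toy, max_depth=10.0)
clusters = cluster_candidates(kept, k=2)
```

With the toy values, `D2` is dropped by the depth filter and the four remaining candidates split into two clusters of two. The figure itself uses four clusters, which corresponds to `k=4` on a larger candidate set.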
