Fig. 1 | BMC Bioinformatics

Fig. 1

From: CNVind: an open source cloud-based pipeline for rare CNVs detection in whole exome sequencing data based on the depth of coverage

Workflow of the CNVind tool. The first step of data processing in the proposed approach is mapping the DNA reads to the sequencing regions of the reference genome. The result of the mapping is a matrix of numbers describing the depth of coverage in each sequencing region. Next, a quality control process is applied. Then, for each sequencing region, a set of other sequencing regions is selected to model the background; in our experiments, we examined the selection of (I) the k most correlated and (II) k random sequencing regions. This process creates n subsets of sequencing regions (where n denotes the number of sequencing regions in the input dataset), each containing \(k+1\) sequencing regions. Each of the n subsets is then normalized, and from each normalized depth of coverage dataset, the single sequencing region currently under consideration is extracted. Finally, the normalized results for the individual sequencing regions are combined into a single normalized matrix; segmentation and CNV calling are then applied based on this normalized matrix of coverage depths, the raw coverage depths, and the coordinates of the sequencing regions. The result of the entire process is the set of detected CNVs. It is worth noting that normalization in the proposed approach takes place n times, each time for \(k+1\) sequencing regions. In contrast, in the CODEX application, normalization occurs only once and takes into account the entire set of sequencing regions. Because independently normalizing each sequencing region together with its background modeling subset can be time-consuming, the CNVind application can perform this step in a cloud computing environment, on a computer cluster, or on a single server.
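The per-region loop described in the caption can be illustrated with a minimal sketch. This is not the CNVind implementation: all function names are hypothetical, the toy depth matrix is invented for illustration, and the caption does not specify the normalization model, so a simple median-of-ratios scaling is swapped in as a stand-in. The sketch assumes a regions-by-samples matrix with strictly positive medians.

```python
import math
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_background(depth, target, k, mode="correlated", rng=None):
    """Pick k other regions: the k most correlated with the target, or k at random."""
    others = [i for i in range(len(depth)) if i != target]
    if mode == "random":
        return (rng or random.Random(0)).sample(others, k)
    others.sort(key=lambda i: pearson(depth[target], depth[i]), reverse=True)
    return others[:k]


def normalize_subset(subset):
    """Stand-in normalization (median of ratios), assumed here for illustration only.

    Each sample gets a scaling factor estimated from the whole subset; every
    region is then divided by that factor and by its own median depth.
    """
    row_medians = [median(row) for row in subset]
    n_samples = len(subset[0])
    factors = [
        median(row[j] / m for row, m in zip(subset, row_medians))
        for j in range(n_samples)
    ]
    return [
        [row[j] / factors[j] / m for j in range(n_samples)]
        for row, m in zip(subset, row_medians)
    ]


def normalize_per_region(depth, k, mode="correlated", seed=0):
    """Normalize n subsets of k+1 regions and keep only the target row of each."""
    rng = random.Random(seed)
    combined = []
    for target in range(len(depth)):
        background = select_background(depth, target, k, mode, rng)
        subset = [depth[target]] + [depth[i] for i in background]
        combined.append(normalize_subset(subset)[0])  # row 0 is the target region
    return combined


# Toy data: 4 regions x 4 samples, with per-sample coverage factors and a
# simulated heterozygous deletion (half coverage) in region 0, sample 1.
sample_factors = [1.0, 2.0, 0.5, 1.0]
base_depths = [100.0, 200.0, 50.0, 80.0]
depth = [[b * f for f in sample_factors] for b in base_depths]
depth[0][1] *= 0.5

normalized = normalize_per_region(depth, k=2)
```

In this toy setting the normalized matrix holds ratios relative to the expected coverage, so the simulated deletion stands out as a value well below 1 in region 0, sample 1. Because each iteration of the loop depends only on its own subset, the iterations are independent of one another, which is what makes the step straightforward to distribute across machines.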
