Figure 3 | BMC Bioinformatics

Figure 3

From: Identification of pathogen genomic variants through an integrated pipeline

Schematic of the Platypus data processing pipeline. Diagram of all the external and internally developed programs involved in processing Plasmodium samples, as implemented by the Platypus pipeline. The SNV analysis is coded as a sequence of shell scripts that combine programs written in Java and C by a variety of other research groups. The programs used are the Burrows-Wheeler Aligner [27], Sequence Alignment Map Tools (SAMTools) [13], the Genome Analysis Toolkit (GATK) [1], and the Picard tool set. Each script encodes a single step of the data cleaning and analysis pipeline, including error checking and customization of the program. The GC bias normalization algorithms and the CNV detection algorithms were written in MATLAB (The Mathworks), exported to C, and integrated into the pipeline as shell scripts as well. Reads are aligned to the P. falciparum 3D7 reference genome version using the Burrows-Wheeler Aligner. Alignment files are then converted to binary alignment map (BAM) format, sorted, and indexed using SAMTools. (steps I-II, A) Sequencing run statistics for each sample, including GC bias metrics and quality score distribution statistics, are then collected using a number of Picard programs. Read group identifiers are then added, and unmapped reads are removed from the alignment using SAMTools. Next, optical and PCR duplicates are removed using Picard, and the entire alignment is realigned around possible insertion-deletion events using GATK. Base quality scores are recalibrated using GATK, and the depth of coverage at every base pair position is calculated. (step III, A) The alignment is then indexed and is ready for normalization and analysis. A. Schematic of the entire workflow, with color-coding corresponding to the steps in B. B. Diagram of every program and action used in the Platypus pipeline, with the file types passed between programs written in between them.
We request that all users of Platypus acknowledge both this manuscript and the referenced manuscripts for the other programs included in the manual. Please see the instruction manual, available on the website, for more information.
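
The order of operations described in the caption can be summarized as a small data structure. The following is a minimal sketch in Python, not the Platypus implementation (which the caption describes as shell scripts). The names `Step`, `PIPELINE`, `steps_using`, and `describe` are hypothetical and introduced only for illustration. Tool assignments follow the caption's wording; where the caption names no tool, the tool is left as `None` rather than guessed. No command-line flags are shown, since the caption does not give them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One processing step as worded in the figure caption (hypothetical type)."""
    action: str
    tool: Optional[str]  # None where the caption does not name a tool


# Steps in the order the caption lists them.
PIPELINE = (
    Step("align reads to the P. falciparum 3D7 reference genome", "BWA"),
    Step("convert alignment to binary format, sort, and index", "SAMTools"),
    Step("collect GC bias and quality score distribution statistics", "Picard"),
    Step("add read group identifiers and remove unmapped reads", "SAMTools"),
    Step("remove optical and PCR duplicates", "Picard"),
    Step("realign around possible insertion-deletion events", "GATK"),
    Step("recalibrate base quality scores", "GATK"),
    Step("calculate depth of coverage at every base pair position", None),
    Step("index the final alignment", None),
)


def steps_using(tool: str) -> List[str]:
    """Return the actions the caption attributes to the given tool, in order."""
    return [step.action for step in PIPELINE if step.tool == tool]


def describe() -> List[str]:
    """Return one numbered, human-readable line per pipeline step."""
    lines = []
    for number, step in enumerate(PIPELINE, start=1):
        tool = step.tool if step.tool is not None else "tool not named in caption"
        lines.append(f"{number}. {step.action} [{tool}]")
    return lines


if __name__ == "__main__":
    print("\n".join(describe()))
```

Calling `steps_using("GATK")` on this sketch returns the realignment and recalibration steps, which is a quick way to see which external program each stage of the caption depends on.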
