Fig. 5 | BMC Bioinformatics

Fig. 5

From: Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels

Data pre-processing, sequence alignment, post-alignment processing, variant discovery and validation workflow. Prior to sequence alignment, adapter and low-quality trimming were applied to the FASTQ files using the Cutadapt tool. The FASTQ files were then aligned to the hg19 (GRCh37) human reference genome assembly using the Burrows-Wheeler Aligner (BWA)-Maximal Exact Match (MEM), Bowtie2 and Stampy sequence alignment algorithms. Following sequence alignment, SAM files were sorted by coordinate using the Picard SortSam tool. Duplicates were marked and removed using the Picard MarkDuplicates tool, and read groups were added using Picard AddOrReplaceReadGroups. Local realignment around indels (insertions/deletions) was performed using the Genome Analysis ToolKit (GATK) IndelRealigner tool, and base quality score recalibration was performed using the GATK BaseRecalibrator tool. The GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools mpileup/call algorithms were used for variant calling. Genetic variants were functionally annotated using the ANNOVAR tool. The workflow was repeated three times using the TruSight Cancer genomic interval file, with no (0 bp), 50 bp and 100 bp interval padding. Data analysis was also performed using Illumina’s BWA Enrichment application (not shown in the figure). BC breast cancer, CDS coding sequence, DP depth of coverage, GATK Genome Analysis ToolKit, Indel insertions/deletions, VAF variant allele frequency, VUS variant of uncertain clinical significance
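
The interval padding step in the caption can be illustrated with a minimal Python sketch. It assumes BED-style target intervals (0-based, half-open) and extends each one by a fixed number of base pairs on both sides. The function name `pad_intervals`, the merging of overlapping padded intervals, and the example coordinates are all illustrative assumptions; the caption does not say how the padded interval files were actually produced, and this sketch does not clamp intervals to chromosome lengths.

```python
from typing import List, Tuple

# (chromosome, start, end) in BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides, then merge overlaps.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")

    # Pad every interval, clamping the start at 0, and sort by chromosome and start.
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )

    # Merge intervals on the same chromosome that overlap or touch after padding.
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up example targets (not taken from the TruSight Cancer interval file).
targets = [("chr17", 100, 200), ("chr17", 260, 300), ("chr13", 30, 80)]

# The three padding settings named in the caption: none, 50 bp and 100 bp.
padded_sets = {pad: pad_intervals(targets, pad) for pad in (0, 50, 100)}
```

With 50 bp padding, the two made-up `chr17` targets grow into each other and are merged into one interval, which is one reason padded and unpadded runs can cover different numbers of bases and call different variant sets.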
