| Step | Input file(s) | Output file(s) | Timeᵃ (hrs:mins) |
|---|---|---|---|
| Stage 1. Process the raw GBS data | | | |
| Step 1 Parse the raw reads | - CASAVA generated paired-end (R1, R2) files (.fastq.gz) | - Parsing summary information (.txt) | 2:24 |
| | | - Read length distribution summary (.txt) | |
| | - Barcode-ID file (.txt) | - Parsed paired-end [PE] reads (.fastq) | |
| | | - Parsed, unpaired R1 reads (.fastq) | |
| Step 2 Trim based on quality | - Parsed PE reads (.fastq) | - High quality, parsed PE reads (.fastq) | 0:10 |
| | | - High quality, parsed singletons (.fastq) | |
| Step 3 Demultiplex | - One pair (R1, R2) of high quality files (.fastq) per library | - One pair (R1, R2) of high quality files (.fastq) per genotype | 0:16 |
| | - Barcode-ID file (.txt) | | |
| Stage 2. Build the Mock Reference | | | |
| Step 4 Cluster reads and assemble the Mock Reference [MR] | - Genotype-specific PE files (.fastq) | - Mock Reference [centroids] (.fasta) | 0:14ᵇ |
| | - Barcode-ID file (.txt) | - Mock Reference [genome] (.fasta) | |
| Stage 3. Map the processed reads and generate standardized alignment files | | | |
| Step 5 Align with BWA-mem and process with SAMtools | - Genotype-specific high quality PE files (.fastq) | - Filtered reads (.bam) | 3:36 |
| | | - Sorted BAM files (.sorted.bam) | |
| | - Reference or MR [genome] (.fasta) | - Indexed BAM files (.sorted.bam.bai) | |
| | - Barcode-ID file (.txt) | - Indexed reference or MR (.fasta.idx) | |
| | | - One base call alignment summary file (.mpileup) per genotype | |
| Step 6 Parse mpileup output and produce the SNP discovery master matrix | - One base call alignment summary file (.mpileup) per genotype | - One base call alignment summary count file (.txt) per genotype | 4:37 |
| | - Barcode-ID file (.txt) | - SNP discovery master matrix (.txt) | |
| Stage 4. Call SNPs and Genotypes | | | |
| Step 7 SNP genotyping across the population | - SNP discovery master matrix (.txt) | - SNP genotyping matrix for the population (.txt) | 0:04 |
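
The per-step run times in the table can be totaled by stage to see where the pipeline spends most of its time. The following is a minimal Python sketch of that arithmetic. It is not part of the pipeline itself. The times are copied from the table with the footnote markers omitted, and the names `STEP_TIMES`, `to_minutes`, `to_hhmm`, and `stage_totals` are hypothetical helpers introduced only for this illustration.

```python
# Run times (hrs:mins) copied from the table, grouped by stage.
# Footnote markers are omitted; all names here are illustrative only.
STEP_TIMES = {
    "Stage 1": ["2:24", "0:10", "0:16"],  # Steps 1-3
    "Stage 2": ["0:14"],                  # Step 4
    "Stage 3": ["3:36", "4:37"],          # Steps 5-6
    "Stage 4": ["0:04"],                  # Step 7
}


def to_minutes(hhmm):
    """Convert an 'h:mm' string to a whole number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(total_minutes):
    """Convert a whole number of minutes back to an 'h:mm' string."""
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def stage_totals(step_times):
    """Sum the step times within each stage, in minutes."""
    return {
        stage: sum(to_minutes(t) for t in times)
        for stage, times in step_times.items()
    }


totals = stage_totals(STEP_TIMES)
for stage, minutes in totals.items():
    print(stage, to_hhmm(minutes))
print("Pipeline total", to_hhmm(sum(totals.values())))
```

With the values listed in the table, Stage 3 (alignment and mpileup parsing) sums to 8:13 of the 11:21 total, so it accounts for most of the reported run time.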