
Table 2 Performance of GBS-SNP-CROP under three sampling strategies for building the Mock Reference: using all 48 individuals in the population (MR48), using only the 5 individuals with the highest numbers of parsed reads (MR05), and using only the single most read-abundant genotype (MR01)

From: GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data

| Pipeline | Total centroids used to build the Mock Reference (a) | Total paired-end reads used for SNP calling (b) | SNPs called (c) | Avg. depth (d) | Heterozygous (%) (e) | Homozygous (%) (f) | Missing data (%) (g) | Time (hrs:mins) (h) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| GBS-SNP-CROP-MR48 | 1,276,734 | 92,667,123 | 14,712 | 70.74 | 32.47 | 59.31 | 8.20 | 14:30 |
| GBS-SNP-CROP-MR05 | 500,795 | 132,920,383 | 20,226 | 71.02 | 34.50 | 57.18 | 8.31 | 12:06 |
| GBS-SNP-CROP-MR01 | 229,549 | 154,506,669 | 21,318 | 69.34 | 34.51 | 56.85 | 8.29 | 11:03 |

(a) Total number of non-redundant consensus sequences (centroids) identified via clustering to represent the GBS fragment space. This is also the number of FASTA entries in the "MockRef_Clusters.fasta" file.
(b) Number of paired-end reads retained by the pipeline after the mapping procedures and therefore used for SNP calling.
(c) Total number of SNPs called, given all SNP calling filters and genotyping criteria described in the text.
(d) Average read depth for all SNPs across the entire population.
(e) Percentage of heterozygous genotype calls.
(f) Percentage of homozygous genotype calls.
(g) Percentage of missing cells (i.e. no genotype call for a given SNP × accession combination) in the final SNP genotype matrix.
(h) Total computation time required for all pipeline analyses when executed on a Unix workstation with 16 GB RAM and a 2.6 GHz Dual Intel processor.
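Footnotes (e) to (g) describe three shares of the final SNP genotype matrix. The sketch below shows one way to compute them, assuming that all three percentages use the total number of SNP × accession cells as the denominator. This assumption is consistent with the MR48 and MR05 rows summing to about 100%. The genotype encoding ("0/0", "0/1", "1/1", "./.") and the function name `genotype_summary` are hypothetical illustrations, not GBS-SNP-CROP's actual output format or code.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not GBS-SNP-CROP's real format):
#   "0/0" or "1/1" -> homozygous call, "0/1" -> heterozygous call, "./." -> missing
MISSING = "./."


def genotype_summary(matrix):
    """Return (% heterozygous, % homozygous, % missing) over all SNP x accession cells.

    `matrix` is a list of rows, one row per SNP and one cell per accession.
    """
    counts = Counter()
    for row in matrix:
        for call in row:
            if call == MISSING:
                counts["missing"] += 1
            elif call[0] != call[-1]:
                # The two alleles differ, e.g. "0/1", so the call is heterozygous.
                counts["het"] += 1
            else:
                # The two alleles match, e.g. "0/0" or "1/1", so the call is homozygous.
                counts["hom"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("genotype matrix is empty")
    # Every category shares the same denominator, so the three values sum to 100.
    return tuple(100.0 * counts[k] / total for k in ("het", "hom", "missing"))


# Toy 2-SNP x 4-accession matrix (made-up data, not from the study)
toy = [
    ["0/0", "0/1", "./.", "1/1"],
    ["0/1", "0/1", "0/0", "0/0"],
]
print(genotype_summary(toy))  # (37.5, 50.0, 12.5)
```

With 8 cells, 3 heterozygous, 4 homozygous and 1 missing call give 37.5%, 50.0% and 12.5%. If the pipeline instead reported heterozygosity relative to non-missing calls only, the first two values would need a different denominator.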