
Table 3 Comparative data usage and computation times for five different analyses of 150 bp paired-end GBS data from 48 accessions of Actinidia arguta

From: GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data

| Pipeline | Min required read length (bp) [a] | Max usable read length (bp) [b] | Total number of usable R1 reads [c] | Total usable bases (Gb) [d] | Time (hrs:mins) [e] |
|---|---|---|---|---|---|
| Reference-based | | | | | |
| GBS-SNP-CROP-RG | 32 | NA | 128,577,030 | 16.82 | 8:30 |
| TASSEL-GBS-mxTagL32 | 50 | 32 | 120,593,880 | 3.85 | 0:35 |
| TASSEL-GBS-mxTagL64 | 75 | 64 | 105,908,174 | 6.77 | 1:10 |
| Reference-independent | | | | | |
| GBS-SNP-CROP-MR01 | 32 | NA | 128,577,030 | 16.82 | 11:03 |
| TASSEL-UNEAK | 32 | 64 | 134,352,640 | 8.60 | 0:27 |
a. GBS-SNP-CROP utilizes the entire R1 and R2 paired-end sequences of all parsed and quality-trimmed reads longer than a user-specified (i.e. adjustable) minimum length, in this case 32 bp. The TASSEL-GBS pipelines utilize a uniform user-specified portion (e.g. 32 bp, 64 bp) from the beginning of acceptable R1 (single-end) reads that exceed a minimum length (e.g. 50 bp, 75 bp) before barcode and cut site trimming. TASSEL-UNEAK utilizes up to 64 bp from the beginning of acceptable R1 (single-end) reads that exceed a minimum length of 32 bp after barcode and cut site trimming.
b. The maximum length of sequences utilized by GBS-SNP-CROP is set by the sequencing platform (e.g. 100 bp, 150 bp, etc.). In TASSEL-GBS, the user specifies a maximum tag length, thereby effectively setting a uniform tag length. The maximum usable sequence length in TASSEL-UNEAK is 64 bp, with all shorter reads greater than 32 bp padded with poly-A’s to a uniform 64 bp tag length.
c. The number of R1 (i.e. single-end) reads ultimately used by each pipeline, after filtering based on quality and read length requirements. The R1 (single-end) counts are shown here to facilitate comparison across pipelines. Because GBS-SNP-CROP utilizes paired-end reads, the total number of actual reads used (R1 and R2) is twice this number.
d. The total number of nucleotides of sequence data used in each analysis.
e. The total computation time required for each analysis when executed on a Unix workstation with 16 GB RAM and a 2.6 GHz Dual Intel processor.
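
The read and base totals described in the footnotes can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes that 1 Gb means 10^9 bases and that every usable read in the fixed-tag pipelines contributes exactly one full-length tag (including any poly-A padding in TASSEL-UNEAK). The function names are hypothetical and not part of any pipeline. The mean read length it derives for GBS-SNP-CROP is an estimate from the table's totals, not a value reported in the table.

```python
# Read counts, tag lengths (bp), and reported totals (Gb) copied from Table 3.
FIXED_TAG_ROWS = {
    "TASSEL-GBS-mxTagL32": (120_593_880, 32, 3.85),
    "TASSEL-GBS-mxTagL64": (105_908_174, 64, 6.77),
    "TASSEL-UNEAK": (134_352_640, 64, 8.60),
}
CROP_R1_READS = 128_577_030   # GBS-SNP-CROP usable R1 reads
CROP_TOTAL_GB = 16.82         # GBS-SNP-CROP total usable bases


def fixed_tag_gb(r1_reads, tag_len_bp):
    """Bases in Gb (assumed 1e9) if each usable R1 read yields one full-length tag."""
    return r1_reads * tag_len_bp / 1e9


def paired_end_reads(r1_reads):
    """Total reads for a paired-end pipeline: one R2 read per R1 read."""
    return 2 * r1_reads


def mean_read_length_bp(total_gb, total_reads):
    """Average bases per read implied by a total in Gb (assumed 1e9 bases)."""
    return total_gb * 1e9 / total_reads


# Compare the reads-times-tag-length estimate with the reported totals.
for name, (reads, tag_len, reported_gb) in FIXED_TAG_ROWS.items():
    estimate = fixed_tag_gb(reads, tag_len)
    print(f"{name}: estimated {estimate:.3f} Gb, reported {reported_gb:.2f} Gb")

# Derive the mean trimmed read length implied for GBS-SNP-CROP.
crop_reads = paired_end_reads(CROP_R1_READS)
crop_mean = mean_read_length_bp(CROP_TOTAL_GB, crop_reads)
print(f"GBS-SNP-CROP: {crop_reads:,} reads, about {crop_mean:.1f} bp per read")
```

Under these assumptions, the estimates for the three fixed-tag pipelines fall within about 0.01 Gb of the reported totals, which suggests the reported values are simply read count times tag length. The GBS-SNP-CROP totals imply a mean read length well below the 150 bp platform maximum, as would be expected after barcode removal and quality trimming.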