Table 3 Comparative data usage and computation times for five different analyses of 150 bp paired-end GBS data from 48 accessions of Actinidia arguta

From: GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data

| Pipeline | Min required read length (bp) (a) | Max usable read length (bp) (b) | Total number of usable R1 reads (c) | Total usable bases (Gb) (d) | Time (hrs:mins) (e) |
|---|---|---|---|---|---|
| Reference-based | | | | | |
| GBS-SNP-CROP-RG | 32 | NA | 128,577,030 | 16.82 | 8:30 |
| TASSEL-GBS-mxTagL32 | 50 | 32 | 120,593,880 | 3.85 | 0:35 |
| TASSEL-GBS-mxTagL64 | 75 | 64 | 105,908,174 | 6.77 | 1:10 |
| Reference-independent | | | | | |
| GBS-SNP-CROP-MR01 | 32 | NA | 128,577,030 | 16.82 | 11:03 |
| TASSEL-UNEAK | 32 | 64 | 134,352,640 | 8.60 | 0:27 |

  1. a GBS-SNP-CROP utilizes the entire R1 and R2 paired-end sequences of all parsed and quality trimmed reads longer than a user-specified (i.e. adjustable) minimum length, in this case 32 bp. The TASSEL-GBS pipelines utilize a uniform user-specified portion (e.g. 32 bp, 64 bp) from the beginning of acceptable R1 (single-end) reads that exceed a minimum length (e.g. 50 bp, 75 bp) before barcode and cut site trimming. TASSEL-UNEAK utilizes up to 64 bp from the beginning of acceptable R1 (single-end) reads that exceed a minimum length of 32 bp after barcode and cut site trimming
  2. b The maximum length of sequences utilized by GBS-SNP-CROP is set by the sequencing platform (e.g. 100 bp, 150 bp, etc.). In TASSEL-GBS, the user specifies a maximum tag length, thereby effectively setting a uniform tag length. The maximum usable sequence length in TASSEL-UNEAK is 64 bp, with all shorter reads greater than 32 bp padded with poly-A’s to a uniform 64 bp tag length
  3. c The number of R1 (i.e. single-end) reads ultimately used by each pipeline, after filtering based on quality and read length requirements. The R1 (single-end) counts are shown here to facilitate comparison across pipelines. Because GBS-SNP-CROP utilizes paired-end reads, the total number of actual reads used (R1 and R2) is twice this number
  4. d The total number of nucleotides of sequence data used in each analysis
  5. e The total computation time required for each analysis when executed on a Unix workstation with 16 GB RAM and a 2.6 GHz Dual Intel processor
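
The relationship between the read counts, tag lengths, and total usable bases described in footnotes b to d can be checked with a little arithmetic. The sketch below is illustrative only and is not part of any of the pipelines: the function names are hypothetical, 1 Gb is assumed to mean 10^9 bases, and the input values are copied from the table. For the TASSEL runs it multiplies the R1 read count by the uniform tag length; for GBS-SNP-CROP, which has no fixed tag length, it instead derives the implied mean length per trimmed read, counting both R1 and R2 as footnote c describes.

```python
# Values copied from Table 3: (usable R1 reads, uniform tag length in bp, reported Gb).
TASSEL_RUNS = {
    "TASSEL-GBS-mxTagL32": (120_593_880, 32, 3.85),
    "TASSEL-GBS-mxTagL64": (105_908_174, 64, 6.77),
    "TASSEL-UNEAK": (134_352_640, 64, 8.60),
}


def bases_in_gb(r1_reads: int, tag_length_bp: int) -> float:
    """Total bases if every R1 read contributes one tag of uniform length.

    Assumption: 1 Gb = 1e9 bases.
    """
    return r1_reads * tag_length_bp / 1e9


def mean_read_length_bp(r1_reads: int, total_gb: float, reads_per_pair: int = 2) -> float:
    """Implied mean length per read when both R1 and R2 are used (footnote c)."""
    return total_gb * 1e9 / (r1_reads * reads_per_pair)


for name, (reads, tag_len, reported_gb) in TASSEL_RUNS.items():
    computed_gb = bases_in_gb(reads, tag_len)
    print(f"{name}: {computed_gb:.3f} Gb computed vs {reported_gb:.2f} Gb reported")

# GBS-SNP-CROP (both RG and MR01 rows report the same read count and base total).
crop_mean = mean_read_length_bp(128_577_030, 16.82)
print(f"GBS-SNP-CROP implied mean trimmed read length: {crop_mean:.1f} bp")
```

Under these assumptions the computed TASSEL totals agree with the reported values to within about 0.01 Gb, and the GBS-SNP-CROP total corresponds to a mean of roughly 65 bp per trimmed read.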