
Table 5. Performance comparison on the SRR005718 library among GPU-DupRemoval, FastUniq, CD-HIT-DUP, and Fulcrum. The library consists of 32,160,546 paired-end reads of 36 bp generated with an Illumina platform.

From: Removing duplicate reads using graphics processing units

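| Tool | Prefix length | Mismatches | Removed | Time | Memory |
|---|---|---|---|---|---|
| GPU-DupRemoval | 36 | 0 | 2.9% | 5 m | 6.6 GB |
| GPU-DupRemoval | 10 | 1 | 3.6% | 5 m | 8.2 GB |
| GPU-DupRemoval | 10 | 3 | 4.0% | 4 m | 8.2 GB |
| GPU-DupRemoval | 15 | 1 | 3.5% | 5 m | 6.9 GB |
| GPU-DupRemoval | 15 | 3 | 3.9% | 4 m | 6.9 GB |
| CD-HIT-DUP | N/A | 0 | 2.9% | 6 m | 26.9 GB |
| CD-HIT-DUP | N/A | 1 | 3.3% | 8 m | 35.2 GB |
| CD-HIT-DUP | N/A | 3 | 3.0% | 11 m | 37.7 GB |
| Fulcrum | 36 | 0 | 2.9% | 35 m | 720 MB |
| Fulcrum | 10 | 1 | 3.6% | 1 h 4 m | 720 MB |
| Fulcrum | 10 | 3 | 4.2% | 1 h 10 m | 720 MB |
| Fulcrum | 15 | 1 | 3.6% | 34 m | 1.4 GB |
| Fulcrum | 15 | 3 | 4.1% | 36 m | 1.0 GB |
| FastUniq | N/A | 0 | 2.9% | 6 m | 10.1 GB |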
The first column reports the name of the tool. The second column reports the prefix length used for clustering the reads (GPU-DupRemoval and Fulcrum only). The third column reports the maximum number of mismatches allowed. The fourth column reports the percentage of reads removed. The fifth and sixth columns report the computing time and the peak memory required to perform the experiment. Tool settings: i) GPU-DupRemoval -g 0 -D 0 (for identical duplicates) and -g 0 -p <prefix_length> -D <nb_of_mismatches> (for nearly-identical duplicates); ii) CD-HIT-DUP -u 0 -c <nb_of_mismatches>; iii) Fulcrum -b <prefix_length> -s -t p (for clustering) and -q 0 -n 12 -s -t p -c <nb_mismatches>. <prefix_length> was set to 36 for identical duplicates and to 10/15 for nearly-identical duplicates.
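To compare rows of the table numerically, the reported times can be converted to minutes and the removal percentages to approximate read counts. The following is a minimal sketch, not part of any of the benchmarked tools: the helper names `minutes`, `speedup`, and `removed_reads` are hypothetical, and the read counts assume that each percentage is taken over the 32,160,546 reads stated in the caption. Because the table rounds times to whole minutes and percentages to one decimal place, the results are only indicative.

```python
import re

# Library size as stated in the table caption.
TOTAL_READS = 32_160_546


def minutes(text: str) -> int:
    """Convert a table time such as '5 m' or '1 h 4 m' to whole minutes.

    The space after the hour count is optional, so '1h 4 m' also parses.
    """
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised time: {text!r}")
    hours, mins = (int(g) if g else 0 for g in match.groups())
    return 60 * hours + mins


def speedup(slow: str, fast: str) -> float:
    """Ratio of two reported run times (slow / fast)."""
    return minutes(slow) / minutes(fast)


def removed_reads(percent: float, total: int = TOTAL_READS) -> int:
    """Approximate number of reads removed, given a rounded percentage.

    Assumption: the percentage is relative to `total` reads.
    """
    return round(total * percent / 100)


if __name__ == "__main__":
    # Prefix length 10, 3 mismatches: Fulcrum (1 h 10 m) vs GPU-DupRemoval (4 m).
    print(speedup("1 h 10 m", "4 m"))
    # Identical duplicates: 2.9 % of the library.
    print(removed_reads(2.9))
```

For example, `speedup("1 h 10 m", "4 m")` compares the Fulcrum and GPU-DupRemoval rows for prefix length 10 and 3 mismatches, and `removed_reads(2.9)` estimates how many reads the 2.9% figure for identical duplicates corresponds to.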