Table 3 Percentage of duplicates removed as the allowed number of mismatches varies from 0 to 3, for a synthetic paired-end read library of 1 million 100 bp reads. Duplicates make up 25 % of the library

From: Removing duplicate reads using graphics processing units

                  Mismatches
Tool              0         ≤1        ≤2        ≤3
GPU-DupRemoval    10.0 %    20.0 %    22.5 %    25.0 %
CD-HIT-DUP        10.0 %    10.0 %    16.2 %    17.5 %
FastUniq          10.0 %    -         -         -
Fulcrum           10.0 %    20.0 %    22.5 %    25.0 %
  1. Tool settings: i) GPU-DupRemoval -g 0 -D 0 (for identical duplicates) and -g 0 -p 10 -D <nb_of_mismatches> (for nearly-identical duplicates); ii) CD-HIT-DUP -u 0 -c <nb_of_mismatches>; iii) Fulcrum -b <prefix_length> -s -t p (for clustering) and -q 0 -s -t p -c <nb_mismatches>. For Fulcrum, <prefix_length> was set to 100 for identical duplicates and to 25 for nearly-identical duplicates. FastUniq requires no parameters other than those specifying the input and output files
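
To make the column headings concrete, the following minimal sketch shows one way to compute a "percentage of removed duplicates" for a given mismatch threshold. It is an illustration only and is not the algorithm used by any of the tools in the table: it treats each read as a single string rather than a pair, compares every read against the reads already kept using the Hamming distance, and runs on a toy data set. The names hamming, remove_duplicates, percent_removed and toy_reads are hypothetical and do not come from the article.

```python
def hamming(a, b):
    """Number of positions at which two equal-length reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must have the same length")
    return sum(x != y for x, y in zip(a, b))


def remove_duplicates(reads, max_mismatches):
    """Keep a read only if it differs from every kept read by more than
    max_mismatches positions (naive quadratic scan, assumed for clarity)."""
    kept = []
    for read in reads:
        is_duplicate = any(
            len(read) == len(k) and hamming(read, k) <= max_mismatches
            for k in kept
        )
        if not is_duplicate:
            kept.append(read)
    return kept


def percent_removed(reads, max_mismatches):
    """Share of the input reads discarded as duplicates, in percent."""
    kept = remove_duplicates(reads, max_mismatches)
    return 100.0 * (len(reads) - len(kept)) / len(reads)


# Toy data (made up for this sketch): one original read, three copies of it
# with 0, 1 and 2 mismatches, and four unrelated reads.
toy_reads = [
    "ACGTACGT",  # original
    "ACGTACGT",  # identical copy
    "ACGTACGA",  # 1 mismatch from the original
    "ACGTACCA",  # 2 mismatches from the original
    "TTTTGGGG",
    "GGGGCCCC",
    "CCCCAAAA",
    "AAAATTTT",
]

for k in range(4):
    print(f"<= {k} mismatches: {percent_removed(toy_reads, k):.1f} % removed")
```

On this toy input the removed fraction grows with the threshold until every copy of the original read has been caught, which is the same qualitative pattern as the GPU-DupRemoval and Fulcrum rows. The quadratic scan is chosen only for readability; it would not scale to a library of 1 million reads.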