Table 2 Percentage of removed duplicates, varying the allowed number of mismatches from 0 to 3, for a synthetic single-end read library of 1 million 100 bp reads. The library contains 25 % duplicates

From: Removing duplicate reads using graphics processing units

 

| Tool | 0 mismatches | ≤1 | ≤2 | ≤3 |
| --- | --- | --- | --- | --- |
| GPU-DupRemoval | 10.0 % | 20.0 % | 22.5 % | 25.0 % |
| CD-HIT-DUP | 10.0 % | 10.0 % | 16.0 % | 25.0 % |
| Fastx-Toolkit Collapser | 10.0 % | – | – | – |
| Fulcrum | 10.0 % | 20.0 % | 22.5 % | 25.0 % |

  1. Clustering for GPU-DupRemoval and Fulcrum was performed on prefixes of 25 bases when removing nearly-identical duplicates; for identical duplicates, clustering was performed on the entire read length for both tools. Note that GPU-DupRemoval automatically clusters reads according to their length when used to remove identical duplicates. Tool settings: i) GPU-DupRemoval: -g 0 -D 0 (identical duplicates) and -g 0 -p 25 -D <nb_of_mismatches> (nearly-identical duplicates); ii) CD-HIT-DUP: -u 0 -c <nb_of_mismatches>; iii) Fulcrum: -b <prefix_length> -s -t s (clustering) and -q 0 -s -t s -c <nb_of_mismatches> (duplicate removal). <nb_of_mismatches> is the allowed number of mismatches, set to 0, 1, 2, or 3 in the different experiments; <prefix_length> was set to 100 for identical duplicates and to 25 for nearly-identical duplicates. Fastx-Toolkit Collapser requires no parameters apart from those specifying the input and output files
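The footnote describes a two-stage strategy: reads are first clustered on a fixed-length prefix, and full-length comparisons with a mismatch threshold are then applied within each cluster. The sketch below is a minimal, CPU-only Python illustration of that idea under these assumptions; it is not the authors' GPU implementation, and the function names (hamming_within, remove_near_duplicates) and toy reads are invented for illustration.

```python
from collections import defaultdict


def hamming_within(a: str, b: str, max_mm: int) -> bool:
    """Return True if a and b (equal-length reads) differ in at most max_mm bases."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mm:
                return False
    return True


def remove_near_duplicates(reads, prefix_len=25, max_mm=0):
    """Cluster reads by their prefix, then keep one representative per group of
    reads whose full sequences differ by at most max_mm bases."""
    clusters = defaultdict(list)   # prefix -> representatives kept so far
    kept = []
    for read in reads:
        prefix = read[:prefix_len]
        for rep in clusters[prefix]:
            if hamming_within(read, rep, max_mm):
                break              # near-duplicate of an already kept read: drop it
        else:
            clusters[prefix].append(read)
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Tiny made-up library: 100 bp reads, one exact duplicate, one near-duplicate.
    reads = [
        "ACGT" * 25,
        "ACGT" * 25,               # identical duplicate
        "ACGT" * 24 + "ACGA",      # one mismatch, located in the suffix
        "TTTT" * 25,               # unrelated read
    ]
    for mm in range(4):
        survivors = remove_near_duplicates(reads, prefix_len=25, max_mm=mm)
        removed = 100.0 * (len(reads) - len(survivors)) / len(reads)
        print(f"<= {mm} mismatches: {removed:.1f} % removed")
```

Because reads are only compared within a prefix cluster, a mismatch that falls inside the prefix sends the two reads to different clusters and that duplicate is missed, which is why the choice of prefix length matters for the nearly-identical-duplicate experiments above.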