Table 1 Results for exact and approximate tag sequence matching

From: TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

| Tag sequence | Library | 0 mismatches | 1 mismatch | 2 mismatches | 3 mismatches | 4 mismatches | 5 mismatches | > 5 mismatches |
|---|---|---|---|---|---|---|---|---|
| 5'-end | LIB019 | 38,271 (89.37) | 2,253 (5.26) | 564 (1.32) | 185 (0.43) | 50 (0.12) | 64 (0.15) | 1,438 (3.36) |
| 5'-end | LIB020 | 14,491 (84.60) | 1,629 (9.51) | 430 (2.51) | 165 (0.96) | 31 (0.18) | 24 (0.14) | 359 (2.10) |
| 5'-end | LIB021 | 41,764 (84.74) | 4,748 (9.63) | 1,345 (2.73) | 427 (0.87) | 125 (0.25) | 111 (0.23) | 762 (1.55) |
| 3'-end | LIB019 | 7,194 (16.80) | 12,156 (28.39) | 2,454 (5.73) | 688 (1.61) | 683 (1.59) | 766 (1.79) | 18,884 (44.10) |
| 3'-end | LIB020 | 2,855 (16.67) | 2,460 (14.36) | 561 (3.28) | 279 (1.63) | 275 (1.61) | 904 (5.28) | 9,795 (57.18) |
| 3'-end | LIB021 | 7,981 (16.19) | 6,924 (14.05) | 1,800 (3.65) | 942 (1.91) | 908 (1.84) | 2,480 (5.03) | 28,247 (57.32) |
| Concatenated | LIB019 | 931 (2.17) | 282 (0.66) | 132 (0.31) | 51 (0.12) | 104 (0.24) | 32 (0.07) | - |
| Concatenated | LIB020 | 185 (1.08) | 45 (0.26) | 19 (0.11) | 12 (0.07) | 17 (0.10) | 8 (0.05) | - |
| Concatenated | LIB021 | 1,302 (2.64) | 464 (0.94) | 215 (0.44) | 120 (0.24) | 135 (0.27) | 30 (0.06) | - |

  1. Results for the 5'-end tag sequence (5'-GTG GTG TGT TGG GTG TGT TTG GNN NNN NNN N; length: 31 bp; matching within 46 bp), the 3'-end tag sequence (NNN NNN NNN CCA AAC ACA CCC AAC ACA CCA-3'; length: 30 bp; matching within 45 bp), and the concatenated tag sequences (length: 61 bp). The numbers are based on the dereplicated datasets. Percentages are shown in parentheses.
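
To illustrate how reads could be binned by mismatch count as in the table, here is a minimal Python sketch. It is not TagCleaner's implementation. It assumes substitutions only (no insertions or deletions), treats `N` in the tag as matching any base, and slides the 31 bp 5'-end tag across the first 46 bp of each read. The function names `mismatches`, `best_5prime_match`, and `bin_reads` are hypothetical, as is the choice to place reads too short to hold the tag in the "> 5" bin.

```python
from collections import Counter

# 5'-end tag from the table footnote: 22 fixed bases followed by 9 N positions (31 bp).
TAG_5PRIME = "GTGGTGTGTTGGGTGTGTTTGG" + "N" * 9


def mismatches(tag, segment):
    """Count substitutions between tag and an equal-length segment.

    An 'N' in the tag matches any base (assumption of this sketch).
    """
    return sum(1 for t, s in zip(tag, segment) if t != "N" and t != s)


def best_5prime_match(read, tag=TAG_5PRIME, window=46):
    """Return the fewest mismatches for the tag at any offset in the first `window` bases.

    Returns None when the read is too short to contain the tag.
    """
    region = read[:window]
    if len(region) < len(tag):
        return None
    offsets = range(len(region) - len(tag) + 1)
    return min(mismatches(tag, region[i:i + len(tag)]) for i in offsets)


def bin_reads(reads, max_listed=5):
    """Tally reads per mismatch count, with one overflow bin above `max_listed`."""
    overflow = f"> {max_listed}"
    counts = Counter()
    for read in reads:
        m = best_5prime_match(read)
        # Assumption: reads too short to hold the tag go to the overflow bin.
        label = str(m) if m is not None and m <= max_listed else overflow
        counts[label] += 1
    return counts
```

Dividing each bin by the number of reads in the dereplicated library would give percentages of the kind shown in parentheses in the table.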