Table 4 Metagenomic datasets used in this study

From: Artificial and natural duplicates in pyrosequencing reads of metagenomic data

     

The six rightmost columns give the % of natural duplicates under hypothetical sample types: high-complexity (b) and moderate-complexity (c), each at average genome lengths of 3 Mb, 100 kb, and 10 kb.

| Project/Sample (a) | Environment | Platform | Number of reads | Duplicates (% of total) | High (b), 3 Mb | High (b), 100 kb | High (b), 10 kb | Moderate (c), 3 Mb | Moderate (c), 100 kb | Moderate (c), 10 kb |
|---|---|---|---|---|---|---|---|---|---|---|
| 16339/SRR000905 | Marine | GS_20 | 208633 | 5.74 | 0.01 | 0.52 | 4.98 | 0.10 | 3.22 | 24.88 |
| 28969/SRR000674 | Coastal water | GS_FLX | 201671 | 17.65 | 0.02 | 0.51 | 4.87 | 0.10 | 3.13 | 24.27 |
| 29421/SRR001308 | Waste water | GS_FLX | 378601 | 12.39 | 0.03 | 0.93 | 8.94 | 0.20 | 5.65 | 37.09 |
| 30445/SRR001663 | Marine | GS_FLX | 369811 | 15.39 | 0.03 | 0.93 | 8.68 | 0.19 | 5.49 | 36.53 |
| 30563/SRR001669 | Human gut | GS_20 | 41649 | 7.26 | 0.00 | 0.11 | 1.00 | 0.03 | 0.65 | 6.16 |
| 33243/SRR006907 | Freshwater | GS_FLX | 255722 | 20.57 | 0.02 | 0.61 | 6.07 | 0.13 | 3.88 | 28.71 |
| 38721/SRR023845 | Phyllosphere | GS_FLX | 543285 | 11.17 | 0.05 | 1.33 | 12.41 | 0.29 | 7.93 | 45.07 |
| Western channel/Apr_Day_gDNA | Saline water | Titanium | 421004 | 23.38 | 0.04 | 1.04 | 9.80 | 0.20 | 6.23 | 39.42 |
| Ocean viruses/Arctic_Shotgun | Ocean viruses | GS_20 | 688590 | 7.14 | 0.05 | 1.67 | 15.46 | 0.36 | 9.86 | 50.15 |
| North Atlantic/BATS-174-2 | Ocean gyre | GS_20 | 288735 | 17.56 | 0.02 | 0.73 | 6.92 | 0.16 | 4.43 | 31.24 |

  1. (a) Datasets are either from the NCBI Short Read Archive, listed with project IDs and run accession numbers (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi), or from CAMERA, listed with project and sample names (http://camera.calit2.net).
  2. (b) High-complexity and (c) moderate-complexity microbial (or viral) environments, each with an average genome length of 3 Mb, 100 kb, or 10 kb.
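
The table reports duplicates only as percentages, so a rough absolute count per dataset can help when comparing samples of very different sizes. The sketch below multiplies each read count by its duplicate percentage. It assumes that "% of total" means a percentage of the total reads in that dataset, which the table itself does not state; the names `ROWS`, `duplicate_read_count`, and `summarize` are hypothetical and exist only for this illustration.

```python
# Values copied from the table: (project/sample, number of reads, duplicates as % of total).
ROWS = [
    ("16339/SRR000905", 208633, 5.74),
    ("28969/SRR000674", 201671, 17.65),
    ("29421/SRR001308", 378601, 12.39),
    ("30445/SRR001663", 369811, 15.39),
    ("30563/SRR001669", 41649, 7.26),
    ("33243/SRR006907", 255722, 20.57),
    ("38721/SRR023845", 543285, 11.17),
    ("Western channel/Apr_Day_gDNA", 421004, 23.38),
    ("Ocean viruses/Arctic_Shotgun", 688590, 7.14),
    ("North Atlantic/BATS-174-2", 288735, 17.56),
]


def duplicate_read_count(num_reads: int, pct_duplicates: float) -> int:
    """Approximate number of duplicate reads.

    Assumption: pct_duplicates is a percentage of the total reads.
    The result is rounded because the table's percentages are
    themselves rounded to two decimals.
    """
    return round(num_reads * pct_duplicates / 100)


def summarize(rows):
    """Map each project/sample name to its approximate duplicate read count."""
    return {name: duplicate_read_count(n, pct) for name, n, pct in rows}


if __name__ == "__main__":
    for name, count in summarize(ROWS).items():
        print(f"{name}\t{count}")
```

Because the published percentages are rounded, the counts are estimates and may differ slightly from the exact numbers behind the table.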