
Table 3. The datasets used for evaluation

From: SAGE: String-overlap Assembly of GEnomes

| Dataset | Organism | Accession | Reference number | Genome length (bp) | Read length (bp) | Number of reads | Number of base pairs | Coverage (×) |
|---|---|---|---|---|---|---|---|---|
| 1 | Bacillus subtilis | DRR000852 | NC_000964.3 | 4,215,606 | 75 | 3,519,504 | 263,962,800 | 62.62 |
| 2 | Chlamydia trachomatis | ERR021957 | NC_000117.1 | 1,042,519 | 37 | 7,825,944 | 289,559,928 | 277.75 |
| 3 | Streptococcus pseudopneumoniae | SRR387784 | NC_015875.1 | 2,190,731 | 100 | 4,407,248 | 440,724,800 | 201.18 |
| 4 | Francisella tularensis | SRR063416 | NC_006570.2 | 1,892,775 | 101 | 6,907,220 | 697,629,220 | 368.57 |
| 5 | Leptospira interrogans | SRR397962 | NC_005823.1 | 4,277,185 | 100 | 7,127,250 | 712,725,000 | 166.63 |
| 6 | Porphyromonas gingivalis | SRR413299 | NC_002950.2 | 2,343,476 | 100 | 9,497,946 | 949,794,600 | 405.29 |
| 7 | Escherichia coli | SRR072099 | NC_000913.2 | 4,639,675 | 36 | 30,355,432 | 1,092,795,552 | 235.53 |
| 8 | Clostridium thermocellum | SRR400550 | NC_009012.1 | 3,843,301 | 36 | 31,994,160 | 1,151,789,760 | 299.69 |
| 9 | Caenorhabditis elegans | SRR065390 | WS222 | 100,286,070 | 100 | 67,617,092 | 6,761,709,200 | 67.42 |

1. The datasets are sorted in increasing order of the total number of base pairs. All datasets and reference genome sequences were obtained from NCBI, except those for C. elegans, which are from http://www.wormbase.org.
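The last two columns of the table appear to be derived from the others: in every row, the number of base pairs equals the number of reads times the read length, and the coverage equals the number of base pairs divided by the genome length, rounded to two decimals. The short Python sketch below is an illustration of that arithmetic only, not code from SAGE. It assumes a hypothetical `DATASETS` list transcribed from the table and hypothetical helpers `total_base_pairs` and `coverage`, recomputes both derived columns, and checks the sort order stated in the footnote.

```python
"""Recompute the derived columns of Table 3 (illustrative sketch, not SAGE code)."""

from typing import List, Tuple

# Hypothetical transcription of Table 3:
# (organism, genome length in bp, read length in bp, number of reads)
DATASETS: List[Tuple[str, int, int, int]] = [
    ("Bacillus subtilis", 4_215_606, 75, 3_519_504),
    ("Chlamydia trachomatis", 1_042_519, 37, 7_825_944),
    ("Streptococcus pseudopneumoniae", 2_190_731, 100, 4_407_248),
    ("Francisella tularensis", 1_892_775, 101, 6_907_220),
    ("Leptospira interrogans", 4_277_185, 100, 7_127_250),
    ("Porphyromonas gingivalis", 2_343_476, 100, 9_497_946),
    ("Escherichia coli", 4_639_675, 36, 30_355_432),
    ("Clostridium thermocellum", 3_843_301, 36, 31_994_160),
    ("Caenorhabditis elegans", 100_286_070, 100, 67_617_092),
]


def total_base_pairs(read_length: int, num_reads: int) -> int:
    """Total sequenced bases: every read contributes read_length bases."""
    return read_length * num_reads


def coverage(genome_length: int, read_length: int, num_reads: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_base_pairs(read_length, num_reads) / genome_length


if __name__ == "__main__":
    # Print each dataset with its recomputed base-pair total and coverage.
    for organism, genome_len, read_len, n_reads in DATASETS:
        bp = total_base_pairs(read_len, n_reads)
        cov = coverage(genome_len, read_len, n_reads)
        print(f"{organism:32s} {bp:>15,d} {cov:8.2f}")

    # The footnote states the rows are sorted by total number of base pairs.
    totals = [total_base_pairs(r, n) for _, _, r, n in DATASETS]
    assert totals == sorted(totals)
```

Running the script prints one line per dataset; for B. subtilis, for example, it reports 263,962,800 base pairs and a coverage of 62.62, matching the first row of the table.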