Table 2 Genome reconstruction of 6 bacterial genomes using different sequencing platforms and assembly strategies

From: SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

| Organism | Assembler | Scaffolder | Expected scaffolds | Final scaffolds | Unaligned scaffolds | Sum (bp) | N50 (bp) | Gap size (bp) | Indels | Rearrangements | Runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B. trehalosi | Ray | - | Unknown | 34 | - | 2,384,099 | 212,852 | 0 | - | - | - |
| | | AHA | Unknown | 21 | - | 2,390,466 | 245,559 | 6,367 | - | - | 110 min |
| | | SSPACE-LongRead | Unknown | 7 | - | 2,410,351 | 1,215,562 | 8,899 | - | - | 16 min |
| | CLC | - | Unknown | 62 | - | 2,361,409 | 146,347 | 0 | - | - | - |
| | | AHA | Unknown | 36 | - | 2,389,684 | 222,352 | 16,915 | - | - | 118 min |
| | | SSPACE-LongRead | Unknown | 6 | - | 2,395,822 | 1,361,277 | 8,650 | - | - | 19 min |
| | Newbler | - | Unknown | 58 | - | 2,362,898 | 117,742 | 0 | - | - | - |
| | | AHA | Unknown | 21 | - | 2,391,876 | 505,738 | 12,781 | - | - | 117 min |
| | | SSPACE-LongRead | Unknown | 5 | - | 2,393,982 | 1,317,689 | 7,692 | - | - | 16 min |
| E. coli K12 | Ray | - | 1 | 99 | 0 | 4,583,740 | 95,924 | 0 | 0 | 2 | - |
| | | AHA | 1 | 57 | 0 | 4,632,207 | 220,952 | 32,147 | 2 | 2 | 194 min |
| | | SSPACE-LongRead | 1 | 11 | 0 | 4,636,946 | 570,605 | 30,741 | 1 | 9 | 28 min |
| | CLC | - | 1 | 126 | 0 | 4,554,695 | 88,183 | 0 | - | - | - |
| | | AHA | 1 | 57 | 0 | 4,636,666 | 497,336 | 34,587 | 2 | 6 | 214 min |
| | | SSPACE-LongRead | 1 | 1 | 0 | 4,642,513 | 4,642,513 | 18,788 | 3 | 8 | 28 min |
| | Newbler | - | 1 | 80 | 0 | 4,567,139 | 117,490 | 0 | - | - | - |
| | | AHA | 1 | 12 | 0 | 4,652,318 | 3,320,126 | 45,090 | 6 | 14 | 201 min |
| | | SSPACE-LongRead | 1 | 2 | 0 | 4,635,316 | 3,716,545 | 7,793 | 7 | 10 | 32 min |
| E. coli O157:H7 | Ray | - | 10 | 144 | 1 | 5,432,073 | 112,112 | 0 | - | - | - |
| | | AHA | 10 | 110 | 1 | 5,475,255 | 227,802 | 34,035 | 1 | 2 | 226 min |
| | | SSPACE-LongRead | 10 | 38 | 1 | 5,845,919 | 348,040 | 58,068 | 2 | 23 | 31 min |
| | CLC | - | 10 | 293 | 13 | 5,335,444 | 105,156 | 0 | - | - | - |
| | | AHA | 10 | 238 | 8 | 5,437,860 | 201,528 | 42,214 | 4 | 9 | 312 min |
| | | SSPACE-LongRead | 10 | 33 | 2 | 5,539,369 | 1,172,184 | 51,676 | 13 | 17 | 32 min |
| | Newbler | - | 10 | 279 | 14 | 5,322,767 | 142,438 | 0 | - | - | - |
| | | AHA | 10 | 209 | 8 | 5,471,954 | 254,465 | 65,936 | 5 | 9 | 297 min |
| | | SSPACE-LongRead | 10 | 39 | 3 | 5,565,065 | 703,452 | 75,126 | 11 | 34 | 37 min |
| F. tularensis | Ray | - | 3 | 100 | 0 | 1,806,660 | 25,623 | 0 | - | - | - |
| | | AHA | 3 | 38 | 0 | 1,859,591 | 82,151 | 47,651 | 1 | 5 | 95 min |
| | | SSPACE-LongRead | 3 | 8 | 0 | 1,886,509 | 279,967 | 27,386 | 1 | 8 | 14 min |
| | CLC | - | 3 | 110 | 1 | 1,780,141 | 25,117 | 0 | - | - | - |
| | | AHA | 3 | 53 | 1 | 1,844,586 | 63,063 | 50,494 | 0 | 6 | 104 min |
| | | SSPACE-LongRead | 3 | 7 | 1 | 1,877,533 | 444,696 | 19,639 | 2 | 6 | 18 min |
| | Newbler | - | 3 | 316 | 0 | 1,653,291 | 8,912 | 0 | - | - | - |
| | | AHA | 3 | 61 | 0 | 1,965,997 | 69,167 | 255,189 | 7 | 7 | 95 min |
| | | SSPACE-LongRead | 3 | 7 | 0 | 1,867,474 | 480,062 | 160,504 | 16 | 13 | 14 min |
| M. haemolytica | Ray | - | Unknown | 80 | - | 2,639,260 | 75,015 | 0 | - | - | - |
| | | AHA | Unknown | 44 | - | 2,676,952 | 108,006 | 25,336 | - | - | 148 min |
| | | SSPACE-LongRead | Unknown | 14 | - | 2,682,588 | 703,034 | 29,889 | - | - | 21 min |
| | CLC | - | Unknown | 129 | - | 2,630,768 | 63,442 | 0 | - | - | - |
| | | AHA | Unknown | 41 | - | 2,769,108 | 239,432 | 73,082 | - | - | 166 min |
| | | SSPACE-LongRead | Unknown | 8 | - | 2,742,871 | 1,996,208 | 33,032 | - | - | 25 min |
| S. enterica | Ray | - | 4 | 119 | 2 | 4,972,739 | 90,542 | 0 | - | - | - |
| | | AHA | 4 | 40 | 2 | 5,012,323 | 203,631 | 34,496 | 0 | 4 | 190 min |
| | | SSPACE-LongRead | 4 | 20 | 2 | 5,112,337 | 488,483 | 27,988 | 0 | 6 | 28 min |
| | CLC | - | 4 | 238 | 5 | 4,974,534 | 43,328 | 0 | - | - | - |
| | | AHA | 4 | 62 | 4 | 5,064,555 | 376,354 | 68,292 | 3 | 7 | 200 min |
| | | SSPACE-LongRead | 4 | 7 | 3 | 5,038,082 | 3,235,544 | 21,588 | 6 | 2 | 34 min |
| | Newbler | - | 4 | 101 | 12 | 4,990,994 | 372,513 | 0 | - | - | - |
| | | AHA | 4 | 69 | 12 | 5,040,830 | 787,589 | 30,907 | 2 | 6 | 193 min |
| | | SSPACE-LongRead | 4 | 4 | 12 | 5,036,244 | 3,729,047 | 10,430 | 3 | 11 | 29 min |

1. The platform/strategy that yields the lowest number of assembled scaffolds is highlighted in italic-bold. The number of expected scaffolds refers to the number of chromosomes plus the number of plasmids present in the reference genome (if available). In general, the combination of (1) draft assembly using CLCbio for Illumina MiSeq reads or Newbler for Roche 454 reads and (2) scaffolding using SSPACE-LongRead for PacBio CLR reads gives the best results in terms of closure and time. Notably, some draft assembly contigs are not covered by PacBio reads (such as PhiX control or bacterial host sequences). The number of errors introduced during scaffolding is limited, and these errors are often a consequence of true variation between the sequenced library and the previously deposited reference genome.
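As a rough consistency check on the footnote, the Python sketch below transcribes the Final scaffolds and Runtime values from Table 2. It then (a) finds, for each organism, the assembler/scaffolder combination with the fewest final scaffolds (ties are kept), and (b) computes how many times faster SSPACE-LongRead ran than AHA on the same draft assembly. It assumes that "lowest amount of assembled scaffolds" refers to the Final scaffolds column. The names `ROWS`, `best_strategies` and `speedups` are hypothetical helpers for illustration and are not part of SSPACE-LongRead or AHA.

```python
# Values transcribed from Table 2: final scaffold counts and runtimes (minutes).
# Tuple layout (hypothetical, for illustration only):
# (organism, assembler, draft_final, aha_final, aha_min, sspace_final, sspace_min)
ROWS = [
    ("B. trehalosi", "Ray", 34, 21, 110, 7, 16),
    ("B. trehalosi", "CLC", 62, 36, 118, 6, 19),
    ("B. trehalosi", "Newbler", 58, 21, 117, 5, 16),
    ("E. coli K12", "Ray", 99, 57, 194, 11, 28),
    ("E. coli K12", "CLC", 126, 57, 214, 1, 28),
    ("E. coli K12", "Newbler", 80, 12, 201, 2, 32),
    ("E. coli O157:H7", "Ray", 144, 110, 226, 38, 31),
    ("E. coli O157:H7", "CLC", 293, 238, 312, 33, 32),
    ("E. coli O157:H7", "Newbler", 279, 209, 297, 39, 37),
    ("F. tularensis", "Ray", 100, 38, 95, 8, 14),
    ("F. tularensis", "CLC", 110, 53, 104, 7, 18),
    ("F. tularensis", "Newbler", 316, 61, 95, 7, 14),
    ("M. haemolytica", "Ray", 80, 44, 148, 14, 21),
    ("M. haemolytica", "CLC", 129, 41, 166, 8, 25),
    ("S. enterica", "Ray", 119, 40, 190, 20, 28),
    ("S. enterica", "CLC", 238, 62, 200, 7, 34),
    ("S. enterica", "Newbler", 101, 69, 193, 4, 29),
]


def best_strategies(rows):
    """Return {organism: (fewest_final_scaffolds, [(assembler, scaffolder), ...])},
    keeping every combination that ties for the minimum."""
    best = {}
    for org, asm, draft, aha, _, sspace, _ in rows:
        # "-" marks the draft assembly without a scaffolding step.
        for scaffolder, n in (("-", draft), ("AHA", aha), ("SSPACE-LongRead", sspace)):
            current = best.get(org)
            if current is None or n < current[0]:
                best[org] = (n, [(asm, scaffolder)])
            elif n == current[0]:
                current[1].append((asm, scaffolder))
    return best


def speedups(rows):
    """AHA runtime divided by SSPACE-LongRead runtime for each draft assembly."""
    return {(org, asm): aha_min / sspace_min
            for org, asm, _, _, aha_min, _, sspace_min in rows}


if __name__ == "__main__":
    for org, (n, pairs) in best_strategies(ROWS).items():
        print(f"{org}: {n} final scaffolds via {pairs}")
    ratios = speedups(ROWS)
    print(f"SSPACE-LongRead vs AHA speedup: "
          f"{min(ratios.values()):.1f}x to {max(ratios.values()):.1f}x")
```

On these transcribed numbers, the fewest scaffolds always come from SSPACE-LongRead applied to a CLC or Newbler draft (with a CLC/Newbler tie for F. tularensis), and SSPACE-LongRead ran roughly 5.8 to 9.8 times faster than AHA. This is consistent with the footnote's summary, although runtimes depend on the hardware used, so the ratios are only indicative.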