Table 1 Comparison of the five methods on the E. coli simulation data

From: RegCloser: a robust regression approach to closing genome gaps

| Methods | Draft | GapCloser | GapFiller | Sealer | Phrap | RegCloser |
| --- | --- | --- | --- | --- | --- | --- |
| Contig length | 4,531,075 | 4,642,829 | 4,631,201 | 4,542,679 | 4,641,796 | 4,641,652 |
| Contig number | 137 | 1 | 26 | 59 | 7 | 1 |
| Contig N50 | 78,557 | 4,642,829 | 331,261 | 174,037 | 1,334,246 | 4,641,652 |
| Genome fraction | 97.532% | 99.924% | 99.687% | 97.851% | 99.892% | 100% |
| # mis-assemblies | 0 | 0 | 0 | 1 | 0 | 0 |
| # local mis-assemblies | 0 | 25 | 7 | 20 | 23 | 0 |
| # mismatches | 0 | 245 | 43 | 55 | 313 | 51 |
| # indels | 0 | 32 | 13 | 17 | 42 | 12 |
| # closed gaps (# total gaps = 136) | | 136 | 111 | 78 | 130 | 136 |
| # correctly closed gaps | | 112 | 103 | 57 | 108 | 136 |
| # closed TRs (# total TRs = 26) | | 26 | 12 | 25 | 24 | 26 |
| # correctly closed TRs | | 8 | 4 | 6 | 6 | 26 |
| # incorrectly closed TRs / # incorrectly closed gaps | | 18/24 | 8/8 | 19/21 | 18/22 | 0/0 |

The assemblies are aligned to the reference genome using QUAST 5.2.0. Genome fraction is the percentage of the reference genome covered by assembled contigs. Mis-assemblies are positions on assembled contigs where the left and right flanking sequences align more than 1 kbp apart on the reference, overlap by more than 1 kbp, or align on opposite strands. Local mis-assemblies are positions on contigs where the flanking sequences have a gap or overlap of less than 1 kbp and more than 80 bp on the same strand of the reference. The best values of each quality metric are highlighted in bold. RegCloser correctly closes all 136 gaps, including the 26 tandem repeat (TR)-related gaps, and yields a complete genome with 100% genome fraction and no mis-assemblies or local mis-assemblies. For the other four methods, the TR-related gaps account for most of the incorrectly closed gaps.
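
The last row of the table can be rederived from the rows above it: the number of incorrectly closed gaps (or TRs) is the number closed minus the number correctly closed. The following is a minimal Python sketch of that arithmetic, using only the counts reported in Table 1. The dictionary and function names are illustrative assumptions and are not part of RegCloser or QUAST.

```python
# Counts copied from Table 1 (E. coli simulation data).
# All names below are hypothetical and chosen for illustration only.
closed_gaps = {"GapCloser": 136, "GapFiller": 111, "Sealer": 78, "Phrap": 130, "RegCloser": 136}
correct_gaps = {"GapCloser": 112, "GapFiller": 103, "Sealer": 57, "Phrap": 108, "RegCloser": 136}
closed_trs = {"GapCloser": 26, "GapFiller": 12, "Sealer": 25, "Phrap": 24, "RegCloser": 26}
correct_trs = {"GapCloser": 8, "GapFiller": 4, "Sealer": 6, "Phrap": 6, "RegCloser": 26}


def incorrect_counts(method):
    """Return (incorrectly closed TRs, incorrectly closed gaps) for one method."""
    bad_trs = closed_trs[method] - correct_trs[method]
    bad_gaps = closed_gaps[method] - correct_gaps[method]
    return bad_trs, bad_gaps


def tr_share(method):
    """Fraction of incorrectly closed gaps that are TR-related (0.0 if none are incorrect)."""
    bad_trs, bad_gaps = incorrect_counts(method)
    return bad_trs / bad_gaps if bad_gaps else 0.0


if __name__ == "__main__":
    for method in closed_gaps:
        bad_trs, bad_gaps = incorrect_counts(method)
        print(f"{method}: {bad_trs}/{bad_gaps} (TR share {tr_share(method):.0%})")
```

For GapCloser, GapFiller, Sealer and Phrap the TR share computed this way is at least 75%, which is the basis for the statement that TR-related gaps account for most of the incorrectly closed gaps.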