
Table 2 Genome reconstruction of 6 bacterial genomes using different sequencing platforms and assembly strategies

From: SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

| Organism | Assembler | Scaffolder | Expected scaffolds | Final scaffolds | Unaligned scaffolds | Sum (bp) | N50 (bp) | Gap size (bp) | Indels | Rearrangements | Runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B. trehalosi | Ray | - | Unknown | 34 | - | 2,384,099 | 212,852 | 0 | - | - | - |
| | | AHA | Unknown | 21 | - | 2,390,466 | 245,559 | 6,367 | - | - | 110 min |
| | | SSPACE LongRead | Unknown | 7 | - | 2,410,351 | 1,215,562 | 8,899 | - | - | 16 min |
| | CLC | - | Unknown | 62 | - | 2,361,409 | 146,347 | 0 | - | - | - |
| | | AHA | Unknown | 36 | - | 2,389,684 | 222,352 | 16,915 | - | - | 118 min |
| | | SSPACE LongRead | Unknown | 6 | - | 2,395,822 | 1,361,277 | 8,650 | - | - | 19 min |
| | Newbler | - | Unknown | 58 | - | 2,362,898 | 117,742 | 0 | - | - | - |
| | | AHA | Unknown | 21 | - | 2,391,876 | 505,738 | 12,781 | - | - | 117 min |
| | | SSPACE LongRead | Unknown | 5 | - | 2,393,982 | 1,317,689 | 7,692 | - | - | 16 min |
| E. coli K12 | Ray | - | 1 | 99 | 0 | 4,583,740 | 95,924 | 0 | 0 | 2 | - |
| | | AHA | 1 | 57 | 0 | 4,632,207 | 220,952 | 32,147 | 2 | 2 | 194 min |
| | | SSPACE LongRead | 1 | 11 | 0 | 4,636,946 | 570,605 | 30,741 | 1 | 9 | 28 min |
| | CLC | - | 1 | 126 | 0 | 4,554,695 | 88,183 | 0 | - | - | - |
| | | AHA | 1 | 57 | 0 | 4,636,666 | 497,336 | 34,587 | 2 | 6 | 214 min |
| | | SSPACE LongRead | 1 | 1 | 0 | 4,642,513 | 4,642,513 | 18,788 | 3 | 8 | 28 min |
| | Newbler | - | 1 | 80 | 0 | 4,567,139 | 117,490 | 0 | - | - | - |
| | | AHA | 1 | 12 | 0 | 4,652,318 | 3,320,126 | 45,090 | 6 | 14 | 201 min |
| | | SSPACE LongRead | 1 | 2 | 0 | 4,635,316 | 3,716,545 | 7,793 | 7 | 10 | 32 min |
| E. coli O157:H7 | Ray | - | 10 | 144 | 1 | 5,432,073 | 112,112 | 0 | - | - | - |
| | | AHA | 10 | 110 | 1 | 5,475,255 | 227,802 | 34,035 | 1 | 2 | 226 min |
| | | SSPACE LongRead | 10 | 38 | 1 | 5,845,919 | 348,040 | 58,068 | 2 | 23 | 31 min |
| | CLC | - | 10 | 293 | 13 | 5,335,444 | 105,156 | 0 | - | - | - |
| | | AHA | 10 | 238 | 8 | 5,437,860 | 201,528 | 42,214 | 4 | 9 | 312 min |
| | | SSPACE LongRead | 10 | 33 | 2 | 5,539,369 | 1,172,184 | 51,676 | 13 | 17 | 32 min |
| | Newbler | - | 10 | 279 | 14 | 5,322,767 | 142,438 | 0 | - | - | - |
| | | AHA | 10 | 209 | 8 | 5,471,954 | 254,465 | 65,936 | 5 | 9 | 297 min |
| | | SSPACE LongRead | 10 | 39 | 3 | 5,565,065 | 703,452 | 75,126 | 11 | 34 | 37 min |
| F. tularensis | Ray | - | 3 | 100 | 0 | 1,806,660 | 25,623 | 0 | - | - | - |
| | | AHA | 3 | 38 | 0 | 1,859,591 | 82,151 | 47,651 | 1 | 5 | 95 min |
| | | SSPACE LongRead | 3 | 8 | 0 | 1,886,509 | 279,967 | 27,386 | 1 | 8 | 14 min |
| | CLC | - | 3 | 110 | 1 | 1,780,141 | 25,117 | 0 | - | - | - |
| | | AHA | 3 | 53 | 1 | 1,844,586 | 63,063 | 50,494 | 0 | 6 | 104 min |
| | | SSPACE LongRead | 3 | 7 | 1 | 1,877,533 | 444,696 | 19,639 | 2 | 6 | 18 min |
| | Newbler | - | 3 | 316 | 0 | 1,653,291 | 8,912 | 0 | - | - | - |
| | | AHA | 3 | 61 | 0 | 1,965,997 | 69,167 | 255,189 | 7 | 7 | 95 min |
| | | SSPACE LongRead | 3 | 7 | 0 | 1,867,474 | 480,062 | 160,504 | 16 | 13 | 14 min |
| M. haemolytica | Ray | - | Unknown | 80 | - | 2,639,260 | 75,015 | 0 | - | - | - |
| | | AHA | Unknown | 44 | - | 2,676,952 | 108,006 | 25,336 | - | - | 148 min |
| | | SSPACE LongRead | Unknown | 14 | - | 2,682,588 | 703,034 | 29,889 | - | - | 21 min |
| | CLC | - | Unknown | 129 | - | 2,630,768 | 63,442 | 0 | - | - | - |
| | | AHA | Unknown | 41 | - | 2,769,108 | 239,432 | 73,082 | - | - | 166 min |
| | | SSPACE LongRead | Unknown | 8 | - | 2,742,871 | 1,996,208 | 33,032 | - | - | 25 min |
| S. enterica | Ray | - | 4 | 119 | 2 | 4,972,739 | 90,542 | 0 | - | - | - |
| | | AHA | 4 | 40 | 2 | 5,012,323 | 203,631 | 34,496 | 0 | 4 | 190 min |
| | | SSPACE LongRead | 4 | 20 | 2 | 5,112,337 | 488,483 | 27,988 | 0 | 6 | 28 min |
| | CLC | - | 4 | 238 | 5 | 4,974,534 | 43,328 | 0 | - | - | - |
| | | AHA | 4 | 62 | 4 | 5,064,555 | 376,354 | 68,292 | 3 | 7 | 200 min |
| | | SSPACE LongRead | 4 | 7 | 3 | 5,038,082 | 3,235,544 | 21,588 | 6 | 2 | 34 min |
| | Newbler | - | 4 | 101 | 12 | 4,990,994 | 372,513 | 0 | - | - | - |
| | | AHA | 4 | 69 | 12 | 5,040,830 | 787,589 | 30,907 | 2 | 6 | 193 min |
| | | SSPACE LongRead | 4 | 4 | 12 | 5,036,244 | 3,729,047 | 10,430 | 3 | 11 | 29 min |
The platform/strategy that leads to the lowest number of assembled scaffolds is highlighted in italic-bold. The number of expected scaffolds refers to the number of chromosomes plus the number of plasmids present in the reference genome (if available). Generally, the combination of 1) draft assembly using CLCbio for Illumina MiSeq reads or Newbler for Roche 454 reads and 2) scaffolding using SSPACE-LongRead for PacBio CLR reads gives the best results in terms of closure and time. Notably, some draft assembly contigs are not covered by PacBio reads (such as PhiX control or bacterial host sequences). The number of errors introduced during scaffolding is limited, and these errors are often a consequence of true variation between the sequenced library and the previously deposited reference genome.
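
One way to read the "closure and time" comparison in the note is sketched below in Python, using only the scaffolded E. coli K12 rows from the table. This is an illustration, not part of SSPACE-LongRead: the `Row` class and the helper names are hypothetical, and treating closure as final scaffolds minus expected scaffolds (with runtime as a tie-breaker) is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One scaffolded assembly from the table (hypothetical helper type)."""
    assembler: str
    scaffolder: str
    expected: int      # chromosomes + plasmids in the reference genome
    final: int         # scaffolds remaining after scaffolding
    runtime_min: int   # scaffolder runtime in minutes


# E. coli K12 rows, copied from the table (scaffolded assemblies only).
rows = [
    Row("Ray", "AHA", 1, 57, 194),
    Row("Ray", "SSPACE LongRead", 1, 11, 28),
    Row("CLC", "AHA", 1, 57, 214),
    Row("CLC", "SSPACE LongRead", 1, 1, 28),
    Row("Newbler", "AHA", 1, 12, 201),
    Row("Newbler", "SSPACE LongRead", 1, 2, 32),
]


def excess_scaffolds(row: Row) -> int:
    """Assumed closure measure: scaffolds beyond the expected count."""
    return row.final - row.expected


def best_by_closure(candidates: list[Row]) -> Row:
    """Pick the row closest to full closure, breaking ties by runtime."""
    return min(candidates, key=lambda r: (excess_scaffolds(r), r.runtime_min))


best = best_by_closure(rows)
print(best.assembler, best.scaffolder, excess_scaffolds(best), best.runtime_min)
```

For these six rows the selection is the CLC draft assembly scaffolded with SSPACE LongRead, which reaches the expected single scaffold in 28 min. The same comparison cannot be made for organisms whose expected scaffold count is listed as Unknown.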