
Table 2. Comparison of assemblers on Staphylococcus aureus (SA), Rhodobacter sphaeroides (RS) and human chromosome 14 (HG)

From: Clover: a clustering-oriented de novo assembler for Illumina sequences

                       Contigs                                                       Scaffolds
Data (Mb)  Assembler      Num  N50 (kb)  E-size (kb)  Errs  N50C (kb)  E-sizeC (kb)     Num  N50 (kb)  E-size (kb)  Errs  N50C (kb)  E-sizeC (kb)

SA (2.9)   Clover         128      43.9         53.1    13       41.3          50.5      12      1490          947     2       1490           890
           ABySS           90     129.1        181.1    16       69.8         102.5      61       170          199     0        107           127
           Bambus2        109      50.2         69.1   178       16.7          19.5      17      1084         1120     0       1084          1120
           CABOG       Could not run because of incompatible read lengths in one library
           MSR-CA          94      59.2         60.4    22       49.2          51.4      17      2412         2026     1       1022          1039
           SGA           1252       4.0          4.7     3        4.0           4.7     546       208          166     2        208           164
           SOAPdenovo     107     288.2        252.3    58       62.7          67.5      99       332          302     0        288           227
           SPAdes          98      62.6         87.9     9       57.0          75.1      41      1703         1144     2        684           570
           Velvet         162      48.4         60.3    19       41.5          49.8      45       762          664    18        284           282

RS (4.6)   Clover         453      20.1         23.8    19       19.5          21.9      59      2483         1795     1       2483          1795
           ABySS          644      19.7         25.1    57       13.3          18.5     414        51           56     0         46            47
           Bambus2        177      93.2         94.5   360       12.8          16.3      92      2439         1375     1        390          1106
           CABOG          322      20.2         24.1    31       17.9          21.5     130        66          520     3         65           381
           MSR-CA         395      22.1         24.2    32       19.1          21.5      43      2976         2039     3       2976          2017
           SGA           3067       2.3          3.3     4        2.3           3.3    2096        51           53     0         51            53
           SOAPdenovo     204     131.7        157.2   401       14.6          18.7     166       660          688     0        660           559
           SPAdes         768      11.8         13.7     7       11.7          13.5     352       718          840     0        718           840
           Velvet         583      15.7         18.6    24       14.5          16.9     178       353          380    16        301           352

HG (88.3)  Clover      24,527       3.4          5.3   718        3.2           5.0    2089       839          943   385        409           502
           ABySS       21,222      14.7         19.0  1876       10.4          13.4  19,249        18           24    13         13            19
           Bambus2     13,592       5.9         23.3  8175        4.3           6.3    1792       324          528   240        200           274
           CABOG         3361      45.3         58.8  2346       23.7          30.6     479       393          549    39        309           457
           MSR-CA      30,103       4.9          6.8  1656        4.3           5.9    1425       893         1420  1430        282           407
           SGA         56,939       2.7          3.8   375        2.7           3.7  30,975        83          113    24         81           111
           SOAPdenovo  21,818      16.7         21.9  6587        7.8          10.4  13,502       454          533   384        227           276
           SPAdes      16,854      12.7         16.7  1519       10.4          13.6    9245       173          223   199        129           162
           Velvet      45,564       2.3          3.3  3665        2.1           3.0    3565      1190         1825  8659         86           124
1. Num: the number of sequences produced. N50: the N50 statistic calculated with respect to the genome size. E-size: the expected size of the sequence containing a randomly chosen base of the genome. Errs: the number of misjoins and, for contigs, also the number of indels longer than 5 bases. N50C: the N50 calculated after splitting all sequences at error locations. E-sizeC: the E-size calculated after splitting all sequences at error locations. In the published table, the best result in each column for each dataset is shown in bold.
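The metrics defined in the note above can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code used to build the table. It assumes that "N50 with respect to the genome size" means the length at which sequences sorted longest-first first cover half of the genome (commonly called NG50), takes E-size as the sum of squared sequence lengths divided by the genome size, and uses a hypothetical input format in which each sequence carries a list of error positions. The function names `ng50`, `e_size` and `split_at_errors` are likewise illustrative.

```python
from typing import List, Sequence, Tuple


def ng50(lengths: Sequence[int], genome_size: int) -> int:
    """N50 relative to the genome size (often called NG50).

    Returns the length L such that sequences of length >= L together cover
    at least half of genome_size; returns 0 if the whole assembly covers
    less than half of the genome.
    """
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


def e_size(lengths: Sequence[int], genome_size: int) -> float:
    """Expected length of the sequence containing a uniformly random genome base.

    A base falls in a sequence of length L with probability L / genome_size,
    so the expectation is sum(L * L) / genome_size. Genome bases not covered
    by any sequence contribute 0.
    """
    return sum(length * length for length in lengths) / genome_size


def split_at_errors(sequences: Sequence[Tuple[int, Sequence[int]]]) -> List[int]:
    """Split each (length, error_positions) pair at its error positions.

    error_positions is a hypothetical input format: 0-based offsets where a
    misjoin (or, for contigs, a large indel) was detected.
    """
    pieces: List[int] = []
    for length, positions in sequences:
        start = 0
        for pos in sorted(set(positions)):
            if start < pos < length:  # ignore breakpoints at or beyond the ends
                pieces.append(pos - start)
                start = pos
        pieces.append(length - start)
    return pieces


if __name__ == "__main__":
    genome = 100
    # Toy assembly: a 60-base contig with one misjoin at offset 25, plus two clean contigs.
    contigs = [(60, [25]), (30, []), (10, [])]
    raw = [length for length, _ in contigs]
    fixed = split_at_errors(contigs)
    print(ng50(raw, genome), e_size(raw, genome))      # 60 46.0
    print(ng50(fixed, genome), e_size(fixed, genome))  # 30 28.5
```

Splitting the misjoined contig lowers both statistics, which is the effect the N50C and E-sizeC columns capture: an assembler can report a long N50 yet lose much of it once sequences are broken at errors, as with SOAPdenovo on SA (contig N50 288.2 kb versus N50C 62.7 kb).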