
Table 1 Data sets included in database 17DataSets

From: Cluster oligonucleotide signatures for rapid identification by sequencing

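| Data set | N | S | i | n | \(\bar{L} \pm \sigma(L)\) | n (signable) | c | c/n | n (signable)/n | \(s_0\) | \(s_s\) | \(s_n\) | \(s_c\) | \(\delta_c\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anisogramma | 15248 | 28 | 26 | 54 | 545 ± 94 | 33 | 139 | 2.6 | 61% | 1 (4%) | 24 (86%) | 27 (96%) | 28 (100%) | 4% |
| Pectobacterium | 72624 | 37 | 42 | 79 | 1671 ± 290 | 43 | 258 | 3.3 | 54% | - | 25 (68%) | 28 (76%) | 35 (95%) | 19% |
| Ceratorhiza | 24645 | 37 | 35 | 72 | 647 ± 60 | 36 | 137 | 1.9 | 50% | 7 (19%) | 24 (65%) | 25 (68%) | 34 (92%) | 24% |
| Coniella | 23078 | 48 | 46 | 94 | 481 ± 64 | 45 | 143 | 1.5 | 48% | 7 (15%) | 32 (67%) | 37 (77%) | 48 (100%) | 23% |
| Talaromyces | 54964 | 88 | 86 | 174 | 625 ± 220 | 126 | 626 | 3.6 | 72% | - | 87 (99%) | 88 (100%) | 88 (100%) | - |
| Elsinoe | 79740 | 132 | 63 | 195 | 586 ± 146 | 54 | 199 | 1.0 | 28% | 1 (1%) | 37 (28%) | 40 (30%) | 43 (33%) | 2% |
| Claviceps | 77453 | 140 | 139 | 279 | 553 ± 45 | 92 | 376 | 1.3 | 33% | 16 (11%) | 58 (41%) | 63 (45%) | 82 (59%) | 14% |
| Ceratocystis | 112291 | 193 | 179 | 372 | 582 ± 205 | 115 | 631 | 1.7 | 31% | 52 (27%) | 74 (38%) | 82 (42%) | 149 (77%) | 35% |
| Phytophthora | 201815 | 253 | 238 | 491 | 798 ± 24 | 319 | 1103 | 2.2 | 65% | - | 149 (59%) | 166 (66%) | 184 (73%) | 7% |
| Diaporthe | 213202 | 399 | 338 | 737 | 530 ± 99 | 196 | 1008 | 1.4 | 27% | 149 (37%) | 140 (35%) | 150 (38%) | 266 (67%) | 29% |
| Peronospora | 428994 | 513 | 400 | 913 | 824 ± 377 | 349 | 1984 | 2.2 | 38% | 64 (12%) | 200 (39%) | 222 (43%) | 310 (60%) | 17% |
| Alternaria | 280418 | 551 | 550 | 1101 | 509 ± 11 | 187 | 734 | 0.7 | 17% | - | 78 (14%) | 86 (16%) | 101 (18%) | 3% |
| Aspergillus | 547127 | 1032 | 1032 | 2064 | 530 ± 39 | 591 | 2331 | 1.1 | 29% | 19 (2%) | 285 (28%) | 313 (30%) | 414 (40%) | 10% |
| Colletotrichum | 691867 | 1198 | 918 | 2116 | 576 ± 297 | 477 | 2010 | 0.9 | 23% | 562 (47%) | 379 (32%) | 397 (33%) | 667 (56%) | 23% |
| Tilletia | 743335 | 1200 | 915 | 2115 | 618 ± 259 | 574 | 2666 | 1.3 | 27% | 394 (33%) | 376 (31%) | 403 (34%) | 649 (54%) | 20% |
| Penicillium | 743954 | 1438 | 1437 | 2875 | 517 ± 12 | 597 | 2675 | 0.9 | 21% | 57 (4%) | 310 (22%) | 325 (23%) | 413 (29%) | 6% |
| Fusarium | 1604775 | 2946 | 2261 | 5207 | 533 ± 133 | 1165 | 4417 | 0.8 | 22% | 1492 (51%) | 969 (33%) | 1001 (34%) | 1778 (60%) | 26% |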
  1. N: size of data set (nucleotides); S: number of sequences (excluding sequences with more than 5 ambiguous bases); i: number of internal clades in the phylogenetic tree; n: total number of phylogenetic clades, n = S + i; \(\bar{L}\): average length of sequences in the data set (rounded to the closest integer); σ(L): corrected sample standard deviation of the sequence length (rounded to the closest integer); n (signable; the second n column): number of signable clades; c: number of clusters (λ = 36) identified by aodp; c/n: ratio between clusters and phylogenetic clades; n (signable)/n: ratio between signable clades and phylogenetic clades; \(s_0\): number of sequences that are not included in any signable clade; \(s_s\): signable sequences (also unique signable sequence patterns); \(s_n\): unique signable clade patterns; \(s_c\): unique cluster patterns; \(\delta_c = s_c - s_n\): discrimination increase attributable to clusters (difference between unique cluster patterns and unique signable clade patterns).
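The derived columns can be recomputed from the raw counts as a consistency check. The following is a minimal sketch, not part of aodp: the `Row` class and its field names are hypothetical, and it assumes that the percentage reported for δc is the count difference between unique cluster patterns and unique signable clade patterns divided by S, which is an inference from the tabulated values rather than something the legend states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Raw counts for one data set (hypothetical container, names are illustrative)."""
    name: str
    S: int         # number of sequences
    i: int         # internal clades in the phylogenetic tree
    signable: int  # signable clades
    c: int         # clusters identified by aodp
    s_n: int       # unique signable clade patterns
    s_c: int       # unique cluster patterns

    @property
    def n(self) -> int:
        # Total number of phylogenetic clades: n = S + i
        return self.S + self.i

    def derived(self) -> dict:
        return {
            "n": self.n,
            # Clusters per phylogenetic clade, one decimal as in the table
            "c/n": round(self.c / self.n, 1),
            # Share of phylogenetic clades that are signable, in percent
            "signable/n (%)": round(100 * self.signable / self.n),
            # ASSUMPTION: delta_c is reported as (s_c - s_n) / S, in percent
            "delta_c (%)": round(100 * (self.s_c - self.s_n) / self.S),
        }


rows = [
    Row("Anisogramma", S=28, i=26, signable=33, c=139, s_n=27, s_c=28),
    Row("Fusarium", S=2946, i=2261, signable=1165, c=4417, s_n=1001, s_c=1778),
]

for r in rows:
    print(r.name, r.derived())
```

For these two rows the recomputed values agree with the tabulated ones (Anisogramma: n = 54, c/n = 2.6, 61%, δc = 4%; Fusarium: n = 5207, c/n = 0.8, 22%, δc = 26%).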