| Org 1 | Org 2 | Frag L | Frag N | D_3 | LikelyBin accuracy | CB seeds | CompostBin accuracy |
|---|---|---|---|---|---|---|---|
| S. meliloti | A. aurescens | 400 | 500 | 1.02 | 0.94 | 10, 25 | 0.93, 0.93 |
| L. lactis | F. tularensis | 400 | 500 | 1.15 | 0.92 | 10, 25 | 0.76, 0.12* |
| S. pneumoniae | H. pylori | 400 | 500 | 0.97 | 0.96 | 10, 25 | 0.12*, 0.96 |
| P. marinus | S. aureus | 400 | 500 | 0.99 | 0.93 | 10, 25 | 0.73, 0.83 |
| M. jannaschii | S. aureus | 400 | 500 | 0.92 | 0.94 | 10, 25 | 0.17*, 0.91 |
- Frag L, fragment length; Frag N, number of fragments per source; CB seeds, number of labeled fragments supplied to CompostBin for training. LikelyBin consistently matched or exceeded CompostBin's accuracy despite being completely unsupervised, whereas CompostBin required a fraction of the input fragments to be labeled to seed its clustering algorithm. We supplied training fragments to CompostBin without regard to their origin (protein- or RNA-coding). In a likely practical scenario, only 16S RNA-coding fragments would be labeled, and these would have different k-mer distributions from protein-coding regions, possibly confounding classification. (*) CompostBin did not converge toward a good clustering for these datasets; accuracy can fall below 50% because of the labeled input.