
Table 4 Performance comparison of LikelyBin and CompostBin on the pairs of genomes analyzed in Figures 5 and 6 and Table 2.

From: Unsupervised statistical clustering of environmental shotgun sequences

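| Org 1 | Org 2 | Frag L | Frag N | D_3 | LikelyBin accuracy | CB seeds | CompostBin accuracy |
|---|---|---|---|---|---|---|---|
| S. meliloti | A. aurescens | 400 | 500 | 1.02 | 0.94 | 10 | 0.93 |
| | | | | | | 25 | 0.93 |
| L. lactis | F. tularensis | 400 | 500 | 1.15 | 0.92 | 10 | 0.76 |
| | | | | | | 25 | 0.12* |
| S. pneumoniae | H. pylori | 400 | 500 | 0.97 | 0.96 | 10 | 0.12* |
| | | | | | | 25 | 0.96 |
| P. marinus | S. aureus | 400 | 500 | 0.99 | 0.93 | 10 | 0.73 |
| | | | | | | 25 | 0.83 |
| M. jannaschii | S. aureus | 400 | 500 | 0.92 | 0.94 | 10 | 0.17* |
| | | | | | | 25 | 0.91 |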
  1. Frag L, fragment length; Frag N, number of fragments per source; CB seeds, number of labeled fragments supplied to CompostBin for training. LikelyBin matched or exceeded CompostBin's accuracy on every pair despite being completely unsupervised, whereas CompostBin required a fraction of the input fragments to be labeled to seed its clustering algorithm. We supplied training fragments to CompostBin without regard to their origin (protein- or RNA-coding). In a likely practical scenario, only 16S RNA-coding fragments would be labeled, and these would have different k-mer distributions from protein-coding regions, possibly confounding classification. (*) Convergence toward a good clustering was not observed in CompostBin for these datasets; because the labeled input ties each cluster to a specific organism, accuracy can fall below 50%.
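
The note that accuracy can be less than 50% may be easier to see with a small sketch. It is an illustration only: the scoring functions, names and toy labels below are assumptions, not the evaluation code used for this table. It contrasts accuracy scored against fixed labels (as when labeled seeds tie each cluster to an organism) with accuracy scored under the most favourable cluster-to-organism mapping (a common way to score unsupervised output, which for two clusters cannot drop below 0.5).

```python
from itertools import permutations


def fixed_label_accuracy(true_labels, predicted_labels):
    """Fraction of fragments whose predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def best_mapping_accuracy(true_labels, cluster_ids):
    """Accuracy under the most favourable one-to-one cluster-to-source mapping.

    Hypothetical helper: assumes there are no more clusters than sources.
    """
    sources = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0.0
    for assignment in permutations(sources, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        relabeled = [mapping[c] for c in cluster_ids]
        best = max(best, fixed_label_accuracy(true_labels, relabeled))
    return best


# Toy data (invented): 5 fragments from source "A", then 5 from source "B".
true_labels = ["A"] * 5 + ["B"] * 5
# A clustering that separates the sources fairly well but with labels swapped.
predicted = ["B", "B", "B", "B", "A", "A", "A", "A", "A", "B"]

fixed = fixed_label_accuracy(true_labels, predicted)
best = best_mapping_accuracy(true_labels, predicted)
print(f"fixed-label accuracy:  {fixed:.2f}")
print(f"best-mapping accuracy: {best:.2f}")
```

On this toy input the fixed-label score is 0.20 while the best-mapping score is 0.80. Under the assumptions above, this is the sense in which a seeded method can score below chance on a two-organism dataset.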