
Table 4. Performance comparison of LikelyBin and CompostBin on the pairs of genomes analyzed in Figures 5 and 6 and Table 2.

From: Unsupervised statistical clustering of environmental shotgun sequences

| Org 1 | Org 2 | Frag L | Frag N | D3 | LikelyBin accuracy | CB seeds | CompostBin accuracy |
|---|---|---|---|---|---|---|---|
| S. meliloti | A. aurescens | 400 | 500 | 1.02 | 0.94 | 10 / 25 | 0.93 / 0.93 |
| L. lactis | F. tularensis | 400 | 500 | 1.15 | 0.92 | 10 / 25 | 0.76 / 0.12* |
| S. pneumoniae | H. pylori | 400 | 500 | 0.97 | 0.96 | 10 / 25 | 0.12* / 0.96 |
| P. marinus | S. aureus | 400 | 500 | 0.99 | 0.93 | 10 / 25 | 0.73 / 0.83 |
| M. jannaschii | S. aureus | 400 | 500 | 0.92 | 0.94 | 10 / 25 | 0.17* / 0.91 |

  1. Frag L, fragment length; Frag N, number of fragments per source; CB seeds, number of labeled fragments supplied to CompostBin for training. LikelyBin consistently performed as well as or better than CompostBin despite being completely unsupervised, whereas CompostBin required a fraction of the input fragments to be labeled to seed its clustering algorithm. We supplied training fragments to CompostBin without regard to their origin (protein- or RNA-coding). In a likely practical scenario, only 16S RNA-coding fragments would be labeled, and these would have different k-mer distributions from protein-coding regions, possibly confounding classification. (*) CompostBin did not converge toward a good clustering for these datasets; accuracy can fall below 50% because of the labeled input.
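
The remark that accuracy can fall below 50% when labels are supplied can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes accuracy is the fraction of fragments assigned to their true source, and the function names and example labels are hypothetical. An unsupervised clustering is free to match cluster ids to sources in the most favourable way, so a balanced two-source dataset scores at least 0.5, whereas seed labels fix which cluster counts as which source, so a mislabeled split can score lower.

```python
from itertools import permutations


def fixed_label_accuracy(true_labels, predicted_labels):
    """Fraction of fragments whose predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def best_permutation_accuracy(true_labels, cluster_ids):
    """Accuracy under the most favourable mapping of cluster ids to sources.

    Assumes there are no more clusters than sources (toy setting only).
    """
    sources = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0.0
    for perm in permutations(sources, len(clusters)):
        mapping = dict(zip(clusters, perm))
        relabeled = [mapping[c] for c in cluster_ids]
        best = max(best, fixed_label_accuracy(true_labels, relabeled))
    return best


# Hypothetical example: ten fragments from two sources, "A" and "B".
true_labels = ["A"] * 5 + ["B"] * 5
# A clustering that separates the sources fairly well but with swapped names.
seeded_prediction = ["B", "B", "B", "B", "A", "A", "A", "A", "A", "B"]

seeded_score = fixed_label_accuracy(true_labels, seeded_prediction)
unsupervised_score = best_permutation_accuracy(true_labels, seeded_prediction)
print(seeded_score, unsupervised_score)
```

Under these assumptions the same partition scores below 0.5 when the labels are taken at face value and above 0.5 when the cluster names are matched after the fact, which is one way to read the starred entries in the table.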