Table 4 Recall rates for simulated human reads of different lengths and various distance functions, n = 2

From: Centroid based clustering of high throughput sequencing reads based on n-mer counts

 

             EM                  k-means             d2*                 χ2                  Symmetrized KL
Read length  Recall  Std. dev.   Recall  Std. dev.   Recall  Std. dev.   Recall  Std. dev.   Recall  Std. dev.

2 clusters
30           0.737   0.133       0.735   0.136       0.610   0.083       0.737   0.140       0.736   0.134
50           0.762   0.141       0.760   0.143       0.649   0.105       0.760   0.144       0.762   0.141
75           0.781   0.145       0.778   0.147       0.677   0.122       0.778   0.148       0.781   0.145
100          0.794   0.148       0.791   0.150       0.719   0.131       0.791   0.150       0.794   0.148
150          0.812   0.152       0.810   0.153       0.803   0.147       0.810   0.154       0.812   0.152
200          0.827   0.153       0.825   0.155       0.824   0.151       0.824   0.155       0.826   0.153
250          0.839   0.153       0.837   0.155       0.838   0.151       0.837   0.156       0.839   0.153
300          0.850   0.153       0.848   0.155       0.850   0.152       0.847   0.156       0.850   0.153
400          0.867   0.152       0.866   0.154       0.869   0.152       0.866   0.154       0.867   0.152

3 clusters
30           0.573   0.110       0.573   0.108       0.447   0.076       0.715   0.131       0.572   0.111
50           0.604   0.124       0.603   0.126       0.474   0.090       0.674   0.134       0.603   0.125
75           0.629   0.135       0.629   0.138       0.626   0.139       0.664   0.144       0.629   0.136
100          0.647   0.142       0.647   0.146       0.671   0.148       0.668   0.150       0.647   0.143
150          0.675   0.153       0.675   0.156       0.724   0.157       0.687   0.159       0.675   0.153
200          0.696   0.160       0.696   0.164       0.692   0.161       0.706   0.167       0.696   0.160
250          0.714   0.166       0.714   0.170       0.714   0.166       0.723   0.172       0.714   0.166
300          0.730   0.171       0.730   0.173       0.730   0.170       0.738   0.176       0.730   0.170
400          0.756   0.177       0.757   0.179       0.757   0.176       0.762   0.180       0.756   0.176

Mean recall rates and standard deviations for various read lengths and 2 or 3 clusters. For every read length, clustering was performed on 50 simulated read sets, each originating from 1000 randomly chosen human RNA reference sequences and containing 100000 reads. Clustering was performed using all distance functions considered in the paper, including those that do not guarantee convergence. Results for the L2 and d2 distances are not shown. Word length is n = 2.
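To make the setup behind the table more concrete, the following minimal sketch shows how a read could be turned into a vector of n-mer counts for word length n = 2, and how a symmetrized Kullback-Leibler divergence between two such vectors could be evaluated. It is an illustration only, not the paper's implementation: the function names `nmer_counts`, `frequencies`, and `symmetrized_kl` are hypothetical, the pseudocount used to avoid zero frequencies is an assumption, and the paper's exact definition of the symmetrized KL distance (for example its normalization) may differ from the plain sum KL(p||q) + KL(q||p) used here.

```python
from itertools import product
from math import log

ALPHABET = "ACGT"


def nmer_counts(read, n=2):
    """Count overlapping n-mers in a read, in lexicographic word order."""
    words = ["".join(w) for w in product(ALPHABET, repeat=n)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(read) - n + 1):
        j = index.get(read[i:i + n])
        if j is not None:  # skip words containing ambiguous bases such as N
            counts[j] += 1
    return counts


def frequencies(counts, pseudocount=1.0):
    """Convert counts to frequencies; the pseudocount is an assumption
    made here so that no frequency is zero."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def symmetrized_kl(p, q):
    """KL(p||q) + KL(q||p), written as a single sum over words."""
    return sum((pi - qi) * log(pi / qi) for pi, qi in zip(p, q))


# Two toy reads; real reads in the table are 30 to 400 bases long.
p = frequencies(nmer_counts("ACGTACGTACGT"))
q = frequencies(nmer_counts("AAAATTTTAAAA"))
distance = symmetrized_kl(p, q)
```

With n = 2 each read is represented by 4^2 = 16 counts, and a read of length 30 contributes only 29 overlapping 2-mers to those counts.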