Centroid based clustering of high throughput sequencing reads based on n-mer counts

Solovyov, Alexander; Lipkin, W Ian

doi:10.1186/1471-2105-14-268

BMC Bioinformatics

Table 8 Recall rates for simulated human reads (n = 3) at different error rates

Error rate	EM recall	EM std. dev.	k-means recall	k-means std. dev.	L₂ recall	L₂ std. dev.	d₂ recall	d₂ std. dev.
2 clusters
0	0.827	0.153	0.822	0.155	0.823	0.155	0.823	0.155
0.001	0.827	0.152	0.823	0.155	0.823	0.155	0.823	0.155
0.005	0.826	0.153	0.822	0.155	0.823	0.155	0.823	0.155
0.01	0.826	0.152	0.822	0.155	0.822	0.154	0.822	0.154
0.02	0.825	0.152	0.821	0.155	0.822	0.155	0.822	0.154
0.03	0.825	0.152	0.821	0.155	0.821	0.155	0.821	0.154
0.04	0.824	0.152	0.820	0.155	0.820	0.154	0.820	0.154
0.05	0.823	0.152	0.819	0.155	0.820	0.155	0.820	0.154
3 clusters
0	0.696	0.160	0.696	0.166	0.693	0.162	0.692	0.163
0.001	0.696	0.160	0.697	0.166	0.694	0.162	0.692	0.163
0.005	0.695	0.160	0.695	0.164	0.693	0.161	0.692	0.162
0.01	0.694	0.159	0.695	0.164	0.692	0.161	0.691	0.162
0.02	0.693	0.159	0.693	0.164	0.691	0.161	0.690	0.162
0.03	0.692	0.158	0.693	0.164	0.690	0.160	0.689	0.161
0.04	0.691	0.158	0.691	0.163	0.688	0.160	0.687	0.161
0.05	0.690	0.157	0.690	0.162	0.687	0.158	0.686	0.159
4 clusters
0	0.623	0.153	0.626	0.158	0.644	0.159	0.640	0.159
0.001	0.624	0.153	0.625	0.158	0.644	0.158	0.640	0.158
0.005	0.622	0.153	0.625	0.157	0.644	0.158	0.639	0.158
0.01	0.621	0.152	0.623	0.156	0.642	0.157	0.637	0.157
0.02	0.619	0.151	0.622	0.155	0.638	0.156	0.635	0.156
0.03	0.617	0.149	0.618	0.153	0.636	0.154	0.632	0.154
0.04	0.615	0.149	0.615	0.152	0.632	0.153	0.628	0.153
0.05	0.613	0.148	0.613	0.152	0.629	0.152	0.625	0.151
5 clusters
0	0.573	0.155	0.581	0.158	0.584	0.157	0.578	0.158
0.001	0.574	0.155	0.582	0.158	0.584	0.157	0.578	0.158
0.005	0.573	0.154	0.581	0.156	0.583	0.156	0.578	0.157
0.01	0.572	0.153	0.580	0.156	0.582	0.155	0.577	0.156
0.02	0.570	0.152	0.578	0.156	0.580	0.155	0.575	0.155
0.03	0.568	0.150	0.576	0.154	0.578	0.153	0.573	0.154
0.04	0.565	0.149	0.572	0.151	0.575	0.152	0.569	0.152
0.05	0.563	0.148	0.571	0.151	0.574	0.151	0.568	0.151

Mean recall rates and standard deviations for various error rates and numbers of clusters. For every value of the error rate, clustering was performed on 50 simulated read sets, each originating from 1000 randomly chosen human RNA reference sequences and consisting of 100,000 reads. Word length is n = 3. Read length is 200 bp.
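The caption varies the error rate while keeping the word length (n = 3) and read length (200 bp) fixed. As a rough illustration of how sequencing errors enter the word-count representation, the sketch below counts overlapping 3-mers and injects errors into a read. It assumes a substitution-only error model with uniformly chosen replacement bases; the article's simulator may also introduce insertions or deletions, and the function names `nmer_counts` and `add_substitution_errors` are hypothetical.

```python
import random
from itertools import product

ALPHABET = "ACGT"


def nmer_counts(read: str, n: int = 3) -> list[int]:
    """Count the overlapping n-mers of `read` into a vector of length 4**n.

    Words are indexed in lexicographic order (AAA, AAC, ..., TTT for n = 3).
    Windows containing a character outside ACGT are skipped.
    """
    index = {"".join(w): i for i, w in enumerate(product(ALPHABET, repeat=n))}
    counts = [0] * len(index)
    for start in range(len(read) - n + 1):
        i = index.get(read[start:start + n])
        if i is not None:
            counts[i] += 1
    return counts


def add_substitution_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Hypothetical error model (an assumption, not the article's): each base
    is independently replaced by a different base with probability
    `error_rate`. Insertions and deletions are not simulated."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)
read = "".join(rng.choice(ALPHABET) for _ in range(200))  # 200 bp read
noisy = add_substitution_errors(read, 0.05, rng)           # error rate 0.05
clean, dirty = nmer_counts(read), nmer_counts(noisy)
print(sum(clean))  # 198 overlapping 3-mers in a 200 bp read
# L1 difference between the two count vectors; each substitution can move
# at most 3 words (one per overlapping window) from one bin to another.
print(sum(abs(a - b) for a, b in zip(clean, dirty)))
```

A single substitution changes at most n = 3 overlapping words, so at an error rate of 0.05 a 200 bp read carries about 10 substitutions on average, and on average at most about 30 of its 198 3-mers are affected. This may be part of why the mean recall values in the table change by at most 0.015 between error rates 0 and 0.05.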

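The L₂ and d₂ columns presumably name the distance used to compare a read's n-mer count vector with the cluster centroids. A minimal sketch of both distances and of the nearest-centroid assignment step follows. It assumes the d₂ form ½(1 − cos θ) used in the alignment-free comparison literature; the article's exact definitions (for example, whether counts are normalized first) may differ, and the function names are hypothetical.

```python
import math


def l2_distance(x, y):
    """Euclidean (L2) distance between two n-mer count (or frequency) vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d2_distance(x, y):
    """Assumed form of the d2 dissimilarity: (1 - cos(x, y)) / 2, where
    cos(x, y) is the cosine of the angle between the two vectors.
    For non-negative, non-zero count vectors the value lies in [0, 0.5]
    and is 0 when x and y are proportional."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 0.5 * (1.0 - dot / norm)


def assign_to_centroids(vectors, centroids, distance):
    """Hard assignment step of a centroid-based method: for each vector,
    return the index of the nearest centroid under `distance`."""
    return [min(range(len(centroids)), key=lambda k: distance(v, centroids[k]))
            for v in vectors]


# Toy 3-dimensional "count vectors" (real 3-mer vectors have 64 entries).
vectors = [[5, 0, 1], [0, 4, 1], [10, 1, 2]]
centroids = [[8, 0, 2], [0, 8, 2]]
print(assign_to_centroids(vectors, centroids, l2_distance))  # [0, 1, 0]
print(assign_to_centroids(vectors, centroids, d2_distance))  # [0, 1, 0]
```

In a k-means-style loop this assignment alternates with recomputing each centroid from its assigned vectors; the EM column presumably corresponds to a soft variant in which each read is attributed to every cluster with some probability rather than to a single nearest centroid.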

ISSN: 1471-2105