
Table 8. Number of neighbors (mean/median/maximum) and number of observed features, with and without clustering, for the remote fold recognition task

From: Efficient use of unlabeled data for protein sequence classification: a comparative study

Method                    Without clustering              With clustering
                          # neighbors   # features        # neighbors   # features
full seq.                 135/99/490    192,378,952       64/41/356     120,990,413
region                    64/41/356     34,807,209        50/26/352     28,738,521
no tails (full seq.)      75/17/402     57,575,176        23/11/325     29,649,870
max. length (full seq.)   70/16/431     39,915,003        22/12/279     14,634,511
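The relative effect of clustering on feature counts can be read off the table with simple arithmetic. The Python sketch below is illustrative only and is not part of the original study. It copies the feature counts from Table 8 and computes the percentage reduction from "without clustering" to "with clustering" for each method. The names `FEATURES`, `parse_stats` and `percent_reduction` are hypothetical.

```python
# Feature counts copied from Table 8: (without clustering, with clustering)
FEATURES = {
    "full seq.": (192_378_952, 120_990_413),
    "region": (34_807_209, 28_738_521),
    "no tails (full seq.)": (57_575_176, 29_649_870),
    "max. length (full seq.)": (39_915_003, 14_634_511),
}


def parse_stats(cell: str) -> tuple:
    """Split a 'mean/median/maximum' neighbor cell into three integers."""
    mean, median, maximum = (int(x) for x in cell.split("/"))
    return mean, median, maximum


def percent_reduction(before: int, after: int) -> float:
    """Percentage of features removed when clustering is applied."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for method, (before, after) in FEATURES.items():
        print(f"{method:<25} {percent_reduction(before, after):5.1f}% fewer features")
```

Under this calculation, clustering removes about 37% of the features for the full-sequence method and about 63% for the max.-length variant. The `parse_stats` helper splits a neighbor cell such as `135/99/490` into its mean, median and maximum, so the same comparison can be applied to the neighbor columns.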