Table 5 Subset evaluation. Accuracies obtained by learning algorithms, run with the default parameters set by WEKA, on the best data subset found by combination (column 3) and by feature selection method (column 5)

From: A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

| Algorithms | Best combination subsets (from complete dataset) | Accuracy | Feature selection subsets | Accuracy |
|---|---|---|---|---|
| bayes_NaiveBayesUpdateable | 1,6,7,9 | 96.67 | Cfs 1,2,3,6,7,9 | 96.67 |
| functions_smo_npolyk | 1,2,4,6,7,9 | 98.00 | PCA 1,2,3,4,5,6,7,8 | 97.00 |
| rules_DecisionTable | 6,7,9 | 96.00 | Cfs 1,2,3,6,7,9 | 96.00 |
| functions_mlp | 1,2,4,6,7,9 | 98.33 | Cfs 1,2,3,6,7,9 | 96.67 |
| bayes_nbay | 1,6,7,9 | 96.67 | Cfs 1,2,3,6,7,9 | 96.67 |
| trees_j48 | 1,2,4,6,9 | 97.67 | PCA 1,2,3,4,5,6,7,8 | 97.00 |

  1. Column 1 lists the algorithms. Columns 2 and 4 list the best data subsets, and columns 3 and 5 list the corresponding accuracies. (1: Pfam; 2: Orthology; 3: Prot_interactions; 4: Best Blast hits; 5: Subcellular localization; 6: Functional linkages; 7: HPs linked to Pseudogenes; 8: Homology modelling; 9: HPs linked to ncRNAs). The accuracies of the two kinds of subset are almost the same, with the subset combinations from the complete dataset showing equal or slightly higher accuracy
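
The index coding in the footnote can be hard to read against the table, so the following is a minimal Python sketch that transcribes the table values, decodes each subset into feature names, and computes the accuracy gap between the two subset types. The names `FEATURES`, `ROWS`, `decode` and `accuracy_gaps` are illustrative assumptions and do not come from the paper; only the numbers and labels are taken from the table and its footnote.

```python
# Feature index -> name, as given in the table footnote.
FEATURES = {
    1: "Pfam",
    2: "Orthology",
    3: "Prot_interactions",
    4: "Best Blast hits",
    5: "Subcellular localization",
    6: "Functional linkages",
    7: "HPs linked to Pseudogenes",
    8: "Homology modelling",
    9: "HPs linked to ncRNAs",
}

# Each row: (algorithm, best combination subset, accuracy,
#            feature selection method, selected subset, accuracy).
# Values are transcribed from the table; the tuple layout is an assumption.
ROWS = [
    ("bayes_NaiveBayesUpdateable", (1, 6, 7, 9), 96.67, "Cfs", (1, 2, 3, 6, 7, 9), 96.67),
    ("functions_smo_npolyk", (1, 2, 4, 6, 7, 9), 98.00, "PCA", (1, 2, 3, 4, 5, 6, 7, 8), 97.00),
    ("rules_DecisionTable", (6, 7, 9), 96.00, "Cfs", (1, 2, 3, 6, 7, 9), 96.00),
    ("functions_mlp", (1, 2, 4, 6, 7, 9), 98.33, "Cfs", (1, 2, 3, 6, 7, 9), 96.67),
    ("bayes_nbay", (1, 6, 7, 9), 96.67, "Cfs", (1, 2, 3, 6, 7, 9), 96.67),
    ("trees_j48", (1, 2, 4, 6, 9), 97.67, "PCA", (1, 2, 3, 4, 5, 6, 7, 8), 97.00),
]


def decode(subset):
    """Translate a tuple of feature indices into feature names."""
    return [FEATURES[i] for i in subset]


def accuracy_gaps(rows):
    """Accuracy of the best combination subset minus that of the
    feature selection subset, per algorithm, rounded to two decimals."""
    return {row[0]: round(row[2] - row[5], 2) for row in rows}


if __name__ == "__main__":
    gaps = accuracy_gaps(ROWS)
    for name, combo, _, method, selected, _ in ROWS:
        print(name)
        print("  best combination:", ", ".join(decode(combo)))
        print(f"  {method}:", ", ".join(decode(selected)))
        print("  accuracy gap:", gaps[name])
```

A gap of zero means both subsets gave the same accuracy for that algorithm; a positive gap means the combination subset from the complete dataset scored higher.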