Table 1 Datasets statistics

From: From sequence to enzyme mechanism using multi-label machine learning

| Dataset | Instances | Attributes | Class labels |
|---|---|---|---|
| Mechanism set with CSA | 248 | 134 | 82 |
| Mechanism set with Maximum sequence identity | 248 | 82 | 82 |
| Mechanism set with Minimum Euclidean distance (InterPro) | 248 | 82 | 82 |
| Mechanism set with InterPro + CSA | 248 | 456 | 82 |
| Mechanism set with Max seq. Id. + min Eucl. Dist. (InterPro) | 248 | 162 | 82 |
| Mechanism set with all InterPro sub-signature matches | 248 | 743 | 82 |
| Mechanism set with InterPro signatures | 248 | 322 | 82 |
| Negative set with InterPro attributes | 290 | 917 | 290 |
| Mechanism set + Swiss-Prot non-EC with InterPro attributes | 35,171 | 4,418 | 82 |
| Swiss-Prot non-EC set with InterPro attributes | 68,667 (226,213) | 4,825 | 0 |
  1. The table presents the number of instances (proteins), attributes (signatures or sequence identity values) and class labels (mechanisms) for the datasets used in this work. For the Swiss-Prot non-EC set, the instance count refers to the proteins that need prediction (those sharing at least one signature with the mechanism set); the total number of instances in the set is shown in parentheses.
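
The selection rule in the note above (keep only the proteins that share a signature with the mechanism set) can be illustrated with a minimal sketch. The function name `needs_prediction`, the protein identifiers and the signature identifiers are hypothetical toy values chosen for illustration; they are not taken from the paper's data or code.

```python
def needs_prediction(candidates, mechanism_signatures):
    """Return IDs of candidate proteins sharing at least one signature
    with the mechanism set (toy illustration of the footnote's rule)."""
    return [pid for pid, sigs in candidates.items() if sigs & mechanism_signatures]


# Hypothetical toy data: protein ID -> set of signature IDs (not real annotations).
mechanism_set = {
    "P1": {"IPR000001", "IPR000002"},
    "P2": {"IPR000003"},
}
candidates = {
    "Q1": {"IPR000002", "IPR000099"},  # shares IPR000002 with P1
    "Q2": {"IPR000098"},               # shares nothing
    "Q3": {"IPR000003"},               # shares IPR000003 with P2
}

# Pool every signature seen in the mechanism set.
mechanism_signatures = set().union(*mechanism_set.values())

selected = needs_prediction(candidates, mechanism_signatures)

# Report in the same "needing prediction (total)" form used in the table.
print(f"{len(selected)} ({len(candidates)})")
```

On this toy input, two of the three candidates are selected, which mirrors how the table reports 68,667 proteins needing prediction out of 226,213 in total.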