Table 1 Datasets statistics

From: From sequence to enzyme mechanism using multi-label machine learning

| Dataset | Instances | Attributes | Class labels |
| --- | --- | --- | --- |
| Mechanism set with CSA | 248 | 134 | 82 |
| Mechanism set with Maximum sequence identity | 248 | 82 | 82 |
| Mechanism set with Minimum Euclidean distance (InterPro) | 248 | 82 | 82 |
| Mechanism set with InterPro + CSA | 248 | 456 | 82 |
| Mechanism set with Max seq. Id. + min Eucl. Dist. (InterPro) | 248 | 162 | 82 |
| Mechanism set with all InterPro sub-signature matches | 248 | 743 | 82 |
| Mechanism set with InterPro signatures | 248 | 322 | 82 |
| Negative set with InterPro attributes | 290 | 917 | 290 |
| Mechanism set + Swiss-Prot non-EC with InterPro attributes | 35,171 | 4,418 | 82 |
| Swiss-Prot non-EC set with InterPro attributes | 68,667 (226,213) | 4,825 | 0 |

  1. The table gives the number of instances (proteins), attributes (signatures or sequence identity values) and class labels (mechanisms) for each dataset used in this work. For the Swiss-Prot non-EC set, the instance count is the number of proteins that need prediction (those sharing a signature with the mechanism set); the total number of instances is given in parentheses.
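
The selection rule described in the footnote (keep only proteins that share a signature with the mechanism set) can be illustrated with a minimal sketch. It assumes each protein is represented as a set of signature identifiers. The function name `needs_prediction`, the dictionary layout and all identifiers are hypothetical placeholders, not the authors' code or data.

```python
def needs_prediction(candidates, mechanism_signatures):
    """Return the IDs of candidate proteins that share at least one
    signature with the mechanism set (hypothetical helper)."""
    return sorted(
        protein_id
        for protein_id, signatures in candidates.items()
        if signatures & mechanism_signatures  # non-empty intersection
    )


# Toy data: placeholder identifiers, not values from the paper.
mechanism_set = {
    "M1": {"sig_a", "sig_b"},
    "M2": {"sig_c"},
}
non_ec_set = {
    "P1": {"sig_b", "sig_x"},
    "P2": {"sig_y"},
    "P3": {"sig_c"},
}

# Pool every signature that occurs anywhere in the mechanism set.
mechanism_signatures = set().union(*mechanism_set.values())

selected = needs_prediction(non_ec_set, mechanism_signatures)

# Report in the table's style: count needing prediction, total in parentheses.
print(f"{len(selected)} ({len(non_ec_set)})")
```

In this toy case P1 and P3 are kept because each shares a signature with the mechanism set, while P2 is dropped, which mirrors how the last row reports a subset count with the full count in parentheses.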