Table 2 Statistical information on the 2007 datasets

From: Machine learning for discovering missing or wrong protein function annotations

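| Dataset   | #Features | #Train | #Valid | #Test | #FunCat 2007 | #GO 2007 |
|-----------|-----------|--------|--------|-------|--------------|----------|
| Cellcycle | 77        | 1628   | 848    | 1281  | 499          | 4122     |
| Church    | 27        | 1630   | 844    | 1281  | 499          | 4122     |
| Derisi    | 63        | 1608   | 842    | 1275  | 499          | 4116     |
| Eisen     | 79        | 1058   | 529    | 837   | 461          | 3570     |
| Expr      | 551       | 1639   | 849    | 1291  | 499          | 4128     |
| Gasch1    | 173       | 1634   | 846    | 1284  | 499          | 4122     |
| Gasch2    | 52        | 1639   | 849    | 1291  | 499          | 4128     |
| Hom       | 47034     | 1669   | 870    | 1315  | 499          | 5828     |
| Pheno     | 69        | 656    | 353    | 582   | 455          | 3124     |
| Seq       | 478       | 1701   | 879    | 1339  | 499          | 4130     |
| Spo       | 80        | 1600   | 837    | 1266  | 499          | 4116     |
| Struc     | 19628     | 1665   | 860    | 1313  | 499          | 5838     |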