Table 2. Summary of dataset statistics, including the sizes of the training, testing, and independent evaluation sets and the average sequence length.

From: Efficacy of different protein descriptors in predicting protein functional families

             Total           Training        Testing         Independent testing    Average sequence size
Dataset      P      N        P      N        P      N        P      N
EC2.4        3304   14373    1382   5068     1022   5859     900    3446            460
GPCR         2819   21515    1580   7389     717    7333     522    6793            498
TC8.A        229    23096    94     7962     72     7962     63     7172            483
Chlorophyll  999    22997    356    7928     333    7928     310    7141            480
Lipid        2192   11537    850    5779     707    4483     635    1275            312
rRNA         5855   13770    2004   5246     1940   4953     1911   3571            376
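
The Total columns in Table 2 can be cross-checked against the per-split counts with a few lines of arithmetic. The following is a minimal sketch, not part of the original article. It assumes that P and N denote positive and negative samples, and the names `SPLITS`, `REPORTED_TOTALS`, `summed_totals`, and `positive_fraction` are hypothetical helpers introduced only for illustration. The numbers are copied from the table.

```python
# Per-split (P, N) counts copied from Table 2, in the order:
# training, testing, independent testing.
# Assumption: P = positive samples, N = negative samples.
SPLITS = {
    "EC2.4":       ((1382, 5068), (1022, 5859), (900, 3446)),
    "GPCR":        ((1580, 7389), (717, 7333), (522, 6793)),
    "TC8.A":       ((94, 7962), (72, 7962), (63, 7172)),
    "Chlorophyll": ((356, 7928), (333, 7928), (310, 7141)),
    "Lipid":       ((850, 5779), (707, 4483), (635, 1275)),
    "rRNA":        ((2004, 5246), (1940, 4953), (1911, 3571)),
}

# (P, N) values from the Total columns of Table 2.
REPORTED_TOTALS = {
    "EC2.4":       (3304, 14373),
    "GPCR":        (2819, 21515),
    "TC8.A":       (229, 23096),
    "Chlorophyll": (999, 22997),
    "Lipid":       (2192, 11537),
    "rRNA":        (5855, 13770),
}


def summed_totals(splits):
    """Sum the P counts and the N counts over all splits of one dataset."""
    return (sum(p for p, _ in splits), sum(n for _, n in splits))


def positive_fraction(p, n):
    """Share of positive samples among all samples."""
    return p / (p + n)


for name, splits in SPLITS.items():
    total_p, total_n = summed_totals(splits)
    # The three splits should add up to the reported totals.
    assert (total_p, total_n) == REPORTED_TOTALS[name], name
    frac = positive_fraction(total_p, total_n)
    print(f"{name}: P={total_p} N={total_n} positive fraction={frac:.3f}")
```

Under these assumptions, every dataset's three splits sum to its reported totals, and the positive fraction makes the class imbalance explicit. TC8.A, for example, has fewer than 1% positive samples.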