
Table 2 Summary of dataset statistics, including the sizes of the training, testing, and independent evaluation sets, and the average sequence length.

From: Efficacy of different protein descriptors in predicting protein functional families

 

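| Dataset | Total P | Total N | Training P | Training N | Testing P | Testing N | Independent testing P | Independent testing N | Average sequence size |
|---|---|---|---|---|---|---|---|---|---|
| EC2.4 | 3304 | 14373 | 1382 | 5068 | 1022 | 5859 | 900 | 3446 | 460 |
| GPCR | 2819 | 21515 | 1580 | 7389 | 717 | 7333 | 522 | 6793 | 498 |
| TC8.A | 229 | 23096 | 94 | 7962 | 72 | 7962 | 63 | 7172 | 483 |
| Chlorophyll | 999 | 22997 | 356 | 7928 | 333 | 7928 | 310 | 7141 | 480 |
| Lipid | 2192 | 11537 | 850 | 5779 | 707 | 4483 | 635 | 1275 | 312 |
| rRNA | 5855 | 13770 | 2004 | 5246 | 1940 | 4953 | 1911 | 3571 | 376 |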
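
The counts in Table 2 can be sanity-checked with a few lines of Python. The sketch below is not part of the original article: it transcribes the P and N columns and checks that, for each dataset, the training, testing, and independent testing counts add up to the totals. It also reports the N-to-P ratio as a rough indication of class imbalance. The names `TABLE_2`, `check_splits`, and `n_per_p` are hypothetical, and reading P and N as positive and negative samples is an assumption, since the table does not expand the abbreviations.

```python
# Counts transcribed from Table 2 as (P, N) pairs.
# Tuple order per dataset: total, training, testing, independent testing.
# Assumption: P and N stand for positive and negative samples.
TABLE_2 = {
    "EC2.4":       ((3304, 14373), (1382, 5068), (1022, 5859), (900, 3446)),
    "GPCR":        ((2819, 21515), (1580, 7389), (717, 7333), (522, 6793)),
    "TC8.A":       ((229, 23096), (94, 7962), (72, 7962), (63, 7172)),
    "Chlorophyll": ((999, 22997), (356, 7928), (333, 7928), (310, 7141)),
    "Lipid":       ((2192, 11537), (850, 5779), (707, 4483), (635, 1275)),
    "rRNA":        ((5855, 13770), (2004, 5246), (1940, 4953), (1911, 3571)),
}


def check_splits(table):
    """Return the datasets whose split counts do not sum to the totals."""
    mismatches = []
    for name, (total, *splits) in table.items():
        p_sum = sum(p for p, _ in splits)
        n_sum = sum(n for _, n in splits)
        if (p_sum, n_sum) != total:
            mismatches.append(name)
    return mismatches


def n_per_p(table):
    """Return the ratio of total N to total P for each dataset."""
    return {name: total[1] / total[0] for name, (total, *_) in table.items()}


print("Inconsistent rows:", check_splits(TABLE_2))
for name, ratio in n_per_p(TABLE_2).items():
    print(f"{name}: {ratio:.1f} N per P")
```

An empty list from `check_splits` means every row is internally consistent. The ratios make the imbalance explicit, which is easy to miss when reading the raw counts.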