Table 3 Dataset training statistics and prediction accuracies of six protein functional families. DS refers to descriptor set, where D1 = amino acid composition; D2 = dipeptide composition; D3 = Moreau-Broto autocorrelation; D4 = Moran autocorrelation; D5 = Geary autocorrelation; D6 = composition, transition and distribution descriptors; D7 = quasi sequence order; D8 = pseudo amino acid composition; D9 = combination of D1+D2; and D10 = combination of D1-D8. Predicted results given as TP (true positive), FN (false negative), TN (true negative), FP (false positive), Sen (sensitivity), Spec (specificity), Q (overall accuracy) and MCC (Matthews correlation coefficient).

From: Efficacy of different protein descriptors in predicting protein functional families

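Column groups: Training set (P, N); Testing set (P: TP, FN; N: TN, FP); Independent evaluation set (P: TP, FN, Sen; N: TN, FP, Spec; Q; MCC).

| Protein family | Descriptor set | Training P | Training N | Testing TP | Testing FN | Testing TN | Testing FP | Eval. TP | Eval. FN | Eval. Sen (%) | Eval. TN | Eval. FP | Eval. Spec (%) | Eval. Q (%) | Eval. MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EC2.4 | D1 | 1249 | 2120 | 1154 | 1 | 9065 | 12 | 724 | 176 | 80.4 | 3244 | 202 | 94.1 | 91.3 | 0.74 |
|  | D2 | 1319 | 2120 | 1080 | 5 | 8806 | 1 | 646 | 154 | 82.9 | 3349 | 97 | 97.2 | 94.1 | 0.80 |
|  | D3 | 1105 | 1756 | 1295 | 4 | 9166 | 5 | 768 | 132 | 85.3 | 3394 | 52 | 98.5 | 95.8 | 0.87 |
|  | D4 | 1239 | 2221 | 1161 | 4 | 8701 | 5 | 756 | 144 | 84.0 | 3365 | 81 | 97.7 | 94.8 | 0.84 |
|  | D5 | 1242 | 2223 | 1160 | 2 | 8690 | 14 | 753 | 147 | 83.6 | 3391 | 55 | 98.4 | 95.4 | 0.85 |
|  | D6 | 1214 | 2077 | 1145 | 45 | 8846 | 4 | 741 | 159 | 82.3 | 3383 | 63 | 98.2 | 94.9 | 0.84 |
|  | D7 | 1293 | 2624 | 1072 | 39 | 8295 | 8 | 696 | 204 | 77.3 | 3270 | 176 | 94.9 | 91.3 | 0.73 |
|  | D8 | 1226 | 3008 | 1177 | 1 | 7918 | 1 | 794 | 106 | 88.2 | 3387 | 59 | 98.3 | 96.2 | 0.88 |
|  | D9 | 1275 | 2747 | 1129 | 0 | 8177 | 3 | 782 | 118 | 86.9 | 3367 | 79 | 97.7 | 95.5 | 0.86 |
|  | D10 | 1228 | 3254 | 1176 | 0 | 7672 | 1 | 798 | 102 | 88.7 | 3397 | 49 | 98.6 | 96.5 | 0.89 |
| GPCR | D1 | 1590 | 7458 | 1847 | 1 | 14166 | 3 | 505 | 17 | 96.7 | 6735 | 58 | 99.1 | 99.0 | 0.93 |
|  | D2 | 564 | 711 | 1728 | 3 | 14121 | 5 | 510 | 12 | 97.7 | 6737 | 56 | 99.2 | 99.1 | 0.93 |
|  | D3 | 1169 | 4628 | 1122 | 4 | 10208 | 1 | 507 | 15 | 97.1 | 6737 | 56 | 99.2 | 99.0 | 0.93 |
|  | D4 | 1257 | 4474 | 1037 | 1 | 10363 | 0 | 499 | 23 | 95.6 | 6745 | 48 | 99.3 | 99.0 | 0.93 |
|  | D5 | 1290 | 4724 | 997 | 8 | 10113 | 0 | 494 | 28 | 94.6 | 6734 | 59 | 99.1 | 98.8 | 0.91 |
|  | D6 | 757 | 2060 | 1536 | 2 | 12777 | 0 | 503 | 19 | 96.3 | 6742 | 51 | 99.2 | 99.0 | 0.93 |
|  | D7 | 812 | 2950 | 1482 | 1 | 11887 | 0 | 495 | 27 | 94.8 | 6696 | 97 | 98.6 | 98.3 | 0.88 |
|  | D8 | 653 | 2171 | 1644 | 0 | 12550 | 1 | 501 | 21 | 96.0 | 6769 | 24 | 99.7 | 99.4 | 0.95 |
|  | D9 | 1590 | 7458 | 693 | 12 | 7322 | 57 | 512 | 10 | 98.1 | 6735 | 58 | 99.1 | 99.1 | 0.93 |
|  | D10 | 672 | 2454 | 1625 | 0 | 12268 | 0 | 502 | 20 | 96.2 | 6757 | 36 | 99.5 | 99.2 | 0.94 |
| TC8.A | D1 | 118 | 2858 | 49 | 0 | 13121 | 0 | 36 | 27 | 57.1 | 1843 | 2 | 99.9 | 98.5 | 0.73 |
|  | D2 | 116 | 1100 | 50 | 0 | 14824 | 0 | 41 | 22 | 65.1 | 1843 | 2 | 99.9 | 98.7 | 0.78 |
|  | D3 | 94 | 7962 | 53 | 0 | 14501 | 0 | 42 | 21 | 66.7 | 1842 | 3 | 98.6 | 98.7 | 0.78 |
|  | D4 | 94 | 7962 | 47 | 0 | 11250 | 0 | 37 | 26 | 58.7 | 1843 | 2 | 99.9 | 98.5 | 0.74 |
|  | D5 | 94 | 7962 | 47 | 0 | 11137 | 0 | 37 | 26 | 58.7 | 1843 | 2 | 99.9 | 98.5 | 0.74 |
|  | D6 | 94 | 7962 | 64 | 0 | 15283 | 0 | 44 | 19 | 69.8 | 1843 | 2 | 99.9 | 98.9 | 0.81 |
|  | D7 | 94 | 7962 | 59 | 0 | 15045 | 0 | 43 | 20 | 68.3 | 1843 | 2 | 99.9 | 98.9 | 0.80 |
|  | D8 | 103 | 943 | 63 | 0 | 14981 | 0 | 48 | 15 | 76.2 | 1843 | 2 | 99.9 | 99.1 | 0.85 |
|  | D9 | 114 | 810 | 52 | 0 | 15114 | 0 | 41 | 22 | 65.1 | 1843 | 2 | 99.9 | 98.7 | 0.78 |
|  | D10 | 102 | 1068 | 64 | 0 | 14856 | 0 | 48 | 15 | 76.2 | 1843 | 2 | 99.9 | 99.1 | 0.85 |
| Chlorophyll | D1 | 356 | 7928 | 166 | 0 | 14297 | 0 | 182 | 128 | 58.7 | 1587 | 11 | 99.3 | 92.7 | 0.71 |
|  | D2 | 4S40 | 934 | 248 | 1 | 7927 | 1 | 228 | 82 | 73.6 | 1595 | 3 | 99.8 | 95.6 | 0.83 |
|  | D3 | 425 | 603 | 264 | 0 | 15253 | 0 | 246 | 64 | 79.4 | 1594 | 4 | 99.8 | 96.4 | 0.86 |
|  | D4 | 415 | 574 | 273 | 1 | 15282 | 0 | 247 | 65 | 79.7 | 1597 | 1 | 99.9 | 96.6 | 0.87 |
|  | D5 | 429 | 615 | 259 | 1 | 15240 | 1 | 233 | 77 | 75.2 | 1597 | 1 | 99.9 | 95.9 | 0.84 |
|  | D6 | 482 | 946 | 202 | 5 | 14910 | 0 | 205 | 105 | 66.1 | 1597 | 1 | 99.9 | 94.4 | 0.79 |
|  | D7 | 394 | 3337 | 210 | 85 | 12517 | 2 | 178 | 132 | 57.4 | 1597 | 1 | 99.9 | 93.0 | 0.73 |
|  | D8 | 371 | 1421 | 317 | 1 | 14435 | 0 | 255 | 55 | 82.3 | 1593 | 5 | 99.7 | 96.9 | 0.88 |
|  | D9 | 399 | 1273 | 289 | 1 | 14582 | 1 | 249 | 61 | 80.3 | 1591 | 7 | 99.6 | 96.4 | 0.86 |
|  | D10 | 381 | 1753 | 307 | 1 | 14102 | 1 | 251 | 59 | 81.0 | 1594 | 4 | 99.8 | 96.7 | 0.88 |
| Lipid synthesis | D1 | 849 | 2026 | 705 | 3 | 8229 | 7 | 470 | 165 | 74.0 | 1218 | 57 | 95.5 | 88.4 | 0.73 |
|  | D2 | 927 | 2037 | 629 | 1 | 8225 | 0 | 512 | 123 | 80.6 | 1259 | 16 | 98.6 | 92.7 | 0.84 |
|  | D3 | 898 | 2968 | 659 | 0 | 7294 | 0 | 509 | 126 | 80.2 | 1271 | 4 | 99.7 | 93.2 | 0.84 |
|  | D4 | 968 | 3227 | 588 | 1 | 7035 | 0 | 493 | 142 | 77.6 | 1273 | 2 | 99.8 | 92.5 | 0.83 |
|  | D5 | 970 | 3280 | 586 | 1 | 6982 | 0 | 491 | 144 | 77.3 | 1260 | 15 | 98.8 | 91.7 | 0.81 |
|  | D6 | 874 | 2112 | 681 | 2 | 8149 | 1 | 525 | 110 | 82.7 | 1268 | 7 | 99.5 | 93.9 | 0.86 |
|  | D7 | 863 | 2415 | 692 | 2 | 7845 | 2 | 512 | 123 | 80.6 | 1271 | 4 | 99.7 | 93.4 | 0.85 |
|  | D8 | 907 | 1608 | 615 | 0 | 4488 | 0 | 498 | 137 | 78.4 | 1268 | 7 | 99.5 | 92.5 | 0.83 |
|  | D9 | 815 | 1613 | 740 | 2 | 8638 | 11 | 525 | 110 | 82.7 | 1248 | 27 | 97.9 | 92.8 | 0.84 |
|  | D10 | 865 | 1640 | 657 | 0 | 4456 | 0 | 531 | 104 | 83.6 | 1268 | 7 | 99.5 | 94.2 | 0.87 |
| rRNA binding | D1 | 548 | 579 | 3390 | 6 | 9598 | 22 | 1824 | 87 | 95.5 | 3511 | 60 | 98.3 | 97.3 | 0.94 |
|  | D2 | 1133 | 1225 | 2811 | 0 | 8974 | 0 | 1844 | 67 | 96.5 | 3519 | 52 | 98.5 | 97.8 | 0.95 |
|  | D3 | 1126 | 1638 | 2816 | 2 | 8560 | 1 | 1812 | 99 | 94.8 | 3535 | 36 | 99.0 | 97.5 | 0.95 |
|  | D4 | 1337 | 1958 | 2697 | 0 | 8241 | 0 | 1783 | 128 | 93.3 | 3484 | 87 | 97.6 | 96.1 | 0.91 |
|  | D5 | 1372 | 1976 | 2572 | 0 | 8223 | 0 | 1784 | 127 | 93.4 | 3479 | 92 | 97.4 | 96.0 | 0.91 |
|  | D6 | 921 | 1208 | 2971 | 52 | 8991 | 0 | 1824 | 87 | 95.5 | 3541 | 30 | 99.2 | 97.9 | 0.95 |
|  | D7 | 878 | 2743 | 3040 | 26 | 7442 | 14 | 1808 | 103 | 97.9 | 3481 | 90 | 97.5 | 96.5 | 0.92 |
|  | D8 | 810 | 2245 | 3143 | 0 | 7954 | 0 | 1849 | 62 | 96.8 | 3541 | 30 | 99.2 | 98.3 | 0.96 |
|  | D9 | 810 | 972 | 3075 | 3 | 9182 | 2 | 1848 | 63 | 96.7 | 3526 | 45 | 98.7 | 98.0 | 0.96 |
|  | D10 | 900 | 2600 | 3044 | 0 | 7599 | 0 | 1858 | 53 | 97.2 | 3547 | 24 | 99.3 | 98.6 | 0.97 |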
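
The caption names Sen, Spec, Q and MCC without giving their formulas. The sketch below shows how these four values can be derived from the TP, FN, TN and FP counts of a row, assuming the standard confusion-matrix definitions (the page itself does not state them). The function name `binary_metrics` is hypothetical, and returning 0.0 for MCC when its denominator is zero is a convention chosen here, not something taken from the source.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (Sen %, Spec %, Q %, MCC) from confusion-matrix counts.

    Assumes the standard definitions and that both classes are
    non-empty (tp + fn > 0 and tn + fp > 0).
    """
    sen = 100.0 * tp / (tp + fn)                 # sensitivity: recall on positives
    spec = 100.0 * tn / (tn + fp)                # specificity: recall on negatives
    q = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0.0 when the denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spec, q, mcc


# Independent evaluation counts for EC2.4 with descriptor set D1.
sen, spec, q, mcc = binary_metrics(tp=724, fn=176, tn=3244, fp=202)
print(f"Sen={sen:.1f}% Spec={spec:.1f}% Q={q:.1f}% MCC={mcc:.2f}")
# prints: Sen=80.4% Spec=94.1% Q=91.3% MCC=0.74
```

For this row, the computed values match the reported 80.4, 94.1, 91.3 and 0.74, which suggests the table uses these definitions. Not every row reproduces its reported values this way, so treat the check as a sanity test rather than a correction of the published figures.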