Table 1 Statistical distribution of Pafig training dataset

From: On the amyloid datasets used for training PAFIG - how (not) to extend the experimental dataset of hexapeptides

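| Residue | Position 1 | Position 2 | Position 3 | Position 4 | Position 5 | Position 6 |
|---|---|---|---|---|---|---|
| K | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.4 |
| R | 0.7 | 0.7 | 0.7 | 0.8 | 0.8 | 0.7 |
| D | 0.9 | 0.9 | 0.9 | 0.8 | 0.8 | 0.9 |
| E | 1.3 | 1.3 | 1.3 | 1.3 | 1.3 | **1.8** |
| N | 1.1 | 1.0 | 1.1 | 1.0 | 1.0 | 1.0 |
| Q | 0.9 | 0.9 | 1.0 | 1.0 | 0.9 | 0.9 |
| P | 0.7 | 0.7 | 0.6 | 0.7 | 0.6 | 0.6 |
| H | 1.0 | 1.1 | 1.1 | 1.0 | 1.1 | 1.0 |
| M | 0.8 | 0.7 | 0.7 | 0.7 | 0.8 | 0.7 |
| C | 0.5 | 0.6 | 0.6 | 0.6 | 0.6 | 0.5 |
| S | **1.3** | 0.8 | 0.8 | 0.8 | 0.9 | 0.9 |
| T | 1.1 | **1.7** | 1.0 | 1.0 | 1.0 | 1.0 |
| F | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.1 |
| W | 0.9 | 1.0 | 0.9 | 0.9 | 0.9 | 0.8 |
| Y | 1.2 | 1.2 | 1.2 | 1.2 | 1.2 | 1.5 |
| V | 1.1 | 1.1 | **1.6** | 1.1 | 1.2 | 1.1 |
| L | 0.7 | 0.7 | 0.8 | 0.7 | 0.7 | 0.7 |
| I | 0.8 | 0.8 | 0.8 | **1.4** | **1.4** | 0.8 |
| G | 1.1 | 1.0 | 1.0 | 1.0 | 1.1 | 1.1 |
| A | 0.9 | 0.9 | 1.0 | 1.0 | 1.0 | 0.9 |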

  1. Statistical distribution of the full Pafig training dataset, including all positive and negative hexapeptides, normalized against the frequencies of amino acid occurrence in all proteins deposited in UniProt. For a well-balanced training dataset, every value is expected to equal 1. Values above 1 denote over-representation of a residue at the given position of the training hexapeptides; values below 1 denote under-representation. The bias from STVIIE is in bold.
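
The normalization described in the footnote can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the function name `positional_enrichment` is hypothetical, and the background frequencies and peptides below are toy values chosen for easy arithmetic, not UniProt or Pafig data. Each entry is the fraction of peptides carrying a residue at a given position, divided by that residue's background frequency.

```python
from collections import Counter


def positional_enrichment(peptides, background):
    """Return {residue: [ratio at each position]} for equal-length peptides.

    ratio = (fraction of peptides with the residue at that position)
            / (background frequency of the residue)
    A ratio of 1 means the residue occurs as often as the background predicts.
    """
    if not peptides:
        raise ValueError("peptides must not be empty")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    n = len(peptides)
    table = {res: [0.0] * length for res in background}
    for pos in range(length):
        # Count how often each residue appears at this position.
        counts = Counter(p[pos] for p in peptides)
        for res, bg in background.items():
            table[res][pos] = (counts.get(res, 0) / n) / bg
    return table


# Toy inputs (assumed values, for illustration only).
toy_background = {"A": 0.5, "S": 0.25, "T": 0.25}
toy_peptides = ["SA", "ST", "AA", "SA"]
enrichment = positional_enrichment(toy_peptides, toy_background)
```

In this toy set, S fills the first position in three of four peptides against a background of 0.25, so its ratio there is well above 1. This is the same kind of position-specific over-representation that the table reports for the residues of STVIIE.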