Table 1 Statistical distribution of the PAFIG training dataset

From: On the amyloid datasets used for training PAFIG – how (not) to extend the experimental dataset of hexapeptides

Residue  Pos 1  Pos 2  Pos 3  Pos 4  Pos 5  Pos 6
K        1.5    1.5    1.5    1.5    1.5    1.4
R        0.7    0.7    0.7    0.8    0.8    0.7
D        0.9    0.9    0.9    0.8    0.8    0.9
E        1.3    1.3    1.3    1.3    1.3    1.8*
N        1.1    1.0    1.1    1.0    1.0    1.0
Q        0.9    0.9    1.0    1.0    0.9    0.9
P        0.7    0.7    0.6    0.7    0.6    0.6
H        1.0    1.1    1.1    1.0    1.1    1.0
M        0.8    0.7    0.7    0.7    0.8    0.7
C        0.5    0.6    0.6    0.6    0.6    0.5
S        1.3*   0.8    0.8    0.8    0.9    0.9
T        1.1    1.7*   1.0    1.0    1.0    1.0
F        1.2    1.2    1.2    1.2    1.2    1.1
W        0.9    1.0    0.9    0.9    0.9    0.8
Y        1.2    1.2    1.2    1.2    1.2    1.5
V        1.1    1.1    1.6*   1.1    1.2    1.1
L        0.7    0.7    0.8    0.7    0.7    0.7
I        0.8    0.8    0.8    1.4*   1.4*   0.8
G        1.1    1.0    1.0    1.0    1.1    1.1
A        0.9    0.9    1.0    1.0    1.0    0.9

1. Statistical distribution of the full PAFIG training dataset, including all positive and negative hexapeptides, normalized against the amino acid frequencies across all proteins deposited in UniProt. For a well-balanced training dataset, every value would equal 1. Values above 1 indicate over-representation of a residue at that position of the training hexapeptides; values below 1 indicate under-representation. The bias from STVIIE is marked with an asterisk (*).
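The normalization described in the footnote (positional frequency in the training set divided by the overall residue frequency) can be sketched as follows. This is a minimal Python illustration, not the authors' code. It assumes the background frequencies are passed in as a residue-to-frequency dictionary; the real UniProt composition is not reproduced here, so the demo uses a uniform toy background. The helper `flag_positional_bias` and its 1.3 threshold are hypothetical: it is a simple row-median outlier check with a hand-picked cutoff. Applied to the values of Table 1, it picks out the same STVIIE pattern that the footnote highlights.

```python
from collections import Counter
from statistics import median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalized_position_frequencies(peptides, background):
    """Observed residue frequency at each position divided by its background frequency.

    peptides   -- equal-length peptide strings (hexapeptides for PAFIG)
    background -- residue -> background frequency (assumed input, e.g. UniProt composition)
    Returns {position (1-based): {residue: ratio}}; 1.0 means no positional bias.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    table = {}
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)  # absent residues count as 0
        table[pos + 1] = {aa: (counts[aa] / n) / freq for aa, freq in background.items()}
    return table


def flag_positional_bias(rows, threshold=1.3):
    """Hypothetical heuristic: flag cells exceeding their own row median by `threshold`x.

    rows -- residue -> normalized values for positions 1..6 (as in Table 1)
    Returns (position, residue) pairs sorted by position.
    """
    hits = []
    for aa, values in rows.items():
        row_median = median(values)
        for pos, value in enumerate(values, start=1):
            if value / row_median > threshold:
                hits.append((pos, aa))
    return sorted(hits)


# Values transcribed from Table 1 (positions 1..6).
TABLE_1 = {
    "K": [1.5, 1.5, 1.5, 1.5, 1.5, 1.4], "R": [0.7, 0.7, 0.7, 0.8, 0.8, 0.7],
    "D": [0.9, 0.9, 0.9, 0.8, 0.8, 0.9], "E": [1.3, 1.3, 1.3, 1.3, 1.3, 1.8],
    "N": [1.1, 1.0, 1.1, 1.0, 1.0, 1.0], "Q": [0.9, 0.9, 1.0, 1.0, 0.9, 0.9],
    "P": [0.7, 0.7, 0.6, 0.7, 0.6, 0.6], "H": [1.0, 1.1, 1.1, 1.0, 1.1, 1.0],
    "M": [0.8, 0.7, 0.7, 0.7, 0.8, 0.7], "C": [0.5, 0.6, 0.6, 0.6, 0.6, 0.5],
    "S": [1.3, 0.8, 0.8, 0.8, 0.9, 0.9], "T": [1.1, 1.7, 1.0, 1.0, 1.0, 1.0],
    "F": [1.2, 1.2, 1.2, 1.2, 1.2, 1.1], "W": [0.9, 1.0, 0.9, 0.9, 0.9, 0.8],
    "Y": [1.2, 1.2, 1.2, 1.2, 1.2, 1.5], "V": [1.1, 1.1, 1.6, 1.1, 1.2, 1.1],
    "L": [0.7, 0.7, 0.8, 0.7, 0.7, 0.7], "I": [0.8, 0.8, 0.8, 1.4, 1.4, 0.8],
    "G": [1.1, 1.0, 1.0, 1.0, 1.1, 1.1], "A": [0.9, 0.9, 1.0, 1.0, 1.0, 0.9],
}

# Toy uniform background (NOT UniProt): a set where each residue occupies each
# position exactly once per 20 peptides is perfectly balanced, so every ratio is 1.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
balanced = normalized_position_frequencies([aa * 6 for aa in AMINO_ACIDS], uniform)
print(balanced[1]["S"])  # 1.0

hits = flag_positional_bias(TABLE_1)
print("".join(aa for _, aa in hits))  # STVIIE
```

Comparing each value with the median of its own row, rather than with 1, is what separates the STVIIE cells from residues such as K that are over-represented at every position. With this data, any cutoff between roughly 1.26 and 1.38 gives the same result; lower it to 1.2 and Y at position 6 (1.5) is flagged as well.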