Table 4 MisPred analysis of NCBI's GNOMON-predicted proteins

From: Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

NCBI/GNOMON
Conflict 1 | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage*
Homo sapiens | 10125 | 287 | 2.83% | 93 | 32.4% | ND | ND | ND | ND
Monodelphis domestica | 20110 | 1293 | 6.43% | 253 | 19.57% | ND | ND | ND | ND
Gallus gallus | 14816 | 909 | 6.14% | 246 | 27.06% | ND | ND | ND | ND
Danio rerio | 25356 | 2108 | 8.31% | 562 | 26.66% | ND | ND | ND | ND
Conflict 2 | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage*
Homo sapiens | 10125 | 4 | 0.04% | 0 | 0% | 0 | 0.00% | 0 | 0.00%
Monodelphis domestica | 20110 | 32 | 0.16% | 6 | 18.75% | 3 | 9.38% | 3 | 9.38%
Gallus gallus | 14816 | 22 | 0.15% | 5 | 22.73% | 3 | 13.64% | 2 | 9.09%
Danio rerio | 25356 | 31 | 0.12% | 11 | 35.48% | 5 | 16.13% | 6 | 19.35%
Conflict 3 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage*
Homo sapiens | 10125 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00%
Monodelphis domestica | 20110 | 2 | 0.01% | 0 | 0.00% | 2 | 0.01%
Gallus gallus | 14816 | 2 | 0.01% | 1 | 0.01% | 1 | 0.01%
Danio rerio | 25356 | 7 | 0.03% | 3 | 0.01% | 4 | 0.02%
Conflict 4 | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage*
Homo sapiens | 10125 | 1632 | 16.12% | 255 | 15.63% | ND | ND | ND | ND
Monodelphis domestica | 20110 | 6224 | 30.95% | 111 | 1.78% | ND | ND | ND | ND
Gallus gallus | 14816 | 3564 | 24.06% | 370 | 10.38% | ND | ND | ND | ND
Danio rerio | 25356 | 4387 | 17.31% | 385 | 8.78% | ND | ND | ND | ND
Conflict 5 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage*
Homo sapiens | 10125 | 1 | 0.01% | 0 | 0.00% | 1 | 0.01%
Danio rerio | 25356 | 25 | 0.10% | 24 | 0.09% | 1 | 0.004%
  1. *Values for suspicious, false positive and true positive sequences are expressed as percentage of the proteins relevant for the given conflict.
  2. ND – not determined
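
As a worked illustration of the footnote on percentages, the short Python sketch below recomputes the Conflict 2 row for Danio rerio. The unstarred percentage is taken relative to all proteins, and the starred percentages relative to the proteins relevant for the conflict. The helper name `percent` and the variable names are hypothetical and not part of MisPred. For Conflicts 3 and 5, which list no relevant subset, the starred values appear to use the total number of proteins as the denominator (for example 7 of 25356 is about 0.03%); this is inferred from the numbers, not stated in the table.

```python
def percent(part: int, whole: int, digits: int = 2) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return round(100.0 * part / whole, digits)


# Conflict 2, Danio rerio: counts copied from the table.
total_proteins = 25356   # all GNOMON-predicted proteins analysed
relevant = 31            # proteins with an extra- and an intracellular domain
suspicious = 11          # flagged by MisPred
false_positives = 5
true_errors = 6

# Unstarred percentage: relevant proteins as a share of all proteins.
relevant_pct = percent(relevant, total_proteins)

# Starred percentages: shares of the relevant proteins.
suspicious_pct = percent(suspicious, relevant)
false_positive_pct = percent(false_positives, relevant)
true_error_pct = percent(true_errors, relevant)

print(relevant_pct, suspicious_pct, false_positive_pct, true_error_pct)
```

In this row the false positives and true errors add up to the suspicious count (5 + 6 = 11), so the two starred shares sum to the suspicious share up to rounding.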