Table 2 MisPred analysis of TrEMBL entries

From: Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

UniProtKB/TrEMBL

| Conflict 1 | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 6732 | 12.9% | 3907 | 58.0% | ND | ND | ND | ND | ND | ND |

| Conflict 2 | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 58 | 0.11% | 9 | 15.5% | 9 | 15.5% | 0 | 0.0% | 0 | 0 |

| Conflict 3 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% | 0 | 0 |

| Conflict 4 | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 17073 | 32.7% | 2531 | 14.8% | ND | ND | ND | ND | ND | ND |

| Conflict 5 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 172 | 0.33% | 0 | 0.00% | 172 | 0.33% | 85 | 87 |
| Mus musculus | 50304 | 40 | 0.08% | ND | ND | ND | ND | ND | ND |
| Rattus norvegicus | 8557 | 5 | 0.06% | ND | ND | ND | ND | ND | ND |
| Gallus gallus | 5549 | 6 | 0.11% | ND | ND | ND | ND | ND | ND |
| Danio rerio | 19623 | 387 | 1.97% | ND | ND | ND | ND | ND | ND |
| Caenorhabditis elegans | 30000 | 0 | 0.00% | 0 | 0 | 0 | 0 | 0 | 0 |
| Drosophila melanogaster | 26947 | 49 | 0.18% | ND | ND | ND | ND | ND | ND |

  1. *Values for suspicious, false positive and true positive sequences are expressed as a percentage of the proteins relevant to the given conflict.
  2. ND – not determined
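
The footnote on starred percentages can be checked against the table with a short calculation. The sketch below is illustrative only: the helper name `pct` is hypothetical and not part of MisPred or the source article, and the numbers are copied from the Homo sapiens rows for Conflicts 1 and 4. It assumes that the unstarred "Percentage" column is relative to the total number of proteins, while the starred column is relative to the proteins relevant for the conflict.

```python
def pct(part, whole, digits=1):
    """Return part/whole as a percentage, rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Homo sapiens, Conflict 1 (values copied from the table)
total_proteins = 52237
with_extracellular_domain = 6732   # proteins relevant for Conflict 1
suspicious_conflict_1 = 3907

# Unstarred percentage: relevant proteins as a share of all entries
print(pct(with_extracellular_domain, total_proteins))

# Starred percentage: suspicious proteins as a share of the relevant ones
print(pct(suspicious_conflict_1, with_extracellular_domain))

# Homo sapiens, Conflict 4: same calculation with that row's values
print(pct(2531, 17073))
```

Under that assumption, the computed values agree with the 12.9%, 58.0% and 14.8% entries reported in the table.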