
Table 2 MisPred analysis of TrEMBL entries

From: Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

UniProtKB/TrEMBL

| Conflict 1 | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 6732 | 12.9% | 3907 | 58.0% | ND | ND | ND | ND | ND | ND |

| Conflict 2 | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 58 | 0.11% | 9 | 15.5% | 9 | 15.5% | 0 | 0.0% | 0 | 0 |

| Conflict 3 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% | 0 | 0 |

| Conflict 4 | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 17073 | 32.7% | 2531 | 14.8% | ND | ND | ND | ND | ND | ND |

| Conflict 5 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* | Annotated as fragment or chimera by UniProt | Identified as abnormal only by MisPred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 52237 | 172 | 0.33% | 0 | 0.00% | 172 | 0.33% | 85 | 87 |
| Mus musculus | 50304 | 40 | 0.08% | ND | ND | ND | ND | ND | ND |
| Rattus norvegicus | 8557 | 5 | 0.06% | ND | ND | ND | ND | ND | ND |
| Gallus gallus | 5549 | 6 | 0.11% | ND | ND | ND | ND | ND | ND |
| Danio rerio | 19623 | 387 | 1.97% | ND | ND | ND | ND | ND | ND |
| Caenorhabditis elegans | 30000 | 0 | 0.00% | 0 | 0 | 0 | 0 | 0 | 0 |
| Drosophila melanogaster | 26947 | 49 | 0.18% | ND | ND | ND | ND | ND | ND |

  1. *Values for suspicious, false positive and true positive sequences are expressed as percentage of the proteins relevant for the given conflict.
  2. ND – not determined
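
The percentage convention in footnote 1 can be checked against the reported counts. The following is a minimal sketch, not part of the original analysis: the helper name `pct` and its `digits` parameter are hypothetical, and it assumes that "Percentage" is taken over all proteins of the species while "Percentage*" is taken over the proteins relevant for the conflict (for Conflicts 3 and 5, where no relevant subset is listed, the values appear to be taken over all proteins).

```python
def pct(count: int, relevant: int, digits: int = 1) -> str:
    """Format count as a percentage of relevant (hypothetical helper)."""
    return f"{100 * count / relevant:.{digits}f}%"


# Conflict 1, Homo sapiens: share of all entries with an extracellular domain.
extracellular_share = pct(6732, 52237)

# Conflict 1, Homo sapiens: suspicious entries as a share of the relevant
# (extracellular-domain) proteins, following footnote 1.
suspicious_share = pct(3907, 6732)

# Conflict 5, Homo sapiens: no relevant subset is listed, so the denominator
# is assumed to be the full protein count.
conflict5_share = pct(172, 52237, digits=2)

print(extracellular_share, suspicious_share, conflict5_share)
```

Under these assumptions the computed values agree with the Homo sapiens figures reported for Conflicts 1 and 5.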