
Table 5 MisPred analysis of human genes predicted by the EnsEMBL and NCBI's GNOMON pipelines

From: Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

EnsEMBL

Conflict 1

| Species | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 2772 | 147 | 5.3% | 23 | 15.65% | ND | ND | ND | ND |
| Monodelphis domestica | 10519 | 680 | 6.46% | 137 | 20.15% | ND | ND | ND | ND |
| Gallus gallus | 6139 | 345 | 5.62% | 113 | 32.75% | ND | ND | ND | ND |
| Danio rerio | 10289 | 860 | 8.36% | 317 | 36.86% | ND | ND | ND | ND |

Conflict 2

| Species | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 2772 | 1 | 0.04% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 10519 | 10 | 0.1% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Gallus gallus | 6139 | 2 | 0.03% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 20 | 0.19% | 5 | 25% | 4 | 20% | 1 | 5% |

Conflict 3

| Species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 2772 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 10519 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Gallus gallus | 6139 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |

Conflict 4

| Species | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 2772 | 722 | 26.05% | 48 | 6.65% | ND | ND | ND | ND |
| Monodelphis domestica | 10519 | 3726 | 35.42% | 119 | 3.19% | ND | ND | ND | ND |
| Gallus gallus | 6139 | 1640 | 26.72% | 159 | 9.70% | ND | ND | ND | ND |
| Danio rerio | 10289 | 2565 | 24.93% | 197 | 7.68% | ND | ND | ND | ND |

Conflict 5

| Species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 2772 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |

NCBI/GNOMON

Conflict 1

| Species | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 3012 | 139 | 4.61% | 32 | 23.02% | ND | ND | ND | ND |
| Monodelphis domestica | 9703 | 642 | 6.62% | 112 | 17.45% | ND | ND | ND | ND |
| Gallus gallus | 5604 | 310 | 5.53% | 88 | 28.39% | ND | ND | ND | ND |
| Danio rerio | 8905 | 742 | 8.33% | 158 | 21.29% | ND | ND | ND | ND |

Conflict 2

| Species | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 3012 | 2 | 0.07% | 0 | 0% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 9703 | 17 | 0.18% | 4 | 23.53% | 2 | 11.76% | 2 | 11.76% |
| Gallus gallus | 5604 | 3 | 0.05% | 1 | 33.33% | 1 | 33.33% | 0 | 0.00% |
| Danio rerio | 8905 | 16 | 0.18% | 6 | 37.5% | 4 | 25% | 2 | 12.5% |

Conflict 3

| Species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 3012 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 9703 | 1 | 0.01% | 0 | 0.00% | 1 | 0.01% |
| Gallus gallus | 5604 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 8905 | 2 | 0.02% | 1 | 0.01% | 1 | 0.01% |

Conflict 4

| Species | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 3012 | 792 | 26.3% | 41 | 5.18% | ND | ND | ND | ND |
| Monodelphis domestica | 9703 | 3420 | 35.25% | 39 | 1.14% | ND | ND | ND | ND |
| Gallus gallus | 5604 | 1500 | 26.77% | 208 | 13.87% | ND | ND | ND | ND |
| Danio rerio | 8905 | 2059 | 23.12% | 300 | 14.57% | ND | ND | ND | ND |

Conflict 5

| Species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 3012 | 1 | 0.03% | 0 | 0.00% | 1 | 0.03% |
| Danio rerio | 8905 | 5 | 0.06% | 5 | 0.06% | 0 | 0.00% |

  1. *Values for suspicious, false positive and true positive sequences are expressed as percentage of the proteins relevant for the given conflict.
  2. ND – not determined.
  3. The data refer to human genes for which both gene prediction pipelines generated at least one gene model.
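
The percentage convention described in the first footnote can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the MisPred software: the helper name `percentage` and the variable names are assumptions, and the counts are copied from the EnsEMBL Conflict 2 row for Danio rerio. The plain "Percentage" column is taken relative to the total number of proteins, and the starred columns relative to the proteins relevant for the conflict.

```python
def percentage(count: int, total: int, digits: int = 2) -> float:
    """Return count as a percentage of total, rounded to `digits` places."""
    if total == 0:
        raise ValueError("total must be non-zero")
    return round(100 * count / total, digits)


# Counts copied from the table: EnsEMBL, Conflict 2, Danio rerio.
n_proteins = 10289       # "Number of proteins"
n_relevant = 20          # proteins with an extra- and an intracellular domain
n_suspicious = 5         # "Identified as suspicious by MisPred"
n_false_positive = 4     # "False Positives"
n_true_error = 1         # "True errors"

# Unstarred column: relevant proteins as a share of all proteins.
relevant_pct = percentage(n_relevant, n_proteins)

# Starred columns: shares of the proteins relevant for this conflict.
suspicious_pct = percentage(n_suspicious, n_relevant)
false_positive_pct = percentage(n_false_positive, n_relevant)
true_error_pct = percentage(n_true_error, n_relevant)

print(relevant_pct, suspicious_pct, false_positive_pct, true_error_pct)
```

For Conflicts 3 and 5 the table lists no relevant subset, and the reported values appear to use the total number of proteins as the denominator (for example, 2 of 8905 is about 0.02%).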