
Table 3 MisPred analysis of EnsEMBL entries

From: Identification and correction of abnormal, incomplete and mispredicted proteins in public databases

Conflict 1

| EnsEMBL species | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 48403 | 3449 | 7.13% | 277 | 8.03% | ND | ND | ND | ND |
| Mus musculus | 31302 | 2038 | 6.51% | 151 | 7.41% | ND | ND | ND | ND |
| Rattus norvegicus | 33745 | 2390 | 7.08% | 325 | 13.6% | ND | ND | ND | ND |
| Monodelphis domestica | 32690 | 2369 | 7.25% | 661 | 27.9% | ND | ND | ND | ND |
| Gallus gallus | 24168 | 1519 | 6.29% | 413 | 27.19% | ND | ND | ND | ND |
| Xenopus tropicalis | 28324 | 2383 | 8.41% | 931 | 39.07% | ND | ND | ND | ND |
| Fugu rubripes | 22102 | 1612 | 7.29% | 627 | 38.9% | ND | ND | ND | ND |
| Danio rerio | 36065 | 3312 | 9.18% | 1224 | 36.96% | ND | ND | ND | ND |
| Ciona intestinalis | 20000 | 1452 | 7.26% | 670 | 46.14% | ND | ND | ND | ND |
| Caenorhabditis elegans | 26439 | 918 | 3.47% | 117 | 12.75% | ND | ND | ND | ND |
| Drosophila melanogaster | 19789 | 1071 | 5.41% | 120 | 11.2% | ND | ND | ND | ND |

Conflict 2

| EnsEMBL species | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 48403 | 101 | 0.21% | 18 | 17.82% | 18 | 17.82% | 0 | 0.00% |
| Mus musculus | 31302 | 50 | 0.16% | 4 | 8.00% | 4 | 8.00% | 0 | 0.00% |
| Rattus norvegicus | 33745 | 67 | 0.2% | 12 | 17.91% | 10 | 14.93% | 2 | 2.99% |
| Monodelphis domestica | 32690 | 101 | 0.31% | 25 | 24.75% | 9 | 8.91% | 16 | 15.84% |
| Gallus gallus | 24168 | 45 | 0.19% | 5 | 11.11% | 4 | 8.89% | 1 | 2.22% |
| Xenopus tropicalis | 28324 | 57 | 0.2% | 11 | 19.3% | 5 | 8.77% | 6 | 10.53% |
| Fugu rubripes | 22102 | 58 | 0.26% | 19 | 32.76% | 12 | 20.69% | 7 | 12.07% |
| Danio rerio | 36065 | 75 | 0.21% | 8 | 10.67% | 7 | 9.33% | 1 | 1.33% |
| Ciona intestinalis | 20000 | 29 | 0.15% | 2 | 6.90% | 2 | 6.90% | 0 | 0.00% |
| Caenorhabditis elegans | 26439 | 12 | 0.05% | 1 | 8.33% | 1 | 8.33% | 0 | 0.00% |
| Drosophila melanogaster | 19789 | 16 | 0.08% | 1 | 6.25% | 1 | 6.25% | 0 | 0.00% |

Conflict 3

| EnsEMBL species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 48403 | 1 | 0.002% | 0 | 0.00% | 1 | 0.002% |
| Mus musculus | 31302 | 3 | 0.01% | 0 | 0.00% | 3 | 0.01% |
| Rattus norvegicus | 33745 | 3 | 0.01% | 0 | 0.00% | 3 | 0.01% |
| Monodelphis domestica | 32690 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Gallus gallus | 24168 | 1 | 0.004% | 0 | 0.00% | 1 | 0.004% |
| Xenopus tropicalis | 28324 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Fugu rubripes | 22102 | 2 | 0.01% | 0 | 0.00% | 2 | 0.01% |
| Danio rerio | 36065 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Ciona intestinalis | 20000 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Caenorhabditis elegans | 26439 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Drosophila melanogaster | 19789 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |

Conflict 4

| EnsEMBL species | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens | 48403 | 16681 | 34.46% | 850 | 5.1% | ND | ND | ND | ND |
| Mus musculus | 31302 | 9955 | 31.80% | 306 | 3.07% | ND | ND | ND | ND |
| Rattus norvegicus | 33745 | 11826 | 35.05% | 474 | 4.01% | ND | ND | ND | ND |
| Monodelphis domestica | 32690 | 11847 | 36.24% | 381 | 3.22% | ND | ND | ND | ND |
| Gallus gallus | 24168 | 6261 | 25.91% | 383 | 6.12% | ND | ND | ND | ND |
| Xenopus tropicalis | 28324 | 6733 | 23.78% | 318 | 4.72% | ND | ND | ND | ND |
| Fugu rubripes | 22102 | 5464 | 24.72% | 278 | 5.09% | ND | ND | ND | ND |
| Danio rerio | 36065 | 9402 | 26.07% | 591 | 6.29% | ND | ND | ND | ND |
| Ciona intestinalis | 20000 | 2114 | 10.57% | 147 | 6.95% | ND | ND | ND | ND |
| Caenorhabditis elegans | 26439 | 3039 | 11.49% | 86 | 2.83% | ND | ND | ND | ND |
| Drosophila melanogaster | 19789 | 3341 | 16.88% | 58 | 1.74% | ND | ND | ND | ND |

Conflict 5

| EnsEMBL species | Number of proteins | Identified as suspicious by MisPred | Percentage* | False positives | Percentage* | True errors | Percentage* |
|---|---|---|---|---|---|---|---|
| Homo sapiens | 48403 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 36065 | 9 | 0.02% | 7 | 0.02% | 2 | 0.01% |

  1. *Values for suspicious, false-positive and true-positive (true error) sequences are expressed as a percentage of the proteins relevant to the given conflict.
  2. ND – not determined
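
The percentage convention in the first footnote can be illustrated with a short sketch. It recomputes a few published values from the counts in the table. The helper name `pct` is hypothetical, and the choice of denominator for Conflicts 3 and 5 (the total number of proteins, since those sections list no separate subset of relevant proteins) is an assumption that happens to reproduce the reported figures.

```python
def pct(count: int, relevant: int, digits: int = 2) -> str:
    """Format `count` as a percentage of `relevant` (hypothetical helper)."""
    if relevant <= 0:
        raise ValueError("the number of relevant proteins must be positive")
    return f"{100 * count / relevant:.{digits}f}%"


# Homo sapiens, Conflict 2 (counts copied from the table).
total_proteins = 48403   # Number of proteins
relevant = 101           # containing an extra- and an intracellular domain
suspicious = 18          # identified as suspicious by MisPred

# Unstarred "Percentage" column: relevant proteins out of all proteins.
print(pct(relevant, total_proteins))

# Starred "Percentage*" column: suspicious proteins out of the relevant ones.
print(pct(suspicious, relevant))

# Rattus norvegicus, Conflict 2: false positives and true errors out of 67.
print(pct(10, 67), pct(2, 67))

# Homo sapiens, Conflict 3: assumed denominator is the total protein count.
print(pct(1, total_proteins, digits=3))
```

Under these assumptions the sketch prints 0.21%, 17.82%, 14.93% and 2.99%, and 0.002%, matching the corresponding table entries.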