
Table 1 LocNuclei confusion matrix for 13 nuclear sub-structures

From: Detailed prediction of protein sub-nuclear localization

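| Predicted (rows) \ Observed (columns) | Chromatin | Nucleolus | Nuclear speckle | PML body | Nuclear lamina | Nuclear matrix | Nuclear envelope | Cajal body | Nuclear pore complex | Nucleoplasm | Kinetochore | Spindle apparatus | Perinucleolar | SUM predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chromatin | **506** | 32 | 11 | 7 | 0 | 6 | 3 | 1 | 1 | 4 | 6 | 1 | 1 | 579 |
| Nucleolus | 49 | **461** | 29 | 11 | 4 | 12 | 3 | 6 | 2 | 3 | 2 | 1 | 2 | 585 |
| Nuclear speckle | 14 | 19 | **153** | 2 | 2 | 4 | 1 | 1 | 0 | 2 | 1 | 0 | 0 | 199 |
| PML body | 12 | 13 | 9 | **38** | 2 | 3 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 82 |
| Nuclear lamina | 5 | 6 | 3 | 2 | **41** | 3 | 7 | 0 | 2 | 0 | 1 | 0 | 0 | 70 |
| Nuclear matrix | 7 | 9 | 5 | 4 | 2 | **25** | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 54 |
| Nuclear envelope | 4 | 4 | 1 | 0 | 3 | 1 | **34** | 0 | 6 | 0 | 1 | 0 | 0 | 54 |
| Cajal body | 3 | 5 | 2 | 0 | 1 | 1 | 0 | **10** | 0 | 1 | 0 | 0 | 0 | 23 |
| Nuclear pore complex | 4 | 4 | 1 | 2 | 3 | 0 | 6 | 0 | **15** | 0 | 1 | 0 | 0 | 36 |
| Nucleoplasm | 39 | 38 | 30 | 11 | 7 | 8 | 5 | 10 | 3 | **13** | 2 | 1 | 2 | 169 |
| Kinetochore | 4 | 5 | 0 | 1 | 1 | 0 | 1 | 1 | 2 | 2 | **5** | 0 | 0 | 22 |
| Spindle apparatus | 32 | 27 | 19 | 8 | 10 | 9 | 7 | 5 | 1 | 1 | 1 | **8** | 1 | 129 |
| Perinucleolar | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | **4** | 8 |
| None | 18 | 27 | 29 | 9 | 4 | 2 | 2 | 6 | 1 | 2 | 4 | 3 | 3 | 110 |
| % observed | 33 | 31 | 14 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | |
| SUM observed | 697 | 653 | 292 | 95 | 80 | 74 | 72 | 42 | 34 | 29 | 25 | 14 | 13 | |
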
  1. Confusion matrix for LocNuclei predictions on the development set. Columns give the number of observed proteins and rows the number of predicted proteins per sub-structure (see the sums in the last column and the last row). Correct predictions, on the diagonal, are highlighted in bold. The sub-structures are sorted by the number of available annotations (smallest classes at the bottom/right). The dataset was highly imbalanced across sub-structures: for example, only 27 proteins in total were annotated in spindle apparatus and perinucleolar, and the smallest seven of the 13 classes together accounted for only 11% of all annotated, unique proteins (percentages are given in the row “% observed”). Performance was largely proportional to class size, i.e. worse for smaller classes. Nevertheless, LocNuclei succeeded in predicting compartments with only a few samples in the training set, e.g. 8 of the 14 proteins located in the spindle apparatus were correctly predicted.
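
The relationship between class size and performance described in the note can be checked directly from the counts in the table. The following is a minimal sketch, not the authors' evaluation code: it derives a per-class recall (diagonal divided by the column sum) and precision (diagonal divided by the row sum) from the matrix. The names `CLASSES`, `MATRIX`, `column_sums`, `per_class_scores` and `smallest_share` are illustrative, and treating the "None" row as proteins that received no sub-structure prediction is an assumption. These derived values are not necessarily the performance measures reported in the article.

```python
# Class labels in the table's order (largest to smallest class).
CLASSES = [
    "Chromatin", "Nucleolus", "Nuclear speckle", "PML body", "Nuclear lamina",
    "Nuclear matrix", "Nuclear envelope", "Cajal body", "Nuclear pore complex",
    "Nucleoplasm", "Kinetochore", "Spindle apparatus", "Perinucleolar",
]

# Rows = predicted class, columns = observed class (counts copied from Table 1).
# The last row is the table's "None" row (assumed: no sub-structure predicted).
MATRIX = [
    [506, 32, 11, 7, 0, 6, 3, 1, 1, 4, 6, 1, 1],
    [49, 461, 29, 11, 4, 12, 3, 6, 2, 3, 2, 1, 2],
    [14, 19, 153, 2, 2, 4, 1, 1, 0, 2, 1, 0, 0],
    [12, 13, 9, 38, 2, 3, 2, 1, 1, 0, 1, 0, 0],
    [5, 6, 3, 2, 41, 3, 7, 0, 2, 0, 1, 0, 0],
    [7, 9, 5, 4, 2, 25, 1, 0, 0, 1, 0, 0, 0],
    [4, 4, 1, 0, 3, 1, 34, 0, 6, 0, 1, 0, 0],
    [3, 5, 2, 0, 1, 1, 0, 10, 0, 1, 0, 0, 0],
    [4, 4, 1, 2, 3, 0, 6, 0, 15, 0, 1, 0, 0],
    [39, 38, 30, 11, 7, 8, 5, 10, 3, 13, 2, 1, 2],
    [4, 5, 0, 1, 1, 0, 1, 1, 2, 2, 5, 0, 0],
    [32, 27, 19, 8, 10, 9, 7, 5, 1, 1, 1, 8, 1],
    [0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4],
    [18, 27, 29, 9, 4, 2, 2, 6, 1, 2, 4, 3, 3],
]


def column_sums(matrix):
    """Number of observed annotations per class (the 'SUM observed' row)."""
    return [sum(column) for column in zip(*matrix)]


def per_class_scores(matrix, classes):
    """Recall and precision per class, derived from the confusion matrix."""
    observed = column_sums(matrix)
    scores = {}
    for i, name in enumerate(classes):
        correct = matrix[i][i]      # diagonal entry
        predicted = sum(matrix[i])  # the 'SUM predicted' column
        scores[name] = {
            "recall": correct / observed[i],
            "precision": correct / predicted,
        }
    return scores


def smallest_share(matrix, k):
    """Fraction of all observed annotations that fall in the k smallest classes."""
    observed = sorted(column_sums(matrix))
    return sum(observed[:k]) / sum(observed)


if __name__ == "__main__":
    for name, s in per_class_scores(MATRIX, CLASSES).items():
        print(f"{name:22s} recall={s['recall']:.2f} precision={s['precision']:.2f}")
    print(f"Share of the 7 smallest classes: {smallest_share(MATRIX, 7):.1%}")
```

Under these definitions the spindle apparatus recall is 8/14, matching the example in the note, and the seven smallest classes hold about 11% of the observed annotations.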