Table 1 LocNuclei confusion matrix for 13 nuclear sub-structures

From: Detailed prediction of protein sub-nuclear localization

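| Predicted (rows) / Observed (columns) | Chromatin | Nucleolus | Nuclear speckle | PML body | Nuclear lamina | Nuclear matrix | Nuclear envelope | Cajal body | Nuclear pore complex | Nucleoplasm | Kinetochore | Spindle apparatus | Perinucleolar | SUM predicted |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chromatin | **506** | 32 | 11 | 7 | 0 | 6 | 3 | 1 | 1 | 4 | 6 | 1 | 1 | 579 |
| Nucleolus | 49 | **461** | 29 | 11 | 4 | 12 | 3 | 6 | 2 | 3 | 2 | 1 | 2 | 585 |
| Nuclear speckle | 14 | 19 | **153** | 2 | 2 | 4 | 1 | 1 | 0 | 2 | 1 | 0 | 0 | 199 |
| PML body | 12 | 13 | 9 | **38** | 2 | 3 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 82 |
| Nuclear lamina | 5 | 6 | 3 | 2 | **41** | 3 | 7 | 0 | 2 | 0 | 1 | 0 | 0 | 70 |
| Nuclear matrix | 7 | 9 | 5 | 4 | 2 | **25** | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 54 |
| Nuclear envelope | 4 | 4 | 1 | 0 | 3 | 1 | **34** | 0 | 6 | 0 | 1 | 0 | 0 | 54 |
| Cajal body | 3 | 5 | 2 | 0 | 1 | 1 | 0 | **10** | 0 | 1 | 0 | 0 | 0 | 23 |
| Nuclear pore complex | 4 | 4 | 1 | 2 | 3 | 0 | 6 | 0 | **15** | 0 | 1 | 0 | 0 | 36 |
| Nucleoplasm | 39 | 38 | 30 | 11 | 7 | 8 | 5 | 10 | 3 | **13** | 2 | 1 | 2 | 169 |
| Kinetochore | 4 | 5 | 0 | 1 | 1 | 0 | 1 | 1 | 2 | 2 | **5** | 0 | 0 | 22 |
| Spindle apparatus | 32 | 27 | 19 | 8 | 10 | 9 | 7 | 5 | 1 | 1 | 1 | **8** | 1 | 129 |
| Perinucleolar | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | **4** | 8 |
| None | 18 | 27 | 29 | 9 | 4 | 2 | 2 | 6 | 1 | 2 | 4 | 3 | 3 | 110 |
| % observed | 33 | 31 | 14 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | |
| SUM observed | 697 | 653 | 292 | 95 | 80 | 74 | 72 | 42 | 34 | 29 | 25 | 14 | 13 | |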
  1. Confusion matrix for LocNuclei predictions on the development set. Columns give the number of observed proteins and rows the number of predicted proteins per sub-structure (the corresponding sums are in the last row and the last column, respectively). Correct predictions, on the diagonal, are highlighted in bold. The sub-structures are sorted by the number of available annotations (smallest classes at the bottom/right). The dataset was highly imbalanced across sub-structures: only 27 proteins in total were annotated as spindle apparatus or perinucleolar, and the seven smallest of the 13 classes together accounted for only 11% of all annotated, unique proteins (percentages are given in the row “% observed”). Performance was largely proportional to class size, i.e. worse for smaller classes. Nevertheless, LocNuclei succeeded in predicting compartments with only a few samples in the training set, e.g. 8 of the 14 proteins located in the spindle apparatus were correctly predicted.
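
The per-class figures quoted in the caption can be recomputed from the counts in Table 1. The following is a minimal sketch, not the authors' evaluation code: the names `LABELS`, `PREDICTED`, `NONE_ROW`, `observed_totals` and `per_class_stats` are hypothetical, and "recall" and "precision" are used here in their usual sense (diagonal divided by the column sum and by the row sum, respectively), which may differ from the measures reported in the article. It also assumes that the "None" row counts observed proteins for which no sub-structure was predicted.

```python
# Class labels in the column order of Table 1 (largest class first).
LABELS = [
    "Chromatin", "Nucleolus", "Nuclear speckle", "PML body",
    "Nuclear lamina", "Nuclear matrix", "Nuclear envelope", "Cajal body",
    "Nuclear pore complex", "Nucleoplasm", "Kinetochore",
    "Spindle apparatus", "Perinucleolar",
]

# Rows = predicted class, columns = observed class (counts from Table 1).
PREDICTED = [
    [506, 32, 11, 7, 0, 6, 3, 1, 1, 4, 6, 1, 1],
    [49, 461, 29, 11, 4, 12, 3, 6, 2, 3, 2, 1, 2],
    [14, 19, 153, 2, 2, 4, 1, 1, 0, 2, 1, 0, 0],
    [12, 13, 9, 38, 2, 3, 2, 1, 1, 0, 1, 0, 0],
    [5, 6, 3, 2, 41, 3, 7, 0, 2, 0, 1, 0, 0],
    [7, 9, 5, 4, 2, 25, 1, 0, 0, 1, 0, 0, 0],
    [4, 4, 1, 0, 3, 1, 34, 0, 6, 0, 1, 0, 0],
    [3, 5, 2, 0, 1, 1, 0, 10, 0, 1, 0, 0, 0],
    [4, 4, 1, 2, 3, 0, 6, 0, 15, 0, 1, 0, 0],
    [39, 38, 30, 11, 7, 8, 5, 10, 3, 13, 2, 1, 2],
    [4, 5, 0, 1, 1, 0, 1, 1, 2, 2, 5, 0, 0],
    [32, 27, 19, 8, 10, 9, 7, 5, 1, 1, 1, 8, 1],
    [0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4],
]

# "None" row of Table 1 (assumed: observed proteins without any prediction).
NONE_ROW = [18, 27, 29, 9, 4, 2, 2, 6, 1, 2, 4, 3, 3]


def observed_totals():
    """Column sums: proteins observed per class, including the 'None' row."""
    rows = PREDICTED + [NONE_ROW]
    return [sum(row[j] for row in rows) for j in range(len(LABELS))]


def per_class_stats():
    """Recall (diagonal / column sum) and precision (diagonal / row sum)."""
    observed = observed_totals()
    stats = {}
    for i, label in enumerate(LABELS):
        correct = PREDICTED[i][i]
        predicted = sum(PREDICTED[i])
        stats[label] = {
            "observed": observed[i],
            "recall": correct / observed[i],
            "precision": correct / predicted if predicted else 0.0,
        }
    return stats


if __name__ == "__main__":
    for label, s in per_class_stats().items():
        print(f"{label:22s} n={s['observed']:4d} "
              f"recall={s['recall']:.2f} precision={s['precision']:.2f}")
```

Under these assumptions the column sums reproduce the "SUM observed" row, and the spindle apparatus recall is 8/14, matching the example given in the caption.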