
Table 1 Experimental results on the remote homology detection task for all competing methods using the triple(1,3) kernel.

From: Efficient use of unlabeled data for protein sequence classification: a comparative study

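| dataset | ROC (neighborhood, no clustering) | ROC50 (neighborhood, no clustering) | p-value (neighborhood, no clustering) | ROC (clustered neighborhood) | ROC50 (clustered neighborhood) | p-value (clustered neighborhood) |
|---|---|---|---|---|---|---|
| PDB | | | | | | |
| full sequence | .9476 | .7582 | - | .9515 | .7633 | - |
| region | .9708 | .8265 | .0069 | .9716 | .8246 | .0045 |
| no tails (full seq.) | .9443 | .7522 | .5401 | .9472 | .7559 | .5324 |
| max length (full seq.) | .9471 | .7497 | .4407 | .9536 | .7584 | .5468 |
| Swiss-Prot | | | | | | |
| full sequence | .9245 | .6908 | - | .9464 | .7474 | - |
| region | .9752 | .8556 | 2.46e-04 | .9732 | .8605 | 1.5e-03 |
| no tails (full seq.) | .9361 | .6938 | .8621 | .9395 | .7160 | .6259 |
| max length (full seq.) | .9300 | .6514 | .2589 | .9348 | .6817 | .1369 |
| NR | | | | | | |
| full sequence | .9419 | .7328 | - | .9556 | .7566 | - |
| region | .9824 | .8861 | 1.08e-05 | .9861 | .8944 | 2.2e-05 |
| no tails (full seq.) | .9575 | .7438 | .6640 | .9602 | .7486 | .8507 |
| max length (full seq.) | .9513 | .7401 | .8656 | .9528 | .7595 | .8696 |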
* p-value: signed-rank test on ROC50 scores against the full sequence result in the corresponding setting.
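
The ROC50 columns are not defined on this page. As a rough illustration only, the sketch below assumes the commonly used ROC-n definition: the area under the ROC curve up to the first n false positives, normalized so that a perfect ranking scores 1. The function name `roc_n` and the toy scores are hypothetical and are not taken from the study.

```python
def roc_n(scores, labels, n=50):
    """ROC-n score under the assumed definition: normalized area under
    the ROC curve up to the first n false positives.

    scores: higher means more confidently positive.
    labels: truthy for positives, falsy for negatives.
    Ties in scores are broken by input order (a simplification).
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    num_pos = sum(1 for _, label in ranked if label)
    num_neg = len(ranked) - num_pos
    # If fewer than n negatives exist, use all of them.
    limit = min(n, num_neg)
    if num_pos == 0 or limit == 0:
        raise ValueError("need at least one positive and one negative")

    true_pos = 0
    false_pos = 0
    area = 0
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            # Each false positive adds the number of positives ranked above it.
            false_pos += 1
            area += true_pos
            if false_pos == limit:
                break
    return area / (limit * num_pos)


# Toy example (made-up scores): two positives, three negatives, n = 2.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
print(roc_n(toy_scores, toy_labels, n=2))  # 0.75
```

Per-family scores of this kind, computed for two methods, are the paired inputs a signed-rank test would compare.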