Table 2. Experimental results for all competing methods on the remote homology detection task using the mismatch(5,1) kernel.

From: Efficient use of unlabeled data for protein sequence classification: a comparative study

                          Neighborhood (no clustering)    Clustered neighborhood
Dataset                   ROC     ROC50   p-value*        ROC     ROC50   p-value*
PDB
  full sequence           .9389   .7203   -               .9414   .7230   -
  region                  .9698   .8048   .0075           .9705   .8038   .0020
  no tails (full seq.)    .9379   .7287   .9390           .9378   .7301   .7605
  max length (full seq.)  .9457   .7359   .4725           .9526   .7491   .3817
Swiss-Prot
  full sequence           .9253   .6685   -               .9378   .7258   -
  region                  .9757   .8280   .0060           .9773   .8414   .0108
  no tails (full seq.)    .9290   .6750   .9813           .9344   .6874   .5600
  max length (full seq.)  .9185   .6094   .1436           .9223   .6201   .0279
NR
  full sequence           .9475   .7233   -               .9544   .7510   -
  region                  .9837   .8824   1.7e-04         .9874   .8885   1.2e-04
  no tails (full seq.)    .9554   .7083   .7930           .9584   .7211   .7501
  max length (full seq.)  .9508   .7421   .7578           .9518   .7613   .9387
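* p-value: signed-rank test comparing the ROC50 scores of each method with those of the full-sequence method in the same dataset and neighborhood setting; "-" marks the full-sequence baseline itself.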
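The ROC and ROC50 columns and the signed-rank p-values can be computed from ranking scores roughly as follows. This is a minimal Python sketch, not the authors' evaluation code. It assumes ROC50 is the usual ROC_n score (area under the ROC curve up to the 50th false positive, normalized so a perfect ranking scores 1.0), breaks score ties by input order, and uses an exact two-sided signed-rank test with average ranks for tied differences. If the authors used a normal approximation instead, p-values from this sketch may differ slightly from those in the table. The function names `roc_n` and `signed_rank_test` are hypothetical.

```python
def roc_n(scores, labels, n=50):
    """ROC_n: area under the ROC curve up to the n-th false positive,
    normalized so a perfect ranking scores 1.0. n=None gives the full ROC."""
    # Rank by decreasing score; Python's sort is stable, so ties keep
    # input order (an assumption; other tie rules shift the score slightly).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for _, is_pos in ranked if is_pos)
    total_neg = len(ranked) - total_pos
    if n is None:
        n = total_neg
    if total_pos == 0 or n == 0 or total_neg < n:
        raise ValueError("need at least one positive and at least n negatives")
    tp = fp = 0
    area = 0
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
            area += tp  # true positives ranked above this false positive
            if fp == n:
                break
    return area / (n * total_pos)


def signed_rank_test(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    m = len(diffs)
    if m == 0:
        return 1.0
    # Rank |d| with average ranks for ties, doubled so ranks stay integers.
    order = sorted(range(m), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * m
    i = 0
    while i < m:
        j = i
        while j + 1 < m and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Sorted positions i..j share ranks i+1..j+1; doubled mean is i+j+2.
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Null distribution: each rank joins W+ independently with probability 1/2.
    # counts[s] = number of sign assignments whose (doubled) W+ equals s.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_assign = 2 ** m
    p_low = sum(counts[: w_plus + 1]) / n_assign
    p_high = sum(counts[w_plus:]) / n_assign
    return min(1.0, 2 * min(p_low, p_high))
```

For example, with hypothetical lists `roc50_region` and `roc50_full` holding one ROC50 score per test family, `signed_rank_test(roc50_region, roc50_full)` yields the kind of p-value reported in the "region" rows. The exact null distribution is built by counting sign assignments, which stays cheap for a few dozen paired scores.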