Table 1 Prediction results for DNA-binding sites in 62 proteins, using different data sets to generate the position-specific scoring matrices (PSSMs).

From: PSSM-based prediction of DNA binding sites in proteins

Reference data                  Overall correct predictions (%)  Sensitivity S1 (%)  Specificity S2 (%)  Net prediction (S1+S2)/2 (%)
Sequence only (no PSSM)         73.6                             40.6                76.2                58.4 (2.5)
PDNA-NR90 (375 sequences)       63.8                             65.9                63.4                64.6 (2.1)
PDNA-RDN (1,386 sequences)      64.0                             67.1                63.3                65.2 (2.1)
NCBI-NR (1,547,365 sequences)   66.7                             69.5                63.9                66.7 (1.4)
PDB-ALL (47,179 sequences)      62.6                             65.6                61.8                64.7 (1.8)
PIR (283,177 sequences)         66.4                             68.2                66.0                67.1 (2.7)
1. PDNA refers to sequences from protein-DNA complexes in the Protein Data Bank (PDB); NR90 means non-redundant at 90% sequence identity; RDN means redundant, i.e., similar proteins have not been removed. Values in parentheses are standard deviations over the six cross-validation sets. The sensitivity and specificity shown for each data set are only the pair that gives its best net prediction. Because these two scores trade off against each other as the cutoff threshold is changed (as described in the text), data sets should be compared only on the net prediction (last column), which is the score optimized during training.
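The scoring described in the footnote can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each residue has a numeric prediction score, labels are encoded as 1 (binding) and 0 (non-binding), S1 is the percentage of binding residues predicted correctly, and S2 is the percentage of non-binding residues predicted correctly. The helper names `evaluate` and `best_cutoff` and the toy data are hypothetical. Sweeping the cutoff shows S1 and S2 moving in opposite directions, and the cutoff is picked to maximize the net prediction (S1+S2)/2.

```python
from typing import Dict, Sequence


def evaluate(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Dict[str, float]:
    """Per-residue metrics at one cutoff (hypothetical helper).

    Assumed encoding: label 1 = DNA-binding residue, 0 = non-binding residue.
    A residue is predicted as binding when its score is >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_binding = score >= threshold
        if label == 1:
            if predicted_binding:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_binding:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one binding and one non-binding residue")
    s1 = 100.0 * tp / (tp + fn)  # sensitivity: % of binding residues recovered
    s2 = 100.0 * tn / (tn + fp)  # specificity (assumed definition): % of non-binding residues recovered
    overall = 100.0 * (tp + tn) / len(labels)  # % of all residues predicted correctly
    return {"overall": overall, "S1": s1, "S2": s2, "net": (s1 + s2) / 2}


def best_cutoff(scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]) -> float:
    """Return the candidate cutoff that maximizes the net prediction (S1 + S2) / 2."""
    return max(thresholds, key=lambda t: evaluate(scores, labels, t)["net"])


# Toy data invented for illustration: 3 binding and 5 non-binding residues.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
for t in (0.25, 0.5, 0.75):
    m = evaluate(scores, labels, t)
    print(f"cutoff={t}: S1={m['S1']:.1f} S2={m['S2']:.1f} net={m['net']:.1f}")
print("best cutoff:", best_cutoff(scores, labels, [0.25, 0.5, 0.75]))
```

On this toy data, raising the cutoff from 0.25 to 0.75 lowers S1 from 100.0 to 66.7 while raising S2 from 40.0 to 100.0; the net predictions at 0.25, 0.5 and 0.75 are 70.0, 63.3 and 83.3, so `best_cutoff` returns 0.75.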