Fig. 3 | BMC Bioinformatics

Fig. 3

From: String kernels for protein sequence comparisons: improved fold recognition

Parameterizing the string kernel SeqKernel. The string kernel introduced in this paper depends on two parameters, β and k_max (see text for details). We varied these parameters over the ranges [10⁻⁵, 1] and [1, 20], respectively. For each pair of values, we applied the corresponding kernel to compute the similarities of all pairs of proteins in CATH2833 and in CATH793, and compared the rankings of these similarities with the CATH classification of the proteins using a ROC analysis. The resulting AUC values are reported in panels a and b, respectively. Higher AUC values indicate better fold recognition. Note the different behaviors on the two datasets (panel a vs. panel b). In panels c and d, we report, in parallel, the Pearson correlation coefficients between the kernel similarity measures and the STRUCTAL SAS values, for all pairs of parameters considered. Because we assess the performance of SeqKernel in fold recognition, the SeqKernel values are expected to mimic the SAS scores; therefore, the larger the correlation coefficient, the better the performance of SeqKernel.
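The parameter scan described in the caption can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function names (`roc_auc`, `pearson`, `scan_parameters`) and the `kernel(a, b, beta, k_max)` call signature are hypothetical, SeqKernel itself is not implemented here, and the AUC is computed with the generic pairwise (Mann-Whitney) formulation rather than whatever ROC procedure the paper used.

```python
from itertools import product
from math import sqrt


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are truthy for protein pairs sharing a classification
    (e.g. the same fold) and falsy otherwise; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def scan_parameters(kernel, pairs, same_class, reference, betas, k_maxes):
    """Evaluate a kernel on every (beta, k_max) combination.

    kernel     -- hypothetical callable kernel(a, b, beta, k_max) -> float
    pairs      -- list of (sequence_a, sequence_b) tuples
    same_class -- one label per pair, truthy if both share a class
    reference  -- one structural score per pair (stand-in for SAS values)
    Returns {(beta, k_max): (auc, pearson_r)}.
    """
    results = {}
    for beta, k_max in product(betas, k_maxes):
        sims = [kernel(a, b, beta, k_max) for a, b in pairs]
        results[(beta, k_max)] = (roc_auc(sims, same_class),
                                  pearson(sims, reference))
    return results
```

With a real kernel plugged in, each entry of the returned dictionary would correspond to one grid point of the kind plotted in panels a and b (AUC) and c and d (correlation).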
