Fig. 5 | BMC Bioinformatics

Fig. 5

From: String kernels for protein sequence comparisons: improved fold recognition

Classifying proteins into structural groups based on sequence-based distances. Three levels of structural classification are considered: the C(lass), A(rchitecture), and T(opology) levels defined by CATH. Each protein is assigned to the known group to which its distance is shortest, where the distance is one of seven sequence-based distances between proteins: a RANDOM distance; a FASTA distance based on alignment; two distances based on the string kernel defined in this work, corresponding to two parameter settings, (β, k_max) = (0.2, 10) and (β, k_max) = (0.0001, 2); and three other string kernel distances, Subseq [28], Spectrum [31], and WDegree, a weighted string kernel [33,34]. We also include results based on the STRUCTAL SAS scores; those results use structural information and should be considered for reference only. The classification accuracy (y-axis, in %) is the ratio of the number of correctly classified proteins to the total number of test proteins (see text for details).
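The classification rule and the accuracy measure described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the distance from a protein to a group is the smallest distance to any member of that group, which the caption does not spell out. The names `classify_nearest_group`, `accuracy_percent`, and `toy_distance` are hypothetical, and `toy_distance` is a simple mismatch count standing in for any of the seven distances. The sequences are made-up toy data.

```python
from typing import Callable, Dict, List, Tuple

Distance = Callable[[str, str], float]


def toy_distance(a: str, b: str) -> float:
    """Hypothetical stand-in distance: mismatched positions plus length gap.

    Any of the sequence-based distances named in the caption could be
    plugged in here instead; this one is only for illustration.
    """
    mismatches = sum(x != y for x, y in zip(a, b))
    return float(mismatches + abs(len(a) - len(b)))


def classify_nearest_group(
    query: str, groups: Dict[str, List[str]], distance: Distance
) -> str:
    """Return the label of the group holding the closest known protein.

    Assumption: distance to a group = minimum distance to its members.
    Ties are resolved in favor of the group seen first.
    """
    best_label, best_dist = "", float("inf")
    for label, members in groups.items():
        for member in members:
            d = distance(query, member)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


def accuracy_percent(
    test_set: List[Tuple[str, str]],
    groups: Dict[str, List[str]],
    distance: Distance,
) -> float:
    """Percentage of test proteins assigned to their true group."""
    correct = sum(
        1
        for sequence, true_label in test_set
        if classify_nearest_group(sequence, groups, distance) == true_label
    )
    return 100.0 * correct / len(test_set)


# Toy data: two known groups and three labeled test sequences.
groups = {"alpha": ["AAAA", "AAAT"], "beta": ["GGGG", "GGGC"]}
test_set = [("AAAC", "alpha"), ("AGGG", "beta"), ("AAAG", "beta")]

accuracy = accuracy_percent(test_set, groups, toy_distance)
print(f"accuracy: {accuracy:.1f}%")
```

In this toy run the third sequence is closer to the "alpha" group than to its true "beta" group, so two of the three test sequences are classified correctly. Swapping `toy_distance` for another distance function changes only the argument passed to `accuracy_percent`, which mirrors how the figure compares several distances under one classification rule.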
