Figure 1 | BMC Bioinformatics

Figure 1

From: Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines


Flow chart of the method. Panel A shows a pair of interacting domain families, PfamA and PfamB, for which known 3D structures of protein pairs spanning the two families confirm the interaction. The consensus sequence of each alignment and the set of interacting positions are used to produce random sequences from each family. Panel B shows the architecture of the two ipHMMs, which contain interacting states (interface residues), marked M_i, and non-interacting states, marked M_ni. Each protein sequence (positive or negative) is aligned to its corresponding ipHMM, and the sufficient statistics of this alignment are used to characterize the sequence as a Fisher vector (panel C). In panel D, feature selection is performed on the entire dataset using SVDPO; all the dimensionality-reduced vectors can then be placed in the same vector space, ready for training a support vector machine (SVM). The blue box shows a query sequence pair, in which each protein aligns to one of the domain families. Random negative examples are generated again, this time for testing. Panels B, C and D work the same way as in training. In panel E, the SVM classifies the test examples. The distances of all test examples to the hyperplane form a histogram (panel F), in which the query pair, if it is an actual interacting pair, is expected to have a large Z-score.
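
The Z-score step in panel F can be illustrated with a minimal sketch. It assumes that the query pair's signed distance to the SVM hyperplane is standardized against the distances of the random negative test examples, using the sample standard deviation; the caption does not state the exact reference distribution, so this is one plausible reading. The function name `query_z_score` and the numeric values are hypothetical and serve only as illustration.

```python
from statistics import mean, stdev


def query_z_score(query_distance, background_distances):
    """Standardize the query's distance to the hyperplane against a
    background set of distances (assumed here: random negative examples)."""
    if len(background_distances) < 2:
        raise ValueError("need at least two background distances")
    mu = mean(background_distances)
    sigma = stdev(background_distances)  # sample standard deviation
    if sigma == 0:
        raise ValueError("background distances have zero variance")
    return (query_distance - mu) / sigma


# Hypothetical signed distances to the hyperplane (illustrative values only).
background = [-1.2, -0.8, -1.0, -0.9, -1.1]
z = query_z_score(1.5, background)
print(f"Z-score of the query pair: {z:.2f}")
```

Under this reading, a query pair that lies far on the positive side of the hyperplane relative to the random negatives receives a large positive Z-score, which is the behavior the caption describes for a true interacting pair.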
