Figure 1 | BMC Bioinformatics

Figure 1

From: Choosing negative examples for the prediction of protein-protein interactions

The dependence of prediction accuracy, quantified by the area under the ROC/ROC50 curves, on the co-localization threshold used to choose negative examples. Requiring that no pair of proteins in the set of negative examples has a GO component similarity greater than a given threshold (the co-localization threshold) constrains the distribution of negative examples. This constraint makes it easier for the classifier to distinguish positive from negative examples, and the effect becomes stronger as the co-localization threshold decreases. All methods are SVM-based classifiers trained with different kernels on two interaction datasets. Results are computed using five-fold cross-validation and averaged over five drawings of negative examples. The spectrum kernel method uses pairs of k-mers as features; the motif method uses the composition of discrete sequence motifs; and the non-sequence method uses features such as co-expression measured in microarray experiments and similarity in GO process and function annotations. The experiments were performed on two yeast physical interaction datasets. The BIND data is derived from the BIND database; the experiments using the non-sequence data were performed on a subset of reliable BIND interactions that are found by multiple assays. DIP/MIPS is a dataset of reliable interactions derived from the DIP and MIPS databases.
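
The thresholded selection of negative examples described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `choose_negatives`, the toy protein identifiers, and the similarity scores are all hypothetical, and the GO component similarity is replaced by a simple lookup table. The sketch only shows how a lower co-localization threshold shrinks the pool of admissible negative pairs.

```python
import itertools
import random

def choose_negatives(proteins, positives, similarity, threshold, n, seed=0):
    """Sample up to n non-interacting pairs whose similarity is <= threshold.

    `similarity` is any function returning a score for a pair of proteins;
    here it stands in for a GO component similarity (an assumption).
    """
    positive_set = {frozenset(pair) for pair in positives}
    candidates = [
        (a, b)
        for a, b in itertools.combinations(sorted(proteins), 2)
        if frozenset((a, b)) not in positive_set   # skip known interactions
        and similarity(a, b) <= threshold          # enforce the threshold
    ]
    rng = random.Random(seed)   # seeded so repeated drawings are reproducible
    rng.shuffle(candidates)
    return candidates[:n]

# Hypothetical toy data: four proteins, one known interaction,
# and made-up similarity scores in [0, 1].
proteins = ["P1", "P2", "P3", "P4"]
positives = [("P1", "P2")]
toy_scores = {
    frozenset(("P1", "P2")): 1.0,
    frozenset(("P1", "P3")): 0.9,
    frozenset(("P1", "P4")): 0.2,
    frozenset(("P2", "P3")): 0.1,
    frozenset(("P2", "P4")): 0.8,
    frozenset(("P3", "P4")): 0.5,
}

def toy_similarity(a, b):
    """Look up the made-up score for an unordered pair."""
    return toy_scores.get(frozenset((a, b)), 0.0)

loose = choose_negatives(proteins, positives, toy_similarity, 1.0, 10)
strict = choose_negatives(proteins, positives, toy_similarity, 0.3, 10)
```

With these toy scores, the strict threshold admits fewer candidate pairs than the loose one. Varying `seed` would correspond to separate drawings of negative examples, over which results could then be averaged.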
