Comparison of methods for predicting protein-protein interactions. The top three panels plot average precision, TP/(TP+FP), as a function of recall, TP/(TP+FN), for the core, core subset, and small-scale benchmarks. Each precision value is averaged across the 15 splits of 3×5cv (three repetitions of five-fold cross-validation) and is estimated on test sets in which the negative examples are not downsampled. In the lower three panels, an edge from method A to method B indicates that A outperforms B at p < 0.05 according to a Wilcoxon signed-rank test applied to the area under the precision-recall curve, computed separately for each of the 15 splits of 3×5cv. Redundant edges have been removed for clarity; i.e., the figure shows the transitive reduction of the full graph.
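
The construction of the lower panels can be illustrated with a minimal sketch. It is not the authors' code: the method names A, B, and C and their per-split scores are synthetic placeholders, the significance level `alpha` is an assumed parameter, and the test is a one-sided exact signed-rank test that assumes no zero or tied differences. Given one area-under-the-precision-recall-curve value per split for each method, it draws an edge for every significant pairwise win and then removes edges implied by longer paths.

```python
def signed_rank_p(a, b):
    """One-sided exact Wilcoxon signed-rank p-value for 'a tends to exceed b'.

    Assumes paired scores with no zero differences and no tied
    absolute differences (a simplification for this sketch).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ is the sum of the ranks of the positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    max_w = n * (n + 1) // 2
    # counts[w] = number of sign assignments giving W+ == w under the null.
    counts = [1] + [0] * max_w
    for rank in range(1, n + 1):
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return sum(counts[w_plus:]) / 2 ** n


def transitive_reduction(edges):
    """Drop each edge (u, v) for which another path from u to v exists.

    Assumes the edge set forms a directed acyclic graph.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(start, target, skip):
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(u, v) for u, v in edges if not reachable(u, v, skip=(u, v))}


# Synthetic per-split scores for three hypothetical methods (15 splits each).
scores = {
    "A": [0.80 + 0.001 * i for i in range(15)],
    "B": [0.70 + 0.002 * i for i in range(15)],
    "C": [0.60 - 0.003 * i for i in range(15)],
}
alpha = 0.05  # assumed significance level

# Edge (m1, m2) means m1 significantly outperforms m2.
full = {
    (m1, m2)
    for m1 in scores
    for m2 in scores
    if m1 != m2 and signed_rank_p(scores[m1], scores[m2]) < alpha
}
reduced = transitive_reduction(full)
print(sorted(full))
print(sorted(reduced))
```

In this toy input A beats B and B beats C on every split, so the edge from A to C is implied by the path through B and is dropped by the reduction.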