From: Automatic discovery of cross-family sequence features associated with protein function

Comparison of location vs. function predictors. In panel A, the consensus prediction scores from two different fixed-target predictors over 537 test set sequences are shown as a scatter plot. The red points show scores for two identical but independently trained "nuclear" predictors. As expected, the scores of these two predictors are strongly correlated. The blue points show scores from a "nuclear" predictor plotted against the scores from a "transcription" predictor. These scores are still quite well correlated, but the distribution of points mainly below the diagonal suggests that proteins that get high scores for "nuclear" do not always have equally high scores for "transcription". This agrees with the general observation that not all nuclear proteins are involved in transcription (but all transcription proteins are nuclear). In panel B, accuracy vs. coverage plots are shown for the four combinations of predictors trained and/or tested on "nuclear" and/or "transcription". The data shown here are for the pooled test set proteins from a four-fold cross-validation experiment. The noteworthy result is the increased performance of the "transcription"-trained predictor (blue line) compared to the "nuclear"-trained predictor (magenta line) when predicting "transcription". Panels C & D show the equivalent data for "secreted" vs. "inhibits" predictors. Panels E & F show the equivalent data for "cytoplasmic" vs. "biosynthesis" predictors.

