Fig. 2 | BMC Bioinformatics

Fig. 2

From: Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data


High-resolution structural information improves protein-protein binding affinity prediction. a We plot the Pearson correlation between predicted and experimentally determined binding affinities (pKd values) for the original affinity-prediction model, which incorporates only biochemical structural data (see Methods), and for three models that add crystallographic features (temperature, pH and resolution) as additional parameters. See Methods for model-training details. Error bars indicate standard errors. b We trained affinity-prediction models using high-resolution crystallographic structures (≤2.5 Å), NMR structures, or both. We plot the correlation between predicted and experimentally determined affinities (pKd values) for models trained on each filtered data set (white) and compare these with models trained on the complete database of 622 protein-protein dimers (black) and with models trained on randomly selected subsets of the original data set, each equal in size to the high-resolution training set (gray). Error bars indicate standard errors. c We performed leave-one-out cross-validation to estimate the accuracy expected when affinity-prediction models are applied to new data (see Methods). We plot predicted vs. experimentally determined binding affinities (pKd values) of each cross-validated structural complex for models trained on the complete data set of 622 protein-protein dimers (gray), high-resolution crystallographic data (205 complexes with resolution ≤2.5 Å, red), 165 NMR complexes (orange) and the combined high-resolution + NMR data (370 complexes, blue). We report the best-fit regression line and its standard error, the squared Pearson correlation between predicted and experimentally determined affinities (r²), and the RMSD between predicted and experimental affinities. d We plot the Pearson correlation and the RMSD for models trained on each type of filtered data set. Error bars indicate standard errors.
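The evaluation protocol summarized in panels b to d (leave-one-out cross-validation scored by Pearson r, r² and RMSD, plus a baseline of random subsets equal in size to a filtered training set) can be sketched as below. This is an illustrative sketch only, not the authors' code: it assumes an ordinary least-squares linear model on hypothetical numeric features, and it assumes the random-subset baseline is scored with the same leave-one-out procedure; the actual model and features are described in the paper's Methods. All function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def fit_ols(X, y):
    """Hypothetical stand-in model: y ~ b0 + b1*x1 + ... fit by least squares.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting, where A is X with a leading intercept column.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, x):
    """Apply intercept + linear coefficients to one feature vector."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def pearson_r(a, b):
    """Pearson correlation coefficient r (square it to get r^2)."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))


def rmsd(a, b):
    """Root-mean-square deviation between predicted and measured values."""
    return math.sqrt(mean([(u - v) ** 2 for u, v in zip(a, b)]))


def loocv_predictions(X, y):
    """Leave-one-out CV: refit without complex i, then predict complex i."""
    preds = []
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds


def random_subset_baseline(X, y, size, n_repeats=20, seed=0):
    """LOOCV Pearson r on random subsets of a fixed size.

    Returns the mean r across repeats and its standard error.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(n_repeats):
        idx = rng.sample(range(len(y)), size)
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        rs.append(pearson_r(ys, loocv_predictions(Xs, ys)))
    return mean(rs), stdev(rs) / math.sqrt(n_repeats)


if __name__ == "__main__":
    # Synthetic, made-up data: two hypothetical features and a pKd per complex
    rng = random.Random(1)
    X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(40)]
    y = [1.0 + 0.5 * a - 0.3 * b + rng.gauss(0, 0.5) for a, b in X]
    preds = loocv_predictions(X, y)
    r = pearson_r(y, preds)
    print(f"r = {r:.2f}, r^2 = {r * r:.2f}, RMSD = {rmsd(y, preds):.2f}")
    print("random-subset baseline (mean r, SE):",
          random_subset_baseline(X, y, size=15))
```

Comparing a filtered set against random subsets of the same size separates the effect of data quality from the effect of simply having fewer training examples: if the filtered model beats the equal-size random baseline, the gain is not explained by sample size alone.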
