A meta-learning approach for B-cell conformational epitope prediction

Hu, Yuh-Jyh; Lin, Shun-Chien; Lin, Yu-Lung; Lin, Kuan-Hui; You, Shun-Ning

doi:10.1186/s12859-014-0378-y

Methodology article
Open access
Published: 18 November 2014

Yuh-Jyh Hu^1,2, Shun-Chien Lin^2, Yu-Lung Lin^2, Kuan-Hui Lin^1 & Shun-Ning You^1

BMC Bioinformatics volume 15, Article number: 378 (2014)

Abstract

Background

One of the major challenges in the field of vaccine design is identifying B-cell epitopes in continuously evolving viruses. Various tools have been developed to predict linear or conformational epitopes, each relying on different physicochemical properties and adopting distinct search strategies. We propose a meta-learning approach for epitope prediction based on stacked and cascade generalizations. Through meta learning, we expect a meta learner to be able to integrate multiple prediction models and to outperform the single best-performing model. The objective of this study is twofold: (1) to analyze the complementary predictive strengths of different prediction tools, and (2) to introduce a generic computational model that exploits the synergy among various prediction tools. Our primary goal is not to develop any particular classifier for B-cell epitope prediction, but to argue for the feasibility of meta learning in epitope prediction. With the flexibility of meta learning, researchers can construct various meta classification hierarchies that are applicable to epitope prediction in different protein domains.

Results

We developed hierarchical meta-learning architectures based on stacked and cascade generalizations. The bottom level of the hierarchy consisted of four conformational and four linear epitope prediction tools that served as the base learners. To perform consistent and unbiased comparisons, we tested the meta-learning method on an independent set of antigen proteins that had not been used previously to train the base epitope prediction tools. In addition, we conducted correlation and ablation studies of the base learners in the meta-learning model. Low correlation among the predictions of the base learners suggested that the eight base learners had complementary predictive capabilities. The ablation analysis indicated that the eight base learners interacted differently and contributed to different degrees to the final meta model. The results of the independent test demonstrated that the meta-learning approach markedly outperformed the single best-performing epitope predictor.

Conclusions

Computational B-cell epitope prediction tools exhibit several differences that affect their performances when predicting epitopic regions in protein antigens. The proposed meta-learning approach for epitope prediction combines multiple prediction tools by integrating their complementary predictive strengths. Our experimental results demonstrate the superior performance of the combined approach in comparison with single epitope predictors.

Background

The ability of an antibody to respond to an antigen, such as a virus capsid protein fragment, depends on the antibody's specific recognition of an epitope, which is the antigenic site to which an antibody binds. Based on their structure and interaction with antibodies, epitopes can be divided into two categories: linear and conformational. A linear epitope is formed by a continuous sequence of amino acids, whereas a conformational epitope is composed of discontinuous primary sequences, which are close in three-dimensional space.

Several different approaches exist for predicting linear and conformational epitopes. Previous studies relied on the varying physicochemical properties of amino acids to predict linear epitopes [1]-[3]. A study on 484 amino acid scales revealed that predictions based on even the best-performing scales correlated poorly with experimentally confirmed epitopes [4]. This result prompted the development of machine-learning methods to improve prediction. BepiPred combines amino acid propensity scales with a hidden Markov model to achieve marginal improvement over methods based on physicochemical properties [5]. ABCpred uses artificial neural networks (ANN) for predicting linear B-cell epitopes [6]. Chen et al. proposed the amino acid pair (AAP) antigenicity scale [7] and trained a support vector machine (SVM) classifier that uses the AAP propensity scale to distinguish epitopes from nonepitopes. BCPREDS uses SVM combined with a variety of kernel methods, including string kernels, radial basis kernels, and subsequence kernels, to predict linear B-cell epitopes [8].

An increase in the availability of protein structures has enabled the identification of conformational epitopes by using various computational methods. For example, DiscoTope 2.0 uses a combination of amino acid composition information, spatial neighborhood information, and a surface measure for predicting epitopes [9]. ElliPro uses Thornton's propensities and applies residue clustering to identify epitopes [10]. SEPPA 2.0 predicts conformational epitopes from unit patches of residue triangles and a clustering coefficient that describes local spatial context and compactness, with two additional parameters: accessible surface area (ASA) propensity and a consolidated amino acid index [11]. EPITOPIA combines structural and physicochemical features, and adopts a Bayesian classifier to predict epitopes [12]. EPSVR uses a support vector regression method to predict conformational epitopes. The meta learner EPMeta incorporates consensus results from multiple prediction servers by using a voting mechanism [13].

In this study, we propose combining multiple predictions to improve epitope prediction based on two meta-learning strategies: stacked generalization (stacking) [14],[15] and cascade generalization (cascade) [16],[17]. These strategies work in a hierarchical architecture of meta learners and base learners, in which the input space for meta learners is extended by the predictions of the base learners. We selected several linear and conformational epitope predictors as the base learners, and evaluated four inductive learning algorithms as the meta learners. To evaluate performance, we tested the combined method on an independent set of antigen proteins that, according to the tools' documentation and publications, had not been used previously to train the epitope prediction tools. Our results indicate the potential of meta learning for epitope prediction.
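
As a minimal sketch of the two strategies, assuming their usual formulations (in stacking the meta learner is trained on the base learners' predictions; in cascade the original attributes are extended with those predictions), the Python code below builds both kinds of meta-level input for each residue. The single structural attribute, the two tool scores, the toy labels, and the logistic-regression meta learner are hypothetical stand-ins chosen for illustration; they are not the tools or the four learning algorithms evaluated in this study.

```python
from math import exp


def stacking_input(base_scores):
    """Stacking: the meta learner sees only the base learners' predictions."""
    return list(base_scores)


def cascade_input(residue_features, base_scores):
    """Cascade: the original attributes are extended with the predictions."""
    return list(residue_features) + list(base_scores)


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(rows, labels, epochs=500, rate=0.5):
    """Fit a logistic-regression meta learner by stochastic gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - label
            bias -= rate * error
            weights = [w - rate * error * v for w, v in zip(weights, row)]
    return weights, bias


def predict(model, row):
    """Return the meta learner's epitope probability for one residue."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


# Hypothetical toy data: one structural attribute per residue, scores from two
# made-up base tools, and a label (1 = epitope residue, 0 = non-epitope).
residue_features = [[0.6], [0.5], [0.4], [0.1], [0.7], [0.2]]
base_scores = [[0.9, 0.8], [0.8, 0.3], [0.2, 0.7], [0.1, 0.2], [0.7, 0.9], [0.3, 0.1]]
labels = [1, 1, 0, 0, 1, 0]

stacking_rows = [stacking_input(s) for s in base_scores]
cascade_rows = [cascade_input(f, s) for f, s in zip(residue_features, base_scores)]

stacking_model = train_logistic(stacking_rows, labels)
cascade_model = train_logistic(cascade_rows, labels)

if __name__ == "__main__":
    for name, model, rows in (("stacking", stacking_model, stacking_rows),
                              ("cascade", cascade_model, cascade_rows)):
        print(name, [round(predict(model, r), 2) for r in rows])
```

Because the base tools in this sketch are treated as already trained, their scores are used directly as meta-level features. If the base learners were instead trained on the same data as the meta learner, their meta-level inputs would normally be taken from cross-validated predictions to avoid leaking the labels.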

Results and discussion

Prediction correlations between base learners

For a meta-learning method to perform effectively, the base learners must have complementary predictive capabilities, which can be reflected by relatively low correlation among their predictions. We selected four conformational and four linear epitope predictors as our base learners. The conformational predictors were DiscoTope 2.0 [9], ElliPro [10], SEPPA 2.0 [11], and Bpredictor [18], and the linear epitope predictors were BepiPred [5], ABCpred [6], AAP [7], and BCPREDS [8]. We calculated Pearson's correlation coefficients for the prediction scores produced by the base prediction tools. To further analyze the correlations among predictions based on the score rankings, we sorted the prediction scores of all protein residues provided by each base learner and then conducted a Spearman's rank correlation analysis. Tables 1 and 2 list the Pearson's and Spearman's rank correlation coefficients for all pairs of linear and conformational predictors, respectively. The average correlation coefficients of the linear and conformational prediction tools were 0.383 vs. 0.384 and 0.370 vs. 0.459 in the Pearson's and Spearman's correlation analyses, respectively, which indicates a relatively weak correlation among the epitope predictions of the base learners.
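
The pairwise analysis described above can be sketched in a few lines of Python. This is an illustrative sketch only: the tool names and per-residue scores are hypothetical, and Spearman's coefficient is computed here as Pearson's coefficient on average ranks, which is its standard definition and not necessarily the exact procedure used in this study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise(scores, corr):
    """Average a correlation measure over all pairs of tools."""
    pairs = list(combinations(scores, 2))
    return sum(corr(scores[a], scores[b]) for a, b in pairs) / len(pairs)


# Hypothetical per-residue prediction scores from three made-up tools.
toy_scores = {
    "tool_a": [0.10, 0.40, 0.35, 0.80, 0.90],
    "tool_b": [0.20, 0.10, 0.50, 0.60, 0.95],
    "tool_c": [0.90, 0.30, 0.70, 0.20, 0.40],
}

if __name__ == "__main__":
    print("mean Pearson:", round(mean_pairwise(toy_scores, pearson), 3))
    print("mean Spearman:", round(mean_pairwise(toy_scores, spearman), 3))
```

A low average over all pairs, as reported for the base learners here, suggests that the tools make different errors, which is the condition under which combining them is expected to help.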

Table 1 Correlation analysis of linear epitope predictors
