ElliPro: a new structure-based tool for the prediction of antibody epitopes
BMC Bioinformatics volume 9, Article number: 514 (2008)
Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.
Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the best prediction ranked among the top three for more than 70% of proteins and its rank never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.
The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
An antibody epitope, also known as a B-cell epitope or antigenic determinant, is a part of an antigen recognized by either a particular antibody molecule or a particular B-cell receptor of the immune system. For a protein antigen, an epitope may be either a short peptide from the protein sequence, called a continuous epitope, or a patch of atoms on the protein surface, called a discontinuous epitope. While continuous epitopes can be directly used for the design of vaccines and immunodiagnostics, the objective of discontinuous epitope prediction is to design a molecule that can mimic the structure and immunogenic properties of an epitope and replace it either in the process of antibody production (in which case the epitope mimic can be considered a prophylactic or therapeutic vaccine) or in antibody detection in medical diagnostics or experimental research [2, 3].
Whereas continuous epitopes can be predicted using sequence-dependent methods built on available collections of immunogenic peptides (for review see ), discontinuous epitopes, which account for most cases in which a whole protein, pathogenic virus, or bacterium is recognized by the immune system, are difficult to predict or identify from functional assays without knowledge of the three-dimensional (3D) structure of a protein [5, 6]. The first attempts at epitope prediction based on 3D protein structure began in 1984, when a correlation was established between crystallographic temperature factors and several known continuous epitopes of tobacco mosaic virus protein, myoglobin and lysozyme. A correlation between antigenicity, solvent accessibility, and flexibility of antigenic regions in proteins was also found. Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. Regions with high protrusion index values were shown to correspond to the experimentally determined continuous epitopes in myoglobin, lysozyme and myohaemerythrin.
Here we present ElliPro (derived from Ellipsoid and Protrusion), a web-tool that implements a modified version of Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in protein sequences and structures. ElliPro has been tested on a benchmark dataset of epitopes inferred from 3D structures of antibody-protein complexes and compared with six structure-based methods, including the only two existing methods developed specifically for epitope prediction, CEP and DiscoTope; two protein-protein docking methods, DOT and PatchDock; and two structure-based methods for protein-protein binding site prediction, PPI-PRED and ProMate. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.
The tool input
ElliPro is implemented as a web-accessible application and accepts two types of input data: protein sequence or structure (Fig. 1, Step 1). In the first case, the user may input either a protein SwissProt/UniProt ID or a sequence in either FASTA format or single-letter codes, and select threshold values for the BLAST e-value and the number of structural templates from PDB that will be used to model a 3D structure of the submitted sequence (Fig. 1, Step 2a). In the second case, the user may input either a four-character PDB ID or submit their own structure file in PDB format (Fig. 1, Step 2b). If the submitted structure consists of more than one protein chain, ElliPro will ask the user to select the chain(s) upon which to base the calculation. The user can change the threshold values of the parameters used by ElliPro for epitope prediction, namely, the minimum residue score (protrusion index), denoted here as S, between 0.5 and 1.0, and the maximum distance, denoted as R, in the range 4–8Å.
3D Structure Modeling
If a protein sequence is used as input, ElliPro searches for the protein or its homologues in PDB, using a BLAST search. If no protein structure in PDB matches the BLAST criteria, MODELLER is run to predict the protein 3D structure. The user may change the threshold values for the BLAST e-value and the number of templates that MODELLER uses as input (Fig. 1, Step 2a).
ElliPro implements three algorithms performing the following tasks: (i) approximation of the protein shape as an ellipsoid; (ii) calculation of the residue protrusion index (PI); and (iii) clustering of neighboring residues based on their PI values.
Thornton's method for continuous epitope prediction was based on the first two algorithms and only considered Cα atoms. It approximated the protein surface as an ellipsoid, which can vary in size to include different percentages of the protein atoms; for example, the 90% ellipsoid includes 90% of the protein atoms. For each residue, a protrusion index (PI) was defined as the percentage of the protein atoms enclosed in the ellipsoid at which the residue first lies outside the ellipsoid; for example, all residues that are outside the 90% ellipsoid will have PI = 9 (or 0.9 in ElliPro). In implementing the first two algorithms, ElliPro differs from Thornton's method by considering each residue's center of mass rather than its Cα atom.
The third algorithm, for clustering residues, defines a discontinuous epitope based on the threshold values for the protrusion index S and the distance R between residues' centers of mass. All protein residues with PI values greater than S are considered when calculating discontinuous epitopes. Clustering separate residues into discontinuous epitopes involves three steps that are recursively repeated until distinct clusters with no overlapping residues are formed. First, primary clusters are formed from single residues and their neighboring residues within the distance R. Second, secondary clusters are formed from primary clusters where at least three centers of mass are within the distance R from each other. Third, tertiary clusters are formed from secondary clusters which contain common residues. These tertiary clusters of residues represent distinct discontinuous epitopes predicted in the protein. The score for each epitope is defined as the PI value averaged over the epitope residues.
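The protrusion index defined above can be illustrated with a minimal sketch. It assumes that the family of ellipsoids consists of concentric, scaled copies of the ellipsoid given by the covariance of the residue centers of mass; this is a stand-in chosen for brevity and is not necessarily the construction used by ElliPro or by Thornton's method. The function names `protrusion_indices` and `_inverse_3x3` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _inverse_3x3(m: List[List[float]]) -> List[List[float]]:
    """Invert a 3x3 matrix via its adjugate; raise if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("covariance matrix is singular (points are coplanar)")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def protrusion_indices(centers: Sequence[Point]) -> List[float]:
    """Return a PI in [0, 1) for each residue center of mass.

    Assumption: ellipsoids are level sets of the quadratic form defined by
    the inverse covariance matrix of the centers (scaled covariance ellipsoids).
    """
    n = len(centers)
    mean = [sum(p[k] for p in centers) / n for k in range(3)]
    shifted = [[p[k] - mean[k] for k in range(3)] for p in centers]
    cov = [
        [sum(s[r] * s[c] for s in shifted) / n for c in range(3)]
        for r in range(3)
    ]
    inv = _inverse_3x3(cov)
    # Squared "ellipsoidal radius": the scale of the ellipsoid passing
    # through each residue.
    radius2 = [
        sum(s[r] * inv[r][c] * s[c] for r in range(3) for c in range(3))
        for s in shifted
    ]
    # PI = fraction of residues enclosed by the ellipsoid just inside this one.
    return [sum(1 for other in radius2 if other < r2) / n for r2 in radius2]


# Toy data: eight cube corners, the cube center, and one protruding point.
points = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
points += [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
pi = protrusion_indices(points)
print(pi[-1])  # protruding point → 0.9
print(pi[-2])  # buried center    → 0.0
```

In this toy case the protruding point lies outside the ellipsoid that encloses the other nine of ten points, so its PI is 0.9 on the 0 to 1 scale that the text attributes to ElliPro.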
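The clustering idea can be sketched as follows. This is a simplification, not the three-step procedure itself: residues with PI above S are linked whenever their centers of mass lie within R of each other, linked groups are merged with a union-find structure, and groups with fewer than three residues are dropped. The minimum size of three is an assumption loosely echoing the "at least three centers of mass" condition, and the name `cluster_epitopes` and its parameters are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_epitopes(
    centers: Sequence[Point],
    pi: Sequence[float],
    s: float = 0.5,
    r: float = 6.0,
    min_size: int = 3,  # assumption, not a documented ElliPro parameter
) -> List[Tuple[List[int], float]]:
    """Return (residue indices, mean PI) for each cluster, best score first."""
    selected = [i for i, value in enumerate(pi) if value > s]
    parent: Dict[int, int] = {i: i for i in selected}

    def find(i: int) -> int:
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of selected residues whose centers lie within R.
    for pos, a in enumerate(selected):
        for b in selected[pos + 1:]:
            if dist(centers[a], centers[b]) <= r:
                parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for i in selected:
        groups.setdefault(find(i), []).append(i)

    # Score each cluster by the PI averaged over its residues.
    clusters = [
        (members, sum(pi[i] for i in members) / len(members))
        for members in groups.values()
        if len(members) >= min_size
    ]
    return sorted(clusters, key=lambda cluster: cluster[1], reverse=True)
```

Single-linkage merging produces clusters with no shared residues by construction, which matches the stated end condition of the recursive procedure, although intermediate results may differ from those of the primary, secondary, and tertiary steps.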
3D visualization of Predicted Epitopes
Results and Discussion
For evaluation of ElliPro performance and comparison with other methods we used a previously established benchmark approach for discontinuous epitopes . We tested ElliPro on a dataset of 39 epitopes present in 39 protein structures where only one discontinuous epitope was known based on 3D structures of two-chain antibody fragments with one-chain protein antigens .
Depending on the threshold values for parameters R and S, ElliPro predicted a different number of epitopes in each protein; for an R of 6Å and S of 0.5, the average number of predicted epitopes in each protein analyzed was 4, ranging from 2 to 8. For example, for Plasmodium vivax ookinete surface protein Pvs25 [PDB: 1Z3G, chain A], ElliPro predicted four epitopes with scores of 0.763, 0.701, 0.645, and 0.508, respectively (Fig. 2).
For each predicted epitope in each protein, we calculated the correctly (TP) and incorrectly (FP) predicted epitope residues and the correctly (TN) and incorrectly (FN) predicted non-epitope residues, where non-epitope residues were defined as all other protein residues. The statistical significance of a prediction, that is, the difference between observed and expected frequencies of an actual epitope/non-epitope residue in the predicted epitope/non-epitope, was determined using Fisher's exact test (right-tailed). The prediction was considered significant if the P-value was ≤ 0.05. Then, for each prediction the following parameters were calculated:
Sensitivity (recall or true positive rate (TPR)) = TP/(TP + FN) – a proportion of correctly predicted epitope residues (TP) with respect to the total number of epitope residues (TP+FN).
Specificity (or 1 – false positive rate (FPR)) = 1 - FP/(TN + FP) – a proportion of correctly predicted non-epitope residues (TN) with respect to the total number of non-epitope residues (TN+FP).
Positive predictive value (PPV) (precision) = TP/(TP + FP) – a proportion of correctly predicted epitope residues (TP) with respect to the total number of predicted epitope residues (TP+FP).
Accuracy (ACC) = (TP + TN)/(TP + FN + FP + TN) – a proportion of correctly predicted epitope and non-epitope residues with respect to all residues.
Area under the ROC Curve (AUC) – area under a graph representing a dependency of TPR against FPR; that is, sensitivity against 1-specificity. The AUC gives the general performance of the method and is "equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance" .
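The right-tailed Fisher's exact test used for the significance check can be sketched with the standard library alone. The sketch assumes the 2×2 table is laid out as predicted epitope versus actual epitope residues, with all margins held fixed; the function name `fisher_exact_right_tailed` is hypothetical and the code is not taken from ElliPro.

```python
from math import comb


def fisher_exact_right_tailed(tp: int, fp: int, fn: int, tn: int) -> float:
    """Probability of observing at least `tp` true positives by chance.

    Under the null hypothesis the number of true positives follows a
    hypergeometric distribution with the table margins held fixed.
    """
    epitope = tp + fn        # actual epitope residues
    predicted = tp + fp      # residues in the predicted epitope
    total = tp + fp + fn + tn
    upper = min(epitope, predicted)
    favourable = sum(
        comb(epitope, k) * comb(total - epitope, predicted - k)
        for k in range(tp, upper + 1)
    )
    return favourable / comb(total, predicted)


# Small worked table: 3 TP, 1 FP, 1 FN, 3 TN.
# P(X >= 3) = (C(4,3)*C(4,1) + C(4,4)*C(4,0)) / C(8,4) = 17/70
print(round(fisher_exact_right_tailed(3, 1, 1, 3), 4))  # → 0.2429
```

A prediction would count as significant under the criterion above when the returned value is at most 0.05.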
For example, for the first predicted epitope in Plasmodium vivax ookinete surface protein Pvs25 [PDB:1Z3G, chain A] (Fig. 2), for an R of 6Å and S of 0.5, TP = 13, FP = 13, TN = 156, FN = 4, P-value = 5.55E-10, giving a sensitivity of 0.76, a specificity of 0.92, an accuracy of 0.91, and an AUC of 0.84. The results and detailed statistics of ElliPro performance for each epitope and other threshold values for R and S are provided in the supplementary materials [see Additional file 1].
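The figures in this example can be recomputed from the four counts. The sketch below applies the definitions listed above; the AUC line is an assumption, namely that for a single binary prediction the ROC curve has one operating point, so the area under it equals the mean of sensitivity and specificity. That assumption is consistent with the reported value of 0.84 but is not stated in the text. The function name `residue_metrics` is hypothetical.

```python
from typing import Dict


def residue_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Per-prediction statistics computed from residue counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Assumption: area under the single-point ROC curve
        # (0,0) -> (FPR,TPR) -> (1,1).
        "auc": (sensitivity + specificity) / 2,
    }


# Counts quoted in the text for the first predicted epitope of Pvs25.
pvs25 = residue_metrics(tp=13, fp=13, tn=156, fn=4)
for name, value in pvs25.items():
    print(f"{name}: {value:.2f}")
```

With these counts the sketch gives a sensitivity of 0.76, a specificity of 0.92, an accuracy of 0.91, and an AUC of 0.84, matching the values quoted above, together with a PPV of 0.50.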
The statistics averaged over all epitopes and overall statistics calculated from FP, FN, TP, and TN values summarized for the whole pool of epitope and non-epitope residues are presented in Table 1 and Fig. 3. The results for the methods other than ElliPro have been obtained as described in . ElliPro performed best, by AUC values, with the score S set at 0.7 and the distance R set at 6Å when the prediction with the highest score was considered for each protein and with the score S set at 0.5 and the distance R set at 6Å when the best by significance or average prediction was taken into account. Results are described using these thresholds (Table 1, Fig. 3); the results at other threshold values are provided in the supplementary materials [see Additional file 1].
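The difference between statistics averaged over epitopes and overall statistics computed from pooled counts can be shown with a short sketch. The two count tuples are invented toy values used only for illustration, not data from the benchmark, and the function names are hypothetical.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, TN, FN) for one epitope


def sensitivity(tp: int, fp: int, tn: int, fn: int) -> float:
    return tp / (tp + fn)


def averaged_sensitivity(counts: List[Counts]) -> float:
    """Mean of the per-epitope sensitivities (each epitope weighs equally)."""
    return sum(sensitivity(*c) for c in counts) / len(counts)


def overall_sensitivity(counts: List[Counts]) -> float:
    """Sensitivity from counts summed over the whole pool of residues."""
    tp, fp, tn, fn = (sum(column) for column in zip(*counts))
    return sensitivity(tp, fp, tn, fn)


# Hypothetical toy counts: one small, well-predicted epitope and one
# large, poorly predicted epitope.
toy = [(9, 5, 80, 1), (2, 10, 150, 18)]
print(round(averaged_sensitivity(toy), 3))  # → 0.5
print(round(overall_sensitivity(toy), 3))   # → 0.367
```

The pooled value is dominated by the larger epitope, which is why the two kinds of statistics can differ for the same set of predictions.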
ElliPro's top predictions, that is, those with the highest scores, correlated poorly with the discontinuous epitopes known from 3D structures of antibody-protein complexes (Table 1, overall statistics, AUC = 0.523). DiscoTope and the first models from the docking methods performed better, giving AUC values above 0.6, whereas the protein-protein binding site prediction methods, ProMate and PPI-PRED, performed worse. However, when the first predictions with the highest score were considered, ElliPro was the best among all the methods based on specificity (1-specificity = 0.047) and comparable with DiscoTope by precision (PPV = 0.158) (Table 1, overall statistics).
In the next set of metrics, we compared the performance of the prediction methods when choosing the best hit within the top 10 predictions of each method. This approach takes into account that each antigen harbors multiple distinct binding sites for different antibodies. Therefore, the top predicted site is not necessarily the one recognized by the specific antibody used in the dataset. This comparison directly applies only to the docking methods DOT and PatchDock as well as ElliPro. For DiscoTope, only one epitope is predicted, while for CEP no ranking is available to identify the top 10 predictions.
The docking methods DOT and PatchDock have an intrinsic advantage in this comparison over ElliPro, because they use structures of both protein antigen and antibody from the same antibody-protein complex in order to predict binding sites. To our surprise, when the best significant prediction was considered for each protein, ElliPro nevertheless gave the highest AUC value of 0.732, the highest sensitivity of 0.601 and the second highest precision value of 0.29 among all the compared methods (Table 1; Fig. 3, red circle). The docking methods gave AUC values of 0.693 for DOT and 0.656 for PatchDock, when the best prediction of the top ten was likewise considered (Table 1, overall statistics; Fig. 3). The average number of predicted epitopes for the analyzed proteins was four, and the rank of the best prediction never exceeded five; for more than half of the proteins the rank was first or second, and for more than 70% of all proteins it was first, second, or third [see Additional file 1].
ElliPro is based on simple concepts. First, regions protruding from the globular surface of the protein are more available for interaction with an antibody; second, those protrusions can be determined by treating the protein as a simple ellipsoid. Obviously, this is not always the case, especially for multi-domain or large single-domain proteins. However, no correlation was found between ElliPro performance and either the protein size, which varied from 51 to 429 residues with an average value of 171, or the number of domains (8 proteins among the 39 analyzed contained more than one domain) (data not shown).
ElliPro is a web-based tool for the prediction of antibody epitopes in protein antigens of a given sequence or structure. It implements a previously developed method that represents the protein structure as an ellipsoid and calculates protrusion indexes for protein residues outside of the ellipsoid. ElliPro was tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best (AUC value of 0.732) when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most three in more than 70% of proteins and never exceeded five, ElliPro is considered a potentially useful research tool for identifying antibody epitopes in protein antigens.
While ElliPro was tested on antibody-protein binding sites, it might be interesting to test it on other protein-protein interactions since it implements a method that is based on geometrical properties of protein structure and does not require training.
Comparison with DiscoTope, which is based on training and utilizes epitope features such as amino acid propensities, residue solvent accessibility, spatial distribution, and inter-molecular contacts, suggests that further research on antibody epitopes which considers more features that discriminate epitopes from non-epitopes may improve the prediction of antibody epitopes.
Availability and requirements
Project name: ElliPro
Project home page: http://tools.immuneepitope.org/tools/ElliPro
Operating system(s): Platform independent
Programming language: Java
Other requirements: None
Any restrictions to use by non-academics: None
TN: true negatives, FN: false negatives
ROC: Receiver Operating Characteristics
AUC: area under the ROC curve.
Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al.: The design and implementation of the immune epitope database and analysis resource. Immunogenetics 2005, 57(5):326–336. 10.1007/s00251-005-0803-5
Bijker MS, Melief CJ, Offringa R, Burg SH: Design and development of synthetic peptide vaccines: past, present and future. Expert Rev Vaccines 2007, 6(4):591–603. 10.1586/14760584.6.4.591
Gomara MJ, Haro I: Synthetic peptides for the immunodiagnosis of human diseases. Curr Med Chem 2007, 14(5):531–546. 10.2174/092986707780059698
Greenbaum JA, Andersen PH, Blythe M, Bui HH, Cachau RE, Crowe J, Davies M, Kolaskar AS, Lund O, Morrison S, et al.: Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J Mol Recognit 2007, 20(2):75–82. 10.1002/jmr.815
Laver WG, Air GM, Webster RG, Smith-Gill SJ: Epitopes on protein antigens: misconceptions and realities. Cell 1990, 61(4):553–556. 10.1016/0092-8674(90)90464-P
Van Regenmortel MHV: Mapping Epitope Structure and Activity: From One-Dimensional Prediction to Four-Dimensional Description of Antigenic Specificity. Methods 1996, 9(3):465–472. 10.1006/meth.1996.0054
Westhof E, Altschuh D, Moras D, Bloomer AC, Mondragon A, Klug A, Van Regenmortel MH: Correlation between segmental mobility and the location of antigenic determinants in proteins. Nature 1984, 311(5982):123–126. 10.1038/311123a0
Novotny J, Handschumacher M, Haber E, Bruccoleri RE, Carlson WB, Fanning DW, Smith JA, Rose GD: Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc Natl Acad Sci USA 1986, 83(2):226–230. 10.1073/pnas.83.2.226
Thornton JM, Edwards MS, Taylor WR, Barlow DJ: Location of 'continuous' antigenic determinants in the protruding regions of proteins. EMBO J 1986, 5(2):409–413.
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative Protein Structure Modeling With MODELLER. In Current Protocols in Bioinformatics. 5.6.1–5.6.30. John Wiley & Sons, Inc; 2006.
Ponomarenko JV, Bourne PE: Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct Biol 2007, 7: 64. 10.1186/1472-6807-7-64
Kulkarni-Kale U, Bhosle S, Kolaskar AS: CEP: a conformational epitope prediction server. Nucleic Acids Res 2005, 33(Web Server issue):W168–171. 10.1093/nar/gki460
Haste Andersen P, Nielsen M, Lund O: Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 2006, 15(11):2558–2567. 10.1110/ps.062405906
Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Nelson E, Tsigelny I, Ten Eyck LF: Protein docking using continuum electrostatics and geometric fit. Protein Eng 2001, 14(2):105–113. 10.1093/protein/14.2.105
Schneidman-Duhovny D, Inbar Y, Polak V, Shatsky M, Halperin I, Benyamini H, Barzilai A, Dror O, Haspel N, Nussinov R, et al.: Taking geometry to its edge: fast unbound rigid (and hinge-bent) docking. Proteins 2003, 52(1):107–112. 10.1002/prot.10397
Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242
Neuvirth H, Raz R, Schreiber G: ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J Mol Biol 2004, 338(1):181–199. 10.1016/j.jmb.2004.02.040
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235–242. 10.1093/nar/28.1.235
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
Taylor WR, Thornton JM, Turnell WG: An ellipsoidal approximation of protein shape. Journal of Molecular Graphics 1983, 1: 30–38. 10.1016/0263-7855(83)80001-0
Fawcett T: An introduction to ROC analysis. Pattern Recognition Letters 2006, 27: 861–874. 10.1016/j.patrec.2005.10.010
The work was supported by the National Institutes of Health Contract HHSN26620040006C.
HHB conceived, designed and programmed the tool. JVP tested the tool and wrote the manuscript. WL and NF participated in programming the tool. PEB, BP and AS contributed to writing the manuscript. All authors have read and approved the final version of the manuscript.