Benchmarking the PEPOP methods for mimicking discontinuous epitopes

Background Computational methods provide approaches to identify epitopes in protein Ags to help characterizing potential biomarkers identified by high-throughput genomic or proteomic experiments. PEPOP version 1.0 was developed as an antigenic or immunogenic peptide prediction tool. We have now improved this tool by implementing 32 new methods (PEPOP version 2.0) to guide the choice of peptides that mimic discontinuous epitopes and thus potentially able to replace the cognate protein Ag in its interaction with an Ab. In the present work, we describe these new methods and the benchmarking of their performances. Results Benchmarking was carried out by comparing the peptides predicted by the different methods and the corresponding epitopes determined by X-ray crystallography in a dataset of 75 Ag-Ab complexes. The Sensitivity (Se) and Positive Predictive Value (PPV) parameters were used to assess the performance of these methods. The results were compared to that of peptides obtained either by chance or by using the SUPERFICIAL tool, the only available comparable method. Conclusion The PEPOP methods were more efficient than, or as much as chance, and 33 of the 34 PEPOP methods performed better than SUPERFICIAL. Overall, “optimized” methods (tools that use the traveling salesman problem approach to design peptides) can predict peptides that best match true epitopes in most cases.


Background
Ag-Ab interactions are at the heart of the humoral immune response. B-cell epitopes correspond to the regions of the protein Ag that are recognized by the Ab paratope. Epitopes can be continuous (a linear fragment of the protein sequence) or discontinuous (constituted of several fragments scattered in the protein sequence, but nearby on the surface of the folded protein) [1][2][3]. Most protein epitopes are discontinuous [4,5] and therefore very difficult to map. Epitope identification and characterization are, however, pivotal steps in the development of immunodiagnostic tests [6], epitopedriven vaccines [7] and drug design as well as in protein function discovery, biochemical assays or proteomic studies for biomarker discovery. Epitopes can be mapped using various experimental methods [8][9][10][11][12] among which crystallographic analysis of Ag-Ab complexes is considered to give the most reliable information [13,14]. These techniques are, however, time-, resource-and labor-consuming, and, thus, unsuitable for proteomic applications. Computational methods could be an attractive alternative. B-cell epitope prediction methods [9,[15][16][17] try to bioinformatically predict the Ab binding site on a protein sequence or on the 3D structure of a protein Ag. However, epitopes are not structural entities on their own. Epitopes and paratopes are relational entities that are defined by their mutual complementarity [18]. Thus, trying to predict a priori the identity of a protein epitope is a difficult task. For this reason, epitope predictors that take into account the sequence or structure of the Ab have been developed [19][20][21], but are of limited application since available Ab structures are scarce. Moreover, benchmark studies have highlighted that tools for predicting continuous epitopes have low efficiency [22][23][24] and that methods based on the Ag 3D structure show limited sensitivity (Se) and positive predictive value (PPV) [25].
We approached this issue from a slightly different point of view. Considering that the surface of a protein is a mosaic of potential antigenic epitopes, each of which could be bound by a cognate Ab [26], we developed PEPOP [27,28] to generate series of peptide sequences that can replace continuous or discontinuous epitopes in their interaction with their cognate Ab. Differently from discontinuous epitope predictors where the output prediction is either a list of amino acids (aa) or small protein fragments [29][30][31], PEPOP proposes peptide sequences that can be used directly in experiments. This tool promises to facilitate the manipulation of proteins in a way dealing with the output of proteomic studies.
We have previously validated the capacity of PEPOP 1.0 to generate immunogenic [27] and antigenic peptides that can be experimentally probed with Abs to disclose the cognate epitopes [32][33][34][35].
As most Abs against protein Ags recognize discontinuous epitopes, peptide design methods should take into account the structural information and try to guess (mimic) the epitope discontinuity. We thus improved the PEPOP tool (version 2.0) by focusing on methods for better predicting peptides aimed at mimicking discontinuous epitopes. It is now possible using PEPOP to generate large series of peptides that, collectively, should represent the accessible surface of the protein with its mosaic of putative epitopes. Consequently, within these large series of peptides, at least some should appropriately mimic antigenic epitopes.
In the present work, we describe these new methods and the benchmarking of their performances. To this aim, we used a comprehensive methodology and a series of test proteins for which epitopes have been experimentally determined by X-ray crystallography, which is the reference method. We show that the performance of each method is specific and that one method (TSPaa) performs better in these specific benchmarking conditions. We also compared the peptides designed by the different PEPOP methods with those predicted by SUPERFICIAL, in which the 3D structure of the protein surface is transformed into a peptide library [36], or by chance. PEPOP is available at https://www.sys2diag.cnrs. fr/index.php?page=pepop.

PEPOP principle
PEPOP is an algorithm dedicated to the design of peptides that are predicted to replace a protein epitope in its interaction with an Ab [27]. To bioinformatically design a peptide from the 3D structure of a given protein, a reference is chosen as a starting point. This can be a surface-accessible aa or a segment (i.e., a fragment of the protein composed of accessible and contiguous aa, from one to n aa, in the sequence) determined by PEPOP. After the identification of the aa or segments neighboring the reference, a method is used to delineate a path between them and to link them in order to generate the designed peptide. The aa or segments neighboring the reference are selected in an area of extension that can be either a cluster or a patch. To form a cluster, PEPOP groups segments according to their spatial distances. A patch is defined around the reference. A requested peptide length has to be specified by the user in some methods. PEPOP proposes 35 methods: one method (the FPS method already included in PEPOP 1.0) generates peptides representing continuous epitopes, whereas the other 34 methods (of which NN and ONN were already present in PEPOP 1.0) are focused on peptides mimicking discontinuous epitopes.

PEPOP web site
The web site of PEPOP [28] is composed of three sections corresponding to different ways to use PEPOP 2.0 in experimental projects. The section "One specific Peptide Design" is dedicated to the prediction of a solely peptide, for example to use as immunogen to generate anti-protein Abs. For the prediction, parameters are selected by default but can be modified by the user. The section "Paired Peptide Design" is dedicated to the prediction of peptides by pair with the idea that they can be used to prepare Abs which should capture the protein.
To this end, the most distant peptides are proposed to avoid steric hindrance between the future Abs. Finally, the third section "Peptide Bank Prediction" is dedicated to the prediction of antigenic peptides to, for example, experimentally localize an epitope or select an inhibitory peptide. In this case, a great diversity of peptide sequences is wanted in order to increase the chance the Ab recognize at least one peptide since it is known that only one mutation can change an Ab-Ag interaction (Duarte C et al., A mimic of a discontinuous epitope from AaH II identified by combining wet and dry experiments: a new experimental methodology to localize discontinuous epitopes, in preparation).
Each predicted peptide can represent a potential epitope and can be visualized on the 3D structure of the protein.

Prediction capacity controls
To assess the capacity of the different PEPOP methods to predict peptides that mimic epitopes, we used a dataset of experimentally (X-ray crystallography) determined epitopes that was filtered to eliminate any epitope redundancy (Additional file 1: Table S1) [25].

Assessment of the PEPOP method sensibility to detect true epitopes
The correspondence between areas of extension determined by PEPOP and true epitopes was analyzed to verify whether a peptide predicted by PEPOP can theoretically mimic an epitope (Additional file 1: Table S1).
As accurate selection of surface-accessible residues is a crucial PEPOP parameter, we verified that PEPOP correctly identifies as surface-accessible the aa of true epitopes. This is the case for 87.75% (median 88.24%) of the epitopic aa. Most of the aa of an epitope are surfaceaccessible, thus guaranteeing that the final peptide could contain such aa. In comparison, Chen and co. [23] found that 20 to 32% of epitopic aa are buried, whereas we found only 12.2% of buried epitopic aa.
The definition of areas of extension is another important parameter. The number of segments and patches increases with the antigen size (Additional file 2: Figure  S3A), contrary to the number of clusters. Clusters, 10 Åradius patches, 15 Å-radius patches and varying radius patches contain, on average, 39.53, 10.87, 25.86 and 31.67 aa, respectively. Compared to the size of an epitope, clusters, 15 Å-radius patches and varying radius patches are bigger: by defining a minimum peptide length, the peptide can be of an appropriate size.
To measure the adequacy of PEPOP areas with the existing epitopes, aa of clusters and patches were compared to known epitopic aa (Additional file 1: Table S1, Additional file 2: Figure S3B and C). Additional file 1: Table S1 reports the number and percentage of aa in common with the epitope and the types of area which best fit the epitope by or without taking into account the aa positions. These data are reported in a histogram in Additional file 2: Figure S3B. Additional file 2: Figure  S3C shows the distribution of the percentage of aa in common between epitope and PEPOP areas. As expected, with bigger antigens it is more difficult to well fit the epitope. By taking into account the aa positions, the best fitting areas contain between 50 and 100% (mean: 84.19%) of epitopic aa. The best fitting areas are the varying patches in 45 cases, the clusters in 26 cases and the 15 Å-radius patches in 4 cases. Without taking into account the aa positions, the best fitting areas contain between 85 and 100% (mean: 97.94%) of epitopic aa. The best fitting area is the varying patch in 39 cases and the cluster in 36 cases. PEPOP areas fit well existing epitopes, indicating that the predicted peptides should well mimic epitopes.

Methods' redundancy
The redundancy in the output sequences generated by the different PEPOP methods was verified by comparing the set of peptides predicted by each method (see "Peptide prediction" below). The low output redundancy by the different methods (Additional file 2: Figure S2) indicated that the sequences of the generated peptides were highly diverse, except among methods of the same category. Peptides obtained using the SHP-and TSP-based methods showed less similarity with peptides obtained with the other methods (from 0 to 53% and 61%, respectively). The OPP, SHPaa and TSPaa methods were the most original methods because their peptides did not show any or only few similarities with the peptides generated by the other methods (37% at most). As the methods were developed to take into account different parameters, these results indicate that, except for few methods, sampling is large. The PEPOP methods are thus complementary, bringing diversity in the range of predicted peptides. This is useful when trying to represent the huge diversity of possible epitopes on a protein Ag.

Peptide prediction
Each of the 34 methods was used to generate a series of peptide sequences from the 3D coordinates of the 75 Ags. As a protein is composed of a mosaic of epitopes [26,32], any region (cluster or patch) is potentially an epitope. Hence, all the possible peptides from a protein were predicted: each reference, segments or aa was used in turn to design a peptide. In this way, the whole protein surface was represented by the set of peptides generated by a given method.

Requested length
To design a peptide using PEPOP, a sequence length for the predicted peptide has to be chosen (requested length). Nonetheless, as segments of variable lengths are used to build the peptide sequence, the final peptide length might differ from the requested length. The appropriate requested length to use for benchmarking was determined by requesting discrete lengths (ranging from 8 to 16 aa) for the predicted peptides. The final mean peptide sizes are reported in Table 1. As expected, the average final lengths were higher than the requested lengths by 2 or 3 aa and increased with the requested length. According to the chosen method, the average final peptide lengths could be very different. For benchmarking, the prime and linker methods should use the same requested length, unless this lead to different peptide sizes, to allow their comparison and the evaluation of the linker contribution. For the evaluation, two requested lengths were chosen: 12 aa for the prime and linker methods and 16 for the graphbased methods because they lead to an average final peptide length close to the mean length of the epitopes in the dataset (16.7 aa).

Benchmarking
The 34 methods predicted a total of 119,277 peptides (i.e., about 3508 per method and 1590 per Ag), using the 75 protein Ags of the dataset (Additional file 1: Table S1).
The Se and the PPV parameters were used to evaluate the methods' performance ( Fig. 1). The Se evaluates the correctly predicted aas compared to the epitope: this is the proportion of peptide residues present also in the epitope. The PPV evaluates the correctly predicted aas compared to the predicted aas: this is the proportion of epitope residues in the predicted peptide. A perfectly accurate prediction would give Se and PPV values of 1. The mean Se ranged from 0.34 to 0.49, according to the method, and the mean PPV was a little higher (between 0.39 and 0.57) (Additional file 2: Figure S4). In the evaluation of discontinuous epitope prediction tools carried out by Ponomarenko & Bourne using the same dataset, the mean Se and PPV with the best method (ClusPro (DOT)) were 0.46 and 0.41, respectively [25], indicating that many of the PEPOP approaches are more efficient (17 methods have a better Se, 29 a better PPV and 17 both). This difference is explained by a larger epitope prediction size by ClusPro (DOT) while the peptides predicted by PEPOP are closer to a standard epitope size. The mean absolute deviation between the epitope and the prediction sizes of ClusPro (DOT) is 3,7 and of PEPOP is 2,5. Actually, in a next paragraph, we will see that peptides (so predictions) closer in size to the epitope performed better (Fig. 2). So, for a same number of correctly predicted aas, a greater prediction size can give more chance to the Se to be higher but it will decrease the PPV.
The Se and PPV mean values give a measure of the adequacy between the predicted peptide and the reference epitope, but they do not discriminate between methods. However, a researcher would wish to have a method that provides the highest possible number of peptides for the highest possible number of epitopes. To  know whether a method is more performing than another, (i.e. whether a given method can predict the best possible matching peptides for the widest possible range of epitopes), the Se and PPV minimal value (threshold) to consider a method as theoretically efficient must be determined.
To select an appropriate threshold, we studied the distribution of peptides relative to their Se and PPV ( Fig. 3 and Additional file 2: Figure S5). On average, a method predicted 7.6% of peptides with a Se and PPV above 0.6, 1.73% of peptides with a Se and PPV above 0.7 and 0.13% of peptides with a Se and PPV above 0.8 (Table 2), i.e., about 267, 60 and 4 peptides, respectively, based on the mean number of peptides predicted per method. We finally selected the 0.7 value as the threshold because it offers a good compromise between "quality" and quantity of predicted peptides.
We then calculated the proportion of peptides with a Se and a PPV higher than 0.7 for each method (Fig. 4, empty bars). Group of methods were roughly clustered around similar values. Indeed, 1.4 to 1.9% of peptides generated by using the prime methods had Se and PPV values above 0.7, whereas this percentage decreased to 0.7% for peptides designed with the ALA methods. Compared to the prime methods, the performance of the ALA methods decreased due to the beneficial effect of the addition of Alanine residues between segments on Se and its unfavorable effect on PPV. The SA methods were slightly more efficient than the prime methods, but not the SAS methods. This also was the result of a beneficial effect of the addition of aa linkers on Se and their negative effect on PPV. TSP-based methods, particularly TSPaa, were the most efficient as they generated the highest percentage of peptides with Se and PPV above 0.7.
We then calculated for how many Ags, a given method would generate peptides with a Se and a PPV higher than 0.7 (Fig. 5, empty circles) in order to know whether a method was efficient with different proteins. As before (see Fig. 4), methods from the same group showed similar performances and the most efficient were the TSP methods. Indeed, TSPaa, TSPnat3 and TSPrev4 targeted the highest number of Ags.

Influence of peptide length
As Wang and collaborators [37] showed that their performance classification was dependent on the epitope length, we studied the influence of the peptide length on the methods' performance. First, we determined the number of peptides with Se and PPV above 0.7, relative to the peptide-epitope size difference (Fig. 2). We found that peptides that were closer in size to the epitope performed better. We then analyzed the influence of the five requested peptide lengths (8, 10, 12, 14 and 16 aa) on the performance of the methods (Fig. 6). The peptide length had, as expected, no influence on the performance of the SHP and OPP methods (because the final peptides are identical whatever the requested peptide length) (Fig. 6c). It had a weak influence on the performance of the ALA methods (the final peptides are longer than the requested length, but they are only enriched in Ala residues) and on the SA and SAS methods (peptides are possibly enriched of several aa, and this may have an Fig. 2 Relationship between peptide performance and size similarity between epitope and peptide. Aa positions were taken into account unfavorable effect depending on the epitope composition). Conversely, the performance of the NN, NNu, ONN, FN, OFN and TSP methods progressively increased with the peptide length. Nevertheless, TSPaa remained the most performing method. These results also show that when selecting the PEPOP parameters, it is advisable to request a peptide length close to the epitope size, i.e. a number of aa of the peptide close to the average number of aa contained in an epitope (17aa).

Amino acid positions
The aa positions were not taken into account for Se and PPV computations because the nature of the aa and not their original position in the protein is important for Ab recognition. However, the closeness and the order of aa residues in the peptide could be important factors for protein mimicry by peptides. Thus, we computed again the Se and PPV values by taking into account the aa position in the predicted peptides compared to their position in the epitope. As expected, the Se and PPV values of all methods decreased when taking into account the aa positions (Fig. 4, black bars, and Fig. 5, black circles, and Additional file 2: Figure S4). All methods showed comparable mean PPV values, except for the SA and SAS methods (Additional file 2: Figure S4A). The mean Se values, when taking or not into account the aa positions, have similar profiles (Additional file 2: Figure  S4B). Despite the overall reduction in efficiency (i.e., proportion of peptides with both Se and PPV higher than 0.7) when taking into account the aa positions, methods followed the same tendency as the analysis that did not take into account the aa positions. TSPaa again was the most efficient method (Fig. 4). When calculating for how many Ags a given method would generate peptides with a Se and a PPV higher than 0.7 by taking into account the aa positions, the most efficient method was TSPrev1 instead of TSPnat3, but overall, the TSP methods still performed best (Fig. 5). On the other hand,

Comparison with SUPERFICIAL
We then compared our results with those obtained using SUPERFICIAL (Sfs), the only other available peptide design tool. This software predicted 143 discontinuous peptides from 30 Ags of the dataset, including 21 peptides from only one Ag. The proportion of peptides generated by SUPERFICIAL with Se and PPV values higher than 0.7 was about 0.7% (Fig. 4), a value similar to the one obtained with the lowest performing PEPOP methods. Moreover, SUPERFICIAL did generate peptides with Se and PPV values higher than 0.7 only for one Ag among the 75 proteins of the dataset (Fig. 5). Finally, SUPERFICIAL did not predict any peptide with Se and PPV higher than 0.7 when the aa positions were taken into account (Fig. 4). In conclusion, all PEPOP methods performed better than the only other available peptide design tool.

Comparison with chance
Finally, we compared the PEPOP methods to a method that predicts peptides by chance ( Table 2). The performance of the random method was comparable to that of the less efficient PEPOP methods (ALA methods). It predicted 0.8% of peptides with Se and PPV values higher than 0.7 for one Ag out of 3 (mean of 0.28% by considering all the Ags). When the aa positions were taken into account, the random method did not predict any efficient peptide (Fig. 4). These results show that peptides predicted by the PEPOP methods are not efficient only by chance. This was further confirmed by the probability to predict the most efficient peptide (10E-24). These results indicate that the PEPOP methods perform much better than chance.

Example
We then wanted to assess how well the most efficient peptides (best Se and PPV) designed by PEPOP represent the corresponding epitope on the 3D structure of its Ag (Fig. 7). Among all the PEPOP methods, we selected the peptide having the best Se and PPV without taking into account the aa positions (Fig. 7a) and the peptide having the best Se and PPV when taking into account the aa positions (Fig. 7b). The two peptides matched pretty well their epitope (all the epitopic aa are predicted in the peptides). Nonetheless, the Se and PPV values greatly changed depending on how they were calculated. For example, the first peptide has Se and PPV of respectively 1 and 0.864 but if the aa positions are taken into account, Se and PPV are of only 0.368 and 0.318 respectively. However, the peptide includes effectively the same nature of aa than the epitope. This example shows that what is important for the final peptide is the nature of the aa, not its position in the protein.
We then selected the two best peptides (highest Se and PPV) generated by TSPaa method when taking (Fig. 7d) or not (Fig. 7c) into account the aa positions. These two predicted peptides also match pretty well their epitope (only one epitopic aa is not predicted in the peptides). This example shows that even if TSPaa is the best performing method, it does not actually predict, for every Ag, the most efficient peptides since these two peptides have Se and PPV slightly lower than the two previous peptides. Hence, all the PEPOP methods should lead to efficient peptides.
In any case, the peptides contained a majority of epitopic aa and thus they should be recognized by the Ab, particularly because the arrangement between aa was optimized.
Discussion PEPOP (https://www.sys2diag.cnrs.fr/index.php?page=pepop) is a prediction tool of antigenic / immunogenic peptides. It was developed, not to predict epitopes, but to deliver series of peptides that should mimic epitopes, particularly discontinuous epitopes, which are the more common ones. It is more complex to predict peptides mimicking discontinuous than continuous epitopes because the aa order is not already defined. As enumerating all possible peptides would amount to solving a complex NP-complete problem and it would anyhow be impossible to test all of them computationally or experimentally, a limited enumeration must be defined. To this aim, PEPOP version 2.0 was improved by implementing 32 new methodologies that exploit different criteria, such as the distance between segments, their disposition in the peptide and their conformation relative to the protein Ag, to design discontinuous peptides that match as much as possible the Ag. The main principle is to find a path, an arrangement, between elements (segments or aa) of a defined area on the protein that will compose the final peptide.
The prime methods design a peptide from a reference segment and add to it its neighboring segments. Compared to the NN method, the FN method was developed to maintain the reference segment in the central position. The ONN, OFN and OPP methods search for the most natural path between segments by minimizing the traveled distance. The linker methods add (or not) aa between segments. The purpose of the ALA method is to keep in the peptide the same segment spacing of the Ag protein to allow the interacting aa of the Ab to establish contacts. The SA and SAS methods use the protein blocks (PBs) of a structural alphabet [38,39] to facilitate the adoption of the protein conformation. To bypass the NP-complete problem of enumerating all possible arrangements between the segments composing the peptide (n! permutations) in ONN, ONF and OPP methods, we used the graph theory. In the graph-based methods, the objective is to find the optimal path between segments or aa, as this should lead to peptides close to the native protein context.
Here, we evaluated the performance of the 34 methods included in PEPOP 2.0 by measuring the match between the peptide composition and that of known discontinuous epitopes and classified as efficient the methods that Fig. 6 Influence of the requested peptide length on the methods' performance. a Se and b PPV distribution according to the requested peptide length. c Proportion of peptides with Se and PPV > 0.7 based on the requested peptide length, from 8 (solid bars) to 16 (empty bars) aa with an increment of 2 at each step predicted the largest number of peptides with both Se and PPV values higher than 0.7 (efficient peptides). We found that the TSP-based methods, particularly TSPaa, TSPnat3 and TSPrev2, predicted the best matching peptides in most cases, although they did not lead to the best peptide (Fig. 7). TSPaa was the most efficient method in silico as it predicted the largest percentage of efficient peptides for the highest number of Ags. These methods performed better because the search of the optimal path using the TSP allows selecting the correct segment or aa (i.e., the segment or aa present in the epitope).
All PEPOP methods, except OPPala, were more successful than SUPERFICIAL and more efficient than or as much as the method predicting peptides by chance.
Benchmarking of different computational methods must be done with precaution as the tools, datasets and metrics can be different from one analysis to the other, thus not allowing objective comparisons. Even the definition of epitope can be different. Indeed, some consider the part of the Ag recognized by one Ab as an epitope on its own, whereas others consider to be an entire epitope all the aa found to interact with any Ab [16]. Moreover, some authors think that proteins have only one or few epitopes on the surface [40], whereas others see a protein as a mosaic of epitopes [26]. Finally, because all the possible epitopes could not be discovered, only a few of the features that characterize epitopes are used when trying to discriminate between epitopic and non-epitopic aa. This could at least partially explain why all epitope prediction tools show a weak performance [22,25]. And, we believe that an epitope cannot be faithfully predicted without taking into account its Ab partner, because epitopes only exist through the interaction with their cognate Ab [41]. Thus, studies taking into account the Ab partner in predicting Ag epitopes are of particular interest [19][20][21]42] although, because of the poor availability of Ab data, they currently cannot be applied in high-throughput analyses.
To determine whether some of the 34 methods included in PEPOP version 2.0 for designing discontinuous peptides are more relevant than others, we carried out a benchmark process. To this aim we followed as much as possible the recommendations by Greenbaum and collaborators [43] for assessing the performances of epitope prediction methods, although PEPOP goal is slightly different from that of "classical" epitope prediction tools. We decided to use Se and PPV together to select the most efficient peptides, although they are threshold-dependent. Indeed, when used on their own, they do not provide a complete picture of the method performance. For instance, a peptide with a Se (number of epitopic aa included in the peptide) close to 1 could also contain many additional aa that might disturb its recognition by the Ab. Similarly, a peptide with a PPV (number of the peptide aa included in the epitope) close to 1 could contain not enough epitopic aa for Ab recognition. Considering a given Ag-Ab interaction, not all generated peptides will match the epitope because peptides come from the entire surface of the protein. However, a method can be considered efficient if it yields an elevated number of peptides that closely match the epitope (i.e., with both Se and PPV higher than 0.7 in our study).
Nevertheless, benchmarking under-evaluated the linkerbased methods. Indeed, even if a peptide generated using these methods included all the aa of the epitope, its PPV would be lower than the PPV of the same peptide without linker (e.g., a peptide designed using the ONNala, ONNsa or ONNsas method versus the same peptide generated using ONN). This despite the fact that the linker methods were developed to increase the performance, based on the hypothesis that, for the ALA linker methods, spacing the segments by linker aas would better mimic their real disposition on the protein and consequently facilitate the peptide recognition by the Ab. Similarly, the SA and SAS methods have been developped to favor the adoption by the PEPOP segments of the same conformation in the peptide as in the protein. Although this bias was compensated by a slightly higher Se, we feel that the principle on which these methods are based has been not perfectly appraised.
Molecular mimicry is still today poorly understood. It is also known that the 3D structure of peptides is important in the Ab-Ag interaction but it is very arduous to determine and predictions are still not sufficiently performant. So, it is difficult to predict which peptide compared to another will be recognized by a specific Ab even if they are both composed of the same key aa of the epitope. For the same reason, it is difficult to claim that one method is better than another one. Indeed, the good performances of one method in terms of Se and PPV do not ensure that the corresponding peptides will actually be recognized by an Ab. Similarly, we deliberately chosen not to elaborate a scoring function because it will not ensure to select the "best" peptides. Only their experimental evaluation can confirm the peptide reactivity. Moreover, we think bioinformatics predictions cannot be used as such and have to be always associated to experiments. Combining bioinformatics predictions and simple experimental methods can be an interesting alternative to expensive and time-consuming approaches. Indeed, the idea behind the PEPOP tool is that, due to the inherent difficulty to guess an epitope, it would be preferable to generate a comprehensive series of peptides that can be experimentally assessed to determine which ones are endowed with the properties of a functional epitope. The mean number of peptides predicted per Ag by the PEPOP methods was 1590. Experimentally testing the antigenicity of about 1500 peptides is feasible by techniques like peptide microarrays [44][45][46]. Thus, the specific epitopes of a given Ag could be identified by running all PEPOP methods, synthesizing the generated peptides and testing them in microarrays. We showed that 1.73% of the predicted peptides have a Se and a PPV above 0.7 which represent about 28 peptides. This is a low number but only one peptide is suffisiant to localize an epitope using SPOT method for example (Duarte C et al., A mimic of a discontinuous epitope from AaH II identified by combining wet and dry experiments: a new experimental methodology to localize discontinuous epitopes, in preparation). Conversely, the experimental validation of about 120,000 peptides (number of peptides designed by the PEPOP methods for the entire dataset) would require too much time and resources and probably would not be feasible for any future peptide design tool.
Testing the immunogenicity of at least 1500 peptides would be even worse. Due to these difficulties to realize systematic experimental validations, we believe PEPOP, and others similar tools, have to be seen as "test tubes" which will gradually be validated as studies will be developed, until a consensus satisfactory validation process is developed. Although some studies begin to explore this problem [47] the proposed benchmark is not applicable for all epitope prediction tools neither for all studies. Anyway, PEPOP has already been successfully used in several studies of different goals [27, 32-35, 48, 49].

Conclusion
In the workshop reported by Greenbaum et al., Dr. Van Regenmortel "emphasized the need to clarify the purpose of making a specific epitope prediction, and how this clarification could direct selection of the most appropriate prediction tool or development of a new tool, as needed". PEPOP has been or can be used for all the purposes where surrogate epitopes are needed, purposes such as those cited by Van Regenmortel, i.e. "seeking vaccine candidates" [49] or "replacing Ags in diagnostic immunoassays". It can also efficiently help in mapping epitopes [33][34][35] (Duarte C et al., A mimic of a discontinuous epitope from AaH II identified by combining wet and dry experiments: a new experimental methodology to localize discontinuous epitopes, in preparation; Abraham J-D et al., Combination of bioinformatics and experimental approaches to map the conformational epitope on GM-CSF, in preparation) and would be a very informative tool for understanding the rules of molecular mimicry, a very difficult [23,41,50] but promising research field as testified by the number of available studies [9,[51][52][53][54] and tools [30,55,56]. PEPOP could also help characterizing all new proteins discovered by high-throughput technologies, such as proteomics [57,58], by facilitating their manipulation.

Structural data
The dataset of 165 X-ray-determined epitopes was from Ponomarenko & Bourne [25]. To avoid bias caused by the over-or under-representation of an epitope described by several 3D structures of the same Ag-Ab complex, epitope redundancy was eliminated by keeping only one crystallography of a given Ag-Ab complex. Therefore, 90 Ag-Ab complexes were rejected. The final dataset was of 75 unique Ag-Ab complexes (Additional file 1: Table S1).
The epitope size varied from 4 to 23 residues with only one exception (52 aa). The average size of an epitope was 16.7 aa (median: 17 aa). Epitopes were all discontinuous and were composed of 3 to 14 segments, each containing 1 to 12 contiguous aa. An epitope contained on average 7 to 8 segments of 2.38 aa. These data are in accordance with the literature [54,59,60].

Epitope definition
An epitope was defined as a series of aa included in the protein Ag. These aa contained at least one atom that establishes a contact (i.e., a distance threshold lower than or equal to 4 Å) with an atom from the Ab.

PEPOP methods
Peptides that mimic the discontinuous epitopes of the dataset were designed using the different PEPOP methods (Fig. 8).
To build a peptide, the PEPOP algorithm concatenates either segment sequences (a continuous stretch of surfaceaccessible aa) or single surface-accessible aa from the Ag 3D structure. Based on Euclidian distances, the PEPOP methods first select the neighboring segments or aa and then determine in which order assemble them to form the final linear peptide sequence supposed to mimic the discontinuous epitope. In three-dimensional space, the Euclidian distance between points a and b is: a) In the prime methods ( Fig. 8 and Additional file 2: Figure S1) neighboring segments are collected around a reference segment (starting segment). Therefore, starting from the reference segment and until a defined peptide length is reached, the methods concatenate the segments as follows: the nearest neighbor (NN) method adds the sequence of the nearest neighbor segment Cterminally to the forming peptide; the upset nearest neighbor (uNN) method adds the sequence of the nearest neighbor segment Cterminally in the natural or the reverse sense according to the distance of the C-terminus of the forming peptide; the flanking nearest neighbor (FN) method adds the sequence of the nearest neighbor segment in turn C-terminally and N-terminally to build the peptide.
More sophisticated prime methods are directly derived from these three firsts methods and are used to determine the optimized path between segments, i.e. in which order assemble them, by enumerating all possible arrangements. The sums of the distances between segments are then calculated. The optimized path corresponds to the arrangement with the shortest total distance. No extra aa is added. Thus, the optimized path can be calculated by using: the optimized nearest neighbor (ONN) method for segments found using the NN method. the optimized flanking nearest neighbor (OFN) method: for segments found with the FN method. the optimized patched segments path (OPP) method: for the set of segments present in a 10 Åradius patch. b) Linker methods. As in the prime methods no intermediate aa is added, 16 linker methods were then derived from these methods to add extra aa ( Fig. 8 and Additional file 2: Figure S1) between segments generated by one of the prime methods (NN, ONN, FN, OFN, OPP): the ALA linker methods (NNala, uNNala, ONNala, FNala, OFNala, OPPala) add an alanine linker, as many times as the distance between segments allows the insertion of a peptide bond.
Alanine is often considered as the most average aa in terms of length, volume and polarity. the structural alphabet-based linker (SA) methods (NNsa, ONNsa, FNsa, OFNsa, OPPsa) add zero, one or two aa as linkers. Protein blocks (PBs) [38,39] form a library of 16 small protein fragments of five residues in length that can approximate every part of a protein structure. PBs are overlapping, so each PB is followed by a limited number of PBs (i.e., some specific transitions exist between PBs). First, the transitions between segments are verified in the segments transcribed into PBs, based on the protein 3D structure. According to the PB transition matrix [61], if the transition between the last PB of a segment and the first PB of the following segment is allowed, no aa is added between these segments. If the transition is not allowed, a PB is virtually added by searching the one leading to the best PB transition. Adding a PB means adding an aa. The most favorable aa is determined from data calculated from the PDB file that reports each aa statistical preferences for each positions in each PB [38]. If a transition cannot be found, the process is repeated by adding two PBs. If also this does not work, the peptide is not possible. the structural alphabet superposition-based linker (SAS) methods (NNsas, ONNsas, FNsas, OFNsas, OPPsas) add one aa as linker according to the structural superposition of the segment using the structural alphabet approach to facilitate the peptide folding in the same fold as the corresponding fragments in the protein.
c) Graph-based methods (Fig. 8) use the graph theory to model a given protein, its segments and its aa, in order to find the neighboring segments or aa. They employ three different graphs where edges are weighted by Euclidian distances. The first graph ("natural" graph) is oriented. The nodes are protein segments and can only be added in their natural sense, from N-terminus to C-terminus. The second graph ("reversed" graph) is the non-oriented version of the previous one. The third graph ("aa" graph) is non-oriented and the nodes are surface-accessible aa instead of segments. Two algorithms used these graph to find the optimal path, i.e. in which order assemble the segments or aa: SHortest Path (SHP)-based methods (three methods): from a set (i.e., a cluster or a patch, see definitions below) of elements (segments or aa), a peptide is the shortest path between two elements that include most aa residues. The SHPnat method uses the "natural" graph, SHPrev the "reversed" graph and SHPaa the "aa" graph. Traveling Salesman Problem (TSP) based methods (nine methods): from a set (cluster or patch) of elements (segments or aa), the TSP algorithm is used to find the optimal path (shortest distance) between elements. The TSPnat methods use the "natural" graph, the TSPrev methods use the "reversed" graph and the TSPaa method uses the "aa" graph. From the optimal path between the elements defined by the TSP algorithm, all possible peptides of the requested length are computed. The final peptide, identified using the: ▪ TSPnat1 and TSPrev1 methods, is the peptide with the highest score (see "peptide scoring") ▪ TSPnat2 and TSPrev2 methods, is the peptide with the shortest traveled distance ▪ TSPnat3 and TSPrev3 methods, is the peptide for which the traveled distance according to the number of segments of the peptide is the shortest ▪ TSPnat4 and TSPrev4 methods, is the peptide that includes the two closest segments ▪ TSPaa method, is the peptide for which the traveled distance is the shortest.
The protein area used in the SHPnat, SHPrev, TSPnat and TSPrev methods is a cluster or a 15 Å-radius patch (see definitions below). The area used in the SHPaa and TSPaa methods is a varying patch.

Definition of cluster and patch
In PEPOP, clusters are segments grouped according to their spatial distances. They are calculated using Kitsch from the PHYLIP package v3.67 [62], as previously described [27]. PEPOP uses three types of patches. The 10 Å-and 15 Å-radius patches gather segments within a fixed distance, respectively 10 Å and 15 Å, from the center of gravity of a reference segment. The third patch type gathers the aa at a distance that varies from 15 to 20 Å from a reference aa: the final radius is the one in which the average number of aa between radius 15, 16, 17, 18, 19 and 20 Å is collected.
As each segment is used in turn to define a patch, the number of 10 Å-and 15 Å-radius patches is equal to the number of segments (Additional file 2: Figure S3). The number of varying patches is equal to the number of accessible aa because each aa is used in turn to define a varying patch.

Peptide scoring
In PEPOP, the score of a peptide is the sum of the scores of the segments composing the peptide [27]: where Sp is the peptide score, Ss the segment score, Naa the number of aas composing the segment, Naccess the average accessibility of the segment, Nhyp the number of hydrophobic aas, Nwryp the number of specific aas (W, R, Y or P) and Nturn the number of aas involved in a β-turn.

Peptide predictions
Depending on the PEPOP method, each segment or surface-accessible aa of a protein Ag is used as a reference to design a peptide.
In the prime (NN, uNN, ONN, FN, OFN, OPP methods) and linker methods (ALA, SA and SAS methods), peptides are predicted from each segment defined by PEPOP. The number of peptides generated by these methods corresponds to the number of segments.
In the graph-based methods that model protein segments (SHPnat, SHPrev, TSPnat1, TSPnat2, TSPnat3,  TSPnat4, TSPrev1, TSPrev2, TSPrev3 and TSPrev4), peptides are predicted in PEPOP clusters and in 15 Å-radius patches. As each protein segment is successively considered as a reference segment, there are as many patches as segments. The number of peptides predicted by these methods corresponds to the number of clusters plus the number of segments.
In the graph-based methods that model the surfaceaccessible aa of the protein (SHPaa and TSPaa), peptides are predicted in PEPOP clusters and in varying patches. The number of aa is computed for all radii between 15 and 20 Å (1 Å increment per step), and the radius leading to the average number of aa defines the final patch. The number of peptides predicted by these methods corresponds to the number of clusters plus the number of segments.
In each method, redundant peptide sequences are eliminated; however, two different methods can predict the same peptide sequence.

Performance evaluation metrics
The capacity of each peptide generated by a given PEPOP method to mimic the epitope described in the reference dataset for that protein was evaluated using two criteria: the sensitivity (Se) and the positive predictive value (PPV). Figure 1 gives the definition of Se and PPV and an example of their calculations without and by taking into account the aa positions. Se represents the proportion of epitope aa present in the peptide, whereas PPV is the proportion of peptide aa present in the epitope.
The aa nature or the aa position in the peptide and reference epitope was then compared by not taking and by taking into account the aa positions of the protein.
The aa used in linker methods (ALA, SA and SAS) were considered in the evaluation that takes into account the aa positions only after all the other aa of the peptide were compared with the epitopic aa. We chose to take into account the supplementary aa, because otherwise it would have amounted to evaluate again the results of the prime methods. Indeed, the only difference between prime and linker methods is the aa that are added between segments and that do not correspond to any position in the protein.
To measure the correlation between performance and size between peptides and epitopes, the absolute value was calculated.

Random method
Peptides were designed as sequences of randomly selected aa according to the protein aa composition. The peptide length was randomly computed according to the distribution of peptide lengths designed by PEPOP using the dataset of 75 Ags. The number of peptides was randomly chosen according to the number of peptides designed by each PEPOP method.

Probability
If X is a surface-accessible aa (alanine, cysteine, …, tyrosine), n X and p X represent the number of occurrences of X in the protein and the peptide, respectively. The probability to obtain a specific peptide sequence by chance is thus given by the following formula: A P A n A Â A P c n c Â A P D n D ⋯A P Y n Y A P n with: where n = n A + n C + n P + ⋯ + n Y is the number of surface-accessible aa in the protein and P = P A + P C + P P + ⋯P Y⋅ the number of aa in the peptide.

Superficial
The aim of SUPERFICIAL [36] is to design peptides that mimic regions at the surface of a given protein, starting from its 3D structure. SUPERFICIAL first computes the surface-accessibility of each aa and then builds segments as surface-accessible and contiguous aa sequences. Peptides can be made of several segments close in space, linked together in order to conserve the local conformation of the targeted protein surface. SUPERFICIAL finds the linkers by calculating the number (not the type) of aa needed to link two segments, based on the distances and angles between their C-and N-termini.