Identification of CD8+ T cell epitopes through proteasome cleavage site predictions
BMC Bioinformatics volume 21, Article number: 484 (2020)
We previously introduced PCPS (Proteasome Cleavage Prediction Server), a web-based tool to predict proteasome cleavage sites using n-grams. Here, we evaluated the ability of the PCPS immunoproteasome cleavage model to discriminate CD8+ T cell epitopes.
We first assembled an epitope dataset consisting of 844 unique virus-specific CD8+ T cell epitopes and their source proteins. We then analyzed cleavage predictions by the PCPS immunoproteasome cleavage model on this dataset and compared them with those provided by a related method implemented in the NetChop web server. PCPS was clearly superior to NetChop in terms of sensitivity (0.89 vs. 0.79) but somewhat inferior with regard to specificity (0.55 vs. 0.60). Judging by the Matthews correlation coefficient, PCPS predictions were overall superior to those provided by NetChop (0.46 vs. 0.39). We next analyzed the power of C-terminal cleavage predictions provided by the same PCPS model to discriminate CD8+ T cell epitopes, finding that they could be discriminated from random peptides with an accuracy of 0.74. Following these results, we tuned the PCPS web server to predict CD8+ T cell epitopes and predicted the entire SARS-CoV-2 epitope space.
We report an improved version of PCPS named iPCPS for predicting proteasome cleavage sites and peptides with CD8+ T cell epitope features. iPCPS is available for free public use at https://imed.med.ucm.es/Tools/pcps/.
Proteasomes are multicatalytic protease complexes that play a central role in cellular protein homeostasis by degrading damaged and misfolded proteins [1,2,3]. Within the cell, the majority of proteins destined for degradation are marked with ubiquitin and delivered to the proteasome, which cuts them into peptide fragments that are then readily degraded by other proteases into recoverable amino acids. Proteasomes can also degrade proteins through ubiquitin-independent pathways and, in vertebrates, are key components of the class I antigen presentation pathway. Degradation of intracellular proteins by proteasomes produces peptides that can eventually bind to major histocompatibility complex class I (MHC I) molecules and be presented to CD8+ T cells. Moreover, it has been shown that the C-terminus of peptides presented by MHC I molecules results from proteasome cleavage.
The essential role of proteasomes in class I antigen presentation has been elucidated in mammals, where two main forms of the proteasome exist: the constitutive proteasome and the immunoproteasome. Most cells generally express the constitutive or standard proteasome, but under proinflammatory stimuli they switch to express the immunoproteasome [9, 10]. The immunoproteasome is also constitutively expressed by some immune cells, in particular by dendritic cells, the professional antigen presenting cells responsible for priming T cells. In the immunoproteasome, the catalytic β1, β2 and β5 subunits of the standard proteasome are replaced by the subunits β1i (Low-molecular mass protein-2, LMP2), β2i (Multicatalytic endopeptidase complex-like 1, MECL-1) and β5i (Low-molecular mass protein-7, LMP7), respectively. As a result, proteasomes and immunoproteasomes cleave proteins at different sites. In particular, the immunoproteasome does not cut after acidic residues and has a higher cleavage preference after hydrophobic and basic residues. Overall, the proteolytic activity of the immunoproteasome is optimized to provide peptide antigens for presentation by MHC I molecules, and most CD8+ T cell epitopes result from immunoproteasome cleavage.
Given that proteasomes/immunoproteasomes determine the repertoire of CD8+ T cell epitopes, researchers have developed different approaches to predict proteasome cleavage sites. Since the C-terminus of the peptides presented by MHC I molecules corresponds to the P1 residue of the cleavage site, we and others have produced models to predict proteasome cleavage sites that are trained on datasets consisting of MHC I peptide ligands and their C-terminal flanking regions [16,17,18]. The task of modeling cleavage sites resembles that of modeling grammatical rules, and we specifically used n-grams to model and predict immunoproteasome cleavage sites. Moreover, we developed a web-based tool, PCPS (Proteasome Cleavage Prediction Server), implementing these n-gram models for free public use at https://imed.med.ucm.es/Tools/pcps/.
Proteasomal cleavage site predictions serve to enhance CD8+ T cell epitope discrimination when combined with peptide-MHC I binding predictions [16, 19, 20]. However, here we sought to analyze whether cleavage predictions provided by PCPS immunoproteasome models could alone serve to predict CD8+ T cell epitopes. Using a dataset of 844 virus-specific CD8+ T cell epitopes, we discriminated epitopes from random peptides with an accuracy of 0.74 without considering the presenting MHC I molecules. Following these results, we have enhanced PCPS to a new version named iPCPS, which enables CD8+ T cell epitope prediction, and we applied the tool to identify the entire Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) epitome.
Results and discussion
Evaluation of cleavage site predictions by PCPS immunoproteasome model
The C-terminus of most peptides recognized by CD8+ T cells in the context of MHC I molecules results from cleavage by the immunoproteasome. Prediction of immunoproteasome cleavage sites is thus relevant for T cell epitope vaccine design, and we previously reported n-gram models for such a task that were implemented in the PCPS online tool. In cross-validation, the PCPS immunoproteasome n-gram models reached an MCC of 0.47. Here, we evaluated cleavage predictions by the default PCPS immunoproteasome model using a larger independent dataset consisting of 257 proteins encompassing 844 9-mer virus-specific CD8+ T cell epitopes (see Additional file 1). These CD8+ T cell epitopes were obtained from the IEDB database and were reported to be recognized by humans during the course of a viral infection (details in "Methods"). CD8+ T cells are primed by peptide antigens processed and presented by dendritic cells, which express the immunoproteasome. Therefore, all the selected epitopes ought to be generated by the immunoproteasome [23, 24].
To evaluate cleavage site prediction on this dataset we followed the assumption that cleavage is more likely to occur at the C-terminus of the epitopes than at any internal cleavage site. Under this assumption, CD8+ T cell epitopes with internal cleavage sites with a score above that of the C-terminus were considered false positives (details in "Methods"). We also carried out the same analysis using the immunoproteasome model of NetChop, which is often considered a reference tool for immunoproteasome cleavage site predictions [17, 25]. In both methods, we used a default score of 0.5 as the threshold to define cleavage sites.
According to the results, summarized in Table 1, the specificity of the predictions by PCPS was somewhat lower than that of NetChop (0.55 vs. 0.60, respectively). However, the sensitivity of PCPS predictions was clearly superior to that obtained with NetChop (0.89 vs. 0.79, respectively). Moreover, judging by the Matthews correlation coefficient, PCPS performance was overall better than that of NetChop (0.46 vs. 0.39, respectively). It is worth noting that we used the original PCPS n-gram models while NetChop models have been retrained on increasingly large datasets (current version is 3.1), suggesting that the original training data were already sufficient to capture proteasome cleavage sites.
Discrimination of CD8+ T cell epitopes by C-terminal cleavage predictions
CD8+ T cells only recognize peptides presented by MHC I molecules, and prediction of peptide-MHC I binding is therefore the main basis for anticipating CD8+ T cell epitopes. Proteasomal cleavage is typically used in combination with MHC I-binding predictions to enhance CD8+ T cell epitope discrimination. In particular, it has been shown that combining PCPS cleavage predictions with MHC I binding predictions reduces the number of false positives by around 70%. Here we analyzed whether cleavage predictions alone could serve to predict CD8+ T cell epitopes. To that end, we tested the ability of the PCPS immunoproteasome model to distinguish our set of 844 CD8+ T cell epitopes from random peptides generated from the same source proteins, based on predicted cleavage at the peptide C-terminus. The results obtained at different cleavage site score thresholds are summarized in Fig. 1. The epitopes were predicted with an accuracy of over 0.70 for thresholds from 0.35 to 0.55, reaching a top accuracy of 0.74 ± 0.01 at the 0.5 threshold. The sensitivity and specificity of the predictions at this same threshold were 0.89 ± 0.01 and 0.60 ± 0.01, respectively. Note that these results were obtained without considering the restriction elements of the CD8+ T cell epitopes.
We also analyzed the epitope prediction results obtained with the best threshold (0.5) with regard to the human leukocyte antigen class I (HLA I) molecules known to restrict the CD8+ T cell responses (see Additional file 1). Thus, we selected the CD8+ T cell epitopes restricted by HLA-A*02:01, HLA-A*11:01, HLA-A*24:02, HLA-B*07:02 and HLA-B*08:01 and computed the performance separately. HLA I molecules are highly polymorphic, and the selected HLA I molecules are frequently expressed in the world population. As shown in Fig. 2, the selected CD8+ T cell epitopes could be predicted with an accuracy that ranged from 0.70 ± 0.02 for those restricted by HLA-A*11:01 to 0.78 ± 0.03 for those restricted by HLA-B*07:02.
In sum, our results show that the prediction of C-terminal cleavage sites in peptides by the PCPS immunoproteasome model alone can serve to predict CD8+ T cell epitopes covering all HLA I molecules, which is of particular relevance when no peptides are available for developing HLA I-binding models. Following these results, we have enhanced PCPS to enable CD8+ T cell epitope predictions. In the next section we illustrate the usage of the improved version of PCPS, which we renamed iPCPS.
iPCPS usage and SARS-CoV-2 epitome analysis
PCPS was initially developed to predict proteasome cleavage sites in amino acid sequences using n-grams. This option is still available in iPCPS (Fig. 3a), but the new version can also return all the peptides (of a length selected by the user) with a C-terminus compatible with immunoproteasome cleavage (Fig. 3b). We recommend using the default length of 9 residues, as this is the most common size of peptides presented by MHC I molecules. It is known that immunoproteasomes can also destroy potential CD8+ T cell epitopes. Therefore, we implemented the possibility of discarding from the output those peptides with internal cleavage sites. This feature can also lead to the loss of true CD8+ T cell epitopes. Users can set the cleavage site score thresholds for both C-terminal and internal cleavage site predictions. By default, the threshold for internal cleavage sites is set at 0.65, higher than that for C-terminal cleavage sites, to minimize the loss of bona fide CD8+ T cell epitopes.
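The peptide selection just described can be illustrated with a minimal Python sketch. It is not the iPCPS implementation: the function name `candidate_peptides`, the per-residue `scores` list (one hypothetical score for cleavage right after each residue) and the default C-terminal threshold of 0.5 (taken from the evaluation threshold used in this work) are assumptions made for illustration only.

```python
def candidate_peptides(protein, scores, length=9, cterm_th=0.5,
                       internal_th=0.65, discard_internal=False):
    """Return peptides whose C-terminus is predicted to be cleaved.

    scores[i] is a hypothetical score for cleavage right after
    protein[i] (0-based); it is not the real iPCPS output format.
    """
    if len(scores) != len(protein):
        raise ValueError("one cleavage score per residue is expected")
    kept = []
    for start in range(len(protein) - length + 1):
        end = start + length - 1  # index of the C-terminal residue
        if scores[end] < cterm_th:
            continue  # C-terminus not compatible with cleavage
        # Internal sites: cleavage after any residue except the last one.
        if discard_internal and any(s >= internal_th for s in scores[start:end]):
            continue
        kept.append(protein[start:end + 1])
    return kept


# Toy input: a 12-residue sequence with two predicted cleavage sites.
protein = "ACDEFGHIKLMN"
scores = [0.1] * len(protein)
scores[8] = 0.9   # strong site after K
scores[10] = 0.7  # weaker site after M
print(candidate_peptides(protein, scores))
print(candidate_peptides(protein, scores, discard_internal=True))
```

In this toy case the second call drops the peptide ending at M because the strong site after K falls inside it, which mirrors how the discard option can remove otherwise acceptable candidates.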
iPCPS also includes n-gram models for the constitutive proteasome, which users can select instead of or in combination with immunoproteasome models. Constitutive proteasomes also have a role in the generation of the repertoire of peptides that can be presented by MHC I molecules, particularly in non-immune cells targeted by already primed CD8+ T cells. Proteasome models available on iPCPS were trained on self-peptides eluted from human MHC I molecules instead of on CD8+ T cell epitopes. When users combine immunoproteasome and proteasome models, iPCPS will return the repertoire of peptides that have a C-terminus compatible with cleavage by both the immunoproteasome and the proteasome. Protective CD8+ T cell epitopes are thought to be generated by both types of proteasome. If the user selects the option “Discard peptides with internal cleavage sites” with the combination of immunoproteasome and proteasome models, iPCPS will return those peptides without internal cleavage sites by both models.
To illustrate the full potential of iPCPS, we predicted CD8+ T cell epitopes in SARS-CoV-2 (ACN: NC_045512.2) with different settings and compared them with those predicted by Grifoni et al. for 12 common HLA I molecules (Fig. 4). We focused only on peptides with a size of 9 residues. The complete epitope space of SARS-CoV-2, encompassing all 9-mer peptides, comprises 9757 peptides. Using the iPCPS immunoproteasome model with default settings, we identified 4486 peptides with a C-terminus compatible with immunoproteasomal processing. This set of peptides contains epitopes restricted by all HLA I molecules and includes 96% of the epitopes predicted by Grifoni et al. Note that there are hundreds of HLA I molecules that exhibit distinct peptide binding specificities. If we discard those peptides with internal cleavage sites, the set of predicted CD8+ T cell epitopes drops to 1682 (Fig. 4a) and includes only 35% of the CD8+ T cell epitopes predicted from MHC I binding (Fig. 4b). We get 3091 peptides when combining immunoproteasome and proteasome cleavage models (Fig. 4a), encompassing 73% of the predicted CD8+ T cell epitopes (Fig. 4b). If we again discard those peptides with internal cleavage sites we obtain 437 CD8+ T cell epitopes (Fig. 4a), including only 8% of the CD8+ T cell epitopes resulting from HLA I binding predictions (Fig. 4b). Although we are analyzing iPCPS results on predicted CD8+ T cell epitopes and not actual epitopes, our comparison indicates that, overall, the best way of predicting CD8+ T cell epitopes in iPCPS is to use immunoproteasome models alone or in combination with proteasome models. Discarding peptides with internal cleavage sites can lead to a great loss of bona fide CD8+ T cell epitopes, likely reflecting that proteasomes are quite unspecific, as their main role is the complete degradation of intracellular proteins rather than the generation of CD8+ T cell epitopes.
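A short sketch of the model combination follows. It is not the iPCPS code: the function names and the per-residue score lists are hypothetical, the C-terminal threshold of 0.5 is assumed, and the discard option is read here as requiring that neither model predicts an internal cleavage site, which is one plausible interpretation of the text.

```python
def passes(scores, start, length, cterm_th, internal_th, discard_internal):
    """Check one peptide against one model's per-residue cleavage scores."""
    end = start + length - 1  # index of the C-terminal residue
    if scores[end] < cterm_th:
        return False
    has_internal = any(s >= internal_th for s in scores[start:end])
    return not (discard_internal and has_internal)


def combined_candidates(protein, immuno_scores, const_scores, length=9,
                        cterm_th=0.5, internal_th=0.65, discard_internal=False):
    """Keep (start, peptide) pairs accepted by BOTH hypothetical models."""
    kept = []
    for start in range(len(protein) - length + 1):
        if all(passes(sc, start, length, cterm_th, internal_th, discard_internal)
               for sc in (immuno_scores, const_scores)):
            kept.append((start, protein[start:start + length]))
    return kept


# Toy scores for two models over the same 12-residue sequence.
protein = "ACDEFGHIKLMN"
immuno = [0.1] * 12
immuno[8], immuno[10] = 0.9, 0.7
const = [0.1] * 12
const[10] = 0.8
print(combined_candidates(protein, immuno, const))
print(combined_candidates(protein, immuno, const, discard_internal=True))
```

Requiring agreement between models can only shrink the candidate set, which is consistent with the smaller peptide counts reported for the combined settings.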
Nonetheless, the option of discarding peptides with internal cleavage sites may be useful with large proteomes as it narrows down the number of potential epitopes for experimental scrutiny. CD8+ T cell epitopes predicted with iPCPS using different settings are provided in Additional file 2.
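The trade-off between narrowing the search space and retaining predicted epitopes can be made explicit with some simple arithmetic. The sketch below only reuses the counts and recovery percentages quoted above; the setting labels and the helper `count_kmers` are illustrative names, not iPCPS options.

```python
TOTAL_9MERS = 9757  # complete 9-mer space of SARS-CoV-2 reported above

# (peptides returned, % of Grifoni et al. predicted epitopes recovered)
settings = {
    "immunoproteasome": (4486, 96),
    "immunoproteasome, no internal sites": (1682, 35),
    "immunoproteasome + proteasome": (3091, 73),
    "immunoproteasome + proteasome, no internal sites": (437, 8),
}


def count_kmers(protein_length, k=9):
    """Number of overlapping k-mers in a protein of the given length."""
    return max(protein_length - k + 1, 0)


# Fraction of the full 9-mer space that each setting retains.
fractions = {name: n / TOTAL_9MERS for name, (n, _) in settings.items()}

for name, (n, recovered) in settings.items():
    print(f"{name}: {n} peptides ({fractions[name]:.0%} of the 9-mer space), "
          f"{recovered}% of predicted epitopes recovered")
```

With these numbers, the default immunoproteasome setting roughly halves the search space while keeping nearly all predicted epitopes, whereas the strictest setting keeps under 5% of the space and recovers few of them.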
We describe an improved version of PCPS named iPCPS for predicting proteasome cleavage sites and peptides with CD8+ T cell epitope features, as depicted in Fig. 5. To our knowledge, iPCPS is the only tool for predicting proteasome cleavage sites with such capability. iPCPS is available for free public use at https://imed.med.ucm.es/Tools/pcps/.
Epitopes and sequences
We obtained virus-specific CD8+ T cell epitopes from the Immune Epitope Database (IEDB). We limited our search to epitope peptides from viruses that gave positive T cell responses in the course of a natural infection in humans and were restricted by MHC I molecules. Subsequently, we selected those epitopes with a length of 9 residues, the optimal length for MHC I binding, and whose entire sequence matched exactly that of the relevant source protein. Amino acid sequences of epitope source proteins were obtained in FASTA format from UniProt (https://www.uniprot.org/) using the accession numbers (ACN) provided by IEDB.
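The length and exact-match filters can be sketched as follows. This is an illustration under assumptions: the input is taken to be (epitope, accession) pairs plus a dictionary of source sequences, which is a hypothetical layout and not the actual IEDB export or UniProt download format.

```python
def select_epitopes(records, proteins, length=9):
    """Keep unique epitopes of the given length found verbatim in their source.

    records:  iterable of (epitope_sequence, accession) pairs (assumed layout)
    proteins: dict mapping accession -> source protein sequence
    """
    selected = set()
    for epitope, accession in records:
        source = proteins.get(accession)
        if source is None:
            continue  # no sequence retrieved for this accession
        if len(epitope) == length and epitope in source:
            selected.add((epitope, accession))
    return sorted(selected)


# Toy data with made-up accessions.
proteins = {"P00001": "MKTAYIAKQRQISFVKSHFSRQ"}
records = [
    ("AYIAKQRQI", "P00001"),   # 9-mer, exact match: kept
    ("AYIAKQRQI", "P00001"),   # duplicate: collapsed
    ("AYIAKQRQIS", "P00001"),  # 10-mer: dropped
    ("AYIAKQRQL", "P00001"),   # mismatch at the last residue: dropped
    ("KTAYIAKQR", "P99999"),   # unknown accession: dropped
]
print(select_epitopes(records, proteins))
```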
Prediction of proteasome cleavage sites
We predicted proteasome cleavage sites using PCPS. PCPS implements n-gram models that were trained on fragments obtained from peptides eluted from human MHC I molecules (proteasome models) and from naturally restricted CD8+ T cell epitopes (immunoproteasome models). In this work, we selected the default immunoproteasome model, which was specifically trained on 8-residue peptide fragments, each including a cleavage site (P1 and P1′ residues). Peptide fragments included the last 4 residues of CD8+ T cell epitopes followed by the 4 proximal residues that flank the C-terminus of the epitopes in the relevant source proteins. We compared PCPS cleavage site predictions with those provided by NetChop (https://www.cbs.dtu.dk/services/NetChop/). In NetChop, we selected the default C-term 3.0 model, equivalent to the PCPS immunoproteasome model. NetChop C-term 3.0 was trained on peptide fragments generated from MHC I peptide ligands and provides the most accurate cleavage predictions.
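The construction of an 8-residue fragment around the epitope C-terminus can be sketched as below. The function name is hypothetical, the first occurrence of the epitope in the source protein is used, and epitopes with fewer than 4 downstream residues are skipped; the source does not state how such cases were handled, so both choices are assumptions.

```python
def cleavage_fragment(protein, epitope, left=4, right=4):
    """Return the last `left` epitope residues plus `right` flanking residues.

    The cleavage site lies in the middle of the fragment, between
    P1 (last epitope residue) and P1' (first flanking residue).
    """
    start = protein.find(epitope)  # first occurrence only (assumption)
    if start == -1:
        return None
    after_p1 = start + len(epitope)  # index just past the P1 residue
    if len(epitope) < left or after_p1 + right > len(protein):
        return None  # not enough residues on one side (assumption: skip)
    return protein[after_p1 - left:after_p1 + right]


protein = "MKTAYIAKQRQISFVKSHFSRQ"
fragment = cleavage_fragment(protein, "AYIAKQRQI")
print(fragment)                   # 4 epitope residues + 4 flanking residues
print(fragment[3], fragment[4])   # P1 and P1' residues
```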
Evaluation of proteasome cleavage site predictions
We evaluated the performance of PCPS and NetChop immunoproteasome models on sequences containing CD8+ T cell epitopes following the approach described elsewhere [16, 25] and considering a default cleavage threshold (Th) of 0.5. This approach assumes that cleavage after the epitope C-terminus must be favored over cleavage at other sites within the epitope (internal cleavage sites). Briefly, by evaluating cleavage scores (Cs) on each epitope residue, cleavage sites defined by epitopes were classified as follows:
True positive (TP): The C-terminus Cs is ≥ Th.
False Positive (FP): At least one internal Cs is ≥ Th and ≥ that of the C-terminus.
True Negative (TN): Every internal Cs is < Th or < that of the C-terminus.
False Negative (FN): The C-terminus Cs is < Th.
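The four rules above can be written as a small Python sketch. It assumes that each epitope contributes one positive call (TP or FN, from its C-terminus) and one negative call (FP or TN, from its internal sites); this per-epitope counting and the function names are assumptions, not a description of the original scripts. The Matthews correlation coefficient is computed with its standard definition.

```python
from math import sqrt


def classify_epitope(internal_scores, cterm_score, th=0.5):
    """Return (positive_call, negative_call) for one epitope."""
    positive = "TP" if cterm_score >= th else "FN"
    # FP: some internal site passes the threshold and is at least as
    # strong as the C-terminal site; otherwise TN.
    competing = any(s >= th and s >= cterm_score for s in internal_scores)
    negative = "FP" if competing else "TN"
    return positive, negative


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (standard definition)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy epitope: 8 internal scores plus the C-terminal score.
print(classify_epitope([0.1, 0.2, 0.3, 0.1, 0.6, 0.2, 0.1, 0.4], 0.8))
print(classify_epitope([0.1, 0.9, 0.3, 0.1, 0.6, 0.2, 0.1, 0.4], 0.8))
```

Note that an internal score of 0.6 in the first toy epitope passes the threshold but is still counted as TN because the C-terminal score is higher.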
Evaluation of CD8+ T cell epitope prediction
The capacity of C-terminal cleavage predictions provided by the PCPS immunoproteasome model to anticipate CD8+ T cell epitopes was analyzed by determining their ability to discriminate viral CD8+ T cell epitopes from random 9-mer peptides. Random peptides were considered non-epitopes and were selected randomly from the same source proteins as the CD8+ T cell epitopes at a 1:1 ratio. Cleavage site predictions were carried out on the source proteins containing both epitopes and randomly selected peptides. TPs, TNs, FPs and FNs were obtained at different cleavage thresholds (0.35, 0.40, 0.45, 0.5, 0.55 and 0.60) by examining the cleavage scores after the C-terminal residue of both CD8+ T cell epitopes (positive instances) and random peptides (negative instances). CD8+ T cell epitopes with a cleavage score above threshold were TPs and those with cleavage scores below threshold were FNs. Conversely, random 9-mer peptides with a cleavage score below threshold were TNs and those with cleavage scores above threshold were FPs. The performance of the predictions was subsequently assessed by computing sensitivity (SE = TP/(TP + FN)), specificity (SP = TN/(TN + FP)) and accuracy (ACC = (TP + TN)/(TP + TN + FP + FN)) (Eqs. 1, 2 and 3). These analyses were repeated five times, each time selecting different random peptides, obtaining average performance values with standard deviations.
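A minimal sketch of this evaluation at a single threshold is given below. It uses the standard definitions of sensitivity, specificity and accuracy and treats a score equal to the threshold as a predicted cleavage, which is an assumption since the text only speaks of scores above or below threshold. The function name and the toy scores are illustrative.

```python
def discrimination_metrics(epitope_scores, random_scores, th):
    """Return (SE, SP, ACC) from C-terminal cleavage scores.

    epitope_scores: scores after the C-terminus of epitopes (positives)
    random_scores:  scores after the C-terminus of random peptides (negatives)
    """
    tp = sum(s >= th for s in epitope_scores)
    fn = len(epitope_scores) - tp
    fp = sum(s >= th for s in random_scores)
    tn = len(random_scores) - fp
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return se, sp, acc


# Toy scores at a 1:1 ratio of epitopes to random peptides.
epitope_scores = [0.9, 0.8, 0.6, 0.3]
random_scores = [0.7, 0.2, 0.1, 0.4]
for th in (0.35, 0.5, 0.6):
    print(th, discrimination_metrics(epitope_scores, random_scores, th))
```

Repeating the call with freshly sampled random peptides and averaging the results would give mean values and standard deviations of the kind reported in the text.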
Availability of data and materials
Authors confirm that all relevant data are included in the article and/or its supplementary information files.
Abbreviations
HLA I: Human leukocyte antigen class I
LMP2: Low-molecular mass protein-2
LMP7: Low-molecular mass protein-7
MECL-1: Multicatalytic endopeptidase complex-like 1
SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
Noormohammadi A, Calculli G, Gutierrez-Garcia R, Khodakarami A, Koyuncu S, Vilchez D. Mechanisms of protein homeostasis (proteostasis) maintain stem cell identity in mammalian pluripotent stem cells. Cell Mol Life Sci. 2018;75(2):275–90.
Kumar Deshmukh F, Yaffe D, Olshina MA, Ben-Nissan G, Sharon M. The contribution of the 20S proteasome to proteostasis. Biomolecules. 2019;9(5).
Kudriaeva AA, Belogurov AA. Proteasome: a nanomachinery of creative destruction. Biochemistry (Mosc). 2019;84(Suppl 1):S159–92.
Lander GC, Estrin E, Matyskiela ME, Bashore C, Nogales E, Martin A. Complete subunit architecture of the proteasome regulatory particle. Nature. 2012;482(7384):186–91.
Jariel-Encontre I, Bossis G, Piechaczyk M. Ubiquitin-independent degradation of proteins by the proteasome. Biochim Biophys Acta. 2008;1786(2):153–77.
Kloetzel PM. Antigen processing by the proteasome. Nat Rev Mol Cell Biol. 2001;2(3):179–87.
Cascio P, Hilton C, Kisselev AF, Rock KL, Goldberg AL. 26S proteasomes and immunoproteasomes produce mainly N-extended versions of an antigenic peptide. EMBO J. 2001;20(10):2357–66.
Craiu A, Akopian T, Goldberg A, Rock KL. Two distinct proteolytic processes in the generation of a major histocompatibility complex class I-presented peptide. Proc Natl Acad Sci USA. 1997;94(20):10850–5.
Kimura H, Caturegli P, Takahashi M, Suzuki K. New insights into the function of the immunoproteasome in immune and nonimmune cells. J Immunol Res. 2015;2015:541984.
Ferrington DA, Gregerson DS. Immunoproteasomes: structure, function, and antigen presentation. Prog Mol Biol Transl Sci. 2012;109:75–112.
Guermonprez P, Valladeau J, Zitvogel L, Thery C, Amigorena S. Antigen presentation and T cell stimulation by dendritic cells. Annu Rev Immunol. 2002;20:621–67.
Groettrup M, Standera S, Stohwasser R, Kloetzel PM. The subunits MECL-1 and LMP2 are mutually required for incorporation into the 20S proteasome. Proc Natl Acad Sci USA. 1997;94(17):8970–5.
Dalet A, Stroobant V, Vigneron N, Van den Eynde BJ. Differences in the production of spliced antigenic peptides by the standard proteasome and the immunoproteasome. Eur J Immunol. 2011;41(1):39–46.
Basler M, Kirk CJ, Groettrup M. The immunoproteasome in antigen processing and other immunological functions. Curr Opin Immunol. 2013;25(1):74–80.
Sanchez-Trincado JL, Gomez-Perosanz M, Reche PA. Fundamentals and methods for T- and B-cell epitope prediction. J Immunol Res. 2017;2017:2680160.
Diez-Rivero CM, Lafuente EM, Reche PA. Computational analysis and modeling of cleavage by the immunoproteasome and the constitutive proteasome. BMC Bioinform. 2010;11:479.
Kesmir C, Nussbaum AK, Schild H, Detours V, Brunak S. Prediction of proteasome cleavage motifs by neural networks. Protein Eng. 2002;15(4):287–96.
Bhasin M, Raghava GP. Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences. Nucleic Acids Res. 2005;33(Web Server issue):W202–W207.
Donnes P, Kohlbacher O. Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci. 2005;14(8):2132–40.
Tenzer S, Peters B, Bulik S, Schoor O, Lemmel C, Schatz MM, Kloetzel PM, Rammensee HG, Schild H, Holzhutter HG. Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding. Cell Mol Life Sci. 2005;62(9):1025–37.
Sijts EJ, Kloetzel PM. The role of the proteasome in the generation of MHC class I ligands and immune responses. Cell Mol Life Sci. 2011;68(9):1491–502.
Fleri W, Paul S, Dhanda SK, Mahajan S, Xu X, Peters B, Sette A. The immune epitope database and analysis resource in epitope discovery and synthetic vaccine design. Front Immunol. 2017;8:278.
Morel S, Levy F, Burlet-Schiltz O, Brasseur F, Probst-Kepper M, Peitrequin AL, Monsarrat B, Van Velthoven R, Cerottini JC, Boon T, et al. Processing of some antigens by the standard proteasome but not by the immunoproteasome results in poor presentation by dendritic cells. Immunity. 2000;12(1):107–17.
Chen W, Norbury CC, Cho Y, Yewdell JW, Bennink JR. Immunoproteasomes shape immunodominance hierarchies of antiviral CD8(+) T cells at the levels of T cell repertoire and presentation of viral antigens. J Exp Med. 2001;193(11):1319–26.
Saxova P, Buus S, Brunak S, Kesmir C. Predicting proteasomal cleavage sites: a comparison of available methods. Int Immunol. 2003;15(7):781–7.
Lafuente EM, Reche PA. Prediction of MHC-peptide binding: a systematic and comprehensive overview. Curr Pharm Des. 2009;15(28):3209–20.
Reche PA, Reinherz EL. Definition of MHC supertypes through clustering of MHC peptide-binding repertoires. Methods Mol Biol. 2007;409:163–73.
Chapiro J, Claverol S, Piette F, Ma W, Stroobant V, Guillaume B, Gairin JE, Morel S, Burlet-Schiltz O, Monsarrat B, et al. Destructive cleavage of antigenic peptides either by the immunoproteasome or by the standard proteasome results in differential antigen presentation. J Immunol. 2006;176(2):1053–61.
Hansen TH, Bouvier M. MHC class I antigen presentation: learning from viral evasion strategies. Nat Rev Immunol. 2009;9(7):503–13.
Dekhtiarenko I, Ratts RB, Blatnik R, Lee LN, Fischer S, Borkner L, Oduro JD, Marandu TF, Hoppe S, Ruzsics Z, et al. Peptide processing is critical for T-cell memory inflation and may be optimized to improve immune protection by CMV-based vaccine vectors. PLoS Pathog. 2016;12(12):e1006072.
Grifoni A, Sidney J, Zhang Y, Scheuermann RH, Peters B, Sette A. A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2. Cell Host Microbe. 2020;27(4):671–80.
We thank the referees that reviewed this manuscript for their thoughtful and constructive comments.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 21 Supplement 17 2020: Selected papers from the 3rd International Workshop on Computational Methods for the Immune System Function (CMISF 2019). The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-21-supplement-17.
The work was supported by grant BIO2014:54164-R from Spanish Department of Science to PAR. The funders had no role in study design, decision to publish or preparation of the manuscript. Publication costs funded by Complemento II-CM network (S2017/BMD-3673).
All the authors declare that they have no competing interests.
Cite this article
Gomez-Perosanz, M., Ras-Carmona, A., Lafuente, E.M. et al. Identification of CD8+ T cell epitopes through proteasome cleavage site predictions. BMC Bioinformatics 21 (Suppl 17), 484 (2020). https://doi.org/10.1186/s12859-020-03782-1