- Open Access
Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis
BMC Bioinformatics volume 16, Article number: S9 (2015)
Human leukocyte antigens (HLAs), which constitute the human major histocompatibility complex (MHC), are encoded by some of the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs because they are necessary co-binders of HLAs together with drugs. A large amount of experimental data has been generated for understanding HLA-peptide binding. However, efficiently using these data to understand and accurately predict HLA-peptide binding is challenging. Therefore, we developed a network analysis based method to understand and predict HLA-peptide binding.
Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset, and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded modules, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula on the constructed HLA-peptide binding network.
Nine modules were identified from the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and the functional side chains of amino acids at certain peptide positions differed among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and it outperformed the method reported in the literature.
Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified by the network analysis clustered peptides and HLAs with similar sequences and amino acid properties. Nebula performed well in predicting HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus could further our understanding of ADRs.
As the major histocompatibility complex (MHC) in humans, human leukocyte antigens (HLAs) are important immunologic proteins found on the surface of somatic cells. They present antigenic peptides from infectious agents to T cells to induce immune responses [2–5]. People of different ethnicities or from different regions may carry distinct HLA variants or alleles [6, 7]. According to the IMGT/HLA database, more than 12,000 HLA alleles had been identified as of Mar 15, 2015, making HLA genes among the most polymorphic genes in humans. The HLA genes comprise multiple loci, including A-G. The HLA D locus is classified as Class II and the rest are categorized as Class I, based on differences in the T cells that respond to them and in their functions [2–5]. Because the binding grooves of Class I HLAs are formed by a single chain, and abundant peptide binding data and three-dimensional structures are available [8, 9], we selected Class I HLAs in this study to demonstrate the applicability of network analysis for predicting HLA-peptide binding.
Patients carrying certain HLA alleles are more likely to develop adverse drug reactions (ADRs) after taking specific drugs. Drug-HLA associations have been identified between abacavir and HLA-B*57:01 [10–12], flucloxacillin and HLA-B*57:01, and carbamazepine and HLA-B*15:02, among others. Several mechanisms have been proposed to explain HLA-related ADRs, including the hapten concept, the super-antigen model, the p.i. concept, the altered repertoire model and the danger hypothesis [15–18]. In all of these hypotheses except the danger hypothesis, the HLAs on the surface of antigen-presenting cells (APCs) present peptides to T-cell receptors (TCRs) on the surface of T cells, and the drug molecules interfere with this system by binding covalently to the peptides, by unstable interactions, or by inserting into the binding grooves of HLAs. Ideally, the ADR risk of a drug would be predicted before patients take it. However, the mechanisms of ADRs are complicated, and each player in the system has a large number of structural variations, making HLA-related ADRs very challenging to study. Our previous molecular modeling study showed that drug-HLA binding prediction improved when the binding peptide was incorporated into the modeling system. Therefore, better understanding and accurately predicting HLA-peptide binding could facilitate the prediction of ADRs related to genetic predisposition.
Various machine learning models have been developed to predict HLA-peptide binding for individual HLAs [20–22]. However, for many HLAs there are not enough experimental HLA-peptide binding data to train machine learning models, which limits the capability of this approach. In addition, many machine learning models use parameters derived from peptides of a single length, whereas experiments have shown that the same HLA can bind peptides of different lengths, making HLA-peptide binding prediction with these methods very challenging. New methods that accurately predict HLA-peptide binding and overcome these limitations of the reported machine learning models are urgently needed. Therefore, in this study we conducted network analysis to understand the binding characteristics between HLAs and peptides and developed a new method, the neighbor-edges based and unbiased leverage algorithm (Nebula), to predict HLA-peptide binding.
By analyzing the HLA-peptide binding network, we identified nine modules, which are densely connected regions of the network. Modularity measures the quality of a division of a network into modules, and it was the criterion optimized to yield the nine modules. The modularity of the real HLA-peptide binding network was compared with the modularity values of random networks. Peptides and HLAs in the same module shared similar properties. We further developed Nebula to predict HLA-peptide binding. To the best of our knowledge, this is the first study to use network analysis for understanding and predicting HLA-peptide binding.
An overview of this study's workflow is shown in Figure 1. We first harvested qualitative Class I HLA-peptide binding data from four major databases that collect and curate HLA binding assays from the literature. The HLA-peptide binding network was then constructed from the harvested data. Next, the fast greedy modularity optimization algorithm was used to identify modules. Modularity analysis of 1,000 randomly generated networks was conducted to verify that the modules yielded by the HLA-peptide binding network were unlikely to arise by chance. Finally, we implemented Nebula to make predictions and evaluated its performance via leave-one-out (LOO) validations and two-fold cross-validations.
Data preparation and network construction
Four major databases, IEDB, SYFPEITHI, MHCBN and AntiJen, contain HLA-peptide binding data curated from the literature. IEDB, MHCBN and AntiJen provide qualitative binding categories (positive or negative), while SYFPEITHI contains only positive binding data. For IEDB data, "positive-high", "positive-intermediate" and "positive-low" were all considered positive. For AntiJen data, "weak binders" were considered negative, following the description in the original paper.
We harvested the qualitative Class I HLA-peptide binding data from the four databases as of Aug 25, 2014 and combined them into a single dataset with three columns: HLA, peptide and binding category (positive or negative). For an HLA-peptide pair with multiple entries in the databases, we calculated the proportion of positive entries; if this proportion was greater than or equal to 0.5, the pair was stored as a single positive record, and otherwise as a single negative record. We removed the peptides and HLAs that had only one binding datum, for two reasons: 1) such a datum may be of low quality, because only a single binding result is available for that HLA or peptide among such a large number of peptides or HLAs; and 2) it could not be predicted by Nebula in the LOO validations, because no other data for that HLA or peptide would remain to make the prediction. Because removing a record can leave another HLA or peptide with a single datum, the filtering was repeated over several iterations until every peptide and HLA had more than one binding datum. To enable the calculations in the network analysis, positive and negative bindings were encoded as 2 and 1, respectively. Finally, the HLA-peptide binding network was constructed using the igraph package (version 0.7.1) in R 3.1.3.
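The aggregation and iterative filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (the study used R and igraph); the function names are hypothetical, and records are assumed to arrive as `(hla, peptide, is_positive)` tuples.

```python
from collections import defaultdict


def aggregate_records(records):
    """Collapse repeated (hla, peptide) entries into one weighted record.

    records: iterable of (hla, peptide, is_positive) tuples.
    Returns {(hla, peptide): weight}: 2 if the proportion of positive
    entries is >= 0.5, otherwise 1 (the encoding described in the text).
    """
    tally = defaultdict(lambda: [0, 0])  # (hla, peptide) -> [positives, total]
    for hla, peptide, is_positive in records:
        tally[(hla, peptide)][0] += int(is_positive)
        tally[(hla, peptide)][1] += 1
    return {pair: 2 if pos / total >= 0.5 else 1
            for pair, (pos, total) in tally.items()}


def filter_singletons(edges):
    """Repeatedly drop records whose HLA or peptide has only one datum.

    One pass can create new singletons (removing an edge lowers the degree
    of its partner node), so passes repeat until nothing changes.
    """
    edges = dict(edges)
    while True:
        hla_degree = defaultdict(int)
        peptide_degree = defaultdict(int)
        for hla, peptide in edges:
            hla_degree[hla] += 1
            peptide_degree[peptide] += 1
        kept = {(h, p): w for (h, p), w in edges.items()
                if hla_degree[h] > 1 and peptide_degree[p] > 1}
        if len(kept) == len(edges):
            return kept
        edges = kept
```

Repeating the pass until nothing changes matters: for edges H1-P1, H1-P2 and H2-P2, the first pass removes the two edges touching the singletons P1 and H2, which leaves H1 and P2 with one datum each, so the second pass removes the last edge as well.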
Module identification and modularity analysis
We used the fast greedy modularity optimization algorithm of Clauset et al., via the igraph package, to identify modules within the HLA-peptide binding network. This algorithm is known for detecting modules quickly in large networks. To examine whether the yielded modules reflect genuine binding characteristics of HLAs and peptides or could be generated by chance, we generated 1,000 random networks under three criteria: (1) the network topology (both nodes and edges) of the random networks remained the same as in the real HLA-peptide binding network; (2) only the weights (positive or negative) were randomly shuffled, keeping the same numbers of positives and negatives; and (3) the modules of the random networks were identified using the same algorithm and parameters. The modularity values were then compared between the real HLA-peptide binding network and its randomly permuted networks.
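A self-contained Python sketch of the module detection and the weight-shuffling null model follows. It is a stand-in for the igraph routine used in the study, built on stated assumptions: `greedy_modularity` applies the same greedy merge rule as the fast greedy (Clauset-Newman-Moore) algorithm but with a naive quadratic search, so it only suits toy networks; all function names are hypothetical; edge weights are 2 (positive) and 1 (negative) as in the text, and each undirected HLA-peptide edge is listed once.

```python
import random
from collections import defaultdict


def modularity(edges, membership):
    """Weighted Newman modularity Q of a partition of an undirected graph.

    edges: {(u, v): weight}, each edge listed once, no self-loops.
    membership: {node: community_id}.
    """
    m = sum(edges.values())
    internal = defaultdict(float)  # edge weight inside each community
    strength = defaultdict(float)  # summed node strength of each community
    for (u, v), w in edges.items():
        if membership[u] == membership[v]:
            internal[membership[u]] += w
        strength[membership[u]] += w
        strength[membership[v]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


def greedy_modularity(edges):
    """Agglomerative greedy modularity maximization (Newman's merge rule).

    Starts from singleton communities and repeatedly merges the connected
    pair with the largest gain dQ = w_ij / m - 2 * a_i * a_j, stopping when
    no merge increases Q. CNM applies the same rule with faster data
    structures; this quadratic version is only meant for small examples.
    """
    m = sum(edges.values())
    membership = {}
    a = defaultdict(float)        # a_i = community strength / (2m)
    between = defaultdict(float)  # frozenset({i, j}) -> weight between i and j
    for (u, v), w in edges.items():
        membership.setdefault(u, u)
        membership.setdefault(v, v)
        a[u] += w / (2 * m)
        a[v] += w / (2 * m)
        between[frozenset((u, v))] += w

    def gain(pair, w_ij):
        i, j = tuple(pair)
        return w_ij / m - 2 * a[i] * a[j]

    while between:
        pair, w_ij = max(between.items(), key=lambda kv: gain(*kv))
        if gain(pair, w_ij) <= 0:
            break
        keep, gone = tuple(pair)
        for node, c in membership.items():
            if c == gone:
                membership[node] = keep
        a[keep] += a.pop(gone)
        del between[pair]
        # Re-route the absorbed community's links to the surviving one.
        for other_pair in [p for p in between if gone in p]:
            (other,) = other_pair - {gone}
            w = between.pop(other_pair)
            between[frozenset((keep, other))] += w
    return membership


def shuffle_weights(edges, rng):
    """Null model: keep nodes and edges, permute the weights among edges."""
    weights = list(edges.values())
    rng.shuffle(weights)
    return dict(zip(edges, weights))


def null_modularities(edges, n_random=1000, seed=0):
    """Modularity of the greedy partition for each weight-shuffled network."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_random):
        shuffled = shuffle_weights(edges, rng)
        values.append(modularity(shuffled, greedy_modularity(shuffled)))
    return values
```

For example, two disconnected 2x2 blocks of positive edges are recovered as two modules with Q = 0.5, and `null_modularities(edges)` returns one modularity value per shuffled network for comparison with the observed one.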
Comparative analysis of modules
The modules yielded by the modularity analysis of the HLA-peptide binding network were compared in terms of both their HLAs and their peptides. To compare the HLAs across modules, the available protein sequences of all the HLAs were first downloaded from the IMGT/HLA database. The HLA sequences were then aligned using the MUSCLE method in MEGA 5.2.1 with default parameters. Chelvanayagam identified a uniform list of HLA residues that specifically interact with each amino acid (AA) position in 9-mer peptides. These residue numbers are given with reference to the sequence of A*02:01 (PDB ID: 3HLA). For each AA position, we extracted the corresponding residues from all the HLA sequences and concatenated them into position-specific pseudo-sequences. The pairwise sequence identities of the pseudo-sequences were calculated using Clustal Omega 1.2.0.
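The pseudo-sequence step can be illustrated with the Python sketch below. It assumes gap-free aligned sequences in which the first character has a known residue number; the residue positions passed in would come from Chelvanayagam's list, which is not reproduced here, and the toy sequences and function names are hypothetical. Identity is computed as a simple fraction of matching positions, a stand-in for the Clustal Omega calculation used in the study.

```python
from itertools import combinations


def pseudo_sequence(aligned_seq, positions, first_residue=1):
    """Concatenate the residues at the given reference residue numbers.

    Assumes a gap-free alignment whose first character is residue number
    `first_residue`, so residue k sits at index k - first_residue.
    """
    return "".join(aligned_seq[k - first_residue] for k in positions)


def percent_identity(a, b):
    """Percentage of identical residues between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_pseudo_identities(aligned, positions, first_residue=1):
    """Identity for every pair of alleles, using only the listed positions."""
    pseudo = {name: pseudo_sequence(seq, positions, first_residue)
              for name, seq in aligned.items()}
    return {(a, b): percent_identity(pseudo[a], pseudo[b])
            for a, b in combinations(sorted(pseudo), 2)}
```

Building one dictionary of pseudo-sequences per peptide position keeps the position-specific comparisons separate, which is what the within- and between-module analysis needs later.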
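Nebula was developed by modifying the collaborative filtering algorithm and the network-based inference (NBI) method [33, 34]. The HLA-peptide binding data can be represented as a weighted bipartite network, in which an edge is drawn between HLA $h_i$ and peptide $p_x$ if there is a binding datum (positive or negative) between them. The edge weight $w(h_i, p_x)$ is given by equation (1).

$$w(h_i, p_x) = \begin{cases} 2, & \text{if the binding between } h_i \text{ and } p_x \text{ is positive} \\ 1, & \text{if the binding between } h_i \text{ and } p_x \text{ is negative} \end{cases} \quad (1)$$

If HLA $h_i$ and peptide $p_x$ do not have a binding datum, a prediction value for the edge between them can be calculated from the edges neighboring the edge to be predicted, using equations (2)-(6). The prediction from the HLA side is

$$F_H(h_i, p_x) = \bar{w}_{h_i} + \frac{\sum_{h_j} r(h_i, h_j)\,[w(h_j, p_x) - \bar{w}_{h_j}]}{\sum_{h_j} |r(h_i, h_j)|} \quad (2)$$

where $\bar{w}_{h_i}$ is the average weight of all edges that connect to HLA $h_i$, $h_j$ is an HLA that connects to peptide $p_x$, and $r(h_i, h_j)$ is the Pearson correlation coefficient between $h_i$ and $h_j$, calculated using equation (3).

$$r(h_i, h_j) = \frac{\sum_{p_a} [w(h_i, p_a) - \bar{w}_{h_i}]\,[w(h_j, p_a) - \bar{w}_{h_j}]}{\sqrt{\sum_{p_a} [w(h_i, p_a) - \bar{w}_{h_i}]^2}\,\sqrt{\sum_{p_a} [w(h_j, p_a) - \bar{w}_{h_j}]^2}} \quad (3)$$

Here $p_a$ denotes a peptide that connects to both $h_i$ and $h_j$. When $h_i$ and $h_j$ do not share a connected peptide, $r(h_i, h_j) = 0$. Likewise, the prediction from the peptide side, $F_P(h_i, p_x)$, can be calculated using equation (4).

$$F_P(h_i, p_x) = \bar{w}_{p_x} + \frac{\sum_{p_k} r(p_x, p_k)\,[w(h_i, p_k) - \bar{w}_{p_k}]}{\sum_{p_k} |r(p_x, p_k)|} \quad (4)$$

where $\bar{w}_{p_x}$ is the average weight of all edges that connect to peptide $p_x$, $p_k$ is a peptide that connects to HLA $h_i$, and $r(p_x, p_k)$ is the Pearson correlation coefficient between $p_x$ and $p_k$, calculated using equation (5).

$$r(p_x, p_k) = \frac{\sum_{h_a} [w(h_a, p_x) - \bar{w}_{p_x}]\,[w(h_a, p_k) - \bar{w}_{p_k}]}{\sqrt{\sum_{h_a} [w(h_a, p_x) - \bar{w}_{p_x}]^2}\,\sqrt{\sum_{h_a} [w(h_a, p_k) - \bar{w}_{p_k}]^2}} \quad (5)$$

Here $h_a$ is an HLA that connects to both $p_x$ and $p_k$. When $p_x$ and $p_k$ do not share a connected HLA, $r(p_x, p_k) = 0$.

Nebula treats the contributions from the edges connected to the two nodes of the predicted edge equally, that is, $\lambda_H = \lambda_P = 0.5$. Therefore, the final prediction value $F(h_i, p_x)$ between HLA $h_i$ and peptide $p_x$ is calculated using equation (6).

$$F(h_i, p_x) = \lambda_H F_H(h_i, p_x) + \lambda_P F_P(h_i, p_x) = \tfrac{1}{2}\,[F_H(h_i, p_x) + F_P(h_i, p_x)] \quad (6)$$

$F(h_i, p_x)$ is a continuous value, which Nebula converts into a categorical prediction $C(h_i, p_x)$ using the unbiased leverage (UL), as given by equation (7). Since the weights for positive and negative binding were assigned as 2 and 1, respectively, the UL was set to 1.5, the midpoint of the two weights.

$$C(h_i, p_x) = \begin{cases} \text{positive}, & F(h_i, p_x) \geq 1.5 \\ \text{negative}, & F(h_i, p_x) < 1.5 \end{cases} \quad (7)$$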
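To make the prediction rule concrete, here is a minimal Python sketch of Nebula-style scoring as described in this section: mean-centred neighbour weights combined through Pearson correlations on the HLA side and on the peptide side, averaged with equal weights and thresholded at UL = 1.5. It is not the authors' implementation; the helper names are hypothetical, and two details are assumptions: a side with no correlated neighbours falls back to that node's mean weight, and a score exactly equal to 1.5 is called positive.

```python
from collections import defaultdict
from math import sqrt


def build_index(edges):
    """Adjacency maps of the weighted bipartite network."""
    by_hla = defaultdict(dict)  # hla -> {peptide: weight}
    by_pep = defaultdict(dict)  # peptide -> {hla: weight}
    for (h, p), w in edges.items():
        by_hla[h][p] = w
        by_pep[p][h] = w
    return by_hla, by_pep


def mean_weight(neighbors):
    return sum(neighbors.values()) / len(neighbors)


def pearson(x, y, mean_x, mean_y):
    """Correlation over shared neighbours, centred on each node's overall
    mean edge weight; 0 when there is no shared neighbour or no variance."""
    shared = x.keys() & y.keys()
    num = sum((x[k] - mean_x) * (y[k] - mean_y) for k in shared)
    den_x = sqrt(sum((x[k] - mean_x) ** 2 for k in shared))
    den_y = sqrt(sum((y[k] - mean_y) ** 2 for k in shared))
    return num / (den_x * den_y) if den_x and den_y else 0.0


def one_side(a, b, rows, cols):
    """Score for edge (a, b) from the neighbours of b that share a's type.

    With rows=by_hla, cols=by_pep this is the HLA-side score for (h, p);
    with the two maps swapped it is the peptide-side score.
    """
    mean_a = mean_weight(rows[a])
    num = den = 0.0
    for j, w_jb in cols[b].items():
        mean_j = mean_weight(rows[j])
        r = pearson(rows[a], rows[j], mean_a, mean_j)
        num += r * (w_jb - mean_j)
        den += abs(r)
    # Assumption: fall back to the node's mean when no neighbour correlates.
    return mean_a + (num / den if den else 0.0)


def nebula_predict(edges, h, p, ul=1.5):
    """Return (F, C) for an unmeasured HLA-peptide pair (h, p)."""
    if (h, p) in edges:
        raise ValueError("pair is measured; remove it first (as in LOO)")
    by_hla, by_pep = build_index(edges)
    if h not in by_hla or p not in by_pep:
        raise ValueError("both nodes need at least one remaining edge")
    f_h = one_side(h, p, by_hla, by_pep)  # HLA-side score
    f_p = one_side(p, h, by_pep, by_hla)  # peptide-side score
    f = 0.5 * f_h + 0.5 * f_p             # equal weighting of both sides
    return f, ("positive" if f >= ul else "negative")
```

Because the adjacency maps are rebuilt on every call, this version suits small examples; a LOO run at the scale of this study would need to reuse the index and remove only the held-out edge each time.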
Evaluation of Nebula performance
To evaluate the performance of Nebula, we used LOO validations. Each edge was removed from the HLA-peptide binding network one at a time, and the remaining network was used to predict the weight of the removed edge. A receiver operating characteristic (ROC) curve was generated from the continuous final prediction values $F(h_i, p_x)$ against the binding labels using the AUC package (version 0.3.0) in R. Sensitivity, specificity and accuracy were calculated by comparing the categorical prediction values $C(h_i, p_x)$ with the labels determined from HLA-peptide binding assays. For comparison, we evaluated the NBI method in the same way. Dr. Feixiong Cheng, the author of NBI, provided us with the NBI code.
Two-fold cross-validations were also conducted to assess potential over-fitting in the LOO validations. In each iteration, the entire HLA-peptide binding network was randomly divided into two equal portions, and each portion was used to predict the HLA-peptide binding in the other. We ran 100 iterations and calculated the sensitivity, specificity, accuracy and area under the ROC curve (AUC) to measure the performance of Nebula.
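The evaluation metrics can be computed with the Python standard library alone. The sketch below (hypothetical function names) computes AUC from the continuous scores with the Mann-Whitney rank formula, which equals the area under the empirical ROC curve with ties counted as one half, and sensitivity, specificity and accuracy from the categorical calls. It is an illustration, not the R AUC package used in the study.

```python
def auc_mann_whitney(scores, labels):
    """AUC via the Mann-Whitney statistic (ties get average ranks).

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as 1/2.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average over the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def confusion_metrics(predicted, labels):
    """Sensitivity, specificity and accuracy for boolean predictions."""
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    tn = sum(1 for p, y in zip(predicted, labels) if not p and not y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }
```

Categorical calls would be obtained by thresholding the continuous scores at the UL, e.g. `[f >= 1.5 for f in scores]`.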
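One way to generate the random halves is sketched below in Python. The function names are hypothetical, and the ± values reported in the results are assumed here to be standard deviations across iterations, which the text does not state. A random half may leave some HLAs or peptides of the other half without any remaining edge; how such pairs were handled is not described, so the sketch only produces the splits and the summary statistics.

```python
import random
from statistics import mean, stdev


def two_fold_splits(edges, n_iter=100, seed=0):
    """Yield (half_a, half_b) edge dicts, one pair per iteration.

    Each half is meant to be used in turn as the network that predicts
    the other half.
    """
    rng = random.Random(seed)
    pairs = list(edges)
    for _ in range(n_iter):
        rng.shuffle(pairs)
        half = len(pairs) // 2
        yield ({k: edges[k] for k in pairs[:half]},
               {k: edges[k] for k in pairs[half:]})


def summarize(values):
    """Mean and sample standard deviation of a metric across iterations."""
    return mean(values), stdev(values)
```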
Results and discussion
After data pre-processing, we obtained 118,959 binding data points (39.6% positives and 60.4% negatives) between 18,630 peptides and 211 Class I HLAs for network construction and modularity analysis (Supplementary Table S1 in Additional file 1). Nine modules were identified from the HLA-peptide binding network using the fast greedy modularity optimization algorithm, as shown in Figure 2a, with a modularity value of 0.489. The distributions of the peptides and HLAs in the nine modules are given in Table 1. The sequences of the peptides and HLAs in the nine modules are listed in Supplementary Tables S2 and S3, respectively, in Additional file 1.
Using the same modularity analysis algorithm, we analyzed the 1,000 randomly generated networks. The 1,000 modularity values are plotted as a histogram in Figure 2b. All 1,000 values are lower than the modularity of the real HLA-peptide binding network (the red arrow in Figure 2b), indicating that the nine modules identified by the algorithm are unlikely to have arisen by chance. To uncover potential signals in the nine modules, we analyzed the peptide and HLA properties across the modules.
In this comparison, we used very strict criteria to generate the random networks, leaving even the topology of the original network unchanged. Alternatively, random networks can be generated by reconnecting edges while keeping the same numbers of nodes and of positive and negative edges; this resulted in even larger differences in modularity (data not shown).
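The sparsity of this network follows directly from the counts above. The fraction of all possible HLA-peptide pairs that carry a binding datum is

```latex
\frac{118{,}959}{18{,}630 \times 211} = \frac{118{,}959}{3{,}930{,}930} \approx 0.030
```

so roughly 97% of the possible HLA-peptide pairs in this network have no measurement.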
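The comparison with the random networks can also be summarized as an empirical p-value. The sketch below uses a common permutation-test convention that adds one to the numerator and denominator so the estimate is never zero; the study does not report this number, and the function name is hypothetical. When all 1,000 random values fall below the observed modularity, it gives 1/1001, about 0.001, the smallest value attainable with 1,000 permutations.

```python
def empirical_p_value(observed, null_values):
    """One-sided permutation p-value, (b + 1) / (n + 1), where b counts null
    modularity values at least as large as the observed one."""
    b = sum(1 for q in null_values if q >= observed)
    return (b + 1) / (len(null_values) + 1)
```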
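The alternative, less strict null model (reconnecting edges while keeping the node sets and the numbers of positive and negative edges) might look like the following Python sketch. It is a hypothetical illustration rather than the authors' procedure, which is not described in detail: each original weight is placed on a uniformly random, not-yet-used HLA-peptide pair drawn from the original node sets.

```python
import random


def rewired_network(edges, seed=None):
    """Place every original edge weight on a random, unused HLA-peptide pair.

    Keeps the number of edges and of positive (2) and negative (1) weights;
    some nodes may end up with no edges in a given random network.
    """
    rng = random.Random(seed)
    hlas = sorted({h for h, _ in edges})
    peptides = sorted({p for _, p in edges})
    rewired = {}
    for w in edges.values():
        while True:  # rejection sampling; fast when the network is sparse
            pair = (rng.choice(hlas), rng.choice(peptides))
            if pair not in rewired:
                rewired[pair] = w
                break
    return rewired
```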
Peptide analysis across the modules
The distributions of peptides and HLAs in the nine modules are listed in Table 1. For peptides of a specific length, such as 8-mers, Table 1 also shows their distribution across the modules by column, with each column summing to 100%. Using 10% as a cutoff, we found that 8-mer and 11-mer peptides are more likely to appear in modules 4-7, while 9-mers and 10-mers mainly occur in modules 1-4. Modules 1-3 and 5-7 show a higher specificity for peptide length, while module 4 is a mixture of 8-mers, 9-mers and 11-mers. These results indicate that peptides in different modules may interact with HLAs differently.
Modules 1-7 are the major modules, each containing more than 1,000 peptides, and 9-mers account for the majority of peptides (68.8%). We further analyzed the 9-mer peptides across these modules to explore the HLA-peptide binding characteristics of the modules. According to Hong et al., the 20 common amino acids (AAs) can be categorized into three groups: (1) polar charged (Arg, Asp, Glu, His and Lys), (2) polar uncharged (Asn, Cys, Gln, Gly, Ser, Thr and Tyr), and (3) apolar (Ala, Ile, Leu, Met, Phe, Pro, Trp and Val). For each module, we assigned the AA residues at each position (1 to 9) of the 9-mer peptides to these three groups. The results are shown in Figure 3.
The distributions of AA residue categories were similar across modules at most positions. However, two positions showed very distinct characteristics across modules. At position 2 (P2), module 4 contains 80.8% and module 6 contains 93.2% apolar AA residues, whereas module 7 is dominated by polar charged AA residues (98.1%). At the last position (PΩ, or P9 for 9-mer peptides), module 3 showed a majority of polar charged AA residues (79.0%), whereas modules 4, 6 and 7 had 87.5%, 78.4% and 81.4% apolar AA residues, respectively. Detailed AA residue distributions at Positions 2 and 9 in each module, compared with their overall distributions among all nine modules, are given in Supplementary Table S4 in Additional file 1. Similar results were observed for peptides of other lengths (results not shown). Thus, peptide lengths and properties differed across the modules.
It has been reported that position 2 (P2) and especially the last position (PΩ) of peptides have critical effects on drug binding inside the HLA binding grooves, which may affect the occurrence of HLA-related ADRs [16, 36]. Our network analysis separated peptides with specific amino acid properties into different modules and clustered similar peptides together, which could help further our understanding of the mechanisms of HLA-related ADRs.
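The column-wise percentages in Table 1 and the 10% cutoff can be reproduced with a short Python sketch. It assumes a mapping from each peptide to its module; the function names and toy data are hypothetical, and treating the cutoff as inclusive (at least 10%) is an assumption.

```python
from collections import Counter, defaultdict


def length_distribution(peptide_module):
    """For each peptide length, the percentage of peptides of that length in
    each module (each length's percentages sum to 100, like a Table 1 column)."""
    counts = defaultdict(Counter)  # length -> Counter({module: n_peptides})
    for peptide, module in peptide_module.items():
        counts[len(peptide)][module] += 1
    return {length: {m: 100.0 * n / sum(c.values()) for m, n in c.items()}
            for length, c in counts.items()}


def modules_above_cutoff(distribution, cutoff=10.0):
    """Modules holding at least `cutoff` percent of a given peptide length."""
    return {length: sorted(m for m, pct in by_module.items() if pct >= cutoff)
            for length, by_module in distribution.items()}
```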
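The three-group classification and the per-position tally can be written compactly in Python. This sketch (hypothetical names) encodes the groups listed above with one-letter amino acid codes and returns, for each position, the percentage of residues in each group; it illustrates the kind of summary plotted in Figure 3 and is not the authors' script.

```python
from collections import Counter

# Three-group classification of the 20 common amino acids (one-letter codes).
AA_GROUP = {
    **{aa: "polar charged" for aa in "RDEHK"},
    **{aa: "polar uncharged" for aa in "NCQGSTY"},
    **{aa: "apolar" for aa in "AILMFPWV"},
}


def position_profiles(peptides, length=9):
    """Percentage of each AA group at positions 1..length, using only
    peptides of exactly that length. Non-standard letters raise KeyError."""
    profiles = [Counter() for _ in range(length)]
    n = 0
    for peptide in peptides:
        if len(peptide) != length:
            continue
        n += 1
        for i, aa in enumerate(peptide):
            profiles[i][AA_GROUP[aa]] += 1
    return [{group: 100.0 * c / n for group, c in prof.items()}
            for prof in profiles]
```

Index 1 of the returned list corresponds to P2 and index 8 to P9 (PΩ for 9-mers), the two positions that differed most across modules.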
HLA sequence analysis across the modules
Because Class I HLA alleles are highly similar, the sequences of their peptide-binding regions (residues 2 to 182, Chain A) aligned without a single gap. These partial HLA sequences are given in Supplementary Table S3 in Additional file 1. To assess sequence differences of HLAs across the modules, we plotted the available sequences in Figure 4, highlighting the residues that differ from the most frequent residue at each position. The differing residues are colored according to the three AA categories described above.
We observed HLA sequence differences between the modules at certain positions. For example, in module 7, the residues at positions 24 and 67 are apolar and those at position 45 are polar charged; these residues are uniform within module 7 and differ from those in the other modules. These three HLA residues are reported to interact with position 2 of 9-mer peptides, and position 2 of the 9-mer peptides in module 7 is dominated by polar charged residues (98.1%). Taken together, these results suggest that the peptides and HLAs in the same module are concordant and form specific binding patterns.
We also analyzed the identities of HLA pseudo-sequences that specifically interact with each AA position of 9-mer peptides; the results for positions 2 and 9 are given in Supplementary Table S5 in Additional file 1. For positions 2 and 9, the pseudo-sequence identities within modules were generally significantly higher (p < 0.05) than those between modules. In particular, for the position 2 pseudo-sequences, module 7 had the highest average identity within the module and the lowest average identity with the other modules.
In summary, this study revealed that not only the peptides but also the HLA sequences showed more similarities and concordant properties within modules than between modules. Modularity analysis of the HLA-peptide binding network helps in understanding HLA-peptide binding interactions, which could in turn facilitate the understanding of HLA-related ADRs.
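A minimal Python sketch of the highlighting rule (residues that differ from the most frequent residue at each aligned position) is given below. The toy sequences and function names are hypothetical; numbering starts at residue 2 to match the peptide-binding region described above, and breaking ties between equally frequent residues by first occurrence is an assumption.

```python
from collections import Counter


def consensus(seqs):
    """Most frequent residue in each column of equal-length aligned sequences
    (ties go to the residue seen first, per Counter.most_common)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def differing_residues(aligned, first_residue=2):
    """For each allele, the (residue number, residue) pairs that differ from
    the consensus; residue numbering starts at `first_residue`."""
    cons = consensus(list(aligned.values()))
    return {name: [(i + first_residue, aa)
                   for i, (aa, c) in enumerate(zip(seq, cons)) if aa != c]
            for name, seq in aligned.items()}
```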
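The within-module and between-module comparison of pseudo-sequence identities can be sketched as follows. The function names and toy data are hypothetical, identity is a simple fraction of matching positions, and the statistical test behind p < 0.05 is not specified in the text, so the sketch computes only the two averages.

```python
from itertools import combinations
from statistics import mean


def percent_identity(a, b):
    """Percentage of identical residues between two equal-length sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def within_between_identity(pseudo, module_of):
    """Mean pairwise identity for allele pairs in the same module vs. pairs
    in different modules (both groups must be non-empty)."""
    within, between = [], []
    for a, b in combinations(sorted(pseudo), 2):
        value = percent_identity(pseudo[a], pseudo[b])
        (within if module_of[a] == module_of[b] else between).append(value)
    return mean(within), mean(between)
```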
Validations for Nebula
To evaluate the performance of Nebula, LOO validations were conducted in which each of the 118,959 HLA-peptide binding data points was left out and predicted from the network constructed from the remaining 118,958 data points. The results are plotted as a ROC curve in Figure 5a; the AUC was 0.868. The sensitivity, specificity and accuracy were 70.8%, 88.7% and 81.6%, respectively.
For comparison, the performance of NBI in predicting HLA-peptide binding in the same dataset was evaluated using the same LOO validations. The results are shown in Figure 5b, with a lower AUC of 0.799. The sensitivity, specificity and accuracy were 33.2%, 94.4% and 68.1%, respectively. These results indicate that Nebula generally outperformed NBI and is promising for analyzing large and sparse datasets such as the HLA-peptide binding dataset used in this study.
To assess potential over-fitting in the LOO validations, we conducted 100 iterations of two-fold cross-validation for Nebula. The results are shown in Figure 5c. The sensitivity, specificity, accuracy and AUC were 69.2% ± 0.2%, 86.3% ± 0.2%, 79.5% ± 0.1% and 0.830 ± 0.001, respectively. These values are slightly lower than those from the LOO validations, as expected because each prediction draws on only half of the network, indicating that over-fitting is not a major concern for Nebula.
Machine learning methods such as artificial neural networks (ANNs) and support vector machines (SVMs) have been used for HLA-peptide binding prediction. However, most conventional machine learning methods have been applied to a limited number of HLAs and to peptides of specific lengths, because they require a sufficiently large amount of data to train a reliable prediction model. Moreover, the independent variables used in most reported HLA-peptide binding prediction models were derived from peptides of a fixed length, unless an extra step was implemented to handle peptides of different lengths. As the results demonstrate, Nebula not only achieved good prediction accuracy but also requires neither a large amount of experimental data for each HLA allele nor a fixed peptide length.
We identified nine modules in the HLA-peptide binding network using the fast greedy modularity optimization algorithm. Both peptides and HLAs showed distinct distributions and properties across the modules, indicating that network analysis is a promising approach for understanding the structures and characteristics of big and sparse data. We developed Nebula for prediction based on network analysis and used the HLA-peptide binding dataset as a case study to demonstrate that it is reliable and practical for big data analysis. Our results suggest that network analysis methods such as Nebula are applicable and effective for interpreting and predicting large and sparse datasets such as the HLA-peptide binding dataset used in this study. We showed that such methods can accurately predict HLA-peptide binding, which in turn could improve predictions of HLA-related ADRs and support better implementation of precision medicine.
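The reported accuracy is consistent with the sensitivity, specificity and class balance, because accuracy equals the prevalence-weighted average of sensitivity and specificity. With a positive fraction of π = 0.396 in the full dataset,

```latex
\text{accuracy} = \pi \cdot \text{sensitivity} + (1 - \pi) \cdot \text{specificity}
= 0.396 \times 0.708 + 0.604 \times 0.887 \approx 0.280 + 0.536 = 0.816
```

which matches the reported 81.6%.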
The findings and conclusions in this article have not been formally disseminated by the US Food and Drug Administration (FDA) and should not be construed to represent the FDA determination or policy.
Abbreviations
ADR: adverse drug reaction
ANN: artificial neural network
AUC: area under the ROC curve
HLA: human leukocyte antigen
IEDB: Immune Epitope Database
MHC: major histocompatibility complex
Nebula: neighbor-edges based and unbiased leverage algorithm
PDB: Protein Data Bank
ROC: receiver operating characteristic
SVM: support vector machine
Complete sequence and gene map of a human major histocompatibility complex. The MHC sequencing consortium. Nature. 1999, 401 (6756): 921-923. 10.1038/44853.
Bushkin Y, Demaria S, Le JM, Schwab R: Physical association between the CD8 and HLA class I molecules on the surface of activated human T lymphocytes. Proc Natl Acad Sci USA. 1988, 85 (11): 3985-3989. 10.1073/pnas.85.11.3985.
Spaggiari GM, Contini P, Carosio R, Arvigo M, Ghio M, Oddone D, et al: Soluble HLA class I molecules induce natural killer cell apoptosis through the engagement of CD8: evidence for a negative regulation exerted by members of the inhibitory receptor superfamily. Blood. 2002, 99 (5): 1706-1714. 10.1182/blood.V99.5.1706.
Mangalam A, Rodriguez M, David C: Role of MHC class II expressing CD4+ T cells in proteolipid protein(91-110)-induced EAE in HLA-DR3 transgenic mice. Eur J Immunol. 2006, 36 (12): 3356-3370. 10.1002/eji.200636217.
Poncet P, Arock M, David B: MHC class II-dependent activation of CD4+ T cell hybridomas by human mast cells through superantigen presentation. J Leukoc Biol. 1999, 66 (1): 105-112.
Robinson J, Halliwell JA, McWilliam H, Lopez R, Parham P, Marsh SG: The IMGT/HLA database. Nucleic Acids Res. 2013, 41 (Database issue): D1222-D1227.
Gonzalez-Galarza FF, Christmas S, Middleton D, Jones AR: Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Res. 2011, 39 (Database issue): D913-D919.
Chelvanayagam G: A roadmap for HLA-A, HLA-B, and HLA-C peptide binding specificities. Immunogenetics. 1996, 45 (1): 15-26. 10.1007/s002510050162.
Saper MA, Bjorkman PJ, Wiley DC: Refined structure of the human histocompatibility antigen HLA-A2 at 2.6 Å resolution. J Mol Biol. 1991, 219 (2): 277-319. 10.1016/0022-2836(91)90567-P.
Mallal S, Nolan D, Witt C, Masel G, Martin AM, Moore C, et al: Association between presence of HLA-B*5701, HLA-DR7, and HLA-DQ3 and hypersensitivity to HIV-1 reverse-transcriptase inhibitor abacavir. Lancet. 2002, 359 (9308): 727-732. 10.1016/S0140-6736(02)07873-X.
Mallal S, Phillips E, Carosi G, Molina JM, Workman C, Tomazic J, et al: HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med. 2008, 358 (6): 568-579. 10.1056/NEJMoa0706135.
Martin AM, Nolan D, Gaudieri S, Almeida CA, Nolan R, James I, et al: Predisposition to abacavir hypersensitivity conferred by HLA-B*5701 and a haplotypic Hsp70-Hom variant. Proc Natl Acad Sci USA. 2004, 101 (12): 4180-4185. 10.1073/pnas.0307067101.
Daly AK, Donaldson PT, Bhatnagar P, Shen Y, Pe'er I, Floratos A, et al: HLA-B*5701 genotype is a major determinant of drug-induced liver injury due to flucloxacillin. Nat Genet. 2009, 41 (7): 816-819. 10.1038/ng.379.
Chung WH, Hung SI, Hong HS, Hsih MS, Yang LC, Ho HC, et al: Medical genetics: a marker for Stevens-Johnson syndrome. Nature. 2004, 428 (6982): 486. 10.1038/428486a.
Li J, Uetrecht JP: The danger hypothesis applied to idiosyncratic drug reactions. Handb Exp Pharmacol. 2010, 196: 493-509.
Illing PT, Vivian JP, Dudek NL, Kostenko L, Chen Z, Bharadwaj M, et al: Immune self-reactivity triggered by drug-modified HLA-peptide repertoire. Nature. 2012, 486 (7404): 554-558.
Bharadwaj M, Illing P, Theodossis A, Purcell AW, Rossjohn J, McCluskey J: Drug hypersensitivity and human leukocyte antigens of the major histocompatibility complex. Annu Rev Pharmacol Toxicol. 2012, 52: 401-431. 10.1146/annurev-pharmtox-010611-134701.
Wei CY, Chung WH, Huang HW, Chen YT, Hung SI: Direct interaction between HLA-B and carbamazepine activates T cells in patients with Stevens-Johnson syndrome. J Allergy Clin Immunol. 2012, 129 (6): 1562-1569.e1565. 10.1016/j.jaci.2011.12.990.
Luo H, Du T, Zhou P, Yang L, Mei H, Ng H, et al: Molecular docking to identify associations between drugs and class I human leukocyte antigens for predicting idiosyncratic drug reactions. Comb Chem High Throughput Screen. 2015, 18 (3): 296-304. 10.2174/1386207318666150305144015.
Paul S, Kolla RV, Sidney J, Weiskopf D, Fleri W, Kim Y, et al: Evaluating the immunogenicity of protein drugs by applying in vitro MHC binding data and the immune epitope database and analysis resource. Clin Dev Immunol. 2013, 2013: 467852.
Liao WW, Arthur JW: Predicting peptide binding to Major Histocompatibility Complex molecules. Autoimmunity reviews. 2011, 10 (8): 469-473. 10.1016/j.autrev.2011.02.003.
Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B: A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol. 2008, 4 (4): e1000048. 10.1371/journal.pcbi.1000048.
Colak R, Moser F, Chu JS, Schonhuth A, Chen N, Ester M: Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks. PLoS One. 2010, 5 (10): e13348. 10.1371/journal.pone.0013348.
Clauset A, Newman ME, Moore C: Finding community structure in very large networks. Phys Rev E. 2004, 70: 066111.
Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, et al: The immune epitope database 2.0. Nucleic Acids Res. 2010, 38 (Database issue): D854-D862.
Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999, 50 (3-4): 213-219. 10.1007/s002510050595.
Lata S, Bhasin M, Raghava GP: MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes. BMC Res Notes. 2009, 2: 61. 10.1186/1756-0500-2-61.
Toseland CP, Clayton DJ, McSparron H, Hemsley SL, Blythe MJ, Paine K, et al: AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Res. 2005, 1 (1): 4. 10.1186/1745-7580-1-4.
van Meeteren M, Poorthuis A, Dugundji E: Mapping communities in large virtual social networks. Proceedings of the 1st International Forum on the Application and Management of Personal Electronic Information. 2009, Cambridge
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011, 7: 539.
Sarwar B, Karypis G, Konstan J, Riedl J: Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th international conference on World Wide Web. 2001, 285-295.
Cheng F, Zhou Y, Li W, Liu G, Tang Y: Prediction of chemical-protein interactions network with weighted network-based inference method. PLoS One. 2012, 7 (7): e41064. 10.1371/journal.pone.0041064.
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, et al: Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012, 8 (5): e1002503. 10.1371/journal.pcbi.1002503.
Hong H, Hong Q, Perkins R, Shi L, Fang H, Su Z, et al: The accurate prediction of protein family from amino acid sequence by measuring features of sequence fragments. J Comput Biol. 2009, 16 (12): 1671-1688. 10.1089/cmb.2008.0115.
Ostrov DA, Grant BJ, Pompeu YA, Sidney J, Harndahl M, Southwood S, et al: Drug hypersensitivity caused by alteration of the MHC-presented self-peptide repertoire. Proc Natl Acad Sci U S A. 2012, 109 (25): 9959-9964. 10.1073/pnas.1207934109.
Nielsen M, Lundegaard C, Blicher T, Lamberth K, Harndahl M, Justesen S, et al: NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS One. 2007, 2 (8): e796. 10.1371/journal.pone.0000796.
Donnes P, Elofsson A: Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics. 2002, 3: 25. 10.1186/1471-2105-3-25.
Nielsen M, Lund O: NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics. 2009, 10: 296. 10.1186/1471-2105-10-296.
We thank Dr. Feixiong Cheng of Vanderbilt University Medical Center for providing us with the NBI code and for discussions on the data analysis. This research was supported in part by an appointment to the Research Participation Program at the National Center for Toxicological Research (Heng Luo, Hao Ye and Hui Wen Ng) administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration. This project was partially supported by grant funding from the National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) (P20 GM103429) (formerly P20RR016460).
The publication charges for this article were funded by NCTR appropriated funds.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 13, 2015: Proceedings of the 12th Annual MCBIOS Conference. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S13.
The authors declare that they have no competing interests.
HH, DM, LS, WM and WT designed and led the project. HL collected the data. HL and HY implemented the methods. HL, HY, HN and HH discussed the data analysis and the results. HL, HH and DM wrote the manuscript.
Electronic supplementary material
Additional File 1: Supplementary Tables S1, S2, S3, S4 and S5. Supplementary Table S1. The processed HLA-peptide binding data for network construction. Supplementary Table S2. The peptides in each module. Supplementary Table S3. The HLAs in each module. Supplementary Table S4. Residue distributions at Positions 2 and 9 for 9-mer peptides in seven major modules compared to their overall distributions among all nine modules. Supplementary Table S5. Pseudo-sequence identities within and between modules for Positions 2 and 9. (XLSX 2 MB)
Cite this article
Luo, H., Ye, H., Ng, H.W. et al. Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis. BMC Bioinformatics 16 (Suppl 13), S9 (2015). https://doi.org/10.1186/1471-2105-16-S13-S9