 Software
 Open Access
HLAClus: HLA class I clustering based on 3D structure
BMC Bioinformatics volume 24, Article number: 189 (2023)
Abstract
Background
In a previous paper, we classified populated HLA class I alleles into supertypes and subtypes based on the similarity of the 3D landscape of their peptide binding grooves, using a newly defined structure distance metric and a hierarchical clustering approach. Compared to other approaches, our method achieves higher correlation with peptide binding specificity, intracluster similarity (cohesion), and robustness. Here we introduce HLAClus, a Python package for clustering HLA class I alleles using the method we developed recently, and describe additional features, including a new nearest-neighbor clustering method that facilitates clustering based on user-defined criteria.
Results
The HLAClus pipeline includes three stages. First, HLA class I structural models are coarse grained and transformed into clouds of labeled points. Second, similarities between alleles are determined using a newly defined structure distance metric that accounts for spatial and physicochemical similarities. Finally, alleles are clustered via hierarchical or nearest-neighbor approaches. We also interfaced HLAClus with the peptide:HLA affinity predictor MHCnuggets. Using the nearest-neighbor clustering method to select optimal allele-specific deep learning models in MHCnuggets improved the average accuracy of peptide binding prediction for rare alleles.
Conclusions
The HLAClus package offers a solution for characterizing the peptide binding specificities of a large number of HLA alleles. This method can be applied in HLA functional studies, such as the development of peptide affinity predictors, disease association studies, and HLA matching for grafting. HLAClus is freely available at our GitHub repository (https://github.com/yshen25/HLAClus).
Background
Human leukocyte antigen (HLA) class I proteins, which include HLA-A, HLA-B, and HLA-C, play an essential role in the adaptive immune system by presenting intrinsic peptide antigens to CD8^{+} T cells and eliciting a cytotoxic immune response [1, 2]. The extreme functional polymorphism of HLA alleles complicates the investigation of these important macromolecules [3] (Additional file 1: Fig. S1). To help disentangle this complex system of proteins, supertypes have been defined to include alleles that have similar peptide binding specificities [4, 5]. Comprehensively determining peptide binding specificities using experimental methods requires exorbitant time and effort [6,7,8]. Thus, in silico methods have been developed as viable alternatives, including affinity prediction-based [9, 10], sequence-based [11, 12], and structure-based methods [13, 14]. Because function is determined by structure, the structure-based methods may provide advantages over sequence-based approaches. However, the accuracy and coverage of previous structure-based approaches have been limited by the lack of high-quality structures, the performance of algorithms, and the computational demand of the required analysis.
We recently presented an HLA class I structure-based supertype and subtype classification method that combines multiple targeted solutions [15]. We briefly summarize the approach here. High-quality HLA class I structures were generated using ColabFold [16], a notebook-based implementation of AlphaFold2 [17]. In addition, coarse graining of protein models was applied to reduce the computational cost of structural analyses and the impact of possibly inaccurate side chain positioning introduced by modeling. Inspired by the successful application of structural similarity in comparing small molecule binding pockets [18,19,20,21], a structure distance metric, SD, was adapted from the atom-level point cloud-based SupCK algorithm [21], which performs well in classification accuracy when applied to binding pocket prediction. To incorporate physicochemical properties and the varied importance of specific HLA binding-site residues, we implemented a similarity matrix and weight factors, which improve the correlation between structural and functional similarity. Compared to previous clustering methods, our approach offers improved correlation with peptide binding specificity, intracluster similarity (cohesion), and cluster stability against random sampling (robustness).
Here we present a package, HLAClus, for clustering HLA class I alleles based on the method described above. HLAClus includes three major stages: structure processing, structure distance calculation, and clustering. First, the 3D HLA structures are processed into labeled point clouds. Then, the structure distances (SD) between alleles are calculated. Lastly, based on the pairwise SD matrix, alleles are clustered hierarchically using a complete linkage method. A nearest-neighbor clustering method is also available to allow new alleles to be easily incorporated into existing, predefined clusters.
As an example application, we interfaced HLAClus with the peptide:HLA affinity predictor MHCnuggets, which employs allele-specific deep learning models. For alleles lacking a corresponding deep learning model, MHCnuggets predicts affinity by selecting the closest available prediction model, a procedure referred to as model selection, based on sequence-based allele similarity and model quality. We instead used the HLAClus nearest-neighbor clustering method for model selection. Compared to the default method, HLAClus improved the binding prediction accuracy and the correlation with experimental affinity on the tested alleles.
The pipeline is fully open source to enable easy use and modification by users. Recommended parameters are provided for rapid startup. Although not demonstrated explicitly, this approach has the potential to be adapted to clustering of other classes of proteins based on structure distance metrics.
Implementation
Overview of structure-based HLA class I similarity measurement
The implemented methods have been described in detail previously [15] and are briefly summarized here.
Similarity between alleles is measured by a newly defined structure distance metric, SD, which was adapted from the SupCK method [21], a point cloud-based all-atom structure distance metric designed for comparing small molecule binding pockets. However, high computational demand is a major issue when applying this method to HLA molecules. Furthermore, SupCK requires a gradient ascent optimization to find the optimal relative orientation, which greatly increases the computational demand.
We modified the metric for use with coarse-grained models to improve calculation speed, incorporate weight factors representing the importance of each residue, and implement physicochemical similarity at the residue level. Coarse graining greatly decreases the number of points in a structure. In addition, structure alignment is used to find the optimal relative orientation between HLAs, so that the gradient ascent optimization can be omitted. This is applicable because all HLA class I molecules are homologous with highly similar structures. Finally, whereas SupCK uses a hard cutoff to determine the residues that contact the peptide, our method uses the weight factors as a soft threshold, which avoids the inaccuracy introduced by a falsely detected binding pocket.
The SD metric incorporates three components: spatial similarity, physicochemical similarity, and weight factor. First, the similarity K between two HLA proteins P1 and P2 is defined as:
The spatial similarity is quantified by a kernel function that transforms the Euclidean distance between a pair of residues at positions x_{i} and x_{j} into a value between 0 and 1. The physicochemical similarity S_{ij} between two residues is taken from a similarity matrix obtained by transforming the Grantham distance matrix [22]. The weight factors w_{i} and w_{j} were adapted from a previous study [23] and indicate the relative importance of each position in determining peptide binding affinity.
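Written out, a form consistent with the three components just described is the following. The Gaussian kernel and its bandwidth $\sigma$ are assumptions made here for illustration (the text only requires a kernel that maps distance into a value between 0 and 1), and the geometric mean of the two weights follows the description of the average weight factor in the implementation notes:

```latex
K(P_1, P_2) \;=\; \sum_{i \in P_1} \sum_{j \in P_2}
    \sqrt{w_i\, w_j}\; S_{ij}\;
    \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)
```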
To normalize the positive definite similarity measurement K, the structure distance metric (SD) was defined as:
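A kernel-induced distance of the following form has the properties this article attributes to SD: it uses only K(P1, P1), K(P2, P2), and K(P1, P2), it is symmetric, and it is 0 for identical structures. It is given as a sketch under that assumption, not as a verbatim statement of Eq. 2:

```latex
SD(P_1, P_2) \;=\; \sqrt{\, K(P_1, P_1) + K(P_2, P_2) - 2\, K(P_1, P_2) \,}
```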
Algorithm overview
We implemented the above-mentioned method in Python 3.8 with publicly available packages, including NumPy, SciPy, pandas, BioPandas, pymol-bundle, Biopython, matplotlib, seaborn, and scikit-learn. The source code is available in our GitHub repository [24]. Users can install the package via pip or by cloning the repository. The method has three stages: structure processing, structure distance calculation, and clustering (Fig. 1). Example outputs are available as supplemental materials (Additional file 2: Tables S1–S4).
Stage 1: HLA class I structure processing
The HLAClus package accepts 3D structures of HLA class I α chains as input. In the previous study we modeled 449 populated HLA class I alleles [15]. The structures are available at [25] for download and further investigation.
Trim and align
The 3D structures are trimmed to include only the 179 residues that form the peptide binding domain (residues 2–180). The trimmed models are superimposed onto the structure of the peptide binding domain of the most studied allele, HLA-A*02:01 (PDB ID: 1i4f).
Coarse graining
The structures are coarse grained, with each residue represented by the center of mass of its side chain and the backbone atoms omitted. The coordinates and residue types are stored in a CSV file.
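As a minimal sketch of this coarse-graining step (not the package's own code), the function below reduces one residue to the mass-weighted center of its side-chain heavy atoms. The function name, the input layout, and the fallback to the Cα atom for residues without a side chain (glycine) are assumptions made for illustration.

```python
import numpy as np

# Backbone atom names in standard PDB nomenclature; these atoms are omitted.
BACKBONE = {"N", "CA", "C", "O", "OXT"}

# Approximate atomic masses (Da) of the heavy elements found in side chains.
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def side_chain_center(atoms):
    """Return the center of mass of a residue's side-chain heavy atoms.

    atoms: list of (atom_name, element, (x, y, z)) tuples for one residue.
    Hydrogens and other elements not listed in MASS are ignored.
    """
    side = [(MASS[el], xyz) for name, el, xyz in atoms
            if name not in BACKBONE and el in MASS]
    if not side:
        # Assumption: residues with no side-chain heavy atoms (glycine)
        # are represented by their C-alpha atom.
        side = [(MASS[el], xyz) for name, el, xyz in atoms if name == "CA"]
    masses = np.array([m for m, _ in side])
    coords = np.array([xyz for _, xyz in side], dtype=float)
    return (masses[:, None] * coords).sum(axis=0) / masses.sum()
```

Applying such a function to each of the 179 residues would yield the (179, 3) coordinate array used in the later stages.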
Assigning weight factors
Weight factors, which were adapted from a previous study [23], are assigned to residues by position according to their relative importance in determining peptide binding specificity. The weight factors are recorded in the CSV files of coarse-grained structures and can be changed by passing a Python dictionary with the residue position as the key and the weight factor as the value.
Processed HLA structure file
Processed HLA structures are stored in CSV files. Each file includes 179 rows and 7 columns. Each row corresponds to one residue, and the columns are chain identifier, residue number, residue name, the three Cartesian coordinates (x, y, z), and weight factor.
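The file layout and the dictionary-based weight override can be illustrated with the standard library alone. This is a sketch: the column names, the helper names, and all numeric values below are placeholders invented for the example, not the names or values used by HLAClus.

```python
import csv
import io

# Hypothetical column names for the 7-column layout described above.
COLUMNS = ["chain", "resnum", "resname", "x", "y", "z", "weight"]


def apply_weights(rows, new_weights):
    """Return copies of rows with weights replaced for the given positions.

    new_weights maps residue position -> weight factor; positions that are
    not in the dictionary keep their current weight.
    """
    return [dict(r, weight=new_weights.get(r["resnum"], r["weight"]))
            for r in rows]


def to_csv(rows):
    """Serialize rows to CSV text with one residue per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Two made-up residues standing in for the 179 rows of a real file.
rows = [
    {"chain": "A", "resnum": 2, "resname": "SER",
     "x": 1.0, "y": 2.0, "z": 3.0, "weight": 0.5},
    {"chain": "A", "resnum": 3, "resname": "HIS",
     "x": 4.0, "y": 5.0, "z": 6.0, "weight": 0.1},
]
updated = apply_weights(rows, {3: 0.9})  # override the weight at position 3
csv_text = to_csv(updated)
```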
Stage 2: measuring similarity between alleles using the structure distance (SD) metric
This stage is the most time consuming in the HLA clustering pipeline, especially for the hierarchical clustering method, so calculation speed has been optimized as follows. As the definition of SD suggests, the calculation is split into two parts: the similarity score K and the structure distance SD. The similarity score K between a pair of alleles is itself split into three components that are calculated separately: spatial similarity, physicochemical similarity, and average weight factor. The final result is then generated with vectorized calculations.
Spatial similarity
The spatial similarity is calculated according to the kernel function. The input for each structure is an array of 3D coordinates with shape (179, 3), and the result is a 2D matrix with shape (179, 179).
Physicochemical similarity
The physicochemical similarity is derived by looking up values in the similarity matrix. The input is a (179, 1) NumPy array, and the result is a 2D matrix with shape (179, 179).
Average weight factor
The average weight factor is the geometric mean of the weight factors of two compared residues and is calculated as the square root of the outer product of two weight factor NumPy arrays with shape (179, 1). The result is a 2D matrix with shape (179, 179).
Calculation of structural similarity K
K is calculated as the grand sum of the elementwise product of the three matrices: spatial similarity, physicochemical similarity, and average weight factor.
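The three matrices and their combination into K can be sketched with NumPy broadcasting. The Gaussian kernel, the `sigma` parameter, the function names, and the encoding of residue types as integer indices into a 20 × 20 similarity matrix are assumptions made for this illustration; 1-D arrays are used in place of the (179, 1) column arrays mentioned above.

```python
import numpy as np


def spatial_similarity(xyz1, xyz2, sigma=1.0):
    """Gaussian kernel on pairwise Euclidean distances (assumed kernel form).

    xyz1: (n1, 3), xyz2: (n2, 3) -> (n1, n2) matrix with values in (0, 1].
    """
    diff = xyz1[:, None, :] - xyz2[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def physchem_similarity(res1, res2, sim_matrix):
    """Look up residue-residue similarity for every pair of positions.

    res1, res2: integer residue-type indices; sim_matrix: (20, 20) table.
    """
    return sim_matrix[np.asarray(res1)[:, None], np.asarray(res2)[None, :]]


def average_weight(w1, w2):
    """Geometric mean of weight factors: square root of the outer product."""
    return np.sqrt(np.outer(w1, w2))


def similarity_K(p1, p2, sim_matrix, sigma=1.0):
    """Grand sum of the element-wise product of the three matrices."""
    xyz1, res1, w1 = p1
    xyz2, res2, w2 = p2
    product = (spatial_similarity(xyz1, xyz2, sigma)
               * physchem_similarity(res1, res2, sim_matrix)
               * average_weight(w1, w2))
    return float(product.sum())
```

Each structure is passed as a `(coordinates, residue_indices, weights)` tuple, so a full-size call operates on three (179, 179) matrices and performs no Python-level loops.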
Calculation of structure distance SD
As shown in Eq. 2, the calculation of SD(P1, P2) requires three components: K(P1, P1), K(P2, P2), and K(P1, P2). Because the self-similarity values (e.g., K(P1, P1)) are used multiple times and K(P1, P2) = K(P2, P1), the calculation process was optimized for efficiency.
In the hierarchical clustering mode, the SD matrix is calculated as follows:

(i) The combinations with repetition of the query HLA alleles are generated, so that only one of K(P1, P2) and K(P2, P1) is calculated.
(ii) The structural similarity K for each allele pair in the combinations is calculated, and the values are stored in a Python dictionary.
(iii) Finally, the elements of the SD matrix are calculated by looking up K values in the dictionary. Because the SD matrix is symmetric about the diagonal, and the diagonal is always 0, only the upper triangular portion of the SD matrix is calculated.
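Steps (i) to (iii) can be sketched as follows. The function name, the `k_func` callable (any symmetric similarity function), and the square-root form of SD are assumptions made for illustration; the upper triangle is mirrored into the lower one here only so that the returned matrix is complete.

```python
import math
from itertools import combinations_with_replacement

import numpy as np


def sd_matrix(alleles, structures, k_func):
    """Pairwise SD matrix built from cached K values.

    alleles: list of allele names; structures: name -> structure object;
    k_func: symmetric similarity function K(structure_a, structure_b).
    """
    # (i) + (ii): each unordered pair, including self-pairs, is computed once
    # and stored in a dictionary keyed by the pair of allele names.
    k = {}
    for a, b in combinations_with_replacement(alleles, 2):
        k[(a, b)] = k_func(structures[a], structures[b])

    # (iii): fill the upper triangle by dictionary lookup; the diagonal is 0.
    n = len(alleles)
    sd = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a, b = alleles[i], alleles[j]
            value = k[(a, a)] + k[(b, b)] - 2.0 * k[(a, b)]
            # Clamp tiny negative values caused by floating-point round-off.
            sd[i, j] = sd[j, i] = math.sqrt(max(value, 0.0))
    return sd
```

For n alleles this evaluates K exactly n(n + 1)/2 times, compared with 3 × n(n − 1)/2 evaluations if every SD entry were computed independently.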
In the nearest-neighbor clustering mode, because the query and anchor alleles differ, the output SD matrix is not symmetric, and the similarity between two anchor alleles or two query alleles is not needed. Therefore the calculation of K is divided into two cases: the self-similarity (e.g., K(P1, P1)) and the anchor-query similarity (e.g., K(P1, P2)). The anchor-query similarity is calculated according to the combination of anchor-query pairs without replacement. Both similarity values are stored in a Python dictionary. Finally, the SD matrix is calculated by looking up K values.
Besides the optimization of the calculation process, a multiprocessing method was implemented to improve calculation speed. In the calculation of structural similarities, the K value of each pair of alleles is calculated in parallel.
Stage 3: clustering of alleles based on SD
Two clustering methods are available. The hierarchical clustering method clusters all query alleles, while the nearest-neighbor clustering approach is used to cluster query alleles according to an existing or user-defined clustering scheme.
Hierarchical clustering
The hierarchical clustering method first calculates the pairwise SD matrix between the alleles to be clustered. Then, hierarchical clustering is performed with a user-defined number of clusters and linkage method.
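With a precomputed SD matrix, this step maps directly onto SciPy's hierarchical clustering routines. The wrapper below is a sketch with a hypothetical function name; it is not the package's HC_pipeline function.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def hierarchical_clusters(sd, n_clusters, method="complete"):
    """Cut a hierarchical clustering of a square SD matrix into n_clusters.

    sd: symmetric (n, n) distance matrix with a zero diagonal.
    Returns an array of integer cluster labels starting at 1.
    """
    condensed = squareform(np.asarray(sd, dtype=float), checks=True)
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

`squareform` converts the square matrix to the condensed form that `linkage` expects, and `criterion="maxclust"` asks `fcluster` for at most `n_clusters` flat clusters.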
Nearest-neighbor clustering
To use the nearest-neighbor approach, anchor alleles and corresponding clusters must be defined. An anchor allele is the structure used as a representative of a predefined cluster (Table 1). For each query allele awaiting clustering, the SD to each anchor allele is calculated, and the query allele is assigned to the cluster represented by the nearest anchor allele.
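Given a query-by-anchor SD matrix, the assignment rule reduces to a row-wise argmin. In this sketch the function name, the argument layout, and the placeholder anchor and cluster names are all hypothetical; they are not the package's NN_pipeline interface.

```python
import numpy as np


def nearest_neighbor_assign(sd_query_anchor, anchors, anchor_clusters):
    """Assign each query allele to the cluster of its nearest anchor.

    sd_query_anchor: (n_query, n_anchor) SD matrix, one row per query allele.
    anchors: anchor allele names, in the column order of the matrix.
    anchor_clusters: mapping from anchor allele name to cluster label.
    """
    nearest = np.argmin(np.asarray(sd_query_anchor, dtype=float), axis=1)
    return [anchor_clusters[anchors[k]] for k in nearest]


# Placeholder names and distances, for illustration only.
anchors = ["anchor_1", "anchor_2"]
anchor_clusters = {"anchor_1": "cluster_X", "anchor_2": "cluster_Y"}
sd_example = [[0.2, 0.9],   # query 1 is closest to anchor_1
              [0.8, 0.3]]   # query 2 is closest to anchor_2
assigned = nearest_neighbor_assign(sd_example, anchors, anchor_clusters)
```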
Choice of optimal number of clusters (N) for hierarchical clustering
To select the optimal number of clusters, the elbow method and silhouette method have been implemented. First, hierarchical clustering is performed given multiple consecutive numbers of clusters (N) and then the sum of squared errors (SSE) and the silhouette coefficient (SC) are calculated for each clustering result.
The SSE is defined as the sum of distances between alleles and corresponding cluster centers, which is conventionally the average of cluster members. However, for a precomputed pairwise distance matrix, this average is not preferred because it may be unphysical. Therefore, we instead use cluster centroids, which are calculated as:
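One center that is always an actual allele, and can be computed from the pairwise SD matrix alone, is the medoid. Assuming that is the intended definition (an assumption of this sketch; the symbol $m_C$ is introduced here), the center of cluster $C$ would be:

```latex
m_C \;=\; \operatorname*{arg\,min}_{i \in C} \; \sum_{j \in C} SD(i, j)
```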
The silhouette coefficients are calculated in several steps. First, for each allele i that belongs to cluster C, the average distance to all other alleles in the same cluster is defined as:
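In the standard silhouette notation, with the symbol $a(i)$ assumed here and SD used as the distance, this intracluster average is:

```latex
a(i) \;=\; \frac{1}{\lvert C \rvert - 1} \sum_{j \in C,\; j \neq i} SD(i, j)
```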
Next, the average distance to the closest neighboring cluster D for each allele i is defined as:
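In the same standard notation (the symbol $b(i)$ is assumed here), the closest neighboring cluster is the one that minimizes the average distance from allele $i$:

```latex
b(i) \;=\; \min_{D \neq C} \; \frac{1}{\lvert D \rvert} \sum_{j \in D} SD(i, j)
```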
Lastly, the SC for the clustering result with n samples is calculated using the following equation:
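Using the conventional silhouette definition, where $a(i)$ is the average intracluster distance and $b(i)$ is the average distance to the closest neighboring cluster (symbol names assumed here), the coefficient is the mean per-allele silhouette value:

```latex
SC \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```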
The elbow plot (SSE vs N) and silhouette plot (SC vs N) are generated using the Matplotlib library. The optimal value of N can be selected from elbow points (i.e., the inflection point) of the SSE curve or the peaks (i.e., local maxima) of the silhouette curve.
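Both quantities can be computed directly from a precomputed SD matrix and a vector of cluster labels. The sketch below makes two assumptions: the cluster center is taken to be the medoid, and the SSE sums squared distances to it. Function names are hypothetical, and at least two clusters are required for the silhouette.

```python
import numpy as np


def medoid_sse(sd, labels):
    """Sum of squared distances from each allele to its cluster medoid."""
    sd = np.asarray(sd, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        sub = sd[np.ix_(idx, idx)]           # distances within cluster c
        medoid = np.argmin(sub.sum(axis=1))  # member closest to all others
        sse += float((sub[medoid] ** 2).sum())
    return sse


def silhouette(sd, labels):
    """Mean silhouette value; needs at least two clusters."""
    sd = np.asarray(sd, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(labels)):
        same = np.flatnonzero(labels == labels[i])
        if len(same) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # sd[i, i] is 0, so the sum over the cluster excludes i implicitly.
        a = sd[i, same].sum() / (len(same) - 1)
        b = min(sd[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Evaluating both functions for each candidate N gives the two curves from which the elbow and the silhouette peak are read.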
Test example: model selection for MHCnuggets using HLAClus nearest-neighbor clustering
To demonstrate the application of HLAClus in characterizing similarities between HLA alleles, we implemented our nearest-neighbor clustering approach in the MHCnuggets pipeline. The peptide:MHC affinity predictor MHCnuggets [26] includes allele-specific deep learning models for 102 classical HLA class I alleles. Affinity prediction for rare alleles lacking a corresponding deep learning model is performed by using the model of the closest well-characterized allele. This procedure is referred to as model selection. Thus, the predictive performance of MHCnuggets on rare alleles is determined by the quality of both the deep learning-based affinity prediction model and the model selection algorithm. We implemented HLAClus to replace the default model selection in MHCnuggets and compared its performance to that of the default algorithm.
The default algorithm was assessed previously by the MHCnuggets authors using a leave-one-molecule-out (LOMO) test. In the original LOMO test protocol, 20 well-characterized alleles were chosen as pseudo-rare alleles ("LOMO alleles"). Then, for each of the 20 alleles, the data in the training set for that allele were held out, and deep learning models were trained using data from all other alleles. Finally, the affinity of the held-out peptides was predicted by the remaining models and compared to the held-out experimental data.
Because MHCnuggets has been updated over the years, we reimplemented the LOMO test that was used in the default model selection method. We used the original dataset for the LOMO test containing experimental affinities for peptides binding to 20 alleles, referred to as IEDB class I rare alleles. Instead of retraining the deep learning models for each of the LOMO alleles, the model was selected using the closest_allele function in the find_closest_mhcI.py script by omitting the tested allele one allele at a time in the model search file examples_per_allele.pkl from the MHCnuggets source code.
For comparison, we applied the HLAClus nearest-neighbor clustering method to select the closest model. Among the 102 alleles that have a prediction model, four are invalid according to the IPD-IMGT/HLA database (version 3.50), including three LOMO alleles, and were therefore excluded from further analysis. Structural models of the 98 valid alleles were generated using ColabFold as described previously [15]. Next, each of the 17 remaining LOMO alleles was clustered using the nearest-neighbor method against the other 97 alleles, and the closest model was selected according to the clustering result.
Finally, the affinity of peptides to the 17 alleles was predicted using the closest model trained on binding affinity data (i.e., ba_models = True) and compared to the corresponding experimental result. The performance was assessed by binding prediction accuracy and by the correlation between the predicted and experimental IC_{50} values. For binding prediction, the binder:nonbinder binary classification was based on an IC_{50} threshold of 500 nM. The accuracy was calculated as the number of peptides with identical binding results (i.e., binder or nonbinder) for the predicted and experimental values, divided by the total number of tested peptides. The correlation between predicted and experimental values was calculated as the Spearman rank correlation coefficient.
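The two evaluation measures can be sketched in a few lines. The function names are hypothetical, and treating a peptide as a binder when its IC50 is strictly below the 500 nM threshold is an assumption (the handling of values exactly at the threshold is not specified here).

```python
import numpy as np
from scipy.stats import spearmanr


def binding_accuracy(pred_ic50, exp_ic50, threshold=500.0):
    """Fraction of peptides with matching binder/nonbinder calls.

    A peptide is called a binder when its IC50 (nM) is below the threshold.
    """
    pred_binder = np.asarray(pred_ic50, dtype=float) < threshold
    exp_binder = np.asarray(exp_ic50, dtype=float) < threshold
    return float(np.mean(pred_binder == exp_binder))


def rank_correlation(pred_ic50, exp_ic50):
    """Spearman rank correlation between predicted and experimental IC50."""
    rho, _pvalue = spearmanr(pred_ic50, exp_ic50)
    return float(rho)
```

Because the Spearman coefficient depends only on ranks, it is unaffected by whether IC50 values are compared on a linear or logarithmic scale.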
Results
Test example: application of HLAClus to MHCnuggets model selection improves peptide binding prediction accuracy for rare alleles
As an example of the use of HLAClus in identifying similar alleles for use in peptide binding affinity classification, we demonstrate the application of a newly added nearest-neighbor clustering method in the MHCnuggets package and test its performance. HLAClus provides an SD metric for predicting the similarity of peptide binding specificity between HLA class I alleles and two clustering methods: hierarchical clustering and nearest-neighbor clustering. The performance of the SD metric and hierarchical clustering method was demonstrated in our previous article [15].
The MHCnuggets package predicts peptide:HLA affinity using allele-specific deep learning models. To make predictions for rare alleles, the closest deep learning model is used, which is selected by a sequence-based algorithm. As we demonstrated previously [15], our structure-based method has a higher correlation with peptide binding specificity than does a sequence-based comparison. We now investigate whether HLAClus improves model selection relative to the default method.
In the model selection result, among the 17 valid LOMO alleles, 12 alleles were assigned different closest models by the two methods. We further compared the performance of MHCnuggets on these 12 alleles using the two groups of closest models given by HLAClus and the default method, via the accuracy of binder:nonbinder binary classification and the Spearman rank correlation coefficient. On average, the HLAClus group shows higher accuracy in binding prediction than the group obtained with the default method (Fig. 2a): the average accuracy of the HLAClus group is 0.55, compared to 0.47 for the default method (Table 2). Among the 12 alleles, seven have improved accuracy while five show a decrease. The Spearman correlation coefficient shows no significant difference on average (Table 2), although it varies considerably for individual alleles (Fig. 2b). In general, the classification accuracy and correlation coefficient are positively correlated, as a good predictor is expected to perform well in both correlation and classification scenarios.
We further investigated the potential cause of the cases in which prediction performance decreased when HLAClus was used by examining the number of peptides in the training data for each model, as recorded in the examples_per_allele.pkl file in the MHCnuggets source code. In most cases (10 out of 12), the models selected by HLAClus have a much smaller training set than those selected by the default algorithm, especially for the five alleles that show a decrease in prediction performance (Table 2). In the most extreme example, the tested allele HLA-B*27:20, the default method selected the HLA-B*27:05 model, which includes 4402 peptides in its training set, whereas HLAClus selected HLA-B*27:06, which contains only 87 peptides in its training set. Therefore, we conclude that insufficient training data is the main cause of the decreased performance when HLAClus is used to identify the closest allele. On the other hand, this finding also suggests an advantage of HLAClus over the default method, as better prediction performance was achieved using much smaller training sets. By combining HLAClus with the consideration of model quality applied in the default MHCnuggets model selection algorithm, a substantial improvement in performance is expected.
Conclusions
Here we presented the HLAClus package for clustering HLA class I alleles with similar peptide binding specificities based on similarity of the peptide binding groove landscape. The clustering pipeline first processes modeled 3D HLA structures into coarse-grained point clouds. It then calculates the pairwise SD matrix between HLA alleles and clusters alleles into groups using a hierarchical or nearest-neighbor method. The structure distance metric SD correlates strongly with the peptide binding specificity, leading to reliable supertype and subtype classification [15].
In addition, HLAClus is versatile and can be readily applied in various scenarios. For example, we have demonstrated that using the nearest-neighbor clustering method in HLAClus can improve peptide binding prediction in MHCnuggets for rare alleles by upgrading the model selection algorithm. Moreover, HLAClus has the potential to be used in disease association studies to merge similar alleles into groups for streamlining analyses. It may also be useful for HLA matching in transplantation studies (Additional file 1: Fig. S1 and Additional file 2: Tables S1–S4).
Availability and requirements
Project name: HLAClus.
Project home page: https://github.com/yshen25/HLAClus.
Operating system(s): Platform independent.
Programming language: Python.
Other requirements:
License: GPL-3.0.
Availability of data and materials
The HLAClus package is available at https://github.com/yshen25/HLAClus. The functions and scripts were written in Python 3 using publicly available packages. The clustering pipeline and examples are also provided as Jupyter notebooks. The datasets supporting the conclusions of this article are included within the article and its additional files.
Abbreviations
HLA: Human leukocyte antigen
3D: Three-dimensional
SD: Structure distance
SSE: Sum of squared errors
SC: Silhouette coefficient
LOMO: Leave-one-molecule-out
References
Klein J, Sato A. The HLA system. N Engl J Med. 2000;343(10):702–9.
Hewitt EW. The MHC class I antigen presentation pathway: strategies for viral immune evasion. Immunology. 2003;110(2):163–9.
Bird L. Advantages to being different. Nat Rev Immunol. 2004;4(8):577.
Sette A, Sidney J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics. 1999;50(3):201–12.
Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, et al. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004;55(12):797–810.
Kobayashi H, Lu J, Celis E. Identification of helper T-cell epitopes that encompass or lie proximal to cytotoxic T-cell epitopes in the gp100 melanoma tumor antigen. Can Res. 2001;61(20):7577–84.
Panigada M, Sturniolo T, Besozzi G, Boccieri MG, Sinigaglia F, Grassi GG, et al. Identification of a promiscuous T-cell epitope in Mycobacterium tuberculosis Mce proteins. Infect Immun. 2002;70(1):79–85.
Doytchinova IA, Flower DR. In silico identification of supertypes for class II MHCs. J Immunol. 2005;174(11):7085–95.
Thomsen M, Lundegaard C, Buus S, Lund O, Nielsen M. MHCcluster, a method for functional clustering of MHC molecules. Immunogenetics. 2013;65(9):655–65.
Reche PA, Reinherz EL. Definition of MHC supertypes through clustering of MHC peptide-binding repertoires. In: Flower DR, editor. Immunoinformatics. Springer; 2007. p. 163–73.
Cano P, Fan B, Stass S. A geometric study of the amino acid sequence of class I HLA molecules. Immunogenetics. 1998;48(5):324–34.
McKenzie L, Pecon-Slattery J, Carrington M, O’Brien SJ. Taxonomic hierarchy of HLA class I allele sequences. Genes Immun. 1999;1(2):120–9.
Doytchinova IA, Guan P, Flower DR. Identifiying human MHC supertypes using bioinformatic methods. J Immunol. 2004;172(7):4314–23.
Tong JC, Tan TW, Ranganathan S. In silico grouping of peptide/HLA class I complexes using structural interaction characteristics. Bioinformatics. 2007;23(2):177–83.
Shen Y, Parks JM, Smith JC. HLA class I supertype classification based on structural similarity. J Immunol. 2022;210:103.
Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19:679.
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.
Gao M, Skolnick J. APoc: large-scale identification of similar protein pockets. Bioinformatics. 2013;29(5):597–604.
Shulman-Peleg A, Nussinov R, Wolfson HJ. SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res. 2005;33(Web Server issue):W337–41.
Lee HS, Im W. G-LoSA: an efficient computational tool for local structure-centric biological studies and drug design. Protein Sci. 2016;25(4):865–76.
Hoffmann B, Zaslavskiy M, Vert JP, Stoven V. A new protein binding pocket similarity measure based on comparison of clouds of atoms in 3D: application to ligand prediction. BMC Bioinform. 2010;11(1):1–16.
Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974;185(4154):862–4.
van Deutekom HW, Kesmir C. Zooming into the binding groove of HLA molecules: which positions and which substitutions change peptide binding most? Immunogenetics. 2015;67(8):425–36.
HLAClus repository. Available from: https://github.com/yshen25/HLAClus.
GitHub repository for article "HLA Class I Supertype Classification Based on Structural Similarity". Available from: https://github.com/yshen25/HLA_clustering.
Shao XM, Bhattacharya R, Huang J, Sivakumar IKA, Tokheim C, Zheng L, et al. High-throughput prediction of MHC class I and II neoantigens with MHCnuggets. Cancer Immunol Res. 2020;8(3):396–408.
Acknowledgements
Not applicable.
Funding
This research received no external funding.
Author information
Authors and Affiliations
Contributions
YS developed the package and wrote the manuscript with contributions from all authors; JMP and JCS supervised the studies. All authors have read and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1. Figure S1: Comparison between the number of HLA class I alleles studied previously.
Additional file 2. Table S1: Example output of the Processing_pipeline function. Table S2: Example output of the HC_pipeline function. Table S3: Example of the anchor_dictionary parameter for the NN_pipeline function. Table S4: Example output of the NN_pipeline function.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Shen, Y., Parks, J.M. & Smith, J.C. HLAClus: HLA class I clustering based on 3D structure. BMC Bioinformatics 24, 189 (2023). https://doi.org/10.1186/s12859-023-05297-x
DOI: https://doi.org/10.1186/s12859-023-05297-x
Keywords
 Human leukocyte antigen
 Protein structure
 Clustering
 Machine learning