In silico vaccine design and epitope mapping of New Delhi metallo-beta-lactamase (NDM): an immunoinformatics approach

Background Antibiotic resistance is a global health crisis. The adage that “prevention is better than cure” is especially true regarding antibiotic resistance because the resistance appears and spreads much faster than the production of new antibiotics. Vaccination is an important strategy to fight infectious agents; however, this strategy has not attracted sufficient attention in antibiotic resistance prevention. New Delhi metallo-beta-lactamase (NDM) confers resistance to many beta-lactamases, including important carbapenems like imipenem. Our goal in this study is to use an immunoinformatics approach to develop a vaccine that can elicit strong and specific immune responses against NDMs that prevent the development of antibiotic-resistant bacteria. Results In this study, 2194 NDM sequences were aligned to obtain a conserved sequence. One continuous B cell epitope and three T cell CD4+ epitopes were selected from NDMs conserved sequence. Epitope conservancy for B cell and HLA-DR, HLA-DQ, and HLA-DP epitopes was 100.00%, 99.82%, 99.41%, and 99.86%, respectively, and population coverage of MHC II epitopes for the world was 99.91%. Permutation of the four epitope fragments resulted in 24 different peptides, of which 6 peptides were selected after toxicity, allergenicity, and antigenicity assessment. After primary vaccine design, only one vaccine sequence with the highest similarity with discontinuous B cell epitope in NDMs was selected. The final vaccine can bind to various Toll-like receptors (TLRs). The prediction implied that the vaccine would be stable with a good half-life. An immune simulation performed by the C-IMMSIM server predicted that two doses of vaccine injection can induce a strong immune response to NDMs. Finally, the GC-Content of the vaccine was designed very similar to E. coli K12. Conclusions In this study, immunoinformatics strategies were used to design a vaccine against different NDM variants that could produce an effective immune response against this antibiotic-resistant factor. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04378-z.

Background Antibiotic resistance that is a direct consequence of the overwhelming use of antibiotics is one of the biggest issues that threaten global health. The World Health Organization (WHO), the Food and Agriculture Organization (FAO), and the World Organization for Animal Health (OIE) have considered antibiotic resistance as one of the three declared priority issues for joint action [1]. The United Nations (UN) also reported that the death toll due to antibiotic resistance is at least 700,000 deaths a year in 2019, which will be raised to 10 million deaths by 2050. Furthermore, economic damage due to uncontrolled antimicrobial resistance may lead to an economic shock similar to the 2008-2009 global financial crisis, which can result in more poverty and inequality [2].
The currently available strategies to avoid or prevent antibiotics resistance suggested by the Centers for Disease Control and Prevention (CDC) as the "Four Core Actions to Prevent Antibiotic Resistance" includes preventing infections and preventing the spread of resistance, tracking antibiotic-resistant infections, avoiding unnecessary uses of antibiotics in humans and animals and developing new drugs and diagnostic tests [3]. Designing a vaccine to prevent antibiotic resistance falls into the first class of strategies suggested by the CDC.
As β-lactamases are very important in conferring resistance to many essential antibiotics, there are numerous published papers about their prevalence, single nucleotide polymorphisms, and their inhibitors. These enzymes led to antibiotic resistance in a wide range of human pathogens through the inactivation of β-lactam antibiotics which posed serious challenges for the treatment of patients. The mechanism of action of beta-lactamase is through hydrolysis of the β-lactam ring in the structure of β-lactam antibiotics. The β-lactam ring is the core structure of β-lactam antibiotics, therefore its hydrolysis leads to the inactivation of these antibiotics [4,5]. Normally, β-lactamases are divided into two groups according to the characteristics of the active site. In the first group, the acyl-enzyme contains serine in the active site (classes A, C, and D) while in the second group the hydrolytic reaction is facilitated by one or two zinc ions (class B). The second group is called metallo-β-lactamases (MBLs) [6,7], and New Delhi metallo-betalactamase (NDM) is categorized in this class. The first member of NDMs (NDM-1) was reported in 2009 from a Swedish patient traveling to New Delhi, India with the bacterium Klebsiella pneumoniae. It hydrolyzes a wide range of β-lactam antibiotics including penicillins, late-generation cephalosporins, and carbapenems, except monobactams [8]. These enzymes are found in a variety of bacterial pathogens including K. pneumoniae, Escherichia coli, Acinetobacter baumannii, Enterobacter cloacae, Providencia rettgeri, and Morganella morganii. As different plasmids carrying the blaNDM gene, these enzymes can be transmitted between microorganisms in different ways including interstrain, inter-species, and inter-genus [9].
Immunoinformatics approaches have revolutionized vaccine design and development by increasing the quality of empirical studies. In-silico methods help to focus exclusively on well-analyzed and rational vaccine candidates. For example, Immunoinformatics approaches can help to determine conserved sequences. Conserved sequences are sequences that are similar or identical between different isoforms of a protein or nucleic acid sequences from various sources. The conserved sequences are important targets for vaccine design as they allow production vaccines effective against different isoforms of the target protein to cover a large range of potential targets and reduce the probability of resistance occurrence [10].
Furthermore, another important consideration during the design of a vaccine is the determination of antigenicity, immunogenicity, and allergenicity of the candidate epitopes. An Antigen is a molecular structure that can bind specifically to antibodies, B cell receptors, or T cell receptors; while immunogens are antigens that can induce either or both humoral or cell-mediated immune responses. Some antigens cannot induce immune responses at the first encounter, therefore they are not immunogenic. Allergens are immunogens or antigens that induce a pathogenic immune response due to overactivation of the immune system [11,12]. Immunoinformatics approaches can predict the most available and immunogenic epitopes that have the minimum possibility of toxicity, allergenicity, and cross-reactivity to host antigens [13].
Another advantage of bioinformatics approaches is the possibility of "codon optimization" Amino acids are commonly expressed by synonymous codons. In various organisms, these synonymous codons are used with unequal frequency, which is called codon bias (CB). Codon optimization results in similar GC content. Each organism has its specific codons usage and therefore specific GC content; therefore, to have a maximum expression of a cloned protein sequence in an organism, the sequence must have synonymous codons with the host [9,10]. We used E. coli K12 as the host, so all the codons must be optimized according to the E. coli K12 genome.
The adage that "prevention is better than cure" is especially true regarding antibiotic resistance because the resistance is introduced and spread much faster than the production of new antibiotics [14,15]. Vaccination is an important strategy to fight infectious agents; however, this strategy has not attracted sufficient attention in antibiotics resistance prevention. Our goal in this study is to use an immunoinformatics approach (Fig. 1) to develop a vaccine that can elicit strong and specific immune responses against NDMs that prevent the development of antibiotic-resistant bacteria.

Conserved protein sequence
The final conserved sequence contains 229 amino acids. The conserved sequence obtained from the multiple sequence alignment (MSA) is presented below (for more details regarding MSA methodology refer to the "1-Protein Sequence Retrieval" in the methods section). GDQRFGDLVFRQLAPNVWQHTSYLDMPGFGAVASNGLIVRDGGRVLVVD-TAWTDDQTAQILNWIKQEINLPVALAVVTHAHQDKMGGMDALHAAGIATYA-NALSNQLAPQEGMVAAQHSLTFAANGWVEPATAPNFGPLKVFYPGPGHTSDNIT-VGIDGTDIAFGGCLIKDSKAKSLGNLGDADTEHYAASARAFGAAFPKASMIVMSH-SAPDSRAAITHTARMADKLR.

Secondary structure
The secondary structure of the conserved sequence obtained from the NetSurfP-2.0 Server has approximately 74.23% coil, 27.51% strand, and 25.76% helix. About the first six amino acids in the conserved sequence were disordered residues. Considering the threshold at 25% for Relative Solvent Accessibility (RSA), approximately 45.41% of amino acids contain RSA (Fig. 2).  The first line represents the amino acids constituting the conserved sequence. The second line represents the relative surface accessibility (RSA), in which a red curve at the top of the line denotes the residue is exposed and the blue at the bottom of the line indicates buried residues. The third line represents the Secondary Structure, including the strand (purple thick arrow), the coil (pink straight lines), and the helix (orange spiral shapes). The fourth line represents disordered residues in the structure; the thickened regions indicate the presence of disordered residues. The fifth line shows the sequence number of residues

Continuous B cell epitope
Continuous B cell epitopes were detected from all three servers LBtope, ABCpred, and SVMtrip, and the top five epitopes are listed in Table 1. The antigenicity of these epitopes was also assessed by the VaxiJen v3.0 server. In the top five epitopes of the LBtope server, the peptide GGCLIKDSKAKSLGN (165-179) with a score of 0.95000785 is in the first rank and has an antigenicity with a probability of 66%. Although this peptide is not among the 5 top epitopes ranked by ABCpred, however, it has an acceptable score to be considered as a candidate epitope. Indeed, the calculated score for this epitope is 0.73, which is much higher than the defined threshold for this server that is 0.51.

T cell CD4 + epitope
The MHC-II alleles that are abundant in at least one European country were selected. Regarding this including criteria, twelve HLA-DR alleles, five HLA-DQ haplotypes, and two HLA-DP haplotypes were chosen ( Table 2).
In the Consensus method (for more details regarding the IEDB Consensus method refer to the "4-T cell CD4 + epitope" in the methods section), the lower the adjusted rank epitope means the better binding of the epitopes to MHC alleles. The results of the Consensus method for HLA-DQ, and HLA-DP haplotypes, and HLA-DR alleles are depicted in Fig. 3, and the top five of each chart are summarized in Table 3. In addition, the antigenicity of each of the top five peptides was calculated by the Vaxi-Jen v3.0 server and is depicted in the last column of Table 3. The epitopes that were identified by the VaxiJen v3.0 server as non-antigen was not selected as the final vaccine epitopes even if they are among the five top IEDB Consensus rank. Finally, PKASMIVMSHSAPDS (200-214) epitope for HLA-DR alleles, EHYAASARAF-GAAFP (186-200) epitope for HLA-DQ haplotypes, and DQRFGDLVFRQLAPN (2-16) epitope for HLA-DP haplotypes were selected. The top five immunogenic epitopes predicted by the IEDB are listed in Table 4. The GDLVFRQLAPNVWQH epitope (6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20) with first rank immunogenicity with a score of 49.57 overlaps with     (Table 5).

Population coverage
World population coverage (for more details regarding Population coverage refer to the "7-Population coverage" in the methods section) was 99.91% for all selected alleles of HLA-DR, HLA-DQ, and HLA-DP, while the coverage rate was 100% for Europe and North America. Other regions (except South Africa) had more than 98.5% population coverage and only South Africa had population coverage of less than 50% (Table 6).

Toxicity, allergenicity and antigenicity of the peptides
Toxicity, allergenicity, and antigenicity of these 24 peptides were evaluated as it is required for any potential therapeutic protein.
Nine out of the initial 24 peptides were found to be non-toxic and 13 out of the 24 peptides were identified with a 100% antigen probability. Eleven peptides were ignored due to inefficient antigenicity, of which 10 peptides were estimated with only a 66% antigen probability, and one was estimated non-antigenic with a 66% probability (Table 7).  Finally, even though AllergenFP detected none of the peptides as an allergen, 10 out of the 24 peptides were identified as an allergen by AllerTOP v.2.0 server.
The RaptorX server predicts 5 structures with the lowest root-mean-square deviation (RMSD) score for each peptide sequence. To further determine the best of those 5 structures, the MolProbity server was used. MolProbity server utilizes 3 criteria to compare the structures including favored rotamers, Ramachandran favored, and Rama-Z score. These 3 criteria were used to determine the best possible structure for the downstream steps in the study. In other words, for each one of the 6 candidate vaccines, one of the 5 structures was chosen as the optimum structure (Table 8). Favored rotamers and Ramachandran favored values are given as a percentage. The best structures were Table 7 Evaluation of toxicity, allergenicity and antigenicity 24 permutation-derived peptides given the highest percentages and the ideal value is usually higher than 98%. The Rama-Z score value should be ideally between − 2 and + 2, i.e. abs (Z score) < 2. Finally, from the vaccine models listed in Table 8, vaccine 1 structure 2, vaccine 2 structure 3, vaccine 3 structure 2, vaccine 4 structure 5, vaccine 5 structure 3, and vaccine 6 structure 3, were selected based on optimum scores in the following four values: RMSD, Favored rotamers, Ramachandran favored and Rama-Z score.

Discontinuous B cell epitope
The discontinuous B cell epitopes of the six vaccine constructs and all of the existed chains of twelve NDM proteins retrieved from PDB were predicted by the ElliPro server (Table 9). Consequently, the discontinuous B cell epitopes from the vaccine constructs were compared with the twelve NDM proteins to identify discontinuous epitopes in the vaccine constructs that are identical or similar to the discontinuous epitopes in the NDM proteins. The analysis detected one sequence that was present only in vaccine construct 2 with high similarity to the DQRFGDLVFRQLAPN sequence in positions 42 to 57 of NDMs proteins. This common sequence had highscoring amino acids for discontinuous B cell epitopes (Fig. 4).

Molecular docking of multi-epitope vaccine with TLR receptors
Molecular docking of vaccine 2 construct with Toll-like receptors (TLRs) was performed by using the PatchDock server. The top ten vaccine 2 construct-TLRs docking results by this server was obtained. The Global Energy of the above-mentioned ten results was calculated by FireDock (Additional file 1: Table 1

Normal mode analysis (NMA)
Normal mode analysis (for more details regarding Normal mode analysis refer to the "13-Normal mode analysis" in the methods section) of the TLR1 vaccine complex (PDB ID: 6nih) is shown in Fig. 6, and the vaccine complex with TLR1-TLR2 (PDB ID: 2z80) and TLR4 (PDB ID: 2Z62) are also shown in the Additional file 3: Fig. 2.
Areas of protein that have deformability are peaked in Fig. 6a. The B-factor graph shows the divergence of the complex in the PDB file with NMA that is depicted in Fig. 6b. The eigenvalue graph is also shown in Fig. 6c and its numerical value is equal to 2.129046e-05. Furthermore, the variance graph corresponding to the normal mode is shown in Fig. 6d. The covariance map in Fig. 6e shows pairs of residues with correlated motions in red, uncorrelated motions in white and anti-correlated motions in blue. Finally, the darker grays in Fig. 6f show rigid regions (stiffer springs) in the elastic network.

Physiochemical evaluation of vaccine
Physicochemical properties obtained by the ProtParam tool. The simulation of the half-life in mammalian reticulocytes as in vitro environment was estimated to be 30 h, and in yeast and Escherichia coli, as in vivo environment, was estimated to be more than 20 and 10 h, respectively. Other calculated physicochemical properties included 21,842.04 Daltons for molecular weight, 9.08 for theoretical isoelectric point (pI), 75.74 for aliphatic index, − 0.195 for a grand average of hydropathicity (GRAVY), 32.32 for instability index. Furthermore, the number of residues with a negative charge (Asp + Glu) and positive charge (Arg + Lys) were determined 17 and 22, respectively. Finally, C975H1531N267O285S9 was obtained as the vaccine formula.

Immune simulation
The immune simulation was performed by the C-ImmSim server which predicted that two doses of vaccine injection can induce a strong immune response to NDMs.
Based on the evaluation of the immune response by the C-ImmSim server; a specific response regarding antibody titer after exposure to the specific antigen was much higher than the non-specific antigen (Fig. 7a). B-cell subpopulations including B-cell memory and Plasma B lymphocytes (PLB cells) also had a higher peak on the 100th day ( Fig. 7b-d). In addition, the T helper cell population demonstrated a higher peak in the specific antigen group compared to the nonspecific antigen on the 100th day after injection of specific antigen (Fig. 7e-f ).

Codon-optimization and cloning
The final vaccine was codon-optimized for E. coli strain K12 using the JCAT server. Finally, a DNA sequence of 612 nucleotides with CAI-Value of 1.0 and GC-Content of 51.14 was obtained, which is very close to the GC-Content of E. coli strain K12 with 50.73. In addition, the Restriction sites SacI (GAG CTC ) and NheI (GCT AGC ) were added to the N and C terminals of the final vaccine codon sequence, then this sequence was added into the pET-28a (+) vector by the SnapGene tool (Fig. 8).

Discussion
Due to the widespread use of antibiotics, antibiotic-resistant bacteria are on the rise, and many of previously used antibiotics are ineffective now. The discovery of new antibiotics has failed to keep pace with the emergence of antibiotic-resistant pathogens. We are moving forward to a stage where we are no longer able to cure a large number of bacterial infections. NDMs are one of the most important antibiotic resistance-conferring enzymes that present in a wide range of human pathogens and their prevalence is increasing worldwide [16,17]. Although vaccination is one of the best preventive approaches to fight infectious agents, no vaccine has yet been developed to prevent antibiotics resistance. We believe that producing a vaccine against bacterial resistance can provide a solution to reduce the burden of this global health crisis. Numerous studies have used immunoinformatics methods to design vaccines against various agents such as SARS CoV 2, Lassa virus (LASV), Type A influenza viruses, Saint Louis Encephalitis Virus, Echinococcus granulosus, and Sarcoptes scabiei [13,[18][19][20][21][22][23][24][25][26]. In this study, we aimed to design an in silico multi-epitope vaccine Fig. 8 In silico cloning simulation. Schematic representation of the codon-optimized multi-epitope vaccine sequence (Red) insertion into pET-28a (+) expression vector (black) between Restriction sites SacI and NheI that affects a wide range of different bacterial NDMs. We used bioinformatics tools to find epitopes for appropriate and specific stimulation of the immune system against NDMs. An efficient vaccine should be able to stimulate CD4 + T cells and B cells, so in the current study, the vaccine candidate was designed to contain linear and discontinuous epitopes to stimulate both CD4 + T cells and B cells. As NDMs are active enzymes, the natural form of these proteins cannot be used due to their adverse effects on β-lactam antibiotics. On the other hand, using denatured or partially degraded forms can damage the discontinuous epitopes and hide or damage continuous epitopes. Therefore, we used a multi-epitope vaccine construct without any adverse enzymatic activity while providing the main discontinuous and continuous epitopes of the original protein.
In this study, 2194 sequences of NDM variants expressed in different pathogens were used to obtain a conserved sequence. The purpose of obtaining a conserved sequence was to have common epitopes between the NDMs of all pathogenic bacteria to have a vaccine construct with a wide range of potential applications. Continuous B cell epitope and T cell CD4 + epitopes were predicted by several servers or several algorithms and also the antigenicity and immunogenicity of these epitopes were calculated. Finally, four epitopes were selected, one for continuous B cell and three for CD4 + T cell (HLA-DR, HLA-DP, and HLA-DQ alleles). Also, the epitope conservancy of these four epitopes was obtained among 2194 initial sequences which demonstrated a very high conservancy of selected epitopes. Furthermore, population coverage was obtained for CD4 + T cell epitopes that showed a high percentage of coverage for most of the world regions including Europe, North America and, Asia.
According to the permutation of the final four epitopes, 24 different peptides were obtained that had different toxicity, allergenicity, and antigenicity. Six out of 24 primary peptides which demonstrated very high antigenicity, low toxicity, and lack of allergenicity were selected. The third structure of the six peptides was modeled after adding CTB adjuvants with an EAAAK linker and creating six different vaccines, and the discontinuous B cell epitope of all six vaccines was compared to different NDM proteins. Finally, it was observed that only in the second vaccine, there was a sequence with high similarity to a conserved discontinuous B cell epitope in original NDM proteins.
To stimulate innate immunity and prevent tolerance, the vaccine must be able to bind to innate immune receptors. Toll-like receptors (TLRs) activate innate immune cells such as neutrophils, monocytes, and dendritic cells (DCs), that lead to responses including expression of cell surface molecules and the production of cytokines. These responses contribute to the activation of the acquired immune system cells and prevent tolerance. Herein, three extracellular TLRs including TLR1, TLR1-TLR2, and TLR4 have been evaluated in the docking analysis. These TLRs detect pathogen-associated molecular patterns (PAMPs) in the vaccine adjuvant in the extracellular space. The docking of the vaccine construct with these TLRs demonstrated that they can bind through strong interactions. The predicted vaccine-TLRs complex eigenvalue was 2.129046e-05 which implies the stability of the vaccine-TLRs complex. Amino acids with correlated motions and rigid regions (stiffer springs) are present in the covariance and elastic network graphs, respectively. The presence of these amino acids implies a stable vaccine-TLR complex.
The half-life of the vaccine was estimated in both in vitro and in vivo simulations using the ProtParam tool. For example, the stability in E. coli was predicted to be more than 10 h, which provides the time required to extract the vaccine protein during the induction process. In addition, the vaccine construct was shown to be stable.
The results of immune simulation using the C-IMMSIM server showed that 70 days after the second dose of the NDM vaccine a much higher immune response against the NDM antigen was induced compared to a nonspecific antigen control group. In other words, the titer of antibodies increases rapidly after exposure to the specific antigen, and also the population of memory B cells, PLB cells, and T helper (Th) cells increased.
Finally, the protein sequence was Codon optimized for E. coli K12. The GC-Content vaccine (51.14) was very similar to the GC-Content E. coli K12 (50.73), which results in high-level protein expression in E. coli. The simulation results using the SnapGene tool confirmed that the recombinant vaccine construct can be expressed following cloning in the bacterium.
In the current study, immunoinformatic methods were used to find epitopes to design a vaccine construct with high antigenicity to specifically stimulate the immune system to prevent the NDMs adverse effects. In the current study, epitopes from several shared regions of the enzyme (NDM) were used simultaneously. The selected epitopes are highly conserved as we demonstrated their presence in all members of a collection of available sequences of NDMs including a large number of isoforms and possible mutations. Therefore, the appearance of mutations in these regions is improbable. Furthermore, as we used multiple epitopes simultaneously, the immune system of the immunized organism produces multiple antibodies against the vaccine candidate. The multiple antibodies prevent resistance even in case of mutation occurs in some of the epitopes. Indeed, this vaccine candidate mimics the combination therapy strategy regarding its resistance to antibiotics resistance. The current study demonstrated that designing a vaccine based on a conserved sequence of NDMs can stimulate immune responses efficiently. However, it is crucial to test the vaccine construct in animal and human studies to verify the results. Furthermore, the immune stimulation is not enough to approve a vaccine. It should be considered that even if the vaccine stimulates the immune system effectively, it does not necessarily mean that it can be effective in the human model. Therefore, it is obligatory to assess the efficacy and efficiency of the vaccine in both animal models and human volunteers for a long duration of time.

Conclusion
Antibiotics resistance, especially through NDMs, is on the rise today and is referred to as a global health crisis. Although the efforts have so far focused only on developing new antibiotics, one of the best ways to avoid antibiotic resistance is preclusion through preventing excess antibiotic use or designing a resistance inhibiting vaccine. In this study, immunoinformatics strategies were used to design a vaccine against different variants of NDMs that could produce a favorable response in CD4 + T cells and B cells. The vaccine has also been shown to be able to bind to TLRs stably to ensure eliciting immune responses. The vaccine was also predicted to be stable and had a good half-life. The immune simulation showed that with two doses of vaccine injection a strong immune response to NDMs can be induced. Finally, the expression potential of the vaccine in the bacterial host was confirmed by simulation methods. The implementation of multiple highly conserved epitopes in the NDMs makes the vaccine candidate resistant to antibiotics resistance. However, in vivo and community-level studies are required to ensure the efficacy and efficiency of the vaccine construct.

Protein sequence retrieval
The complete amino acid sequences of NDM-1 to NDM-29 were retrieved from the NCBI database (https:// www. ncbi. nlm. nih. gov/ prote in) in May 2020. The search contained enzymes with the same name but a different structure that was expressed by non-pathogenic bacteria. These non-pathogenic bacteria, namely Kocuria rhizophila, Planctomycetes bacterium Pan265, Sphingomonadales bacterium, and Acinetobacter nosocomialis were removed from the search. Furthermore, the word "cloning vector" was also removed due to the existence of a cloning vector with the same name. Finally, 2201 sequences were obtained and were aligned using Clustal Omega software by multiple sequence alignment (MSA). Furthermore, seven sequences with a large gap at the beginning or the end of the sequence were considered as partial sequences and removed from the MSA results. The NCBI accession numbers of removed partial sequences were AQT38377.1, WP_063860857.1, QID22101.1, PIL86686.1, PIL65196.1, APY22234.1, and BBE58699.1. Finally, the 2194 remaining sequences were re-aligned to obtain the conserved regions by Clustal Omega software.

Secondary structure
Protein secondary structure including surface accessibility, Alpha helix, beta-strand, Coil, and disorder in an amino acid sequence was obtained using NetSurfP-2.0 Server (https:// servi ces. healt htech. dtu. dk/ servi ce. php? NetSu rfP-2.0). NetSurfP-2.0 is a sequence-based server meaning that it can predict local structural features of proteins from the initial sequence, and "uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures". The Net-SurfP-2.0 server calculates second structures including helix, strand, and coil, as well as relative solvent accessibility (RSA) and disorder.

Continuous B cell epitope
Sequential linear amino acid sequences are called continuous epitopes. Continuous B cell epitopes were retrieved from LBtope (https:// webs. iiitd. edu. in/ ragha va/ lbtope/), ABCpred (https:// webs. iiitd. edu. in/ ragha va/ abcpr ed/) and SVMTriP (http:// sysbio. unl. edu/ SVMTr iP/). These three Servers work based on different algorithms. The LBtope server contains three models in three different datasets, including Fixed, Variable, and Confirm models. The Fixed model predicts only linear B-cell epitopes of 20 residues, while the Variable-model, which was used in this study, predicts variable-length B-cell epitopes. Finally, Confirm model predicts the desired continuous B cell epitopes based on experimentally validated data by two or more studies [27]. The ABCpred server uses an artificial neural network based on machine-learning technique and predicts fixed lengths of 10, 12, 14, 16, and 20 residues [28], that in the current study epitopes with 16 residues were selected. SVMTriP is based on Support Vector Machine (SVM) which works by combining Tri-peptide similarity and Propensity scores (SVMTriP) [29]. This server can also predict epitopes with lengths of 10, 12, 14, 16, and 20 residues. Herein, epitopes with a length of 16 residues were selected. The optimum epitopes were considered as the amino acid sequences standing among the top five scoring epitopes in at least one of the servers, and, the related scores stand higher than the threshold in at least one of the other two servers.

T cell CD4 + epitope
The Allele Frequency Net Database (AFND) (http:// www. allel efreq uenci es. net/ hla. asp) was used to determine the frequent HLA-DP and HLA-DQ haplotypes and the HLA-DR alleles in Iran and European countries.
The T cell epitopes were predicted using the IEDB MHC II prediction tool (http:// tools. immun eepit ope. org/ mhcii/). Consensus method was used which combines NNalign (NetMHCII 2.2), SMM-align (NetMHCII 1.1), CombLib, and Sturniolo [30]. Fifteen amino-acids length epitopes were selected. Finally, the scores obtained by the Consensus method were retrieved and the stacked column chart was drawn for HLA-DR, HLA-DP, and HLA-DQ. The length of peptides binding to MHC class II is usually between 13 and 17 amino acids, although shorter or longer length peptides can also bind to the groove of the MHC class II [31,32], in the current study we adopted the server default length of 15 amino acids.

Prediction of antigenicity and immunogenicity of peptide fragments
The antigenicity of epitopes obtained from B cell and T cell prediction servers was calculated by the VaxiJen v3.0 server (https:// www. ddg-pharm fac. net/ vaxij en3/). Unlike most methods, the VaxiJen v3.0 server uses an alignment-free approach and is based on auto-cross covariance (ACC) transformation of protein sequences into uniform equallength vectors [33]. The T cell immunogenicity was predicted by the IEDB server (http:// tools. immun eepit ope. org/ CD4ep iscore/). The IEDB recommended algorithm is a combination of the 7-allele method and the immunogenicity method, which is more accurate comparing to single methods [34].

Epitope conservancy
The percentage of the protein sequences identity was calculated by the Epitope Conservancy tool (http:// tools. iedb. org/ conse rvancy/) to obtain identity or the degree of correspondence (similarity) of the final epitopes to the 2194 initial sequences. "Epitope linear sequence conservancy" analysis type was selected and other parameters were kept in default.

Population coverage
T cells are only able to detect the peptide-MHC complex and therefore they only respond to the antigens when the MHC molecule can bind to fragments of antigenrelated epitopes. The IEDB population coverage analysis tool (http:// tools. iedb. org/ popul ation/) was used to examine the population coverage of selected T cell epitopes in the world population as well as in Iran. The selected MHC alleles were the same alleles used in T cell epitope prediction servers.

Toxicity, allergenicity, and antigenicity of peptides
Four selected epitope fragments were permutated to obtain all possible states of the final peptide. These epitope fragments were joined by glycine-proline-rich GPGPG linkers. All permutation-derived peptides were evaluated for toxicity using the ToxDL server (http:// www. csbio. sjtu. edu. cn/ bioinf/ ToxDL/), Allergenicity was assessed by AllerTOP v.2.0 (http:// www. ddg-pharm fac. net/ Aller TOP/) and AllergenFP v.1.0 (http:// www. ddgpharm fac. net/ Aller genFP/) servers, and antigenicity was assessed by VaxiJen v3.0 server. ToxDL server is based on interpretable deep learning which is consists of two main components, one component is based on CNNs and the other based on multilayer perceptron with domain information [35]. Toxicity less than 0.05 was considered as a threshold. AllerTOP v.2.0 and AllergenFP v.1.0 based on auto-cross covariance (ACC) transformation of protein sequences into uniform equal-length vectors, that 2427 known allergens and 2427 non-allergens classified by the k-nearest neighbor algorithm (kNN, k = 1).

Adjuvant and vaccine construct
Adjuvant improves immune system responses and thus leads to a strong and long-lasting response and prevents tolerance. Cholera toxin subunit B (CTB) is a non-toxic component of cholera toxin that has a high tendency to bind to the monosialotetrahexosylganglioside (GM1) receptor on immune cells such as macrophages, dendritic cells, and B cells, thereby stimulating these cells. The CTB sequence was retrieved from the Uniprot database with an entry identifier as P01556. Finally, CTB was added by the EAAAK linker to the N-terminal of peptides.

Vaccine-structure modeling and validation
The three-dimensional (3D) structure of final vaccine candidates was modeled by the RaptorX server (http:// rapto rx. uchic ago. edu/ Conta ctMap/), which is a distance-based protein folding server powered by deep learning. RaptorX server provides root-meansquare deviation (RMSD) for each proposed structure in angstroms (Å), which is the measure of the average distance between the atoms for each proposed structure. The six vaccine constructs were loaded on the RaptorX server and resulted in five proposed models. Validation of the five proposed RaptorX models of each vaccine construct was performed using the MolProbity server (http:// molpr obity. bioch em. duke. edu/), which calculated the values of favored rotamers, Ramachandran favored and Rama-Z score, and also designed the Ramachandran plot. Consequently, the top model based on Mol-Probity was selected for each one of the vaccine constructs.

Discontinuous B cell epitope
Discontinuous B cell epitope Vaccines were evaluated by the ElliPro server (http:// tools. iedb. org/ ellip ro/). Also, protein data bank (PDB) IDs of twelve different NDM proteins were extracted from the PDB database (https:// www. rcsb. org/) and examined regarding Discontinuous epitopes by ElliPro server. Finally, the epitope sharing between vaccines and primary proteins was investigated and a heatmap of the final sequence was designed by R software [36].