Design, structure prediction and molecular dynamics simulation of a fusion construct containing malaria pre-erythrocytic vaccine candidate, PfCelTOS, and human interleukin 2 as adjuvant

Background Malaria infection is still widespread in some parts of the world and threatens the lives of millions of people every year. Vaccines, especially oral vaccines are considered to be effective in reducing the burden of malaria morbidity and mortality. By using recombinant technology, suitable oral hosts could serve as antigen delivering vehicles in developing oral vaccines. This study was aimed towards designing and computational analysis of a fusion protein consisting of Plasmodium falciparum cell-traversal protein for ookinetes and sporozoites (PfCelTOS) fused to human interleukin-2 (IL-2) and M cell-specific peptide ligand (Co1), as a step toward developing a vaccine candidate. Results To our best knowledge, the three dimensional (3D) structure of CelTOS is not reported in protein database. Therefore, we carried out computational modeling and simulation in the hope of understanding the properties and structure of PfCelTOS. Then we fused IL-2 to PfCelTOS by a flexible linker and did in silico analysis to confirm the proper folding of each domain in the designed fusion protein. In the last step, Co1 ligand was added to the confirmed fusion structure using a rigid linker and computational analysis was performed to evaluate the final fusion construct. One structure out of five predicted by I-TASSER for PfCelTOS and fusion constructs was selected based on the highest value for C-score. Molecular dynamics (MD) simulation analysis indicated that predicted structures are stable during the simulation. Ramchandran Plot analysis of PfCelTOS and fusion constructs before and after MD simulation also represented that most residues were fallen in favorable regions. Conclusion In silico study showed that Co1-(AEEEK)3- IL-2-(GGGGS)3-PfCelTOS construct has a constant structure and the selected linkers are effectively able to separate the domains. Therefore, data reported in this paper represents the first step toward developing of an oral vaccine candidate against malaria infection.


Background
Infectious diseases are the leading cause of death all over the world [1,2]. Malaria, also called plasmodium infection, is one of the most important human infectious diseases threatening the lives of millions of people every year. According to WHO malaria report, globally, about 3.2 billion people in 97 countries and territories are at risk of being infected with malaria, and 1.2 billion are at high risk [3]. Although the disease has been eradicated in most areas, it's still widespread in some regions [3]. The biggest challenges in controlling malaria disease are the emergence of anti-malarial drug resistance and insecticide resistance in parasite and mosquito, respectively [4][5][6]. Therefore, there is an urgent need to develop novel intervention strategies such as vaccines to reduce the burden of malaria morbidity and mortality. Malaria is most common in poor and developing regions of the world and has a strong negative effect on economic growth [7]. Oral immunogenicity has opened new avenues for the development of vaccines potentially effective in reducing the burden of diseases especially in low-income and developing countries. Oral vaccines are Low cost, easily administered (needlefree), in most cases capable of being stored and transported without refrigeration (Non-Refrigerated) and painless [8]. In contrast to injected vaccines, oral vaccines target mucosal surfaces, which cause stimulation of systemic as well as mucosal immune responses [9][10][11]. Considering the fact that most pathogens enter the body through mucosal surfaces, mucosal immune system provides the first line of defense against invading bacteria and viruses. Therefore, the simultaneous induction of systemic and mucosal immunity offers an ideal strategy to fight infectious diseases. Despite many advantages that oral vaccines have, only limited numbers of them have been approved for human commercial use and yet significant challenges must be overcome to make oral vaccines closer to reality [12]. One of the challenges is the complexity of mucosal immune system that must discriminate between harmless and dangerous antigens. One way to overcome this problem is to use adjuvants. Oral adjuvants offer exciting possibilities for the formulation of oral vaccines [13][14][15].
During the past decades considerable progress in recombinant DNA technology has led to the development of fusion proteins. Fusion proteins are novel biomolecules obtained by genetically fusing two or more genes that originally code for separate proteins. Thereby fusion proteins have distinct functions derived from each of their domains. Fusion proteins have wide applications in industry and pharmaceutical protein production.
The general objective of the current study was the design and computational analysis of a fusion protein consisting of Plasmodium falciparum cell-traversal protein for ookinetes and sporozoites (PfCelTOS) fused to human interleukin-2 (IL-2) and M cell-specific peptide ligand (Co1), as a step toward developing a candidate recombinant oral vaccine against malaria. CelTOS is a protein that mediates malarial invasion into its hosts (both vertebrate and mosquito) and is required for effective malaria infection [16][17][18]. On the other hand, CelTOS amino acid sequence is highly conserved among the Plasmodium species. CelTOS highly conserved amino acid sequence and vital role for malaria infection indicate its important function across all species [19]. Therefore, targeting the immune response to this highly conserved and crucial protein and interfering with its biological function could possibly result in protection against infection by heterologous species of Plasmodium [19].
In the designed construct, human interleukin 2 (IL-2) is thought to act as mucosal adjuvant. It should be considered that the type of desired immune response (cellular or humoral) will influence the choice of adjuvants for immunization regimens [20]. In case of malaria disease, parasites and infected red blood cells which activate dendritic cells through pattern recognition receptors (PRRs) are phagocytosed and their antigens are presented to T cells. PRR signaling causes the secretion of cytokines that initiate the inflammation that underlies malaria pathogenesis and direct TH1 cell to differentiate. TH1 cells help B cells differentiation and antibody secretion and also secrete IFN-γ, which activates macrophages that phagocytose opsonized parasites and infected red blood cells and subsequently kill them by NO-and O2-dependent pathways. Inflammation induces expression of endothelial adhesion molecules to which infected red blood cells bind. Finally the secretion of anti-inflammatory cytokines from macrophages and regulatory populations of T cells curtailed the Inflammation [21]. To the best of our understanding, adjuvants that increase the TH1 response are more appropriate in malaria vaccine development. Here we focus on human interleukin 2 (IL-2) as mucosal adjuvant. IL-2 has a central role in the cascade of events involved in immune responses and can function as a vaccine adjuvant to increase the immune response to vaccine immunogens [20,[22][23][24][25][26].
Specific advantages of IL-2 as adjuvant for malaria vaccine are as follow: 1) IL-2 stimulates T cells to secrete INF-γ. INF-γ is known as immune interferon due to activating macrophages and enhancing phagocytic activity which leads to activation of immune responses. On the other hand, INF-γ results in the expression of MHCI and MHCII molecules on the surface of infected cells and antigen presenting cells, respectively, therefore enhancing antigen presenting and immune system activity. In addition, INF-γ activates the secretion of IgG and its subclasses by effecting on B lymphocytes (The positive effect of INF-γ on immune responses to malaria parasite is discussed above).

2) Activation of T cells under the influence of IL-2
leads to the secretion of cytokines such as IL-10. Unlike many other interleukins, IL-10 inhibits macrophage activity and secretion of IL-1, IL-12 and TNF. IL-12 acts as inducer for INF-γ and is an effective factor in cellular and innate immune responses against intracellular microbes. Inhibition of IL-12 leads to inhibition of INF-γ production. In fact, IL-10 is well known as the inhibitor of immune responses, especially responses that are set up by macrophages, thereby immune responses are terminated. As mentioned above, the enhancement of inflammation and thereby endothelial adhesion of blood vessels is one of the outcomes of immune system responses to malaria parasite and malaria vaccine. In this case, repressive role of IL-10 prevents excessive inflammation.
Although reported studies have indicated the potential of cytokines as mucosal adjuvants, in order to increase the probability of vaccine candidate binding to intestinal cells, we adopted another strategy. Co1 is a peptide ligand that promotes the binding of ligand-fused antigen to human M-like cells and has also comparable levels of adjuvant activity to Cholera Toxin (CT) [27]. Conjugation of Co1 to the designed construct should result in delivery of antigen to M cells.
Since most of the biological functions of proteins depend upon their 3D structure, in designing multi domain recombinant proteins, proper folding, stability and interaction between domains must be considered. Fusion proteins are much more susceptible to misfolding than single-domain proteins due to the interaction between their different peptide domains. Therefore, in silico analysis of multi domain proteins is an indispensible stage in recombinant protein production projects.
Attempts to model the structure of proteins on the computer began about 30 years ago. Since that time our understanding of protein structure and dynamics has significantly increased and now Protein Data Bank (PDB) contains more than 10,000 high resolution protein structures. Valuable 3D models of a protein that has a clear sequence homology to known proteins can be predicted by homology modeling method. However, even in cases where there is no sequence homology, threading methods relate protein sequences to a library of known structures and predict the likely protein structure. The crystallographic structure of CelTOS has not been reported yet. In the present study, we carried out a molecular modeling study of PfCelTOS protein and designed fusion proteins using iterative threading assembly refinement (I-TASSER) to obtain their 3D structures. Then energy minimization and molecular dynamics (MD) simulation were run to refine the models. The simulations of PfCelTOS and fusion proteins were performed for long time duration of 10 ns and the obtained structures were analyzed to verify further.

Sequence retrieval and fusion protein construction
The amino acid sequences of PfCelTOS, human IL-2 and M cell-targeting ligand, Co1, were retrieved from UniProt (PfCelTOS id: Q53UB8; human IL-2 id: P60568). Amino acid sequences were used to design fusion protein construct. The designed construct consisted of PfCelTOS and human IL-2 mature parts linked together by a flexible linker, whereas Co1 is linked to this construct through a rigid (helical) linker.

Primary structural analysis
Expasy's Prot Param server [28] was used to study the physiochemical characters of PfCelTOS and designed fusion constructs such as theoretical isoelectric point (pI), molecular weight, molecular formula, total number of positive and negative residues, instability index [29], aliphatic index [30] and grand average hydropathicity (GRAVY) [31]. The instability index provides an estimate of a protein's stability in vitro. Proteins with instability index smaller than 40 are predicted as stable. A value above 40 indicates that the protein may be unstable.
The aliphatic index of a protein is regarded as a positive factor for the increase of thermostability of globular proteins and is mainly defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine and leucine). The GRAVY score is calculated as the sum of hydropathy values of all the amino acids, divided by the number of residues in the sequence.

Secondary structure prediction
Secondary structure elements of PfCelTOS and designed fusion proteins were determined using Phyre2 [32] and I-TASSER [33] online servers.

Three-Dimensional (3D) model prediction
The 3D structure of PfCelTOS and fusion proteins were modeled using I-TASSER online server. The raw amino acid sequences of PfCelTOS and fusion proteins were uploaded in FASTA format to I-TASSER server. Tertiary structures were predicted in PDB format. The results generated five top models for each entry which the one with the highest confidence score (c-score) represented the best model and was the structure selected for this study [33].

Structure Validation
Visualization was carried out by PyMOL version v1.7.2 software [34]. Traceable structural errors were proofed and global charges replaced by Vega ZZ version 3.0.5.12 software [35]. Ramachandran plots were calculated by RAMPAGE Ramachandran Plot Assessment [36]. Hydropathicity index were calculated by Marvin view version 14.7.7 software (http://www.chemaxon.com). Molecular dynamics were performed by GROMACS version 5.0.1 software [37,38]. PDB file format were analyzed for MD approaches-molecular dynamics algorithm including an initial cubic salvation with 3 point simple water model, followed by ionization and neutralization of simulation cube with Na and Cl ions. Geometry optimization was performed with constrain method. This procedure continued with two separate temperature and pressure unconstrained global dynamics. Final unconstrained dynamics performed with coupled temperature (300°K) and pressure (1 bar) for 10 ns. A schematic sketch of molecular dynamics procedure is presented in Table 1.

Analysis of simulation
After the simulation had finished, the output data were analyzed according to Energy (kinetic, potential & total energy), root-mean-square deviation (RMSD), Gyration radius and H bond formation/deformation. The following plot is represented as screening procedure of modeled structures (Fig. 1).

Results and discussion
To our knowledge, the crystallographic structure of Cel-TOS has not been reported in the protein database. Therefore, in silico analysis of CelTOS 3D structure could be of benefit in predicting its probable structure.
Computer prediction and simulation methods can be used to generate representative conformations of a molecule in equilibrium and provide a picture of the way in which a molecule changes from one configuration to another [39]. Therefore, we carried out computational modeling and simulation in the hope of understanding the properties and structure of PfCelTOS. Then we fused IL-2 to PfCelTOS by a flexible linker and did in silico analysis to confirm the proper folding of each domain in the designed fusion protein. In the last step, Co1 ligand was added to the confirmed fusion structure using a rigid linker and computational analysis was performed to evaluate the final fusion construct.

Primary structure analysis
The primary structure of PfCelTOS (accession number Q53UB8) is composed of 182 amino acid residues, encoded by 546 nucleotides. A fragment of 9 amino acids at the N-terminal is a signal peptide and the next 173 amino acids fragment is a mature peptide. Human IL-2 (accession number P60568) composed of 153 amino acid residues, encoded by 459 nucleotides. A fragment of 20 amino acids at the N-terminal was found to be a signal peptide and the remaining 133 amino acids fragment was a mature peptide [40]. The 173 amino acids fragment of PfCelTOS (mature part) and 133 amino acids fragment of IL-2 were used to design the fusion protein construct. The length of the designed construct is 321 amino acids, which within 1 to 133 is IL-2. Amino acids 133 to 148 (15 aa) are linker sequence, which is (GGGGS) 3 . Amino acids 149 to 321 (173 aa) are PfCel-TOS. Based on previous studies, N-terminal of IL-2 and C-terminal of PfCelTOS seems more important in their biological function than the other end. Therefore we designed the fusion construct in a way to make free the Nand C-terminals of IL-2 and PfCelTOS, respectively. The Co1 ligand peptide was added to the computationally confirmed structure of IL-2-(GGGGS) 3 -PfCelTOS. To present the Co1 ligand to human M-like cells better, a rigid linker ((AEEEK) 3 ) was used between Co1 ligand and designed fusion protein and the new construct was designed as follow: Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCelTOS. Figure 2 shows schematic view of designed fusion protein construct. The primary structural features of PfCelTOS and two designed fusion constructs determined by Prot Param are summarized in Table 2. Prot Param is a tool which allows the computation of various physical and chemical parameters for a given protein. The calculated isoelectric points (pI) for PfCelTOS, IL-2-(GGGGS) 3 -PfCelTOS and Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCel-TOS were 4.58, 5.11 and 5.32, respectively, suggesting

Secondary structure prediction
Among protein structure prediction software, the on-line servers seem more appropriate since they have access to all structures available in databases. Secondary structure of PfCelTOS and designed fusion constructs were predicted by Phyre2 and I-TASSER servers. phyre2 (Protein Homology/AnalogY Recognition Engine; pronounced as 'fire') is among the most popular methods for protein structure prediction having been cited over 1500 times and has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods. Both programs established that PfCelTOS structure is     Table 3. In case of the fusion constructs also, amount of α-helices exceeded β-sheets. The presence of only 5 % β-sheets is related IL-2 structure (a bend created by proline residue at position 47 0f IL-2).

Three-Dimensional (3D) structure prediction
Prediction of 3D structure was performed by I-TASSER online server using the best aligned template obtained by searching against Protein Data Bank database. I-TASSER (Iterative Threading ASSEmbly Refinement) server (as 'Zhang-Server') is a web-based service for the prediction of protein structures and functions. It's free for academic users and allows them to automatically generate high-quality models of 3D structure of proteins from their amino acid sequences. It detects structure templates from PDB by threading technique. I-TASSER is one of the most successful servers in the communitywide CASP (Critical Assessment of protein Structure Prediction) experiments and was ranked as the No 1 server for protein structure prediction in recent CASP7 CASP8, CASP9, CASP10, and CASP11 experiments. 3D structure prediction by I-TASSER generates five top models with C-scores ranging from −5 to 2. The one with the highest C-score represents the best model. The highest value for predicted models of PfCelTOS, IL-2-(GGGGS) 3 -PfCelTOS and Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCelTOS were −2.94, −2.23 and −2.67, respectively. These models were selected for this study. The best selected model for PfCelTOS based on C-score represented that the 3D structure is consistent with secondary structure elements generated by Phyre2. Based on Phyre2 SS confidence, amino acids 3-15, 25-28, 32-47, 85-112 and 122-145 are α-helices with high confidence, which are also predicted as α-helix structures in 3D model of PfCelTOS generated by I-TASSER (shown in different colors in Fig. 3). It is important to note that further fusion protein constructs linked by other flexible linkers were designed but the one with (GGGGS) 3 linker resulted in the best 3D structure based on C-score. Predicted model for IL-2-(GGGGS) 3 -PfCelTOS showed a protein with two separate domains linked together through a small linker. Glycine-rich peptides confer flexibility, which allows the domains to move independently of one another while maintaining their individual three dimensional shapes [41]. 3D structure prediction of Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCelTOS showed a protein with three domains linked together by two small linkers. (AEEEK) 3 rigid linker keeps domains apart and ensure a relatively rigid separation of the domain and carrier [41]. The pre-simulated 3D structures of PfCel-TOS and fusion proteins are shown in Fig. 3.

Structural model refinement
The structural refinement was carried out using molecular dynamics simulation as described in methods. Simulation acts as a bridge between theory and experiment. Indeed we test a theory by conducting a simulation using the same or computationally predicted models and provide a guess at the possible interactions between molecules [39,42]. Gromacs is a molecular dynamics application developed by Groningen University. Gromacs is able to work in the operating system Linux. The main ability of Gromacs is to perform MD simulation and minimization energy. However, Gromacs does not work alone and should interact with PyMOL and Grace. PyMOL is an application to visualize molecule structure and Grace is an application in Linux to display graphs. Both applications support analysis of MD simulation.
The MD simulation output data were analyzed based on Energy (kinetic, potential & total energy), RMSD [43], Gyration radius [44] and H bond formation/deformation [45]. The total energy should be constant throughout the simulation process, as it is the sum of kinetic and potential energy of the molecules. Kinetic energy should be constant or following a decreasing trend since the constant increasing of kinetic energy level reflects the general confusion of protein structure. Potential energy level should be increasing or constant

Structure validation of predicted models
To validate the modeled structures, Ramachandran maps were drawn before as well as after MD simulation and structures were analyzed using RAMPAGE Ramachandran Plot Assessment. The Ramchandran plot analysis obtained by RAMPAGE Ramachandran Plot Assessment is summarized in Table 4 and the plots are provided in Fig. 7. This provides an insight into the correctness of the modeled structures. As it's obvious Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCelTOS has a well modeled structure in terms of RAMPAGE Ramachandran Plot Assessment.
Comparison of predicted structures before and after MD simulation 3D Structures of PfCelTOS and fusion constructs were also investigated after MD simulation (Fig. 8). Overlaying of 3D structure of PfCelTOS and designed fusion proteins before and after MD simulation showed that the predicted structures are constantly stable and the selected linkers are able to separate the domains of designed fusion constructs effectively. As it's obvious, rigid linker ensures the separation of domains and carrier and leads to the better presentation of construct to human M-like cells (Fig. 9). Structural comparison of predicted models of PfCelTOS and Co1-(AEEEK) 3− IL-2-(GGGGS) 3 -PfCelTOS, obtained after molecular dynamics simulations is shown in Fig. 10. This comparison indicates the proper folding of PfCelTOS in fusion construct.

Conclusion
The procedure of our study (as shown in Fig. 1) helps to rapid analysis of designed fusion constructs before initializing the recombinant fusion protein lab experiments.
As it is obvious, this procedure is fast, inexpensive (since all the servers are free) and simple, especially for inexpert users in this field. In silico study of Co1-(AEEEK) 3 -IL-2-(GGGGS) 3 -PfCelTOS structure through this procedure revealed that designed construction is suitable for fusion protein expression in edible host cells for oral delivery. Flexible linker separates PfCelTOS and IL-2 domains effectively to maintain their proper individual three dimensional structures and allows them to move independently of one another. On the other hand, rigid linker ensures the separation of fusion protein and carrier and lead to the better presentation of fusion construct to human M-like cells. Therefore, data reported in this paper represents the first step toward developing of an oral vaccine candidate against malaria infection.