In silico saturation mutagenesis and docking screening for the analysis of protein-ligand interaction: the Endothelial Protein C Receptor case study
BMC Bioinformatics volume 10, Article number: S3 (2009)
The design of mutants in protein functional regions, such as the ligand binding sites, is a powerful approach to recognize the determinants of specific protein activities in cellular pathways. For an exhaustive analysis of selected positions of protein structure large scale mutagenesis techniques are often employed, with laborious and time consuming experimental set-up. 'In silico' mutagenesis and screening simulation represents a valid alternative to laboratory methods to drive the 'in vivo' testing toward more focused objectives.
We present here a high performance computational procedure for large-scale mutant modelling and subsequent evaluation of the effect on ligand binding affinity. The mutagenesis was performed with a 'saturation' approach, where all 20 natural amino acids were tested in positions involved in ligand binding sites. Each modelled mutant was subjected to molecular docking simulation and stability evaluation. The simulated protein-ligand complexes were screened for their impairment of binding ability based on change of calculated Ki compared to the wild-type.
An example of application to the Endothelial Protein C Receptor residues involved in lipid binding is reported.
The computational pipeline presented in this work is a useful tool for the design of structurally stable mutants with altered affinity for ligand binding, considerably reducing the number of mutants to be experimentally tested. The saturation mutagenesis procedure does not require previous knowledge of functional role of the residues involved and allows extensive exploration of all possible substitutions and their pairwise combinations. Mutants are screened by docking simulation and stability evaluation followed by a rationally driven selection of those presenting the required characteristics. The method can be employed in molecular recognition studies and as a preliminary approach to select models for experimental testing.
Structure-based site-directed mutagenesis is a widely used approach to elucidate and modify specific aspects of protein function and to investigate the properties of protein-ligand interactions. Prediction of mutants with desirable properties is often obtained by rational design of a few specific position in the region of interest. This approach requires an in-depth knowledge of physicochemical, structural and functional properties of the protein, which may not be exhaustively provided by X-ray crystallography data. A different strategy is to apply the so-called scanning mutagenesis based on the systematic replacement of all residues involved in a specific function. Several examples of alanine scanning mutagenesis experiments  are reported in literature that illustrate its relevance for the study of molecular determinants of ligand binding  and protein function . Even more informative is the use of the saturation mutagenesis, where all 20 natural amino acids are tested in place of the wild type residue, in or near the functional site. 'In vitro' scanning saturation mutagenesis has been applied to several case studies and has proved to be useful to modulate protein properties such as substrate enzyme specificity  or to identify key residues for catalytic mechanisms .
A particular aspect that is amenable to investigation with random mutagenesis is the study of protein-ligand interaction. Several examples are reported in literature that show how this approach has contributed to the clarification of ligand interaction in G protein coupled receptors [6, 7], in nuclear receptors [8, 9] and in EphA3/ephrin-A5 , to cite a few examples.
However, the large number of mutants that are obtained from the systematic replacement of all the residues of a functional site makes the experimental procedure laborious and time consuming. This is especially true when dealing with large functional sites, because it requires the implementation of specific high-throughput methods for mutant screening or selection . 'In silico' simulation of mutagenesis and screening may represent a possible way around this problem. By exploiting the capabilities of high performance computers, it is possible to design and test a large numbers of variants with altered properties in order to restrict the number of models that are worth testing experimentally. Computational protein design methods have been employed in protein engineering to increase enzymes efficiency  or antibiotic resistance: for example Hayes et al.  applied a protein optimization strategy to increase the resistance of bacteria toward the antibiotic cefotaxime by optimizing TEM-1 β-lactamase. Moreover, computational design tools have been developed and have proved to be successful in the design of proteins with enhanced stability and specificity, and in engineering novel protein functions .
We present here a high performance semi-automated computational procedure suited to the study of key residues of protein-ligand interactions, which is able to dissect the contribution of different residues to ligand binding. The procedure performs a saturation mutagenesis of all residues involved in ligand binding, and the subsequent evaluation of the effect of amino acid substitutions on ligand affinity by docking simulation and on protein structure stability. An example of application to Endothelial Protein C Receptor is reported.
A semi automated pipeline was designed to manage mutant modelling, docking simulations and data analysis procedures by integrating public software through a number of in-house developed Perl scripts.
'In silico' side chain replacement and modelling were carried out with the routine 'mutate_model' of MODELLER 9v3 . 'Mutate_model' introduces a single point mutation in a user-specified residue and optimizes the mutant side chain conformation by conjugated gradient and by molecular dynamics simulation.
AutoDock4.0 [16, 17] was used for docking simulation handling both ligand and mutated side chain as flexible. AutoDockTools (ADT) (http://autodock.scripps.edu/resources/adt) facility supported the protein and ligand set up for docking. The automatic procedure, based on a Perl script, employs the following ADT routines:
'prepare_receptor4' to add polar hydrogens, assign Kollman charges and convert the protein PDB file in pdbqt (the input file for AutoDock4);
'prepare_flexdocking' and 'prepare_flexreceptor' are applied to protein and ligand to obtain the input files for flexible docking simulation;
'prepare_gpf' and 'prepare_dpf' generate the parameters file for AutoGrid (gpf) and for AutoDock (dpf), respectively.
Docking simulations were performed with Lamarkian Genetic Algorithm which is reported as the most efficient and reliable method of AutoDock .
The 'stability' routine of FoldX algorithm  was applied to evaluate the total energy of wild type structure and mutated models by the estimation of the interactions that contribute to the protein stability.
Docking simulations were performed on a shared Linux cluster of 280 Opteron AMD cores 275 at 2,2 GHz, provided by the Eurotech Group (Amaro, UD, Italy), dedicated to bioinformatics applications. The system is composed of 10 chassis of 6 diskless blades, each equipped with an Infiniband 4× network card and 8 GB of RAM. The cluster nodes are accessed through the OpenPBS queuing system.
Protein structure and preparation
The crystal structure of the Endothelial protein C receptor (PDB: 1LQV), in a complex with phosphatidylethanolamine (PTY), and the Gla domain of protein C solved by X-ray crystallography at 1.6 Å  was retrieved from the Protein Data Bank . Chain A with the corresponding ligand (PTY) molecule was used for this study. The Gla domain, crystallization water, Ca ions, N-acetyl-D-glucosamine (NAG) and 2-(acelytamino)-2-deoxi-A-D-glucopyranose (NDG) were removed from PDB file. Polar hydrogens were added at 7.4 pH, using InsightII (Accelrys, San Diego).
The ADT graphics interface was employed for manual preparation of the ligand to assign the flexibility, add polar hydrogens and load Gasteiger charges. 22 rotatable bonds were assigned to the PTY ligand.
A grid box of dimensions 76 Å × 60 Å × 92 Å was constructed around the binding site, based on the co-crystallized ligand. Ligand flexible docking simulations were performed with 100 runs and 2500000 energy evaluations per run, while the other parameters were set to the default.
The complete scheme of the computational procedure used in this work is depicted in Figure 1. A preliminary step is performed using LIGPLOT  to identify the residues involved in ligand interactions. The pipeline has a modular structure to improve its adaptability to different datasets. The backbone of the pipeline is a MySQL database where the results of each computational step are stored (blue arrows in Figure 1). The pipeline is structured in two main stages devoted to the modelling and analysis of single and double mutants respectively (blue and violet boxes).
The first stage begins by single mutant modelling starting from the receptor atomic coordinates and the list of residues identified in the preliminary step. This first step makes use of the MODELLER routine 'mutate_model' that performs the side chain substitution and refinement. A Perl script is used to iterate the substitution of each residue with all the amino acids to achieve the saturation mutagenesis. For each mutant, a PDB file is obtained and stored in the database.
In the following step, the pipeline prepares the input files for AutoGrid and AutoDock4.0 according to user specified parameters: PDB files are converted to PDBQT format, adding polar hydrogens and assigning Kollman charges. Substituted residues are set for flexible docking. The ligand is manually prepared as illustrated in the Methods section and given as input to the pipeline. In the next step, AutoDock simulations are submitted to be run in parallel on the Linux cluster. Docking results are parsed to extract, for each simulation, the most representative conformation. This corresponds to the best energy conformation included in an AutoDock cluster of at least 5 elements and showing a binding energy within one Kcal/mol compared to the first ranked solution. This criterion is arbitrarily set in order to obtain a solution from a well represented group of conformations with the best scored results. However, when these conditions are not satisfied (i.e. no populated clusters are present in the top score), the first ranked solution is accepted. For each selected conformation, the corresponding values of binding energy, cluster population, inhibition constant (Ki), root mean square deviation (RMSD) and atomic coordinates are stored in the database. In order to select mutants with a modified affinity for the ligand compared to the wild type, the output data are filtered taking the wild type Ki value as threshold.
Structural stability of selected mutants is evaluated using the FoldX program that computes, for each conformation, an estimated free energy values. The free energy difference between the mutant and the wild type (ΔΔG = ΔGmutants - ΔGwild type) is automatically calculated and stored in the database.
Pair combinations of single substitutions are analyzed in the second stage of the pipeline (violet box in Figure 1) to investigate the occurrence of synergistic effects in double mutants. Residues chosen for double mutants modelling are obtained from the results of Ki filtering in first stage screening. A Perl script manages the residues combination, removing the redundant pairs. The following pipeline steps, simulations, data analysis, filtering and storage, are performed as in the first stage.
The database is designed to collect and organize all the output data which remain available for further study. Specific queries can be applied in order to compare binding capabilities and stability of all the mutants, as illustrated in the example below.
We used the described strategy to study the ligand binding of the Endothelial Protein C Receptor (EPCR). EPCR is a transmembrane glycoprotein homologous to the major histocompatibility complex (MHC) class 1/CD1 family . EPCR plays a well-characterized role in the coagulation cascade being the receptor of the anticoagulant factor Protein C (PC) and participating to its activation mechanism. Being involved in protein C pathway, EPCR also takes part in the regulation of inflammation and apoptosis in endothelial cells . Similarly to CD1 molecules, it binds a phospholipid ligand in a deep hydrophobic groove composed of an eight-stranded β-sheet floor and three almost parallel α-helices that lie above the floor and form a crevice where the lipid is located (Figure 2) . The specific role of the lipid for the protein structure and function has not yet been clarified. Amino acid variants located in the lipid binding region of EPCR may provide useful information about the role of this interaction in protein function. However, mutations that impair or suppress the lipid binding have not been reported so far. In this work we predicted mutants that have reduced binding affinity for the ligand, while maintaining structural stability.
The structure of human EPCR, co-crystallized with PTY, exhibits a wide pattern of interactions that extend the length of binding groove. PTY makes contacts with 27 residues: four are located on α-helix H1, two on H2 and nine on H3, one on the loop between H1 and β-strand 5, and eleven residues on the strands of β-sheet floor (Figure 2). The binding is mostly dominated by hydrophobic interactions with the hydrocarbon chain of the ligand; a single H-bond is established between the polar group of phospholipid and ARG 156.
Employing the pipeline described above, we have carried out an 'in silico' saturation mutagenesis of the EPCR residues involved in lipid binding. Each modelled mutant was subjected to molecular docking simulation and stability evaluation. The protein-ligand complexes obtained by docking simulations were screened by their inhibition constant (Ki) values in order to estimate the impairment of ligand binding ability induced by the mutation. The overall procedure and a summary of the results are depicted in Figure 3. The 27 interacting residues, identified using LIGPLOT, were saturated with the 19 remaining amino acids, producing 513 mutant models. Then, wild type and mutants were analysed by docking simulations. The wild type re-docking correctly reproduced the X-ray ligand conformation with a RMSD of 1.2 Å, a binding energy of -12.46 Kcal/mol and a Ki value of 1 nM. Positions and substitutions critical for binding affinity were identified with a Ki-based filter. Referring to the wild type Ki value, we used a threshold of 1 μM (i.e. 3 order of magnitude higher than wild type) to discriminate between mutants with probable impaired binding ability. The pipeline selected 43 mutants with Ki value higher than the threshold, substituted in 16 out of the 27 starting positions. Pair combinations of the 43 substitutions were generated and 797 double mutants were modelled. They were analyzed by the same docking protocol of the single mutants and filtered with a Ki threshold of 10 μM. This value was set an order of magnitude higher than for single mutants, with the purpose of selecting those combinations with a binding ability worse than the single ones. With this filtering criterion, 212 double mutants were rescued. The large reduction in the number of double mutants (from 797 to 212), suggested that most residue combinations did not cooperate to increase the Ki value compared to the single substitutions.
The structural stability of single and double variants that passed the corresponding Ki thresholds was evaluated using the FoldX tool, directly integrated in the pipeline. Mutants ensuing a ΔΔG value higher than 2 kcal/mol were considered structurally unstable and removed from the selection . After the screening, the pipeline retrieved 12 single and 17 double mutants with a ΔΔG < 2 Kcal/mol. Binding energy and Ki values intervals are reported in Figure 3. The stability filter was passed only by mutants with a Ki value within 10 and 100 μM for single and double, respectively. This value was an order of magnitude higher than the respective Ki thresholds. Mutants with high Ki values, although potential candidates for binding affinity studies, were rejected by the ΔΔG filter and thus considered unstable.
At the end of this analysis, 6 key positions for the ligand binding in EPCR were identified (Figure 4): position 69 and 72 are localized in the H1 α-helix, positions 156, 164 and 168 are localized in H3 α-helix and one, the position 31 on the eight-stranded β-sheet floor. The most represented substitutions of these key positions involved hydrophobic residues, such as W, in positions 31, 69 and 156, and L, in positions 31 and 164.
Pair combinations in double mutants were evaluated in terms of synergistic effect where the Ki value of double mutant was larger than the sum of the corresponding single ones. To estimate the synergistic effect, we applied the following equation:
where Ki1 and Ki2 represent the Ki value of the first and the second single mutant, and Ki1+2 the Ki value of double mutant. Kid is the difference between the effect of the double mutation (Ki1+2) and the sum of the two corresponding single mutations (Ki1+Ki2). Results are shown in Figure 5. We found that 8 of the 17 double mutants displayed a partial additive effect as the Ki value was included between the more damaging mutation and the sum of the two corresponding single mutants (Ki1 < Ki1+2 < Ki1+Ki2) (on the left of the black vertical line, in Figure 5) . The remaining 9 double mutants had Ki values higher than the sum of the two corresponding single mutants (Ki1+2> Ki1+Ki2), to which we assigned a synergistic effect compared to the single mutation (on the right of the black vertical line, in Figure 5). We observed that the highest Ki values were associated with double mutants resulting from replacement of A31 with either I or L and F164 with a small polar/apolar side chain. Between them, the highest synergistic effect was shown by A31I/F164S mutant (Kid = 49), while the largest effect in terms of binding energy (-5.58 Kcal/mol) and Ki (82 μM) value was associated to A31I/F164N mutant.
In conclusion, using the described approach, we have selected a limited number of single and double mutants of EPCR for which we predict a reduced binding ability for the PTY ligand and a structural stability comparable to the wild type. The set of 12 selected single mutants and the 9 double mutants presenting synergistic properties can be proposed as rationally constructed candidates for experimental testing in cellular models.
Due to the computational load needed to accomplish all the simulations, a parallel infrastructure was employed (see Methods). In this test case, the simulation of the single and double mutant took on average 12 and 15 hours, respectively. The whole challenge consisted of 513 docking experiments for the single mutants plus the wild type simulation and 797 for the double mutants. Depending on the resource availability, it was possible to employ 80 CPUs on average. Therefore, in a period of about ten days we collected all the results covering about 2 years of computer activity.
We present a high performance semi-automated computational pipeline that integrates modelling and docking simulation algorithms for large scale mutant design. Additional analysis tools such as structure stability evaluation are also included. The pipeline is organized into two main blocks for dealing with single and double mutants that are interconnected so that selected results of first analysis are used for the second part of the study. The procedure is presently devoted to protein-ligand interaction studies, but can be adapted to investigations of protein-protein interaction. The mutagenesis method is based on a saturation approach and does not require previous knowledge of functional or structural role of involved residues: it can therefore be applied to explore new binding features. The implementation of the pipeline on a parallel computer infrastructure permits a greatly reduced computational time and makes it possible to test a very large number of mutants. For example, in the reported text case we have modelled and tested 1310 variants of EPCR with the purpose of identifying single or coupled amino acid mutations that significantly impair the binding of the lipid ligand. The whole procedure reduced the number of selected models to 21, which corresponds to about 2% of the initial dataset. Selected variants provide information about the probable functional and structural effect of the mutated residues and can be proposed as subjects for experimental tests. In addition, the large amount of data stored in the database after each step of computational analysis can be further retrieved for more detailed examination and study. This pipeline can be useful for protein-ligand binding studies to identify key residues for the ligand interaction, to evaluate the effect of the substitutions on different ligands and to design selective mutants.
Morrison KL, Weiss GA: Combinatorial alanine-scanning. Curr Opin Chem Biol 2001, 5: 302–307. 10.1016/S1367-5931(00)00206-4
Williams PF, Mynarcik DC, Yu GQ, Whittaker J: Mapping of NH2-terminal ligand binding site of the insulin receptor by alanine scanning mutagenesis. J Biol Chem 1995, 270(7):3012–3016. 10.1074/jbc.270.7.3012
Hulme EC, Bee MS, Goodwin JA: Phenotypic classification of mutants: a tool for understanding ligand binding and activation of muscarinic acetylcholine receptors. Biochem Soc Trans 2007, 35(Pt 4):742–745.
Geddie ML, Matsumura I: Rapid Evolution of beta-Glucuronidase Specificity by Saturation Mutagenesis of an Active Site Loop. J Biol Chem 2004, 279(25):26462–26468. 10.1074/jbc.M401447200
Yep A, Kenyon GL, McLeish MJ: Saturation mutagenesis of putative catalytic residues of benzoylformate decarboxylase provides a challenge to the accepted mechanism. Proc Natl Acad Sci USA 2008, 105(15):5733–5738. 10.1073/pnas.0709657105
Celic A, Connelly SM, Martin NP, Dumont ME: Intensive mutational analysis of G protein-coupled receptors in yeast. Methods Mol Biol 2004, 237: 105–120.
Hagemann IS, Narzinski KD, Floyd DH, Baranski TJ: Random mutagenesis of the complement factor 5a (C5a) receptor N terminus provides a structural constraint for C5a docking. J Biol Chem 2006, 281: 36783–92. 10.1074/jbc.M607686200
Lim YP, Huang JD: Pregnane X receptor polymorphism affects CYP3A4 induction via a ligand-dependent interaction with steroid receptor coactivator-1. Pharmacogenet Genomics 2007, 17(5):369–382. 10.1097/FPC.0b013e32803e40d7
Dubbink HJ, Hersmus R, Verma CS, Korput HA, Berrevoets CA, van Tol J, Ziel-van der Made AC, Brinkmann AO, Pike AC, Trapman J: Distinct recognition modes of FXXLF and LXXLL motifs by the androgen receptor. Mol Endocrinol 2004, 18(9):2132–2150. 10.1210/me.2003-0375
Smith FM, Vearing C, Lackmann M, Treutlein H, Himanen J, Chen K, Saul A, Nikolov D, Boyd AW: Dissecting the EphA3/Ephrin-A5 interactions using a novel functional mutagenesis screen. J Biol Chem 2004, 279(10):9522–31. 10.1074/jbc.M309326200
Hancock SM, Vaughan MD, Withers SG: Engineering of glycosidases and glycosyltransferases. Curr Opin Chem Biol 2006, 10(5):509–519. 10.1016/j.cbpa.2006.07.015
Tokunaga H, Arakawa T, Tokunaga M: Engineering of halophilic enzymes: two acidic amino acid residues at the carboxy-terminal region confer halophilic characteristics to Halomonas and Pseudomonas nucleoside diphosphate kinases. Protein Sci 2008, 17(9):1603–1610. 10.1110/ps.035725.108
Hayes RJ, Bentzien J, Ary ML, Hwang MY, Jacinto JM, Vielmetter J, Kundu A, Dahiyat BI: Combining computational and experimental screening for rapid optimization of protein properties. Proc Natl Acad Sci USA 2002, 99(25):15926. 10.1073/pnas.212627499
Dahiyat BI, Mayo SL: Protein design automation. Protein Sci 1996, 5(5):895–903.
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative Protein Structure Modeling With MODELLER. Curr Protoc Bioinformatics 2006., (Unit 5.6):
Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ: Automated Docking Using a Lamarckian Genetic Algorithm and Empirical Binding Free Energy Function. J Comput Chem 1998, 19: 1639–1662. 10.1002/(SICI)1096-987X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-B
Huey R, Morris GM, Olson AJ, Goodsell DS: A Semiempirical Free Energy Force Field with Charge-Based Desolvation. J Comput Chem 2007, 28(6):1145–1152. 10.1002/jcc.20634
Guerois R, Nielsen JE, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320: 369–387. 10.1016/S0022-2836(02)00442-4
Oganesyan V, Oganesyan N, Terzyan S, Qu D, Dauter Z, Esmon NL, Esmon CT: The crystal structure of the endothelial protein C receptor and a bound phospholipid. J Biol Chem 2002, 277: 24851–4. 10.1074/jbc.C200163200
Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M: The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures. Eur J Biochem 1977, 80: 319–324. 10.1111/j.1432-1033.1977.tb11885.x
Wallace AC, Laskowski RA, Thornton JM: LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng 1995, 8: 127–134. 10.1093/protein/8.2.127
Esmon CT: The endothelial cell protein C receptor. Thromb Haemost 2000, 83: 639–643.
Van de Wouwer M, Collen D, Conway EM: Thrombomodulin-protein C-EPCR system: integrated to regulate coagulation and inflammation. Arterioscler Thromb Vasc Biol 2004, 24: 1374–1383. 10.1161/01.ATV.0000134298.25489.92
Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS: The Stability Effects of Protein Mutations Appear to be Universally Distributed. J Mol Biol 2007, 369: 1318–1332. 10.1016/j.jmb.2007.03.069
Jang DS, Cha HJ, Cha SS, Hong BH, Ha NC, Lee JY, Oh BH, Lee HS, Choi KY: Structural double-mutant cycle analysis of a hydrogen bond network in ketosteroid isomerase from Pseudomonas putida biotype B. Biochem J 2004, 382: 967–973. 10.1042/BJ20031871
This work has been supported by the FIRB-MUR projects LITBIO (RBLA0332RH), Italian Bioinformatics Network (RBPR05ZK2Z) and Bioinformatics analysis applied to Populations Genetics (RBIN064YAT_003), the Interdepartmental CNR-BIOINFORMATICS network and EGEE-III Enabling Grids for E-sciencE (INFSO-RI-222667).
The authors gratefully thank Dr. E M Faioni for fruitful discussion and suggestions and Chiara Bishop for reviewing the manuscript.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 12, 2009: Bioinformatics Methods for Biomedical Complex System Applications. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S12.
The authors declare that they have no competing interests.
FC developed the core of the pipeline, tested all the software for modelling and docking simulations, prepared the biological dataset, analyzed the output results and drafted the manuscript. PD conceived and designed the study and critically revised the results. IM contributed to developing the pipeline, designing the database and enabling the parallel computation of the docking experiments on the cluster infrastructure. LM supervised the project and made available the computational resources. ER coordinated the study and critically revised the results and manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Chiappori, F., D'Ursi, P., Merelli, I. et al. In silico saturation mutagenesis and docking screening for the analysis of protein-ligand interaction: the Endothelial Protein C Receptor case study. BMC Bioinformatics 10 (Suppl 12), S3 (2009). https://doi.org/10.1186/1471-2105-10-S12-S3