In silico saturation mutagenesis and docking screening for the analysis of protein-ligand interaction: the Endothelial Protein C Receptor case study
© Chiappori et al; licensee BioMed Central Ltd. 2009
Published: 15 October 2009
The design of mutants in protein functional regions, such as the ligand binding sites, is a powerful approach to recognize the determinants of specific protein activities in cellular pathways. For an exhaustive analysis of selected positions of protein structure large scale mutagenesis techniques are often employed, with laborious and time consuming experimental set-up. 'In silico' mutagenesis and screening simulation represents a valid alternative to laboratory methods to drive the 'in vivo' testing toward more focused objectives.
We present here a high performance computational procedure for large-scale mutant modelling and subsequent evaluation of the effect on ligand binding affinity. The mutagenesis was performed with a 'saturation' approach, where all 20 natural amino acids were tested in positions involved in ligand binding sites. Each modelled mutant was subjected to molecular docking simulation and stability evaluation. The simulated protein-ligand complexes were screened for their impairment of binding ability based on change of calculated Ki compared to the wild-type.
An example of application to the Endothelial Protein C Receptor residues involved in lipid binding is reported.
The computational pipeline presented in this work is a useful tool for the design of structurally stable mutants with altered affinity for ligand binding, considerably reducing the number of mutants to be experimentally tested. The saturation mutagenesis procedure does not require previous knowledge of functional role of the residues involved and allows extensive exploration of all possible substitutions and their pairwise combinations. Mutants are screened by docking simulation and stability evaluation followed by a rationally driven selection of those presenting the required characteristics. The method can be employed in molecular recognition studies and as a preliminary approach to select models for experimental testing.
Structure-based site-directed mutagenesis is a widely used approach to elucidate and modify specific aspects of protein function and to investigate the properties of protein-ligand interactions. Prediction of mutants with desirable properties is often obtained by rational design of a few specific position in the region of interest. This approach requires an in-depth knowledge of physicochemical, structural and functional properties of the protein, which may not be exhaustively provided by X-ray crystallography data. A different strategy is to apply the so-called scanning mutagenesis based on the systematic replacement of all residues involved in a specific function. Several examples of alanine scanning mutagenesis experiments  are reported in literature that illustrate its relevance for the study of molecular determinants of ligand binding  and protein function . Even more informative is the use of the saturation mutagenesis, where all 20 natural amino acids are tested in place of the wild type residue, in or near the functional site. 'In vitro' scanning saturation mutagenesis has been applied to several case studies and has proved to be useful to modulate protein properties such as substrate enzyme specificity  or to identify key residues for catalytic mechanisms .
A particular aspect that is amenable to investigation with random mutagenesis is the study of protein-ligand interaction. Several examples are reported in literature that show how this approach has contributed to the clarification of ligand interaction in G protein coupled receptors [6, 7], in nuclear receptors [8, 9] and in EphA3/ephrin-A5 , to cite a few examples.
However, the large number of mutants that are obtained from the systematic replacement of all the residues of a functional site makes the experimental procedure laborious and time consuming. This is especially true when dealing with large functional sites, because it requires the implementation of specific high-throughput methods for mutant screening or selection . 'In silico' simulation of mutagenesis and screening may represent a possible way around this problem. By exploiting the capabilities of high performance computers, it is possible to design and test a large numbers of variants with altered properties in order to restrict the number of models that are worth testing experimentally. Computational protein design methods have been employed in protein engineering to increase enzymes efficiency  or antibiotic resistance: for example Hayes et al.  applied a protein optimization strategy to increase the resistance of bacteria toward the antibiotic cefotaxime by optimizing TEM-1 β-lactamase. Moreover, computational design tools have been developed and have proved to be successful in the design of proteins with enhanced stability and specificity, and in engineering novel protein functions .
We present here a high performance semi-automated computational procedure suited to the study of key residues of protein-ligand interactions, which is able to dissect the contribution of different residues to ligand binding. The procedure performs a saturation mutagenesis of all residues involved in ligand binding, and the subsequent evaluation of the effect of amino acid substitutions on ligand affinity by docking simulation and on protein structure stability. An example of application to Endothelial Protein C Receptor is reported.
A semi automated pipeline was designed to manage mutant modelling, docking simulations and data analysis procedures by integrating public software through a number of in-house developed Perl scripts.
'In silico' side chain replacement and modelling were carried out with the routine 'mutate_model' of MODELLER 9v3 . 'Mutate_model' introduces a single point mutation in a user-specified residue and optimizes the mutant side chain conformation by conjugated gradient and by molecular dynamics simulation.
'prepare_receptor4' to add polar hydrogens, assign Kollman charges and convert the protein PDB file in pdbqt (the input file for AutoDock4);
'prepare_flexdocking' and 'prepare_flexreceptor' are applied to protein and ligand to obtain the input files for flexible docking simulation;
'prepare_gpf' and 'prepare_dpf' generate the parameters file for AutoGrid (gpf) and for AutoDock (dpf), respectively.
Docking simulations were performed with Lamarkian Genetic Algorithm which is reported as the most efficient and reliable method of AutoDock .
The 'stability' routine of FoldX algorithm  was applied to evaluate the total energy of wild type structure and mutated models by the estimation of the interactions that contribute to the protein stability.
Docking simulations were performed on a shared Linux cluster of 280 Opteron AMD cores 275 at 2,2 GHz, provided by the Eurotech Group (Amaro, UD, Italy), dedicated to bioinformatics applications. The system is composed of 10 chassis of 6 diskless blades, each equipped with an Infiniband 4× network card and 8 GB of RAM. The cluster nodes are accessed through the OpenPBS queuing system.
Protein structure and preparation
The crystal structure of the Endothelial protein C receptor (PDB: 1LQV), in a complex with phosphatidylethanolamine (PTY), and the Gla domain of protein C solved by X-ray crystallography at 1.6 Å  was retrieved from the Protein Data Bank . Chain A with the corresponding ligand (PTY) molecule was used for this study. The Gla domain, crystallization water, Ca ions, N-acetyl-D-glucosamine (NAG) and 2-(acelytamino)-2-deoxi-A-D-glucopyranose (NDG) were removed from PDB file. Polar hydrogens were added at 7.4 pH, using InsightII (Accelrys, San Diego).
The ADT graphics interface was employed for manual preparation of the ligand to assign the flexibility, add polar hydrogens and load Gasteiger charges. 22 rotatable bonds were assigned to the PTY ligand.
A grid box of dimensions 76 Å × 60 Å × 92 Å was constructed around the binding site, based on the co-crystallized ligand. Ligand flexible docking simulations were performed with 100 runs and 2500000 energy evaluations per run, while the other parameters were set to the default.
The first stage begins by single mutant modelling starting from the receptor atomic coordinates and the list of residues identified in the preliminary step. This first step makes use of the MODELLER routine 'mutate_model' that performs the side chain substitution and refinement. A Perl script is used to iterate the substitution of each residue with all the amino acids to achieve the saturation mutagenesis. For each mutant, a PDB file is obtained and stored in the database.
In the following step, the pipeline prepares the input files for AutoGrid and AutoDock4.0 according to user specified parameters: PDB files are converted to PDBQT format, adding polar hydrogens and assigning Kollman charges. Substituted residues are set for flexible docking. The ligand is manually prepared as illustrated in the Methods section and given as input to the pipeline. In the next step, AutoDock simulations are submitted to be run in parallel on the Linux cluster. Docking results are parsed to extract, for each simulation, the most representative conformation. This corresponds to the best energy conformation included in an AutoDock cluster of at least 5 elements and showing a binding energy within one Kcal/mol compared to the first ranked solution. This criterion is arbitrarily set in order to obtain a solution from a well represented group of conformations with the best scored results. However, when these conditions are not satisfied (i.e. no populated clusters are present in the top score), the first ranked solution is accepted. For each selected conformation, the corresponding values of binding energy, cluster population, inhibition constant (Ki), root mean square deviation (RMSD) and atomic coordinates are stored in the database. In order to select mutants with a modified affinity for the ligand compared to the wild type, the output data are filtered taking the wild type Ki value as threshold.
Structural stability of selected mutants is evaluated using the FoldX program that computes, for each conformation, an estimated free energy values. The free energy difference between the mutant and the wild type (ΔΔG = ΔGmutants - ΔGwild type) is automatically calculated and stored in the database.
Pair combinations of single substitutions are analyzed in the second stage of the pipeline (violet box in Figure 1) to investigate the occurrence of synergistic effects in double mutants. Residues chosen for double mutants modelling are obtained from the results of Ki filtering in first stage screening. A Perl script manages the residues combination, removing the redundant pairs. The following pipeline steps, simulations, data analysis, filtering and storage, are performed as in the first stage.
The database is designed to collect and organize all the output data which remain available for further study. Specific queries can be applied in order to compare binding capabilities and stability of all the mutants, as illustrated in the example below.
The structure of human EPCR, co-crystallized with PTY, exhibits a wide pattern of interactions that extend the length of binding groove. PTY makes contacts with 27 residues: four are located on α-helix H1, two on H2 and nine on H3, one on the loop between H1 and β-strand 5, and eleven residues on the strands of β-sheet floor (Figure 2). The binding is mostly dominated by hydrophobic interactions with the hydrocarbon chain of the ligand; a single H-bond is established between the polar group of phospholipid and ARG 156.
The structural stability of single and double variants that passed the corresponding Ki thresholds was evaluated using the FoldX tool, directly integrated in the pipeline. Mutants ensuing a ΔΔG value higher than 2 kcal/mol were considered structurally unstable and removed from the selection . After the screening, the pipeline retrieved 12 single and 17 double mutants with a ΔΔG < 2 Kcal/mol. Binding energy and Ki values intervals are reported in Figure 3. The stability filter was passed only by mutants with a Ki value within 10 and 100 μM for single and double, respectively. This value was an order of magnitude higher than the respective Ki thresholds. Mutants with high Ki values, although potential candidates for binding affinity studies, were rejected by the ΔΔG filter and thus considered unstable.
In conclusion, using the described approach, we have selected a limited number of single and double mutants of EPCR for which we predict a reduced binding ability for the PTY ligand and a structural stability comparable to the wild type. The set of 12 selected single mutants and the 9 double mutants presenting synergistic properties can be proposed as rationally constructed candidates for experimental testing in cellular models.
Due to the computational load needed to accomplish all the simulations, a parallel infrastructure was employed (see Methods). In this test case, the simulation of the single and double mutant took on average 12 and 15 hours, respectively. The whole challenge consisted of 513 docking experiments for the single mutants plus the wild type simulation and 797 for the double mutants. Depending on the resource availability, it was possible to employ 80 CPUs on average. Therefore, in a period of about ten days we collected all the results covering about 2 years of computer activity.
We present a high performance semi-automated computational pipeline that integrates modelling and docking simulation algorithms for large scale mutant design. Additional analysis tools such as structure stability evaluation are also included. The pipeline is organized into two main blocks for dealing with single and double mutants that are interconnected so that selected results of first analysis are used for the second part of the study. The procedure is presently devoted to protein-ligand interaction studies, but can be adapted to investigations of protein-protein interaction. The mutagenesis method is based on a saturation approach and does not require previous knowledge of functional or structural role of involved residues: it can therefore be applied to explore new binding features. The implementation of the pipeline on a parallel computer infrastructure permits a greatly reduced computational time and makes it possible to test a very large number of mutants. For example, in the reported text case we have modelled and tested 1310 variants of EPCR with the purpose of identifying single or coupled amino acid mutations that significantly impair the binding of the lipid ligand. The whole procedure reduced the number of selected models to 21, which corresponds to about 2% of the initial dataset. Selected variants provide information about the probable functional and structural effect of the mutated residues and can be proposed as subjects for experimental tests. In addition, the large amount of data stored in the database after each step of computational analysis can be further retrieved for more detailed examination and study. This pipeline can be useful for protein-ligand binding studies to identify key residues for the ligand interaction, to evaluate the effect of the substitutions on different ligands and to design selective mutants.
This work has been supported by the FIRB-MUR projects LITBIO (RBLA0332RH), Italian Bioinformatics Network (RBPR05ZK2Z) and Bioinformatics analysis applied to Populations Genetics (RBIN064YAT_003), the Interdepartmental CNR-BIOINFORMATICS network and EGEE-III Enabling Grids for E-sciencE (INFSO-RI-222667).
The authors gratefully thank Dr. E M Faioni for fruitful discussion and suggestions and Chiara Bishop for reviewing the manuscript.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 12, 2009: Bioinformatics Methods for Biomedical Complex System Applications. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S12.
- Morrison KL, Weiss GA: Combinatorial alanine-scanning. Curr Opin Chem Biol 2001, 5: 302–307. 10.1016/S1367-5931(00)00206-4View ArticlePubMedGoogle Scholar
- Williams PF, Mynarcik DC, Yu GQ, Whittaker J: Mapping of NH2-terminal ligand binding site of the insulin receptor by alanine scanning mutagenesis. J Biol Chem 1995, 270(7):3012–3016. 10.1074/jbc.270.7.3012View ArticlePubMedGoogle Scholar
- Hulme EC, Bee MS, Goodwin JA: Phenotypic classification of mutants: a tool for understanding ligand binding and activation of muscarinic acetylcholine receptors. Biochem Soc Trans 2007, 35(Pt 4):742–745.View ArticlePubMedGoogle Scholar
- Geddie ML, Matsumura I: Rapid Evolution of beta-Glucuronidase Specificity by Saturation Mutagenesis of an Active Site Loop. J Biol Chem 2004, 279(25):26462–26468. 10.1074/jbc.M401447200View ArticlePubMedGoogle Scholar
- Yep A, Kenyon GL, McLeish MJ: Saturation mutagenesis of putative catalytic residues of benzoylformate decarboxylase provides a challenge to the accepted mechanism. Proc Natl Acad Sci USA 2008, 105(15):5733–5738. 10.1073/pnas.0709657105PubMed CentralView ArticlePubMedGoogle Scholar
- Celic A, Connelly SM, Martin NP, Dumont ME: Intensive mutational analysis of G protein-coupled receptors in yeast. Methods Mol Biol 2004, 237: 105–120.PubMedGoogle Scholar
- Hagemann IS, Narzinski KD, Floyd DH, Baranski TJ: Random mutagenesis of the complement factor 5a (C5a) receptor N terminus provides a structural constraint for C5a docking. J Biol Chem 2006, 281: 36783–92. 10.1074/jbc.M607686200View ArticlePubMedGoogle Scholar
- Lim YP, Huang JD: Pregnane X receptor polymorphism affects CYP3A4 induction via a ligand-dependent interaction with steroid receptor coactivator-1. Pharmacogenet Genomics 2007, 17(5):369–382. 10.1097/FPC.0b013e32803e40d7View ArticlePubMedGoogle Scholar
- Dubbink HJ, Hersmus R, Verma CS, Korput HA, Berrevoets CA, van Tol J, Ziel-van der Made AC, Brinkmann AO, Pike AC, Trapman J: Distinct recognition modes of FXXLF and LXXLL motifs by the androgen receptor. Mol Endocrinol 2004, 18(9):2132–2150. 10.1210/me.2003-0375View ArticlePubMedGoogle Scholar
- Smith FM, Vearing C, Lackmann M, Treutlein H, Himanen J, Chen K, Saul A, Nikolov D, Boyd AW: Dissecting the EphA3/Ephrin-A5 interactions using a novel functional mutagenesis screen. J Biol Chem 2004, 279(10):9522–31. 10.1074/jbc.M309326200View ArticlePubMedGoogle Scholar
- Hancock SM, Vaughan MD, Withers SG: Engineering of glycosidases and glycosyltransferases. Curr Opin Chem Biol 2006, 10(5):509–519. 10.1016/j.cbpa.2006.07.015View ArticlePubMedGoogle Scholar
- Tokunaga H, Arakawa T, Tokunaga M: Engineering of halophilic enzymes: two acidic amino acid residues at the carboxy-terminal region confer halophilic characteristics to Halomonas and Pseudomonas nucleoside diphosphate kinases. Protein Sci 2008, 17(9):1603–1610. 10.1110/ps.035725.108PubMed CentralView ArticlePubMedGoogle Scholar
- Hayes RJ, Bentzien J, Ary ML, Hwang MY, Jacinto JM, Vielmetter J, Kundu A, Dahiyat BI: Combining computational and experimental screening for rapid optimization of protein properties. Proc Natl Acad Sci USA 2002, 99(25):15926. 10.1073/pnas.212627499PubMed CentralView ArticlePubMedGoogle Scholar
- Dahiyat BI, Mayo SL: Protein design automation. Protein Sci 1996, 5(5):895–903.PubMed CentralView ArticlePubMedGoogle Scholar
- Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A: Comparative Protein Structure Modeling With MODELLER. Curr Protoc Bioinformatics 2006., (Unit 5.6):Google Scholar
- Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ: Automated Docking Using a Lamarckian Genetic Algorithm and Empirical Binding Free Energy Function. J Comput Chem 1998, 19: 1639–1662. 10.1002/(SICI)1096-987X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-BView ArticleGoogle Scholar
- Huey R, Morris GM, Olson AJ, Goodsell DS: A Semiempirical Free Energy Force Field with Charge-Based Desolvation. J Comput Chem 2007, 28(6):1145–1152. 10.1002/jcc.20634View ArticlePubMedGoogle Scholar
- Guerois R, Nielsen JE, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320: 369–387. 10.1016/S0022-2836(02)00442-4View ArticlePubMedGoogle Scholar
- Oganesyan V, Oganesyan N, Terzyan S, Qu D, Dauter Z, Esmon NL, Esmon CT: The crystal structure of the endothelial protein C receptor and a bound phospholipid. J Biol Chem 2002, 277: 24851–4. 10.1074/jbc.C200163200View ArticlePubMedGoogle Scholar
- Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M: The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures. Eur J Biochem 1977, 80: 319–324. 10.1111/j.1432-1033.1977.tb11885.xView ArticlePubMedGoogle Scholar
- Wallace AC, Laskowski RA, Thornton JM: LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng 1995, 8: 127–134. 10.1093/protein/8.2.127View ArticlePubMedGoogle Scholar
- Esmon CT: The endothelial cell protein C receptor. Thromb Haemost 2000, 83: 639–643.PubMedGoogle Scholar
- Van de Wouwer M, Collen D, Conway EM: Thrombomodulin-protein C-EPCR system: integrated to regulate coagulation and inflammation. Arterioscler Thromb Vasc Biol 2004, 24: 1374–1383. 10.1161/01.ATV.0000134298.25489.92View ArticlePubMedGoogle Scholar
- Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS: The Stability Effects of Protein Mutations Appear to be Universally Distributed. J Mol Biol 2007, 369: 1318–1332. 10.1016/j.jmb.2007.03.069View ArticlePubMedGoogle Scholar
- Jang DS, Cha HJ, Cha SS, Hong BH, Ha NC, Lee JY, Oh BH, Lee HS, Choi KY: Structural double-mutant cycle analysis of a hydrogen bond network in ketosteroid isomerase from Pseudomonas putida biotype B. Biochem J 2004, 382: 967–973. 10.1042/BJ20031871PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.