EGFR Mutant Structural Database: computationally predicted 3D structures and the corresponding binding free energies with gefitinib and erlotinib
© Ma et al.; licensee BioMed Central. 2015
Received: 21 October 2014
Accepted: 27 February 2015
Published: 14 March 2015
Epidermal growth factor receptor (EGFR) mutation-induced drug resistance has caused great difficulties in the treatment of non-small-cell lung cancer (NSCLC). However, structural information is available for just a few EGFR mutants. In this study, we created an EGFR Mutant Structural Database (freely available at http://bcc.ee.cityu.edu.hk/data/EGFR.html), including the 3D EGFR mutant structures and their corresponding binding free energies with two commonly used inhibitors (gefitinib and erlotinib).
We collected information on 942 NSCLC patients carrying 112 mutation types. These mutation types are divided into five groups (insertion, deletion, duplication, modification and substitution); substitution accounts for 61.61% of the mutation types and 54.14% of the patients. Of the 942 patients, 388 carried a mutation at residue 858 in which leucine is replaced by arginine (L858R), making it the most common mutation type. Moreover, 36 (32.14%) of the mutation types occur in exon 19, and 419 (44.48%) patients carried a mutation in exon 21. We predicted the EGFR mutant structures with Rosetta from the collected mutation types. Amber was then employed to refine the structures and to calculate the binding free energies of the mutant-drug complexes.
The EGFR Mutant Structural Database provides 3D structures and their binding affinities with inhibitors, which other researchers can use to study NSCLC further and medical doctors can use as a reference for NSCLC treatment.
As the primary type of lung cancer, non-small-cell lung cancer (NSCLC) has received growing attention from researchers [1-3]. About 85% of all lung cancer patients are reported to be diagnosed with NSCLC. One strategy commonly used in treatment is to target the tyrosine kinase (TK) domain of the epidermal growth factor receptor (EGFR) to interrupt downstream signaling [5,6]. Reversible tyrosine kinase inhibitors (TKIs), such as gefitinib and erlotinib, are generally applied for this purpose. They have proven effective for a period of time, but the treatment outcome is usually limited by mutations in the EGFR TK domain [7,8]. About 10% to 15% of white patients and 30% of East Asian patients carry a mutation in the EGFR TK domain, and over one hundred mutation types have been found so far [9,10].
Structural information is available from the Protein Data Bank (PDB) for just a few EGFR mutants. These structures were obtained with experimental methods, such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. These methods can produce high-resolution protein structures, but they are usually complex, costly and time consuming. Bioinformatics-based methods have become popular and successful in predicting protein structures [13,14]. Wang et al. predicted EGFR mutant structures using the tools scap and loopy. Yarov‐Yarovoy et al. employed Rosetta to predict helical transmembrane protein structures. The binding free energy is a useful index of the binding affinity between mutants and drugs, and can serve as an important indicator of drug resistance. Zhou et al. predicted EGFR mutation-induced drug resistance based on the binding free energy, which was calculated with Amber. As different mutations affect the EGFR structure and the drug resistance level differently, a database of the EGFR mutant structures and the corresponding binding free energies with TKIs can provide a useful resource for further research and clinical guidance.
Table. The naming rules of EGFR mutations (surviving entry: combination of residues deletion and insertion).
The crystal structures of EGFR mutants with L858R and G719S are available from PDB. Other EGFR mutants, used for calculating binding free energies with gefitinib and erlotinib, were generated from the template structures “2ITY” and “1M17”, respectively, downloaded from PDB.
Point mutation modeling
Table. Distribution of EGFR mutations (columns: number of mutations; number of patients; percentage of all mutation types; percentage of all the patients).
We employed the homology modeling (also known as comparative modeling, CM) protocol in Rosetta to generate the mutants with amino acid insertion, deletion, duplication and modification relative to WT EGFR. Homology modeling is widely used in predicting protein structures as it can often provide reliable and accurate structural models [23-25]. It provides a way to fill the large gap between the increasing number of available protein sequences and the protein crystal structures obtained from experimental methods.
Before model construction, several files must be prepared: the target and template protein sequences, the template PDB file, the mutant-aligned sequences, the fragment library and the secondary structure file of the target. Selection of a template is very important because it affects the accuracy of the predicted structure. In our studies, the crystal structures of the EGFR TK domain “2ITY” and “1M17” were selected as templates to generate the mutants. After the template is determined, the mutant sequences are aligned to the template sequence with the multiple-sequence alignment program ClustalW. The fragment library contains short peptide backbone fragments, which play an important role in the construction of variable regions. We employed the fragment picker protocol in Rosetta to pick fragments, which helps to build models more efficiently and accurately by enabling a rapid search of the conformational space. Moreover, PSIPRED is used to obtain the target’s secondary structure file. After all these files are prepared, the CM protocol in Rosetta is applied to build the well-aligned regions, and the missing parts are rebuilt using loop modeling with the fragment library. Finally, a full-atom refinement step is applied to the models, and a clustering method is used to select models.
Models predicted by software simulation may not be accurate, so verification and assessment of the predicted models are very important. Two methods are often adopted to assess predicted models: computing the energy of the model, and evaluating the similarity of a given characteristic between the predicted model and the real structure. In this paper, we used physics-based energies of the predicted EGFR mutants to assess the accuracy of the 3D structures. The full-atom energy scoring function was employed to calculate the energies of all the structures, and the one with the minimum energy was taken as the final predicted structure. Using this function, each predicted structure is scored with a series of terms (Lennard-Jones interactions, solvation, residue pair interactions, van der Waals, hydrogen bonding, Ramachandran torsion preferences, rotamer self-energy and unfolded state reference energy) and their corresponding weights. The total score of a predicted model is defined as the weighted sum of all the scoring terms.
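The selection rule described above (weighted sum of scoring terms, then keep the minimum-energy model) can be sketched as follows. This is a minimal illustration, not the Rosetta scoring function: the term names, the weights, the model names and all numerical values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical weights for illustration only; these are NOT the weights
# used by the Rosetta full-atom scoring function.
WEIGHTS: Dict[str, float] = {
    "lennard_jones": 1.0,
    "solvation": 0.5,
    "hydrogen_bonding": 2.0,
    "ramachandran": 0.25,
}


def total_score(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of all scoring terms of one model."""
    return sum(weights[name] * value for name, value in terms.items())


def select_lowest_energy(
    models: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Tuple[str, float]:
    """Score every candidate model and return (name, score) of the minimum."""
    scored = [(name, total_score(terms, weights)) for name, terms in models.items()]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical per-term energies for two candidate models.
candidate_models = {
    "model_a": {"lennard_jones": -100.0, "solvation": 40.0,
                "hydrogen_bonding": -20.0, "ramachandran": 5.0},
    "model_b": {"lennard_jones": -90.0, "solvation": 30.0,
                "hydrogen_bonding": -30.0, "ramachandran": 2.0},
}

best_name, best_score = select_lowest_energy(candidate_models, WEIGHTS)
print(best_name, best_score)
```

With these made-up numbers, model_b scores lower than model_a and would be kept as the final structure.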
Molecular dynamics (MD) simulation
The total energy is composed of a bonded term E_bonded and a non-bonded term E_nonbonded. In Equation (1), the bonded energy, which is related to the covalent bonds, consists of bond stretching (where K_b is an empirical stretching force constant, and b and b_0 are the actual and empirical bond lengths, respectively), angle bending (where K_θ is a constant, and θ and θ_0 are the actual and empirical bond angles, respectively), and torsion terms (where V_n is the barrier to free rotation for the empirical bond, n is the rotation periodicity, ϕ is the torsion angle, and δ is the angle at which the potential reaches its minimum value). The non-bonded energy includes the van der Waals term (where A_ij and B_ij describe the depth and position for a pair of non-bonded interacting atoms, respectively, and r_ij is the interatomic distance) and the long-range electrostatic term (where q_i and q_j are point charges, and r_ij is the interatomic distance). In our simulation, we employed the widely applied ff99SB force field. After solvating the complex and applying the force field, we minimized the entire system with sander in Amber. The result of this optimization is our refined mutant structure.
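For reference, the terms described above match the usual Amber-type functional form. The expression below is a reconstruction under that assumption, not a verbatim copy of the article's Equation (1); the dielectric constant ε and the sums over atom pairs i < j are not defined in the text and are assumed here.

```latex
\begin{aligned}
E_{\text{total}} &= E_{\text{bonded}} + E_{\text{nonbonded}} \\
E_{\text{bonded}} &= \sum_{\text{bonds}} K_b \,(b - b_0)^2
  + \sum_{\text{angles}} K_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[\,1 + \cos(n\phi - \delta)\,\right] \\
E_{\text{nonbonded}} &= \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
  + \sum_{i<j} \frac{q_i \, q_j}{\varepsilon \, r_{ij}}
\end{aligned}
```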
With MatchMaker in UCSF Chimera, we aligned the optimized structure to the template complex “2ITY” (EGFR-gefitinib complex) or “1M17” (EGFR-erlotinib complex) to obtain the mutant-drug complex. Amber was then used to optimize these complexes. As before, the complex was solvated in a TIP3P water box (10.0 Å) and the ff99SB force field was adopted. Before the production MD, the solvated complex must be equilibrated using sander in Amber. First, 1000 cycles of minimization were run to remove any bad contacts and relax the structure. In this procedure, the steepest descent algorithm was used for the first 500 steps and the conjugate gradient algorithm for the remaining 500 steps. Then 50 picoseconds (ps) of heating and 50 ps of density equilibration were conducted to reach a temperature of about 300 K and a density of around 1 g/ml. Subsequently, 500 ps of constant-pressure equilibration was carried out at 300 K. All these simulations were conducted with SHAKE on hydrogen atoms, and Langevin dynamics was used to control the temperature. Several quantities, such as temperature, density, total energy and root-mean-square deviation (RMSD), were finally used to verify the equilibration of the system. Once the system was equilibrated, we ran the production MD for a total of 2 ns and recorded the coordinates every 10 ps.
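As a rough sketch of the protocol above, the snippet below builds a sander-style minimization input and works out the simulation bookkeeping. The helper names are hypothetical and the input is deliberately minimal: only the `imin`, `maxcyc` and `ncyc` settings follow from the text (1000 cycles, switching from steepest descent to conjugate gradient after 500), and any other settings a real run needs are not given in the article and are left out.

```python
# Hypothetical helpers for illustration; they are not part of Amber.

def minimization_input(total_cycles: int = 1000,
                       steepest_descent_cycles: int = 500) -> str:
    """Return minimal sander-style minimization input text.

    imin=1 requests minimization, maxcyc is the total number of cycles and
    ncyc is the cycle after which steepest descent gives way to conjugate
    gradient.
    """
    lines = [
        "Minimization: steepest descent followed by conjugate gradient",
        " &cntrl",
        "  imin=1,",
        f"  maxcyc={total_cycles},",
        f"  ncyc={steepest_descent_cycles},",
        " /",
    ]
    return "\n".join(lines) + "\n"


def frame_count(production_ps: int, save_interval_ps: int) -> int:
    """Number of coordinate frames written during the production run."""
    return production_ps // save_interval_ps


# Stage lengths taken from the text, in picoseconds.
EQUILIBRATION_PS = {"heating": 50, "density": 50, "constant_pressure": 500}
PRODUCTION_PS = 2000      # 2 ns production run
SAVE_INTERVAL_PS = 10     # coordinates recorded every 10 ps

total_equilibration_ps = sum(EQUILIBRATION_PS.values())
n_frames = frame_count(PRODUCTION_PS, SAVE_INTERVAL_PS)

print(minimization_input())
print(total_equilibration_ps, n_frames)
```

Under these settings the equilibration stages add up to 600 ps and the 2 ns production run yields 200 saved frames per complex.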
Binding free energy calculation
where ∆G bind,solv and ∆G bind,vacuum represent the free energy difference of bound and unbound state of a complex in solvent and vacuum environment respectively, and ∆G solv,receptor , ∆G solv,ligand and ∆G solv,complex stand for the changes of free energies of the receptor, ligand and complex between solvent and vacuum environment, respectively.
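The quantities defined above are related through the standard MM-GBSA thermodynamic cycle. The relation below is written out under that assumption, as a reconstruction rather than a verbatim copy of the article's equation.

```latex
\Delta G_{\text{bind,solv}}
  = \Delta G_{\text{bind,vacuum}}
  + \Delta G_{\text{solv,complex}}
  - \left( \Delta G_{\text{solv,receptor}} + \Delta G_{\text{solv,ligand}} \right)
```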
We calculated the binding free energies of EGFR mutants with gefitinib and erlotinib. MM-GBSA in Amber derives the interaction energy and solvation free energy for the receptor, ligand and complex, respectively. The energy of each molecule is composed of several terms, including the van der Waals energy (VDWAALS), the electrostatic energy (EEL), the electrostatic contribution to the solvation free energy (EGB) and the nonpolar contribution to the solvation free energy (ESURF). The total binding free energy is reported as ∆G along with error values.
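To make the decomposition concrete, the sketch below combines the four energy terms for the complex, receptor and ligand into a binding free energy, assuming the usual relation ∆G = G_complex − G_receptor − G_ligand. All numerical values are made up for illustration and are not results from the database; a real calculation would average over the MD frames and report an error estimate.

```python
from typing import Dict

TERMS = ("VDWAALS", "EEL", "EGB", "ESURF")


def molecule_energy(terms: Dict[str, float]) -> float:
    """Sum the four MM-GBSA energy terms of one molecule (kcal/mol)."""
    return sum(terms[name] for name in TERMS)


def binding_free_energy(complex_terms: Dict[str, float],
                        receptor_terms: Dict[str, float],
                        ligand_terms: Dict[str, float]) -> float:
    """Return G_complex - G_receptor - G_ligand (kcal/mol)."""
    return (molecule_energy(complex_terms)
            - molecule_energy(receptor_terms)
            - molecule_energy(ligand_terms))


def per_term_difference(complex_terms: Dict[str, float],
                        receptor_terms: Dict[str, float],
                        ligand_terms: Dict[str, float]) -> Dict[str, float]:
    """Contribution of each energy term to the binding free energy."""
    return {name: complex_terms[name] - receptor_terms[name] - ligand_terms[name]
            for name in TERMS}


# Hypothetical energies in kcal/mol, chosen only to illustrate the arithmetic.
complex_e = {"VDWAALS": -2000.0, "EEL": -15000.0, "EGB": -3000.0, "ESURF": 100.0}
receptor_e = {"VDWAALS": -1900.0, "EEL": -14950.0, "EGB": -3020.0, "ESURF": 95.0}
ligand_e = {"VDWAALS": -40.0, "EEL": -30.0, "EGB": -20.0, "ESURF": 5.0}

delta_g = binding_free_energy(complex_e, receptor_e, ligand_e)
contributions = per_term_difference(complex_e, receptor_e, ligand_e)
print(delta_g, contributions)
```

The per-term differences sum to the same ∆G, which makes it easy to see which interaction type dominates binding for a given mutant.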
Results and discussion
According to the naming rules, the 112 EGFR mutation types of the 942 NSCLC patients are divided into five groups: insertion, deletion, duplication, modification and substitution. We counted the number of mutation types in each group as well as the corresponding number of patients (Table 2). As Table 2 shows, substitution accounts for more than half of the EGFR mutation types and of the patients. Although deletion makes up just 5.36% of all the mutation types, 285 cases belong to this group, accounting for 30.25% of all the patients.
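The percentages quoted in this article follow directly from the raw counts. The short check below recomputes a few of them from the numbers given in the text; the helper name is illustrative.

```python
TOTAL_PATIENTS = 942
TOTAL_MUTATION_TYPES = 112


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts taken from the text of the article.
deletion_patients = percentage(285, TOTAL_PATIENTS)      # deletion group
l858r_patients = percentage(388, TOTAL_PATIENTS)         # L858R carriers
exon21_patients = percentage(419, TOTAL_PATIENTS)        # exon 21 mutations
exon19_types = percentage(36, TOTAL_MUTATION_TYPES)      # exon 19 mutation types

print(deletion_patients, l858r_patients, exon21_patients, exon19_types)
```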
Table. Most common EGFR mutation types (column: number of patients).
Table. Distribution of mutation types and the number of patients by mutation position (columns: number of occurrences; number of mutation types; number of patients; percentage of all mutation types; percentage of all the patients).
EGFR mutant structure prediction
Binding free energy calculation
Table. Binding free energies of WT EGFR-drug complex and several common mutation-drug complexes (columns: binding free energy with gefitinib, kcal/mol; binding free energy with erlotinib, kcal/mol).
Comparison of the EGFR mutant structural database and other EGFR-related databases
Several EGFR-related databases are publicly available, such as the EGFR Mutation Database (http://www.cityofhope.org/egfr-mutation-database), the Catalogue of Somatic Mutations in Cancer (COSMIC) (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/), the EGFR Inhibitor Database (http://crdd.osdd.net/raghava/egfrindb/) and the widely used PDB. The EGFR Mutation Database contains the mutation position information as well as the response of NSCLC patients to inhibitors. COSMIC stores somatic mutation data of human cancer. These two databases provide only the sequence information of the mutations. The EGFR Inhibitor Database contains biological and chemical information on EGFR inhibitors. PDB provides structures of proteins, nucleic acids and complex assemblies obtained from experimental methods, such as X-ray crystallography or NMR. However, only a few EGFR mutant structures are available because of the high cost of experiments. The EGFR Mutant Structural Database presented in this paper contains 3D structures of 112 kinds of EGFR mutants. Moreover, the binding free energies of the mutants with the inhibitors are provided to show the binding affinity. The structural information is helpful for protein docking, hydrogen bond analysis and protein-drug complex simulation, which are very important in studying drug resistance mechanisms.
In our previous work [15,33], the molecular mechanisms of drug resistance were studied from two aspects: the geometric properties of the mutant structures and the binding free energies with gefitinib and erlotinib. In [33], with 30 mutant structures generated by Rosetta, we analyzed local surface changes of the binding pocket relative to the wild-type EGFR using alpha shape modeling. Moreover, we conducted a correlation analysis between the geometric properties and the recorded progression-free survival (PFS) in the treatments. The results show that the curvature of the binding pocket surface plays an important role in the prediction of EGFR mutation-induced drug resistance. In [15], we identified drug resistance mechanisms from the binding free energies with inhibitors (gefitinib and erlotinib) as well as some personal features of 168 patients (belonging to 37 mutation types). An extreme learning machine was employed to build a classification model, and resistant subjects were successfully identified. Overall, the molecular mechanisms of drug resistance are closely related to the mutant structures and the binding affinity with inhibitors. Thus, the EGFR Mutant Structural Database built here should be useful to other researchers and medical doctors for further study or clinical guidance.
In this work, we created an EGFR Mutant Structural Database, composed of computationally predicted 3D structures of the EGFR mutants and the corresponding binding free energies with gefitinib and erlotinib. In our database, 112 kinds of mutants were collected from 942 NSCLC patients. We categorized the mutants into five groups (insertion, deletion, duplication, modification and substitution); substitution accounts for 61.61% of the EGFR mutation types and 54.14% of all the patients. As the most common mutation type, L858R covers 388, or 41.19%, of all the patients. In addition, we analyzed the mutations at each exon. The analysis shows that exon 19 contains the most mutation types (32.14%) and exon 21 accounts for the largest number of patients (44.48%). With the mutant protein sequences and the WT EGFR crystal structure, we predicted the EGFR mutant structures with Rosetta and optimized the structures using Amber. Finally, we calculated the binding free energies of the EGFR mutants with the inhibitors (gefitinib and erlotinib). Our work provides a database of the EGFR mutant structures and their corresponding binding free energies with inhibitors. These resources can be used for further research and clinical guidance, such as analyzing drug resistance of the EGFR mutants, which is a major problem in the treatment of NSCLC patients. The database is freely available at http://bcc.ee.cityu.edu.hk/data/EGFR.html.
This work is supported by the Health and Medical Research Fund (HMRF) of Hong Kong (Project 01121986). The authors would like to thank Zhoubao Sun and Zhiyong Shen for their help with molecular dynamics simulation work and for useful discussions.
- Lynch TJ, Bell DW, Sordella R, Gurubhagavatula S, Okimoto RA, Brannigan BW. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non–small-cell lung cancer to gefitinib. N Engl J Med. 2004;350(21):2129–39.
- Wang H, Xing F, Su H. Novel image markers for non-small cell lung cancer classification and survival prediction. BMC Bioinform. 2014;15(1):310.
- Okamoto W, Okamoto I, Tanaka K, Arao T, Nishio K, Fukuoka M. TAK-701, a humanized monoclonal antibody to HGF, reverses gefitinib resistance induced by tumor-derived HGF in non-small cell lung cancer with an EGFR mutation. Cancer Res. 2011;71(8 Supplement):1731.
- Bar J, Onn A. Overcoming molecular mechanisms of resistance to first-generation epidermal growth factor receptor tyrosine kinase inhibitors. Clin Lung Cancer. 2012;13(4):267–79.
- Wu JY, Wu SG, Yang CH, Chang YL, Chang YC, Hsu YC. Comparison of gefitinib and erlotinib in advanced NSCLC and the effect of EGFR mutations. Lung Cancer. 2011;72(2):205–12.
- Rosell R, Carcereny E, Gervais R, Vergnenegre A, Massuti B, Felip E. Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial. Lancet Oncol. 2012;13(3):239–46.
- Kosaka T, Yamaki E, Mogi A, Kuwano H. Mechanisms of resistance to EGFR TKIs and development of a new generation of drugs in non-small-cell lung cancer. BioMed Res Int 2011; doi:10.1155/2011/165214.
- Oxnard GR, Arcila ME, Sima CS, Riely GJ, Chmielecki J, Kris MG. Acquired resistance to EGFR tyrosine kinase inhibitors in EGFR-mutant lung cancer: distinct natural history of patients with tumors harboring the T790M mutation. Clin Cancer Res. 2011;17(6):1616–22.
- Gu D, Scaringe WA, Li K, Saldivar JS, Hill KA, Chen Z. Database of somatic mutations in EGFR with analyses revealing indel hotspots but no smoking-associated signature. Hum Mutat. 2007;28(8):760–70.
- Lee VH, Tin VP, Choy TS, Lam KO, Choi CW, Chung LP. Association of Exon 19 and 21 EGFR mutation patterns with treatment outcome after first-line tyrosine kinase inhibitor in metastatic non-small-cell lung cancer. J Thorac Oncol. 2013;8(9):1148–55.
- The Protein Data Bank. [http://www.rcsb.org]
- Yang LW, Eyal E, Chennubhotla C, Jee J, Gronenborn AM, Bahar I. Insights into equilibrium dynamics of proteins from comparison of NMR and X-ray data with computational predictions. Structure. 2007;15(6):741–9.
- Hao GF, Yang GF, Zhan CG. Structure-based methods for predicting target mutation-induced drug resistance and rational drug design to overcome the problem. Drug Discov Today. 2012;17(19):1121–6.
- Cao ZW, Han LY, Zheng CJ, Ji ZL, Chen X, Lin HH. Computer prediction of drug resistance mutations in proteins. Drug Discov Today. 2005;10(7):521–9.
- Wang DD, Zhou W, Yan H, Wong M, Lee V. Personalized prediction of EGFR mutation-induced drug resistance in lung cancer. Sci Rep. 2013;3:2855.
- Yarov‐Yarovoy V, Schonbrun J, Baker D. Multipass membrane protein structure prediction using Rosetta. Proteins. 2006;62(4):1010–25.
- Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011;487:545–74.
- Zhou W, Wang DD, Yan H, Wong M, Lee V. Prediction of anti-EGFR drug resistance base on binding free energy and hydrogen bond analysis. In: Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). 2013. p. 193–7.
- Case DA. AMBER 12. San Francisco: University of California; 2012.
- Kellogg EH, Leaver‐Fay A, Baker D. Role of conformational sampling in computing mutation‐induced changes in protein structure and stability. Proteins. 2011;79(3):830–8.
- Kortemme T, Baker D. A simple physical model for binding energy hot spots in protein–protein complexes. Proc Natl Acad Sci. 2002;99(22):14116–21.
- Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Šali A. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomo Struct. 2000;29(1):291–325.
- Ginalski K. Comparative modeling for protein structure prediction. Curr Opin Struct Biol. 2006;16(2):172–7.
- Sanchez R, Šali A. Advances in comparative protein-structure modelling. Curr Opin Struct Biol. 1997;7(2):206–14.
- Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucl Acids Res. 2011;39 suppl 1:D465–74.
- Xiang Z. Advances in homology protein structure modeling. Curr Protein Pept Sci. 2006;7(3):217–27.
- Thompson JD, Gibson T, Higgins DG. Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinform 2002; doi:10.1002/0471250953.bi0203s00.
- McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server. Bioinform. 2000;16(4):404–5.
- Rohl CA, Strauss CE, Misura KM, Baker D. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66–93.
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12.
- Bamford S, Dawson E, Forbes S, Clements J, Pettet R, Dogan A. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91(2):355–8.
- Yadav IS, Singh H, Imran KM, Chaudhury A, Raghava GP, Agarwal SM. EGFRIndb: Epidermal Growth Factor Receptor Inhibitor Database. Anti-cancer Agents Med Chem. 2014;14(7):928–35.
- Ma L, Wang DD, Huang Y, Wong MP, Lee VH, Yan H. Decoding the EGFR mutation-induced drug resistance in lung cancer treatment by local surface geometric properties. Comput Biol Med. 2014; doi:10.1016/j.compbiomed.2014.06.016.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.