- Methodology article
- Open Access
Utilizing protein structure to identify non-random somatic mutations
© Ryslik et al.; licensee BioMed Central Ltd. 2013
- Received: 8 March 2013
- Accepted: 28 May 2013
- Published: 13 June 2013
Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key “driver” mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose an extension to current methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering.
We have developed iPAC (identification of Protein Amino acid Clustering), an algorithm that identifies non-random somatic mutations in proteins while taking into account the three-dimensional protein structure. By using the tertiary information, we are able both to detect novel clusters in proteins that are known to exhibit mutation clustering and to identify clusters in proteins that show no evidence of clustering under existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer, our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PIK3CA. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology. The R package is available at: http://www.bioconductor.org/packages/2.12/bioc/html/iPAC.html.
Our algorithm extends the current methodology to identify oncogenic activating driver mutations by utilizing tertiary protein structure when identifying nonrandom somatic residue mutation clusters.
- Tertiary Structure
- Protein Data Bank
- Significant Cluster
- Driver Mutation
- Mutational Data
Cancer is one of the most widespread and heterogeneous diseases, imposing a huge toll on patients, relatives, friends, and society. However, at its most basic, it is a genetic disease that is caused by the accumulation of somatic mutations in oncogenes and tumor suppressors [1]. While mutations in tumor suppressors tend to down-regulate the activity of genes that prevent cancer, mutations in proto-oncogenes either up-regulate or deregulate the activities of the resulting proteins. So far, pharmacological intervention has proven more successful at inhibiting activating oncogenes than at restoring tumor suppressor gene function. Coupled with the idea of “oncogene addiction”, namely that many cancers rely on mutations in a small subset of key genes to continue their uncontrolled growth while the remaining mutations are passenger mutations [2, 3], the problem of identifying activating oncogenic mutations has received great attention in cancer research.
Recently, several studies have shown support for the hypothesis that activating somatic mutations tend to cluster in protein kinases [2, 4, 5]. Further, as observed by [6], mutational clusters might provide further information regarding where to look for activating mutations, reducing the driver mutation search space that needs to be analyzed. Moreover, mutational clusters that lead to either beneficial or detrimental phenotypic changes may point to regions that are under positive or directional selection as well as regions that are functionally significant and thus can be targeted by protein engineering [7].
So far, several methods based upon the number of mutations in a specific region have been developed to detect potential driver oncogenic mutations as well as naturally selected regions. One common method hypothesizes that driver mutations have a higher non-synonymous mutation rate as compared to the background mutation rate [5, 8]. Further, one can look at the ratio of nonsynonymous (K_a) to synonymous (K_s) changes per site, K_a/K_s [9]. A criterion for selection is then to check if K_a/K_s > 1, based on the hypothesis that the benchmark neutral rate of nucleotide substitution is exceeded when positive selection also contributes to the substitution process. Similarly, [10] proposes a hypothesis that driver mutations have a larger mutational rate than the background mutational rate after gene length normalization.
While the approaches mentioned above have had some success in detecting positive selection and/or identifying driver mutations, they nevertheless have several shortcomings. First, many of them are dependent on calculating the disparity in non-synonymous versus synonymous mutations but do not recognize that selection often occurs on very small sections of the gene and thus might fail when averaged over the entirety of the gene length. Second, the methods described above [9, 10] do not make any attempt to distinguish between activating and non-activating non-synonymous mutations.
In addition to the approaches described above, some researchers have focused on creating classifiers in order to determine mutation status. As described in [11], these algorithms employ a variety of machine learning techniques, such as Random Forests [12] and Support Vector Machines [13], to calculate a score for each mutation. These scores are typically calculated from a variety of information, including measures of evolutionary conservation, physico-chemical properties such as the size and polarity of the substituted and original residues, and surface accessibility. These scores are then used to classify the mutation. For example, PolyPhen-2 predicts whether a missense mutation is damaging while CHASM attempts to discriminate between driver and passenger mutations. While several of these models have had significant success in classifying mutations, they all require large and well-annotated data sets in order to first train the machine learning classifier and then apply the resulting rule set.
Recently, Ye et al. [6] developed Non-Random Mutational Clustering (NMC) to identify potential activating mutations by hypothesizing that, in the absence of heretofore known mutational hotspots, a mutational cluster is indicative of selection for an activating driver mutation since only a small number of precise mutations can activate a protein [4, 5]. By looking at the order statistics and assuming that, under the null hypothesis, the locations of amino acid mutations follow a uniform distribution when the protein is considered in linear form, they identify clusters by calculating whether any pair of mutations is closer together on the line than expected by chance alone. Despite its success, one limitation of the NMC method is that proteins are treated as linear sequences without considering their three-dimensional structures.
In this work, we extend the NMC methodology to account for tertiary protein structure. This enables the identification of mutational clusters that are relatively far apart in linear space but relatively close together in 3D space. We proceed to show that our methodology is effective in identifying novel mutational clusters that are missed by NMC in key cancer proteins such as KRAS and PIK3CA. Unlike NMC, iPAC is also able to identify the EGFR and EIF2AK2 proteins as containing mutational clustering. We also show that many of the clusters identified by iPAC are predicted to be deleterious by well-known machine learning algorithms such as PolyPhen-2 [14]. However, iPAC has the distinct advantage of requiring only the mutational positions and tertiary structure, which allows its application to novel mutations and structures for which extensive information and literature are not yet available. Finally, we also show that for a large percentage of protein structures, the tertiary structure leads to a net reduction in the number of mutational clusters found, thus presenting a simplified mutational clustering landscape. Ultimately, by providing a refined picture of the mutational clustering, we are able to provide a more accurate representation of where potential activating mutations may reside within the protein.
Our method, named iPAC, uses a four-step approach to finding mutational clusters. First, mutational and positional data are obtained from the COSMIC [16] and PDB [17] databases (described in Sections “Obtaining mutational data” and “Obtaining the 3D structural data”, respectively). The mutational and positional information is then reconciled to allow a single numerical reference to identify the same physical amino acid in both databases (Section “Reconciling the structural and mutational data”). Next, Multidimensional Scaling (MDS) [18] is used to map the protein structure from 3D to 1D space while preserving, as well as possible, all pairwise three-dimensional distances between amino acids for a given protein (Section “Multidimensional scaling”). The NMC algorithm is then run on the remapped amino acids to find mutational clusters (Section “NMC”). Finally, the clusters are mapped back into the original protein space and reported to the user. In the following subsections we discuss each of these steps in detail.
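The remapping step can be illustrated with a minimal sketch. It uses classical (Torgerson) MDS, computed with a plain power iteration, as a simple stand-in; it is not claimed to be the MDS variant implemented in the iPAC package. The function name `mds_to_1d` and the toy coordinates are hypothetical.

```python
import math

def mds_to_1d(coords, iters=500):
    """Classical (Torgerson) MDS onto a single axis (stand-in, see lead-in)."""
    n = len(coords)
    # Squared Euclidean distances between all residue pairs.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in coords] for p in coords]
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all points coincide; nothing to embed
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Scale the unit eigenvector by sqrt(eigenvalue) to get 1D coordinates.
    return [x * math.sqrt(max(eigval, 0.0)) for x in v]

# Toy backbone: the first and last residues are far apart in sequence
# but close together in space.
toy = [(0, 0, 0), (4, 0, 0), (6, 3, 0), (4, 6, 0), (0.5, 0.5, 0)]
x1d = mds_to_1d(toy)
```

In this toy example the first and last residues land next to each other on the 1D axis even though they are four positions apart in the sequence, which is the situation a purely linear method cannot see. The sign of the axis is arbitrary.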
Obtaining mutational data
Mutational data were obtained from the COSMIC database (version 58) via http://ftp.sanger.ac.uk/pub/CGP/cosmic and implemented using Oracle. In order to justify the assumption that amino acids follow a uniform distribution of mutation, only mutations that were found through whole gene screens were included. Further, we only used missense mutations that belonged to one of two categories: 1) “Confirmed somatic variant” or 2) “Reported in another cancer sample as somatic”. All nonsense and synonymous mutations, as well as mutations that had different somatic status categories, were excluded. Further, as multiple studies can report mutational data from the same cell line, mutational redundancies were removed to avoid double counting. See “Additional file 1: Cosmic Query” for the SQL code and schema used to generate the data. Finally, in order to match mutational data with structural data, only the proteins for which a UniProt Accession Number [19] was available were kept. This resulted in 777 unique proteins.
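The inclusion rules above can be sketched as a simple record filter. This is only an illustration: the actual query is the SQL in Additional file 1, and the field names used here (`whole_gene_screen`, `mutation_type`, `somatic_status`, `uniprot_acc`, `sample`, `gene`, `aa_change`) as well as the sample records are hypothetical, not the COSMIC schema.

```python
ALLOWED_STATUS = {
    "Confirmed somatic variant",
    "Reported in another cancer sample as somatic",
}

def filter_mutations(records):
    """Keep whole-gene-screen missense mutations, dropping duplicates."""
    seen = set()
    kept = []
    for rec in records:
        if not rec["whole_gene_screen"]:
            continue  # uniformity assumption needs whole gene screens
        if rec["mutation_type"] != "missense":
            continue  # nonsense and synonymous mutations are excluded
        if rec["somatic_status"] not in ALLOWED_STATUS:
            continue
        if not rec.get("uniprot_acc"):
            continue  # cannot be matched to a structure
        key = (rec["sample"], rec["gene"], rec["aa_change"])
        if key in seen:
            continue  # same cell line reported by more than one study
        seen.add(key)
        kept.append(rec)
    return kept

# Hypothetical records for illustration only.
base = {"sample": "S1", "gene": "KRAS", "aa_change": "G12D",
        "mutation_type": "missense",
        "somatic_status": "Confirmed somatic variant",
        "whole_gene_screen": True, "uniprot_acc": "P01116"}
records = [
    base,
    dict(base),                                   # duplicate report
    dict(base, sample="S2", mutation_type="nonsense"),
    dict(base, sample="S3", whole_gene_screen=False),
    dict(base, sample="S4", somatic_status="Variant of unknown origin"),
]
kept = filter_mutations(records)
```

Only the first record survives here; the others fail the duplicate, mutation type, screen type, and somatic status checks in turn.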
Obtaining the 3D structural data
The protein structural data were obtained from the PDB database via http://www.pdb.org. As one protein can have several structures, for each of the 777 proteins described above, all the structures with a matching UniProt Accession Number were obtained. If a specific structure had more than one polypeptide chain with a matching amino acid sequence in UniProt, the first matching chain listed was used (typically chain A). For proteins where the resolution was sufficiently high to provide more than one alternative conformation for a specific amino acid side chain, only the first conformation listed in the file was used. Once the appropriate side chain and conformation were selected, the (x,y,z) coordinates of all the α-carbon atoms were extracted and used to represent the 3D backbone structure of the protein. In all, this process resulted in 1,904 structures. See “Additional file 2: Structure Files” for a full listing of the structures and side chains used for each protein considered.
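As a rough illustration of the extraction described above, the sketch below reads α-carbon coordinates for one chain from the fixed-column ATOM records of a PDB-format file, keeping only the first conformation listed for each residue. The function names and the demo records are hypothetical, and this is not the parser used by the iPAC package; insertion codes and other edge cases are ignored.

```python
def parse_ca_coords(pdb_lines, chain_id="A"):
    """Return {residue_number: (x, y, z)} for alpha-carbons of one chain."""
    coords = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        # Columns 13-16 hold the atom name, column 22 the chain identifier.
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        res_num = int(line[22:26])
        if res_num in coords:
            continue  # keep only the first conformation listed
        coords[res_num] = (float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54]))
    return coords

def atom_line(serial, name, alt, res, chain, res_num, x, y, z):
    """Format a minimal ATOM record using the fixed PDB column layout."""
    return (f"ATOM  {serial:5d} {name:<4}{alt}{res:>3} {chain}{res_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Hypothetical records: residue 2 has two alternative conformations (A, B),
# and the last record belongs to a different chain.
demo = [
    atom_line(1, " N  ", " ", "MET", "A", 1, 1.0, 2.0, 3.0),
    atom_line(2, " CA ", " ", "MET", "A", 1, 1.5, 2.5, 3.5),
    atom_line(3, " CA ", "A", "THR", "A", 2, 4.0, 5.0, 6.0),
    atom_line(4, " CA ", "B", "THR", "A", 2, 4.1, 5.1, 6.1),
    atom_line(5, " CA ", " ", "MET", "B", 1, 9.0, 9.0, 9.0),
]
ca = parse_ca_coords(demo)
```

The resulting dictionary holds one coordinate triple per residue of chain A, which is the backbone representation that the later steps operate on.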
Reconciling the structural and mutational data
Because the PDB and COSMIC databases number amino acids differently, an alignment needed to be performed in order to reference the same residue numerically in both databases. Two methods in the iPAC package were designed to reconcile these differences, one based on pairwise alignment [20] and the other based on a numerical reconstruction from the structural data obtained from the PDB. As there are often significant technical difficulties with such a reconstruction, for the rest of this paper, unless specifically noted, pairwise alignment was used to reconcile these elements. Please see the documentation in the iPAC package for a full description of these two methods. Successful alignment of mutational and positional data occurred on 140 proteins, which corresponded to 1,100 unique structure/side-chain combinations and 667 unique residue positions containing 1,434 total mutations. We note that for any given structure/side-chain combination, if there is no positional data for a specific residue, the mutational data for that residue is not used. Please see “Additional file 2: Structure Files” for a full description.
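The idea behind the alignment-based reconciliation can be sketched as follows. Python's `difflib.SequenceMatcher` is used here purely as a simple stand-in for a proper pairwise sequence alignment, and the function name and example sequences are hypothetical; the iPAC package documentation describes the actual procedure.

```python
from difflib import SequenceMatcher

def map_structure_to_canonical(canonical_seq, structure_seq):
    """Map 0-based structure indices to 1-based canonical positions."""
    matcher = SequenceMatcher(None, canonical_seq, structure_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        # block.a / block.b are the match starts in the canonical and
        # structure sequences; the final dummy block has size 0.
        for offset in range(block.size):
            mapping[block.b + offset] = block.a + offset + 1
    return mapping

# Hypothetical example: the structure lacks the first two residues.
canonical = "MTEYKLVVVGAGGVGKSALTIQ"
structure = "EYKLVVVGAGGV"
mapping = map_structure_to_canonical(canonical, structure)

# Mutations reported at canonical positions can now be located in the
# structure; positions with no structural residue are simply absent.
canonical_to_structure = {pos: idx for idx, pos in mapping.items()}
```

With this mapping, a mutation reported at canonical position 12 is found at structure index 9, while canonical positions 1 and 2 have no structural counterpart and would be dropped, in line with the rule that residues without positional data are not used.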
and is not subject to unit distortion, will be minimized instead.
Thus, via Equation (2), we can quickly and efficiently calculate whether two mutations are closer together than expected by chance alone. For a given structure, a cluster was considered to be significant using an α-level of 0.05 and the Bonferroni adjustment. Specifically, the p-value of the cluster must be at most 0.05 / (n(n+1)/2), where n(n+1)/2 is the number of pairwise mutation combinations considered.
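A small worked sketch of the significance check follows. The Bonferroni cutoff is taken directly from the text. The p-value function is an assumption: it uses the standard result that, for n points placed uniformly at random on the unit interval, the gap between two order statistics that are m ranks apart follows a Beta(m, n-m+1) distribution, which for integer parameters equals a binomial tail sum. It is offered as a continuous approximation and is not claimed to reproduce Equation (2) exactly. The function names are hypothetical.

```python
from math import comb

def span_p_value(n, m, x):
    """P(two uniform order statistics m ranks apart lie within x).

    Equals P(Binomial(n, x) >= m), i.e. the Beta(m, n - m + 1) CDF at x.
    n: total number of mutations, m: rank difference (k - i),
    x: distance between the two mutations as a fraction of protein length.
    """
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(m, n + 1))

def bonferroni_threshold(n, alpha=0.05):
    """Per-cluster cutoff: alpha divided by n(n+1)/2 pairwise comparisons."""
    return alpha / (n * (n + 1) / 2)

def is_significant(n, m, x, alpha=0.05):
    """True if the cluster's p-value passes the Bonferroni-adjusted cutoff."""
    return span_p_value(n, m, x) <= bonferroni_threshold(n, alpha)

# Example: 20 mutations in total, 10 rank steps packed into 2% of the protein.
cutoff = bonferroni_threshold(20)      # 0.05 / 210
tight = is_significant(20, 10, 0.02)
loose = is_significant(20, 10, 0.50)
```

Under this approximation, increasing n while keeping the rank difference and the distance fixed raises the p-value, so pooling additional mutations makes the test more conservative.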
Multiple comparison adjustment for structures
A comparison of the most significant iPAC and NMC p-values from the 8 proteins that were picked up by both algorithms
Using the iPAC package, 215 of the total 1100 structures analyzed were found to have significant clustering. When comparing iPAC with the original NMC method, out of the 140 proteins analyzed, both iPAC and NMC identified 8 proteins that contained significant clusters. However, iPAC also identified 3 new proteins as well, specifically EGFR, EIF2AK2 and HAO1. These 3 new proteins correspond to 10 of the 215 structures found to have clustering. iPAC also found structure 2ENQ for the protein PIK3CA to contain a significant cluster while NMC did not. The 8 proteins identified by both algorithms correspond to the remaining 204 structures. There were no proteins that were identified by NMC but were subsequently missed by the iPAC algorithm. Please see “Additional file 3: Results Summary” for a full listing of which structures and which proteins were found to be significant.
We note that 9 out of the 11 proteins that were found significant by iPAC had their most significant cluster overlap a binding site, proton acceptor site or kinase domain. Of the remaining 2 proteins, the most significant cluster for PIK3CA overlapped amino acid 1047, which has been shown to ease the entrance of substrates and hence potentially increase the substrate turnover rate, a typical oncogenic behavior [26]. For a detailed per-protein description, please see “Additional file 4: Relevant Sites”.
Finally, we validated the performance of iPAC using two popular machine learning algorithms, PolyPhen-2 and CHASM. This validation must be considered in light of the fact that these algorithms require a much more extensive set of information than iPAC. Nevertheless, over 98% of the amino acids that occurred in significant mutation clusters were also identified as significant (with an FDR of ≤ 20%) by PolyPhen-2 and CHASM. For full details, please see “Additional file 5: Performance Validation”.
iPAC finds novel proteins
The three most significant clusters found in EGFR for the 2GS7 structure
Overall, all the statistically significant clusters found relate to lung cancer pathology and an increase in kinase activity. The two mutations in cluster 2, G719S and T751I, are both found in lung cancer, with the first mutation responsible for strongly increased kinase activity [32-34] and the second found in erlotinib-responsive non-small cell lung cancer (NSCLC) patients [35, 36]. Cluster 3 contains two mutations, T790M and L858R, both of which have been found in lung cancer and are known for increased kinase activity as well [32-34, 37]. Finally, cluster 1 comprises clusters 2 and 3, with an additional mutation, S768I, which potentially shows a positive clinical response to gefitinib in NSCLC patients. It is interesting to note that both clusters 1 and 2, which were identified via statistical analysis, contain mutations that have been found to benefit from pharmacological intervention. Had the tertiary structure of EGFR not been taken into account, these clusters would not have been identified by the NMC algorithm. When the protein is viewed linearly, the mutations occur too far away from each other to result in statistically significant p-values.
iPAC finds additional clusters
P-value for each region when the region is considered independently
2&3) 61 - 146
We also ran NMC and iPAC on each region separately to consider how the clustering results would be affected. As can be seen from Table 3, failure to account for the tertiary protein structure resulted in region 3 no longer being detected and region 1 losing significance by over ninety orders of magnitude.
Further, while somatic mutations in region 12-61 have been found in many cancers such as colorectal, lung, pancreatic and bladder [8, 33, 40-43], somatic mutations at amino acids 61, 117 and 146 have primarily been found in lung and colorectal carcinomas. Even more specifically, mutations at amino acids 117 and 146 (K → N and A → T, respectively) are mostly associated with colorectal cancer. Thus, by taking into account the tertiary structure, the clusters identified by iPAC subdivide the protein along pathological lines.
iPAC finds fewer clusters than NMC
The significant clusters found by both iPAC and NMC
Clusters found by both NMC and iPAC
The clusters that were not deemed significant by iPAC but were deemed significant by NMC
Clusters dropped by iPAC
While it is outside the scope of this paper to consider all the differences between Tables 4 and 5, we would like to point out that, unlike iPAC, the NMC algorithm reports the two longest clusters: 1) 464-671 (p-value = 6.01×10−9) and 2) 469-671 (p-value = 2.38×10−8). After alignment of the structure as described in Section “Obtaining the 3D structural data”, we only have structural information on amino acids 448-723. Thus, the largest cluster detected by NMC covers ≈75% of all the amino acids that we are considering. However, by taking into account the 3D structure of the protein, these ultra-long clusters are dropped and the clusters where iPAC and NMC overlap show 2 distinct areas of the protein, amino acids 464-600 and 600-671. As expected, because the majority of mutations occur on amino acid 600, both NMC and iPAC declare that the “cluster” located at amino acid 600 is highly significant.
In this paper, we extended the existing methodology available to find somatic mutation clustering by utilizing the information provided in the protein tertiary structure. In doing so, we showed that we are able to find both new proteins with clustering as well as new clusters in previously found proteins. We have also shown that by taking into account 3D structure, we are able to remove clusters that do not have biological meaning. The method is fast and robust, with the vast majority of proteins analyzed within 5-10 minutes when executed on a desktop with 8 GB of DDR3 RAM and an Intel i7 3600k processor running at a frequency of 3.40 GHz. Further, as the underlying calculation relies upon the NMC algorithm, a preset fixed window size is not required, which allows for the detection of clusters of various lengths. We have also shown that by employing a completely statistical methodology, we are able to identify mutations that, when suppressed via pharmacological intervention, may stop further tumor growth.
This methodology, while an improvement on the NMC method, still suffers from some limitations. First, the mutation status of all the amino acids must be determined, although with the advent of high-throughput sequencing this will become less of an issue as time progresses. Also, both hypermutability of genomic locations and unequal rates of mutagenesis might violate the assumption that each amino acid has a uniform mutation probability. For instance, it is well known that hypermutable positions for both somatic and germline mutations exist. Insertions and deletions, which are typically sequence dependent, have been removed from the analysis, and only missense substitutions of single amino acids have been kept in this study to help reduce such uniformity violations. Similarly, CpG dinucleotides can have a mutational frequency that is ten times or more that of other dinucleotides. However, fewer than 13% of the mutations used to find clustering in Sections “iPAC finds novel proteins”, “iPAC finds additional clusters”, and “iPAC finds fewer clusters than NMC” were in CpG sites. Further, tobacco smoking has been reported to preferentially cause transversions in lung cancer, while the mutational landscape for colorectal cancer has more transitions. Nevertheless, in the context of KRAS, the vast majority of mutations occur on amino acids 12, 13 and 61 for both lung and colorectal cancer. This suggests that while the mutational spectrum may be different, it does not have a large effect on the position of mutations and thus on the uniformity assumption. As with previous studies, while this analysis is influenced by nonrandom factors, it nonetheless appears that selection of a cancer phenotype is the primary cause of clustering.
It should also be noted that while iPAC is designed to take tertiary structure into account, it is only able to do so by appealing to the MDS methodology. Future research is required in order to relax this restriction and potentially identify additional clustering results. Next, as we obtained our mutational data from COSMIC, some tissue types are over- or under-represented. However, such situations would make our analysis more conservative and the clusters we find even more significant. If different tissue types host mutations in different parts of the protein, aggregating over all tissue types will result in a larger value of n while the values of k and i for two specific mutations (as seen in Equation 2) would remain the same. This results in a higher p-value, implying that clusters that are found to be significant after collapsing over tissue type would be even more so if only a specific tissue type was analyzed.
Finally, as shown in Section “Results and discussion”, iPAC finds fewer clusters for a significant percentage of the structures analyzed. This reduction in total clusters can come from two sources: the removal of some amino acids due to a lack of tertiary position information, or a cluster no longer being found significant when 3D structure is taken into account. The first source, while already rare, will become even more so in the future as more detailed structural information becomes available. As for the second source, when a cluster is identified by NMC but not by iPAC, an overlapping or nearby cluster is typically found (as shown in Tables 4 and 5). For BRAF specifically, there were a total of 3 structures where iPAC found fewer clusters than NMC. Further, every “possibly” or “probably damaging” mutation, as categorized by PolyPhen-2 [14], was still represented in at least one cluster in each structure. Thus, in the case of BRAF, none of the damaging mutations identified by PolyPhen-2 were lost. For a more detailed analysis, please see “Additional file 6: Potential Driver Loss”. Ultimately, further research is required to reduce the possibility of losing driver mutations while taking into account tertiary structure.
In conclusion, we present an approach that extends current methodology to identify mutation clustering by taking into account protein tertiary structure. We further show that by taking into account tertiary structure we are able to detect clusters that would otherwise be missed. Next, we demonstrate that for some of the clusters found, pharmacological intervention has already been successfully applied, further confirming the hypothesis that mutational clustering might point to activating driver mutations. As additional protein structures continue to be solved, iPAC would be able to rapidly perform a statistical analysis to identify such potential mutations. Finally, as we gain a better understanding of the tertiary structure of DNA, this method might also have applications to finding mutational clustering on the DNA level.
a For this analysis, we included mutational and positional data only on residues 1-167. No 3D positional information was available in the 3GFT structure on residues 168-188, and these residues were removed before the analysis. Further, the structural information has amino acid 61 as a histidine (isoform 2B for KRAS in the UniProt database) while the COSMIC database has a glutamine in that position. However, as the substitution of one amino acid in the structure for another would not have a significant effect on its spatial orientation and as amino acid 61 has a large number of somatic mutations, it was kept in the analysis.
We thank Drs. Francesca Chairomonte and Catherine Siena Grasso for their time and discussions regarding the development of this methodology.
This work was supported in part by NSF Grant DMS 1106738 (GR, HZ), NIH Grant GM59507 (HZ), P01 CA154295 (GR, HZ) and the China Scholarship Council (YC).
- Vogelstein B, Kinzler KW: Cancer genes and the pathways they control. Nat Med. 2004, 10 (8): 789-799. 10.1038/nm1087. [http://www.ncbi.nlm.nih.gov/pubmed/15286780] [PMID: 15286780]View ArticlePubMedGoogle Scholar
- Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al: Patterns of somatic mutation in human cancer genomes. Nature. 2007, 446 (7132): 153-158. 10.1038/nature05610. [http://www.nature.com/doifinder/10.1038/nature05610]PubMed CentralView ArticlePubMedGoogle Scholar
- Weinstein IB, Joe AK: Mechanisms of disease: Oncogene addiction-a rationale for molecular targeting in cancer therapy. Nat Clin Pract Oncol. 2006, 3 (8): 448-457. 10.1038/ncponc0558. [http://www.ncbi.nlm.nih.gov/pubmed/16894390] [PMID: 16894390]View ArticlePubMedGoogle Scholar
- Torkamani A, Schork NJ: Prediction of cancer driver mutations in protein kinases. Cancer Res. 2008, 68 (6): 1675-1682. 10.1158/0008-5472.CAN-07-5283. [http://www.ncbi.nlm.nih.gov/pubmed/18339846] [PMID: 18339846]View ArticlePubMedGoogle Scholar
- Bardelli A, Parsons DW, Silliman N, Ptak J, Szabo S, Saha S, Markowitz S, Willson JKV, Parmigiani G, Kinzler KW, Vogelstein B, Velculescu VE: Mutational analysis of the tyrosine kinome in colorectal cancers. Sci (New York, N.Y.). 2003, 300 (5621): 949-10.1126/science.1082596. [http://www.ncbi.nlm.nih.gov/pubmed/12738854] [PMID: 12738854]View ArticleGoogle Scholar
- Ye J, Pavlicek A, Lunney EA, Rejto PA, Teng C: Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics. 2010, 11: 11-10.1186/1471-2105-11-11. [http://www.biomedcentral.com/1471-2105/11/11]PubMed CentralView ArticlePubMedGoogle Scholar
- Wagner A: Rapid detection of positive selection in genes and genomes through variation clusters. Genetics. 2007, 176 (4): 2451-2463. 10.1534/genetics.107.074732.PubMed CentralView ArticlePubMedGoogle Scholar
- Sjöblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JKV, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers. Sci (New York, N.Y.). 2006, 314 (5797): 268-274. 10.1126/science.1133427. [http://www.ncbi.nlm.nih.gov/pubmed/16959974] [PMID: 16959974]View ArticleGoogle Scholar
- Kreitman M: Methods to detect selection in populations with applications to the human. Ann Rev Genomics Hum Genet. 2000, 1: 539-559. 10.1146/annurev.genom.1.1.539.View ArticleGoogle Scholar
- Wang T: Prevalence of somatic alterations in the colorectal cancer cell genome. Proc Natl Acad Sci. 2002, 99 (5): 3076-3080. 10.1073/pnas.261714699.PubMed CentralView ArticlePubMedGoogle Scholar
- Reva B, Antipin Y, Sander C: Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 2011, 39 (17): e118-e118. 10.1093/nar/gkr407.PubMed CentralView ArticlePubMedGoogle Scholar
- Breiman L: Random forests. Mach Learn. 2001, 45: 5-32. 10.1023/A\%3A1010933404324.View ArticleGoogle Scholar
- Cortes C, Vapnik V: Support-vector networks. Mach Learn. 1995, 20 (3): 273-297. [http://www.springerlink.com/index/10.1007/BF00994018]Google Scholar
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR: A method and server for predicting damaging missense mutations. Nat Methods. 2010, 7 (4): 248-249. 10.1038/nmeth0410-248. [http://www.nature.com/doifinder/10.1038/nmeth0410-248]PubMed CentralView ArticlePubMedGoogle Scholar
- Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, Vogelstein B, Karchin R: Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 2009, 69 (16): 6660-6667. 10.1158/0008-5472.CAN-09-1133.PubMed CentralView ArticlePubMedGoogle Scholar
- Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, Menzies A, Teague JW, Futreal PA, Stratton MR: The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr Protoc Hum Genet. 2008, Chapter 10: Unit 10.11-[http://www.ncbi.nlm.nih.gov/pubmed/18428421] [PMID: 18428421]PubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The protein data bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235. [http://www.pdb.org]PubMed CentralView ArticlePubMedGoogle Scholar
- Borg I, Groenen PJF: Modern multidimensional scaling : theory and applications. 1997, New York: SpringerView ArticleGoogle Scholar
- The UniProt Consortium: Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2011, 40 (D1): D71-D75. 10.1093/nar/gkr981.PubMed CentralView ArticleGoogle Scholar
- Pages H, Aboyoun P, Gentleman R, DebRoy S: Biostrings: String objects representing biological sequences, and matching algorithms. R package version 2.28.0.Google Scholar
- Tong Y, Tempel W, Shen L, Arrowsmith C, Edwards A, Sundstrom M, Weigelt J, Park H, Bockharev A: Human K-Ras in complex with a GTP analogue. 2009, [http://www.rcsb.org/pdb/explore.do?structureId=3GFT]Google Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B (Methodological). 1995, 57: 289-300. [http://www.jstor.org/stable/2346101]Google Scholar
- Dunn OJ: Confidence intervals for the means of dependent, normally distributed variables. J Am Stat Assoc. 1959, 54 (287): 613-621. 10.1080/01621459.1959.10501524. [http://www.jstor.org/stable/2282541]View ArticleGoogle Scholar
- Dunn OJ: Multiple comparisons among means. J Am Stat Assoc. 1961, 56 (293): 52-64. 10.1080/01621459.1961.10482090. [http://www.jstor.org/stable/2282330]View ArticleGoogle Scholar
- Gong Y, Kakihara Y, Krogan N, Greenblatt J, Emili A, Zhang Z, Houry WA: An atlas of chaperone-protein interactions in Saccharomyces cerevisiae: implications to protein folding pathways in the cell. Mol Syst Biol. 2009, 5: [http://www.nature.com/doifinder/10.1038/msb.2009.26]Google Scholar
- Mankoo PK, Sukumar S, Karchin R: PIK3CA somatic mutations in breast cancer: Mechanistic insights from Langevin dynamics simulations. Proteins: Struct, Funct, Bioinformatics. 2009, 75 (2): 499-508. 10.1002/prot.22265.View ArticleGoogle Scholar
- Herbst RS: Review of epidermal growth factor receptor biology. Int J Radiat Oncol *Biology* Phys. 2004, 59 (2, Supplement): S21-S26. 10.1016/j.ijrobp.2003.11.041. [http://www.sciencedirect.com/science/article/pii/S0360301604003311]View ArticleGoogle Scholar
- Scagliotti GV, Selvaggi G, Novello S, Hirsch FR: The biology of epidermal growth factor receptor in lung cancer. Clin Cancer Res. 2004, 10 (12): 4227s-4232s. 10.1158/1078-0432.CCR-040007. [http://clincancerres.aacrjournals.org/content/10/12/4227s.abstract]
- Walker F, Abramowitz L, Benabderrahmane D, Duval X, Descatoire V, Hénin D, Lehy T, Aparicio T: Growth factor receptor expression in anal squamous lesions: modifications associated with oncogenic human papillomavirus and human immunodeficiency virus. Hum Pathol. 2009, 40 (11): 1517-1527. 10.1016/j.humpath.2009.05.010. [http://www.sciencedirect.com/science/article/pii/S004681770900197X]
- Heimberger AB, Hlatky R, Suki D, Yang D, Weinberg J, Gilbert M, Sawaya R, Aldape K: Prognostic effect of epidermal growth factor receptor and EGFRvIII in glioblastoma multiforme patients. Clin Cancer Res. 2005, 11 (4): 1462-1466. 10.1158/1078-0432.CCR-04-1737. [http://clincancerres.aacrjournals.org/content/11/4/1462.abstract]
- Zhang X, Gureasko J, Shen K, Cole PA, Kuriyan J: An allosteric mechanism for activation of the kinase domain of epidermal growth factor receptor. Cell. 2006, 125 (6): 1137-1149. 10.1016/j.cell.2006.05.013. [http://www.sciencedirect.com/science/article/pii/S0092867406005848]
- Yun CH, Boggon TJ, Li Y, Woo MS, Greulich H, Meyerson M, Eck MJ: Structures of lung cancer-derived EGFR mutants and inhibitor complexes: mechanism of activation and insights into differential inhibitor sensitivity. Cancer Cell. 2007, 11 (3): 217-227. 10.1016/j.ccr.2006.12.017. [http://www.ncbi.nlm.nih.gov/pubmed/17349580] [PMID: 17349580]
- Tam IYS, Chung LP, Suen WS, Wang E, Wong MCM, Ho KK, Lam WK, Chiu SW, Girard L, Minna JD, Gazdar AF, Wong MP: Distinct epidermal growth factor receptor and KRAS mutation patterns in non-small cell lung cancer patients with different tobacco exposure and clinicopathologic features. Clin Cancer Res. 2006, 12 (5): 1647-1653. 10.1158/1078-0432.CCR-05-1981. [http://www.ncbi.nlm.nih.gov/pubmed/16533793] [PMID: 16533793]
- Paez JG, Jänne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ, Lindeman N, Boggon TJ, Naoki K, Sasaki H, Fujii Y, Eck MJ, Sellers WR, Johnson BE, Meyerson M: EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004, 304 (5676): 1497-1500. 10.1126/science.1099314. [http://www.ncbi.nlm.nih.gov/pubmed/15118125] [PMID: 15118125]
- Peraldo-Neia C, Migliardi G, Mello-Grand M, Montemurro F, Segir R, Pignochino Y, Cavalloni G, Torchio B, Mosso L, Chiorino G, Aglietta M: Epidermal Growth Factor Receptor (EGFR) mutation analysis, gene expression profiling and EGFR protein expression in primary prostate cancer. BMC Cancer. 2011, 11: 31. 10.1186/1471-2407-11-31. [http://www.biomedcentral.com/1471-2407/11/31]
- Tsao MS, Sakurada A, Cutz JC, Zhu CQ, Kamel-Reid S, Squire J, Lorimer I, Zhang T, Liu N, Daneshmand M, Marrano P, da Cunha Santos G, Lagarde A, Richardson F, Seymour L, Whitehead M, Ding K, Pater J, Shepherd FA: Erlotinib in lung cancer - molecular and clinical predictors of outcome. N Engl J Med. 2005, 353 (2): 133-144. 10.1056/NEJMoa050736.
- Yun CH, Mengwasser KE, Toms AV, Woo MS, Greulich H, Wong KK, Meyerson M, Eck MJ: The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. Proc Natl Acad Sci U S A. 2008, 105 (6): 2070-2075. 10.1073/pnas.0709662105. [http://www.ncbi.nlm.nih.gov/pubmed/18227510] [PMID: 18227510]
- Masago K, Fujita S, Irisa K, Kim YH, Ichikawa M, Mio T, Mishima M: Good clinical response to Gefitinib in a non-small cell lung cancer patient harboring a rare somatic epidermal growth factor gene point mutation; codon 768 AGC > ATC in Exon 20 (S768I). Jpn J Clin Oncol. 2010, 40 (11): 1105-1109. 10.1093/jjco/hyq087. [http://jjco.oxfordjournals.org/content/40/11/1105.abstract]
- Kranenburg O: The KRAS oncogene: Past, present, and future. Biochim Biophys Acta (BBA) - Rev Cancer. 2005, 1756 (2): 81-82. 10.1016/j.bbcan.2005.10.001. [http://www.sciencedirect.com/science/article/pii/S0304419X05000624] [The KRAS Oncogene]
- Lee KH, Lee JS, Suh C, Kim SW, Kim SB, Lee JH, Lee MS, Park MY, Sun HS, Kim SH: Clinicopathologic significance of the K-ras gene codon 12 point mutation in stomach cancer. An analysis of 140 cases. Cancer. 1995, 75 (12): 2794-2801. 10.1002/1097-0142(19950615)75:12<2794::AID-CNCR2820751203>3.0.CO;2-F. [http://www.ncbi.nlm.nih.gov/pubmed/7773929] [PMID: 7773929]
- Motojima K, Urano T, Nagata Y, Shiku H, Tsurifune T, Kanematsu T: Detection of point mutations in the Kirsten-ras oncogene provides evidence for the multicentricity of pancreatic carcinoma. Ann Surg. 1993, 217 (2): 138-143. 10.1097/00000658-199302000-00007. [http://www.ncbi.nlm.nih.gov/pubmed/8439212] [PMID: 8439212]
- Nakano H, Yamamoto F, Neville C, Evans D, Mizuno T, Perucho M: Isolation of transforming sequences of two human lung carcinomas: structural and functional analysis of the activated c-K-ras oncogenes. Proc Natl Acad Sci U S A. 1984, 81: 71-75. 10.1073/pnas.81.1.71. [http://www.ncbi.nlm.nih.gov/pubmed/6320174] [PMID: 6320174]
- Santos E, Martin-Zanca D, Reddy EP, Pierotti MA, Della Porta G, Barbacid M: Malignant activation of a K-ras oncogene in lung carcinoma but not in normal tissue of the same patient. Science. 1984, 223 (4637): 661-664. 10.1126/science.6695174. [http://www.ncbi.nlm.nih.gov/pubmed/6695174] [PMID: 6695174]
- Wenglowsky S, Ren L, Ahrendt KA, Laird ER, Aliagas I, Alicke B, Buckmelter AJ, Choo EF, Dinkel V, Feng B, Gloor SL, Gould SE, Gross S, Gunzner-Toste J, Hansen JD, Hatzivassiliou G, Liu B, Malesky K, Mathieu S, Newhouse B, Raddatz NJ, Ran Y, Rana S, Randolph N, Risom T, Rudolph J, Savage S, Selby LT, Shrag M, Song K, et al: Pyrazolopyridine inhibitors of B-Raf V600E. Part 1: the development of selective, orally bioavailable, and efficacious inhibitors. ACS Med Chem Lett. 2011, 2 (5): 342-347. 10.1021/ml200025q.
- Gandhi J, Zhang J, Xie Y, Soh J, Shigematsu H, Zhang W, Yamamoto H, Peyton M, Girard L, Lockwood WW, Lam WL, Varella-Garcia M, Minna JD, Gazdar AF: Alterations in genes of the EGFR signaling pathway and their relationship to EGFR tyrosine kinase inhibitor sensitivity in lung cancer cell lines. PLoS ONE. 2009, 4 (2): e4576. 10.1371/journal.pone.0004576.
- Pratilas CA, Hanrahan AJ, Halilovic E, Persaud Y, Soh J, Chitale D, Shigematsu H, Yamamoto H, Sawai A, Janakiraman M, Taylor BS, Pao W, Toyooka S, Ladanyi M, Gazdar A, Rosen N, Solit DB: Genetic predictors of MEK dependence in non-small cell lung cancer. Cancer Res. 2008, 68 (22): 9375-9383. 10.1158/0008-5472.CAN-08-2223.
- Lee JW, Yoo NJ, Soung YH, Kim HS, Park WS, Kim SY, Lee JH, Park JY, Cho YG, Kim CJ, Ko YH, Kim SH, Nam SW, Lee JY, Lee SH: BRAF mutations in non-Hodgkin’s lymphoma. Br J Cancer. 2003, 89 (10): 1958-1960. 10.1038/sj.bjc.6601371. [http://www.ncbi.nlm.nih.gov/pubmed/14612909] [PMID: 14612909]
- Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, Teague J, Woffendin H, Garnett MJ, Bottomley W, Davis N, Dicks E, Ewing R, Floyd Y, Gray K, Hall S, Hawes R, Hughes J, Kosmidou V, Menzies A, Mould C, Parker A, Stevens C, Watt S, Hooper S, Wilson R, Jayatilake H, Gusterson BA, Cooper C, Shipley J, et al: Mutations of the BRAF gene in human cancer. Nature. 2002, 417 (6892): 949-954. 10.1038/nature00766. [http://www.ncbi.nlm.nih.gov/pubmed/12068308] [PMID: 12068308]
- Naoki K, Chen TH, Richards WG, Sugarbaker DJ, Meyerson M: Missense mutations of the BRAF gene in human lung adenocarcinoma. Cancer Res. 2002, 62 (23): 7001-7003. [http://www.ncbi.nlm.nih.gov/pubmed/12460919] [PMID: 12460919]
- Andreu-Pérez P, Esteve-Puig R, de Torre-Minguela C, López-Fauqued M, Bech-Serra JJ, Tenbaum S, García-Trevijano ER, Canals F, Merlino G, Avila MA, Recio JA: Protein arginine methyltransferase 5 regulates ERK1/2 signal transduction amplitude and cell fate through CRAF. Sci Signal. 2011, 4 (190): ra58. 10.1126/scisignal.2001936. [http://www.ncbi.nlm.nih.gov/pubmed/21917714] [PMID: 21917714]
- Hingorani SR, Jacobetz MA, Robertson GP, Herlyn M, Tuveson DA: Suppression of BRAF(V599E) in human melanoma abrogates transformation. Cancer Res. 2003, 63 (17): 5198-5202. [http://www.ncbi.nlm.nih.gov/pubmed/14500344] [PMID: 14500344]
- Rajagopalan H, Bardelli A, Lengauer C, Kinzler KW, Vogelstein B, Velculescu VE: Tumorigenesis: RAF/RAS oncogenes and mismatch-repair status. Nature. 2002, 418 (6901): 934. 10.1038/418934a. [http://www.ncbi.nlm.nih.gov/pubmed/12198537] [PMID: 12198537]
- Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, Harview CL, Brunet JP, Ahmann GJ, Adli M, Anderson KC, Ardlie KG, Auclair D, Baker A, Bergsagel PL, Bernstein BE, Drier Y, Fonseca R, Gabriel SB, Hofmeister CC, Jagannath S, Jakubowiak AJ, Krishnan A, Levy J, Liefeld T, Lonial S, Mahan S, Mfuko B, Monti S, Perkins LM, et al: Initial genome sequencing and analysis of multiple myeloma. Nature. 2011, 471 (7339): 467-472. 10.1038/nature09837.
- Sved J, Bird A: The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc Natl Acad Sci. 1990, 87 (12): 4692-4696. 10.1073/pnas.87.12.4692. [http://www.pnas.org/content/87/12/4692.abstract]
- Hollstein M, Sidransky D, Vogelstein B, Harris CC: p53 mutations in human cancers. Science. 1991, 253 (5015): 49-53. 10.1126/science.1905840. [http://www.ncbi.nlm.nih.gov/pubmed/1905840] [PMID: 1905840]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.