Step 2: interpretation of the consequences of the mutations
The first output the user receives after submitting the mutations is a summary page with information about the requested mutations (Figure 1, panel b). It includes the description of the proteins in UniProt, their membership of kinase groups according to the KinBase classification [32, 33], and an estimate of the pathogenicity of the mutations according to our kinase-specific predictor, KinMut. The prediction of pathogenicity is discussed in detail in a later section; nevertheless, we include this information at this step as a guide to prioritize mutations. Users interested only in the KinMut results can find in this summary page a link to the predictions, which can be accessed programmatically. The scope of wKinMut goes beyond providing raw pathogenicity predictions from KinMut: the main goal of the web service is to help computational biologists and clinicians understand and interpret the consequences of kinase mutations. Hence, information complementary to the KinMut predictions is provided. In the summary table, the ‘View’ link in the right-most ‘Details’ column (Figure 1, panel b) redirects the user to another page containing this complementary information, which includes the values of the features used for classification, the PFAM domains affected by the mutation, protein-protein interaction information extracted from the literature with iHOP, mentions of the mutations in the literature automatically mined with SNP2L [25, 35], and existing records of the mutations in other dedicated databases. This additional information is intended to provide the basic background needed to understand and interpret the consequences of the mutations. Each piece of information is discussed in the following sections.
General information about the protein/gene
The ‘gene/protein’ tab (Figure 1, panel c) presents information shared by all mutations in the same kinase. Background information such as the gene name, the formal description in UniProt and the KinBase classification [32, 33] of the kinase is provided. In addition, the system lists the Gene Ontology terms with which the kinase has been annotated in each of the three independent sub-ontologies (namely Molecular Function, Cellular Component and Biological Process). This information provides clues about the function of the kinase, and KinMut uses it to calculate the likelihood that the protein (and subsequently the mutation) plays a role in disease.
In a previous publication we demonstrated that mutations occurring in certain domains, such as the tyrosine kinase domain (PKinase Tyr, according to PFAM), are more likely to cause disease. This is consistent with the assumption that some domains are functionally more important than others. In wKinMut, this information is contained in the ‘PFAM domains’ tab (Figure 1, panel d), which displays the domain (or, in some cases, domains) in which the mutation occurs and the seed alignment used by PFAM to build the domain family. The alignment is evaluated in terms of sequence conservation. Under the assumption that conserved regions have been preserved by evolution, this information can help the user identify important regions in the structure of the domain.
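The text does not state which conservation measure is applied to the seed alignment. As an illustration only, the following minimal sketch scores each alignment column with a normalized Shannon entropy, which is one common choice and an assumption here, not necessarily the measure used by wKinMut. The function name `column_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    Assumed measure: 1 - H / log2(20), where H is the Shannon entropy of
    the residues observed in the column. Gap characters ('-' and '.') are
    ignored. A column with a single residue type scores 1.0.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for col in range(length):
        residues = [seq[col].upper() for seq in alignment if seq[col] not in "-."]
        if not residues:
            scores.append(0.0)  # column made only of gaps: no evidence
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (made up for illustration, not a real PFAM seed).
toy_seed = ["GKGAFG", "GSGSFG", "GEGAYG", "G-GTFG"]
scores = column_conservation(toy_seed)
```

In this toy example the invariant glycine columns receive the maximum score, while the variable second column scores lower; a mutation falling in a high-scoring column would be the kind of position the tab is meant to highlight.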
Mapping the mutations onto structures
To understand the consequences that mutations might have on protein stability and function, it is sometimes useful to study them in their structural context. However, mapping mutations from sequences to structures is not always trivial. Under the ‘Structures’ tab (Figure 1, panel e), wKinMut enables the visualization of the mutation mapped onto all available structures. In addition, the versatility of the Jmol applet implemented in wKinMut allows advanced users to adapt the visualization to their specific needs.
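One reason the mapping is not trivial is that structures often cover only part of the protein, miss unresolved residues, and use their own residue numbering. The sketch below shows one simple way to translate a sequence position into a structure residue number, assuming a pairwise alignment between the full-length sequence and the structure sequence is already available. It is not the procedure used by wKinMut; the function name and the toy data are hypothetical.

```python
def map_position_to_structure(aligned_seq, aligned_struct, struct_numbering, seq_pos):
    """Map a 1-based sequence position to a structure residue number.

    aligned_seq / aligned_struct: the two rows of a pairwise alignment,
        with '-' marking gaps.
    struct_numbering: residue numbers of the structure residues, in order
        (one entry per non-gap character of aligned_struct).
    Returns None when the position is aligned to a gap, that is, when the
    residue is absent from the structure.
    """
    if len(aligned_seq) != len(aligned_struct):
        raise ValueError("aligned rows must have the same length")
    if len(struct_numbering) != sum(1 for c in aligned_struct if c != "-"):
        raise ValueError("struct_numbering does not match the structure row")

    seq_index = 0     # residues of the sequence seen so far
    struct_index = 0  # residues of the structure seen so far
    for seq_char, struct_char in zip(aligned_seq, aligned_struct):
        if seq_char != "-":
            seq_index += 1
        if struct_char != "-":
            struct_index += 1
        if seq_char != "-" and seq_index == seq_pos:
            if struct_char == "-":
                return None
            return struct_numbering[struct_index - 1]
    raise ValueError("position lies outside the sequence")


# Toy example: the structure lacks the first two residues and residue 6,
# and uses its own numbering (all values are made up).
aligned_seq = "MKVLAGDE"
aligned_struct = "--VLA-DE"
struct_numbering = [10, 11, 12, 14, 15]

mapped = map_position_to_structure(aligned_seq, aligned_struct, struct_numbering, 4)
missing = map_position_to_structure(aligned_seq, aligned_struct, struct_numbering, 6)
```

Here position 4 of the sequence corresponds to residue 11 of the structure, whereas position 6 cannot be displayed because it is not resolved in this toy structure.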
Prediction of the pathogenicity
In wKinMut, the theoretical pathogenicity of mutations is assessed by two independent methods, namely SIFT and KinMut. This information is displayed in the ‘Pathogenicity’ tab (Figure 1, panel f). SIFT predicts whether non-synonymous mutations are likely to affect protein function. The prediction is based on the degree of conservation of the residues in sequence alignments derived from closely related sequences. Mutations with a SIFT score below the threshold of 0.05 are considered likely to be pathogenic. KinMut is a kinase-specific predictor of the pathogenicity of mutations. It relies on a machine-learning approach (a support vector machine, SVM) to evaluate a number of sequence-derived features that describe kinase mutations from different perspectives: a) at the gene level, the membership of a KinBase group and Gene Ontology terms; b) at the domain level, the occurrence of the mutation inside a PFAM domain; and c) at the residue level, several properties including amino acid type, functional annotations from Swiss-Prot and FireDB, specificity-determining positions, etc. SVM scores greater than -0.5 indicate that the mutation is very likely pathogenic. The values of these features are also displayed in this section of the web service to aid the interpretation of the predictions. Please refer to the original publications for information on the individual characteristics, capabilities and validation of each predictor.
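The two thresholds quoted above point in opposite directions: low SIFT scores and high KinMut SVM scores both suggest pathogenicity. The following minimal sketch makes this explicit. It only applies the two cut-offs given in the text; the function name, the returned fields and the treatment of scores exactly at the SIFT threshold are assumptions made for illustration.

```python
SIFT_THRESHOLD = 0.05    # from the text: SIFT threshold
KINMUT_THRESHOLD = -0.5  # from the text: SVM scores greater than -0.5


def summarize_predictions(sift_score, kinmut_score):
    """Apply the two cut-offs and report whether the methods agree.

    Assumption: a SIFT score strictly below the threshold counts as
    likely pathogenic; handling of a score equal to 0.05 is a choice
    made here, not taken from the text.
    """
    sift_pathogenic = sift_score < SIFT_THRESHOLD
    kinmut_pathogenic = kinmut_score > KINMUT_THRESHOLD
    return {
        "sift_pathogenic": sift_pathogenic,
        "kinmut_pathogenic": kinmut_pathogenic,
        "agree": sift_pathogenic == kinmut_pathogenic,
    }


# Made-up scores for illustration.
example = summarize_predictions(sift_score=0.01, kinmut_score=0.8)
conflict = summarize_predictions(sift_score=0.30, kinmut_score=0.2)
```

Cases in which the two calls disagree, such as `conflict` above, are the ones where the feature values and the complementary information in the other tabs are most useful for interpretation.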
Mutations in databases
The wealth of knowledge provided by current research is usually stored in databases, a number of which store information about mutations from diverse perspectives. In wKinMut (Figure 1, panel g) we collect information from four different sources (including the UniProt Variant Pages and COSMIC) in an attempt to cover all aspects of protein kinase mutation. The information displayed includes the structural consequences of mutations, experiments associating mutations with a certain disease, and evidence that a mutation has been observed in a cancer sample.
Automatic extraction of mutations from the literature
Unfortunately, the databases referred to in the previous section do not contain all current knowledge about mutations. Even when a database record exists, it cannot always store all the contextual information. This context (experimental conditions, patients’ habits and clinical histories, etc.) is sometimes very important for the correct interpretation of the predictions. wKinMut provides pointers to mentions of the mutations in the literature under the ‘Literature’ tab (Figure 1, panel h). We extract this information automatically using our in-house text mining approach, SNP2L. In brief, SNP2L is a literature mining pipeline for the automatic extraction and disambiguation of single-point mutation mentions from both abstracts and full-text articles, followed by a sequence validation check that links mutations to their corresponding kinase protein sequences.
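To give an idea of the two steps mentioned above (extracting mutation mentions and validating them against a sequence), the following sketch uses a simple regular expression followed by a wild-type residue check. It is a deliberately naive illustration and not the SNP2L implementation, which also handles disambiguation; the pattern, the function names and the toy sequence are assumptions.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))
THREE_LETTER = "|".join(THREE_TO_ONE)

# Matches mentions such as "G2A" or "Lys3Arg" (wild type, position, mutant).
MUTATION_PATTERN = re.compile(
    rf"\b(?:([{ONE_LETTER}])(\d+)([{ONE_LETTER}])"
    rf"|({THREE_LETTER})(\d+)({THREE_LETTER}))\b"
)


def extract_mutations(text):
    """Return (wild_type, position, mutant) tuples found in the text."""
    found = []
    for match in MUTATION_PATTERN.finditer(text):
        if match.group(1):
            wild, pos, mutant = match.group(1), match.group(2), match.group(3)
        else:
            wild = THREE_TO_ONE[match.group(4)]
            pos = match.group(5)
            mutant = THREE_TO_ONE[match.group(6)]
        if wild != mutant:  # keep only substitutions that change the residue
            found.append((wild, int(pos), mutant))
    return found


def validate_against_sequence(mutations, sequence):
    """Keep mentions whose wild-type residue matches the sequence."""
    return [
        (wild, pos, mutant)
        for wild, pos, mutant in mutations
        if 1 <= pos <= len(sequence) and sequence[pos - 1] == wild
    ]


# Made-up sentence and toy sequence for illustration.
text = "The G2A and Lys3Arg variants, but not W5C, match the toy sequence."
mentions = extract_mutations(text)
validated = validate_against_sequence(mentions, "MGKLA")
```

The sequence check discards `W5C` because position 5 of the toy sequence is not a tryptophan. The same kind of check helps to filter out strings that merely look like mutations, such as cell line or gene names.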
Automatic determination of interaction partners
wKinMut integrates protein-protein interactions (PPIs) gathered from iHOP in the tab of the same name (Figure 1, panel i). Briefly, iHOP is a powerful text mining system that automatically extracts protein-protein interactions from PubMed abstracts. To relate the interaction information to its context, the sentences containing the interaction mentions are also provided.
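As a rough illustration of why keeping the source sentence is useful, the sketch below collects the sentences of an abstract in which a query protein is mentioned together with a candidate partner. This naive sentence-level co-occurrence is far simpler than iHOP and is not its algorithm; the function name and the protein names (`KinA`, `SubB`, `AdpC`) are hypothetical.

```python
import re


def cooccurrence_sentences(abstract, query, partners):
    """Return (partner, sentence) pairs where query and partner co-occur.

    Sentences are split on '.', '!' or '?' followed by whitespace, and
    names are matched as whole alphanumeric tokens (case-sensitive).
    """
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    hits = []
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        if query not in tokens:
            continue
        for partner in partners:
            if partner != query and partner in tokens:
                hits.append((partner, sentence))
    return hits


# Made-up abstract for illustration.
abstract = (
    "KinA phosphorylates SubB in vitro. AdpC was not detected. "
    "KinA also binds AdpC in these cells."
)
hits = cooccurrence_sentences(abstract, "KinA", ["SubB", "AdpC"])
```

Each hit keeps the full sentence, so a reader can judge from the wording whether the co-mention reflects a genuine interaction and under which conditions it was observed.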