Volume 17 Supplement 6
Prediction of scaffold proteins based on protein interaction and domain architectures
© Oh and Yi. 2016
Published: 28 July 2016
Scaffold proteins are known to be crucial regulators of various cellular functions, assembling multiple proteins involved in signaling and metabolic pathways. Identifying scaffold proteins and studying their molecular mechanisms can open a new view of systemic cellular regulation, and the results can be applied in medicine and engineering. Although the regulatory roles of dozens of scaffold proteins have been highlighted, only one computational approach has so far been used to find scaffold proteins from interactomes. That approach was limited in finding diverse types of scaffold proteins because its criteria were restricted to classical scaffold proteins. In this paper, we suggest a systematic approach to predict scaffold proteins from interactomes on a large scale and to characterize their roles comprehensively.
From a total of 10,419 basic scaffold protein candidates in protein interactomes, we classified the candidates into three classes according to structural evidence for scaffolding, such as domain architectures, domain interactions and protein complexes. We finally defined 2716 highly reliable scaffold protein candidates and characterized their functional features. To assess the accuracy of our prediction, we constructed gold standard positive and negative data sets. We prepared 158 gold standard positives and 844 gold standard negatives based on functional information from the Gene Ontology Consortium. The precision, sensitivity and specificity of our testing were 80.3, 51.0, and 98.5 %, respectively. Function enrichment analysis of the highly reliable scaffold proteins confirmed significantly enriched functions related to scaffold protein binding. We also identified functional associations between scaffold proteins and their recruited proteins. Furthermore, we found that the disease association of scaffold proteins is higher than that of kinases.
In conclusion, we predicted a larger number of scaffold proteins and analyzed their functional characteristics. A deeper understanding of the roles of scaffold proteins from this study will improve the chances of finding therapeutic or engineering applications that use their functional characteristics.
Cells regulate and integrate various functional modules to monitor external and internal states and to execute the appropriate physiological responses. Generally, cells monitor environmental stimuli using sensors such as receptors. This information is then processed by intracellular signaling networks to control various cellular outputs. Scaffold proteins are known to be important controllers in this process. They are signaling organizers that can modulate signaling specificity, integration, crosstalk, feedback, and multiplicity by acting as a physical platform to assemble signaling components [2, 3]. Through this regulation, scaffold proteins can produce dynamic signaling outputs. Scaffold proteins are involved not only in signaling processes but also in assembly-line processes and cell-cell communication. They also control enzymatic activities by conformational fine-tuning, and they can engage their interacting partners and transport them into specific cellular compartments. To sum up, scaffold proteins assemble multiple proteins through protein-protein interactions, using interaction domains to enforce proximity. They mainly regulate the spatial organization of reactions and control dynamics by recruiting modifiers or acting as catalysts, and they also act as signaling and metabolism organizers. Through these functions, scaffold proteins make it possible to combine the use of these elements, protect activated signaling molecules from inactivation, and control dynamic signaling output.
As mentioned, these characteristics make scaffold proteins candidates for therapeutic targets in human disease and for industrial applications, such as engineering the synthesis of desired chemical products. There is an encouraging example of a scaffold protein as a therapeutic target. Some studies have suggested that IQGAP1 is highly expressed in cancer cell lines and that it plays a role in enhancing tumorigenesis, yet IQGAP1 knockout mice are viable and fertile, show no defects in normal epithelium, and heal wounds normally. Thus, IQGAP1 is a potential tumor-required scaffold protein that is dispensable for homeostasis. On this basis, a scaffold-kinase interaction blockade (SKIB) was developed. SKIB acts through a mechanism distinct from direct kinase inhibition and may be a strategy to target overactive oncogenic kinase cascades in cancer. As this example shows, because scaffold proteins act as systemic regulators in cellular networks, their aberrant regulation can lead to the development of many types of disease.
In spite of the importance of scaffold proteins, only a few have been discovered, each on an individual basis, and their regulatory roles are largely unknown. Zeke et al. provided a definition of classical scaffold proteins. A classical scaffold protein is a protein that: (i) lacks intrinsic catalytic activity relevant for signaling; (ii) has at least two binding partners with catalytic activity relevant for signaling; and (iii) has binding partners that interact with each other in a direct or indirect way. Ramirez and Albrecht first predicted potential scaffold proteins from interactomes according to the criteria of Zeke et al. However, that approach was limited in finding diverse scaffold proteins because its criteria were restricted to classical scaffold proteins. In this study, we collected known scaffold proteins from articles and databases and used that knowledge to assess the reliability of scaffold proteins predicted from interactomes.
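As an illustration only, the three criteria for classical scaffold proteins can be expressed as a small check over an interaction map. This is a minimal sketch, not the procedure used in the cited work: the function name, the data layout (a symmetric dictionary of direct partners and a set of catalytic proteins) and the toy protein names are all assumptions, and criterion (iii) is simplified to direct interactions.

```python
from itertools import combinations

def is_classical_scaffold(protein, partners, catalytic):
    """Apply criteria (i)-(iii) to one protein.

    partners:  dict mapping a protein to the set of its direct binding partners
               (assumed symmetric: if B is in partners[A], A is in partners[B])
    catalytic: set of proteins with signaling-relevant catalytic activity
    """
    # (i) the candidate itself must lack catalytic activity
    if protein in catalytic:
        return False
    # (ii) at least two of its binding partners must be catalytic
    catalytic_partners = partners.get(protein, set()) & catalytic
    if len(catalytic_partners) < 2:
        return False
    # (iii) at least one pair of those partners must interact with each other;
    # only direct interactions are checked in this sketch
    return any(b in partners.get(a, set())
               for a, b in combinations(sorted(catalytic_partners), 2))

# Hypothetical toy interactome (names are placeholders, not real proteins)
toy_partners = {
    "SCAF": {"KIN_A", "KIN_B", "ADAPT"},
    "KIN_A": {"SCAF", "KIN_B"},
    "KIN_B": {"SCAF", "KIN_A"},
    "ADAPT": {"SCAF"},
}
toy_catalytic = {"KIN_A", "KIN_B"}

print(is_classical_scaffold("SCAF", toy_partners, toy_catalytic))   # True
print(is_classical_scaffold("KIN_A", toy_partners, toy_catalytic))  # False
```

Extending criterion (iii) to indirect interactions would require a path search between partners, which is left out here to keep the sketch short.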
We newly defined criteria for finding scaffold proteins that focus on the structural features needed to act as a scaffold. We extracted 10,127 proteins that have multiple interacting partners from protein interactomes and defined 2716 reliable scaffold proteins according to our novel criteria. We analyzed the functional association between scaffold proteins and their recruited proteins and tested the disease association of scaffold proteins. Through functional enrichment analysis, we identified their known functions and additional novel implications. As a result, our findings can support further studies that investigate or utilize scaffold proteins for engineering and therapeutics.
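The first step, selecting proteins with multiple interacting partners, can be sketched as follows. This is a hedged illustration rather than the pipeline used in this study: the input format (a list of binary interaction pairs), the threshold of two partners, the decision to ignore self-interactions and the function names are assumptions.

```python
from collections import defaultdict

def partners_by_protein(ppi_pairs):
    """Build an undirected partner map from (protein_a, protein_b) pairs."""
    partners = defaultdict(set)
    for a, b in ppi_pairs:
        if a == b:
            continue  # assumption: self-interactions do not count as partners
        partners[a].add(b)
        partners[b].add(a)
    return partners

def multi_partner_candidates(ppi_pairs, min_partners=2):
    """Return proteins with at least `min_partners` distinct partners."""
    partners = partners_by_protein(ppi_pairs)
    return sorted(p for p, ps in partners.items() if len(ps) >= min_partners)

# Hypothetical binary interactions; duplicates and a self-interaction included
toy_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
             ("P4", "P1"), ("P1", "P4"), ("P5", "P5")]

print(multi_partner_candidates(toy_pairs))  # ['P1', 'P2', 'P3']
```

Using sets for the partner lists means that duplicate records of the same interaction, which are common when several source databases are merged, are counted once.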
Collection of interactome data
To predict scaffold proteins from interactomes using structural features, we collected protein-protein interaction (PPI), domain-domain interaction (DDI), protein domain, and protein complex data. The protein domain information was taken from the Pfam database. PPI and DDI data were collected from an integrated PPI database and the integrated DDI database (IDDI), respectively. Moreover, we downloaded the protein complex datasets from COFECO.
Collection of functional categories
From UniProtKB, we first obtained the total set of human proteins in SwissProt. Disease-associated genes were collected from three databases: OMIM, PharmGKB, and KEGG DISEASE. Because disease naming varies among the source databases, we standardized the disease names by extracting Unified Medical Language System (UMLS) IDs using MetaMap. The UMLS IDs were then converted to ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) codes using the mappings provided in the UMLS. Drug and drug target data were collected from DrugBank. To compare the functional associations between scaffolds and their partner proteins, we prepared localization and pathway data. Localization data were collected from Gene Ontology Cellular Compartment. Each Gene Ontology Cellular Compartment identifier was re-organized into one of 17 cellular compartments according to the hierarchical structure (cell surface, chromosome, cytoplasm, cytoplasmic membrane-bounded vesicle, cytoskeleton, cytosol, endoplasmic reticulum, endosome, ER-Golgi intermediate compartment, extracellular region, Golgi apparatus, mitochondrion, nucleus, plasma membrane, ribosome, sarcoplasmic reticulum, vacuole). Pathway data were collected from four pathway databases (KEGG, PID, Reactome, and WikiPathways), and the defined pathway names and their components were extracted.
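The re-organization of Gene Ontology Cellular Compartment identifiers into a fixed set of compartments can be pictured as a walk up the term hierarchy until one of the chosen compartments is reached. The sketch below is one possible reading under stated assumptions: the parent map is a simplified toy hierarchy keyed by term name rather than GO identifier, only three of the 17 compartments are listed, and the function name is hypothetical.

```python
def top_compartments(term, parents, top_level):
    """Return the top-level compartments reachable from `term` via parent links."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in top_level:
            found.add(current)  # stop climbing once a chosen compartment is hit
            continue
        stack.extend(parents.get(current, ()))
    return found

# Simplified toy hierarchy (assumption: real GO uses identifiers and
# several relation types such as is_a and part_of)
toy_parents = {
    "nuclear body": ["nucleoplasm"],
    "nucleoplasm": ["nucleus"],
    "mitochondrial matrix": ["mitochondrion"],
}
toy_top_level = {"nucleus", "mitochondrion", "cytosol"}

print(top_compartments("nuclear body", toy_parents, toy_top_level))  # {'nucleus'}
```

A term with several parents can map to more than one compartment, which is why the function returns a set.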
Collection of known scaffold proteins
To test the reliability of our prediction, we defined gold standard positive and negative sets using basic text mining and functional term filtering. For the gold standard positive set, we collected scaffold protein candidates from multiple sources. First, we manually gathered scaffold proteins from review articles. Second, we found candidates by query search over the functional descriptions in the UniProt database and the titles and abstracts in PubMed. From those candidates, we filtered out the ones whose known molecular functions are already annotated as scaffold activities and complex assemblies. For the gold standard negative set, we excluded proteins that have molecular functions and biological functions related to known scaffold proteins.
Criteria for predicting novel scaffold proteins
Characterization of scaffold protein candidates
Gene annotation enrichment analysis
We used the DAVID tools to analyze the functional characteristics of the collected scaffold protein candidates. The functional meaning of the candidates was interpreted using the function enrichment analysis tool in DAVID. We analyzed functional implications in GO molecular function, GO cellular compartment, GO biological process and Pfam families. The p-values were adjusted for multiple testing using Benjamini and Hochberg's method.
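For readers unfamiliar with the Benjamini and Hochberg adjustment, the following minimal sketch shows the standard step-up procedure in plain Python. It illustrates the method itself and is not DAVID's implementation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each one is capped by the next larger rank).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment p-values for four terms
example_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example_p))
```

For these four values the adjusted p-values are approximately 0.02, 0.04, 0.04 and 0.02, so terms that share a rank-scaled value end up with the same adjusted value.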
Functional association analysis
We tested the null hypothesis of no association between scaffold proteins and disease-related genes (disease genes or drug targets). To use the chi-square statistic, we constructed a contingency table and compared the observed frequencies with the expected frequencies. If there were no association between scaffold proteins and disease-related genes, the expected frequencies would be almost equal to the observed frequencies, the chi-square statistic would be small, and the probability (p-value) would be large.
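The comparison of observed and expected frequencies can be made concrete with a short sketch of the Pearson chi-square test on a 2 × 2 table. The counts below are hypothetical and are not taken from this study, the function name is an assumption, no continuity correction is applied, and all row and column totals are assumed to be non-zero.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Rows could be scaffold / non-scaffold proteins and columns
    disease-related / not disease-related genes.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 30 of 100 scaffolds vs 10 of 100 other proteins
chi2, p = chi_square_2x2(30, 70, 10, 90)
print(chi2, p)
```

With these counts every expected cell differs from the observed cell by 10, giving a statistic of 12.5 and a p-value below 0.001, whereas a table with identical rows gives a statistic of 0 and a p-value of 1.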
a) Statistics of collected data
Protein-protein interactions: ComBiCom (BIND, BioGRID, DIP, HPRD, IntAct, MIPS)
Domain-domain interactions: IDDI (3DID, iPfam)
Protein complexes: 3317
Pathways: KEGG, NCI-PID, Reactome, WikiPathway
Localization: Gene Ontology Cellular Compartment (GO terms: 635)
Diseases: OMIM, PharmGKB, KEGG Disease
Gold standard set: UniProt, PubMed, Gene Ontology
b) Statistics of scaffold protein candidates
# of scaffold proteins
2 × 2 contingency table for evaluating the performance of prediction
Gold standard: condition positive (158), condition negative (844); prevalence 17.1 %
Predicted condition (2716)
Predicted condition positive: precision (positive predictive value) 83.8 %
Predicted condition negative: false omission rate 9.8 %
Accuracy 89.6 %
Sensitivity (true positive rate) 42.4 %
Fall-out (false positive rate) 1.5 %
Miss rate (false negative rate) 57.6 %
Specificity (true negative rate) 98.5 %
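The measures listed in the contingency table all derive from four cell counts: true positives, false positives, false negatives and true negatives. The sketch below shows the standard definitions. The counts passed in are hypothetical and are not the counts behind the table, and the function name is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the standard 2 x 2 performance measures from the four cell counts."""
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,
        "precision": tp / (tp + fp),            # positive predictive value
        "false_omission_rate": fn / (fn + tn),
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "fall_out": fp / (fp + tn),             # false positive rate
        "miss_rate": fn / (tp + fn),            # false negative rate
        "specificity": tn / (fp + tn),          # true negative rate
    }

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=40, fp=10, fn=60, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and miss rate always sum to 1, as do specificity and fall-out, which gives a quick consistency check on any reported table.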
Functional characteristics of predicted scaffold proteins
Functional similarity between scaffold and partner proteins
Similarity between scaffold protein candidates and partner proteins
a) Similarity of cellular localization: # of scaffold proteins; matched with partner's information
b) Similarity of related pathway: # of scaffold proteins; matched with partner's information
Disease and drug target association of scaffold protein candidates and kinases
Drug target association
Our prediction identified PIK3R1 as a scaffold protein candidate that recruits GAB1 and PIK3CA. We examined the protein expression level of PIK3R1 in normal and cancer cells using the Human Protein Atlas. PIK3R1 protein expression was not detected in normal breast cells, but it was highly expressed in breast cancer cells. PIK3CA is known as a gene related to malignant neoplasm of the breast, and it inhibits apoptosis. From this evidence, we hypothesize that cancer-specific high expression of PIK3R1 increases the activation of PIK3CA and that, as a result, negatively regulated apoptotic function causes cancer in the breast.
Using massive data from high-throughput screening, we predicted many candidate proteins that may act as scaffolds. Many of them are not known as scaffold proteins, but they have the potential to recruit partner proteins and regulate their functions. Although our text mining methods can be improved, the known scaffold proteins extracted from articles and databases help corroborate the reliability of the scaffold proteins predicted from interactomes. In this study, we used highly reliable protein-protein interaction and domain-domain interaction data. Because much predicted information on protein domains, protein-protein interactions and domain-domain interactions is also available, the set of predicted scaffold proteins could be expanded, with reliability scores attached. If functional information or condition-specific data were used, predicted scaffold proteins might be classified into various types by their functional characteristics, such as localization, pathway regulation or crosstalk. These functional characteristics could also be used as measures for the reliability scores. Some known scaffold proteins recruit more than two proteins, but we restricted each prediction to a scaffold protein with two partner proteins, because there are so many possible combinations of partner protein sets. Scaffold proteins that can recruit more than two proteins can be found by filtering our predictions.
Scaffold proteins can precisely control the specificity and dynamics of information transfer. Furthermore, scaffold proteins are versatile because of their modularity, which allows recombination of protein domains to build new signaling pathways. In the past, scaffold proteins were discovered only by chance, in experiments aimed at studying the function of signaling enzymes or receptors. We extracted scaffold proteins from articles and databases and predicted them from interactomes according to the new criteria we proposed. Through functional enrichment analysis, we identified not only the known functional implications of scaffold proteins but also novel enriched terms. Using the functional characteristics of partner proteins, we also predicted new functions of scaffold proteins. Finally, we found that scaffold proteins, like kinases, were highly associated with diseases and drug targets. Future studies can reveal more about the role of scaffold proteins, and scaffolds can be used to generate new and predictable pathways that program useful cellular behaviors. In this respect, this study can support further research on discovering targets for molecular engineering and therapy.
Ethics approval and consent to participate
Consent for publication
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files (Additional files 1 and 2).
This work was supported by the Bio-Synergy Research Project (NRF-2012M3A9C4048759), the Converging Research Center Program (Project No.2015054201), and the KAIST Future Systems Healthcare Project funded by the Ministry of Science, ICT and Future Planning.
Publication charges for this article have been funded by the Bio-Synergy Research Project (NRF-2012M3A9C4048759) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 6, 2016: Proceedings of the ACM Ninth International Workshop on Data and Text Mining in Biomedical Informatics. The full contents of the supplement are available online at http://dl.acm.org/citation.cfm?id=2811186.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Good MC, Zalatan JG, Lim WA. Scaffold proteins: hubs for controlling the flow of cellular information. Science. 2011;332(6030):680–6.
- Pan CQ, Sudol M, Sheetz M, Low BC. Modularity and functional plasticity of scaffold proteins as p(l)acemakers in cell signaling. Cell Signal. 2012;24(11):2143–65.
- Palfy M, Remenyi A, Korcsmaros T. Endosomal crosstalk: meeting points for signaling pathways. Trends Cell Biol. 2012;22(9):447–56.
- Zeke A, Lukacs M, Lim WA, Remenyi A. Scaffolds: interaction platforms for cellular signalling circuits. Trends Cell Biol. 2009;19(8):364–74.
- Shaw AS, Filbert EL. Scaffold proteins and immune-cell signalling. Nat Rev Immunol. 2009;9(1):47–56.
- White CD, Brown MD, Sacks DB. IQGAPs in cancer: a family of scaffold proteins underlying tumorigenesis. FEBS Lett. 2009;583(12):1817–24.
- Jameson KL, Mazur PK, Zehnder AM, Zhang J, Zarnegar B, Sage J, Khavari PA. IQGAP1 scaffold-kinase interaction blockade selectively targets RAS-MAP kinase-driven tumors. Nat Med. 2013;19(5):626–30.
- Stuart DD, Sellers WR. Targeting RAF-MEK-ERK kinase-scaffold interactions in cancer. Nat Med. 2013;19(5):538–40.
- Ramirez F, Albrecht M. Finding scaffold proteins in interactomes. Trends Cell Biol. 2010;20(1):2–4.
- Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42(Database issue):D222–30.
- Youngwoong H, Choong-Hyun S, Min-Sung K, Gwan-Su Y. "Combined database system for binary protein interaction and co-complex association." In: IEEE 10.1109/IACSIT-SC.2009.42, 538-542
- Kim Y, Min B, Yi GS. IDDI: integrated domain-domain interaction and protein interaction analysis system. Proteome Sci. 2012;10 Suppl 1:S9.
- Sun CH, Kim MS, Han Y, Yi GS. COFECO: composite function annotation enriched by protein complex data. Nucleic Acids Res. 2009;37(Web Server issue):W350–5.
- UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(Database issue):D204–12.
- Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Altman RB, Klein TE. Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther. 2012;92(4):414–7.
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38(Database issue):D355–60.
- Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Database issue):D267–70.
- Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014;42(Database issue):D1091–7.
- Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43(Database issue):D1049–56.
- Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
- Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009;37(Database issue):D674–9.
- Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu G, Caudy M, Garapati P, Gillespie M, Kamdar MR, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2014;42(Database issue):D472–7.
- Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo CT, Pico AR. WikiPathways: building research communities on biological pathways. Nucleic Acids Res. 2012;40(Database issue):D1301–7.
- Dennis Jr G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003;4(5):3.
- Benjamini Y, Hochberg Y. Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met. 1995;57(1):289–300.
- Arnold HK, Zhang X, Daniel CJ, Tibbitts D, Escamilla-Powers J, Farrell A, Tokarz S, Morgan C, Sears RC. The Axin1 scaffold protein promotes formation of a degradation complex for c-Myc. EMBO J. 2009;28(5):500–12.
- Jain P, Vig S, Datta M, Jindel D, Mathur AK, Mathur SK, Sharma A. Systems biology approach reveals genome to phenome correlation in type 2 diabetes. PLoS One. 2013;8(1):e53522.
- Papatriantafyllou M. WNT chews the fat with glucose uptake. Nat Rev Mol Cell Biol. 2012;13(6):339.
- Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419.
- Karakas B, Bachman KE, Park BH. Mutation of the PIK3CA oncogene in human cancers. Br J Cancer. 2006;94(4):455–9.