The yeast kinome displays scale free topology with functional hub clusters
© Lee and Megeney; licensee BioMed Central Ltd. 2005
Received: 12 May 2005
Accepted: 09 November 2005
Published: 09 November 2005
The availability of interaction databases provides an opportunity for researchers to utilize immense amounts of data exclusively in silico. Recently there has been an emphasis on studying the global properties of biological interactions using network analysis. While this type of analysis offers a wide variety of global insights, it has surprisingly not been used to examine more localized interactions based on mechanism. We therefore have a particular interest in the role of key topological components in signal transduction cascades, as they are vital regulators of healthy and diseased cell states.
We have used publicly available databases and a novel software tool termed Hubview to model the interactions of a subset of the yeast interactome, specifically protein kinases and their interaction partners. Analysis of the connectivity distribution indicates a fat-tailed degree distribution with parameters consistent with those found in other biological networks. In addition, Hubview identified a functional clustering of a large group of kinases, distributed between three separate groupings. The complexity and average degree of each of these clusters are indicative of a specialized function (cell cycle propagation, DNA repair and pheromone response) and of the relative age of each cluster.
Using connectivity analysis on a functional subset of proteins we have evidence that reinforces the scale free topology as a model for protein network evolution. We have identified the hub components of the kinase network and observed a tendency for these kinases to cluster together on a functional basis. As such, these results suggest an inherent trend to preserve scale free characteristics at a domain based modular level within large evolvable networks.
The Barabási and Albert scale free network model is a mathematical precept that describes the innate connectivity and distribution within complex networks. These scale free networks defy the traditional random graph model of Erdős and Rényi and display a connectivity distribution in which the occurrence of highly interacting components of the network, defined as nodes, decays as a power law, P(k) ~ k^(-γ) [1–3]. In turn, growth of a scale free network is characterized by a preferential attachment scheme in which new nodes attach to older, more connected nodes with a higher probability [2, 4, 5]. This model facilitates a rich-get-richer schema and allows for the existence of a very important class of highly connected hubs [1, 6]. These hubs are largely responsible for the non-Gaussian connectivity distribution of scale free networks and are commonly orders of magnitude more connected than the average node. The existence of the hubs also provides a robust environment that is tolerant of random attack and failure but is very sensitive to hub perturbation [3, 7–10].
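The preferential attachment scheme described above can be illustrated with a minimal sketch. The function names, the seed graph and the parameter m (edges added per new node) are illustrative assumptions and are not taken from this study.

```python
import random
from collections import Counter

def barabasi_albert(n_nodes, m, seed=0):
    """Grow an undirected graph by preferential attachment; return its edge list."""
    rng = random.Random(seed)
    # Assumed starting point: a small fully connected seed of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge end, so a uniform draw from
    # `stubs` selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges

def degree_counts(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

edges = barabasi_albert(2000, 2)
degree = degree_counts(edges)
mean_k = sum(degree.values()) / len(degree)
print("mean degree:", round(mean_k, 2), "max degree:", max(degree.values()))
```

In a network grown this way the oldest nodes tend to accumulate far more connections than the average node, which is the rich-get-richer behaviour the model is known for.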
This scale free topology has been demonstrated in a variety of man-made networks such as the World Wide Web and the actor collaboration network [1, 2]. Scale free principles have also been noted in biologic systems such as the yeast protein-protein interaction dataset and the metabolic protein network [3, 6]. Nevertheless, the suitability of the static scale free construct across diverse biologic systems has been challenged as a universal principle. For example, the yeast protein interaction network has been described as a date and party hub scale free network, in which these hubs are defined by variable or consistent interactions, respectively. More critically, mathematical models of network growth have shown that preferential attachment may follow a random geometric topology rather than a scale free distribution. Another study uses a learning algorithm to infer duplication-mutation-complementation as the central topology mechanism in the Drosophila melanogaster protein interaction network. Indeed, it has been reported that the essential proteins within the larger yeast protein interaction network form an exponential connectivity distribution rather than a scale free distribution. These observations raise intriguing possibilities, one of which suggests that broader scale free systems may evolve from a compilation of sub networks of different topology. Alternatively, this non-scale free structure may be an anomaly that originates from examining essential hubs versus non-essential hubs in the framework of an already established network.
Within this context, phosphorylation dependent signal transduction pathways provide an interesting venue to examine network behavior. In eukaryotic organisms, kinase directed phosphorylation is one of the most common forms of post-translational modification and as such this protein class is noted as a vital regulator of cellular function [14–16]. In addition, kinase families are well conserved across diverse phyla, suggesting that network organization may be similarly conserved. However, phosphorylation pathways are commonly studied as linear events connecting stimulus to response through a simple ladder of molecular interactions, a concept that is based largely on experimental perturbation and observation of directly connected proteins.
As such, identification of the select kinase hubs and interaction profiling should offer an insight into the functional complexities of cellular signaling in yeast and higher eukaryotes. Here, we examined the subset of the S. cerevisiae interaction data that includes protein kinases and their direct protein interactions. In all cases, analysis was performed on filtered datasets available in public databases to identify likely hub kinases and their interactivity. We confirmed scale free behaviour of this dataset using connectivity analysis and observed parameters as applied to a novel computer program/visualization tool we termed Hubview. Interactions between the 19 most connected kinases, which we identified as super-hubs, were mapped along with less connected hub kinases. From this map we were able to discern three distinct clusters of kinase proteins, with each cluster retaining a common biologic function, i.e. cell cycle control, DNA repair/recombination and the pheromone/mating response. Together these observations suggest that scale free topology of the yeast kinome co-evolved with the emergence of distinct biologic domains.
Results and discussion
Table 1. Summary of hub kinases as identified by Hubview. The weighted mean was calculated by giving double weight to degrees listed in the core KPI dataset. Hubs with a knockout lethal phenotype are listed as identified by Giaever et al.
Column headings recovered: DIP Node Number; Confidence as a super-hub. Only the weighted mean degree values survive, listed here in table order:
150 ± 40; 50 ± 10; 40 ± 10; 37 ± 5; 34 ± 7; 33 ± 5; 24 ± 1; 22 ± 2; 22 ± 2; 19 ± 5; 19 ± 6; 19 ± 3; 17 ± 2; 16 ± 6; 16 ± 2; 14 ± 1; 14 ± 1; 14 ± 1; 14 ± 1; 13 ± 5; 13 ± 2; 12 ± 4; 12 ± 1; 12 ± 1; 12 ± 2; 12 ± 2; 11 ± 9; 11 ± 7; 11 ± 1; 10 ± 3
The 124 members of the protein kinase superfamily list were cross referenced with the list of essential yeast proteins to identify the yeast kinases with known knockout lethal phenotypes. Of the 124 kinases, only 16 were lethal when deleted, yielding a 13% chance of lethality for a random single kinase deletion. In contrast, 6 of the 19 hubs named as high confidence in table 1 are listed as essential, resulting in a 32% chance of lethality for random deletion of one of the 19 high confidence hubs. This marked increase in lethality associated with directed hub attack is consistent with existing studies of scale free networks and indicates a likely tendency for hub kinases to be preserved from an evolutionary perspective.
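The two lethality rates quoted above follow directly from the counts given in the text. The short calculation below reproduces them; the fold enrichment it prints is a derived quantity added here for illustration and is not reported in the original analysis.

```python
def lethality_rate(essential, total):
    """Fraction of proteins in a group whose deletion is lethal."""
    return essential / total

# Counts taken from the text: 16 of 124 kinases and 6 of 19 hubs are essential.
all_kinases = lethality_rate(16, 124)
hub_kinases = lethality_rate(6, 19)

print(f"all kinases: {all_kinases:.1%}")
print(f"high confidence hubs: {hub_kinases:.1%}")
print(f"fold enrichment: {hub_kinases / all_kinases:.2f}")
```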
In response to the latter interpretation we examined the basic purpose of the individual hubs and observed a common functional theme concomitant with each cluster. The largest cluster, containing cdc28, is functionally associated with cell cycle propagation through the various phases. The second cluster, with CKA1 as a peak, is generally associated with kinase proteins that manipulate the response to DNA damage, and the final KSS/MAPK cluster is involved with the regulation of the pheromone response. These results seem to offer a reasonable order to the emergence of specialized functions central to all eukaryotes, i.e. the cell division cycle predates the DNA verification mechanisms, which in turn predate the youngest reproductive module, the mating response.
The entire core and complete kinomes were clustered using the probabilistic method described by Samanta and Liang. This method identifies functional relationships between proteins through redundancy of interaction partners. A number of the associations in the backbone clusters were confirmed using this algorithm (figure 3b). Interestingly, the proteins in the cell cycle propagation cluster did not appear as functionally redundant in the clustering. Presumably the three clusters converge downstream to some extent, but at the hub level this indicates that these components offer highly specialized, non-redundant services to the cell cycle cluster, likely due to the ancient nature of their function. This method can also be used to identify likely synthetic lethality, as many viable knockouts are rescued through redundant interactions. The full results of the clustering are available as supporting information (see additional file 1) or can be generated using Hubview.
The network of essential yeast proteins has been compiled and shown to follow an exponential degree distribution. This distribution is normally associated with more stochastic evolutionary mechanisms. It has been argued that this network may represent an ancestral core about which the rest of the yeast interactome has formed. The existence of an exponential core does not directly contradict the scale free topology observed in the protein interaction network but may simply exist as a scaffold for scale free mechanisms to adhere to. This possibility is interesting, as it may also suggest that different parts of the interactome evolved under different evolutionary pressures, causing an unequal distribution of topological properties within the same interactome.
A recent investigation concerning the effects of sampling on topology adds a measure of doubt to studies of protein network topology. In this study the effects of various large scale experiments were simulated by first generating different networks of known topology and then sampling interactions at a scale mimicking yeast two hybrid and co-affinity purification. The authors found that under some conditions non-scale free topologies (i.e. an Erdős and Rényi network with <K> = 10), when sampled, can generate sub-networks with scale free properties. Here the kinome benefits from the fact that it is a widely studied mechanistic class and many of the interactions, especially in the core kinome, have been identified in smaller scale experiments and not exclusively in large scale experiments. This suggests that the much smaller kinome network may not suffer as much as networks derived solely from large scale experiments. The results of this study nevertheless impose the caveat that the results for our KPI network cannot be extrapolated to the complete yeast protein interaction network with any confidence.
Our analysis suggests that the yeast kinome is an evolved scale free system. Moreover, these observations suggest the intriguing possibility that the scale free topology of the global protein-protein interaction network or any larger biologic network may be the composite of smaller evolving topologies (such as the kinome), all of which are subject to their own selective pressures.
Both the core and complete yeast interaction data of the manually curated DIP were used as interaction data sets. The complete dataset consists largely of high throughput interaction data [19, 31–33]. The core DIP dataset consists of interactions found in small scale experiments, in two or more independent larger scale experiments and, when paralogous interaction data exist, by the Paralogous Verification Method (PVM) [31, 32]. The core dataset is believed to correctly identify the core of interacting proteins in yeast and provides a minimal interaction view of the yeast interactome. For our purposes the complete yeast interaction set is viewed as a hypothetical maximal interaction set. The many false positives, false negatives and unlikely biologic interactions in the complete dataset are still valuable, as they may be representative of interactions in a diseased state based on possible spatial and temporal protein delocalization. The DIP is available online at http://dip.doe-mbi.ucla.edu.
Both datasets were filtered to include only kinases and direct interaction partners of kinases, as found in the DIP node search in conjunction with the kinases listed in the protein kinase superfamily described by Hunter and Plowman. The resulting Kinase-Partner Interaction (KPI) dataset consisted of 607 nodes with 834 interactions in the case of the core dataset and 1085 nodes with 1481 interactions in the case of the complete dataset.
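The filtering step can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name, the toy protein identifiers and the decision to drop self-interactions and duplicate pairs are all assumptions.

```python
def build_kpi(interactions, kinases):
    """Keep only interactions in which at least one partner is a kinase."""
    kinases = set(kinases)
    edges = set()
    for a, b in interactions:
        if a == b:
            continue  # assumption: self-interactions are ignored
        if a in kinases or b in kinases:
            # Store each undirected pair once, regardless of input order.
            edges.add(tuple(sorted((a, b))))
    nodes = {name for edge in edges for name in edge}
    return nodes, edges

# Hypothetical toy data: K* are kinases, P* are non-kinase partners.
toy_interactions = [("K1", "P1"), ("P1", "K1"), ("K1", "K2"),
                    ("P1", "P2"), ("K2", "P3")]
nodes, edges = build_kpi(toy_interactions, {"K1", "K2"})
print(len(nodes), "nodes,", len(edges), "interactions")
```

In this toy case the P1-P2 pair is discarded because neither partner is a kinase, and the duplicated K1-P1 pair is counted once.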
We developed a program called Hubview to help us analyze the KPI network and visualize the hubs and hub interactions found in the datasets. The degree distribution of the loaded network can be obtained by pressing the probability distribution button. The main program and OpenGL network interface utilize an undirected binary adjacency matrix, which is then interpreted in real-time 3D. Yeast specific information such as the naming convention (DIP number, ORF and common name) and protein type (kinase or non-kinase) is hard coded into Hubview, minimizing the amount of data required to generate an interaction network. The 3D representation is geared towards identifying nodes with degrees higher than a user-defined cut-off and displaying them in either a hub-star-satellite view, in which hub degree and inter-hub interactions are plainly visible, or a Fruchterman-Reingold (FR) force-directed placement arrangement, which offers a less tangled, more visually appealing interpretation.
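As an illustration of the data structure described above, the sketch below builds an undirected binary adjacency matrix, derives the degree distribution P(k) from it and lists nodes above a degree cut-off. It is not Hubview code; all names and the toy network are hypothetical.

```python
from collections import Counter

def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix for an undirected network."""
    index = {name: i for i, name in enumerate(sorted(nodes))}
    size = len(index)
    matrix = [[0] * size for _ in range(size)]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1
    return index, matrix

def degree_distribution(matrix):
    """Return P(k): the fraction of connected nodes that have degree k."""
    degrees = [sum(row) for row in matrix]
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    return {k: count / total for k, count in sorted(counts.items())}

def find_hubs(index, matrix, cutoff):
    """List nodes whose degree exceeds the user-defined cut-off."""
    return sorted(name for name, i in index.items() if sum(matrix[i]) > cutoff)

toy_nodes = {"A", "B", "C", "D"}
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
index, matrix = adjacency_matrix(toy_nodes, toy_edges)
print(degree_distribution(matrix))
print(find_hubs(index, matrix, cutoff=2))
```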
Briefly, the FR algorithm causes the system to untangle itself through iterative simulation of mechanical and electrostatic forces. A connection between a pair of nodes is treated as though a spring were connecting those nodes creating attractive forces between all connected pairs. Repulsive electrostatic forces are also generated by considering each node as a negative point charge. The nodes in the analogous system move in 3 dimensional space according to the attractive and repulsive forces. The final arrangement is displayed once the system has evolved through a set number of iterations resulting in an intelligible and appealing graph.
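A compact 3D version of this force-directed scheme might look like the sketch below. It follows the usual Fruchterman-Reingold force laws (attraction d²/k along edges, repulsion k²/d between all pairs); the ideal edge length k, the cooling schedule and the default parameters are assumptions chosen for illustration and are not taken from Hubview.

```python
import math
import random

def fruchterman_reingold_3d(n_nodes, edges, iterations=100, size=1.0, seed=0):
    """Return 3D positions after a fixed number of force-directed iterations."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-size, size) for _ in range(3)] for _ in range(n_nodes)]
    k = size / n_nodes ** (1.0 / 3.0)   # assumed ideal edge length in 3D
    temperature = size / 10.0           # caps how far a node moves per step
    for _ in range(iterations):
        disp = [[0.0, 0.0, 0.0] for _ in range(n_nodes)]
        # Repulsive forces between every pair of nodes.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                delta = [pos[i][c] - pos[j][c] for c in range(3)]
                dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
                force = k * k / dist
                for c in range(3):
                    disp[i][c] += delta[c] / dist * force
                    disp[j][c] -= delta[c] / dist * force
        # Attractive spring forces along each connection.
        for i, j in edges:
            delta = [pos[i][c] - pos[j][c] for c in range(3)]
            dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
            force = dist * dist / k
            for c in range(3):
                disp[i][c] -= delta[c] / dist * force
                disp[j][c] += delta[c] / dist * force
        # Move each node along its net force, limited by the temperature.
        for i in range(n_nodes):
            length = max(math.sqrt(sum(d * d for d in disp[i])), 1e-9)
            step = min(length, temperature)
            for c in range(3):
                pos[i][c] += disp[i][c] / length * step
        temperature *= 0.95  # assumed geometric cooling
    return pos

layout = fruchterman_reingold_3d(5, [(0, 1), (0, 2), (0, 3), (3, 4)])
print(len(layout), "nodes placed")
```

The all-pairs repulsion loop scales quadratically with the number of nodes, which is acceptable for networks of roughly a thousand nodes but would need a spatial grid or similar shortcut for much larger ones.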
The hub-star-satellite view is generated by placing all nodes randomly within a sphere of radius r_i. All nodes with connectivity higher than the user defined cutoff are identified as hubs and projected to a radial position, r_f, outside the confines of the initial sphere (r_f > r_i). Any substrates of the new hub with unary degree are also moved to positions spherically centered near the newly placed hub, generating a hub-star-satellite. The algorithm ends once all hubs have been processed in this way. The advantage of this view type is that it allows interactions between hubs to be quickly and easily identified, as all visually interfering substrates remain pooled within the initial sphere.
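The placement steps just described can be sketched as below. The satellite radius r_sat, the default radii and the sampling routines are assumptions introduced for this illustration; the text specifies only that hubs move to a radius outside the initial sphere and that degree-one partners are placed near their hub.

```python
import math
import random

def point_in_sphere(rng, radius, center=(0.0, 0.0, 0.0)):
    """Uniform random point inside a sphere, by rejection sampling."""
    while True:
        p = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in p) <= radius * radius:
            return [p[c] + center[c] for c in range(3)]

def point_on_sphere(rng, radius):
    """Random point on the surface of a sphere centered at the origin."""
    while True:
        p = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in p))
        if norm > 1e-9:
            return [radius * c / norm for c in p]

def distance(p, q=(0.0, 0.0, 0.0)):
    return math.sqrt(sum((p[c] - q[c]) ** 2 for c in range(3)))

def hub_star_satellite(adjacency, cutoff, r_i=1.0, r_f=3.0, r_sat=0.3, seed=0):
    """adjacency maps each node to the set of its interaction partners."""
    rng = random.Random(seed)
    # Step 1: every node starts at a random position inside the inner sphere.
    pos = {node: point_in_sphere(rng, r_i) for node in sorted(adjacency)}
    hubs = sorted(n for n, partners in adjacency.items() if len(partners) > cutoff)
    for hub in hubs:
        # Step 2: project the hub out to radius r_f.
        pos[hub] = point_on_sphere(rng, r_f)
        # Step 3: degree-one partners follow the hub as satellites.
        for partner in sorted(adjacency[hub]):
            if len(adjacency[partner]) == 1:
                pos[partner] = point_in_sphere(rng, r_sat, pos[hub])
    return pos, hubs

toy = {"H": {"a", "b", "c", "d", "X"}, "a": {"H"}, "b": {"H"}, "c": {"H"},
       "d": {"H"}, "X": {"H", "Y"}, "Y": {"X"}}
positions, hubs = hub_star_satellite(toy, cutoff=3)
print("hubs:", hubs)
```

Here X has two partners and therefore stays in the inner sphere, which is what keeps the connections between hubs visually uncluttered.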
Another useful visualization method included in Hubview is the cascade crawler function. This view type is geared towards depiction of smaller cascades (the immediate and remote neighbors of a chosen protein) within the complete network. The cascade crawler function is controlled by a point and click interface whereby the user can define a specific protein(s) as a starting point and display all of its substrates by clicking on it. Clicking subsequent nodes will display their interaction partners in turn. Using this function along with the FR algorithm one can develop appealing visual interpretations of specific cascades and interactions (figure 4).
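The click-to-expand behaviour of the cascade crawler amounts to a stepwise neighbourhood expansion. A minimal sketch, with hypothetical function names and a toy network, is given below; it models only the bookkeeping of which nodes are visible, not the graphical interface.

```python
def click(adjacency, visible, node):
    """Return the visible set after clicking a node: the node plus its partners."""
    return visible | {node} | set(adjacency.get(node, ()))

def crawl(adjacency, start, depth):
    """Expand outward from a starting protein for a fixed number of steps."""
    visible = {start}
    frontier = {start}
    for _ in range(depth):
        shown = set()
        for node in frontier:
            shown |= set(adjacency.get(node, ()))
        frontier = shown - visible   # only newly revealed nodes are expanded next
        visible |= shown
    return visible

toy = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"},
       "D": {"B", "E"}, "E": {"D"}}
view = click(toy, set(), "A")   # user clicks A
view = click(toy, view, "B")    # then clicks B
print(sorted(view))
print(sorted(crawl(toy, "A", 2)))
```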
Hubview also utilizes the clustering method proposed by Samanta & Liang. The main suggestion of this algorithm is that if two proteins in a network share a significantly larger number of common interaction partners than is expected from a similar random network, then the pair of proteins likely share a close functional relationship. This process assigns a P value to every pair of proteins in the network, representing the probability that an association between the proteins is random, i.e. a higher P value means that the pair is less likely to be functionally associated. The algorithm then merges the pair sharing the lowest P value into a cluster and recalculates P values for all possible pairs, treating the newly formed cluster as though it were a single protein. This process repeats until all P values are higher than a user defined cutoff. Once a network is loaded, this method can be accessed by clicking the cluster button. Here a cutoff value can be defined which represents the probability that a particular association is random, and a dendrogram is produced (which can be saved as a .BMP file). Samanta & Liang reported successful clustering of a large portion of the yeast interactome (N = 4692) using a cutoff value of up to 2 × 10^-4, indicating that this cutoff can be considered sharp and biologically relevant in our much smaller KPI networks (N_core = 607 and N_complete = 1085).
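The pairwise scoring step can be illustrated with a hypergeometric null model, in which the P value is the probability that two proteins with n1 and n2 partners in a network of N proteins share at least m partners by chance. Using the tail probability is an assumption made for this sketch; the exact statistic in the published method may differ, and the merge step is not reproduced here.

```python
from itertools import combinations
from math import comb

def shared_partner_p(n_total, n1, n2, m):
    """P(at least m shared partners) under an assumed hypergeometric null."""
    upper = min(n1, n2)
    ways = sum(comb(n1, x) * comb(n_total - n1, n2 - x)
               for x in range(m, upper + 1))
    return ways / comb(n_total, n2)

def rank_pairs(adjacency):
    """Score every pair that shares a partner; lowest P value first."""
    n_total = len(adjacency)
    scored = []
    for a, b in combinations(sorted(adjacency), 2):
        shared = len(adjacency[a] & adjacency[b])
        if shared:
            p = shared_partner_p(n_total, len(adjacency[a]),
                                 len(adjacency[b]), shared)
            scored.append((p, a, b))
    return sorted(scored)

# Hypothetical toy network: K1 and K2 share three partners.
toy = {"K1": {"P1", "P2", "P3"}, "K2": {"P1", "P2", "P3"}, "K3": {"P4", "P5"},
       "P1": {"K1", "K2"}, "P2": {"K1", "K2"}, "P3": {"K1", "K2"},
       "P4": {"K3"}, "P5": {"K3"}}
ranking = rank_pairs(toy)
best_p, first, second = ranking[0]
print(first, second, best_p)
```

In the full procedure the best-scoring pair would be merged and the scores recomputed until no pair falls below the chosen cutoff.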
To counter the distortion associated with log-log data transformation, the γ value associated with the degree distribution of the KPI was estimated by maximum likelihood estimation (MLE) for the zeta distribution, and goodness of fit was confirmed by the Kolmogorov-Smirnov test for power law distributions. Briefly, the γ parameter associated with the pure power law,

$$P(k) = \frac{k^{-\gamma}}{\zeta(\gamma)},$$

is best approximated by the solution of:

$$\frac{\zeta'(\gamma)}{\zeta(\gamma)} = -\frac{1}{n}\sum_{i=1}^{n} \ln k_i,$$

where ζ(γ) is the Riemann zeta function, k_i is the ith non-zero observed degree of the P(k) vs. k distribution, n is the number of such observations and γ is the power law exponent.
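A maximum likelihood fit of this kind can be sketched numerically. The code below minimizes the negative log-likelihood of the zeta distribution, γ·mean(ln k) + ln ζ(γ), whose minimum satisfies the condition ζ'(γ)/ζ(γ) = −mean(ln k), and then reports a Kolmogorov-Smirnov distance between the empirical and fitted cumulative distributions. The zeta approximation, search interval, tolerance and sample degrees are illustrative assumptions rather than the procedure or data of this study.

```python
import math

def zeta(s, cutoff=1000):
    """Riemann zeta for s > 1: direct sum plus an Euler-Maclaurin tail estimate."""
    head = sum(k ** -s for k in range(1, cutoff))
    tail = (cutoff ** (1 - s) / (s - 1) + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

def neg_log_likelihood(gamma, mean_log_k):
    """Per-observation negative log-likelihood of P(k) = k^-gamma / zeta(gamma)."""
    return gamma * mean_log_k + math.log(zeta(gamma))

def fit_gamma(degrees, lo=1.01, hi=6.0, tol=1e-7):
    """Ternary search for the gamma that minimizes the negative log-likelihood."""
    ks = [k for k in degrees if k > 0]
    mean_log_k = sum(math.log(k) for k in ks) / len(ks)
    while hi - lo > tol:
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if neg_log_likelihood(a, mean_log_k) < neg_log_likelihood(b, mean_log_k):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

def ks_distance(degrees, gamma):
    """Largest gap between the empirical CDF and the fitted zeta CDF."""
    ks = sorted(k for k in degrees if k > 0)
    norm = zeta(gamma)
    model_cdf, seen, worst = 0.0, 0, 0.0
    for k in range(1, ks[-1] + 1):
        model_cdf += k ** -gamma / norm
        while seen < len(ks) and ks[seen] == k:
            seen += 1
        worst = max(worst, abs(seen / len(ks) - model_cdf))
    return worst

# Hypothetical degree sample, not data from this study.
sample = [1] * 60 + [2] * 20 + [3] * 10 + [4] * 5 + [10] * 3 + [50] * 2
gamma_hat = fit_gamma(sample)
print("gamma:", round(gamma_hat, 3), "KS distance:", round(ks_distance(sample, gamma_hat), 3))
```

Fitting on the raw degrees in this way avoids the binning and regression bias that a straight-line fit on log-log axes introduces.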
Phenotypic profiles of gene-deletion mutants (nearly 96% of known ORFs) have been systematically constructed and analyzed by a PCR-based gene deletion strategy. A list of essential ORFs has been generated and can be used to predict a lethal protein knockout or disruption phenotype.
The authors would like to thank Drs. Pasan Fernando, Lawrence Puente and Lukasz Salwinski for helpful discussions. This work was supported by grants to L.A.M. from the Canadian Institutes of Health Research (CIHR) and the Heart and Stroke Foundation of Canada. L.A.M. is the Mach-Gaennelsen chair in cardiac research at the Ottawa Health Research Institute.
- Albert R, Jeong H, Barabási AL: Internet: Diameter of the World-Wide Web. Nature 1999, 401(6749):130–131. doi:10.1038/43601
- Barabasi AL, Albert R: Emergence of scaling in random networks. Science 1999, 286(5439):509–512. doi:10.1126/science.286.5439.509
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411(6833):41–42. doi:10.1038/35075138
- Barabasi AL, Albert R, Jeong H: Mean-Field theory for scale-free random networks. Physica A 1999, 272: 173–187. doi:10.1016/S0378-4371(99)00291-5
- Eisenberg E, Levanon EY: Preferential attachment in the protein network evolution. Phys Rev Lett 2003, 91(13):138701. doi:10.1103/PhysRevLett.91.138701
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407(6804):651–654. doi:10.1038/35036627
- Callaway DS, Newman ME, Strogatz SH, Watts DJ: Network robustness and fragility: percolation on random graphs. Phys Rev Lett 2000, 85(25):5468–5471. doi:10.1103/PhysRevLett.85.5468
- Sole RV, Montoya JM: Complexity and fragility in ecological networks. Proc R Soc Lond B Biol Sci 2001, 268(1480):2039–2045. doi:10.1098/rspb.2001.1767
- Shargel B, Sayama H, Epstein IR, Bar-Yam Y: Optimization of robustness and connectivity in complex networks. Phys Rev Lett 2003, 90(6):68701. doi:10.1103/PhysRevLett.90.068701
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 2004, 430(6995):88–93. doi:10.1038/nature02555
- Przulj N, Corneil DG, Jurisica I: Modeling interactome: scale-free or geometric? Bioinformatics 2004, 20(18):3508–3515.
- Middendorf M, Ziv E, Wiggins CH: Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proc Natl Acad Sci U S A 2005.
- Pereira-Leal JB, Audit B, Peregrin-Alvarez JM, Ouzounis CA: An Exponential Core in the Heart of the Yeast Protein Interaction Network. Mol Biol Evol 2004.
- Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S: The protein kinase complement of the human genome. Science 2002, 298(5600):1912–1934. doi:10.1126/science.1075762
- Hunter T: Signaling--2000 and beyond. Cell 2000, 100(1):113–127. doi:10.1016/S0092-8674(00)81688-8
- Holmberg CI, Tran SE, Eriksson JE, Sistonen L: Multisite phosphorylation provides sophisticated regulation of transcription factors. Trends Biochem Sci 2002, 27(12):619–627. doi:10.1016/S0968-0004(02)02207-7
- Hunter T, Plowman GD: The protein kinases of budding yeast: six score and more. Trends Biochem Sci 1997, 22(1):18–22. doi:10.1016/S0968-0004(96)10068-2
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399–403. doi:10.1038/nature750
- Goldstein ML, Morris SA, Yen GG: Problems with fitting to the power-law distribution. Eur Phys J B 2004, 41: 255–258. doi:10.1140/epjb/e2004-00316-5
- Farkas I, Derenyi I, Jeong H, Neda Z, Oltvai ZN, Ravasz E, Schubert A, Barabasi AL, Vicsek T: Networks in life: Scaling properties and eigenvalue spectra. Physica A 2002, 314: 25–34. doi:10.1016/S0378-4371(02)01181-0
- Yook SH, Oltvai ZN, Barabasi AL: Functional and topological characterization of protein interaction networks. Proteomics 2004, 4(4):928–942. doi:10.1002/pmic.200300636
- Ma HW, Zeng AP: The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics 2003, 19(11):1423–1430. doi:10.1093/bioinformatics/btg177
- Yeast Deletion Project [http://www-sequence.stanford.edu/group/yeast_deletion_project/Essential_ORFs.txt]
- Barz T, Ackermann K, Pyerin W: Perturbation of protein kinase CK2 uncouples executive part of phosphate maintenance pathway from cyclin-CDK control. FEBS Lett 2003, 537(1–3):210–214. doi:10.1016/S0014-5793(03)00112-1
- Wuchty S: Evolution and topology in the yeast protein interaction network. Genome Res 2004, 14(7):1310–1314. doi:10.1101/gr.2300204
- Kunin V, Pereira-Leal JB, Ouzounis CA: Functional evolution of the yeast protein interaction network. Mol Biol Evol 2004, 21(7):1171–1176. doi:10.1093/molbev/msh085
- Samanta MP, Liang S: Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci U S A 2003, 100(22):12579–12583. doi:10.1073/pnas.2132527100
- Han JD, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 2005, 23(7):839–844. doi:10.1038/nbt1116
- Database of Interacting Proteins [http://dip.doe-mbi.ucla.edu/]
- Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 2002, 1(5):349–356. doi:10.1074/mcp.M100037-MCP200
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 2004, 32 Database issue: D449–51. doi:10.1093/nar/gkh086
- Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D: DIP: the database of interacting proteins. Nucleic Acids Res 2000, 28(1):289–291. doi:10.1093/nar/28.1.289
- Fruchterman TMJ, Reingold EM: Graph Drawing by Force-directed Placement. Softw Exp Pract 1991, 21(11):1129–1164.
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, Arkin AP, Astromoff A, El-Bakkoury M, Bangham R, Benito R, Brachat S, Campanaro S, Curtiss M, Davis K, Deutschbauer A, Entian KD, Flaherty P, Foury F, Garfinkel DJ, Gerstein M, Gotte D, Guldener U, Hegemann JH, Hempel S, Herman Z, Jaramillo DF, Kelly DE, Kelly SL, Kotter P, LaBonte D, Lamb DC, Lan N, Liang H, Liao H, Liu L, Luo C, Lussier M, Mao R, Menard P, Ooi SL, Revuelta JL, Roberts CJ, Rose M, Ross-Macdonald P, Scherens B, Schimmack G, Shafer B, Shoemaker DD, Sookhai-Mahadeo S, Storms RK, Strathern JN, Valle G, Voet M, Volckaert G, Wang CY, Ward TR, Wilhelmy J, Winzeler EA, Yang Y, Yen G, Youngman E, Yu K, Bussey H, Boeke JD, Snyder M, Philippsen P, Davis RW, Johnston M: Functional profiling of the Saccharomyces cerevisiae genome. Nature 2002, 418(6896):387–391. doi:10.1038/nature00935
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.