Towards the systematic discovery of signal transduction networks using phosphorylation dynamics data
© Imamura et al; licensee BioMed Central Ltd. 2010
Received: 20 November 2009
Accepted: 7 May 2010
Published: 7 May 2010
Phosphorylation is a ubiquitous and fundamental regulatory mechanism that controls signal transduction in living cells. The number of identified phosphoproteins and their phosphosites is rapidly increasing as a result of recent mass spectrometry-based approaches.
We analyzed time-course phosphoproteome data obtained previously by liquid chromatography mass spectrometry with the stable isotope labeling using amino acids in cell culture (SILAC) method. This provides the relative phosphorylation activities of digested peptides at each of five time points after stimulating HeLa cells with epidermal growth factor (EGF). We initially calculated the correlations between the phosphorylation dynamics patterns of every pair of peptides and connected the strongly correlated pairs to construct a network. We found that peptides extracted from the same intracellular fraction (nucleus vs. cytoplasm) tended to be close together within this phosphorylation dynamics-based network. The network was then analyzed using graph theory and compared with five known signal-transduction pathways. The dynamics-based network was correlated with known signaling pathways in the NetPath and Phospho.ELM databases, and especially with the EGF receptor (EGFR) signaling pathway. Although the phosphorylation patterns of many proteins were drastically changed by the EGF stimulation, our results suggest that only EGFR signaling transduction was both strongly activated and precisely controlled.
The construction of a phosphorylation dynamics-based network provides a useful overview of condition-specific intracellular signal transduction using quantitative time-course phosphoproteome data under specific experimental conditions. Detailed prediction of signal transduction based on phosphoproteome dynamics remains challenging. However, since the phosphorylation profiles of kinase-substrate pairs on the specific pathway were localized in the dynamics-based network, our method will be a complementary strategy to explore new components of protein signaling pathways in combination with previous methods (including software) of predicting direct kinase-substrate relationships.
Post-translational modification (PTM) of proteins regulates many biological phenomena . Among the several kinds of PTM, phosphorylation affects enzymatic activity, conformations, interactions, degradation, and localization of proteins, among other effects [2–4]; one of the critical roles of phosphorylation is in the control of protein signaling . More than 500 protein kinases are thought to regulate protein signaling in humans . In protein signaling, various reaction cascades transmit and amplify signals in a highly regulated manner by means of reversible site-specific protein phosphorylation . Kinases recognize the specific surrounding sequences of phosphosites when they phosphorylate their targets, and the majority of the identified kinases are thought to have their own unique target sequences, which are known as "motif sequences" .
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), combined with phosphopeptide enrichment technology , is a powerful method for identifying large numbers of phosphosites . This technology has achieved phosphorylation analysis at the proteome level and has greatly expanded the new field of phosphoproteomics. In recent years, phosphoproteome data has rapidly increased for various organisms, including humans [10–12], mice , yeast [14–16], flies [17, 18], plants , and bacteria [20, 21], as a result of system-wide experiments to investigate the behavior of signal transduction pathways under various stimuli and environmental conditions. For example, at least one-third of all human proteins have been revealed to be phosphorylated . The phosphoproteins and their phosphosites identified in these studies have been stored in public databases, such as Phospho.ELM , PHOSIDA , PhosphoSitePlus , and UniProt .
"Phosphoinformatics" approaches (i.e., bioinformatics of the phosphoproteome) have derived many useful biological interpretations from the huge and complex body of phosphoproteome data and have aided in the discovery of novel biological principles of protein signaling [27, 28]. Understanding of the relationship between protein kinases and the specific sequence patterns of their phosphorylation targets has increased rapidly and considerable data has accumulated as a result of recent phosphoproteome data mining studies [29, 30] or the use of predictive software such as Scansite  and NetPhos . Many kinases and their corresponding substrate recognition motifs have been accumulated in databases, such as the Human Protein Reference Database (HPRD) , PhosphoSitePlus , and NetPhorest .
The relationships between enzymes and their substrate motifs are useful for the discovery and reconstruction of signal transduction networks in living systems. Several computational approaches, including NetworKIN , have been developed to predict signaling pathways based on the unique phosphorylation target motifs of kinases and other -omics datasets (e.g., protein-protein interaction data and functional relationship data for genes). However, the kinase specificity for target motifs appears to be limited, and therefore the accurate prediction of enzyme-substrate pairs based on the documented motifs of the kinase targets remains difficult. Although the current LC-MS/MS-based phosphoproteome approach excels at the identification of phosphorylated components of the proteome, the identification of kinase-substrate relationships is fraught with challenges .
On the other hand, by taking advantage of protein labeling methods, the high-throughput LC-MS/MS-based proteomics approaches have enabled us to compare different states of intracellular proteomes, and several studies have revealed the time course of phosphoproteome behaviors [11, 37, 38]. Many phosphoproteins have been shown to have multiple potential phosphosites , and the phosphosites in each protein behaved in different ways in response to different external stimuli. The time-course profile of phosphorylation states at individual phosphosites across multiple cellular states can be used to capture intracellular signal transitions, for example, where signals are propagated from a receptor on the cell membrane to downstream pathways . Recently, several enzyme-substrate prediction approaches have started to use such time-course phosphoproteome data to solve the challenge of accurately predicting signal transduction pathways . Since the phosphorylation profiles of substrates are affected by their corresponding kinases, most of whose enzymatic activities are enhanced by phosphorylation, it has been generally assumed that in many cases, a kinase and its substrate have similar phosphorylation profiles. This concept of a "projection effect" of signaling proteins seems to work well to predict direct relationships between a kinase and its substrates in signaling pathways based on quantitative phosphoproteome data. For example, Jørgensen and colleagues  have reported the combination of quantitative phosphoproteome data with computational prediction of the signaling pathway utilizing NetworKIN and NetPhorest, and Locasale and Wolf-Yadlin  have reported a new approach for the prediction of a tyrosine signaling pathway in which they used the principle of maximum entropy to represent similarities in the phosphorylation profile as a network construct and to predict specific pathways.
In this paper, we propose a phosphorylation dynamics-based network approach that roughly clusters the proteins localized within a condition-specific signaling pathway. Although the maximum entropy-based approach  predicts pair-wise connections of proteins in tyrosine signaling, our proposed approach clusters all types of phosphoproteins (pS, pT, and pY proteins) in a network graph and suggests proteins that are localized within a specific pathway activated by a given experimental condition under which the dynamics of the phosphoproteome are measured. In this study, we used the time-course phosphoproteome data for the epidermal growth factor (EGF) stimulation of HeLa cells previously obtained using LC-MS/MS based on the stable isotope labeling using amino acids in cell culture (SILAC) approach . This phosphoproteome dynamics data consists of the phosphorylation activities of digested phosphopeptides at five time points after the EGF stimulation. Using this data, we constructed a dynamics-based network by connecting pairs of phosphopeptides that exhibited similar patterns of phosphorylation dynamics. According to several analyses based on graph theory and comparisons with known signaling pathways registered in public databases, we demonstrated that the phosphopeptides corresponding to proteins that participated in a signaling pathway were clustered within the dynamics-based network. Moreover, we found that the dynamics-based network for EGF stimulation was more strongly correlated with the known EGF receptor (EGFR) signaling pathway than with other known pathways. This suggests the biological specificity of the EGFR signaling pathway. Although the time-course data show that many phosphorylation sites, including those that are not thought to be involved in EGFR signaling, seemed to be activated by the EGF stimulation, only the known EGFR signaling pathway was markedly correlated with the time-course data at the pathway level. The EGFR signaling pathway thus seemed to be precisely controlled to avoid activation of unrelated pathways. The phosphorylation dynamics-based network approach will therefore prove to be a useful strategy for providing an overview of and for exploring condition-specific protein signaling; although it cannot replace direct prediction methods, it can help to improve their accuracy.
Results and Discussion
Construction of the phosphorylation dynamics-based network
The projection effect exists in protein signaling: some phosphorylation reactions are known to transfer signals from one molecule to another (i.e., project their effects onto the other molecule), and these reactions occur continuously (e.g., when kinase A phosphorylates protein B, then protein B is activated and starts to behave as a kinase; protein B, in turn, phosphorylates its target substrates). Because most kinases are thought to gain their enzymatic activities as a result of phosphorylation, these signaling proteins and their targeted substrates should have similar time courses for their phosphorylation behaviors. In this context, we proposed using a phosphorylation dynamics-based network approach to cluster proteins localized within a signaling pathway.
In this study, we used the time-course phosphoproteome data previously reported by Olsen et al. . This data provides the relative phosphorylation activities of digested peptides at each of five time points (0, 1, 5, 10, and 20 min) after the EGF stimulation (for details, see the Methods section). We extracted 1,050 peptides for which the relative phosphorylation activities were available at each of the five time points. Using this time-course data, we constructed dynamics-based networks by connecting all pairs of phosphopeptides with similar phosphorylation activity profiles. We calculated the similarity of every possible pair of phosphorylation profiles using Pearson's correlation coefficient (R). The dynamics-based network was then constructed to connect all phosphopeptide pairs with R values greater than a given threshold.
EGFR signaling pathways are more abundant in the cytoplasm than in the nucleus
To determine whether the phosphorylation dynamics-based network reflected the biological properties of a cell, we analyzed the network features using graph theory. First, we separated the dynamics-based network with R > 0.99 into cytoplasmic and nucleic subnetworks (for details, see the Methods section). The network densities of the cytoplasmic and nucleic networks were 2.25% and 1.52%, respectively (Figure 1B). The higher density in the cytoplasmic subnetwork indicates that in the cytoplasm, more peptides have similar phosphorylation patterns than in the nucleus, which in turn suggests that the EGF stimulation triggers more phosphorylation reactions in the cytoplasm than in the nucleus.
Consistent with this observation, the overall degrees of the cytoplasmic nodes were higher than those of the nucleic nodes in the dynamics-based network with R > 0.99 (Figure 1C), although fewer phosphopeptides were identified in the cytoplasm than in nucleus, as described previously (in the section "Construction of the phosphorylation dynamics-based network"). We obtained similar results even if we calculated node degrees separately in subnetworks that consisted only of nucleic nodes and only of cytoplasmic nodes (Figure 1D). These results may reflect the fact that in the larger cellular space of the cytoplasm, many long and alternative pathways can be activated to amplify and transmit signals into the nucleus, where the mandatory signals are transmitted through shorter, more highly regulated nuclear pathways to precisely control gene expression.
Time course of the phosphoproteome reflects the cellular fraction
The network density for the whole dynamics-based network with R > 0.99 (1.36%) was sparser than the densities of the cytoplasmic and nucleic subnetworks (Figure 1B). As illustrated in Figure 1A, this suggests that in the dynamics-based network, phosphopeptides from the same cellular fraction (i.e., the cytoplasm or nucleus) are tightly connected, whereas connections are less likely between the cytoplasmic and nucleic nodes.
We analyzed the patterns of several local network motifs that appeared in the dynamics-based network, where each node was either nucleic or cytoplasmic (Figure 1E, I, and 1K). The triangular motif (in which the three nodes are connected together by three edges) in a biological network is the smallest clump of nodes that reveals the connection patterns of nodes in a locally dense network, and is thought to be a basic motif that reflects biologically relevant phenomena . As Figure 1E shows, there are four possible triangular motifs (T) in the dynamics-based network, which consist of various combinations of nucleic and cytoplasmic nodes: all nodes are cytoplasmic (T3:0), two are cytoplasmic and one is nucleic (T2:1), one is cytoplasmic and two are nucleic (T1:2), and all nodes are nucleic (T0:3). In each of the dynamics-based networks (with R > 0.97, 0.98, and 0.99), we calculated the proportion of each of the four triangular motifs and compared it with the expected results generated by random edge rewiring (RER, used as the negative control; n = 1,000; for details, see the Methods section). In the dynamics-based network with R > 0.99, the proportions of T3:0 and T0:3 were 16.50% and 14.75%, respectively, which were significantly higher than the expected negative control values of 12.24 ± 1.74% (s.d.) and 11.77 ± 1.12% in the random networks (P < 0.001 and 0.01, respectively; Figure 1F). Conversely, the proportions of T2:1 and T1:2 were 35.40% and 33.34%, which were significantly lower than the corresponding control values of 38.30 ± 1.09% and 37.69 ± 1.70% (P < 0.05 and 0.001, respectively; Figure 1F). The dynamics-based networks generated with thresholds of R > 0.97 and 0.98 showed similar patterns (Figure 1G, H). Consequently, the triangular motif analysis supports the hypothesis that phosphopeptides identified in the same cellular fraction were localized (clustered) in the dynamics-based network. We also analyzed the number of binary (Figure 1I) and square (Figure 1K) motifs in the dynamics-based network with R > 0.99, and observed similar patterns (Figure 1J, L).
The connectivity of phosphopeptides within a given cellular fraction suggests that the phosphorylation dynamics-based network is a valid approach for the reconstruction of the entire protein-based signal pathway, because in most cases, signal transmission occurs between proteins that lie close together. We expected that the validity of our approach would depend partly on the threshold R. To test this hypothesis, we defined a phosphopeptide localization score for the dynamics-based network as the sum of the proportions T3:0 and T0:3, and calculated the O/E value for the localization score (where O is the observed localization score in the real network and E is the expected average score in the RER network with n = 1,000). The resulting average O/E values with R > 0.99, 0.98 and 0.97 were 1.30, 1.17, and 1.12, respectively. Therefore, the higher R threshold generates a dynamics-based network that more accurately reflects the actual intracellular signaling network.
The EGF stimulation activates many signaling proteins
Proteins with a degree ≥ 30 in the dynamics-based network.
phosphatidylinositol biosynthetic process; receptor-mediated endocytosis; phosphoinositide phosphorylation; phosphoinositide-mediated signaling
regulation of cell shape; signal transduction
epithelial to mesenchymal transition; regulation of transcription from RNA polymerase II promoter; positive regulation of gene-specific transcription
cell surface receptor linked signal transduction; chromatin modification; regulation of transcription from RNA polymerase II promoter; regulation of growth
insulin receptor signaling pathway
regulation of transcription, DNA-dependent
intracellular protein transport; protein secretion; vesicle-mediated transport; regulation of ARF GTPase activity
double-strand break repair via homologous recombination
regulation of transcription, DNA-dependent
regulation of dendrite development; actin filament organization; regulation of neuronal synaptic plasticity
multicellular organismal development; transcription
regulation of transcription, DNA-dependent
nuclear mRNA splicing, via spliceosome
DNA repair; cell cycle
multicellular organismal development; negative regulation of transcription from RNA polymerase II promoter
protein import into nucleus; transcription
multicellular organismal development; transcription
protein transport; cell communication
regulation of Rab GTPase activity
retrograde vesicle-mediated transport, Golgi to ER; COPI coating of Golgi vesicle; regulation of ARF protein signal transduction
oxidation reduction; chromatin modification; regulation of transcription, DNA-dependent
activation of JUN kinase activity
regulation of transcription, DNA-dependent; apoptosis
protein amino acid phosphorylation; protein kinase cascade; ion transport
interspecies interaction between organisms; regulation of transcription, DNA-dependent
cell proliferation; cell cycle
regulation of transcription, DNA-dependent; chromatin remodeling
transcription initiation from RNA polymerase II promoter; androgen receptor signaling pathway; regulation of transcription, DNA-dependent
RNA splicing, via transesterification reactions; nuclear mRNA splicing, via spliceosome
The phosphorylation dynamics-based network reflects known signaling pathways
To evaluate whether the dynamics-based network reflects actual signaling phenomena at the pathway level, we compared the network with R > 0.99 with known signal transduction pathways registered in public databases.
For this comparison, we used the shortest path length (SPL; for details, see the Methods section) to evaluate the closeness of pairs of nodes in the network and the pathways. In the dynamics-based network, each edge between two nodes denotes a similarity in the time course of the phosphorylation profiles, and in the known signaling pathways, each edge that connects an enzymatic protein and its substrate represents a phosphorylation reaction. Although the edges in the known signaling pathways are directional, we treated them as nondirectional in this study because we only needed to investigate whether the phosphorylation dynamics patterns of two phosphosites were strongly correlated when their corresponding proteins were located close together in the known pathway, regardless of the reaction direction.
The data size for the EGFR (NetPath) pathway was limited. To quantitatively measure the correlation between the dynamics-based network and the known pathways with a larger dataset, we prepared two other datasets of known signaling pathways from NetPath: All (NetPath), consisting of a total of 296 phosphorylation reactions in the database, and All - EGFR (NetPath), generated by eliminating all the EGFR (NetPath) reactions from the All (NetPath) pathways. Although the SPLs of the All (NetPath) group were significantly correlated with those of the dynamics-based network (Figure 2D), the All - EGFR (NetPath) had no correlation with the dynamics-based network (Figure 2E). The average SPLs of the phosphopeptides in the dynamics-based networks that corresponded to protein pairs in the known datasets of EGFR (NetPath), All (NetPath), and All - EGFR (NetPath) were 4.50, 5.71, and 6.80, respectively (Figure 2C-E). Moreover, those SPLs which corresponded to neighboring protein pairs (i.e., SPL = 1) in the same datasets were 2.75, 5.20, and 15.00, respectively (Figure 2C-E). Thus, two proteins that participated in the EGFR signaling pathway were also shown to be close in the dynamics-based network for the EGF stimulation, and those localized in the non-EGFR signaling pathway were shown to be more distant. The correlation between the dynamics-based network and the All (NetPath) pathway might have resulted from the indirect effect of the proteins belongs to the EGFR signaling pathways in the All (NetPath) group.
We further compared the dynamics-based network with two known signaling pathway networks generated from Phospho.ELM : All (Phospho.ELM), consisting of a total of 1,140 kinase-substrate reactions, and Kinases (Phospho.ELM), generated by selecting 201 protein pairs in which both were annotated as protein kinases from the All (Phospho.ELM) dataset. Since Phospho.ELM covers phosphorylation reactions that were identified by means of large-scale, high-throughput measurements, and might include reactions that are not meaningful in living cells, the Kinases (Phospho.ELM) pathway was the more biologically relevant dataset. We confirmed that the dynamics-based network for the EGF stimulation was more similar to the Kinases (Phospho.ELM) group than to the All (Phospho.ELM) (Figure 2F and 2G); for example, the average SPLs in the dynamics-based network corresponding to neighboring protein pairs with SPL = 1 in the respective datasets of Kinases (Phospho.ELM) and All (Phospho.ELM) were 3.33 and 5.50, respectively. These results suggest that the dynamics-based network constructed according to the time-course data for the EGF stimulation reflects the actual intracellular EGFR signaling pathways, and that our approach can be used to understand proteins localized in other condition-specific signaling pathways.
The EGFR signal is precisely transmitted to its downstream components under the control of whole-proteome pathways
As discussed above, given the projection effect of protein signaling, the fact that many signaling proteins unrelated to EGFR signaling appear as hubs in the dynamics-based network (Table 1, Additional File 4) indicates that these unrelated signaling proteins may be activated by the EGF stimulation. Many signaling proteins seem to be indiscriminately activated, and this would lead to side-effects that transmit and amplify inappropriate signals within the cell. However, we found that the dynamics-based network was correlated specifically with the EGFR signaling pathway (Figure 2C-E). In particular, the average SPL for the phosphopeptide pairs in the dynamics-based network that corresponded to neighboring protein pairs in the EGFR signaling pathway was 2.75, whereas that in the non-EGFR signaling pathway was 15.00 (Figure 2C and 2E). The adjacent protein pairs in the non-EGFR signaling were therefore farther apart in the dynamics-based network for the EGF stimulation. These results suggest that at the pathway level, the EGFR signaling system is precisely activated to transmit and amplify signals along the appropriate intracellular pathway and to not transmit signals along inappropriate pathways.
The dynamics-based network generated using the time-course phosphoproteome data for the EGF stimulation was clearly correlated with the known EGFR signaling pathways. Although recently developed computational methods predict direct signaling relationships between enzyme proteins and their substrates based on substrate-recognition motifs, our network construction approach is useful because it groups pairs of proteins localized within a signaling pathway based on the similarity of their phosphorylation dynamics data. The dynamics-based network, in turn, will complement the results of enzyme-substrate predictions based on the enzyme-specific target sequence motif (e.g., NetworKIN, Scansite, and NetPhos) and will therefore increase their prediction accuracy. The new approach will also serve as a guide to explore enzyme targets even when the enzyme's motif information is unavailable. Furthermore, the dynamics-based network for the EGF stimulation allowed us to infer that EGFR signaling is independently activated and precisely controlled at the whole-pathway level under conditions where many unrelated enzymatic proteins are activated by the EGF stimulation. In the near future, when we obtain more condition-specific time-course phosphoproteome data with higher resolution, the phosphorylation dynamics-based network approach will improve both our general understanding of whole cellular signaling pathway and our understanding of conditionally activated signaling pathways; it will not replace recently developed prediction approaches for direct enzyme-substrate relationships, but will improve their accuracy.
Time-course phosphoproteome data
We used the time-course phosphoproteome data of Olsen et al. in this study. In this dataset, the relative abundances of cytoplasmic and nucleic phosphopeptides of HeLa cells were measured at five different time points (0, 1, 5, 10, and 20 min) after EGF stimulation using SILAC-based LC-MS/MS analysis. In this dataset, each phosphopeptide had a unique identifier along with the International Protein Index (IPI)  accession numbers of the corresponding protein, the relative abundances at each of the five time points, and the cellular fraction in which each was isolated (cytoplasm or nucleus). Note that throughout this study, we defined a given phosphopeptide that was obtained from both cellular fractions (nucleus and cytoplasm) as two different phosphopeptides. We selected 1,050 phosphopeptides whose relative abundances were completely measured at all five time points for additional analysis. Among the 1,050 phosphopeptides, 459 were cytoplasmic and 591 were nucleic. Since some IPI accession numbers in Olsen et al. were outdated, we updated these numbers according to a newer version of the human IPI data (version 3.58).
Phosphorylation dynamics-based network
Using the time-course phosphoproteome data, we generated a phosphorylation dynamics-based network by connecting pairs of phosphopeptide nodes with similar time-course abundance profiles; the degree of similarity was defined using Pearson's correlation coefficient (R). Among the time-course phosphoproteome data, R values for all possible pairs of phosphopeptides were calculated using the R statistical software (version 2.8.0) , and we generated a phosphorylation dynamics-based network by applying a given threshold value of R. In this study, we used three R thresholds (0.97, 0.98, and 0.99) to generate three corresponding dynamics-based networks.
Subnetworks of the dynamics-based network
The phosphorylation dynamics-based network with R > 0.99 was separated into two subnetworks based on the two cellular fractions: a cytoplasmic subnetwork and a nucleic subnetwork. For example, the cytoplasmic subnetwork was composed of cytoplasmic phosphopeptide nodes and edges connecting pairs of cytoplasmic nodes.
Node degree and network density
where E and N are the numbers of edges and nodes, respectively.
Negative control of the dynamics-based network
We used random edge rewiring (RER) to prepare negative controls for the dynamics-based network. RER randomly selects two edges within a given network and randomly rewires (connects) them; it repeats this operation a sufficient number of times until all pair-wise interactions in the queried network have disappeared or until the number of iterations reaches 100 times the number of interactions. During this process, each rewiring operation is retried if a pair of nodes redundantly wired at two edges occurs in the network. RER does not change the degree of each node in a given network.
Network motifs in the dynamics-based network
We observed the cellular fraction patterns (cytoplasmic or nucleic) of the binary, triangular, and square motifs (Figure 1) revealed in the dynamics-based network. For the dynamics-based networks with R > 0.97, 0.98, and 0.99, we counted the numbers of these motifs and compared the results with those estimated using the negative controls.
Gene ontology annotations of human proteins were obtained from Swiss-Prot (version 57.1) using the Swiss-Prot IDs  (see below).
Known signaling and kinase-substrate reaction datasets
To validate whether the dynamics-based network reflected actual intracellular phosphorylation reactions, we compared the dynamics-based network for R > 0.99 with datasets [23, 41] for the following five types of known signaling pathways: From NetPath, we used all the available phosphorylation reactions, only the phosphorylation reactions related to EGFR signaling (the "EGFR1 signaling pathway"), and all non-EGFR signaling reactions to create the known-pathway datasets "All (NetPath)", "EGFR (NetPath)", and "All - EGFR (NetPath)", respectively. From Phospho.ELM (version 8.2), we obtained all available kinase-substrate reactions, and denoted these as the "All (Phospho.ELM)" dataset. Among these pathways, we generated a subset composed only of kinase proteins, denoted the "Kinases (Phospho.ELM)" dataset, based on the kinase information registered in the Human Protein Initiative (Release 57.13) .
Identifier standardization using the Swiss-Prot ID
All protein and peptide identifiers in the multiple datasets used in this study were standardized into Swiss-Prot ID numbers to permit data integration and comparison. Each peptide identifier in the dynamics-based network was assigned to its corresponding Swiss-Prot ID according to the identifier cross-reference list in the IPI database (version 3.58), the Swiss-Prot database (version 57.1), or both. (Note that in some cases, multiple peptides corresponded to a single Swiss-Prot ID.) UniProt accessions of proteins in the documented NetPath and Phospho.ELM phosphorylation reactions were standardized to their corresponding Swiss-Prot IDs according to the cross-reference lists in Swiss-Prot (version 57.1) and UniProt (Release 12), respectively. Since the protein accessions in Phospho.ELM are based on UniProt (Release 12.3), we used the cross-references in UniProt (Release 12), which was the nearest available version, for the Phospho.ELM data.
Shortest path length (SPL)
We defined the closeness of two nodes in a network using the shortest path length (SPL), which represents the minimum number of steps (edges) between two nodes. The SPL of every possible and reachable pair in a given network was calculated using the Dijkstra algorithm provided by version 1.4 of the Boost::Graph package, which was obtained from the Comprehensive Perl Archive Network (http://www.cpan.org/).
We determined the statistical significance of the differences between a single real value and the expected value from a group of repeatedly generated random values by calculating the proportion of random values that were greater than or equal to the real value (or less, depend on the instances).
This work was supported by research funds from Yamagata Prefecture and Tsuruoka City to Keio University and grants from the Japan Society for the Promotion of Science (JSPS) to NY.
- Jensen ON: Interpreting the protein language using proteomics. Nat Rev Mol Cell Biol 2006, 7(6):391–403. 10.1038/nrm1939View ArticlePubMedGoogle Scholar
- Cohen P: The regulation of protein function by multisite phosphorylation--a 25 year update. Trends Biochem Sci 2000, 25(12):596–601. 10.1016/S0968-0004(00)01712-6View ArticlePubMedGoogle Scholar
- Schlessinger J, Lemmon MA: SH2 and PTB domains in tyrosine kinase signaling. Sci STKE 2003, 2003(191):RE12. 10.1126/stke.2003.191.re12PubMedGoogle Scholar
- Ang XL, Wade Harper J: SCF-mediated protein degradation and cell cycle control. Oncogene 2005, 24(17):2860–2870. 10.1038/sj.onc.1208614View ArticlePubMedGoogle Scholar
- Pawson T, Scott JD: Protein phosphorylation in signaling--50 years and counting. Trends Biochem Sci 2005, 30(6):286–290. 10.1016/j.tibs.2005.04.013View ArticlePubMedGoogle Scholar
- Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S: The protein kinase complement of the human genome. Science 2002, 298(5600):1912–1934. 10.1126/science.1075762View ArticlePubMedGoogle Scholar
- Kobe B, Kampmann T, Forwood JK, Listwan P, Brinkworth RI: Substrate specificity of protein kinases and computational prediction of substrates. Biochim Biophys Acta 2005, 1754(1–2):200–209.View ArticlePubMedGoogle Scholar
- Rappsilber J, Mann M, Ishihama Y: Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2007, 2(8):1896–1906. 10.1038/nprot.2007.261View ArticlePubMedGoogle Scholar
- Collins MO, Yu L, Choudhary JS: Analysis of protein phosphorylation on a proteome-scale. Proteomics 2007, 7(16):2751–2768. 10.1002/pmic.200700145View ArticlePubMedGoogle Scholar
- Beausoleil SA, Jedrychowski M, Schwartz D, Elias JE, Villén J, Li J, Cohn MA, Cantley LC, Gygi SP: Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc Natl Acad Sci USA 2004, 101(33):12130–12135. 10.1073/pnas.0404720101View ArticlePubMedPubMed CentralGoogle Scholar
- Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M: Global, in vivo , and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127(3):635–648. 10.1016/j.cell.2006.09.026View ArticlePubMedGoogle Scholar
- Olsen JV, Vermeulen M, Santamaria A, Kumar C, Miller ML, Jensen LJ, Gnad F, Cox J, Jensen TS, Nigg EA, et al.: Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Science signaling 2010, 3(104):ra3. 10.1126/scisignal.2000475View ArticlePubMedGoogle Scholar
- Villén J, Beausoleil SA, Gerber SA, Gygi SP: Large-scale phosphorylation analysis of mouse liver. Proc Natl Acad Sci USA 2007, 104(5):1488–1493. 10.1073/pnas.0609836104View ArticlePubMedPubMed CentralGoogle Scholar
- Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JEP, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF: Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proc Natl Acad Sci USA 2007, 104(7):2193–2198. 10.1073/pnas.0607084104View ArticlePubMedPubMed CentralGoogle Scholar
- Gruhler A, Olsen JV, Mohammed S, Mortensen P, Faergeman NJ, Mann M, Jensen ON: Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol Cell Proteomics 2005, 4(3):310–327. 10.1074/mcp.M400219-MCP200View ArticlePubMedGoogle Scholar
- Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, et al.: Global analysis of protein phosphorylation in yeast. Nature 2005, 438(7068):679–684. 10.1038/nature04187View ArticlePubMedGoogle Scholar
- Hilger M, Bonaldi T, Gnad F, Mann M: Systems-wide analysis of a phosphatase knock-down by quantitative proteomics and phosphoproteomics. Mol Cell Proteomics 2009, 8(8):1908–1920. 10.1074/mcp.M800559-MCP200View ArticlePubMedPubMed CentralGoogle Scholar
- Zhai B, Villén J, Beausoleil SA, Mintseris J, Gygi SP: Phosphoproteome analysis of Drosophila melanogaster embryos. J Proteome Res 2008, 7(4):1675–1682. 10.1021/pr700696aView ArticlePubMedPubMed CentralGoogle Scholar
- Sugiyama N, Nakagami H, Mochida K, Daudi A, Tomita M, Shirasu K, Ishihama Y: Large-scale phosphorylation mapping reveals the extent of tyrosine phosphorylation in Arabidopsis. Mol Syst Biol 2008, 4: 193. 10.1038/msb.2008.32View ArticlePubMedPubMed CentralGoogle Scholar
- Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M: Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol Cell Proteomics 2008, 7(2):299–307.View ArticlePubMedGoogle Scholar
- Ravichandran A, Sugiyama N, Tomita M, Swarup S, Ishihama Y: Ser/Thr/Tyr phosphoproteome analysis of pathogenic and non-pathogenic Pseudomonas species. Proteomics 2009, 9(10):2764–2775. 10.1002/pmic.200800655View ArticlePubMedGoogle Scholar
- Cohen P: The role of protein phosphorylation in human health and disease. The Sir Hans Krebs Medal Lecture. Eur J Biochem 2001, 268(19):5001–5010. 10.1046/j.0014-2956.2001.02473.xView ArticlePubMedGoogle Scholar
- Diella F, Gould CM, Chica C, Via A, Gibson TJ: Phospho.ELM: a database of phosphorylation sites--update 2008. Nucleic Acids Res 2008, (36 Database):D240–244.Google Scholar
- Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol 2007, 8(11):R250. 10.1186/gb-2007-8-11-r250View ArticlePubMedPubMed CentralGoogle Scholar
- The UniProt Consortium: The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res 2010, (38 Database):D142–148. 10.1093/nar/gkp846Google Scholar
- Jørgensen C, Sherman A, Chen GI, Pasculescu A, Poliakov A, Hsiung M, Larsen B, Wilkinson DG, Linding R, Pawson T: Cell-specific information processing in segregating populations of Eph receptor ephrin-expressing cells. Science 2009, 326(5959):1502–1509. 10.1126/science.1176615View ArticlePubMedGoogle Scholar
- Yachie N, Saito R, Sugahara J, Tomita M, Ishihama Y: In silico analysis of phosphoproteome data suggests a rich-get-richer process of phosphosite accumulation over evolution. Mol Cell Proteomics 2009, 8(5):1061–1071. 10.1074/mcp.M800466-MCP200View ArticlePubMedPubMed CentralGoogle Scholar
- Schwartz D, Gygi SP: An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol 2005, 23(11):1391–1398. 10.1038/nbt1146View ArticlePubMedGoogle Scholar
- Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC: A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nat Biotechnol 2001, 19(4):348–353. 10.1038/86737View ArticlePubMedGoogle Scholar
- Obenauer JC, Cantley LC, Yaffe MB: Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 2003, 31(13):3635–3641. 10.1093/nar/gkg584View ArticlePubMedPubMed CentralGoogle Scholar
- Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 1999, 294(5):1351–1362. 10.1006/jmbi.1999.3310View ArticlePubMedGoogle Scholar
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al.: Human Protein Reference Database--2009 update. Nucleic Acids Res 2009, (37 Database):D767–772. 10.1093/nar/gkn892
- Miller ML, Jensen LJ, Diella F, Jørgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, et al.: Linear motif atlas for phosphorylation-dependent signaling. Science signaling 2008, 1(35):ra2. 10.1126/scisignal.1159433View ArticlePubMedGoogle Scholar
- Linding R, Jensen LJ, Ostheimer GJ, van Vugt MATM, Jørgensen C, Miron IM, Diella F, Colwill K, Taylor L, Elder K, et al.: Systematic discovery of in vivo phosphorylation networks. Cell 2007, 129(7):1415–1426. 10.1016/j.cell.2007.05.052View ArticlePubMedPubMed CentralGoogle Scholar
- Jørgensen C, Linding R: Directional and quantitative phosphorylation networks. Briefings in functional genomics & proteomics 2008, 7(1):17–26. 10.1093/bfgp/eln001View ArticleGoogle Scholar
- Van Hoof D, Muñoz J, Braam SR, Pinkse MWH, Linding R, Heck AJR, Mummery CL, Krijgsveld J: Phosphorylation dynamics during early differentiation of human embryonic stem cells. Cell Stem Cell 2009, 5(2):214–226. 10.1016/j.stem.2009.05.021View ArticlePubMedGoogle Scholar
- Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, White FM: Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics 2005, 4(9):1240–1250. 10.1074/mcp.M500089-MCP200View ArticlePubMedGoogle Scholar
- Huang PH, White FM: Phosphoproteomics: unraveling the signaling web. Mol Cell 2008, 31(6):777–781. 10.1016/j.molcel.2008.09.001View ArticlePubMedPubMed CentralGoogle Scholar
- Locasale JW, Wolf-Yadlin A: Maximum entropy reconstructions of dynamic signaling networks from quantitative proteomics data. PLoS ONE 2009, 4(8):e6522. 10.1371/journal.pone.0006522View ArticlePubMedPubMed CentralGoogle Scholar
- Kandasamy K, Mohan S, Raju R, Keerthikumar S, Kumar GSS, Venugopal AK, Telikicherla D, Navarro DJ, Mathivanan S, Pecquet C, et al.: NetPath: a public resource of curated signal transduction pathways. Genome Biol 2010, 11(1):R3. 10.1186/gb-2010-11-1-r3View ArticlePubMedPubMed CentralGoogle Scholar
- Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R: The International Protein Index: an integrated database for proteomics experiments. Proteomics 2004, 4(7):1985–1988. 10.1002/pmic.200300721View ArticlePubMedGoogle Scholar
- R: A Language and Environment for Statistical Computing[http://www.R-project.org]
- O'Donovan C, Apweiler R, Bairoch A: The human proteomics initiative (HPI). Trends Biotechnol 2001, 19(5):178–181. 10.1016/S0167-7799(01)01598-0View ArticlePubMedGoogle Scholar
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504. 10.1101/gr.1239303View ArticlePubMedPubMed CentralGoogle Scholar
- Negishi Y, Nakamura H, Yachie N, Saito R, Tomita M: eXpanda: an integrated platform for network analysis and visualization. In Silico Biol (Gedrukt) 2007, 7(2):141–144.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.