Protein kinases associated with the yeast phosphoproteome

Background: Protein phosphorylation is an extremely important mechanism of cellular regulation. A large-scale study of phosphoproteins in a whole-cell lysate of Saccharomyces cerevisiae has previously identified 383 phosphorylation sites in 216 peptide sequences. However, the protein kinases responsible for the phosphorylation of the identified proteins have not previously been assigned.

Results: We used Predikin, in combination with other bioinformatic tools, to predict which of 116 unique protein kinases in yeast phosphorylates each experimentally determined site in the phosphoproteome. The prediction was based on the match between the phosphorylated 7-residue sequence and the predicted substrate specificity of each kinase, with the highest weight applied to the residues or positions that contribute most to the substrate specificity. We estimated the reliability of the predictions by performing a parallel prediction on phosphopeptides for which the kinase has been experimentally determined.

Conclusion: The results reveal that the functions of the protein kinases and their predicted phosphoprotein substrates are often correlated, for example in endocytosis, cytokinesis, transcription, replication, carbohydrate metabolism and stress response. The predictions link phosphoproteins of unknown function with protein kinases with known functions and vice versa, suggesting functions for the uncharacterized proteins. The study indicates that the phosphoproteins and the associated protein kinases represented in our dataset have housekeeping cellular roles; certain kinases are not represented because they may only be activated during specific cellular responses. Our results demonstrate the utility of our previously reported protein kinase substrate prediction approach (Predikin) as a tool for establishing links between kinases and phosphoproteins that can subsequently be tested experimentally.

A large-scale mass spectrometry-based study of phosphoproteins in a whole-cell lysate of S. cerevisiae detected 383 phosphorylation sites in 216 phosphopeptide sequences [4] (Additional file 2), but the protein kinases responsible for the phosphorylation of these sites have remained unknown. We have previously developed a methodology to predict the substrates of protein kinases (the program Predikin [5]). The predictions are based on the nature of residues located in the substrate-binding pocket at specific positions/distances relative to the conserved motifs found in all Ser/Thr protein kinase sequences. The approach allows predictions to be made based only on the amino acid sequence of the catalytic domain of the kinase. Several studies have confirmed Predikin predictions experimentally, for example in G-protein-coupled recep[...]

[Table row: With no known homologues; Bub1p; 5.] Table footnotes: * Based on [1]. ** There are two protein kinases designated Pak1p in S. cerevisiae; in this manuscript, Pak1p from group VI_D is referred to as Pak1p (Yer129wp), while Pak1p from group V_C is referred to as Prk1p (Pak1p).
Here we used Predikin to predict possible associations between the phosphorylation sites identified by Ficarro et al. [4] and one or more of the yeast protein kinases. Some of these predictions are thought-provoking and suggest areas to investigate further. The study suggests biological functions for uncharacterized protein kinases and substrates, forecasts new signalling connections between known proteins, and provides a basis to direct future experimental work to verify the links between kinases and substrates.

Reliability and limitations of the predicted kinase-substrate associations
Ficarro and co-workers attempted to characterise the majority of the phosphoproteins in a whole-cell lysate of S. cerevisiae [4]. Proteins were digested with trypsin, converted to methyl esters, enriched for phosphopeptides with immobilised metal-affinity chromatography (IMAC), and analyzed by HPLC/electrospray ionization mass spectrometry. A total of 383 phosphorylation sites were found in 216 phosphopeptide sequences (Additional file 2). Phosphopeptides with more than one phosphorylated residue appear to be enriched in this collection, so it may not be entirely representative of the yeast phosphoproteome. We attempted to associate these phosphorylation sites with the 116 Ser/Thr protein kinases present in S. cerevisiae (Additional files 1 and 3) by matching the phosphopeptide sequences with the substrate specificity of each individual kinase as predicted using Predikin [5].
We used Scansite [11] to assess the quality of the match between the search motif (the predicted optimal heptapeptide phosphorylation sequence for each kinase) and the experimentally determined phosphopeptide sequences. To estimate the reliability of the protein kinase-phosphoprotein matches, we performed an analysis of Scansite scores for phosphorylation sites where the kinase has been experimentally identified (see Methods section; Figure 1).
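The idea of scoring a phosphorylated heptapeptide against a predicted specificity, with heavier weights on the positions that matter most, can be illustrated with a minimal Python sketch. This is not the Scansite or Predikin scoring function: the weights, the motif notation ("X" for any residue) and the kinase names are hypothetical, chosen only to show the principle.

```python
# Hypothetical weights for positions -3..+3 around the phosphorylated
# residue (index 3). The values are illustrative, not taken from the study.
POSITION_WEIGHTS = [2.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5]


def match_score(site, motif, weights=POSITION_WEIGHTS):
    """Weighted fraction of the constrained motif positions matched by a site.

    `site` is a 7-residue string. `motif` is a list of 7 strings, each
    listing the residues accepted at that position; "X" accepts anything
    and is ignored when computing the score.
    """
    if len(site) != 7 or len(motif) != 7:
        raise ValueError("site and motif must both cover 7 positions")
    total = 0.0
    matched = 0.0
    for residue, allowed, weight in zip(site, motif, weights):
        if allowed == "X":
            continue  # unconstrained position carries no information
        total += weight
        if residue in allowed:
            matched += weight
    return matched / total if total else 0.0


def rank_kinases(site, motifs):
    """Return (kinase, score) pairs sorted from best to worst match."""
    scored = [(name, match_score(site, motif)) for name, motif in motifs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy motifs with placeholder kinase names; not Predikin output.
motifs = {
    "KinaseA": ["RK", "RK", "X", "ST", "X", "X", "X"],
    "KinaseB": ["X", "X", "X", "ST", "P", "X", "X"],
}
print(rank_kinases("RRASVVA", motifs))
```

In this toy scheme a higher value means a better match; a real scoring tool may use a different scale and direction.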
Scansite scores could be obtained and probabilities estimated for all but 27 of the phosphorylation sites in the phosphoproteome dataset; one cannot obtain an accurate score when the phosphorylation site is located too close to the N- or C-terminus of the peptide. The statistics on estimated probabilities are summarized in Table 2.
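The terminus limitation follows from the need for a full 7-residue window around the site. A small Python sketch, assuming three flanking residues on each side and a 0-based site index (both assumptions for illustration), shows why such sites drop out:

```python
def heptapeptide(peptide, site_index, flank=3):
    """Return the residues within `flank` positions of the site, or None
    when the site lies too close to either end of the peptide."""
    start = site_index - flank
    end = site_index + flank + 1
    if start < 0 or end > len(peptide):
        return None  # window would run past the N- or C-terminus
    return peptide[start:end]


# Hypothetical peptide; the serine at index 5 has a complete window.
print(heptapeptide("AKRRASVVAK", 5))
# A site at index 1 has too few residues on its N-terminal side.
print(heptapeptide("ASVVAK", 1))
```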
In some cases, two or more kinases were predicted to be equally likely to be associated with a particular phosphopeptide. Often, these protein kinases were closely related (e.g. from the same family in the phosphopeptides from Hxt2p, Ras2p, Rpl7Ap, Ira2p and Nup2p; from the same group in Acc1p, Gnp1p, Fpr4p, Hsp26p, Pea2p, Rpn8p, Shp1p, Yhr186cp, Yml029wp, Ynl321wp, Ysc84p), although this was not always the case (e.g. phosphopeptides from Abf1p, Bud4p, Erg6p, Msc3p, Ncb2p, Sgv1p, Trs120p, Tsl1p, Yml072cp). Some sites may in fact be phosphorylated by two or more protein kinases in vivo, particularly by closely related protein kinases. An example of a substrate phosphorylated by two different kinases is the yeast amphiphysin homologue Rvs167p, which is phosphorylated both by the CDK Pho85p and the MAPK Fus3p [12].
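Reporting every kinase that ties for the best score, rather than a single winner, can be sketched as follows. The sketch assumes that a lower score indicates a better match and uses an arbitrary tolerance; the kinase names and score values are placeholders, not results from this study.

```python
def top_kinases(scores, tolerance=0.0):
    """Return all kinases whose score is within `tolerance` of the best.

    `scores` maps kinase name to score; lower is assumed to be better.
    """
    best = min(scores.values())
    return sorted(name for name, s in scores.items() if s - best <= tolerance)


# Placeholder scores for one phosphorylation site.
scores = {"KinaseA": 0.30, "KinaseB": 0.30, "KinaseC": 0.31, "KinaseD": 0.62}
print(top_kinases(scores))                  # exact ties only
print(top_kinases(scores, tolerance=0.02))  # near-ties as well
```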
In the cases where both the phosphoprotein and the predicted kinase have characterized functions, correlations between these functions are consistent with and support (although do not necessarily prove) our predictions (see below). On several occasions the phosphoproteins scored better with poorly characterized protein kinases than with any better-known ones; such associations could predict functions for these novel proteins.

Known protein kinase-substrate pairs
Although our associations were predicted without taking into account the available data on the functions of the proteins, a number of the predicted pairs correspond to known kinase-substrate pairs (Table 3). These include Yak1p [13] and Sra1p [14,15] as substrates for Tpk1p, and Acc1p [16] and Pfk2p [17] as substrates for Snf1p. The MAPKs Kss1p and Fus3p, closely related to Hog1p, have been shown to be substrates for Ste7p [18]. The MAPKK Pbs2p is known to be a substrate for the MAPKKKs Ssk2p and Ssk22p [19], but phosphopeptides corresponding to the known Ssk2p and Ssk22p phosphorylation sites of Pbs2p were not detected by Ficarro et al. [4]; the phosphopeptides identified by Ficarro et al. map to a different region of Pbs2p.

Figure 1. Estimation of kinase substrate prediction probabilities. The analysis is based on Scansite [11] scores. Diamonds, the probability (A) that a phosphorylation site is associated with a particular protein kinase, where the protein is a known substrate but the phosphorylation site is unknown, as a function of Scansite score. Squares, the probability (B) that a protein kinase is associated with a particular substrate protein, where the phosphorylation site is known but the protein kinase is unknown, as a function of Scansite score. The lines are 3rd order polynomials that best fit the data (dashed, probability A; solid, probability B).
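The Figure 1 caption describes fitting 3rd order polynomials to probability as a function of score. A minimal, dependency-free Python sketch of such a least-squares fit is given below. The data points are invented for illustration, the normal-equation solver is one simple choice among several, and clamping the result to the range 0 to 1 is an added assumption rather than something stated in the text.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    if len(set(xs)) < n:
        raise ValueError("need at least degree + 1 distinct x values")
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def probability(coeffs, score):
    """Evaluate the fitted polynomial and clamp the result to [0, 1]."""
    value = sum(c * score ** i for i, c in enumerate(coeffs))
    return min(1.0, max(0.0, value))


# Invented (score, probability) points lying on a known cubic.
xs = [i / 10 for i in range(11)]
ys = [0.9 - 0.5 * x - 0.3 * x ** 2 + 0.1 * x ** 3 for x in xs]
coeffs = fit_polynomial(xs, ys)
print([round(c, 6) for c in coeffs])
```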
Less direct experimental evidence has previously suggested the association of several other predicted kinase-substrate pairs (Table 4). For example, Bni5p is a part of a multi-protein complex involving the protein kinase Gin4p and the septins, and the protein kinase Cdc28p is required for the association of Gin4p with the septins [20]. Overexpression of Gpd1p reduces the hypersensitivity to osmotic shock of a hypersensitive yeast mutant, presumably due to the effect on signalling through the protein kinase C-MAPK pathway involving Bck1p; the overexpression of Bck1p similarly restored the protein kinase C signalling to the same mutant and rescued its hypersensitivity to osmotic shock [21]. The protein kinase Ume5p (Ssn3p, Srb10p) is known to be involved in the transcriptional repression of the HSP26 gene [22]. Msn2p is a protein kinase Pkc1p-regulated transcription factor [23]. Inactivation of the protein kinase Yak1p and overexpression of Sok2p have similar effects on yeast defective in protein kinase A signalling [24], suggesting that phosphorylation by Yak1p could have an inhibitory effect on Sok2p. The inactivation of Hog1p and the inactivation of Ssd1p have similar effects on mutants deficient in the protein phosphatase activators Rrd1p and Rrd2p [25]. Finally, mutation of the gene encoding the protein kinase Fus3p (a MAPK in the mating pathway) enhances the mating defect of some ste2 mutants; a STE2 deletion strain is completely sterile, so these are weak alleles of ste2 [26]. The available experimental evidence for all these predicted associations supports the reliability of our predictions in general.

Predicted kinase-substrate pairs showing functional correlations
In addition to the known kinase-substrate pairs, our predicted associations feature a number of pairs consistent with the known roles for the kinase and the substrate (i.e. functional correlations). Indeed, none of our predicted associations involve proteins with functions that are clearly incompatible. The functions of some associated proteins do not show obvious relationships, but instead suggest cross-connections between different cellular pathways. For example, seemingly unrelated processes such as mitosis in growing cells or sporulation/meiosis in starved non-growing cells involve common processes such as chromosome segregation and new cell wall deposition. Our associations provide a number of proposed functional links that can now be experimentally tested.

Table footnotes (first set): * The phosphorylated residues are underlined, and the residues not present in the phosphopeptide sequences [4] are shown in italic. When there is more than one phosphorylation site, the one discussed is shown in bold, unless the same protein kinase is predicted for all sites in the peptide. ** Scansite [11] scores were calculated as described in the Methods section. When the same protein kinase is predicted for all sites in the peptide, the scores are given for the respective sites, starting at the N-terminus. If more than one protein kinase yields a similar score, all the possible kinases are listed. *** Probabilities were calculated as described in the Methods section (Figure 1). When the same protein kinase is predicted for all sites in the peptide, the values are given for the respective sites, starting at the N-terminus.

Table footnotes (second set): * [...] [68]. ** The phosphorylated residues are underlined, and the residues not present in the phosphopeptide sequences [4] are shown in italic. When there is more than one phosphorylation site, the one discussed is shown in bold, unless the same protein kinase is predicted for all sites in the peptide. *** Scansite [11] scores were calculated as described in the Methods section. When the same protein kinase is predicted for more than one site in the peptide, the scores are given for the respective sites, starting at the N-terminus. If more than one protein kinase yields a similar score, all the possible kinases are listed. **** Probabilities were calculated as described in the Methods section (Figure 1). When the same protein kinase is predicted for more than one site in the peptide, the values are given for the respective sites, starting at the N-terminus. ***** Protein kinases Cka1p, Cka2p or Cdc7p. The predicted specificities are too similar to be distinguished. ****** Protein kinases Yck1p, Yck2p, Yck3p or Hrr25p. The predicted specificities are identical. ******* Scansite score could not be measured because the phosphorylation site is too close to the C-terminus.
The following examples are representative of the correlated functions in substrate-kinase pairs in the dataset (Table 4).
Protein kinase CK1, with roles in endocytosis and cytokinesis, is predicted to be associated with a number of proteins involved in endocytosis, including Ede1p, a key endocytic protein that binds membranes in a ubiquitin-dependent manner and is involved in a network of interactions with endocytic proteins, the SH3 domain-containing protein Ysc84p, and the actin cytoskeletal protein Pan1p involved in cortical actin patch formation. CK1 phosphorylation of receptor cytoplasmic tails [27] is required for subsequent ubiquitination of flanking lysines by Rsp5p [27] and receptor endocytosis. Ubiquitination by Rsp5p of endocytic machinery appears to be required too, as fusion of the receptor cytoplasmic tail to ubiquitin bypasses the requirement for ubiquitination of the receptor cytoplasmic tail, but does not bypass the requirement for Rsp5p in endocytosis per se [28]. Also, Rsp5p physically interacts with Sla1p [29], which in turn interacts with Ysc84p [30] and Pan1p [31]. Furthermore, RSP5 genetically interacts with EDE1 [32]. Therefore, there is a possibility that CK1 phosphorylation of Ede1p, Ysc84p, and Pan1p is required for their subsequent Rsp5p-dependent ubiquitination on flanking lysines, analogous to what has been shown for receptor cytoplasmic tails. It is not known yet if these proteins are ubiquitinated by Rsp5p; Rvs167p is a component of the endocytic machinery that is known to be ubiquitinated by Rsp5p [29], but it has not been detected among the phosphopeptides. Rvs167p is in a complex with Sla1p and Rsp5p [29], so other Sla1p interactors such as Ysc84p and Pan1p are strong possibilities for Rsp5p-dependent ubiquitination; moreover, the mammalian Pan1p orthologue Eps15 is known to be ubiquitinated [33,34]. A model for the regulation of endocytosis by CK1-dependent phosphorylation and ubiquitination is shown in Figure 2.
CK1 is also predicted to phosphorylate the Rab GTPase Sec4p, which is essential for exocytosis [35]. One yeast CK1 (Yck3p) has been shown to regulate Rab-GTPase dependent vacuole fusion [36,37]. Some endocytic cytoskeletal proteins (e.g. actin, Rvs167p and Sla2p) also have roles in the same step of exocytosis as Sec4p [38-40].

Another interesting association involves the major cell cycle regulatory kinase Cdc28p and two actin cytoskeletal proteins, Crn1p and Abp1p. The subcellular distribution of the actin cytoskeleton is tightly regulated during the cell cycle. In late G1 phase and prior to visible formation of a bud, the cells pass "Start"; the Cdc28p kinase becomes activated by G1 cyclins, and the actin cytoskeleton starts to polarise towards the site where the new bud will emerge [41-43]. Cortical actin patches, which appear as highly motile spots, concentrate at this site. Cytoplasmic actin cables, which appear as elongated fibres, exhibit alignment along the mother-bud axis such that their tips are also focused at this site. During S phase, Cdc28p starts to be activated by S-phase cyclins; at this stage of the cell cycle the bud emerges and cortical actin patches concentrate inside the growing bud, especially at the rapidly growing tip. Actin cables remain aligned with their tips in the growing bud, causing the bud to extend in a highly polarised manner. When the cells enter G2 phase, the cortical actin patches distribute more isotropically within the bud, and the bud becomes more rounded and expands both laterally and at the tip. Driven by the mitotic cyclins, maximum Cdc28p activity is achieved and the cells enter mitosis (M phase). At this stage of the cell cycle cortical actin patches transiently redistribute throughout the mother cell and bud, thus losing their polarisation. The cytoplasmic actin cables become randomly oriented in the mother cell and bud during M phase. Finally, upon exit from mitosis and reduction of Cdc28p activity, the cortical actin patches in the mother cell and the bud align on either side of the bud neck. Cytoplasmic actin cables align with their tips on either side of the bud neck. After cell division is complete and the cells enter early G1, the cortical actin patches again adopt a random distribution and the cytoplasmic actin cables become randomly oriented.
Abp1p is an actin-binding protein that specifically localises to cortical actin patches [44]. Crn1p also localises to cortical actin patches and is thought to act as a linker between these patches and microtubules [45]. Both Abp1p and Crn1p bind to the Arp2/3 complex, a multisubunit complex that mediates the nucleation step of actin filament assembly within the cortical actin patch. Cortical actin patches are highly motile and short-lived structures. They form by de novo actin filament assembly at polarised sites on the cortex where Arp2/3 activators concentrate. Hence their polarisation during the cell cycle is thought to reflect polarisation of the sites where they form. Once assembled, cortical actin patches move rapidly away from these cortical sites and into the body of the cell. This rapid movement is thought to be propelled by the force generated by de novo actin filament assembly. One of the Arp2/3 activators implicated in cortical actin patch assembly at the cortex and actin-dependent movement is Abp1p [46]. Abp1p binds to and activates the Arp2/3 complex and thus stimulates de novo assembly of actin filaments. Crn1p also binds the Arp2/3 complex. However, in contrast to Abp1p, Crn1p binding to the Arp2/3 complex inhibits (or at least restricts) Arp2/3 activity, in part by preventing Arp2/3 activation by Abp1p [45]. Hence, both Abp1p and Crn1p regulate cortical actin patch formation and dynamics. The association of Cdc28p with Abp1p and Crn1p predicted in this study suggests an important role for these two cortical actin patch components in the response of the cortical actin patches to intrinsic cues generated by Cdc28p, the cell cycle regulatory kinase.
A large proportion of yeast proteins are implicated directly or indirectly in functions such as carbohydrate metabolism, stress, and cell growth. The glycogen synthase kinase Mds1p (Rim11p) and its predicted substrate, the chaperone Hsp26p, and the protein kinase Yfl033cp (Rim15p) and its predicted substrate, the trehalase Nth1p, have common roles in stress response. The protein kinase Sha3p (Sks1p) and the predicted substrate, the high affinity glucose transporter Hxt2p, share roles in hexose transport, while the AMPK Snf1p and the predicted substrates, the putative regulator of protein phosphatase-1 Shp1p and the regulatory subunit of the trehalose-6-phosphate synthase/phosphatase complex Tps3p, are all involved in carbohydrate metabolism.
Other novel kinase-substrate pairs are also predicted in processes including autophagy, DNA replication and transcription, mitosis and cell cycle, the cytoskeleton, nitrogen utilization, sporulation and cellular growth and morphogenesis ([...] for the meiosis-specific checkpoint kinase Mek1p and the CDK Cdc28p).

Figure 2. Model for regulation of endocytosis by phosphorylation and ubiquitination. In stage 1 (top), neither the receptor (transmembrane region, black; cytoplasmic tail, red rectangle) nor the endocytic machinery (blue circle) are phosphorylated or ubiquitinated. CK1-dependent phosphorylation (P) of the receptor and the endocytic machinery (stage 2, below) leads to Rsp5p-dependent ubiquitination (Ub) of the receptor and the endocytic machinery (stage 3, below), resulting in endocytic internalization of the receptor (stage 4, bottom). Yck1/2p are known to phosphorylate the alpha factor pheromone receptor Ste2p (a 7-transmembrane domain G-protein-coupled receptor) on its cytoplasmic tail [27]. This phosphorylation is essential for ubiquitination of the receptor on its cytoplasmic tail by the ubiquitin protein ligase Rsp5p, and for endocytic internalization of the receptor. There is evidence that Rsp5p also has to ubiquitinate components of the endocytic machinery for the receptors to be endocytosed [28]; our analysis of phosphorylation sites suggests that phosphorylation by Yck1/2p of the components of the endocytic machinery (Ede1p, Ysc84p, Pan1p) may also play a role in their Rsp5p-dependent ubiquitination.

Prediction of function for novel phosphoproteins and kinases
Several associations are predicted between a protein with a known function and a protein with an unknown function. Putative functional annotations of uncharacterised proteins through such associations represent an intriguing result of this study.
The associations suggest a functional role for a number of uncharacterised protein kinases (Additional file 4). The kinase Akl1p, while uncharacterised, is in the same family as Ark1p and Prk1p, both of which regulate the actin cytoskeleton; our associations suggest that Akl1p may function in transcriptional regulation, the cell cycle and cell growth-related processes. The kinase Ksp1p, while uncharacterised, is highly homologous to mammalian MARK1 (microtubule affinity regulating kinase) and is predicted to function in the microtubule cytoskeleton, stress response and polyamine transport.

Phosphorylation of several sites on the same protein by the same kinase
The predicted association between a kinase and a substrate may be stronger in the cases where the same kinase is predicted to phosphorylate two or more distinct sites on the same protein (Table 5). Examples of such a kinase-substrate pair are CK1-Ede1p (discussed earlier) and CK1-Tat1p. CK1 is implicated in endocytosis, and the levels of Tat2p (Tat1p and Tat2p are two highly homologous tryptophan permeases) are regulated by endocytosis in response to starvation [48].
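Finding the pairs in which one kinase is predicted for several sites on the same protein amounts to a simple grouping step. The Python sketch below assumes that predictions are available as (protein, site, kinase) tuples; the entries shown are placeholders rather than values from Table 5.

```python
from collections import defaultdict


def multi_site_pairs(predictions, min_sites=2):
    """Group predictions by (kinase, protein) and keep pairs with several sites.

    `predictions` is an iterable of (protein, site, kinase) tuples.
    """
    sites = defaultdict(set)
    for protein, site, kinase in predictions:
        sites[(kinase, protein)].add(site)
    return {
        pair: sorted(found)
        for pair, found in sites.items()
        if len(found) >= min_sites
    }


# Placeholder predictions for illustration only.
predictions = [
    ("ProteinA", "S10", "KinaseX"),
    ("ProteinA", "S42", "KinaseX"),
    ("ProteinA", "T77", "KinaseY"),
    ("ProteinB", "S5", "KinaseX"),
]
print(multi_site_pairs(predictions))
```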

Autophosphorylation
While autophosphorylation may be common in yeast cells, only a few protein kinases are represented in the phosphoproteome dataset (Akl1p, Cla4p, Hog1p, Ksp1p, Npr1p, Pbs2p, Sgv1p, Slt2p, Yak1p and Ybr466wp), incorporating 17 phosphorylation sites. For 14 of these sites, the specificity of the kinase is quite different from the phosphorylation site, suggesting autophosphorylation is unlikely. Autophosphorylation is most likely in the case of Akl1p, a protein kinase of unknown function from family V_C. This prediction is consistent with yeast two-hybrid analysis [49]. The phosphorylation site (Ser521) is C-terminal to the protein kinase domain (residues 50-400) and is distinct from the activation loop threonine (Thr220), and is therefore unlikely to involve an auto-activation event. In two other cases, involving Cla4p and Pbs2p, autophosphorylation is a possibility, although the phosphorylation sites better match specificities of other yeast kinases. It should be kept in mind that the sequences of autophosphorylation sites can deviate substantially from the usual specificity of the kinase, as a result of the effect of high local concentration during an autophosphorylation reaction [50].

Tyrosine phosphorylation
Although fungi have no protein kinases classified as protein tyrosine kinases, phospho-tyrosine residues are found in yeast [2,51]. An early example is the protein kinase Spk1p (Rad53p), which can phosphorylate proteins on Ser, Thr and Tyr residues, and can phosphorylate poly(Tyr-Glu) [2,51]. In fact, protein chip experiments showed that 27 yeast kinases can phosphorylate poly(Tyr-Glu) [3], although the physiological relevance of tyrosine phosphorylation by most of these kinases is not yet clear.
Currently it is accepted that Tyr phosphorylation in yeast is due to dual specificity protein kinases that can phosphorylate tyrosine in tandem with a nearby threonine [52,53]; they cannot phosphorylate solitary Tyr residues. There are two classes of dual specificity protein kinases in yeast. The first are the MAPKKs (such as Ste7p or Pbs2p) that phosphorylate TXY motifs in the activation loop of MAPKs (in yeast, the MAPKs include Hog1p and Slt2p). The phosphorylation events are carried out in order, with the Tyr phosphorylation occurring first and the Thr phosphorylation second [54]. In our dataset, the topoisomerase-associated protein Pat1p has both Ser and Tyr phosphorylated in an SXY motif, which may be the result of phosphorylation by a MAPKK alone, or of phosphorylation by the S6K-like protein kinase Ynr047wp (family I_D) followed by MAPKK phosphorylation. The second class is exemplified by Swe1p, which phosphorylates both residues at TY sites in CDKs. Cdc28p (the yeast orthologue of CDK2) is a known substrate for Swe1p (the S. cerevisiae Wee1 orthologue) [55,56]. No dually phosphorylated TY motifs are present in the phosphoproteome dataset we used.
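The TXY, SXY and TY motifs discussed above can be located in a sequence with a short regular-expression search. The following Python sketch is illustrative only: the function name and the example sequence are hypothetical, and finding a motif says nothing about whether it is actually phosphorylated.

```python
import re


def find_motif_sites(sequence, first="T", gap=1):
    """Return 0-based start positions of motifs of the form first-X(gap)-Y.

    `first` is a residue or a character class such as "[ST]"; `gap` is the
    number of arbitrary residues between it and the tyrosine (0 gives TY).
    A lookahead is used so that overlapping motifs are all reported.
    """
    pattern = re.compile(rf"(?=({first}.{{{gap}}}Y))")
    return [match.start() for match in pattern.finditer(sequence)]


# Hypothetical sequence containing one TXY, one SXY and one TY motif.
sequence = "AATGYAASEYTY"
print(find_motif_sites(sequence, "T", 1))     # TXY motifs
print(find_motif_sites(sequence, "S", 1))     # SXY motifs
print(find_motif_sites(sequence, "T", 0))     # TY motifs
print(find_motif_sites(sequence, "[ST]", 1))  # either S or T, then X, then Y
```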

Correlations with comprehensive protein association studies in yeast
Several groups have attempted comprehensive analyses of protein-protein associations in S. cerevisiae (summarized in the Biomolecular Interaction Network Database BIND [57]). A mass spectrometry-based study [58] included 49 yeast protein kinases and 9 proteins from the phosphoproteome dataset as bait proteins. Associations were demonstrated (bait protein listed first) between Hrr25p and Ede1p, and between Rad53p and Ede1p (however, neither phosphorylation site from Ede1p has a good match with Hrr25p or Rad53p specificity); between Ksp1p and Yhr186cp (however, the phosphorylation site matches the specificity of Prk1p to a much greater extent); between Kss1p and Ste11p, and between Bck1p and Hog1p (the latter two are MAPK-MAPKKK associations, suggesting they interact through binding an anchoring protein, and do not involve an enzyme-substrate relationship); and between Prk1p and Akl1p (which are closely related and form a family of kinases implicated in actin cytoskeleton regulation). Another mass spectrometry-based study [59] revealed the associations between Chd1p and the CK2-type protein kinases Cka1p and Cka2p (consistent with our prediction), between Pkc1p and Eno1p (although Eno1p does not contain a Pkc1p-specific motif), and between Sec31p and both Cka1p and Cka2p (again, neither Sec31p phosphopeptide from the phosphoproteome matches CK2 specificity).
It is not surprising that many protein kinase-substrate pairs cannot be detected as protein-protein complexes, because of the temporary nature of the interaction. The protein association studies using yeast two-hybrid methodology [49,60,61] would be expected to reveal more short-term associations than affinity capture-based approaches. A study by Ito and co-workers [60] included a set of 10 S. cerevisiae protein kinases as bait, 6 protein kinases as prey, and one kinase as both. Only one pair of associated proteins involved a protein kinase and a protein from the phosphoproteome, protein kinase Ypr106wp (Isr1p) and the protein Chs2p (the catalytic subunit of chitin synthase 2); the association is consistent with our prediction. Only one association with a kinase has been identified in the study by Uetz et al. [49] (the protein kinase Chk1p with the glycogen synthase Gsy1p).
In summary, few predicted kinase-substrate pairs are supported by comprehensive protein-protein interaction studies. The likely reasons include poor representation of protein kinases and proteins in the phosphoproteome dataset in these studies, and the temporary nature of the kinase-substrate interaction that may be difficult to detect by the methods used in these studies.
Very recently, proteome chip technology has been used to identify the in vitro substrates for 82 unique yeast protein kinases, using yeast proteome microarrays containing 4,400 proteins [62]. This study identified candidate kinases for 50 proteins from the phosphoproteome dataset we used here [4]. Significantly, this study confirmed in 12 cases that our predicted kinases phosphorylate the substrates in vitro, and additionally in 12 more cases closely related kinases were shown to phosphorylate the substrates.
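The comparison with the proteome chip results distinguishes cases where the predicted kinase itself phosphorylated the substrate from cases where a closely related kinase did. A minimal Python sketch of that classification is shown below; it assumes a lookup table assigning each kinase to a family, and all names in the example are placeholders rather than data from either study.

```python
from collections import Counter


def classify_prediction(predicted, observed, family_of):
    """Compare a predicted kinase with the kinases observed in vitro.

    `observed` is the set of kinases found to phosphorylate the substrate;
    `family_of` maps kinase name to a family label.
    """
    if predicted in observed:
        return "confirmed"
    family = family_of.get(predicted)
    if family is not None and any(family_of.get(k) == family for k in observed):
        return "related kinase"
    return "not supported"


# Placeholder family assignments and comparisons.
family_of = {"KinaseA": "family1", "KinaseB": "family1", "KinaseC": "family2"}
cases = [
    ("KinaseA", {"KinaseA", "KinaseC"}),
    ("KinaseA", {"KinaseB"}),
    ("KinaseA", {"KinaseC"}),
]
tally = Counter(classify_prediction(p, obs, family_of) for p, obs in cases)
print(tally)
```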

Representation of different kinase groups among the set of kinases responsible for phosphorylation in the phosphoproteome dataset
The phosphopeptide sequences in the dataset suggest that all 7 major groups of kinases are represented among the kinases responsible for their phosphorylation (Table 1). However, not all the individual protein kinases are represented in our predictions (76 out of 116 were associated with a substrate). There are a number of protein kinases that have apparently phosphorylated a number of the proteins in the dataset, while others are absent altogether. The most frequent kinases to be predicted to phosphorylate sites in the phosphoproteome are the yeast orthologues of mammalian PKA, CaMK2, AMPK, CK1, CK2, and members of the CDK, MAPK and CLK families. Along with PKC, these are the protein kinases that are known to perform housekeeping functions within the cell. We may have been unable to predict all Pkc1p-substrate associations because some Pkc1p-phosphorylated sequences may differ considerably from the optimal Pkc1p phosphorylation consensus motif, and may also fit phosphorylation consensus motifs for other protein kinases (particularly those from group I).
Many protein kinases are linked to specific cellular responses and may only be induced under specific circumstances, and are therefore not necessarily represented in our dataset. Examples of such cellular responses include double-stranded DNA break repair, starvation and mating. The yeast cells used in the study by Ficarro et al. were from a normal growing culture [4] (S. Ficarro, personal communication).
A survey of the functions of the proteins represented in the yeast phosphoproteome shows components of fundamental cellular structures and machines (ribosomes, vacuoles, actin filaments, nuclear pores), intermediary metabolism, endocytosis, cytokinesis, transport proteins, permeases and transcription factors. It is reasonable to imagine that the represented proteins that are as yet uncharacterized will function in these processes; a similar argument can be made for the associated protein kinases that have unknown functions. These kinases often have specificities similar to the better-studied kinases. Our predicted associations are generally consistent with these conclusions.

Implications for human protein kinases
Few of the proteins in the yeast phosphoproteome dataset have sequences containing the phosphorylation sites conserved in human proteins; many of these proteins do have human homologues, but the phosphorylation site, which is often located near the N- or C-terminus of the protein, has diverged. This observation suggests that the exact location of the phosphorylation site may often be unimportant as a determinant of the regulatory pathway, and that the intricacies of the regulatory mechanisms may differ among species.
The following examples illustrate the different cases of conservation between the yeast and human proteins.
In some cases, the sequences surrounding the phosphorylation sites are well conserved, suggesting an equivalent kinase may be responsible for the phosphorylation events in both organisms. Examples include the α-subunit of pyruvate dehydrogenase (phosphorylation site Ser313 in yeast), and glycogen synthase (phosphorylation sites Ser650 and Ser654 in yeast). Neither of the human proteins is known to be phosphorylated at these positions. The yeast MAPK Hog1p and its human orthologue both require double phosphorylation by a MAPKK at a TGY motif [63]. The similarity of the two sequences suggests an equivalent MAPKK is likely to be responsible in both organisms.
In other cases, the phosphorylated residue appears to be conserved, but the surrounding sequence has diverged to the extent that a protein kinase with a different specificity would be required in the two organisms. One such example involves yeast Ace2p (phosphorylation site Ser701) and its human orthologue KLF14. The residues corresponding to the phosphorylated Ser91 and Ser96 in yeast Bud4p are Ser and Glu in the human orthologue claspin; if claspin Ser225 (the equivalent of Ser91 in Bud4p) were phosphorylated, it would require a protein kinase with a different specificity, while the Glu provides a constitutive negative charge. Human enolase similarly has a glutamic acid in place of the yeast Eno1p phospho-Ser10.
There are cases where both the yeast and human orthologous proteins are known to be regulated by phosphorylation, but the mechanisms do not appear to be strictly conserved. Protein kinase Snf1 was predicted in this study to be responsible for phosphorylation of Acc1p (acetyl coenzyme A carboxylase) at Ser1157. Human acetyl coenzyme A carboxylase is known to be phosphorylated by AMPK at Ser1201 [64,65]. While the two serines are located in similar regions in their respective proteins, they do not strictly align [66]. Similarly, the regulatory subunit of PKA is a substrate for its catalytic subunit in both S. cerevisiae and mammals [15]. Again, the phosphorylated residues are in similar locations in the two sequences, but they do not strictly align.
However, in many cases the region of the protein phosphorylated in yeast is poorly conserved in humans, or the equivalent region of the protein does not exist in the human protein. For example, in the human orthologue of the yeast protein Sec4p, the N-terminal phosphorylation sites are missing, while the C-terminal sites have diverged in sequence. The sequence surrounding the phosphorylation sites in yeast protein Abp1p is well conserved in the human orthologue mAbp1 [67]; however, the residues equivalent to the phosphorylated Thr181 and Ser183 have been substituted by amino acids other than Ser or Thr.

Conclusion
In this study, we aimed to associate every phosphorylation site in the yeast phosphoproteome reported in the literature [4] with the protein kinase(s) most likely to be responsible for phosphorylation of that site. Our approach made use of the computer program Predikin, which is the only computational method that is able to shed light on the specificity of uncharacterized protein kinases [5,50]. The accuracy of Predikin-based predictions has been demonstrated previously using an experimental cross-validation set [5]. Moreover, several subsequent experimental confirmations of novel Predikin predictions have been reported [6-10]. As part of the present work, we also estimated the probabilities of individual predictions. We identified a possible kinase for most phosphorylation sites in the phosphoproteome dataset, with more than half of the associations showing high probabilities. Certain classes of protein kinases have well-defined substrate specificities that make the associations more reliable; these include kinases in the AGC, CaMK, CMGC, MAPKK and CK1 groups. Because the phosphoproteome was determined using yeast cells not subjected to any particular challenge, it is not surprising that certain groups of protein kinases, such as the checkpoint kinases, were not predicted to be responsible for phosphorylation of any substrate in the dataset. On the other hand, housekeeping enzymes such as protein kinases CK1 and CK2 appear to have a number of substrates in the phosphoproteome dataset, supporting fundamental and constitutive roles in cell regulation for these kinases. Our analysis has created a foundation on which to base future experimental work, e.g. testing the effects of depletion or over-expression of the associated kinase on phosphorylation of the predicted substrate(s).

Association of the phosphoproteins with protein kinases
The procedure we used to associate each phosphorylation site with the protein kinase most likely to be responsible for that phosphorylation event consisted of the following steps.
1. The optimal heptapeptide sequences phosphorylated by the kinases were predicted using Predikin [5]. We considered a set of 116 protein kinases, based on the analysis of Hunter and Plowman [1], and additionally protein kinases Ylr253wp (Yl53p), Atg1p, and Yjl057cp. We did not include Scy1p in the analysis; although this protein is clearly related to protein kinases, it lacks the catalytic aspartate and some other conserved residues. The conserved sequence motifs were ambiguous in the case of Cak1p, Bub1p, Bud32p and Ygr262cp; this should be considered when interpreting the predictions for these kinases. Predictions cannot be carried out for phosphatidylinositol 3 kinase-like kinases (Tor1p and Mec1p), which are only distantly related to Ser/Thr protein kinases. Protein kinases Yck1p, Yck2p, Yck3p and Hrr25p have identical predicted specificities and were designated as the "CK1 group". Protein kinases Cka1p, Cka2p and Cdc7p have predicted specificities too similar to be distinguished, and were designated as the "CK2 group".
2. In the case of CMGC kinases (including the CDK, MAPK, GSK3β and CLK families), the prediction rules strictly required a Pro residue at P+1 in the substrate. Similarly, the rules for the CK2 family required a Glu or Asp at P+1, and the rules for CK1 required a Glu, Ser, Ile or Gly (but not Asp) at P+1.
3. The phosphopeptides were sorted and associated with the most likely protein kinase(s), using Scansite 1.5 scores [11] as a guide (the Quick Matrix option of Scansite 1.5 was used as described previously [5]). Scansite scores are not calculated accurately if the phosphorylation site is less than 7 residues from the terminus of the protein. Where no exact match between the phosphopeptide and the predicted motif was possible, the following process was used.
(iv) With Glu predominating in sites (-3), (-2), (+1), (+2) and (+3), the kinase is from either the CK1 or the CK2 families (VI_A or VI_B). Ala and Gly at (+1) indicate CK1, while Asp at (+1) is only found with CK2. Phospho-serine or phospho-threonine residues in sites (-3), (-2), (+1), (+2) and (+3) are indicative of CK1. Phosphorylated residues are not acceptable in the (-1) site because of a clash with bound ATP, and this could often suggest the order of phosphorylation in multiply-phosphorylated peptides.
(v) Amino acids such as Gln, Ala, Gly, Leu or Tyr at (-3) indicate a kinase from families IV, V or VI_C-VI_F. A larger hydrophobic residue at (-3) is strongly indicative of family V_C (Ark1p or Prk1p), or possibly Gcn2p (family VII_A). These kinases are distinguished by the (+1) residue; Glu, Val, Thr or Tyr in the case of Ark1p and Prk1p, or Gly, Ser, Asp or Gly in the case of Gcn2p. Most of these kinases have a partially occluded (-2) pocket with a resulting specificity for Val, Ala, Thr or Pro. Specificity for Gln at (-2), on the other hand, indicates family IV_C (Cdc15p) or family V_A (Pbs2p), whereas (-2) specificity for small hydrophilic residues such as Asn or Ser signifies Cdc15p (family IV_C). A smaller neutral or hydrophobic residue at (-2) suggests a family IV_A kinase, while a larger hydrophobic residue (such as Leu or Val) at (-2) suggests a family IV_C kinase (Sps1p) or a family VII_A kinase (Ybr097wp). The two alternatives can be distinguished by the (+2) residue specificity, which is for smaller residues in the case of family IV_C. An acidic residue at (+2) indicates Ste11p (family IV_A). A large hydrophobic residue, such as Phe, Leu, Ile or Met, at (+1) also indicates families IV_A (the Ste11p family), IV_B or VI_C. A hydrophilic residue (particularly a smaller one) indicates family V (the Ste20p family). A basic hydrophilic residue at (+1), such as Arg, indicates family VI_C (the Hal5p family). In all cases, residues with larger side chains represent more definitive specificity than those with smaller side chains.
(vi) For the few phosphopeptides that did not satisfy any of the criteria described above, the sequences were sorted according to the amino acids in positions (-3), (-2), (+1), (+2) and (+3) (in the order of decreasing constraints in subsite specificity), with the amino acids grouped as basic, acidic, neutral hydrophilic, small hydrophilic, small hydrophobic, and large hydrophobic (Val or larger), and compared with the predicted kinase specificities. The amino acid preferences for a particular subsite in a kinase can usually be grouped into primary (utilizing the majority of available interactions), secondary (making favourable interactions but not taking advantage of the optimal number of contacts), and compatible (making few or no interactions with the kinase, but compatible with the polarity of the environment; usually amino acids such as Asn, Ser, Pro, Ala and Gly). Small hydrophobic and small hydrophilic residues can usually be accommodated equally well. Accurate predictions are not possible when the phosphoresidue is less than 4 residues from the N- or C-terminus of a protein.
4. The probabilities of predictions were estimated from the Scansite 1.5 scores (using the relationships derived as described in the next section). Characterized phosphorylation sites can exhibit a poor match with the optimal phosphorylation sequence of a kinase [5,11]; therefore, it is possible that some phosphorylation sites are substrates for other kinase(s) with a similar but weaker match to the optimal specificities.
5. Information on the functions of the phosphoproteins and protein kinases was obtained from various databases (RefSeq [68], Swissprot [69] and others) and literature searches.
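As an illustration only, the strict P+1 requirements stated in step 2 can be sketched as a simple pre-filter on a heptapeptide. This is not the Predikin implementation; the names `P1_RULES` and `candidate_groups` are hypothetical, and the sketch assumes the heptapeptide is written from (-3) to (+3) with the phosphorylated residue at the centre.

```python
# Hypothetical sketch of the P+1 pre-filter described in step 2.
# Not the Predikin code; names and data layout are assumptions.

# Residues allowed at P+1 for each kinase group, as stated in step 2.
P1_RULES = {
    "CMGC": set("P"),    # Pro strictly required
    "CK2": set("ED"),    # Glu or Asp
    "CK1": set("ESIG"),  # Glu, Ser, Ile or Gly (Asp excluded)
}


def candidate_groups(heptapeptide):
    """Return the kinase groups whose P+1 rule the heptapeptide satisfies.

    The heptapeptide is assumed to span positions (-3) to (+3), so the
    phosphorylated residue is at index 3 and P+1 is at index 4.
    """
    if len(heptapeptide) != 7:
        raise ValueError("expected a 7-residue sequence")
    p1 = heptapeptide[4].upper()
    return sorted(group for group, allowed in P1_RULES.items() if p1 in allowed)


if __name__ == "__main__":
    for peptide in ("AAASPAA", "AAASDAA", "AAASEAA", "AAASKAA"):
        print(peptide, candidate_groups(peptide))
```

Note that Glu at P+1 satisfies both the CK1 and CK2 rules, so the filter alone narrows the candidates but does not always select a single group; the remaining positions and the Scansite scores would still be needed to decide.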

Estimation of prediction probabilities
To estimate the probabilities of the predictions of phosphorylation sites in known substrates, we analysed the Scansite scores for all possible phosphorylation sites in a substrate, using a dataset of known protein kinase-substrate pairs (Phosphobase, [70]). We compared the Scansite scores of sites that are phosphorylated to the Scansite scores of sites that are not phosphorylated (Table 6). The serine and threonine residues that are known not to be phosphorylated yielded a mean Scansite score of 0.260 ± 0.050, while the sites known to be phosphorylated yielded a mean Scansite score of 0.135 ± 0.045. Only 4 out of 558 sites with scores below 0.128 were not phosphorylated, and 10 out of 52 known phosphorylation sites had scores above 0.212 (Figure 1).
To estimate the probability of a particular protein kinase being associated with a particular phosphorylation site, we require a different analysis that involves the examination of the range of scores obtained for known substrates using a diverse set of kinases. We analysed all the substrates listed in Phosphobase [70] for 8 diverse protein kinases (PKA, PKC, CaMK2, PHK, CDK1, MAPK, CK1 and CK2). There is an n% probability that a kinase is responsible for phosphorylation at a site yielding a particular Scansite score when (100-n)% of the known phosphorylation sites have Scansite scores lower than or equal to that particular score (Figure 1).
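The relationship above amounts to reading a probability off the empirical distribution of Scansite scores for known substrates. A minimal sketch under that reading follows; the function name `association_probability` and the example scores are hypothetical and are not taken from Phosphobase or from Figure 1.

```python
# Hypothetical sketch of the empirical probability described above.
# The reference scores below are made-up illustrative values.

def association_probability(score, known_scores):
    """Return n (in percent) for a given Scansite score.

    Following the text: n% probability corresponds to (100 - n)% of the
    known phosphorylation sites having scores lower than or equal to
    the query score (lower Scansite scores indicate better matches).
    """
    if not known_scores:
        raise ValueError("known_scores must not be empty")
    total = len(known_scores)
    lower_or_equal = sum(1 for s in known_scores if s <= score)
    return 100.0 * (total - lower_or_equal) / total


if __name__ == "__main__":
    reference = [0.05, 0.10, 0.15, 0.20]  # illustrative only
    for query in (0.01, 0.10, 0.20):
        print(query, association_probability(query, reference))
```

With a real reference set, the same function would be applied to the Scansite scores of all known substrates of the eight kinases; how ties and sparse regions of the distribution were handled in the original analysis is not stated, so the `<=` comparison here simply mirrors the wording of the text.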