The reconstruction of signal transduction networks is widely applied in different fields of biomedicine, particularly for the identification of promising drug targets. Databases designed for biological network analysis support the effective integration of the large volumes of data obtained in large-scale experiments [1, 2]. However, the experimentally derived data have many gaps, which makes it difficult to simulate cell signaling pathways. This problem can be addressed by enriching the network with predicted interactions. In this study we propose to apply the previously published method PAAS (Projection of Amino Acid Sequences) [3, 4] to the enrichment of signal transduction networks through the recognition of proteins phosphorylated by certain kinases. We applied the PAAS method to the TRANSPATH® database to estimate its efficiency and to predict new interactions that could be used for the enrichment of signal transduction networks. The TRANSPATH® database is a manually curated information resource that provides both specific and general information on signal transduction and also offers the means for network analysis. It is one of the most comprehensive collections of experimentally verified data on signal transduction in eukaryotic cells. Still, many signaling interactions in various cell types are not documented in TRANSPATH®. This gap in knowledge can hamper the analysis of signaling networks and the prediction of functionally important elements. We expect that adding the interactions predicted by the algorithm presented here will help to fill these gaps.
Several bioinformatics approaches have been applied to predict new functional characteristics of proteins with the aim of identifying new network nodes and edges. Using such predictive tools, one can significantly enrich the database and reconstruct more relevant models, which in turn allows the detection of promising drug targets.
Several well-known algorithms use network context information based on the location of the protein in the network and on the comparison of networks constructed for different species. Frequently, such context information is very sparse. The amino acid sequences of proteins can serve as an important source of information for increasing the reliability of predictions of proteins that participate in signal transduction.
A signaling network can be represented as a series of protein-protein interactions; therefore, methods for the prediction of interacting protein pairs can also be used for network enrichment. Some methods are based on calculating the co-variation of positional substitutions in aligned sequences of interacting protein families. In other methods, the members of the query pair are compared with a training set of known protein interactions. PIPE-like methods calculate the similarity of short regions between the input sequence pair and the training set, and estimate putative interactions from the resulting matrix, in which the matches above a given threshold are counted. The PPI-SP method is also based on sequence comparison, but each input sequence pair is represented as a vector of similarity scores calculated by Smith-Waterman alignment; the prediction of interacting pairs is then performed by a support vector machine (SVM) algorithm.
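The representation of a sequence pair as a vector of alignment scores can be sketched as follows. This is a minimal illustration and not the PPI-SP implementation: the scoring scheme (match +2, mismatch -1, linear gap penalty -2), the function names and the toy sequences are all assumptions made for the example, and the SVM step is omitted.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Assumed toy scoring: identity match/mismatch and a linear gap penalty,
    used here in place of a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity_vector(query, training_sequences):
    """One alignment score per training sequence."""
    return [smith_waterman_score(query, t) for t in training_sequences]


def pair_features(seq_a, seq_b, training_sequences):
    """Feature vector of a sequence pair: both similarity vectors concatenated."""
    return (similarity_vector(seq_a, training_sequences)
            + similarity_vector(seq_b, training_sequences))


if __name__ == "__main__":
    training = ["ACDE", "KLMN", "WWWW"]  # hypothetical training sequences
    print(pair_features("ACDE", "KLMN", training))
```

A vector of this kind would then be passed to a classifier; the choice and training of that classifier are outside the scope of this sketch.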
In sequence-based methods for the prediction of protein-protein interactions, both members of each pair are compared with sets of sequences of known interacting proteins. We used an original sequence-based method of protein classification, PAAS [3, 4]. In this study the training set consisted of known protein kinase substrates classified according to kinase type; the task can therefore be regarded as the recognition of the substrate specificity class using only the substrate sequences. The PAAS method is particularly appropriate when a single kinase phosphorylates many different substrates and therefore participates in many pathways. Thus, the suggested method can be applied to a wide range of signal transduction pathways.
In general, the proposed positional score is close to the measures used in other approaches: the summation of weights of coinciding positions (e.g. from BLOSUM or PAM matrices) over a sliding window. All such methods require shifting the sequences relative to each other. The more sophisticated local alignment procedure can also be regarded as merging local ungapped similarities. Unlike other algorithms, our approach assigns projection scores to each position of the query sequence: for each position, the maximal score is calculated over all regions containing that position. This resembles a local alignment algorithm but is simpler to implement. The training sequences are projected onto the query sequence, and the summarized values obtained for all positions and all training set classes are the input to the classifier. This simple procedure does not require a large amount of memory. Unlike methods based on algorithmic alignment, the PAAS algorithm does not contain time-consuming steps.
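The idea of a per-position projection score can be illustrated with a short sketch. It is written under stated assumptions and is not the published PAAS implementation: a +1/-1 identity score stands in for a substitution matrix, position scores are floored at zero, a class is scored by the per-position maximum over its training sequences, and all function names are hypothetical.

```python
def best_segment_through_each_position(scores):
    """For each index i, the maximal sum of a contiguous run that contains i."""
    n = len(scores)
    left = [0] * n   # best run ending exactly at i
    right = [0] * n  # best run starting exactly at i
    for i in range(n):
        left[i] = scores[i] + (max(0, left[i - 1]) if i > 0 else 0)
    for i in range(n - 1, -1, -1):
        right[i] = scores[i] + (max(0, right[i + 1]) if i < n - 1 else 0)
    # scores[i] is counted in both halves, so subtract it once.
    return [left[i] + right[i] - scores[i] for i in range(n)]


def project(query, train, match=1, mismatch=-1):
    """Project one training sequence onto the query.

    For every relative shift of the two sequences (no gaps), each query
    position receives the best ungapped segment score that covers it;
    the maximum over all shifts is kept, floored at 0 (an assumption).
    """
    profile = [0] * len(query)
    for shift in range(-(len(query) - 1), len(train)):
        start = max(0, -shift)                      # first overlapping query index
        end = min(len(query), len(train) - shift)   # one past the last one
        scores = [match if query[i] == train[i + shift] else mismatch
                  for i in range(start, end)]
        for offset, value in enumerate(best_segment_through_each_position(scores)):
            i = start + offset
            profile[i] = max(profile[i], value)
    return profile


def class_scores(query, training_classes):
    """Per class: sum over query positions of the best projection score."""
    result = {}
    for name, seqs in training_classes.items():
        per_position = [0] * len(query)
        for t in seqs:
            per_position = [max(a, b) for a, b in zip(per_position, project(query, t))]
        result[name] = sum(per_position)
    return result


if __name__ == "__main__":
    classes = {"kinase_A": ["ACDE", "WWWW"], "kinase_B": ["WWWW"]}  # toy data
    print(project("GGACDE", "ACDE"))
    print(class_scores("GGACDE", classes))
```

In this sketch only one short score list per shift is held in memory at a time, rather than a full dynamic-programming matrix, which is in line with the low memory requirement described above.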
It was shown that PAAS provides high accuracy in predicting functional classes composed of homologous amino acid sequences that exhibit global sequence similarity. Proteins interacting with the same protein partner can also be characterized by global sequence similarity. However, in many cases the proteins exhibit only local similarity. We consider that the proposed approach can be useful for identifying proteins in the interaction network.
The proposed approach was applied to the prediction of new interactions in protein phosphorylation networks. Interaction cascades between protein kinases and their substrates play a key role in cell cycle regulation in both normal and tumor cells. Protein phosphorylation (including the substrate specificity of different protein kinase types, phosphorylated peptides and regions responsible for kinase-substrate binding) is well studied, which provides much of the information necessary for the evaluation and improvement of the method. The proteins included in the training set were classified according to kinase specificity, so that each class consisted of the proteins phosphorylated by the same kinase.
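The class structure of such a training set can be sketched as a simple grouping of kinase-substrate pairs. The identifiers below are placeholders and the function name is hypothetical; the sketch only shows that one class is formed per kinase and that a substrate phosphorylated by several kinases belongs to several classes.

```python
from collections import defaultdict


def build_training_classes(interactions):
    """Group substrates by the kinase that phosphorylates them.

    `interactions` is an iterable of (kinase, substrate) pairs; duplicate
    pairs are collapsed, and substrates are returned in sorted order.
    """
    classes = defaultdict(set)
    for kinase, substrate in interactions:
        classes[kinase].add(substrate)
    return {kinase: sorted(substrates) for kinase, substrates in classes.items()}


if __name__ == "__main__":
    pairs = [
        ("kinase_A", "substrate_1"),
        ("kinase_A", "substrate_2"),
        ("kinase_B", "substrate_2"),  # shared substrate: member of two classes
        ("kinase_A", "substrate_1"),  # duplicate record
    ]
    print(build_training_classes(pairs))
```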
The common approach to the prediction of protein kinase substrates involves the recognition of specific regions in amino acid sequences. A data set of experimentally determined phosphorylated peptides is used to compose the sequence motifs surrounding the modified Thr, Ser or Tyr residues. However, phosphorylation motifs alone are not sufficient to ensure a highly specific interaction between a kinase and its substrates. Additional regions located in the substrate proteins are responsible for enzyme recruitment, i.e. for increasing the probability of binding between kinase and substrate.
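Motif-based recognition of this kind can be illustrated by scanning a sequence for a pattern around a Ser or Thr residue. The pattern used below (two basic residues, any residue, then Ser/Thr) is an assumption chosen purely for illustration and is not a motif taken from this study; the function name and the test sequence are likewise hypothetical.

```python
import re


def find_motif_sites(sequence, pattern=r"[RK][RK].([ST])"):
    """Return (1-based position, residue) of each acceptor matched by the motif.

    The pattern must contain one capture group marking the phospho-acceptor.
    A lookahead is used so that overlapping motif occurrences are all found.
    """
    sites = []
    for m in re.finditer(r"(?=" + pattern + r")", sequence):
        sites.append((m.start(1) + 1, m.group(1)))
    return sites


if __name__ == "__main__":
    print(find_motif_sites("MARRASLKKGTPY"))
```

A scan of this type reports every sequence that contains the pattern, which illustrates why a motif alone gives limited specificity when additional recruitment regions are ignored.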
Algorithms based on the recognition of phosphorylation motifs and other interaction regions are used to search for these motifs in annotated sequences. Software such as ScanSite, NetPhosK and PredPhospho uses different mathematical approaches, including Hidden Markov Models and Support Vector Machines. These tools predict the substrates of certain kinases with high accuracy on the basis of sequence mapping. In contrast, the data from signal transduction networks frequently do not allow sequence mapping. In this study we investigated the efficiency of our approach when the amino acid sequences of the training set were not mapped.
At the first stage of this study, we validated the PAAS method on known kinase-substrate interactions. At the second stage, we applied the suggested approach to the prediction of new interactions for the proteins stored in the TRANSPATH® database. At the third stage, the predicted interactions were used for the enrichment of the network, which helped us to reconstruct potential cell signaling cascades.