

NBA-Palm: prediction of palmitoylation site implemented in Naïve Bayes algorithm


Abstract

Background

Protein palmitoylation, an essential and reversible post-translational modification (PTM), has been implicated in cellular dynamics and plasticity. Although numerous experimental studies have explored the molecular mechanisms underlying palmitoylation, the intrinsic features of its substrate specificity remain elusive. Computational approaches for palmitoylation prediction are therefore highly desirable to guide further experimental design.

Results

In this work, we present NBA-Palm, a novel computational method based on the Naïve Bayes algorithm for predicting palmitoylation sites. The training data were curated from the scientific literature (PubMed) and comprise 245 palmitoylated sites from 105 distinct proteins after redundancy elimination. The optimal window length for a potentially palmitoylated peptide was determined to be six. To evaluate the prediction performance of NBA-Palm, 3-fold cross-validation, 8-fold cross-validation and Jack-Knife validation were carried out. Prediction accuracies reach 85.79% for 3-fold cross-validation, 86.72% for 8-fold cross-validation and 86.74% for Jack-Knife validation. Two other algorithms, the RBF network and the support vector machine (SVM), were also employed and compared with NBA-Palm.

Conclusion

Taken together, our analyses demonstrate that NBA-Palm is a useful computational program that provides insights for further experimentation. The accuracy of NBA-Palm is comparable to that of our previously described tool, CSS-Palm. NBA-Palm is freely accessible at http://www.bioinfo.tsinghua.edu.cn/NBA-Palm.

Background

Protein palmitoylation is a reversible lipid modification that plays important roles in cell signaling associated with cellular dynamics and plasticity. However, very little is known about the molecular mechanisms underlying this modification and its regulation in cells. Palmitoylation, also known as S-acylation, is one of the most ubiquitous post-translational modifications (PTMs): it reversibly attaches the 16-carbon saturated fatty acid palmitate (C16:0) to cysteine residues of protein substrates through a thioester linkage [1–6]. Biochemically, palmitoylation increases the hydrophobicity of proteins to promote protein-membrane association [1–6]. Palmitoylation also modifies numerous proteins to control protein-protein interactions [7–9], intracellular trafficking [10, 11], lipid raft targeting [12, 13] and protein activity [8, 14]. Moreover, palmitoylation has been implicated in a variety of biological and physiological processes, including signal transduction [14, 15], mitosis [16], neuronal development [3, 6] and apoptosis [17]. Although protein palmitoylation has attracted extensive attention, its molecular mechanisms remain elusive.

Identification of palmitoylation sites is essential for a better understanding of the molecular regulation of palmitoylation. To date, only a few palmitoylation sites have been experimentally identified. Although efficient techniques such as mass spectrometry (MS) have recently been employed, most known palmitoylation sites have been mapped by mutagenesis of candidate cysteine residues using conventional biochemical methods. The features of substrate specificity for palmitoylation are still unclear, and most previous studies have proposed that there is no common, canonical consensus sequence/motif for palmitoylation [1, 3–5].

Moreover, only a few palmitoyltransferases have been identified, although protein palmitoylation has been known for many years [2, 4, 18, 19]. Palmitoylation can occur in both enzyme-dependent and enzyme-independent manners [5, 18–20]. These intrinsic but diverse characteristics make it very difficult to choose appropriate candidate cysteine residues in substrates for further experimental manipulation. Thus, in silico prediction of palmitoylation sites with an apt algorithm is urgently needed and would help guide further experimental design.

Previously, we developed a computational program named CSS-Palm, based on a Clustering and Scoring Strategy [21]. In that work, the training data set was curated from the scientific literature (PubMed) and contained 210 experimentally verified palmitoylated sites from 83 distinct proteins (referred to as the old data set). Because research in this area is progressing rapidly, more palmitoylation sites have been identified since CSS-Palm was published. After surveying recent progress and eliminating redundancy, the final data set includes 245 non-homologous sites from 105 proteins (referred to as the new data set; see Table 1). We then employed several machine learning algorithms, including Naïve Bayes [22], Support Vector Machines (SVMs) [23] and RBF networks [24], for palmitoylation site prediction, and optimized the window length for a potentially palmitoylated peptide. Prediction accuracy ranges from 82% to 86%. The Naïve Bayes approach achieves the best accuracy: 85.79% for 3-fold cross-validation, 86.72% for 8-fold cross-validation and 86.74% for Jack-Knife validation, with a window length of six. We therefore built NBA-Palm, a web service for prediction of palmitoylation sites implemented with the Naïve Bayes algorithm. Its prediction performance is comparable to that of our previous tool, CSS-Palm.

Table 1 Detailed description of the data sets.

Results and discussion

Functional analysis of Palmitoylated Proteins

To help elucidate the molecular determinants responsible for protein palmitoylation, we downloaded and processed the Gene Ontology (GO) annotation files for UniProt from EBI-GOA [25]. In our non-redundant data set of 105 palmitoylated proteins, we observed 455 distinct GO categories. Table 2 shows the top five GO entries for the biological process, molecular function and cellular component categories of palmitoylated proteins.

Table 2 Top five Gene Ontology (GO) groups of biological processes, molecular functions and cellular components in palmitoylated proteins.

The biological process most frequently annotated to palmitoylated proteins is "signal transduction" (26 proteins), followed by "G-protein coupled receptor protein signaling pathway" (21 proteins), "transport" (16 proteins), "ion transport" (7 proteins) and "cell adhesion" (7 proteins). The most enriched molecular function is "protein binding" (41 proteins), followed by "receptor activity" (27 proteins), "signal transducer activity" (25 proteins), "G-protein coupled receptor activity" (15 proteins) and "rhodopsin-like receptor activity" (14 proteins). Finally, the most frequent cellular component is "membrane" (70 proteins), followed by "integral to membrane" (54 proteins), "plasma membrane" (245 proteins), "integral to plasma membrane" (19 proteins) and "endoplasmic reticulum" (9 proteins).
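The tallies in Table 2 amount to counting, for each GO aspect, how many distinct proteins carry each term. A minimal Python sketch, assuming the GOA annotations have already been parsed into hypothetical (protein ID, GO term, aspect) tuples with the aspect coded P, F or C as in the GAF format; file parsing and any evidence-code filtering the authors may have applied are not shown, and the function name is illustrative:

```python
from collections import Counter, defaultdict

def top_go_terms(annotations, k=5):
    """Return the k GO terms annotated to the most distinct proteins, per aspect.

    annotations: iterable of (protein_id, go_term, aspect) tuples, where aspect is
    "P" (biological process), "F" (molecular function) or "C" (cellular component).
    This tuple layout is an assumption for illustration.
    """
    proteins_per_term = defaultdict(set)  # (aspect, term) -> set of protein IDs
    for protein_id, go_term, aspect in annotations:
        proteins_per_term[(aspect, go_term)].add(protein_id)

    counts = defaultdict(Counter)  # aspect -> Counter(term -> number of proteins)
    for (aspect, go_term), proteins in proteins_per_term.items():
        counts[aspect][go_term] = len(proteins)

    return {aspect: counter.most_common(k) for aspect, counter in counts.items()}
```

Counting distinct proteins rather than annotation lines keeps a protein that is annotated to the same term with several evidence codes from being counted more than once.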

Taken together, these computational analyses support the notion that palmitoylated proteins carry out diverse cellular functions. This points to two conclusions. First, the data set is sufficiently general to serve as a training set for our prediction work. Second, computational tools that can accelerate functional research on palmitoylation are valuable.

Performance of NBA-Palm

We carried out 3-fold cross-validation, 8-fold cross-validation and Jack-Knife validation to evaluate the performance of NBA-Palm (Tables 3 and 4). On the old data set, NBA-Palm achieves its best average MCC of 0.594 with a window length of six. On the new data set, the best average MCC is 0.548 with the same optimal window length of six. Performance on the two data sets is very similar, although slightly lower on the new data set. To investigate the reason for this decrease, we built sequence logos [26] for the old and new data sets (Figures 1a and 1b). Both logos show a leucine/cysteine-rich region around palmitoylation sites. Comparing the two logos shows that the pattern in the old data set is slightly stronger than that in the new data set, which may explain the slightly lower performance of NBA-Palm on the new data set.

Table 3 Comparison of the prediction performance of three machine learning algorithms on the old data set.
Table 4 Comparison of the prediction performance of three machine learning algorithms on the new data set.
Figure 1

The sequence logos of palmitoylation sites. Both logos show a leucine/cysteine-rich region around palmitoylation sites. A taller letter indicates that the residue occurs more frequently at that position. (a) Old data set; (b) new data set.

Comparison of Prediction Performance with several machine learning algorithms

Besides Naïve Bayes, we also adopted two additional machine learning algorithms, RBF networks and support vector machines (SVMs), to predict palmitoylation sites. Tables 3 and 4 show the detailed performance of the three algorithms on the old and new data sets, respectively. Several conclusions can be drawn. First, despite its simple structure, Naïve Bayes is the best algorithm overall, although its performance is only slightly better than that of the other two. Second, the best window lengths for the three algorithms are not identical: on the new data set, for example, they are 6 for Naïve Bayes, 8 for RBF networks and 7 for SVMs, according to the average MCC of 3-fold cross-validation, 8-fold cross-validation and Jack-Knife validation. Third, performance under Jack-Knife tests is often better than under 3-fold and 8-fold cross-validation, because each model is trained on more data and tested on less. Among the three algorithms, SVM shows the largest differences in MCC between 3-fold cross-validation, 8-fold cross-validation and the Jack-Knife test, while Naïve Bayes shows the smallest. This implies that Naïve Bayes may be the most robust algorithm when the amounts of training and test data change. Based on the average MCC, the window length for the Naïve Bayes algorithm was optimized as six. Hence, Naïve Bayes is a simply structured algorithm with high performance and robustness, which makes it well suited to biological classification problems such as this one.
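The window-length choice described above reduces to averaging each length's MCC over the three validation schemes and taking the maximum. A minimal Python sketch; the function name is hypothetical, and the actual MCC values are those in Tables 3 and 4, which are not reproduced here:

```python
def best_window(mcc_by_window):
    """Pick the window length with the highest mean MCC.

    mcc_by_window maps a window length n to its MCC values under the validation
    schemes used (e.g. 3-fold CV, 8-fold CV and Jack-Knife).
    Returns (best_n, mean_mcc); ties go to the first length encountered.
    """
    mean_mcc = {n: sum(values) / len(values) for n, values in mcc_by_window.items()}
    best_n = max(mean_mcc, key=mean_mcc.get)
    return best_n, mean_mcc[best_n]
```

Averaging over all three schemes, rather than picking the best single scheme, favours window lengths that stay good as the training-set size changes.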

Comparison with previously described analysis CSS-Palm

Performance was compared between NBA-Palm and the previously established method CSS-Palm [21] on the same old data set; details are shown in Table 5. In the Jack-Knife validation, NBA-Palm performs comparably to CSS-Palm in all metrics. In 3-fold cross-validation, however, NBA-Palm achieves much higher MCCs. This is probably due to the volatility of 3-fold cross-validation, which trains on less data (two-thirds of the whole data set) and predicts more test data (one-third of the whole data set), whereas Jack-Knife validation uses all samples but one for training. The result suggests that the robustness of the Naïve Bayes method probably stems from its probabilistic nature. This is consistent with the conclusion drawn above from the comparison of Naïve Bayes and SVM. In contrast, CSS-Palm is based on sequence/peptide homology scoring and clustering, and the absence of a key sequence/peptide from the training data might cause large changes in the clustering results. Thus, CSS-Palm depends heavily on the training data and is less robust.

Table 5 Comparison of prediction performances between NBA-Palm and CSS-Palm.

Perspectives for future work

Our work points to several paths for further research. First, as proteomic techniques continue to improve, more and more palmitoylation sites will be identified, and we expect prediction accuracy to improve further with more training data. Second, other machine learning methods could be applied, e.g., decision trees [24] and hidden Markov models [27]. These approaches could be used separately or combined to build potentially better models. Third, evolutionary information, for example phylogenetic conservation between human and mouse, could also be integrated into the prediction system to improve its accuracy.

Conclusion

In this work, we present a new method for protein palmitoylation site prediction based on Naïve Bayes, which achieves satisfactory performance. A comparison of Naïve Bayes, RBF networks and SVMs showed that Naïve Bayes slightly outperforms the other two methods. We also compared NBA-Palm with our previously established method CSS-Palm: NBA-Palm is computationally more efficient than CSS-Palm with comparable prediction accuracy. These results indicate that Naïve Bayes is an effective classification algorithm for biological problems. In addition, with high specificity and sensitivity, NBA-Palm could be a valuable computational tool for functional proteomics researchers.

Methods

Data Preparation

Here we define palmitoylated cysteine (C) residues as positive data (+) and non-palmitoylated cysteine residues as negative data (-). Previously, we collected 210 experimentally verified palmitoylation sites from 84 proteins [21]. Because palmitoylation research is advancing rapidly, more and more palmitoylated sites have been identified and reported. We searched PubMed with the keyword "palmitoylation" to collect new palmitoylation sites. The updated data set contains 266 sites from 111 proteins (reported before March 31, 2006). We then retrieved the primary sequences of these proteins from the Swiss-Prot/TrEMBL database [28]. The final curated data set is available upon request.

The positive (+) training set might contain several homologous sites from homologous proteins. If the training data are highly redundant, with too many homologous sites, prediction accuracy will be overestimated. To avoid this, we clustered the protein sequences of the positive (+) data set at a threshold of 30% identity using BLASTCLUST [29], a program that clusters highly homologous sequences into distinct groups. If two proteins shared ≥30% identity, we re-aligned them with BL2SEQ, a program in the BLAST package [29], and checked the results manually. If two palmitoylation sites from two homologous proteins were at the same position after alignment, only one was retained and the other was discarded. In this way, we obtained a high-quality, non-redundant positive (+) data set of 245 palmitoylation sites from 105 proteins.
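The site-level redundancy check can be sketched as follows. This assumes the BL2SEQ alignment of two homologous proteins has already been parsed into two gapped strings of equal length, with "-" for gaps, covering both full sequences (a local alignment would additionally need its start offsets), and that site positions are 0-based residue indices. The function names are hypothetical, and the manual inspection step is not modelled:

```python
def aligned_column(gapped, residue_index):
    """Map a 0-based residue index in the ungapped sequence to its alignment column."""
    seen = -1
    for column, char in enumerate(gapped):
        if char != "-":
            seen += 1
            if seen == residue_index:
                return column
    raise IndexError(f"residue {residue_index} is beyond the aligned sequence")

def redundant_sites(gapped_a, sites_a, gapped_b, sites_b):
    """Return the sites of protein B that align to the same column as a site of protein A.

    Such sites would be discarded, keeping only protein A's copy.
    """
    if len(gapped_a) != len(gapped_b):
        raise ValueError("aligned sequences must have equal length")
    columns_a = {aligned_column(gapped_a, site) for site in sites_a}
    return [site for site in sites_b if aligned_column(gapped_b, site) in columns_a]
```

For example, with `gapped_a = "MKC-LLC"` and `gapped_b = "M-CALLC"`, cysteine 1 of protein B sits in the same alignment column as cysteine 2 of protein A, so it would be treated as a redundant copy of that site.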

As previously described [30, 31], the negative (-) sites consisted of non-annotated cysteine residues in the same proteins from which the positive (+) sites were taken, rather than cysteines from proteins randomly picked from the Swiss-Prot/TrEMBL database. Because both (+) and (-) sites are extracted from the same protein sequences, our test is stricter. Obviously, the (-) sites may contain some false negatives: cysteine residues that are in fact palmitoylated but have not yet been characterized. For this reason, the false positive rate of any computational approach will be overestimated. However, without a high-quality gold-standard (-) set, this overestimation is unavoidable.
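The labelling rule above (every non-annotated cysteine in a positive protein becomes a negative site) can be sketched in a few lines of Python. Positions are assumed to be 0-based indices into the protein sequence, and the function name is hypothetical:

```python
def label_cysteines(sequence, palmitoylated_positions):
    """Label every cysteine in a protein as a positive (+) or negative (-) site.

    palmitoylated_positions holds 0-based indices of experimentally verified sites;
    every other cysteine in the same protein becomes a negative site.
    """
    positives = set(palmitoylated_positions)
    for position in positives:
        if sequence[position] != "C":
            raise ValueError(f"annotated position {position} is not a cysteine")
    return [(i, "+" if i in positives else "-")
            for i, residue in enumerate(sequence) if residue == "C"]
```

For `label_cysteines("MCKCLC", [3])`, cysteines 1 and 5 are labelled negative and cysteine 3 positive.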

To compare the prediction performance of NBA-Palm with that of our previous tool, CSS-Palm [21], we used both the old data set from CSS-Palm and the updated new data set. Details of both data sets are listed in Table 1.

Algorithm design and validation

Sequence coding

We employed a traditional sliding-window strategy to represent a potentially palmitoylated peptide (PPP). Given a window length n, a PPP is represented by the 2n residues flanking the candidate site, n upstream and n downstream. Since the central residue of a PPP is always C, we did not include it in the encoded fragment. We chose an orthogonal binary coding scheme to transform protein sequences into numeric vectors: for example, glycine is encoded as 00000000000000000001, alanine as 00000000000000000010, and so on. The final feature vector representing a candidate site therefore has length n × 2 × 20 = 40n. Values of n from 3 to 8 were tested to determine the optimal window length.
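A minimal Python sketch of this encoding follows. The text fixes only the codes for glycine (last bit) and alanine (second-to-last bit), so the order of the other 18 residues is an assumption, as is the all-zero padding for window positions that fall beyond the sequence termini; the names are hypothetical:

```python
# Hypothetical residue order: only G (last bit) and A (second-to-last bit)
# are fixed by the text; the order of the other 18 residues is assumed.
AMINO_ACIDS = "CDEFHIKLMNPQRSTVWYAG"
BIT_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_window(sequence, site, n):
    """Encode the 2n residues flanking a candidate cysteine as a 0-1 vector.

    site is the 0-based index of the central cysteine, which is not encoded.
    Positions beyond the sequence termini (or non-standard residues) become
    all-zero blocks; this padding convention is an assumption.
    """
    positions = list(range(site - n, site)) + list(range(site + 1, site + n + 1))
    vector = []
    for p in positions:
        block = [0] * len(AMINO_ACIDS)
        if 0 <= p < len(sequence) and sequence[p] in BIT_INDEX:
            block[BIT_INDEX[sequence[p]]] = 1
        vector.extend(block)
    return vector  # length 2 * n * 20
```

With n = 6, the optimal window length reported above, each candidate cysteine becomes a 240-dimensional 0-1 vector.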

The Machine Learning Algorithms

Naïve Bayes is a classification model based on Bayes' theorem [22]. Naïve Bayes classifiers assume that the effect of a variable value on a given class is independent of the values of the other variables. This assumption, called class conditional independence, is made to simplify the computation, and in this sense the classifier is considered "naïve". Given a potential palmitoylation site $X$ described by its 0–1 feature vector $(x_1, x_2, \ldots, x_n)$ from the previous section (here $n$ is the number of features, not the window length), we look for the class $C$ that maximizes the posterior probability $P(C \mid X)$, which by Bayes' theorem is proportional to $P(C)\,P(X \mid C) = P(C)\,P(x_1, x_2, \ldots, x_n \mid C)$, where $C$ is either "palmitoylation" or "non-palmitoylation" (with equal class priors, this reduces to maximizing the likelihood $P(X \mid C)$). The assumption of class conditional independence allows the likelihood to be decomposed into a product of simpler probabilities:

$$P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$

Despite its simple structure and ease of implementation, Naïve Bayes often performs comparably to other algorithms such as SVMs and neural networks.
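To make the decomposition concrete, here is a minimal from-scratch Python sketch of a Naïve Bayes classifier for 0-1 feature vectors. It assumes a Bernoulli model with add-one (Laplace) smoothing and compares log-posteriors to avoid numerical underflow; these are illustrative choices, not necessarily the WEKA configuration used for NBA-Palm, and the class name is hypothetical:

```python
import math
from collections import Counter

class BernoulliNaiveBayes:
    """Naive Bayes for 0-1 feature vectors (Bernoulli model, Laplace smoothing)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        class_counts = Counter(y)
        n_features = len(X[0])
        # log P(C): class priors estimated from the training labels
        self.log_prior = {c: math.log(class_counts[c] / len(y)) for c in self.classes}
        # P(x_i = 1 | C) with add-one smoothing so no probability is exactly 0 or 1
        self.p_one = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.p_one[c] = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                             for i in range(n_features)]
        return self

    def log_posterior(self, x):
        """Unnormalized log P(C | x) = log P(C) + sum_i log P(x_i | C)."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for xi, p in zip(x, self.p_one[c]):
                score += math.log(p if xi else 1.0 - p)
            scores[c] = score
        return scores

    def predict(self, x):
        scores = self.log_posterior(x)
        return max(scores, key=scores.get)
```

The difference between the two classes' log-posteriors can also serve as a real-valued score when sweeping decision thresholds, as in the ROC analysis.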

The support vector machine (SVM) is a machine learning method that has been applied to many kinds of pattern recognition problems. The principle of the SVM method is to map the samples into a high-dimensional Hilbert space and seek a separating hyperplane in that space. This hyperplane, called the optimal separating hyperplane, is chosen to maximize its distance from the closest training samples. As a supervised machine learning technique, SVM is theoretically well founded on Statistical Learning Theory [23]. SVMs have recently been adopted successfully to solve many biological problems, such as predicting protein subcellular locations [32], protein secondary structures [32, 33], tumor classification [34] and phosphorylation sites [30, 31]. In the present work, the feature vector of each potential palmitoylation site was mapped into a higher-dimensional space through a polynomial kernel function.
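A polynomial kernel lets the SVM work in the higher-dimensional space without building it explicitly. The paper does not report the kernel's degree or offset, so the parameter values below are assumptions. The sketch computes K(x, z) = (x·z + c)^d and checks, for the homogeneous degree-2 case, that it equals an ordinary inner product in an explicit space of pairwise feature products:

```python
from itertools import product

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree (parameter values assumed)."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def degree2_feature_map(x):
    """Explicit map phi with phi(x) . phi(z) == (x . z) ** 2: all pairwise products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]

x, z = [1, 0, 1], [1, 1, 1]
explicit = sum(a * b for a, b in zip(degree2_feature_map(x), degree2_feature_map(z)))
print(poly_kernel(x, z, degree=2, coef0=0.0), explicit)  # 4.0 4
```

For 40n-dimensional inputs the explicit degree-2 space already has (40n)^2 coordinates (57,600 for n = 6), which is why the kernel form is used instead.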

The RBF network is a multi-layer, feed-forward artificial neural network [24]. An RBF network consists of three layers: the input layer, the hidden layer and the output layer. The input layer broadcasts the coordinates of the input vector to each node in the hidden layer. Each hidden node then produces an activation based on its associated radial basis function. Finally, each output node computes a linear combination of the activations of the hidden nodes. How an RBF network responds to a given input is thus completely determined by the activation functions of the hidden nodes and the weights of the links between the hidden and output layers. In our model, after the feature vectors were fed into the input layer, the link weights were iteratively updated until convergence. The output layer finally produced the decision "palmitoylation" or "non-palmitoylation".
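The forward pass just described can be sketched in a few lines of Python. This assumes Gaussian basis functions and a single linear output node thresholded at zero; the centres, widths, weights and bias would come from training, and the function names are hypothetical:

```python
import math

def rbf_forward(x, centers, widths, weights, bias):
    """Output of a three-layer RBF network for input vector x.

    Hidden node j: phi_j = exp(-||x - c_j||^2 / (2 * sigma_j^2)).
    Output node: bias + sum_j w_j * phi_j.
    """
    hidden = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * s * s))
        for c, s in zip(centers, widths)
    ]
    return bias + sum(w * h for w, h in zip(weights, hidden))

def rbf_classify(x, centers, widths, weights, bias):
    """Map the network output to one of the two class labels."""
    score = rbf_forward(x, centers, widths, weights, bias)
    return "palmitoylation" if score > 0 else "non-palmitoylation"
```

An input close to a centre with a positive output weight drives the score up, while an input far from every centre leaves only the bias.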

The Jack-Knife validation and n-fold cross-validation

The prediction performance of NBA-Palm was evaluated by 3-fold cross-validation, 8-fold cross-validation and Jack-Knife validation, for ease of comparison with the previous method CSS-Palm. In Jack-Knife validation, also known as "leave-one-out" cross-validation, each sample in the data set is singled out in turn as an independent test sample, and all remaining samples are used as training data. This process is repeated until every sample has been used once as the test sample. In n-fold cross-validation, all (+) and (-) sites were combined and divided equally into n parts, keeping the same ratio of (+) to (-) sites in each part. Then n-1 parts were merged into a training set and the remaining part was used as the test set; this was repeated so that each part served once as the test set. The average accuracy over the n folds was used to estimate performance. All models were implemented in the WEKA software package [35].
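The two resampling schemes can be sketched as index generators. This minimal illustration assumes class labels in a list and round-robin assignment of shuffled samples within each class; WEKA's own fold assignment may differ in detail, and the function names are hypothetical:

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep the (+)/(-) ratio roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # running counter so fold sizes differ by at most one
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds

def kfold_splits(labels, k, seed=0):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    for test in stratified_folds(labels, k, seed):
        held_out = set(test)
        yield [i for i in range(len(labels)) if i not in held_out], test

def jackknife_splits(n_samples):
    """Leave-one-out: every sample is the test set exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]
```

With 6 (+) and 9 (-) sites and k = 3, each fold receives 2 (+) and 3 (-) sites, preserving the overall 2:3 ratio.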

Performance measurements

We adopted four frequently used measurements: accuracy (Ac), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC). Accuracy (Ac) is the proportion of correct predictions over both the positive (+) and negative (-) data sets, while sensitivity (Sn) and specificity (Sp) are the proportions of correct predictions within the positive (+) and negative (-) data sets, respectively. However, when the numbers of positive and negative data differ greatly, the MCC should be included to evaluate prediction performance. MCC ranges from -1 to 1, and a larger MCC value indicates better prediction performance.

Among the sites predicted positive by NBA-Palm, real positives are defined as true positives (TP) and the others as false positives (FP). Among the sites predicted negative, real positives are defined as false negatives (FN) and the others as true negatives (TN). Sensitivity (Sn), specificity (Sp), accuracy (Ac) and the Matthews correlation coefficient (MCC) are defined as follows:

$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP} \tag{1}$$

$$Ac = \frac{TP + TN}{TP + FP + TN + FN} \tag{2}$$

$$MCC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}} \tag{3}$$
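Equations (1)-(3) translate directly into code. The following Python sketch computes all four measurements from confusion counts; returning an MCC of 0 when a marginal total is zero is a common convention assumed here, not something the paper specifies, and the function name is hypothetical. The usage line illustrates the point made above about imbalanced data: a predictor that labels every site negative can reach high accuracy while its MCC stays at 0.

```python
import math

def performance(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts (Eqs. 1-3).

    Assumes at least one real positive and one real negative. MCC is set to 0
    when its denominator vanishes (a common convention, assumed here).
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    ac = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    mcc = ((tp * tn) - (fn * fp)) / denominator if denominator else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}

# A predictor that calls every site negative on 10 (+) and 90 (-) sites:
print(performance(tp=0, fp=0, tn=90, fn=10))
# {'Sn': 0.0, 'Sp': 1.0, 'Ac': 0.9, 'MCC': 0.0}
```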

ROC curves

The Naïve Bayes algorithm performs very similarly with window lengths of six and seven. To compare the two in more detail, we used ROC curves to visualize prediction performance (see Figure 2). An ROC curve plots the true positive rate (sensitivity) as a function of the false positive rate, which equals 1 - specificity. The area under the ROC curve (the ROC score) is the average sensitivity over all possible specificity values and can be used as a measure of prediction performance across different thresholds. The ROC curve of a random predictor lies near the diagonal from bottom left to top right, with a score of about 0.5, while a perfect predictor produces a curve along the left and top boundaries of the square and receives a score of one.
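An ROC curve and its area can be computed from any real-valued prediction score, for instance the difference between the two class log-posteriors of a Naïve Bayes model. A minimal Python sketch, assuming binary labels (1 for palmitoylated, 0 otherwise) and trapezoidal integration; samples with tied scores are added together so that ties do not produce an artificial staircase. The function names are hypothetical:

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the decision threshold step by step.

    labels are 1 for positive (+) and 0 for negative (-) sites; both classes must occur.
    """
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfect ranking yields an area of 1.0, an inverted ranking 0.0, and a constant score 0.5, matching the reference values described above.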

Figure 2

The ROC curves for potentially palmitoylated peptides with a window length of six. "3 fold CV" stands for 3-fold cross-validation, "8 fold CV" for 8-fold cross-validation and "Jack-Knife" for Jack-Knife validation; "AUC" stands for area under the curve. (a) ROC curves on the old data set; (b) ROC curves on the new data set.

References

1. Bijlmakers MJ, Marsh M: The on-off story of protein palmitoylation. Trends Cell Biol 2003, 13(1):32–42. 10.1016/S0962-8924(02)00008-9
2. Dietrich LE, Ungermann C: On the mechanism of protein palmitoylation. EMBO Rep 2004, 5(11):1053–1057. 10.1038/sj.embor.7400277
3. el-Husseini Ael D, Bredt DS: Protein palmitoylation: a regulator of neuronal development and function. Nat Rev Neurosci 2002, 3(10):791–802. 10.1038/nrn940
4. Linder ME, Deschenes RJ: New insights into the mechanisms of protein palmitoylation. Biochemistry 2003, 42(15):4311–4320. 10.1021/bi034159a
5. Smotrys JE, Linder ME: Palmitoylation of intracellular signaling proteins: regulation and function. Annu Rev Biochem 2004, 73:559–587. 10.1146/annurev.biochem.73.011303.073954
6. Huang K, El-Husseini A: Modulation of neuronal protein trafficking and function by palmitoylation. Curr Opin Neurobiol 2005, 15(5):527–535. 10.1016/j.conb.2005.08.001
7. Yang X, Kovalenko OV, Tang W, Claas C, Stipp CS, Hemler ME: Palmitoylation supports assembly and function of integrin-tetraspanin complexes. J Cell Biol 2004, 167(6):1231–1240. 10.1083/jcb.200404100
8. Zhou B, Liu L, Reddivari M, Zhang XA: The palmitoylation of metastasis suppressor KAI1/CD82 is important for its motility- and invasiveness-inhibitory activity. Cancer Res 2004, 64(20):7455–7463. 10.1158/0008-5472.CAN-04-1574
9. Clark KL, Oelke A, Johnson ME, Eilert KD, Simpson PC, Todd SC: CD81 associates with 14-3-3 in a redox-regulated palmitoylation-dependent manner. J Biol Chem 2004, 279(19):19401–19406. 10.1074/jbc.M312626200
10. Kalinina EV, Fricker LD: Palmitoylation of carboxypeptidase D. Implications for intracellular trafficking. J Biol Chem 2003, 278(11):9244–9249. 10.1074/jbc.M209379200
11. Navarro-Lerida I, Corvi MM, Barrientos AA, Gavilanes F, Berthiaume LG, Rodriguez-Crespo I: Palmitoylation of inducible nitric-oxide synthase at Cys-3 is required for proper intracellular traffic and nitric oxide synthesis. J Biol Chem 2004, 279(53):55682–55689. 10.1074/jbc.M406621200
12. Salaun C, Gould GW, Chamberlain LH: The SNARE proteins SNAP-25 and SNAP-23 display different affinities for lipid rafts in PC12 cells. Regulation by distinct cysteine-rich domains. J Biol Chem 2005, 280(2):1236–1240. 10.1074/jbc.M410674200
13. Wong W, Schlichter LC: Differential recruitment of Kv1.4 and Kv4.2 to lipid rafts by PSD-95. J Biol Chem 2004, 279(1):444–452. 10.1074/jbc.M304675200
14. Vazquez P, Roncero I, Blazquez E, Alvarez E: Substitution of the cysteine 438 residue in the cytoplasmic tail of the glucagon-like peptide-1 receptor alters signal transduction activity. J Endocrinol 2005, 185(1):35–44. 10.1677/joe.1.06031
15. Kleuss C, Krause E: Galpha(s) is palmitoylated at the N-terminal glycine. EMBO J 2003, 22(4):826–832. 10.1093/emboj/cdg095
16. Caron JM, Vega LR, Fleming J, Bishop R, Solomon F: Single site alpha-tubulin mutation affects astral microtubules and nuclear positioning during anaphase in Saccharomyces cerevisiae: possible role for palmitoylation of alpha-tubulin. Mol Biol Cell 2001, 12(9):2672–2687.
17. Wang DA, Sebti SM: Palmitoylated cysteine 192 is required for RhoB tumor-suppressive and apoptotic activities. J Biol Chem 2005, 280(19):19243–19249. 10.1074/jbc.M411472200
18. Fukata M, Fukata Y, Adesnik H, Nicoll RA, Bredt DS: Identification of PSD-95 palmitoylating enzymes. Neuron 2004, 44(6):987–996. 10.1016/j.neuron.2004.12.005
19. Huang K, Yanai A, Kang R, Arstikaitis P, Singaraja RR, Metzler M, Mullard A, Haigh B, Gauthier-Campbell C, Gutekunst CA, Hayden MR, El-Husseini A: Huntingtin-interacting protein HIP14 is a palmitoyl transferase involved in palmitoylation and trafficking of multiple neuronal proteins. Neuron 2004, 44(6):977–986. 10.1016/j.neuron.2004.11.027
20. Wolff J, Zambito AM, Britto PJ, Knipling L: Autopalmitoylation of tubulin. Protein Sci 2000, 9(7):1357–1364.
21. Zhou F, Xue Y, Yao X, Xu Y: CSS-Palm: palmitoylation site prediction with a clustering and scoring strategy (CSS). Bioinformatics 2006, 22(7):894–896. 10.1093/bioinformatics/btl013
22. Borgelt C, Kruse R: Graphical Models: Methods for Data Analysis and Mining. Chichester, United Kingdom: J. Wiley and Sons; 2002.
23. Vapnik V: The Nature of Statistical Learning Theory. Springer; 1995.
24. Mitchell T: Machine Learning. McGraw Hill; 1997.
25. EBI-GOA [http://www.ebi.ac.uk/GOA/]
26. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188–1190. 10.1101/gr.849004
27. Durbin R, Eddy S, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press; 1998.
28. Swiss-Prot/TrEMBL [http://cn.expasy.org]
29. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
30. Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, Yao X: GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res 2005, 33(Web Server issue):W184–W187. 10.1093/nar/gki393
31. Zhou FF, Xue Y, Chen GL, Yao X: GPS: a novel group-based phosphorylation predicting and scoring method. Biochem Biophys Res Commun 2004, 325(4):1443–1448. 10.1016/j.bbrc.2004.11.001
32. Hua S, Sun Z: A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J Mol Biol 2001, 308(2):397–407. 10.1006/jmbi.2001.4580
33. Guo J, Chen H, Sun Z, Lin Y: A novel method for protein secondary structure prediction using dual-layer SVM and profiles. Proteins 2004, 54(4):738–743. 10.1002/prot.10634
34. Lee Y, Lee CK: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 2003, 19(9):1132–1139. 10.1093/bioinformatics/btg102
35. Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edition. San Francisco: Morgan Kaufmann; 2005.


Acknowledgements

This work is supported by National Natural Science Grant (90408019, 90303017), Chinese 863 projects (2002AA234041, 2002AA231031) and Chinese 973 projects (2003CB715900) to Z. Sun, and Chinese Natural Science Foundation (39925018, 30270654, 30270293, and 90508002), Chinese Academy of Science (KSCX2-2-01), Chinese 973 project (2002CB713700), Chinese 863 project (2001AA215331), and Chinese Ministry of Education (20020358051) to X. Yao. X. Yao is a Cheung Kong Scholar and GCC Eminent Cancer Scholar.

Author information

Correspondence to Zhirong Sun or Xuebiao Yao.

Additional information

Authors' contributions

YX and HC should be regarded as joint first authors. YX and HC designed the methodology, carried out the analysis, developed the web service and drafted the manuscript. CJ contributed several insightful opinions and improved the manuscript considerably. XY and ZS coordinated the research and finalized the manuscript.

Yu Xue, Hu Chen contributed equally to this work.


About this article

Cite this article

Xue, Y., Chen, H., Jin, C. et al. NBA-Palm: prediction of palmitoylation site implemented in Naïve Bayes algorithm. BMC Bioinformatics 7, 458 (2006) doi:10.1186/1471-2105-7-458


Keywords

  • Support Vector Machine
  • Gene Ontology
  • Prediction Performance
  • Window Length
  • Palmitoylation Site