A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs
- Hui-Ju Kao†1,
- Chien-Hsun Huang†1, 2,
- Neil Arvin Bretaña3,
- Cheng-Tsung Lu1,
- Kai-Yao Huang1,
- Shun-Long Weng4, 5, 6 and
- Tzong-Yi Lee1, 7
© Kao et al.; 2015
Published: 9 December 2015
Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Given the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, O-GlycBase and UniProtKB. After conserved motifs were detected by maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layered model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layered model learned from the output values of the profile HMMs in the first layer. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
Protein O-GlcNAcylation (O-GlcNAc), a type of O-linked glycosylation, attaches a single N-acetylglucosamine (GlcNAc) to serine (Ser) or threonine (Thr) residues. O-GlcNAc, commonly found on cytoplasmic and nuclear proteins, has been shown to modulate diverse molecular and cellular processes. O-GlcNAc transferase (OGT) is the enzyme responsible for adding O-GlcNAc, whereas the enzyme O-GlcNAcase (OGA) removes it. Recently, EGF domain-specific O-GlcNAc transferase (EOGT), an atypical OGT, has been reported to be responsible for the extracellular O-GlcNAcylation of secreted and membrane glycoproteins. Protein O-GlcNAcylation also regulates cell-cell and cell-matrix interactions. Accumulating evidence suggests that OGT may act as a nutrient sensor that links the hexosamine biosynthesis pathway to oncogenic signaling and to the regulation of factors involved in glucose and lipid metabolism. O-GlcNAc-dependent regulation appears to play an important role in the signaling pathways involved in the metabolic reprogramming of cancer cells. In addition, O-GlcNAcylation is an important post-translational modification whose deregulation has been linked to various diseases, such as diabetes, Alzheimer's disease and cancers [10–12].
With improvements in mass spectrometry technologies, O-GlcNAcylated proteins have been identified in recent years in the postsynaptic density, the murine synapse, mouse brain, rat brain, mouse embryonic stem cells, and HeLa cells. However, precise identification of O-GlcNAcylation sites remains a challenge owing to the dynamic nature of the modification. To better identify O-GlcNAcylation sites and reduce experimental effort, computational prediction of site motifs and O-GlcNAcylation sites has been pursued. Previously, Gupta and Brunak developed YinOYang, an O-GlcNAcylation site prediction tool trained on 40 O-GlcNAcylation sites. Chen et al. developed a similar tool incorporating structural topology to identify O-glycosylation sites on transmembrane proteins. The increase in experimentally identified O-GlcNAcylation sites has motivated newer developments, including O-GlcNAcScan, which was trained on 373 O-GlcNAcylation sites. More recently, another prediction tool, O-GlcNAcPRED, has been proposed and claimed to perform better than the aforementioned tools. Meanwhile, Caragea et al. demonstrated that ensembles of support vector machine (SVM) classifiers could outperform a single SVM classifier in predicting protein glycosylation sites.
Although several computational methods have been developed to predict protein O-GlcNAcylation sites, none of them currently investigates potential OGT substrate motifs. It has been reported that molecular-level investigation of OGT substrate specificity may aid in understanding how O-GlcNAc contributes to a diverse set of cellular processes. This motivated us to characterize O-GlcNAcylation sites with consideration of their amino acid composition. In this study, we apply maximal dependence decomposition (MDD) to explore potential OGT substrate motifs among experimentally verified O-GlcNAcylation sites. The predictive power of the statistically significant substrate motifs was further assessed by cross-validation and independent testing. A two-layered machine learning method, incorporating profile hidden Markov models (HMMs) and a support vector machine (SVM), was used to construct the predictive models. Furthermore, to facilitate the study of protein O-GlcNAcylation, the MDD-identified substrate motifs were exploited to implement a web-based tool for identifying O-GlcNAcylation sites together with their corresponding OGT substrate motifs.
Materials and methods
Construction of positive and negative training data sets
Owing to high-throughput mass spectrometry-based glycoproteomics, several databases [22, 28–30] have been developed to accumulate experimentally verified O-GlcNAcylation sites by manually surveying the glycosylation-related literature. In this work, the data set for training the predictive model of O-GlcNAcylation sites was mainly extracted from dbOGAP, O-GlycBase, and UniProtKB. From dbOGAP, a total of 250 O-GlcNAcylated serine (Ser) and 142 O-GlcNAcylated threonine (Thr) sites on 172 proteins were collected. From O-GlycBase version 6.0, 24 O-GlcNAcylated Ser and Thr sites on 17 proteins were collected. From UniProtKB, experimentally verified O-GlcNAcylation data were first filtered by removing entries annotated as "by similarity", "potential", or "probable", which resulted in 66 O-GlcNAcylated Ser and 51 O-GlcNAcylated Thr sites on 53 proteins. To avoid data redundancy, each record obtained from one database was compared with the records from the other databases based on its O-GlcNAcylated site position and its UniProtKB accession number, which is used by all three databases. When multiple records shared the same site position and accession number, only one of them was retained. After the removal of redundant data, we obtained 261 Ser and 149 Thr non-redundant O-GlcNAcylated sites on 176 proteins.
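The merging rule described above (one record per UniProtKB accession and site position) can be sketched in Python as follows. This is a minimal illustration rather than the authors' pipeline: the `Site` record layout, the substring test for the UniProtKB qualifiers and the first-occurrence-wins policy are assumptions, and the accessions are made up.

```python
from typing import Iterable, List, NamedTuple, Optional

# Evidence qualifiers that mark a UniProtKB annotation as non-experimental
NON_EXPERIMENTAL = ("by similarity", "potential", "probable")


class Site(NamedTuple):
    accession: str               # UniProtKB accession (shared key across sources)
    position: int                # residue number of the modified Ser/Thr
    residue: str                 # "S" or "T"
    source: str                  # database the record was taken from
    note: Optional[str] = None   # free-text annotation, if any


def is_experimental(site: Site) -> bool:
    """Reject records whose annotation carries a non-experimental qualifier."""
    note = (site.note or "").lower()
    return not any(q in note for q in NON_EXPERIMENTAL)


def merge_non_redundant(*sources: Iterable[Site]) -> List[Site]:
    """Merge site lists, keeping one record per (accession, position).

    The first occurrence wins, so the order of `sources` decides which
    database's record is retained (an assumption for this sketch).
    """
    seen = set()
    merged: List[Site] = []
    for source in sources:
        for site in source:
            key = (site.accession, site.position)
            if key not in seen:
                seen.add(key)
                merged.append(site)
    return merged


# Toy records with made-up accessions, for illustration only
dbogap = [Site("ACC1", 10, "S", "dbOGAP"), Site("ACC2", 5, "T", "dbOGAP")]
uniprot = [
    Site("ACC1", 10, "S", "UniProtKB", "O-linked (GlcNAc) serine"),
    Site("ACC3", 7, "S", "UniProtKB", "O-linked (GlcNAc) serine; by similarity"),
]
merged = merge_non_redundant(dbogap, [s for s in uniprot if is_experimental(s)])
print(len(merged), [s.source for s in merged])
```

Here the UniProtKB copy of the ACC1 site is dropped as a duplicate and the ACC3 site is dropped by the evidence filter, leaving two non-redundant sites.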
Table 1 Data statistics of positive and negative training data.
Rows: number of O-GlcNAcylated sites (positive data); number of non-O-GlcNAcylated sites (negative data); number of non-O-GlcNAcylated sites (balanced negative data).
Detection of OGT substrate motifs
The dependence between each pair of positions (A_i, A_j), i ≠ j, was evaluated with a chi-square test:

$$\chi^2(A_i, A_j) = \sum_{m=1}^{5}\sum_{n=1}^{5}\frac{(X_{mn} - E_{mn})^2}{E_{mn}}, \qquad E_{mn} = \frac{X_{mR}\,X_{Cn}}{X}$$

where $X_{mn}$ represents the number of sequences having amino acids from group $m$ at position $A_i$ and amino acids from group $n$ at position $A_j$; $X_{mR} = X_{m1} + \dots + X_{m5}$; $X_{Cn} = X_{1n} + \dots + X_{5n}$; and $X$ denotes the total number of sequences. A strong dependence between two positions is declared when the chi-square value is larger than 34.3, which corresponds to a cutoff level of P = 0.005 with 16 degrees of freedom; in that case, the decomposition is continued as described previously. Moreover, a minimum cluster size is set when applying MDD to cluster the sequences in the positive training data: a subgroup whose size is smaller than this parameter is not divided any further. In this study, MDD was executed with various values of the minimum cluster size in order to determine the optimal one.
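A minimal Python sketch of the pairwise dependence test, assuming the standard contingency-table form χ² = Σ_m Σ_n (X_mn − E_mn)²/E_mn with E_mn = X_mR·X_Cn/X. The five-group amino acid alphabet in `_GROUP_OF` is a hypothetical stand-in, since this section does not list the groups actually used, and the rule for choosing the splitting position once a strong dependence is found is not reproduced. Cells with a zero expected count are skipped, a common practical choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical five-group amino acid alphabet (assumption, not the paper's).
_GROUP_OF = {}
for g, residues in enumerate(["AFGILMPVW", "CNQSTY", "DE", "HKR"]):
    for aa in residues:
        _GROUP_OF[aa] = g
OTHER = 4            # fifth group: padding or unknown residues
N_GROUPS = 5
CHI2_CUTOFF = 34.3   # P = 0.005 with (5 - 1) * (5 - 1) = 16 degrees of freedom


def group_of(aa):
    return _GROUP_OF.get(aa, OTHER)


def chi_square(seqs, i, j):
    """Chi-square statistic between positions A_i and A_j of aligned sequences."""
    X = len(seqs)
    pairs = Counter((group_of(s[i]), group_of(s[j])) for s in seqs)  # X_mn
    row = Counter(group_of(s[i]) for s in seqs)                      # X_mR
    col = Counter(group_of(s[j]) for s in seqs)                      # X_Cn
    chi2 = 0.0
    for m in range(N_GROUPS):
        for n in range(N_GROUPS):
            expected = row[m] * col[n] / X                           # E_mn
            if expected > 0:  # empty rows/columns contribute nothing
                chi2 += (pairs[(m, n)] - expected) ** 2 / expected
    return chi2


def strong_dependencies(seqs, cutoff=CHI2_CUTOFF):
    """All position pairs (i, j), i < j, whose chi-square exceeds the cutoff."""
    hits = []
    for i, j in combinations(range(len(seqs[0])), 2):
        score = chi_square(seqs, i, j)
        if score > cutoff:
            hits.append((i, j, score))
    return hits


def may_split(cluster, min_cluster_size):
    """MDD does not divide a subgroup smaller than the minimum cluster size."""
    return len(cluster) >= min_cluster_size


# Toy aligned fragments: positions 0 and 1 co-vary perfectly (acidic/basic swap)
toy = ["DKA"] * 20 + ["KDA"] * 20
print(strong_dependencies(toy))  # [(0, 1, 40.0)]
```

With 40 perfectly co-varying sequences the statistic reaches 40.0, above the 34.3 cutoff, while the constant third position shows no dependence.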
Construction of two-layered prediction model
In the second layer, a binary SVM classifier is trained on the bit scores of the profile HMMs. For binary classification, an SVM maps the input samples into a higher-dimensional space using a kernel function and then finds a hyperplane that separates the two classes with maximal margin and minimal error. In this study, we employed a public SVM library, LIBSVM, to generate the second-layered model from the bit scores of the positive and negative training data. The radial basis function (RBF) was used as the kernel function of the SVM. LIBSVM can produce a probability ranging from 0 to 1 for each prediction; by default, an instance with a probability higher than 0.5 is defined as positive. To avoid a biased prediction performance, the negative training data were balanced against the positive training data. To select a representative set of negative data, K-means clustering [36, 41] was employed, following previous PTM prediction methods [42–47]. This resulted in an equal number of positive and negative sequence fragments in the training data (Table 1).
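One way to realize a K-means-based selection of a balanced negative set is sketched below, assuming the negative fragments have already been turned into numeric vectors (for example, their HMM bit-score vectors). Clustering the negatives into as many clusters as there are positives and keeping the member nearest each centroid is an assumption about the procedure, not a description of the cited methods. A k-means run can leave a cluster empty, in which case fewer negatives are returned.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on a list of numeric vectors; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k data points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_balanced_negatives(negatives, n_positive, seed=0):
    """Cluster negatives into n_positive groups and keep, from each group,
    the member closest to its centroid. Returns indices into `negatives`."""
    centroids, labels = kmeans(negatives, n_positive, seed=seed)
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(negatives[i], centroid)))
    return chosen


# Hypothetical 2-D feature vectors for six negative fragments
neg = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.1, 0.0]]
print(select_balanced_negatives(neg, 3))
```

Picking the member nearest each centroid, rather than a random member, keeps the retained negatives spread across the feature space instead of concentrated in its densest region.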
Five-fold cross validation and performance evaluation
Five-fold cross-validation was performed to evaluate the predictive performance of each model under various parameters. In this process, each dataset is split into five approximately equal-sized subgroups; one subgroup is regarded as the test set, while the remaining four subgroups are regarded as the training set. This process is repeated five times, with each subgroup used as the test set once. The following measures were used to gauge the average predictive performance of the trained models: Sensitivity (Sn) = TP / (TP+FN), Specificity (Sp) = TN / (TN+FP), Accuracy (Acc) = (TP+TN) / (TP+FP+TN+FN), and Matthews Correlation Coefficient (MCC) = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), where TP, TN, FP and FN represent the numbers of true positives, true negatives, false positives and false negatives, respectively. After thirty rounds of the cross-validation process, the average Sn, Sp, Acc and MCC values were calculated for each model. The model with the best average performance was then selected for further evaluation on the independent testing dataset.
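The four measures and a per-class five-fold split can be sketched as follows. Splitting positives and negatives separately (so each test fold keeps the class ratio), the shuffling seed and the `(item, label)` layout are assumptions; the thirty rounds would repeat this with different seeds and average the results.

```python
import math
import random


def metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


def five_fold_splits(positives, negatives, k=5, seed=0):
    """Split each class separately into k near-equal folds and yield
    (train, test) lists of (item, label) pairs, one pair per fold."""
    rng = random.Random(seed)
    labeled = []
    for items, label in ((positives, 1), (negatives, 0)):
        shuffled = list(items)
        rng.shuffle(shuffled)
        labeled.append([(x, label) for x in shuffled])
    for fold in range(k):
        test = [pair for group in labeled for pair in group[fold::k]]
        train = [pair for group in labeled
                 for idx, pair in enumerate(group) if idx % k != fold]
        yield train, test


# Counts on a balanced toy set of 100 positives and 100 negatives
print(metrics(tp=85, tn=84, fp=16, fn=15))
```

On a balanced set like this, accuracy is simply the mean of sensitivity and specificity, which is why the cross-validation accuracy sits between the reported Sn and Sp.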
Construction of independent testing data set
To address a potential overestimation of the models' predictive performance due to over-fitting, an independent test was carried out. For this analysis, experimentally validated sequences obtained from PhosphoSitePlus were used as independent testing data. A total of 779 O-GlcNAcylated Ser and 582 O-GlcNAcylated Thr sites on 542 proteins were obtained from PhosphoSitePlus. As in the construction of the positive training set, sequence fragments centered on the O-GlcNAcylated Ser and Thr residues were extracted using an 11-mer window. Additionally, O-GlcNAcylated sequence fragments homologous to the positive training data were removed to generate non-homologous independent testing data. As a result, a total of 956 sequence fragments, consisting of 522 O-GlcNAcylated Ser and 434 O-GlcNAcylated Thr residues, were regarded as the positive data for independent testing. Conversely, sequence fragments centered on non-O-GlcNAcylated Ser and Thr residues were regarded as the negative data for independent testing. After homologous data were removed, a total of 60976 sequence fragments (38682 non-O-GlcNAcylated Ser and 22294 non-O-GlcNAcylated Thr residues) were collected as the negative testing data.
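The window extraction can be sketched as below, assuming 1-based site positions and a `-` padding character for residues near the protein termini (the padding symbol actually used is not stated here). The homology filtering step is not shown.

```python
def st_windows(sequence, window=11, pad="-"):
    """Yield (position, fragment) for every Ser/Thr, centred in an odd-length
    window. Positions are 1-based; both ends are padded with `pad`."""
    half = window // 2
    padded = pad * half + sequence + pad * half
    for i, aa in enumerate(sequence):
        if aa in "ST":
            # residue i sits at padded[i + half], the centre of padded[i:i + window]
            yield i + 1, padded[i:i + window]


def label_windows(sequence, modified_positions, window=11):
    """Split one protein's Ser/Thr windows into positives (annotated
    O-GlcNAc sites) and negatives (all remaining Ser/Thr)."""
    pos, neg = [], []
    for position, fragment in st_windows(sequence, window):
        (pos if position in modified_positions else neg).append(fragment)
    return pos, neg


print(list(st_windows("MASTK", 5)))  # [(3, 'MASTK'), (4, 'ASTK-')]
```

Every Ser/Thr that is not an annotated site becomes a negative fragment, which is why the negative testing set is far larger than the positive one.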
Results and discussion
Amino acid composition of O-GlcNAcylation sites
Substrate site motifs of O-GlcNAc transferases
Predictive performance of the identified substrate motifs
Table 2 Five-fold cross-validation results of profile HMMs learned from all data and from the seven MDD-clustered subgroups.
Columns: number of positive data; number of negative data.
Rows: single HMM with all data; HMM with OGT1; HMM with OGT2; HMM with OGT3; HMM with OGT4; HMM with OGT5; HMM with OGT6; HMM with OGT7; MDD-clustered HMMs (combined 7 OGT HMMs); two-layered model (7 HMMs + 1 SVM).
Following a previous work that applied two-layered SVMs to the prediction of viral phosphorylation sites, this work further combined the seven profile HMMs (first layer) and one SVM (second layer) into a two-layered prediction model, which performed better than the combination of the seven OGT HMMs (MDD-clustered HMMs). The two-layered prediction model yielded a sensitivity of 85.4%, a specificity of 84.1%, an accuracy of 84.7%, and an MCC value of 0.695. This best-performing model was further evaluated on the independent testing set.
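The data flow of the two-layered model can be sketched as follows. This is a simplified illustration under stated assumptions: each first-layer model is reduced to a match-state-only profile (a position-specific log-odds score against a uniform background), whereas a full profile HMM, such as one built with HMMER, also has insert and delete states; and the second layer is a hypothetical callable standing in for the RBF-kernel SVM trained with LIBSVM.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # uniform background: an assumption


def train_match_profile(fragments, pseudocount=1.0):
    """Per-position residue probabilities from aligned, equal-length fragments.

    Equivalent to a profile HMM with match states only (no insert/delete
    states), a simplification of a real profile HMM.
    """
    profile = []
    for pos in range(len(fragments[0])):
        counts = Counter(f[pos] for f in fragments)
        total = len(fragments) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def bit_score(profile, fragment):
    """Log-odds score in bits; padding or unknown residues contribute 0."""
    return sum(math.log2(profile[pos].get(aa, BACKGROUND) / BACKGROUND)
               for pos, aa in enumerate(fragment))


def layer1_features(profiles, fragment):
    """First layer: one bit score per motif-specific profile (seven in the paper)."""
    return [bit_score(p, fragment) for p in profiles]


def two_layer_predict(profiles, layer2_proba, fragment, threshold=0.5):
    """Second layer: a probabilistic classifier over the bit-score vector.

    `layer2_proba` is a hypothetical callable returning a probability in
    [0, 1]; a fragment is called positive above `threshold`.
    """
    p = layer2_proba(layer1_features(profiles, fragment))
    return p, p > threshold


def stand_in(features):
    """Toy second layer for illustration only; not an SVM."""
    return 0.9 if features[0] > features[1] else 0.1


profiles = [train_match_profile(["AAS", "AAS"]), train_match_profile(["CCC"])]
print(two_layer_predict(profiles, stand_in, "AAS"))  # (0.9, True)
```

The point of the second layer is that it sees all motif scores at once, so it can learn how to weigh a strong match to one motif against weak matches to the others, instead of relying on a fixed per-HMM cutoff.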
Independent testing and comparison with other prediction tools
Table 3 Comparison of independent testing results between our methods and three other O-GlcNAcylation site prediction tools.
Rows for our methods: single HMM with all data; MDD-clustered HMMs (7 OGT HMMs); two-layered model (7 HMMs + 1 SVM).
To further demonstrate the effectiveness of our method, the independent testing set was used to compare the two-layered model with three popular O-GlcNAcylation site prediction tools: YinOYang, O-GlcNAcScan, and O-GlcNAcPRED. As shown in Table 3, the predictive power of our two-layered model was superior to that of the other three tools. Using the default threshold value (0.5), YinOYang yielded a sensitivity of 46.97%, a specificity of 83.01%, an accuracy of 82.46%, and an MCC value of 0.097. O-GlcNAcScan achieved a sensitivity of 42.99%, a specificity of 84.00%, an accuracy of 83.37%, and an MCC value of 0.089. O-GlcNAcPRED showed the lowest independent testing performance: 57.95% sensitivity, 63.00% specificity, 62.92% accuracy, and an MCC value of 0.053. This independent test indicated that the two-layered model could provide balanced sensitivity and specificity on such an imbalanced set of positive and negative data. In terms of accuracy, the proposed method was comparable to O-GlcNAcScan. Overall, as presented in Figure S1 (Additional file 3), the proposed method outperformed the three prediction tools.
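Because the independent set is heavily imbalanced (956 positive versus 60976 negative fragments, about 98.5% negatives), accuracy is dominated by specificity. As a quick consistency check, the reported sensitivities and specificities of the three tools can be converted back into confusion-matrix counts, from which accuracy and MCC follow. The sketch assumes the tools were scored on exactly these 956 + 60976 fragments.

```python
import math

P, N = 956, 60976  # positive / negative independent-testing fragments


def counts_from_rates(sn, sp, p=P, n=N):
    """Rebuild (TP, TN, FP, FN) from sensitivity and specificity."""
    tp = sn * p
    tn = sp * n
    return tp, tn, n - tn, p - tp


def accuracy_and_mcc(sn, sp):
    tp, tn, fp, fn = counts_from_rates(sn, sp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, mcc


reported = {  # (Sn, Sp) as reported in the text
    "YinOYang": (0.4697, 0.8301),
    "O-GlcNAcScan": (0.4299, 0.8400),
    "O-GlcNAcPRED": (0.5795, 0.6300),
}
for tool, (sn, sp) in reported.items():
    acc, mcc = accuracy_and_mcc(sn, sp)
    print(f"{tool}: Acc = {acc:.2%}, MCC = {mcc:.3f}")
```

The recomputed values agree closely with the reported accuracies and MCC values, and they show how a tool can reach about 83% accuracy on this set while detecting fewer than half of the true sites; MCC exposes that gap where accuracy does not.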
Web-based system for the identification of O-GlcNAcylation sites
This study presents a novel scheme to identify the potential substrate specificity of O-GlcNAc transferase based on a set of experimentally verified O-GlcNAcylation sites. We have demonstrated the utility of the MDD clustering method in characterizing the substrate motifs of O-GlcNAcylation sites. Additionally, the proposed pipeline demonstrates the effectiveness of the MDD-detected short linear motifs in predicting O-GlcNAcylated sites: a five-fold cross-validation evaluation showed the predictive power of the MDD-identified substrate motifs. Moreover, the two-layered model combining seven profile HMMs and one SVM provided the best performance, and it has been used to implement an online system, OGTSite, for the effective identification of protein O-GlcNAcylation sites. By identifying potential O-GlcNAcylation sites, the proposed method can provide reliable leads to the scientific community and reduce the cost and effort of experimentally verifying actual O-GlcNAcylation sites. The proposed method could also be extended to include more meaningful substrate motifs as further experimentally verified O-GlcNAcylation sites are acquired. Additionally, a more abundant set of experimentally verified O-GlcNAcylation sites with protein tertiary structure information could be used to strengthen site prediction capabilities.
The proposed method is implemented as a web-based resource, which is freely available to all interested users at http://csb.cse.yzu.edu.tw/OGTSite/. All datasets used in this work are also available for download from the website.
The authors sincerely thank the Ministry of Science and Technology (MOST) of Taiwan for financially supporting this research under contract numbers MOST 103-2221-E-155-020-MY3, MOST 103-2633-E-155-002, and MOST 104-2221-E-155-036-MY2.
Publication charges for this work were funded by MOST grants under contract numbers MOST 103-2221-E-155-020-MY3 and MOST 104-2221-E-155-036-MY2 to TYL.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 18, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S18.
- Hart GW, Housley MP, Slawson C: Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature. 2007, 446 (7139): 1017-1022.
- Comer FI, Hart GW: O-GlcNAc and the control of gene expression. Biochim Biophys Acta. 1999, 1473 (1): 161-171.
- Ogawa M, Furukawa K, Okajima T: Extracellular O-linked beta-N-acetylglucosamine: Its biology and relationship to human disease. World J Biol Chem. 2014, 5 (2): 224-230.
- Sakaidani Y, Nomura T, Matsuura A, Ito M, Suzuki E, Murakami K, Nadano D, Matsuda T, Furukawa K, Okajima T: O-linked-N-acetylglucosamine on extracellular protein domains mediates epithelial cell-matrix interactions. Nat Commun. 2011, 2: 583.
- Delporte A, De Zaeytijd J, De Storme N, Azmi A, Geelen D, Smagghe G, Guisez Y, Van Damme EJ: Cell cycle-dependent O-GlcNAc modification of tobacco histones and their interaction with the tobacco lectin. Plant Physiol Biochem. 2014, 83: 151-158.
- Ferrer CM, Reginato MJ: Cancer metabolism: cross talk between signaling and O-GlcNAcylation. Methods Mol Biol. 2014, 1176: 73-88.
- Jozwiak P, Forma E, Brys M, Krzeslak A: O-GlcNAcylation and Metabolic Reprograming in Cancer. Front Endocrinol (Lausanne). 2014, 5: 145.
- McClain DA, Crook ED: Hexosamines and insulin resistance. Diabetes. 1996, 45 (8): 1003-1009.
- Liu F, Iqbal K, Grundke-Iqbal I, Hart GW, Gong CX: O-GlcNAcylation regulates phosphorylation of tau: a mechanism involved in Alzheimer's disease. Proc Natl Acad Sci USA. 2004, 101 (29): 10804-10809.
- Mi W, Gu Y, Han C, Liu H, Fan Q, Zhang X, Cong Q, Yu W: O-GlcNAcylation is a novel regulator of lung and colon cancer malignancy. Biochim Biophys Acta. 2011, 1812 (4): 514-519.
- Fardini Y, Dehennaut V, Lefebvre T, Issad T: O-GlcNAcylation: A New Cancer Hallmark? Front Endocrinol (Lausanne). 2013, 4: 99.
- Huang X, Pan Q, Sun D, Chen W, Shen A, Huang M, Ding J, Geng M: O-GlcNAcylation of cofilin promotes breast cancer cell invasion. J Biol Chem. 2013, 288 (51): 36418-36425.
- Vosseller K, Trinidad JC, Chalkley RJ, Specht CG, Thalhammer A, Lynn AJ, Snedecor JO, Guan S, Medzihradszky KF, Maltby DA, et al: O-linked N-acetylglucosamine proteomics of postsynaptic density preparations using lectin weak affinity chromatography and mass spectrometry. Mol Cell Proteomics. 2006, 5 (5): 923-934.
- Trinidad JC, Barkan DT, Gulledge BF, Thalhammer A, Sali A, Schoepfer R, Burlingame AL: Global identification and characterization of both O-GlcNAcylation and phosphorylation at the murine synapse. Mol Cell Proteomics. 2012, 11 (8): 215-229.
- Alfaro JF, Gong CX, Monroe ME, Aldrich JT, Clauss TR, Purvine SO, Wang Z, Camp DG, Shabanowitz J, Stanley P, et al: Tandem mass spectrometry identifies many mouse brain O-GlcNAcylated proteins including EGF domain-specific O-GlcNAc transferase targets. Proc Natl Acad Sci USA. 2012, 109 (19): 7280-7285.
- Khidekel N, Ficarro SB, Clark PM, Bryan MC, Swaney DL, Rexach JE, Sun YE, Coon JJ, Peters EC, Hsieh-Wilson LC: Probing the dynamics of O-GlcNAc glycosylation in the brain using quantitative proteomics. Nat Chem Biol. 2007, 3 (6): 339-348.
- Myers SA, Panning B, Burlingame AL: Polycomb repressive complex 2 is necessary for the normal site-specific O-GlcNAc distribution in mouse embryonic stem cells. Proc Natl Acad Sci USA. 2011, 108 (23): 9490-9495.
- Nandi A, Sprung R, Barma DK, Zhao Y, Kim SC, Falck JR: Global identification of O-GlcNAc-modified proteins. Anal Chem. 2006, 78 (2): 452-458.
- Wang Z, Udeshi ND, O'Malley M, Shabanowitz J, Hunt DF, Hart GW: Enrichment and site mapping of O-linked N-acetylglucosamine by a combination of chemical/enzymatic tagging, photochemical cleavage, and electron transfer dissociation mass spectrometry. Mol Cell Proteomics. 2010, 9 (1): 153-160.
- Gupta R, Brunak S: Prediction of glycosylation across the human proteome and the correlation to protein function. Pac Symp Biocomput. 2002, 310-322.
- Chen SA, Lee TY, Ou YY: Incorporating significant amino acid pairs to identify O-linked glycosylation sites on transmembrane proteins and non-transmembrane proteins. BMC Bioinformatics. 2010, 11: 536.
- Wang J, Torii M, Liu H, Hart GW, Hu ZZ: dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC Bioinformatics. 2011, 12: 91.
- Jia CZ, Liu T, Wang ZP: O-GlcNAcPRED: a sensitive predictor to capture protein O-GlcNAcylation sites. Mol Biosyst. 2013, 9 (11): 2909-2913.
- Caragea C, Sinapov J, Silvescu A, Dobbs D, Honavar V: Glycosylation site prediction using ensembles of Support Vector Machine classifiers. BMC Bioinformatics. 2007, 8: 438.
- Vocadlo DJ: O-GlcNAc processing enzymes: catalytic mechanisms, substrate specificity, and enzyme regulation. Curr Opin Chem Biol. 2012, 16 (5-6): 488-497.
- Wu HY, Lu CT, Kao HJ, Chen YJ, Chen YJ, Lee TY: Characterization and identification of protein O-GlcNAcylation sites with substrate specificity. BMC Bioinformatics. 2014, 15 (Suppl 16): S1.
- Wuhrer M, Catalina MI, Deelder AM, Hokke CH: Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B Analyt Technol Biomed Life Sci. 2007, 849 (1-2): 115-128.
- Lee TY, Huang HD, Hung JH, Huang HY, Yang YS, Wang TH: dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res. 2006, 34 (Database): D622-627.
- Lu CT, Huang KY, Su MG, Lee TY, Bretana NA, Chang WC, Chen YJ, Huang HD: DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res. 2013, 41 (Database): D295-305.
- Su MG, Huang KY, Lu CT, Kao HJ, Chang YH, Lee TY: topPTM: a new module of dbPTM for identifying functional post-translational modifications in transmembrane proteins. Nucleic Acids Res. 2014, 42 (Database): D537-545.
- Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE: O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res. 1999, 27 (1): 370-372.
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32 (Database): D115-119.
- Huang HD, Lee TY, Tzeng SW, Wu LC, Horng JT, Tsou AP, Huang KT: Incorporating hidden Markov models for identifying protein kinase-specific phosphorylation sites. J Comput Chem. 2005, 26 (10): 1032-1041.
- Huang HD, Lee TY, Tzeng SW, Horng JT: KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res. 2005, 33 (Web Server): W226-229.
- Ma X, Liu P, Yan H, Sun H, Liu X, Zhou F, Li L, Chen Y, Muthana MM, Chen X, et al: Substrate specificity provides insights into the sugar donor recognition mechanism of O-GlcNAc transferase (OGT). PLoS One. 2013, 8 (5): e63452.
- Lee TY, Lin ZQ, Hsieh SJ, Bretana NA, Lu CT: Exploiting maximal dependence decomposition to identify conserved motifs from a group of aligned signal sequences. Bioinformatics. 2011, 27 (13): 1780-1787.
- Nguyen VN, Huang KY, Huang CH, Chang TH, Bretana N, Lai K, Weng J, Lee TY: Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities. BMC Bioinformatics. 2015, 16 (Suppl 1): S1.
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268 (1): 78-94.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763.
- Chang CC, Lin CJ: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011, 2 (27): 1-27.
- Shien DM, Lee TY, Chang WC, Hsu JB, Horng JT, Hsu PC, Wang TY, Huang HD: Incorporating structural characteristics for identification of protein methylation sites. J Comput Chem. 2009, 30 (9): 1532-1543.
- Lee TY, Bretana NA, Lu CT: PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity. BMC Bioinformatics. 2011, 12: 261.
- Lee TY, Bo-Kai Hsu J, Chang WC, Huang HD: RegPhos: a system to explore the protein kinase-substrate phosphorylation network in humans. Nucleic Acids Res. 2011, 39 (Database): D777-787.
- Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X: GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics. 2008, 7 (9): 1598-1608.
- Wong YH, Lee TY, Liang HK, Huang CM, Wang TY, Yang YH, Chu CH, Huang HD, Ko MT, Hwang JK: KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res. 2007, 35 (Web Server): W588-594.
- Xue Y, Li A, Wang L, Feng H, Yao X: PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinformatics. 2006, 7: 163.
- Huang KY, Wu HY, Chen YJ, Lu CT, Su MG, Hsieh YC, Tsai CM, Lin KI, Huang HD, Lee TY, et al: RegPhos 2.0: an updated resource to explore protein kinase-substrate phosphorylation networks in mammals. Database (Oxford). 2014, 2014: bau034.
- Lu CT, Chen SA, Bretana NA, Cheng TH, Lee TY: Carboxylator: incorporating solvent-accessible surface area for identifying protein carboxylation sites. J Comput Aided Mol Des. 2011, 25 (10): 987-995.
- Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M: PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 2012, 40 (Database): D261-270.
- Vacic V, Iakoucheva LM, Radivojac P: Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics. 2006, 22 (12): 1536-1537.
- Kumar M, Gromiha MM, Raghava GP: Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins. 2008, 71 (1): 189-194.
- Huang KY, Lu CT, Bretana N, Lee TY, Chang TH: ViralPhos: incorporating a recursively statistical method to predict phosphorylation sites on virus proteins. BMC Bioinformatics. 2013, 14 (Suppl 16): S10.
- Dias WB, Cheung WD, Wang Z, Hart GW: Regulation of calcium/calmodulin-dependent kinase IV by O-GlcNAc modification. J Biol Chem. 2009, 284 (32): 21327-21337.
- Su MG, Lee TY: Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures. BMC Bioinformatics. 2013, 14 (Suppl 16): S2.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.