Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins

Le, Nguyen-Quoc-Khanh; Ou, Yu-Yen

doi:10.1186/s12859-016-1369-y

Volume 17 Supplement 19

15th International Conference On Bioinformatics (InCoB 2016): Bioinformatics

Research
Open access
Published: 22 December 2016

Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins

Nguyen-Quoc-Khanh Le¹ &
Yu-Yen Ou¹

BMC Bioinformatics volume 17, Article number: 501 (2016) Cite this article

1692 Accesses
22 Citations
Metrics details

Abstract

Background

Guanonine-protein (G-protein) is known as molecular switches inside cells, and is very important in signals transmission from outside to inside cell. Especially in transport protein, most of G-proteins play an important role in membrane trafficking; necessary for transferring proteins and other molecules to a variety of destinations outside and inside of the cell. The function of membrane trafficking is controlled by G-proteins via Guanosine triphosphate (GTP) binding sites. The GTP binding sites active G-proteins initiated to membrane vesicles by interacting with specific effector proteins. Without the interaction from GTP binding sites, G-proteins could not be active in membrane trafficking and consequently cause many diseases, i.e., cancer, Parkinson… Thus it is very important to identify GTP binding sites in membrane trafficking, in particular, and in transport protein, in general.

Results

We developed the proposed model with a cross-validation and examined with an independent dataset. We achieved an accuracy of 95.6% for evaluating with cross-validation and 98.7% for examining the performance with the independent data set. For newly discovered transport protein sequences, our approach performed remarkably better than similar methods such as GTPBinder, NsitePred and TargetSOS. Moreover, a friendly web server was developed for identifying GTP binding sites in transport proteins available for all users.

Conclusions

We approached a computational technique using PSSM profiles and SAAPs for identifying GTP binding residues in transport proteins. When we included SAAPs into PSSM profiles, the predictive performance achieved a significant improvement in all measurement metrics. Furthermore, the proposed method could be a power tool for determining new proteins that belongs into GTP binding sites in transport proteins and can provide useful information for biologists.

Background

Transport proteins are proteins interacted in cell membrane to bind and carry atoms and small molecules within cells and throughout the body. There are many different kinds of transport proteins, they are critical to the growth and life of all living organisms. Membrane trafficking is the important process in transport protein, in which proteins and other macromolecules are transferred to various destinations inside and outside of the cell. This process uses membrane-bound vesicles and vesicular transporters as mediates transport to establish the absorption of molecules within a vesicle.

To enforce membrane trafficking, G-proteins are activated to be recruited to membrane vesicles by interacting with specific effector proteins. Figure 1 indicates the process of G-protein in membrane trafficking. As shown in Fig. 1, G-protein operates as a molecular switch between GDP-bound inactive state and GTP-bound active state. These two states are controlled by guanine nucleotide exchange factors (GEFs) and GTPase activating proteins (GAPs). If G-protein binds GTP, it will be activated and involved in membrane trafficking. A number of studies determined that a functional loss of GTP binding sites in membrane trafficking has been implicated in a variety of human diseases (i.e., neurodegenerative, cancer, Parkinson [1–4] … So there is a need to develop techniques such as computational techniques for identifying GTP binding sites in membrane trafficking (especially in transport protein).

Because GTP binding sites have an important role in many biological processes, many people attempted to focus on them to perform research. A prominent study conducted on GTP binding sites is made by Chauhan [5]. They used support vector machines to predict GTP interacting residues. Hu [6] approached a new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction, including GTP binding sites. Chen [7] predicted and analysed of GTP binding residues using sequence and sequence-derived structural descriptors. In these studies, they also provided the free web servers for evaluating their methods. Susan and Peter [5] tried to analyse the role of GTP-binding proteins in transport along the exocytic pathway. Moreover, Yang and Rosenwald [3] summarized the functions of the monomeric GTP-binding proteins in macroautophagy in Saccharomyces cerevisiae. For the role of GTP binding sites in membrane trafficking, there are many researchers focusing on this field. One of them is from Hutagalung and Novick [1], they have reviewed the mechanisms of Rabs interacting with membrane trafficking. From this research, we understand the process of membrane trafficking and GTP binding sites in membrane trafficking.

Membrane and transport proteins are very important biological functions; thus many researchers have conducted their studies on this issue. For instance, Saier [6] built a web server containing many information of transport proteins from various living organisms. Next, Le [7] tried to developed a web server to predict FAD interacting residues in electron transport proteins with favourable results. Furthermore, Ren [8] developed transportDB, which is a complete database for predicting cellular membrane transporters. Chen [9] presented computational techniques to conduct prediction and analysis of transport proteins. After this work, the transport proteins are classified into four major classes with different transporter targets.

The present work developed machine learning techniques to identify GTP binding sites in transport proteins according to PSSM profiles and SAAPs. The cross-validation dataset is applied for developing the model and then we evaluation the model performance with independent data set. The accuracy from cross-validation and independent data set reached 95.6 and 98.7%, respectively. When we compared with the previous works presented by Chauhan [10], Hu [11] and Chen [12], the performance from proposed method improved significantly in all measure metrics. The proposed method could also predict the number of GTP binding sites with high accuracy and provide useful information for biologists. This study also provided a web server for presenting our method and can help biologists understand the function of GTP binding sites in transport proteins.

Methods

This study focused on predicting GTP binding sites in transport proteins. Figure 2 shows a whole architecture of the study, which contains three stages: data collection, feature set extraction, and model evaluation. According to this architecture, we presented a precise model using PSSM profiles and SAAPs for predicting GTP binding sites in transport proteins. We described the details of all processes as follows.

Data collection

First of all, the data set about transport proteins is retrieved from the UniProt [13] database. In this collection step, we only selected sequences with the annotation “evidence at protein level” or “complete.” The detail query to retrieve transport proteins from UniProt is shown as follows:

(annotation:(type:location AND membrane) AND existence:“evidence at protein level’ AND fragment:no AND reviewed:yes) AND (keyword: transport OR go: transport)

After this step, 8772 transport proteins were collected. Next, we used the annotations from UniProt to collect GTP interacting residues in this data set. Note that in this step, we did not choose any GTP binding sites by similarity or by potential, we only choose GTP binding sites by experimental. After that we collected data on only 57 GTP binding proteins. To prevent overfitting in our model, we need to remove the similarity sequences inside the data set. We used BLAST [14] to perform this action, with sequence similarity of 40%. The number of transport proteins after remove redundant data is 22 proteins, and we used these 22 proteins as our final data set. We can see in Table 1, the 22 GTP binding proteins contain 364 GTP binding residues and 10434 non-GTP binding residues.

Table 1 All 22 GTP binding proteins using in the proposed study

Full size table

To build a model with high accuracy and avoid overfitting, we need to separate the data set into the cross-validation and independent data set. The proposed model will be fitted with the cross-validation data, and evaluated via the independent data set. The details of all data we used in this study are shown in Table 2. The number of training and testing dataset is chosen to have the balance positive data between each set. Finally, we used four GTP binding proteins in the transport protein (containing 52 GTP binding sites and 1710 non-GTP binding sites) as the independent data set. On the other hand, 18 GTP proteins (containing 312 GTP binding sites and 8774 non-GTP binding sites) contained in the cross-validation data set.

Table 2 The details of all 22 GTP binding proteins separated into independent dataset and cross-validation dataset

Full size table

Sequence information

In many problems in predicting the secondary structure of proteins, sequence information is one of the first choice for researcher [15, 16]. This feature set used two dimension matrices with values represented 20 amino acid sequences. We computed all values of amino acids inside matrix and input them as a feature set. There are many types of matrix for performing sequence information. In this study, we applied three types of matrix, namely BINARY, PAM250 [16] and BLOSUM62 [17].

Position specific scoring matrices profiles

PSSM is a common matrix in biology field to represent the sequences as motifs [18]. This matrix contains many score values represented for all amino acid in the original sequences. The row of PSSM shows the 20 amino acids and the column shows the original sequence of amino acids [19]. In several years, the PSSM has extensively been considered a trademark for representing the protein sequences. To identify protein sequences, the PSSM is proved better than the sequence information because it included values for full sequence at correct amino acid position. Many problems in bioinformatics, i.e., secondary protein structure used the PSSM and get the favourable results.

In this study, the PSSM profiles are generated from BLAST [14] and the non-redundant protein database. After this step, we retrieved the information from the PSSM profiles according to amino acids and their positions. The window size 19 also applied in this step to generate feature sets. Because the number of amino acid is 20, thus we have the matrix size 19 * 20 = 380 values. This matrix value should be converted into one vector and we extracted them for features. Finally we need to perform last step to scale data with the range from 0 to 1:

$$ \mathrm{F}\left(\mathrm{x}\right)=\frac{1}{1+ \exp \left(\hbox{-} x\right)} $$

(1)

Significant amino acid pairs

To improve the predictive performance, we described SAAPs and combined with PSSM feature sets [7]. The SAAPs were generated from the cross-validation data set (22 proteins) to identify which pairs of amino acids appeared more frequency in this problem. To calculate the values for each amino acid pair surrounding the data set, we applied the formula:

$$ \mathrm{p}\hbox{-} \mathrm{valu}{\mathrm{e}}_k=\frac{\left(\underset{\mathrm{x}}{\mathrm{M}}\right)\left(\underset{\mathrm{n}\hbox{-} \mathrm{x}}{\mathrm{N}\hbox{-} \mathrm{M}}\right)}{\left(\underset{\mathrm{n}}{\mathrm{N}}\right)} $$

(2)

where N, M and (N-M) are the number of all proteins in the data sets, positive data sets, and negative data sets; n, x, and n-x are the number of sequences including a k ^th SAAP in the entire data set, positive data set, and negative data set. The detail method to compute all p-values from data sets is shown in Fig. 3.

We decided that each amino acid pair met significant level with p-value less than 0.030212. Thus there is much special information in these amino acid pairs and we could use them as an additional feature to identify GTP binding sites in transport proteins. To implement that, we added the selected SAAPs to the PSSM feature set in descending order and performed experiment. Finally, this study used 160 SAAPs as additional features combine to PSSM profiles for predicting GTP binding sites in transport proteins.

Radial basis function networks

For constructing RBF network, we developed the QuickRBF package [20] as a classifier. The architecture of RBF network is shown in Fig. 4. Moreover, we assigned a regular bandwidth of five for each kernel function is generated in the network. In this work, we selected the center data equal to the training data to get the best accuracy. Eventually, our classifier was used to discover GTP binding proteins in transport proteins to the output function value. We defined the details of the network structure and design in our previous article.[21].

In several bioinformatics and computational biology applications, RBF networks have been utilized in predicting cleavage sites in proteins [22], inter-residue contacts [23], and protein disorder [24]; moreover, they have been implemented for identifying β-barrel proteins [25], classifying transporters [26, 27], predicting O-linked glycosylation sites [28], FAD binding sites [7] and ubiquitin conjugation sites [29].

The output nodes in our RBF network determined with the expression as follows:

$$ {g}_j(x)={\displaystyle \sum_{i=1}^k{w}_{ji}\varphi \left(\left\Vert x-{\mu}_i\right\Vert; {\sigma}_i\right);} $$

(3)

where g _j(x) denotes the function corresponding to the j ^th output node and is a linear combination of k radial basis functions $ \varphi \left(\right) $ with center μ_i and bandwidth σ_i . Besides that, w _ji is the weight parameter for balancing data within the i^th hidden node and the j^th output node.

Performance evaluation

Sensitivity, specificity, accuracy, and MCC (Matthew’s correlation coefficient) were used to evaluate the predictive performance. TP, FP, TN, FN are true positives, false positives, true negatives, and false negatives, respectively.

Sensitivity represents the percentage of GTP binding sites predicted correctly.

$$ \mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{F}\mathrm{N}} $$

(4)

Specificity represents the percentage of non-GTP binding sites predicted correctly.

$$ \mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{F}\mathrm{P}} $$

(5)

Accuracy represents the percentage of all GTP and non-GTP binding sites predicted correctly.

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{T}\mathrm{N}}{\mathrm{TP}+\mathrm{F}\mathrm{P}+\mathrm{T}\mathrm{N}+\mathrm{F}\mathrm{N}} $$

(6)

MCC represents the quality of prediction and prevent the unbalance data in model. A model prediction is perfect whenever the MCC value comes to 1.

$$ \mathrm{M}\mathrm{C}\mathrm{C}=\frac{\mathrm{TP}\times \mathrm{T}\mathrm{N}\hbox{-} \mathrm{F}\mathrm{P}\times \mathrm{F}\mathrm{N}}{\sqrt{\left(\mathrm{T}\mathrm{P}+\mathrm{F}\mathrm{P}\right)\left(\mathrm{T}\mathrm{P}+\mathrm{F}\mathrm{N}\right)\left(\mathrm{T}\mathrm{N}+\mathrm{F}\mathrm{P}\right)\left(\mathrm{T}\mathrm{N}+\mathrm{F}\mathrm{N}\right)}} $$

(7)

Results and discussion

Composition of amino acid analysis

We calculated the occurrence frequency of all amino acids inside the dataset to analyse the composition of GTP binding sites and non-GTP binding sites in transport proteins. We can see the interaction in Fig. 5; highest occurrence frequency appeared with the amino acids G, K, S, and D. Therefore, these amino acids are the vital amino acids interacting with GTP binding sites in transport proteins. On the other hand, the amino acids L, S and D exceeded the low occurrence frequency in GTP binding sites in transport proteins.

Comparison of the predictive performance with different window sizes

The proposed model is developed using the cross-validation dataset with 18 GTP binding proteins (including 312 GTP binding sites and 8774 non-GTP binding sites) in transport proteins. We selected the window sizes ranging from 13 to 19 for constructing our model. The measurement prediction executed with PSSM method and QuickRBF classifier. As shown in Table 3, the result did not improve too much when changing the window size. The better result was from window size 19, with the sensitivity, specificity, accuracy, and MCC were approximately 83.7%, 96%, 95.6%, and 0.58 respectively. Therefore we selected the performance result with a window size of 19 to develop our GTP binding model.

Table 3 Predicting GTP binding sites in the transport proteins with different window sizes

Full size table

Figure 6 plots the sequence frequency logo using WebLogo [30], which is a web application for sequence logos generator. We have cut-off the sequence with the window size 19 to have comparison between all fragments. This figure indicates that among all positions, there exist many amino acid differences from GTP binding sites in transport proteins. For instance, the amino acids G, K, S, and D contained some differences at positions 0. Therefore, we can identify GTP binding sites according to these amino acid differences.

Comparison of the predictive performance with different feature sets

In this section, we performed the experiment for predicting GTP sites in transport proteins with different feature sets, including BINARY, BLOSUM62, PAM250, PSSM and SAAPs. We used both cross-validation and independent data set with window size 19 to execute prediction in this part. As shown in Table 4, the proposed method could perform better performance the other feature sets. We realized that the combination between SAAPs and PSSM profiles was favourable for developing the proposed work.

Table 4 Predicting GTP binding sites in the transport proteins with different feature sets

Full size table

ROC curve and AUC in predicting GTP binding sites in transport proteins

Receiver operating characteristic (ROC) and area under the curve (AUC) are also applied as a significance analysis of the presented results [31]. The ROC curve plots from true positive rate and false positive rate based on our prediction results. In machine learning area, the ROC curve and AUC are the important metrics to present the accuracy of the test [32]. The AUC value is calculated from the ROC curve to represent the accuracy range. If the AUC comes to 1, we can detemine that our method perform accurately. In this study, our study reached higher AUC than other classifiers (AUC = 0.99), and therefore we could confirm that our classifier present better than others with this problem (Fig. 7).

Comparison of predictive performance with different classifiers

In this section, some different classifiers are used in the cross-validation and independent data to have comparison with our method. There are many classifiers considered in this portion, i.e., kNN, RandomForest and LibSVM [33–35]. Table 5 shows the predictive performance from them, and our classifier performed well than the other classifiers. The sensitivity, specificity, accuracy, and MCC were respectively 83.7%, 96%, 95.6%, and 0.58 for cross-validation dataset. For independent dataset, the sensitivity, specificity, accuracy, and MCC were consequently 94.2%, 98.8%, 98.7%, and 0.81 Therefore we can use our classifier to present the proposed method to predict GTP binding sites in transport proteins.

Table 5 The comparison of predicting GTP binding sites in the transport proteins between different classifiers

Full size table

Comparison of the proposed method with other published methods

We compared the predictive performance of our method with the previous studies from GTPBinder [10], NsitePred [12] and TargetSOS [11]. In the first comparison, we used the cross-validation and the independent dataset (including four transport proteins which contain 52 GTP binding sites and 1710 non-GTP binding sites) to perform the experiments with these methods. Table 6 shows that our proposed method performed remarkly better than the others in both cross-validation and independent data set.

Table 6 Predicting GTP binding sites in the transport proteins with other studies

Full size table

Moreover, the second comparison is the predictive performance from two new discovered proteins after 2010, namely Q9H0F7 and A8INQ0. We applied our model in predicting these two proteins and compared the results with two studies GTPBinder [10] and TargetSOS approach [11]. The comparison performance in Table 7 indicatied that the proposed method improved better than the performance from GTPBinder method [10] and TargetSOS method [11].

Table 7 Predicting GTP binding sites in two newly discovered proteins

Full size table

Identification of new GTP binding sites in transport protein with the proposed method

We used our model in prediction of GTP binding sites in a set of human transport proteins, which retrieved from Swiss-Prot [36]. The BLAST also used in this section to remove redundant proteins with more than 30% similarity, and then remaining 100 proteins (including 21985 amino acids) were used to evaluate the model. After performing prediction with our approach, we found 938 GTP binding sites from this dataset. Therefore our model can be used to discover some new GTP binding sites in transport proteins with high accuracy.

Web server for predicting GTP binding sites in transport protein

We developed the web server namely GTP-TP-RBF for representing our method in this study. GTP-TP-RBF was built from QuickRBF package to predict GTP binding sites in transport proteins according to PSSM profiles and SAAPs. The user can access our web server at http://140.138.155.226/~kahn/gtp-tp/. The web interface contains many friendly functions, in which users can understand the process and submit sequences easily. Moreover, we optimized the server performance to avoid the time consumption from submitting until getting results. Finally we tried to make a good display in the result page, thus users can retrieve the information easily. According to this web server, biologists can understand our presented work and discover new GTP binding sites in transport proteins.

Conclusions

Because GTP binding sites have an important role in the process of transporters, predicting them is an important issue in bioinformatics and computational biology. This work presented an approach using radial basis function networks according to PSSM profiles and SAAPs for identifying GTP binding sites in transport proteins. We used the cross-validation to develop model and achieved the accuracy 98.7% when evaluating the performance with independent data set. Our predictive performance improved the accuracy by 18% and MCC by 0.55 when we compared with the general GTPBinder approach of Chauhan [10], Hu [11] and Chen [12]. Moreover, we have already provided a web server for presenting our method. Users can use our web server as an effective tool to understand the functions of GTP binding sites in transport proteins. They can identify some new GTP binding sites in transport proteins to serve their research. We expert that the contributions of this study will provide biologists many information for further research and enrich the bioinformatics field in future.

Abbreviations

AUC:: Area under curve
BLOSUM:: Block substitution matrix
GAP:: GTPase activating proteins
GDP:: Guanosine diphosphate
GEF:: Guanine nucleotide exchange factors
GTP:: Guanosine triphosphate
MCC:: Matthew’s correlation coefficient
PAM:: Percent accepted mutation
PSSM:: Position specific scoring matrix
RBF:: Radial basis function
ROC:: Receiver operating characteristic
SAAP:: Significant amino acid pairs

References

Hutagalung AH, Novick PJ. Role of Rab GTPases in membrane traffic and cell physiology. Physiol Rev. 2011;91(1):119–49.
Article CAS PubMed PubMed Central Google Scholar
Zhang M, et al. Rab7: roles in membrane trafficking and disease. Biosci Rep. 2009;29(3):193–209.
Article PubMed Google Scholar
Yang S, Rosenwald AG. The roles of monomeric GTP-binding proteins in macroautophagy in Saccharomyces cerevisiae. Int J Mol Sci. 2014;15(10):18084–101.
Article PubMed PubMed Central Google Scholar
Droppelmann CA, et al. The emerging role of guanine nucleotide exchange factors in ALS and other neurodegenerative diseases. Front Cell Neurosci. 2014;8:282.
Article PubMed PubMed Central Google Scholar
Ferro-Novick S, Novick P. The role of GTP-binding proteins in transport along the exocytic pathway. Annu Rev Cell Biol. 1993;9(1):575–99.
Article CAS PubMed Google Scholar
Saier MH, Tran CV, Barabote RD. TCDB: the transporter classification database for membrane transport protein analyses and information. Nucleic Acids Res. 2006;34 suppl 1:D181–6.
Article CAS PubMed Google Scholar
Le NQ, Ou YY. Prediction of FAD binding sites in electron transport proteins according to efficient radial basis function networks and significant amino acid pairs. BMC Bioinformatics. 2016;17:298.
Article PubMed PubMed Central Google Scholar
Ren Q, Kang KH, Paulsen IT. TransportDB: a relational database of cellular membrane transport systems. Nucleic Acids Res. 2004;32 suppl 1:D284–8.
Article CAS PubMed PubMed Central Google Scholar
Chen S-A, et al. Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties. Bioinformatics. 2011;27(15):2062–7.
Article CAS PubMed Google Scholar
Chauhan JS, Mishra NK, Raghava GP. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics. 2010;11(1):301.
Article PubMed PubMed Central Google Scholar
Hu J, et al. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PloS one. 2014;9(9):e107676.
Article PubMed PubMed Central Google Scholar
Chen K, Mizianty MJ, Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics. 2012;28(3):331–41.
Article PubMed Google Scholar
Bairoch A, et al. The universal protein resource (UniProt). Nucleic Acids Res. 2005;33 suppl 1:D154–9.
CAS PubMed Google Scholar
Johnson M, et al. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36 suppl 2:W5–9.
Article CAS PubMed PubMed Central Google Scholar
Mullis KB, Faloona FA. [21] Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 1987;155:335–50.
Article CAS PubMed Google Scholar
Dayhoff MO, Schwartz RM. A model of evolutionary change in proteins in Atlas of protein sequence and structure. Maryland: National Biomedical Research Foundation; 1978.
Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci. 1992;89(22):10915–9.
Article CAS PubMed PubMed Central Google Scholar
Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292(2):195–202.
Article CAS PubMed Google Scholar
Lin H, et al. High prevalence of genital human papillomavirus type 52 and 58 infection in women attending gynecologic practitioners in South Taiwan. Gynecol Oncol. 2006;101(1):40–5.
Article PubMed Google Scholar
Ou YY. QuickRBF: a package for efficient radial basis function networks. QuickRBF software available at http://csie.org/~yien/quickrbf/. 2005.
Ou Y, Oyang Y, Chen C. A novel radial basis function network classifier with centers set by hierarchical clustering. 2005.
Google Scholar
Yang ZR, Thomson R. Bio-basis function neural network for prediction of protease cleavage sites in proteins. IEEE Transactions on Neural Networks. 2005;16(1):263–74.
Article CAS PubMed Google Scholar
Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des. 2004;18(12):797–810.
Article CAS PubMed Google Scholar
Su CT, Chen CY, Ou YY. Protein disorder prediction by condensed PSSM considering propensity for order or disorder. BMC Bioinformatics. 2006;7.
Ou YY, et al. TMBETADISC-RBF: Discrimination of beta-barrel membrane proteins using RBF networks and PSSM profiles. Comput Biol Chem. 2008;32(3):227–31.
Article CAS PubMed Google Scholar
Ou YY, Chen SA, Gromiha MM. Classification of transporters using efficient radial basis function networks with position‐specific scoring matrices and biochemical properties. Proteins. 2010;78(7):1789–97.
CAS PubMed Google Scholar
Ou YY, Chen SA. Using efficient RBF networks to classify transport proteins based on PSSM profiles and biochemical properties. In International Work-Conference on Artificial Neural Networks. Berlin: Springer; 2009. pp. 869–76.
Chen SA, Lee TY, Ou YY. Incorporating significant amino acid pairs to identify O-linked glycosylation sites on transmembrane proteins and non-transmembrane proteins. BMC Bioinformatics. 2010;11.
Lee TY, et al. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites. PloS one. 2011;6(3):e17331.
Article CAS PubMed PubMed Central Google Scholar
Crooks GE, et al. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–90.
Article CAS PubMed PubMed Central Google Scholar
Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 1997;30(7):1145–59.
Article Google Scholar
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
Article CAS PubMed Google Scholar
Hall M, et al. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter. 2009;11(1):10–8.
Article Google Scholar
Frank E, et al. Data mining in bioinformatics using Weka. Bioinformatics. 2004;20(15):2479–81.
Article CAS PubMed Google Scholar
Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST). 2011;2(3):27.
Google Scholar
Boeckmann B, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31(1):365–70.
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The methodology presented in this study which has been published in our previous paper “Le N.Q.K. and Y.Y. Ou, Prediction of FAD binding sites in electron transport proteins according to efficient radial basis function networks and significant amino acid pairs. BMC Bioinformatics, 2016. 17: p. 298.” and we used the same methodology in a different context. The BMC Bioinformatics editorial office agreed that for clarity of this supplement article, and allowed us to recycle some of our previously published methodology description.

Declarations

This article has been published as part of BMC Bioinformatics Volume 17 Supplement 19, 2016. 15th International Conference On Bioinformatics (INCOB 2016): bioinformatics. The full contents of the supplement are available online https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-19

Funding

Publication charges for this article were funded by Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 104-2221-E-155-037 and 105-2221-E-155-065.

Availability of data and materials

The data sets supporting the results of this article are included within the article.

Authors’ contributions

Analyzed the data: YYO NQKL. Designed and performed the experiments: YYO NQKL. Wrote the paper: YYO NQKL. Read and approved the final version: YYO NQKL. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
Nguyen-Quoc-Khanh Le & Yu-Yen Ou

Authors

Nguyen-Quoc-Khanh Le
View author publications
You can also search for this author in PubMed Google Scholar
Yu-Yen Ou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Nguyen-Quoc-Khanh Le or Yu-Yen Ou.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Le, NQK., Ou, YY. Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins. BMC Bioinformatics 17 (Suppl 19), 501 (2016). https://doi.org/10.1186/s12859-016-1369-y

Download citation

Published: 22 December 2016
DOI: https://doi.org/10.1186/s12859-016-1369-y

15th International Conference On Bioinformatics (InCoB 2016): Bioinformatics

Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins

Abstract

Background

Results

Conclusions

Background

Methods

Data collection

Sequence information

Position specific scoring matrices profiles

Significant amino acid pairs

Radial basis function networks

Performance evaluation

Results and discussion

Composition of amino acid analysis

Comparison of the predictive performance with different window sizes

Comparison of the predictive performance with different feature sets

ROC curve and AUC in predicting GTP binding sites in transport proteins

Comparison of predictive performance with different classifiers

Comparison of the proposed method with other published methods

Identification of new GTP binding sites in transport protein with the proposed method

Web server for predicting GTP binding sites in transport protein

Conclusions

Abbreviations

References

Acknowledgements

Declarations

Funding

Availability of data and materials

Authors’ contributions

Competing interests

Consent for publication

Ethics approval and consent to participate

Author information

Authors and Affiliations

Corresponding authors

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us