Prediction of FAD binding sites in electron transport proteins according to efficient radial basis function networks and significant amino acid pairs
© The Author(s). 2016
Received: 13 November 2015
Accepted: 22 July 2016
Published: 30 July 2016
Cellular respiration is a catabolic pathway that produces adenosine triphosphate (ATP) and is the most efficient process through which cells harvest energy from consumed food. When cells undergo cellular respiration, they require a pathway to store and transfer electrons (i.e., the electron transport chain). Through oxidation-reduction reactions, the electron transport chain produces a transmembrane proton electrochemical gradient. When protons flow back across this membrane, the energy stored in the gradient is converted into chemical energy by ATP synthase, producing ATP, which provides energy for many cellular processes. In the electron transport chain, flavin adenine dinucleotide (FAD) is one of the most vital molecules for carrying and transferring electrons. Therefore, predicting FAD binding sites in the electron transport chain is vital for helping biologists understand the electron transport chain process and energy production in cells.
We used an independent data set to evaluate the performance of the proposed method, which achieved an accuracy of 69.84 %. When applied to two newly discovered electron transport protein sequences, the proposed method improved accuracy by 9–45 % over the general FAD binding predictor presented by Mishra and Raghava, with a Matthews correlation coefficient of 0.14–0.5. Furthermore, the proposed method substantially reduced the number of false positives and can provide useful information for biologists.
We developed a method based on PSSM profiles and SAAPs for identifying FAD binding sites in newly discovered electron transport protein sequences. Adding SAAPs to the PSSM features yielded a significant improvement in analyzing FAD binding proteins in the electron transport chain. The proposed method can serve as an effective tool for predicting FAD binding sites in electron transport proteins and can help biologists understand the functions of the electron transport chain, particularly those of FAD binding sites. We also developed a web server, available to academic users, that identifies FAD binding sites in electron transporters.
As cells undergo cellular respiration, they require a pathway to store and transport electrons (i.e., the electron transport chain). The components of the electron transport chain are organized into four complexes (Complex I, Complex II, Complex III, and Complex IV) and ATP synthase (which can be called Complex V). The electron transport chain operates in the mitochondrial inner membrane, where electrons are transferred to oxygen from nicotinamide adenine dinucleotide (NADH) via Complex I and from succinate via Complex II. In the next step, a carrier embedded in the membrane (coenzyme Q) receives electrons from Complex I and passes them to Complex III (the cytochrome bc1 complex). Electrons from NADH bypass Complex II, the succinate dehydrogenase complex, which is an independent entry point and is not a component of the NADH pathway. From Complex III, electrons pass to cytochrome c and then to Complex IV (the cytochrome oxidase complex). In the final step, ATP synthase is driven by the proton electrochemical gradient, using the flow of H+ back across the membrane to generate ATP, which provides energy for numerous cellular processes.
Flavin adenine dinucleotide is one of the most vital molecules in the electron transport chain. It is found mainly in Complex II, an enzyme complex bound to the inner mitochondrial membrane of mammalian mitochondria and to the membrane of many bacterial cells. In the reaction mechanism of Complex II, succinate binds and a hydride is transferred to FAD to generate FADH2. The electrons derived from succinate oxidation pass through FAD and then tunnel along the [Fe-S] relay to the [3Fe-4S] cluster, from which they are transferred to an awaiting ubiquinone molecule within the active site. Because of its fundamental role in the mitochondrial electron transport chain, Complex II is vital in most organisms, and removing Complex II from the genome has been shown to be lethal at the embryonic stage in mice.
Predicting FAD binding sites in electron transporters is vital for helping biologists clearly understand the operating mechanisms of the electron transport chain and Complex II. In this study, we developed a method that is based on position specific scoring matrix (PSSM) profiles and significant amino acid pairs (SAAPs) for identifying FAD binding residues in electron transport proteins.
FAD binding sites have attracted the interest of numerous researchers because of their relevance in electron transport chains. Prominent studies conducted on FAD binding sites include those by Mishra and Raghava [1] and Fang et al. [2]. Mishra and Raghava [1] used support vector machines to predict FAD binding residues and also developed a free web server for identifying FAD binding residues in given sequences. Moreover, Fang et al. [2] used evolutionary information to improve prediction performance.
Numerous studies have also been conducted on transport proteins. For example, Saier et al. [3] provided a web database containing the sequence, classification, structural, and evolutionary information of transport systems from various living organisms. Furthermore, Ren et al. [4] presented TransportDB, a comprehensive database of transporters and outer membrane channels. Chen et al. [5] divided transport proteins into four types according to their transport targets for prediction and analysis, and then determined the function of each type. Ou et al. [6] attempted to discriminate metal-binding sites in electron transporters by using radial basis function networks (RBFNs).
The current study proposes an approach based on PSSM profiles and SAAPs for identifying FAD binding sites in electron transport proteins. We used a set of 55 FAD binding proteins as the training data set and six FAD binding proteins in electron transport as an independent data set. On the independent data set, the proposed method achieved an accuracy of 69.84 %. Compared with the general FAD binding predictor developed by Mishra and Raghava, the proposed method exhibited a 9 %–45 % improvement in accuracy and a Matthews correlation coefficient (MCC) of 0.14–0.5 when applied to two newly discovered electron transport protein sequences. The proposed method also substantially reduces the number of false positives and offers useful information for biologists. It can therefore serve as an effective tool for predicting FAD binding sites in electron transport proteins and help biologists understand electron transport chain functions, particularly those of FAD binding sites.
Statistics of all retrieved FAD binding proteins with FAD and non-FAD binding sites (number of proteins, FAD binding sites, and non-FAD binding sites, for FAD binding proteins in electron transport and for general FAD binding proteins)
Details of all 61 FAD binding proteins with a UniProt ID in the present study (six FAD binding proteins in electron transport served as an independent data set)
Sequence information is one of the first feature sets used in predicting the secondary structure of proteins [12, 13]. In this feature, each amino acid is represented by a vector of 0s and 1s, so that a sequence becomes a binary matrix from which the value for each amino acid can be read. For example, if the amino acids are ordered as ARNDCQEGHILKMFPSTWYV and amino acid N must be encoded, the third position is set to 1 and all other positions are set to 0. In this study, we also used two types of advanced sequence information, namely PAM250 and BLOSUM62.
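The binary encoding described above is a one-hot scheme. A minimal Python sketch, assuming the ARNDCQEGHILKMFPSTWYV residue order from the example and an all-zero vector for non-standard residues such as X (the text does not say how these are handled); the function names are illustrative:

```python
# Residue order used for the 20-dimensional binary (one-hot) encoding.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> list[int]:
    """Return a 20-element 0/1 vector with a single 1 at the residue's index.

    Non-standard residues map to an all-zero vector (an assumption).
    """
    vector = [0] * len(AMINO_ACIDS)
    index = INDEX.get(residue.upper())
    if index is not None:
        vector[index] = 1
    return vector


def encode_sequence(sequence: str) -> list[list[int]]:
    """Stack one-hot vectors row by row, giving an L x 20 binary matrix."""
    return [one_hot(residue) for residue in sequence]


if __name__ == "__main__":
    print(one_hot("N"))  # 1 at the third position (index 2), 0 elsewhere
    print(len(encode_sequence("ARNDC")))  # one row per residue
```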
A percent accepted mutation (PAM) [14] matrix gives the probability that one amino acid is replaced by another over a given evolutionary distance. PAM matrices were derived from alignments of closely related protein sequences and the corresponding phylogenetic trees by counting the point mutations accepted during evolution, which establishes an accepted point mutation matrix; matrices for larger evolutionary distances, such as PAM250, are obtained by extrapolating this matrix.
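Because a PAM1 matrix describes substitution probabilities over one unit of evolutionary distance (one accepted point mutation per 100 residues), a matrix for a longer distance follows by matrix multiplication: PAM250 corresponds to applying PAM1 250 times. The sketch below illustrates this with a hypothetical two-letter alphabet instead of the real 20 × 20 matrix; in practice the resulting probabilities are further converted into log-odds scores before they are used for alignment or as features.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def pam_n(pam1, n):
    """Extrapolate a PAM1 mutation-probability matrix to PAM-n as PAM1**n."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical two-letter alphabet: 1 % of positions change per PAM unit.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

if __name__ == "__main__":
    pam250 = pam_n(TOY_PAM1, 250)
    print(pam250[0][0])  # about 0.503: after 250 units residues are near-random
```

Each row of the extrapolated matrix still sums to 1, so it remains a valid set of substitution probabilities.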
The block substitution matrix (BLOSUM) [15] is a substitution matrix used to score alignments of evolutionarily divergent protein sequences. It is derived from the BLOCKS database of conserved, ungapped alignment blocks: the relative frequencies of amino acid substitutions observed within the blocks are converted into log-odds scores. BLOSUM62 is built from blocks in which sequences sharing more than 62 % identity are clustered together, and the score matrix is then derived from the observed substitution frequencies.
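The log-odds idea behind BLOSUM can be sketched in a few lines: count residue pairs within each alignment column, compare their observed frequency q_ij with the frequency e_ij expected from background residue frequencies, and score 2·log2(q_ij/e_ij) rounded to an integer (half-bit units, as used for BLOSUM62). This toy version omits the clustering of sequences above 62 % identity and scores only pairs that actually occur; the function name and the input columns are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(columns):
    """Half-bit log-odds scores from ungapped alignment columns.

    Each column is a string holding one residue per aligned sequence.
    """
    pair_counts = Counter()
    for column in columns:
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue: p_i = q_ii + sum_{j != i} q_ij / 2.
    background = Counter()
    for (a, b), q in observed.items():
        background[a] += q / 2
        background[b] += q / 2

    scores = {}
    for (a, b), q in observed.items():
        # Expected frequency of the unordered pair if residues were independent.
        if a == b:
            expected = background[a] ** 2
        else:
            expected = 2 * background[a] * background[b]
        scores[(a, b)] = round(2 * math.log2(q / expected))
    return scores


if __name__ == "__main__":
    print(blosum_like_scores(["AA", "AA", "SS", "AS"]))
    # {('A', 'A'): 1, ('S', 'S'): 2, ('A', 'S'): -2}
```

Conserved pairs receive positive scores and the rarely observed A/S substitution a negative one, which is the pattern a BLOSUM matrix encodes for all 210 residue pairs.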
PSSM is a matrix commonly used for representing motifs in biological sequences [16]. It is a matrix of score values that provides a weighted match to any specific substring of fixed length. This matrix has one row for each letter of the alphabet and one column for each position in the pattern.
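The weighted match described above amounts to adding, for each position of a candidate substring, the score found in the row of the observed letter and the column of that position; scanning a longer sequence then means taking the best-scoring window. The sketch uses a hypothetical three-position DNA motif for brevity; the same functions work unchanged with a 20-letter protein alphabet.

```python
def score_word(pssm, word):
    """Sum the PSSM score of each letter at its position in the pattern.

    `pssm` maps each alphabet letter to a list of per-position scores
    (one row per letter, one column per pattern position).
    """
    width = len(next(iter(pssm.values())))
    if len(word) != width:
        raise ValueError(f"word length {len(word)} != motif width {width}")
    return sum(pssm[letter][position] for position, letter in enumerate(word))


def best_match(pssm, sequence):
    """Slide the motif along `sequence`; return (start index, best score)."""
    width = len(next(iter(pssm.values())))
    if len(sequence) < width:
        raise ValueError("sequence shorter than motif")
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_word(pssm, sequence[i:i + width]))
    return best, score_word(pssm, sequence[best:best + width])


# Hypothetical 3-position DNA motif that favours "ACG".
TOY_PSSM = {
    "A": [2, -1, -1],
    "C": [-1, 2, -1],
    "G": [-1, -1, 2],
    "T": [-1, -1, -1],
}

if __name__ == "__main__":
    print(best_match(TOY_PSSM, "TTACGTT"))  # (2, 6)
```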
In recent years, the PSSM has been widely regarded as an informative representation of protein sequences. It captures evolutionary information at each sequence position: for every position, it gives a score for replacing the original residue with each of the 20 amino acid types, reflecting how readily such replacements are tolerated among homologous sequences. The PSSM has been extensively used for predicting protein secondary structure, subcellular localization, and other biological properties, and it has been reported to produce favorable results.
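For residue-level prediction, a PSSM generated for a whole protein (for example with PSI-BLAST) is usually handled with one row of 20 substitution scores per residue. A common way to build a feature vector for one residue is to concatenate the rows inside a sliding window centred on it. The sketch assumes a window of 17 (the size quoted for the SAAP analysis) and zero padding at the termini; the padding rule and the function name `window_features` are illustrative assumptions, not details given in the text.

```python
def window_features(pssm_rows, center, window=17):
    """Concatenate PSSM rows in a window centred on residue `center`.

    `pssm_rows` has one row (20 scores) per residue of the protein.
    Positions outside the sequence are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so the residue sits in the middle")
    half = window // 2
    width = len(pssm_rows[0])
    features = []
    for position in range(center - half, center + half + 1):
        if 0 <= position < len(pssm_rows):
            features.extend(pssm_rows[position])
        else:
            features.extend([0] * width)
    return features


if __name__ == "__main__":
    # Hypothetical 5-residue protein whose PSSM row i is filled with i + 1.
    toy_rows = [[i + 1] * 20 for i in range(5)]
    print(len(window_features(toy_rows, center=0)))  # 17 * 20 = 340
```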
Significant amino acid pairs
A p-value less than 0.13 indicates that an amino acid pair surrounding FAD binding sites is significant. Many candidate pair features exist, and only those with a p-value less than 0.13 were treated as significant. After we calculated the p-values for all amino acid pairs surrounding FAD binding sites with a window size of 17, we added the ranked SAAPs to the feature set in descending order of significance. Finally, 38 SAAPs were added to the feature set of FAD binding sites in electron transport proteins.
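The text does not state which statistical test produced the p-values or exactly how pairs are defined. The sketch below therefore assumes adjacent residue pairs (dipeptides), counted once per window, and a one-sided hypergeometric (Fisher-type) enrichment test comparing windows centred on FAD binding residues with windows centred on non-binding residues. Only the 0.13 threshold and the ranking by significance come from the text; `select_saaps`, `hypergeom_sf`, and the toy windows are hypothetical.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(draws, successes)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(population, draws)


def adjacent_pairs(window):
    """Set of adjacent residue pairs (dipeptides) occurring in a window."""
    return {window[i:i + 2] for i in range(len(window) - 1)}


def select_saaps(positive_windows, negative_windows, threshold=0.13):
    """Return (pair, p-value) tuples with p < threshold, most significant first."""
    pos_sets = [adjacent_pairs(w) for w in positive_windows]
    neg_sets = [adjacent_pairs(w) for w in negative_windows]
    population = len(pos_sets) + len(neg_sets)
    successes = len(pos_sets)
    candidates = set().union(*pos_sets, *neg_sets)
    significant = []
    for pair in sorted(candidates):
        k = sum(pair in s for s in pos_sets)  # positive windows containing pair
        draws = k + sum(pair in s for s in neg_sets)  # all windows containing pair
        p_value = hypergeom_sf(k, population, successes, draws)
        if p_value < threshold:
            significant.append((pair, p_value))
    # Smallest p-value first, i.e. descending order of significance.
    significant.sort(key=lambda item: item[1])
    return significant


if __name__ == "__main__":
    positives = ["ACD", "ACE", "ACF", "ACG"]
    negatives = ["GHI", "GHK", "GHL", "GHM"]
    print(select_saaps(positives, negatives))  # only "AC" passes the cut-off
```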
Radial basis function networks
RBFN-based classifiers have been used in several bioinformatics applications, including predicting cleavage sites in proteins [20], inter-residue contacts [21], and protein disorder [22]; discriminating β-barrel proteins [23]; classifying transporters [24, 25]; and identifying O-linked glycosylation sites [26] and ubiquitin conjugation sites [27].
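An RBFN classifier combines a layer of radial (typically Gaussian) hidden units with a linear output layer. The sketch below is a generic, minimal Gaussian RBFN in NumPy and is not the QuickRBF package: the centres are simply supplied by the caller, the width `sigma` is a hypothetical fixed parameter, and the output weights are fitted by linear least squares. Labels are assumed to be +1 for binding and -1 for non-binding residues.

```python
import numpy as np


class GaussianRBFNetwork:
    """Gaussian hidden units with fixed centres and a least-squares output layer."""

    def __init__(self, centers, sigma=1.0):
        self.centers = np.asarray(centers, dtype=float)
        self.sigma = float(sigma)
        self.weights = None

    def _hidden(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every centre.
        sq_dist = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        activations = np.exp(-sq_dist / (2.0 * self.sigma ** 2))
        # Append a constant column so the output layer has a bias term.
        return np.hstack([activations, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        hidden = self._hidden(X)
        target = np.asarray(y, dtype=float)
        self.weights, *_ = np.linalg.lstsq(hidden, target, rcond=None)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.weights

    def predict(self, X):
        # Threshold the linear output at zero to obtain +1 / -1 labels.
        return np.where(self.decision_function(X) >= 0.0, 1, -1)


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, 1, 1, -1])  # XOR: not linearly separable
    model = GaussianRBFNetwork(centers=X, sigma=0.5).fit(X, y)
    print(model.predict(X))  # expected to recover the XOR labels
```

The XOR example shows why the radial hidden layer matters: a purely linear classifier cannot separate these labels, whereas the Gaussian features make them linearly separable.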
Assessment of predictive ability
We measured the predictive performance of the proposed method by using sensitivity, specificity, accuracy, and MCC metrics. TP, FP, TN, and FN represent true positive, false positive, true negative, and false negative, respectively.
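The four metrics follow their standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/(TP+FP+TN+FN), and MCC = (TP·TN - FP·FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following sketch computes them from confusion-matrix counts; the function name `evaluate` and the example counts are hypothetical.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    print(evaluate(tp=50, fp=10, tn=30, fn=10))
```

With these counts, sensitivity is 50/60, specificity 0.75, accuracy 0.8, and MCC 1400/2400 (about 0.58); MCC stays informative even when the binding and non-binding classes are heavily imbalanced.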
Results and discussion
Amino acid composition analysis
Performance in predicting FAD binding sites in electron transport proteins by using various window sizes
Comparison of performance in identifying FAD binding sites in the electron transport chain with different window sizes
Performance in predicting FAD binding sites in electron transport proteins with different feature sets
Comparison of performance in identifying FAD binding sites in the electron transport chain with different feature sets (including PSSM + F-score and PSSM + SAAPs)
Significance analysis based on the proposed method
Performance in predicting FAD binding sites in electron transport proteins with different classifiers
Comparison of performance in identifying FAD binding sites in the electron transport chain with different classifiers
Leave-one-out analysis with six FAD binding proteins in electron transport chains
Comparison of performance in identifying FAD binding sites in the electron transport chain with PSSM and SAAPs
Comparison of the proposed method with another method
Comparison of performance in identifying FAD binding sites in two newly discovered proteins
Identification of new FAD binding sites in electron transport proteins
In this part, we applied our method to predict FAD binding sites in human electron transport proteins. The testing data set was retrieved from Swiss-Prot [34], a widely used protein knowledgebase. After BLAST was used to remove sequences sharing more than 30 % similarity, the remaining data set contained 100 proteins comprising 21985 amino acids. Our model predicted 1136 FAD binding sites in this data set. These predictions may help biologists discover new FAD binding sites in electron transport proteins.
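Redundancy removal of this kind is often done greedily: proteins are visited in order, and a protein is kept only if its identity to every protein kept so far does not exceed the cut-off. The sketch below assumes pairwise percent identities have already been obtained (for instance from BLAST alignments) and are supplied through a callback; `greedy_nonredundant` and the identity values are hypothetical, and the text does not specify which procedure was actually applied.

```python
def greedy_nonredundant(protein_ids, identity, max_identity=30.0):
    """Keep proteins whose identity to every already-kept protein is <= max_identity.

    `identity(a, b)` returns percent identity between two proteins (hypothetical
    callback, e.g. backed by parsed BLAST output). Input order affects the result.
    """
    kept = []
    for protein in protein_ids:
        if all(identity(protein, other) <= max_identity for other in kept):
            kept.append(protein)
    return kept


if __name__ == "__main__":
    # Hypothetical pairwise identities (%), stored once per unordered pair.
    table = {frozenset(("P1", "P2")): 80.0,
             frozenset(("P1", "P3")): 20.0,
             frozenset(("P2", "P3")): 25.0}

    def lookup(a, b):
        return table[frozenset((a, b))]

    print(greedy_nonredundant(["P1", "P2", "P3"], lookup))  # ['P1', 'P3']
```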
Web server for predicting FAD binding sites in electron transport proteins
The web server FAD-ETC.-RBF was built to present the method developed in this study. It was trained to identify FAD binding sites in electron transport proteins by using QuickRBF classification based on PSSM profiles and SAAPs. The web server can be accessed at http://22.214.171.124/~kahn/Bioinformatics/. We developed a user-friendly web interface with several page menus through which users can easily retrieve information and submit their sequences. Because the server is fast, users receive prediction results after only a short wait. On the result page, residues predicted as FAD binding sites are displayed in different colors so that users can easily check the results. Using this web server, biologists can discover new FAD binding sites in electron transport proteins and better understand the operating mechanism of electron transport chains.
Predicting FAD binding sites in electron transporters is vital in helping biologists clearly understand the operating mechanisms of electron transport chains and Complex II. In this study, we developed a method based on PSSM profiles and SAAPs for identifying FAD binding residues in electron transport proteins. On the independent data set, the proposed method achieved an accuracy of 69.84 %. We compared its performance on two newly discovered electron transport protein sequences with that of the general FADPred approach of Mishra and Raghava and observed that the proposed method improved accuracy by 9 %–45 %, with an MCC of 0.14–0.5. The proposed method can serve as an effective tool for predicting FAD binding sites in electron transport proteins and can help biologists understand the functions of the electron transport chain, particularly those of FAD binding sites. We also developed a web server implementing the method described in this paper.
The contributions of this study provide a basis for further research that can enrich the field. However, this study still has some limitations related to the small sample size and limited time. The number of suitable FAD binding proteins in electron transport chains was not sufficient, potentially affecting the performance of the proposed method. To create a more effective model, we must identify additional FAD binding proteins in electron transport proteins. Doing so can enable us to conduct a comparative study and enhance prediction performance.
ATP, adenosine triphosphate; AUC, area under curve; BLOSUM, block substitution matrix; FAD, flavin adenine dinucleotide; MCC, Matthews correlation coefficient; NADH, nicotinamide adenine dinucleotide; PAM, percent accepted mutation; PSSM, position specific scoring matrix; RBFN, radial basis function network; ROC, receiver operating characteristic; SAAPs, significant amino acid pairs
We would like to acknowledge the Department of Computer Science and Engineering, Yuan Ze University, for providing good conditions to complete this study.
This research is partially supported by Ministry of Science and Technology, Taiwan, R.O.C. under Grant no. MOST 104-2221-E-155-037 and 105-2221-E-155-065.
Availability of data and material
The data sets supporting the results of this article are included within the article.
Analyzed the data: YYO NQKL. Designed and performed the experiments: YYO NQKL. Wrote the paper: YYO NQKL. Read and approved the final version: YYO NQKL.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Mishra NK, Raghava GP. Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. BMC Bioinformatics. 2010;11(1):1.
- Fang C, Noguchi T, Yamana H. Prediction of FAD Binding Residues with Combined Features from Primary Sequence. Int Proc Computer Sci Inf Technol. 34;47–153.
- Saier MH, Tran CV, Barabote RD. TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res. 2006;34 suppl 1:D181–6.
- West AB, Moore DJ, Choi C, Andrabi SA, Li X, Dikeman D, Biskup S, Zhang Z, Lim K-L, Dawson VL. Parkinson’s disease-associated mutations in LRRK2 link enhanced GTP-binding and kinase activities to neuronal toxicity. Hum Mol Genet. 2007;16(2):223–32.
- Chen S-A, Ou Y-Y, Lee T-Y, Gromiha MM. Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties. Bioinformatics. 2011;27(15):2062–7.
- Ou Y-Y, Chen S-A, Wu S-C. ETMB-RBF: discrimination of metal-binding sites in electron transporters based on RBF networks with PSSM profiles and significant amino acid pairs. PLoS One. 2013;8(2):e46572.
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M. The universal protein resource (UniProt). Nucleic Acids Res. 2005;33 suppl 1:D154–9.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
- Abola EE, Bernstein FC, Koetzle TF. The protein data bank. In: Neutrons in biology. Springer US; 1984. p. 441–441.
- Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007;35 suppl 1:D301–3.
- Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36 suppl 2:W5–9.
- Rychlewski L, Li W, Jaroszewski L, Godzik A. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 2000;9(2):232–41.
- Mullis KB, Faloona FA. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 1987;155:335–50.
- Dayhoff MO, Schwartz RM. A model of evolutionary change in proteins. In: Atlas of protein sequence and structure. 1978.
- Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci. 1992;89(22):10915–9.
- Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292(2):195–202.
- Chen Y-W, Lin C-J. Combining SVMs with various feature selection strategies. In: Feature extraction. Springer; 2006. p. 315–24.
- Ou Y-Y. QuickRBF: a package for efficient radial basis function networks. 2005. QuickRBF software, available at http://csie.org/~yien/quickrbf/.
- Ou YY, Oyang YJ, Chen CY. A novel radial basis function network classifier with centers set by hierarchical clustering. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks. IEEE; 2005;3. p. 1383–8.
- Yang ZR, Thomson R. Bio-basis function neural network for prediction of protease cleavage sites in proteins. IEEE Transactions on Neural Networks. 2005;16(1):263–74. doi:10.1109/tnn.2004.836196.
- Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des. 2004;18(12):797–810. doi:10.1007/s10822-005-0578-7.
- Su CT, Chen CY, Ou YY. Protein disorder prediction by condensed PSSM considering propensity for order or disorder. BMC Bioinformatics. 2006;7(1):1.
- Ou YY, Gromiha MM, Chen SA, Suwa M. TMBETADISC-RBF: Discrimination of beta-barrel membrane proteins using RBF networks and PSSM profiles. Comput Biol Chem. 2008;32(3):227–31. doi:10.1016/j.compbiolchem.2008.03.002.
- Ou YY, Chen SA, Gromiha MM. Classification of transporters using efficient radial basis function networks with position‐specific scoring matrices and biochemical properties. Proteins: Structure, Function, and Bioinformatics. 2010;78(7):1789–97.
- Ou YY, Chen SA. Using efficient RBF networks to classify transport proteins based on PSSM profiles and biochemical properties. In: International Work-Conference on Artificial Neural Networks. Springer Berlin Heidelberg; 2009. p. 869–76.
- Chen SA, Lee TY, Ou YY. Incorporating significant amino acid pairs to identify O-linked glycosylation sites on transmembrane proteins and non-transmembrane proteins. BMC Bioinformatics. 2010;11(1):1.
- Lee TY, Chen SA, Hung HY, Ou YY. Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites. PLoS One. 2011;6(3):e17331.
- Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–90.
- Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 1997;30(7):1145–59.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter. 2009;11(1):10–8.
- Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics. 2004;20(15):2479–81.
- Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST). 2011;2(3):27.
- Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31(1):365–70.