HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information
© Liu and Hu; licensee BioMed Central Ltd. 2011
Received: 6 November 2010
Accepted: 26 May 2011
Published: 26 May 2011
Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, there is an urgent need to develop a computational method for recognizing these important residues.
Here we introduce HemeBIND, an efficient algorithm for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. It was found that several sequence and structural attributes, such as evolutionary conservation, solvent accessibility, depth and protrusion, clearly distinguish heme binding residues from non-binding residues. These features were then used separately or in combination to build structure-based classifiers using support vector machines (SVMs). The results showed that the information contained in these features is largely complementary and that their combination achieved the best performance. To further improve the performance, we developed a post-processing procedure to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profile as an alternative when only sequence information is available. Finally, we employed a voting method to combine the outputs of the structure-based and sequence-based classifiers, which demonstrated remarkably better performance than either individual classifier alone.
HemeBIND is the first specialized algorithm used to predict binding residues in protein structures for heme ligands. Extensive experiments indicated that both the structure-based and sequence-based methods have effectively identified heme binding residues while the complementary relationship between them can result in a significant improvement in prediction performance. The value of our method is highlighted through the development of HemeBIND web server that is freely accessible at http://mleg.cse.sc.edu/hemeBIND/.
The heme cofactor, an extremely versatile prosthetic group, is essential for virtually all organisms. Hemes can be classified into different types based on their chemical structures. In nature, the most common type is b-type; its derivatives, such as a-, c-, d-, and o-type, all use b-type as a template. Heme cofactors are usually bound by heme proteins, which play an important role in a wide variety of biological processes, including electron transfer, oxygen transport, metal ion storage, chemical catalysis, gene expression, and cellular signaling. Identification of residues involved in heme binding sites can help to better understand the biological functions of heme proteins, to uncover the mechanism of heme-protein interactions, and to provide valuable clues for bio-inspired protein design. However, experimental determination of heme binding residues is time-consuming and labor-intensive. It is therefore highly desirable to develop computational methods capable of predicting these important residues.
Over the past fifteen years, a large number of computational approaches have been developed to analyze and predict small ligand binding sites. Broadly, from the perspective of feature extraction, these methods can be divided into three categories: structure-based methods, sequence-based methods, and hybrid methods that combine both structural and sequence information. Among structure-based methods, geometric approaches have been widely proposed to detect protein binding pockets, including POCKET, LIGSITE, SURFNET, CAST, and PocketPicker. These algorithms extract solvent accessible pockets on the protein surface and rank them by geometric measures such as volume, taking the top-ranked pockets as the putative binding sites. Alternatively, energy-based methods are also commonly used to identify ligand binding sites when structural information is available. Q-SiteFinder is a good example: it adds hydrophobic (CH3) probes to the protein to calculate the van der Waals interaction energy and considers the clusters of probes with the most favorable interaction energy as the potential binding sites. On the other hand, sequence-based approaches such as Rate4Site and ConSurf have largely exploited the evolutionary conservation of binding site motifs, or the tendency of functionally important residues to accept fewer mutations than the rest of the protein. More recently, a growing number of methods have attempted to recognize ligand binding sites by integrating both structural and sequence information. For example, LIGSITEcsc, SURFNET-ConSurf, and ConCavity all incorporated residue evolutionary conservation into pocket detection. Additionally, FINDSITE used protein threading to evaluate binding site conservation across groups of weakly homologous template structures. Subsequently, the NCBI IBIS server was built to cluster binding sites found in homologous proteins based on their sequence and structure conservation, in order to annotate different types of binding partners for a query protein. In summary, these computational approaches have achieved success at different levels in ligand binding site prediction.
However, most of the aforementioned methods focused on predicting general ligand binding sites without considering the differences among ligands. In fact, protein binding sites vary in their roles in different types of protein-ligand interactions. Accordingly, specialized ligand types deserve separate consideration. Several research groups have developed such ligand-specific binding site prediction algorithms. Sodhi et al. presented a neural network based algorithm to predict the binding residues of six common metal ions using position specific scoring matrix (PSSM), secondary structure, solvent accessibility, and the inter-atomic distance matrix. Guo et al. applied support vector machine (SVM) combined with a novel statistical descriptor (the Oriented Shell Model) containing various physicochemical properties to identify ATP-binding sites. Nebel et al. reported a method to automatically generate structural motifs of protein binding sites on the basis of consensus atom positions and evaluated it on adenine-based ligands. Bordner developed a group of random forest classifiers to predict the binding sites in protein structures for specific metal ions or small molecules using diverse residue-based properties. In addition, Raghava's group constructed four web servers based on SVM and PSSM to predict the binding residues of ATP, GTP, FAD and NAD ligands, respectively, using only protein sequence information [27–30]. Nevertheless, to our knowledge, no computational method has been developed for specifically detecting the binding residues interacting with heme ligands.
In this paper, a novel algorithm HemeBIND is proposed for identification of heme binding residues by combining structural and sequence information. First, we provided a detailed analysis of various properties of heme binding residues compared with other residues of the protein, such as interface propensity, evolutionary conservation, solvent accessibility, depth, protrusion and spatial clustering of binding residues, based on a non-redundant dataset of b- and c-type heme proteins. It was found that these features have distinctly different distributions between heme binding and non-binding residues. We then constructed and evaluated a set of structure-based classifiers by using sequence profile, solvent accessibility, depth, protrusion or the combinations of them as the input features of SVMs for heme binding residue prediction. The results showed that these four features provide largely complementary information and their combination achieved the best prediction performance. To further improve the performance, a post-processing procedure was developed to reduce false positives generated by the structure-based classifier with the combined four features. Next, we constructed a sequence-based classifier based on SVM and sequence profile as an alternative method, which is useful when only sequence information is available. Finally, a simple ensemble algorithm was proposed by combining the predictions of the structure-based and sequence-based classifiers, yielding a substantial improvement in prediction performance. Extensive experiments demonstrated that the proposed method can be successfully applied to the prediction of heme binding residues and could provide valuable insights into binding residue prediction for other types of ligands.
To construct the dataset of heme proteins, we extracted 2209 heme-protein complexes, mainly composed of b- and c-type hemes, by using "HEM" as the HET group code to search the Het-PDB Navi. database (May 2010 version). Only X-ray diffraction protein structures with a resolution better than 3 Å were retained in the current study. To reduce sequence redundancy, the 4127 heme proteins from the selected complexes were compared using the BLASTCLUST program. Two chains were assigned to the same cluster if the sequence identity was more than 30% and the alignment length covered at least 90% of one member of the chain pair. As a result, these heme proteins were classified into 147 clusters. For each cluster, we chose the longest heme protein as the representative. Because five heme proteins (155C:A, 2OLP:A, 3CAO:A, 4CAT:A, 4CAT:B) contain the "X" amino acid and the structure file of one heme protein (1C53:A) could not be processed by the DSSP program, these six chains were excluded. Therefore, the main dataset is composed of 141 non-redundant heme proteins (Additional file 1, Table S1).
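The representative-selection step described above can be sketched as follows. This is a minimal illustration, assuming the clusters have already been produced by an external tool; the function name, the tie-breaking rule, and the toy chain identifiers are hypothetical and not taken from the original study.

```python
from typing import Dict, List


def pick_representatives(clusters: List[List[str]], lengths: Dict[str, int]) -> List[str]:
    """Return the longest chain of each cluster.

    Ties are broken alphabetically by chain ID (an assumption made here
    only so that the choice is deterministic).
    """
    representatives = []
    for cluster in clusters:
        # Longest chain first; alphabetical ID as a tie-breaker.
        representatives.append(min(cluster, key=lambda c: (-lengths[c], c)))
    return representatives


# Hypothetical toy input: two clusters of chain IDs with their sequence lengths.
toy_clusters = [["chainA", "chainB"], ["chainC"]]
toy_lengths = {"chainA": 150, "chainB": 170, "chainC": 90}
toy_representatives = pick_representatives(toy_clusters, toy_lengths)
```

Chains that later fail a quality check (for example, unknown residues) would simply be dropped from the resulting list.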
In addition to the main dataset, we constructed an alternative dataset derived from the experimental data prepared by Fufezan et al. The original dataset consists of 89 heme proteins, where no two chains have more than 25% sequence identity. We found that the HET group codes of 14 records are not labeled as "HEM". To remain consistent with the main dataset, these chains were removed from the original dataset. Thus, the remaining 75 heme proteins were used as our alternative dataset (Additional file 1, Table S2).
Independent test set
Since the heme proteins in the study of Fufezan et al. were collected in March 2007, the chains in our main dataset that were collected afterwards can be considered as an independent test set for evaluating our method, with the alternative dataset used as the training set. Hence, the chains in the main dataset sharing more than 30% sequence identity with any of the 75 chains in the alternative dataset were eliminated. As a result, we obtained a non-redundant set of 72 heme proteins. In this dataset, 62 protein chains bind a single heme molecule and 10 protein chains interact with multiple heme molecules (Additional file 1, Table S3).
Extraction of heme binding residues
In this study, following the approach of Raghava et al. [27–30], we used the Ligand Protein Contact (LPC) server to identify heme binding and non-binding residues for the protein chains in our three datasets. The LPC server utilizes the surface complementarity approach developed by Sobolev et al. to define the contacts in protein-ligand complexes. For the protein chains binding multiple heme molecules, we considered all the residues forming contacts with any of these ligands as the binding residues in the given chain. According to the analysis of the LPC server, the main dataset contains 5079 binding residues and 32712 non-binding residues, the alternative dataset includes 2512 binding residues and 16045 non-binding residues, and the independent test set has 2652 binding residues and 15904 non-binding residues. It should be emphasized that since our prediction method attempts to take advantage of structural information, residues that have no atomic coordinates were not used in the present work.
Position specific scoring matrix (PSSM)
where x is the raw matrix value.
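The scaling that this clause refers to can be sketched as follows, assuming it is the same standard logistic function that this section applies to the DPX and CX features. That is an assumption, since the formula itself is not shown here:

```latex
f(x) = \frac{1}{1 + e^{-x}}
```

Under this assumption, a raw matrix value of 0 maps to 0.5, large positive scores approach 1, and large negative scores approach 0.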
Relative accessible surface area (RASA)
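RASA is the ratio of the solvent accessible surface area (SASA) of a residue to its maximum possible value, $\mathrm{RASA}_r = \mathrm{SASA}_r / \max(\mathrm{SASA}_r)$, where $\mathrm{SASA}_r$ is the SASA of residue r and $\max(\mathrm{SASA}_r)$ is the maximum SASA of residue r defined by Rost and Sander.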
Depth index (DPX)
DPX is defined as the distance between a given atom and its closest solvent accessible atom (SASA > 0). Hence, the depth is zero for solvent accessible atoms and greater than zero for interior atoms, and deeply buried atoms have higher DPX values. In our study, the PSAIA software with default parameters was utilized to generate the DPX-related features of each residue in the unbound chain: the average of all atom DPXs, the standard deviation of all atom DPXs, the average of all side-chain atom DPXs, the standard deviation of all side-chain atom DPXs, the minimal atom DPX and the maximal atom DPX. These features were scaled between 0 and 1 using the standard logistic function.
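As an illustration of the six residue-level depth features listed above, the following sketch aggregates per-atom depths and applies the standard logistic function. It is not the PSAIA implementation: the function names, the use of the population standard deviation, and the zero fallback for residues without side-chain atoms (such as glycine) are assumptions made for this example.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def logistic(x: float) -> float:
    """Standard logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def residue_depth_features(atom_dpx: Sequence[float],
                           is_side_chain: Sequence[bool]) -> List[float]:
    """Six scaled depth features for one residue from its per-atom DPX values."""
    side = [d for d, s in zip(atom_dpx, is_side_chain) if s]
    raw = [
        mean(atom_dpx),                  # average over all atoms
        pstdev(atom_dpx),                # standard deviation over all atoms
        mean(side) if side else 0.0,     # average over side-chain atoms (assumed 0 if none)
        pstdev(side) if side else 0.0,   # standard deviation over side-chain atoms
        min(atom_dpx),                   # minimal atom depth
        max(atom_dpx),                   # maximal atom depth
    ]
    return [logistic(v) for v in raw]


# Hypothetical residue: two exposed backbone atoms and two buried side-chain atoms.
example_features = residue_depth_features([0.0, 0.0, 1.0, 2.0],
                                          [False, False, True, True])
```

Because depths are non-negative, the logistic function maps them into the interval [0.5, 1); a fully exposed atom (depth 0) gives exactly 0.5.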
Protrusion index (CX)
CX is another important measure used to describe the geometric shape of a protein, reflecting the extent to which an atom protrudes from the protein surface. For each heavy atom in a protein structure, a sphere of predetermined radius is centered on it, and the ratio (CX) between the volume occupied by the protein and the remaining volume within the sphere is calculated. The PSAIA software re-implemented the CX algorithm developed by Pintar et al. Thus, the CX-related features of each residue, including the average of all atom CXs, the standard deviation of all atom CXs, the average of all side-chain atom CXs, the standard deviation of all side-chain atom CXs, the minimal atom CX and the maximal atom CX, were calculated using this software and normalized in the same way as the DPX-related features.
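The per-atom protrusion calculation can be sketched roughly as below. This is an illustrative approximation rather than the PSAIA code. It assumes that the occupied volume is estimated as the number of heavy atoms inside the sphere times a mean atomic volume, and that the ratio is taken as free volume over occupied volume, so that larger values mean more protruding atoms (consistent with binding residues in concavities having lower CX). The default radius of 10 Å and atomic volume of 20.1 Å³ are assumed illustrative parameters.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cx_values(coords: Sequence[Point],
              radius: float = 10.0,
              atom_volume: float = 20.1) -> List[float]:
    """Protrusion index for each heavy atom, as free volume / occupied volume."""
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    values = []
    for center in coords:
        # Count heavy atoms inside the sphere, including the central atom
        # itself (an assumption of this sketch).
        n_inside = sum(1 for other in coords if math.dist(center, other) <= radius)
        occupied = n_inside * atom_volume
        free = sphere_volume - occupied
        values.append(free / occupied)
    return values


# Hypothetical coordinates: two neighboring atoms and one isolated atom.
toy_cx = cx_values([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 0.0, 0.0)])
```

In this toy input the isolated atom has fewer neighbors inside its sphere and therefore receives the largest value.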
Support vector machine (SVM) is an effective supervised learning model suitable for binary classification. In this study, we used SVM classifiers to distinguish heme binding residues from non-binding residues. These classifiers can be divided into two classes, depending on whether structural information or sequence information was used to build the prediction model. Fifteen structure-based classifiers were constructed using PSSM, RASA, DPX, CX or combinations of these features. The input of each structure-based classifier is a spatial window of M residues containing the target residue and its nearest neighbors, obtained by calculating the distances between the α-carbons of residues. Alternatively, two sequence-based classifiers were built using amino acid binary pattern and PSSM [27–30]. The input of each sequence-based classifier is a sliding window of N consecutive residues centered on the target residue. The optimal values of M and N were determined by testing different window sizes as input. The LIBSVM package was utilized to implement these SVM classifiers, with the radial basis function as the kernel. The optimal parameters of each SVM classifier were obtained by combining a grid search with 5-fold cross-validation. By comparing the performances of all the SVM classifiers, we chose the structure-based classifier with the combination of PSSM, RASA, DPX and CX features and the sequence-based classifier with PSSM as the final classifiers.
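The two kinds of input window described above can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes that N is odd and that window positions falling beyond the chain termini are marked with `None` as padding; how the original implementation handled chain ends is not stated here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_window(ca_coords: Sequence[Point], target: int, m: int) -> List[int]:
    """Indices of the target residue followed by its m-1 nearest residues (Cα distance)."""
    order = sorted(
        range(len(ca_coords)),
        # The target always sorts first; the rest are ordered by distance to it.
        key=lambda j: (j != target, math.dist(ca_coords[target], ca_coords[j])),
    )
    return order[:m]


def sequence_window(length: int, target: int, n: int) -> List[Optional[int]]:
    """Indices of n consecutive positions centered on target; None marks padding."""
    half = n // 2
    return [i if 0 <= i < length else None
            for i in range(target - half, target + half + 1)]


# Hypothetical Cα coordinates for a four-residue chain.
toy_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 5.0, 0.0)]
toy_spatial = spatial_window(toy_ca, target=0, m=3)
toy_sequence = sequence_window(length=5, target=0, n=3)
```

The feature vector of a residue would then be the concatenation of the per-residue features (for example PSSM rows) at the returned indices. Note that the spatial window can pick up residues that are far apart in sequence, which is what distinguishes it from the sliding window.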
Reduction of false positives
Previous studies showed that residues located in ligand binding interfaces are more evolutionarily conserved and tend to form spatial clusters. Based on this observation, we developed a post-processing procedure to reduce the number of false positives and thereby further improve the prediction performance. Concretely, a residue predicted as positive by the structure-based classifier with the combined four features was reassigned as negative if fewer than T (1 ≤ T ≤ W) positive predictions were found among its W nearest spatial neighbors. In our experiments, we used different values of W and T to test the effectiveness of the post-processing procedure. To explain the rationale, consider two different scenarios. In both cases, the target residue has been predicted to be a heme binding residue by our structure-based classifier. In the first case, most of its spatial neighbors are also predicted to be binding residues, whereas in the second case few of them are predicted to be in the interface. The chance that the target residue is indeed a binding residue will be much higher in the first case. No post-processing procedure was applied to the outputs produced by the sequence-based classifier with PSSM, because no remarkable improvement was observed.
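The neighbor-based filtering rule can be sketched as below. This is a minimal version under stated assumptions: spatial neighbors are ranked by Cα distance, the residue itself is not counted among its W neighbors, and the support counts are taken from the original predictions rather than from already-filtered ones. The function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def filter_false_positives(ca_coords: Sequence[Point],
                           predicted: Sequence[bool],
                           w: int, t: int) -> List[bool]:
    """Reassign a positive prediction to negative if fewer than t of its
    w nearest spatial neighbors were also predicted positive."""
    refined = list(predicted)
    for i, is_positive in enumerate(predicted):
        if not is_positive:
            continue
        # The w nearest residues by Cα distance, excluding residue i itself.
        neighbors = sorted(
            (j for j in range(len(ca_coords)) if j != i),
            key=lambda j: math.dist(ca_coords[i], ca_coords[j]),
        )[:w]
        support = sum(1 for j in neighbors if predicted[j])
        if support < t:
            refined[i] = False
    return refined


# Hypothetical chain: a cluster of three positives and one isolated positive.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
              (10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
toy_predictions = [True, True, True, False, True, False]
toy_refined = filter_false_positives(toy_coords, toy_predictions, w=2, t=1)
```

In the toy example the isolated positive at index 4 has no positive neighbors and is reassigned, while the three clustered positives support each other and are kept.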
Training and testing
Five-fold cross-validation was conducted on the main dataset and the alternative dataset, respectively. In this procedure, the whole dataset was randomly divided into five subsets with an approximately equal number of protein chains. For each run, one subset was left out for testing, while the remaining four subsets were used for training. This process was repeated until all subsets had been tested. The final performance was obtained by averaging the performances on the five subsets. To further assess the robustness of our approach, we used the alternative dataset as a training set to train SVM classifiers, which were then used to predict heme binding residues in the independent test set. In our three datasets, the numbers of non-binding residues were much larger than those of binding residues. If all non-binding residues were used for training, the classifiers would be biased towards predicting a residue as non-binding. Thus, in the process of cross-validation and independent testing, the classifiers were trained using all binding residues and an equal number of non-binding residues extracted randomly from the training set.
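The chain-level fold split and the balanced sampling of training residues can be sketched as follows. The function names, the fixed random seeds, and the representation of a residue as a `(features, label)` pair are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple

Residue = Tuple[tuple, int]  # (feature vector, label); label 1 = binding, 0 = non-binding


def split_into_folds(chain_ids: Sequence[str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Randomly divide protein chains into k folds of approximately equal size."""
    rng = random.Random(seed)
    shuffled = list(chain_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled chains round-robin so fold sizes differ by at most one.
    return [shuffled[i::k] for i in range(k)]


def balanced_training_set(residues: Sequence[Residue], seed: int = 0) -> List[Residue]:
    """Keep all binding residues plus an equal number of randomly drawn non-binding ones."""
    rng = random.Random(seed)
    positives = [r for r in residues if r[1] == 1]
    negatives = [r for r in residues if r[1] == 0]
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + sampled


# Hypothetical toy data: 12 chains, and 3 binding versus 10 non-binding residues.
toy_folds = split_into_folds([f"chain{i}" for i in range(12)])
toy_residues = [((i,), 1) for i in range(3)] + [((i,), 0) for i in range(10)]
toy_training = balanced_training_set(toy_residues)
```

Splitting by chain rather than by residue keeps all residues of one protein in the same fold, so that a test chain never contributes residues to its own training set. Only the training portion is balanced; the held-out fold keeps its natural class ratio.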
where TP, FP, TN and FN represent true positives (correctly predicted heme binding residues), false positives (non-binding residues incorrectly predicted as binding), true negatives (correctly predicted non-binding residues) and false negatives (binding residues incorrectly predicted as non-binding), respectively.
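For reference, the measures reported in the results (MCC and F1-score) can be computed from these four counts as sketched below, assuming the standard confusion-matrix definitions. Recall, precision and accuracy are included as common companions; whether the original evaluation used exactly this set is an assumption. The convention of returning 0 when a denominator is 0 is also a choice made for this sketch.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification measures from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 here when any margin is empty.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"recall": recall, "precision": precision, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for an imbalanced test set.
toy_metrics = binary_metrics(tp=50, fp=50, tn=850, fn=50)
```

With these toy counts the accuracy is 0.9 even though only half of the binding residues are recovered, which illustrates why MCC and F1-score are more informative than accuracy on imbalanced data.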
Results and discussion
Characteristics of heme binding residues
In this study, the proposed prediction algorithm was developed on the basis of the complementary relationship between structural and sequence information. Before using HemeBIND for prediction, we examined the distributions of the following properties of residues located in heme binding interfaces compared with the remainder of the protein: interface propensity, evolutionary conservation, solvent accessibility, depth, protrusion and spatial clustering of binding residues. In addition, the Kolmogorov-Smirnov test was conducted to assess whether the differences were statistically significant. Among the aforementioned attributes, the depth distributions of heme binding and non-binding residues were the most similar, yet even for depth the P-value was 5.4 × 10⁻²². The P-values of the remaining attributes were smaller than that of depth, indicating that the difference between the distributions was statistically significant for each attribute. The results described herein were derived from the main dataset, and similar results were observed when we used the alternative dataset to perform the same analysis (Additional file 2, Figure S1).
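The statistic behind the two-sample Kolmogorov-Smirnov test is the largest gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic, with a hypothetical function name and toy values; it does not compute a P-value, for which an established statistics library (for example SciPy's `ks_2samp`) would normally be used.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Maximum absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical feature values for two groups of residues.
toy_gap = ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

A statistic near 0 means the two distributions largely overlap, while a value of 1 means they are completely separated.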
Previous studies have demonstrated that ligand binding sites are more conserved than non-binding sites during evolution. To check whether heme binding sites have a similar conservation bias, we used the diagonal element of the PSSM at each residue position to evaluate the evolutionary conservation of that residue, as was done previously, and calculated the distribution of the conservation scores of the heme binding and non-binding residues. As shown in Figure 2(b), non-binding residues had relatively higher proportions in the −5 to 4 brackets. However, binding residues dominated the remaining brackets, most notably the 8 to 12 brackets. These results suggested that residues involved in heme binding interfaces are more evolutionarily conserved.
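One way to read the "diagonal element" is as the PSSM score, at each position, of the amino acid that actually occurs there. The sketch below follows that reading, which is an interpretation rather than a confirmed detail of the original implementation. The function name and toy matrix are hypothetical, and the column order is the one used in PSI-BLAST's ASCII PSSM output.

```python
from typing import List, Sequence

# Column order of the 20 standard amino acids in a PSI-BLAST ASCII PSSM.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def conservation_scores(sequence: str, pssm: Sequence[Sequence[int]]) -> List[int]:
    """For each position, the PSSM score of the residue observed at that position."""
    column = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    return [pssm[i][column[aa]] for i, aa in enumerate(sequence)]


# Hypothetical two-residue sequence with a toy PSSM (one row of 20 scores per position).
toy_pssm = [[0] * 20 for _ in range(2)]
toy_pssm[0][AMINO_ACIDS.index("A")] = 4
toy_pssm[1][AMINO_ACIDS.index("C")] = 9
toy_scores = conservation_scores("AC", toy_pssm)
```

A high score means that the observed residue is strongly favored at that position across homologous sequences, which is why it serves as a per-residue conservation measure.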
Figure 2(c) displays the relative solvent accessibilities of heme binding residues compared with non-binding residues in the main dataset. We found that 78% of heme binding residues had RASAs of less than 40%, while only 64% of non-binding residues were located in the same brackets. When RASA exceeded 40%, the percentages of binding residues became smaller than those of non-binding residues. One might expect binding residues to be more solvent accessible than non-binding residues, but the results showed that this is not the case. A similar observation was reported by Bartlett et al. when they analyzed the solvent accessibilities of catalytic residues in enzyme active sites. This phenomenon might be due to the need for correct positioning and restricted mobility of the residues in these functional sites.
The mean value of all atom DPXs for each residue was calculated and the distribution is given in Figure 2(d). It can be seen that about 26% of heme binding residues lay on the surface of the protein with depths less than 0.25 Å, whereas 30% of non-binding residues were observed in this bracket. However, in the 0.5-1.75 Å brackets, binding residues appeared more frequently than non-binding residues. Additionally, binding residues rarely had depths greater than 2.5 Å, which allows these residues to have some solvent accessibility to interact with the heme molecule whilst remaining mostly buried.
Figure 2(e) shows the distribution of protrusion values. We found that the percentages of binding and non-binding residues with CXs no larger than 0.5 were 63% and 48%, respectively. As the protrusion value increased, however, the proportions of binding residues became smaller than those of non-binding residues. The results indicated that most heme binding residues have lower CXs than non-binding residues. This might be due to the fact that ligand binding residues are usually located in the concavities of a protein.
It has been suggested that evolutionarily conserved residues tend to be clustered in three-dimensional protein structures. Thus, in a heme protein, the residues involved in heme-protein interactions would be expected to be conserved and clustered in the vicinity of the heme ligands. For each residue, we counted the number of binding residues among its 18 spatially neighboring residues. Figure 2(f) shows that almost 66% of non-binding residues had no more than one binding residue among their 18 neighbors, and the proportion decreased steadily as the number of binding residues increased. In contrast, heme binding residues showed a completely different distribution. For each binding residue, there was at least one binding residue among its neighbors. The percentages of binding residues were highest in the 6-7 brackets, indicating that heme binding residues indeed tend to form spatial clusters.
Determination of optimal window sizes for feature calculation
Performance of structure-based classifiers tested on main dataset
Performance of structure-based classifiers on main dataset
More interestingly, we found that combining any two features can improve the prediction performance to a certain degree. Among the six classifiers with the combination of two features, the classifier based on PSSM and DPX and the classifier based on RASA and CX achieved the best performance with the MCC of about 0.39 and F1-score of about 46%. Although the remaining two-feature based classifiers did not perform as well as the two classifiers aforementioned, they were still superior to the classifiers with a single feature. The results implied that these four features contain different and complementary information for heme binding residue prediction.
In addition, we observed that when RASA or CX was incorporated as an additional feature into the classifier based on PSSM and DPX, the MCC and F1-score were slightly raised. However, not all the classifiers with three features obtained a better prediction result. For example, as we added PSSM into the classifier based on RASA and CX, the predictive capability got a little worse. Finally, the classifier with the combined four features achieved the highest MCC of 0.407 and F1-score of 47.56% among all fifteen structure-based classifiers, which confirmed that the complementarity of these four features is beneficial for improving the prediction of heme binding residues.
Performance of post-processing procedure
Performance of post-processing procedure on main dataset (settings compared: W = 0, T = 0; W = 6, T = 1; W = 10, T = 2; W = 14, T = 4; W = 18, T = 5; W = 22, T = 6)
Performance of sequence-based classifiers tested on main dataset
Performance of sequence-based classifiers on main dataset
Performance of the ensemble classifiers
Performance of different prediction models on main dataset
Performance of our classifiers tested on alternative dataset
To test whether our classifiers can effectively identify heme binding residues in another dataset, we conducted 5-fold cross-validation on the 75 heme proteins collected by Fufezan et al., and the results are given in Table S1 (Additional file 2). As shown in this table, the ranking of the predictive capabilities of the different classifiers was almost consistent with that achieved on the main dataset, with the exception of the structure-based classifiers with two features. Additionally, as expected, the performance of each classifier was not as good as that of the corresponding classifier tested on the main dataset, which could be due to the relatively small number of samples in the training set. Even so, when the combined prediction model was used to predict heme binding residues in the alternative dataset, we obtained a reasonable performance with an MCC of 0.465 and an F1-score of 52.94%. The results demonstrated that our method performed well on different datasets.
Performance of different prediction models on independent test set
Comparison with other methods
Performance comparison of different prediction methods
In this study, we proposed HemeBIND, the first specialized algorithm for heme binding residue prediction, by combining structural and sequence information. Through systematic analysis of heme binding interfaces, we found that several sequence and structural attributes, such as evolutionary conservation, solvent accessibility, depth and protrusion can distinctly reflect the differences between heme binding regions and the rest of the protein. Based on this finding, the attributes mentioned above were separately used or combined to construct structure-based and sequence-based classifiers to identify the residues located in binding regions. Experimental results showed that evolutionary conservation is an indispensable factor for predicting heme binding residues, but not sufficient by itself, especially when structural information is available. Integrating structural attributes with evolutionary conservation yielded a remarkable improvement in performance over conservation alone. In summary, our study not only presents a new method to recognize heme binding residues, but also provides valuable insights into specific ligand binding site prediction.
This work was supported by the National Science Foundation Career Award (Grant BIO-DBI-0845381). The authors are grateful to Richard Porter for improving the language and Stephanie Hennrich for developing the web server.
- Schneider S, Marles-Wright J, Sharp KH, Paoli M: Diversity and conservation of interactions for binding heme in b-type heme proteins. Nat Prod Rep 2007, 24: 621–630. 10.1039/b604186hView ArticlePubMedGoogle Scholar
- Fufezan C, Zhang J, Gunner MR: Ligand preference and orientation in b- and c-type heme-binding proteins. Proteins 2008, 73: 690–704. 10.1002/prot.22097PubMed CentralView ArticlePubMedGoogle Scholar
- Gray HB, Winkler JR: Electron transfer in proteins. Annu Rev Biochem 1996, 65: 537–561. 10.1146/annurev.bi.65.070196.002541View ArticlePubMedGoogle Scholar
- Terwilliger NB: Functional adaptations of oxygen-transport proteins. J Exp Biol 1998, 201: 1085–1098.PubMedGoogle Scholar
- Reedy CJ, Gibney BR: Heme protein assemblies. Chem Rev 2004, 104: 617–649. 10.1021/cr0206115View ArticlePubMedGoogle Scholar
- Guengerich FP, Macdonald TL: Chemical Mechanisms of Catalysis by Cytochromes-P-450 - a Unified View. Accounts Chem Res 1984, 17: 9–16. 10.1021/ar00097a002View ArticleGoogle Scholar
- Smith A, Alam J, Escriba PV, Morgan WT: Regulation of heme oxygenase and metallothionein gene expression by the heme analogs, cobalt-, and tin-protoporphyrin. J Biol Chem 1993, 268: 7365–7371.PubMedGoogle Scholar
- Mense SM, Zhang L: Heme: a versatile signaling molecule controlling the activities of diverse regulators ranging from transcription factors to MAP kinases. Cell Res 2006, 16: 681–692. 10.1038/sj.cr.7310086View ArticlePubMedGoogle Scholar
- Levitt DG, Banaszak LJ: POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph 1992, 10: 229–234. 10.1016/0263-7855(92)80074-NView ArticlePubMedGoogle Scholar
- Hendlich M, Rippmann F, Barnickel G: LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model 1997, 15: 359–363. 389 389 10.1016/S1093-3263(98)00002-3View ArticlePubMedGoogle Scholar
- Laskowski RA: SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 1995, 13: 323–330. 307–328 307-328 10.1016/0263-7855(95)00073-9View ArticlePubMedGoogle Scholar
- Liang J, Edelsbrunner H, Woodward C: Anatomy of protein pockets and cavities: measurement of binding site geometry and implications for ligand design. Protein Sci 1998, 7: 1884–1897. 10.1002/pro.5560070905
- Weisel M, Proschak E, Schneider G: PocketPicker: analysis of ligand binding-sites with shape descriptors. Chem Cent J 2007, 1: 7. 10.1186/1752-153X-1-7
- Laurie AT, Jackson RM: Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 2005, 21: 1908–1916. 10.1093/bioinformatics/bti315
- Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N: Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 2002, 18(Suppl 1):S71–77. 10.1093/bioinformatics/18.suppl_1.S71
- Armon A, Graur D, Ben-Tal N: ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 2001, 307: 447–463. 10.1006/jmbi.2000.4474
- Huang B, Schroeder M: LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol 2006, 6: 19. 10.1186/1472-6807-6-19
- Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM: A method for localizing ligand binding pockets in protein structures. Proteins 2006, 62: 479–488.
- Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA: Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009, 5: e1000585. 10.1371/journal.pcbi.1000585
- Brylinski M, Skolnick J: FINDSITE: a threading-based approach to ligand homology modeling. PLoS Comput Biol 2009, 5: e1000405. 10.1371/journal.pcbi.1000405
- Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T: Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010, 11: 365. 10.1186/1471-2105-11-365
- Henrich S, Salo-Ahen OM, Huang B, Rippmann FF, Cruciani G, Wade RC: Computational approaches to identifying and characterizing protein binding sites for ligand design. J Mol Recognit 2010, 23: 209–219.
- Sodhi JS, Bryson K, McGuffin LJ, Ward JJ, Wernisch L, Jones DT: Predicting metal-binding site residues in low-resolution structural models. J Mol Biol 2004, 342: 307–320. 10.1016/j.jmb.2004.07.019
- Guo T, Shi Y, Sun Z: A novel statistical ligand-binding site predictor: application to ATP-binding sites. Protein Eng Des Sel 2005, 18: 65–70. 10.1093/protein/gzi006
- Nebel JC, Herzyk P, Gilbert DR: Automatic generation of 3D motifs for classification of protein binding sites. BMC Bioinformatics 2007, 8: 321. 10.1186/1471-2105-8-321
- Bordner AJ: Predicting small ligand binding sites in proteins using backbone structure. Bioinformatics 2008, 24: 2865–2871. 10.1093/bioinformatics/btn543
- Ansari HR, Raghava GP: Identification of NAD interacting residues in proteins. BMC Bioinformatics 2010, 11: 160. 10.1186/1471-2105-11-160
- Chauhan JS, Mishra NK, Raghava GP: Identification of ATP binding residues of a protein from its primary sequence. BMC Bioinformatics 2009, 10: 434. 10.1186/1471-2105-10-434
- Chauhan JS, Mishra NK, Raghava GP: Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. BMC Bioinformatics 2010, 11: 301. 10.1186/1471-2105-11-301
- Mishra NK, Raghava GP: Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. BMC Bioinformatics 2010, 11(Suppl 1):S48. 10.1186/1471-2105-11-S1-S48
- Yamaguchi A, Iida K, Matsui N, Tomoda S, Yura K, Go M: Het-PDB Navi.: a database for protein-small molecule interactions. J Biochem 2004, 135: 79–84. 10.1093/jb/mvh009
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. 10.1002/bip.360221211
- Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M: Automated analysis of interatomic contacts in proteins. Bioinformatics 1999, 15: 327–332. 10.1093/bioinformatics/15.4.327
- Sobolev V, Wade RC, Vriend G, Edelman M: Molecular docking using surface complementarity. Proteins 1996, 25: 120–129. 10.1002/(SICI)1097-0134(199605)25:1<120::AID-PROT10>3.3.CO;2-1
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999, 292: 195–202. 10.1006/jmbi.1999.3091
- Kuznetsov IB, Gou Z, Li R, Hwang S: Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins. Proteins 2006, 64: 19–27. 10.1002/prot.20977
- Rost B, Sander C: Conservation and prediction of solvent accessibility in protein families. Proteins 1994, 20: 216–226. 10.1002/prot.340200303
- Pintar A, Carugo O, Pongor S: DPX: for the analysis of the protein core. Bioinformatics 2003, 19: 313–314. 10.1093/bioinformatics/19.2.313
- Mihel J, Sikic M, Tomic S, Jeren B, Vlahovicek K: PSAIA - protein structure and interaction analyzer. BMC Struct Biol 2008, 8: 21. 10.1186/1472-6807-8-21
- Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272: 121–132. 10.1006/jmbi.1997.1234
- Pintar A, Carugo O, Pongor S: CX, an algorithm that identifies protruding atoms in proteins. Bioinformatics 2002, 18: 980–984. 10.1093/bioinformatics/18.7.980
- Vapnik VN: The nature of statistical learning theory. Springer, New York, NY; 2002.
- LIBSVM: a library for support vector machines [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
- Schueler-Furman O, Baker D: Conserved residue clustering and protein structure prediction. Proteins 2003, 52: 225–235. 10.1002/prot.10365
- Smith LJ, Kahraman A, Thornton JM: Heme proteins--diversity in structural characteristics, function, and folding. Proteins 2010, 78: 2349–2368. 10.1002/prot.22747
- Zhou HX, Shan Y: Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44: 336–343. 10.1002/prot.1099
- Bartlett GJ, Porter CT, Borkakoti N, Thornton JM: Analysis of catalytic residues in enzyme active sites. J Mol Biol 2002, 324: 105–121. 10.1016/S0022-2836(02)01036-7
- Paoli M, Marles-Wright J, Smith A: Structure-function relationships in heme-proteins. DNA Cell Biol 2002, 21: 271–280. 10.1089/104454902753759690
- The PyMOL Molecular Graphics System [http://www.pymol.org]
- De Laurentis W, Khim L, Anderson JLR, Adam A, Johnson KA, Phillips RS, Chapman SK, van Pee KH, Naismith JH: The second enzyme in pyrrolnitrin biosynthetic pathway is related to the heme-dependent dioxygenase superfamily. Biochemistry 2007, 46: 14733–14733. 10.1021/bi702167m
- Igarashi N, Moriyama H, Fujiwara T, Fukumori Y, Tanaka N: The 2.8 angstrom structure of hydroxylamine oxidoreductase from a nitrifying chemoautotrophic bacterium, Nitrosomonas europaea. Nat Struct Biol 1997, 4: 276–284. 10.1038/nsb0497-276
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.