Predicting the protein-protein interactions using primary structures with predicted protein surface

Chang, Darby Tien-Hao; Syu, Yu-Tang; Lin, Po-Chang

doi:10.1186/1471-2105-11-S1-S3

Volume 11 Supplement 1

Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)

Research
Open access
Published: 18 January 2010

BMC Bioinformatics volume 11, Article number: S3 (2010)

Abstract

Background

Many biological functions involve various protein-protein interactions (PPIs). Elucidating such interactions is crucial for understanding general principles of cellular systems. Previous studies have shown the potential of predicting PPIs based on only sequence information. Compared to approaches that require other auxiliary information, these sequence-based approaches can be applied to a broader range of applications.

Results

This study presents a novel sequence-based method based on the assumption that protein-protein interactions are more related to amino acids at the surface than those at the core. The present method considers surface information and maintains the advantage of relying on only sequence data by including an accessible surface area (ASA) predictor recently proposed by the authors. This study also reports the experiments conducted to evaluate a) the performance of PPI prediction achieved by including the predicted surface and b) the quality of the predicted surface in comparison with the surface obtained from structures. The experimental results show that surface information helps to predict interacting protein pairs. Furthermore, the prediction performance achieved by using the surface estimated with the ASA predictor is close to that using the surface obtained from protein structures.

Conclusion

This work presents a sequence-based method that takes surface information into account when predicting PPIs. The proposed surface-identification procedure improves prediction performance by 5.1% in F-measure. The extracted surfaces are also valuable in other biomedical applications that require similar information.

Background

The different types of interactions among proteins are essential to various biological functions in a living cell. Information about these interactions provides a basis for constructing protein interaction networks and improves our understanding of the general principles by which biological systems function [1]. Recent years have seen the development of various experimental techniques for systematic protein-protein interaction (PPI) analysis [2–5]. At present, however, experimentally detected interactions represent only a small fraction of the real interaction network [6, 7]. Therefore, a number of computational approaches have been proposed to expedite PPI detection, which would otherwise rely solely on experimental techniques [8].

Computational methods that depend not only on sequence information but also on prior knowledge, such as localization data [9], structural data [10, 11], expression data [12, 13] or information on the interactions of orthologs [14, 15], cannot be applied to some essential proteins that are observed in most organisms [16]. To address this problem, several sequence-based algorithms have been developed to detect potentially interacting protein pairs when no auxiliary information is available [17–23].

This work presents a novel sequence-based method that includes a mechanism for identifying the protein surface to aid PPI prediction. The method employs the conjoint triad feature [24] to describe protein sequences and the relaxed variable kernel density estimator (RVKDE) [25] for classification. Conjoint triads, which treat three consecutive amino acids as a single unit, have been shown to be a useful set of features for predicting protein-protein interactions [24]. This work improves this feature set by focusing on conjoint triads at the protein surface, based on the assumption that protein-protein interactions are more related to amino acids at the surface than to those in the core. To retain the advantage of depending only on sequence information, the method employs an accurate accessible surface area (ASA) predictor, recently proposed by the authors [26], to determine the protein surface.

In this study, a collection of 691 PPIs is used to evaluate the prediction performance with and without the proposed surface-identification mechanism. The experimental results show that surface information improves PPI prediction based on conjoint-triad feature encoding. Furthermore, the quality of the predicted surface is analyzed using a number of protein structures collected from the Protein Data Bank (PDB) [27]. The experimental results demonstrate that the PPI prediction performance achieved with the predicted surface is close to that achieved with the surface obtained from protein structures.

Results and discussion

This section first describes the workflow of the proposed method. Next, the measurements and datasets used for performance evaluation are presented. The proposed method is then evaluated and compared with another sequence-based PPI predictor. Finally, the predicted surface is compared with the surface obtained from protein structures.

Proposed PPI prediction scheme

Figure 1 depicts the workflow of the proposed method. Steps marked with an asterisk indicate the major differences between this procedure and those of previous PPI studies. First, a feature vector is generated for each protein of a given pair individually. This operation consists of three steps: 'ASA Prediction', 'Surface Identification' and 'Feature Encoding'. The 'ASA Prediction' step invokes a sequence-based ASA predictor to assign a relative ASA (RSA) value to each residue of the protein sequence. Based on these RSA values, the 'Surface Identification' step identifies surface sequence segments in which most residues have large RSA values; the detailed criterion is presented in the Methods section. Next, the 'Feature Encoding' step determines the frequencies of conjoint triads observed in the identified surface segments and uses these frequencies to generate the feature vector. Finally, the two feature vectors of the given protein pair are concatenated and passed to RVKDE, which classifies whether the two proteins interact. See the Methods section for details of all of these steps.

Measurements

Determining whether two proteins interact is a binary classification problem. Table 1 lists five measurements that are widely used to evaluate binary classifiers. Accuracy is the most commonly used measurement and represents the overall performance of a predictor. The F-measure is designed for problems in which one class of instances attracts most of the attention, which makes it appropriate for PPI prediction [28]. Precision is the fraction of predicted interacting protein pairs that truly interact. Sensitivity is the fraction of interacting protein pairs correctly predicted to interact, while specificity is the fraction of non-interacting protein pairs correctly predicted not to interact.
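The formulas of Table 1 are not reproduced in this text. As a hedged sketch, the five measurements can be computed from confusion-matrix counts with their standard definitions, assuming the F-measure is the usual harmonic mean of precision and sensitivity (F1). The function name `evaluate` is illustrative only.

```python
def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measurements from confusion-matrix counts.

    tp / fn: interacting pairs predicted as interacting / non-interacting.
    tn / fp: non-interacting pairs predicted as non-interacting / interacting.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    prec = tp / (tp + fp) if tp + fp else 0.0   # guard: no positive predictions
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    # F-measure assumed to be the harmonic mean of precision and sensitivity (F1).
    fm = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"Acc": acc, "Fm": fm, "Prec": prec, "Sens": sens, "Spec": spec}


m = evaluate(tp=8, fp=4, tn=6, fn=2)
print({k: round(v, 3) for k, v in m.items()})
# {'Acc': 0.7, 'Fm': 0.727, 'Prec': 0.667, 'Sens': 0.8, 'Spec': 0.6}
```

Because the F-measure ignores true negatives, it stays informative when non-interacting pairs vastly outnumber interacting ones, which is the situation the text describes.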

Table 1 Evaluation measurements.

Datasets

A challenge in preparing PPI datasets is that some interactions observed in laboratory experiments do not occur physiologically [6]. To ensure the quality of PPI data, an interaction should be consistent with other types of information [29], such as metabolomic [30] and gene-gene relationship data [31]. Although these types of data are currently incomplete for most organisms, the interaction network of transcription factors (TFs) in Saccharomyces cerevisiae is an extensively studied system for which all such information is currently available [29]. Therefore, this study collects 691 interactions among 211 yeast TFs from several studies and databases [32–36] to generate a PPI dataset, SC691. In this dataset, the 691 interactions are used as positive instances, while the other protein pairs formed from the 211 TFs are used as negative instances.
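The text does not spell out how the negative instances are enumerated. A minimal sketch, assuming that every unordered pair of the 211 TFs, including a TF paired with itself (a potential homodimer), is a candidate and that every candidate not among the 691 interactions is a negative. Under that assumption the ten TFs with PDB structures used later take part in 10 × 201 + 45 + 10 = 2065 candidate pairs, which matches the 85 + 1980 instances of the SC85 dataset. The TF names below are placeholders.

```python
from itertools import combinations_with_replacement


def candidate_pairs(tfs):
    """All unordered TF pairs, including self-pairs (assumption: homodimers count).

    A frozenset represents an unordered pair; a self-pair becomes a 1-element set.
    """
    return {frozenset(p) for p in combinations_with_replacement(sorted(set(tfs)), 2)}


def negative_pairs(tfs, positives):
    """Every candidate pair not listed as an interaction is treated as negative."""
    pos = {frozenset(p) for p in positives}
    return candidate_pairs(tfs) - pos


# Placeholder names standing in for the 211 yeast TFs.
tfs = [f"TF{i:03d}" for i in range(211)]
pairs = candidate_pairs(tfs)
print(len(pairs))  # 211 * 212 / 2 = 22366

# Pairs involving at least one of ten (hypothetical) TFs with PDB structures.
ten = set(tfs[:10])
print(sum(1 for p in pairs if p & ten))  # 2065 = 85 + 1980
```

Using frozensets makes (A, B) and (B, A) the same pair, so an interaction listed in either orientation is removed from the negatives.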

Evaluation of PPI prediction

In the experiment, the interacting pairs of the SC691 dataset are randomly split into three subsets of 341, 175 and 175 pairs. These subsets also contain 341, 175 and 175 non-interacting pairs, respectively, obtained by randomly sampling the negative instances of the SC691 dataset. Care is taken to ensure that different subsets share no instances. The first subset is used as the training set to predict the other two subsets: the predictions on the second subset are used for parameter selection, while the predictions on the third subset measure the prediction performance of a PPI predictor. Thus, an evaluation process first uses the first subset to predict the second subset, and then uses the parameters that maximize the F-measure to predict the third subset. Since the procedure for generating these subsets involves randomness, the evaluation process is repeated ten times to reduce the bias of any single evaluation.
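The splitting protocol can be sketched as follows. This is a minimal illustration, assuming integers stand in for protein pairs and that a fresh random seed is used for each of the ten repetitions; `split_once` is a hypothetical helper, and model training, parameter selection and evaluation are not shown.

```python
import random


def split_once(positives, negatives, sizes=(341, 175, 175), seed=0):
    """One random train/validation/test split as described for SC691.

    Positives are shuffled and cut into subsets of the given sizes; the same
    number of negatives is sampled without replacement, so no instance is
    shared between subsets.
    """
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    neg = rng.sample(list(negatives), sum(sizes))  # sampling without replacement
    subsets, start = [], 0
    for size in sizes:
        subsets.append((pos[start:start + size], neg[start:start + size]))
        start += size
    # [(train_pos, train_neg), (valid_pos, valid_neg), (test_pos, test_neg)]
    return subsets


# Ten repetitions with different seeds; integers stand in for protein pairs.
positives = list(range(691))
negatives = list(range(1000, 23000))
runs = [split_once(positives, negatives, seed=s) for s in range(10)]
train_set, valid_set, test_set = runs[0]
print([len(p) for p, _ in runs[0]], [len(n) for _, n in runs[0]])
```

In each run, a predictor trained on `train_set` would be tuned on `valid_set` (keeping the parameters with the highest F-measure) and then scored once on `test_set`.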

Table 2 presents the prediction performance of the proposed method under various surface conditions. In this work, the predicted surface is the union of several surface sequence segments of fixed length. The parameter o specifies the minimum number of surface residues in a surface segment and thereby affects the predicted surface; see the 'Surface identification' subsection for details. Table 2 also includes the prediction performance of the sequence-based method proposed by Shen et al. [24], which uses the conjoint triads observed in protein sequences without considering surface information. As Table 2 shows, all five measurements improve after introducing surface information, regardless of the surface condition. Considering surface segments that include at least three surface residues achieves the best performance, and the other three surface conditions deliver similar performance. This suggests that forming a stable interface requires at least three surface residues. Requiring a surface segment to contain at least four surface residues appears too strict and filters out some potential surface segments.

Table 2 Performance achieved by considering and by neglecting surface information.

As a result, the average Acc., Fm., Prec., Sens. and Spec. of the developed method are 74.1%, 75.5%, 71.8%, 79.7% and 68.6%, respectively. All five measurements are superior to those delivered by the predictor without surface information. These results show that the proposed mechanism for identifying the protein surface helps to predict protein-protein interactions based on feature encoding with conjoint triads.
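The reported averages can be cross-checked with simple arithmetic. Because each test subset is balanced (175 interacting and 175 non-interacting pairs), accuracy equals the mean of sensitivity and specificity in every run, and since averaging is linear the same holds for the ten-run averages. Recomputing the F-measure from the averaged precision and sensitivity is only an approximation, because the mean of per-run F-measures need not equal the F-measure of the mean precision and sensitivity.

```python
# Reported ten-run averages (percent) for the surface-aware method.
prec, sens, spec = 71.8, 79.7, 68.6
reported_acc, reported_fm = 74.1, 75.5

# Balanced test set: Acc = (TP + TN) / (2N) = (Sens + Spec) / 2.
acc_from_balance = (sens + spec) / 2
# F-measure as the harmonic mean of precision and sensitivity (approximate here).
fm_from_averages = 2 * prec * sens / (prec + sens)

print(f"Acc: {acc_from_balance:.2f} vs reported {reported_acc}")
print(f"Fm:  {fm_from_averages:.2f} vs reported {reported_fm}")
```

The check gives about 74.15 and 75.54, which agree with the reported 74.1% and 75.5% up to rounding.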

Evaluation of predicted surface

As shown in Figure 1, the 'ASA Prediction' and 'Surface Identification' steps are the major differences between this work and others. To evaluate these added components, this subsection reports an experiment that addresses two questions: a) how well the predicted surface overlaps with the surface obtained from protein structures, and b) how PPI prediction performs with the predicted surface compared with the surface obtained from protein structures. The ten TFs in the SC691 dataset that have structures in PDB (Table 3) are used to generate a smaller dataset. This dataset, called SC85, includes 85 positive and 1980 negative instances from the SC691 dataset, and each pair in SC85 contains at least one of the ten TFs. In this experiment, predictions are made by five-fold cross-validation on the SC85 dataset, in which each fold includes 17 positive and 396 negative instances. The cross-validation is repeated ten times to reduce evaluation bias. The surface condition is set to consider surface segments that include at least three surface residues.
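The fold sizes follow from splitting each class separately (stratified cross-validation): 85 / 5 = 17 positives and 1980 / 5 = 396 negatives per fold. A minimal sketch, assuming instances are referred to by index and one seed is used per repetition; `stratified_folds` is a hypothetical helper, not part of the described software.

```python
import random


def stratified_folds(n_pos, n_neg, k=5, seed=0):
    """Split positive and negative instance indices into k folds separately,
    so every fold keeps the same class ratio."""
    rng = random.Random(seed)
    pos, neg = list(range(n_pos)), list(range(n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Taking every k-th element after shuffling gives near-equal fold sizes.
    return [(pos[f::k], neg[f::k]) for f in range(k)]


# One repetition; the experiment would call this with ten different seeds.
folds = stratified_folds(85, 1980, k=5, seed=0)
print([(len(p), len(n)) for p, n in folds])
# [(17, 396), (17, 396), (17, 396), (17, 396), (17, 396)]
```

Each fold serves once as the test set while the other four folds form the training set.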

Table 3 Proteins in the SC691 dataset that have structures in PDB

Table 4 shows the residue-level overlap between the predicted surface and the surface obtained from protein structures, called the 'structural surface'. The predicted surface is identified from the ASA predicted by the adopted ASA predictor, while the structural surface is identified from the actual ASA computed by the Dictionary of Protein Secondary Structure (DSSP) program [37]. In this experiment, at least 75% (91.9% on average) of surface residues (residues in the structural surface) are included in the predicted surface. Conversely, some individual trials delivered a specificity below 60%, and the average specificity (77.7%) is relatively low compared with the sensitivity. These results indicate that a certain percentage of buried residues (residues outside the structural surface) are incorrectly included in the predicted surface; that is, the proposed method delivers a larger surface than that obtained from actual ASA. Overall, according to the accuracy and F-measure, the predicted surface is consistent with the structural surface in this dataset.
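A minimal sketch of a residue-level overlap computation, treating structural-surface residues as the positive class: sensitivity is then the fraction of structural-surface residues included in the predicted surface, and specificity the fraction of buried residues correctly excluded. The function name and the boolean-mask input are assumptions; how the structural mask is derived from DSSP output (for example, by converting DSSP's absolute accessibility to RSA and applying the same threshold t) is not detailed in the text.

```python
def surface_overlap(predicted, structural):
    """Residue-level agreement between a predicted and a structural surface.

    Both arguments are equal-length boolean sequences (True = surface residue).
    Structural surface residues are treated as the positive class.
    """
    pairs = list(zip(predicted, structural))
    tp = sum(p and s for p, s in pairs)          # surface residues recovered
    fn = sum(s and not p for p, s in pairs)      # surface residues missed
    fp = sum(p and not s for p, s in pairs)      # buried residues wrongly included
    tn = sum(not p and not s for p, s in pairs)  # buried residues correctly excluded
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    acc = (tp + tn) / len(pairs)
    fm = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"Sens": sens, "Spec": spec, "Acc": acc, "Fm": fm}


predicted = [True, True, True, True, False, False]
structural = [True, True, False, True, False, True]
print(surface_overlap(predicted, structural))
```

A predicted surface that is systematically too large, as observed above, shows up as high sensitivity combined with lower specificity.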

Table 4 Overlap between predicted and structural surface.

The next analysis examines how much the difference between the predicted and structural surfaces affects the results of PPI prediction. Table 5 presents the performance of PPI prediction using the predicted and the structural surface. Although the predicted surface performs worse than the structural surface, the differences in all evaluation measures are smaller than the standard deviations obtained with the structural surface. These results reveal that, for yeast TFs, the added components of this work achieve performance comparable to that obtained using structure information.

Table 5 Performance achieved using predicted and structural surface.

Finally, a protein pair from the collected 691 PPIs in which both proteins appear in the same complex structure in PDB is used to illustrate the overlap between the predicted surface and the interface. This complex (PDB ID: 2HZM) contains two subunits (Med18 and Med20) of the Mediator complex of the RNA polymerase II holoenzyme, which is central to eukaryotic gene expression and has been studied extensively [38]. Figure 2 presents the interface residues of Med18 (chain B in 2HZM) and Med20 (chain A in 2HZM). Interface residues are defined as those that have at least one heavy atom within 5 Å of the interacting partner. This definition is similar to those used in many studies [39–41].
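The 5 Å interface criterion can be sketched as a brute-force distance check. This sketch assumes that coordinates have already been parsed from the PDB entry into a hypothetical mapping from residue identifier to heavy-atom (x, y, z) coordinates, that hydrogens have been removed, and that "within 5 Å" is inclusive; the toy coordinates below are not taken from 2HZM.

```python
import math


def interface_residues(chain, partner, cutoff=5.0):
    """Residues of `chain` with at least one heavy atom within `cutoff` Å
    of any heavy atom of `partner`.

    Both arguments map a residue identifier to a list of (x, y, z) heavy-atom
    coordinates; hydrogens are assumed to have been removed already.
    """
    partner_atoms = [xyz for atoms in partner.values() for xyz in atoms]
    return sorted(
        res for res, atoms in chain.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    )


# Toy coordinates (not taken from 2HZM).
chain_a = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)], 3: [(4.0, 3.0, 0.0), (9.0, 0.0, 0.0)]}
chain_b = {10: [(3.0, 0.0, 0.0)]}
print(interface_residues(chain_a, chain_b))  # [1, 3]
```

The all-pairs comparison is quadratic in the number of atoms; for full chains a spatial index such as a k-d tree would be the usual way to speed it up.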

For Med18, the present method correctly excludes 80 of the 307 residues (~26.1%) while preserving 48 of the 52 interface residues (~92.3%). As shown in Figure 2(a), most interface residues, shown in yellow, are included. For Med20, however, the predicted surface misses 24 of the 44 interface residues (~54.5%), as shown in Figure 2(b). Figure 2(b) reveals that the predicted surface misses the segment of Med20 (residues 86-107) that acts like an arm stretching toward Med18. A comparison with the interface shown in Figure 2(a) suggests that the present method may perform better on flatter interfaces. Since protein subunits may interact through relatively flat or twisted surfaces [42], the good performance of the present method probably results from the fact that most of the collected S. cerevisiae TFs have relatively flat surfaces.

These results also reveal that the proposed surface-identification mechanism needs to be improved for proteins with relatively twisted surfaces.

Conclusion

An enormous gap exists between the number of known protein structures and the far larger number of known protein sequences. Hence, predicting protein functions directly from amino acid sequences remains one of the most important problems in life science. This work presents a computational approach for PPI prediction based only on sequence information. Notably, it proposes a mechanism for extracting surface information to refine the feature vector that represents a protein sequence. The method is analyzed in terms of a) its performance in predicting PPIs and b) the quality of the predicted surface. The experimental results show that the present method improves PPI prediction performance by 5.1% in F-measure. Furthermore, the predicted surface of yeast TFs is consistent with that obtained from structures, which encourages applying the present surface-identification steps to other biomedical problems that require similar information.

Methods

ASA prediction

This study adopts two cascading regressions to predict relative ASA (RSA) values. The first stage uses the PSSM-2SP (position-specific scoring matrix with two sub-properties) profile [26] to encode a protein sequence. The PSSM-2SP profile is an enhanced PSSM profile, which describes the likelihood of a particular residue substitution at a specific position based on evolutionary information [21]. The PSSM profile is constructed by running the PSI-BLAST program [43] against the non-redundant (NR) database obtained from NCBI. The PSSM-2SP profile adds two accumulated profile values according to the residue groups Charged_sel (K and D) and Tiny_sel (A and G). The resulting PSSM-2SP profile is rescaled to [0,1] using the following logistic function [44]:

$$x' = \frac{1}{1 + e^{-x}}$$

where x is the raw value in the PSSM profile and x' is the value corresponding to x after rescaling. Finally, we add a terminal flag and format the profile into the vector representation with a window size w₁ (w₁ = 11 in our implementation). Figure 3 shows an example of encoding a residue to its corresponding PSSM-2SP form.
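A minimal sketch of the first-stage window encoding. It assumes that each residue's PSSM-2SP row holds 22 raw values (20 PSSM scores plus the Charged_sel and Tiny_sel accumulations), that every window position carries one terminal flag, and that positions beyond the sequence ends are padded with zeros; the exact layout of Figure 3 may differ. Under these assumptions, w₁ = 11 gives an 11 × 23 = 253-dimensional vector. The function names are hypothetical.

```python
import math


def logistic(x):
    """Rescale a raw profile value with x' = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def first_stage_vector(profile, i, w1=11):
    """Encode residue i from a window of w1 residues centred on it.

    profile: one row of raw PSSM-2SP values per residue (22 values assumed).
    Each window position contributes its rescaled row followed by a terminal
    flag: 0 inside the sequence, 1 (with zero padding) beyond either end.
    """
    h = w1 // 2
    n_cols = len(profile[0])
    vec = []
    for j in range(i - h, i + h + 1):
        if 0 <= j < len(profile):
            vec.extend(logistic(x) for x in profile[j])
            vec.append(0)
        else:
            vec.extend([0.0] * n_cols)
            vec.append(1)
    return vec


# Hypothetical 5-residue profile with all raw values equal to 0 (rescaled to 0.5).
profile = [[0.0] * 22 for _ in range(5)]
v = first_stage_vector(profile, 0)
print(len(v))  # 253
```

The terminal flag lets the regressor distinguish a genuinely low profile value from a window position that lies outside the sequence.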

The second stage encodes a protein sequence based on neighboring solvent accessibility [26, 45]. The i-th residue in a protein sequence is represented as a (2w₂+1)-dimensional vector

$$\mathbf{v} = (a_{i-h}, t_{i-h}, a_{i-h+1}, t_{i-h+1}, \ldots, a_i, t_i, \ldots, a_{i+h}, t_{i+h}, l)$$

where a_i is the predicted RSA value of the i-th residue from the first regression, t_i is the terminal flag, either 1 (a null/terminal residue) or 0 (otherwise), l is the sequence length, and w₂ = 2h+1 is the window size (w₂ = 5 in our implementation).
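The second-stage vector can be assembled directly from the first-stage predictions. A minimal sketch, assuming that positions beyond the sequence ends get a = 0.0 and that l is the raw, unscaled sequence length; neither detail is specified in the text, and the function name is hypothetical.

```python
def second_stage_vector(rsa, i, w2=5):
    """Build (a_{i-h}, t_{i-h}, ..., a_{i+h}, t_{i+h}, l) for residue i.

    rsa: first-stage predicted RSA values; positions beyond the sequence ends
    get a = 0.0 (an assumed padding value) and terminal flag t = 1.
    """
    h = (w2 - 1) // 2          # w2 = 2h + 1
    vec = []
    for j in range(i - h, i + h + 1):
        if 0 <= j < len(rsa):
            vec.extend([rsa[j], 0])
        else:
            vec.extend([0.0, 1])
    vec.append(len(rsa))       # l: raw sequence length (scaling, if any, unspecified)
    return vec


print(second_stage_vector([0.1, 0.2, 0.3], 0))
# [0.0, 1, 0.0, 1, 0.1, 0, 0.2, 0, 0.3, 0, 3]
```

With w₂ = 5 the vector has 2 × 5 + 1 = 11 elements, matching the (2w₂+1)-dimensional form.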

Support vector regression (SVR) is used as the regression tool in both stages. SVR is a kernel regression technique that constructs a model based on support vectors. This model expresses y as a function of v with several parameters:

$$y = \sum_{i} w_i K(\mathbf{v}, \mathbf{v}_i) + b$$

where K(·,·) is the kernel function, v_i is the i-th support vector, and b and w_i are numerical parameters determined by minimizing the prediction error on training samples. The problem is to find the support vectors and determine the parameters b and w_i, which can be solved by constrained quadratic optimization [46]. The LIBSVM package (version 2.86) [47] is used for the SVR implementation in this study.
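Once training has produced the support vectors, weights and bias, prediction is just the weighted kernel sum. The text does not name the kernel; this sketch assumes the RBF kernel K(u, v) = exp(−γ‖u − v‖²), which is LIBSVM's default kernel type, with an illustrative γ = 0.5. Training itself (the quadratic program) is left to LIBSVM, and the support vectors and weights below are made up.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma = 0.5 is an illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(v, support_vectors, weights, b, kernel=rbf_kernel):
    """Evaluate y = sum_i w_i K(v, v_i) + b over the support vectors v_i."""
    return sum(w * kernel(v, sv) for sv, w in zip(support_vectors, weights)) + b


# Hypothetical trained model: two support vectors in a 3-dimensional input space.
svs = [[0.2, 0.5, 0.1], [0.8, 0.4, 0.9]]
w = [0.6, -0.3]
b = 0.25
print(round(svr_predict([0.2, 0.5, 0.1], svs, w, b), 4))
```

Only the support vectors contribute to the sum, which is why a trained SVR model can be evaluated without keeping the full training set.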

Surface identification

The employed ASA predictor makes predictions at the residue level. Given the predicted RSA value of each residue, surface residues are defined as those whose RSA values are equal to or larger than a threshold t. These surface residues are often scattered throughout the protein sequence. This work therefore develops a process for generating a set of surface segments, each of which is a consecutive sub-sequence of a minimum length. Because a conjoint triad represents three consecutive amino acids, these consecutive segments are more suitable than scattered surface residues for encoding with conjoint triads.

Figure 4 depicts the process of surface identification. The present method scans the protein sequence with a sliding window of size w. A window is identified as a surface window if it contains at least o surface residues. The predicted surface is the union of all surface windows. In this study, t and w are parameters to be set either by cross-validation or by the user, while o is suggested to be three according to the experimental results.
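The sliding-window procedure can be sketched in a few lines. The values of t and w used below are illustrative only (the text leaves both to cross-validation or the user), and the handling of sequences shorter than w is an assumption; `identify_surface` is a hypothetical name.

```python
def identify_surface(rsa, t, w, o=3):
    """Return a per-residue mask of the predicted surface.

    A residue is a surface residue if its predicted RSA >= t; a window of w
    consecutive residues is a surface window if it holds at least o surface
    residues; the predicted surface is the union of all surface windows.
    Sequences shorter than w yield no windows (an edge case the text leaves open).
    """
    is_surface = [x >= t for x in rsa]
    mask = [False] * len(rsa)
    for start in range(len(rsa) - w + 1):
        if sum(is_surface[start:start + w]) >= o:
            for j in range(start, start + w):
                mask[j] = True
    return mask


rsa = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(identify_surface(rsa, t=0.25, w=3, o=2))
# [True, True, True, True, False, False, False, False]
```

In this toy run the fourth residue (RSA 0.0) is pulled into the predicted surface because it shares a surface window with two exposed residues, which illustrates why the predicted surface tends to be larger than the structural one.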

Feature encoding

Based on the design by Shen et al. [24], this work encodes each protein sequence as a feature vector using the frequencies of its conjoint triads. An amino acid triad is a unit of three consecutive amino acids. Each PPI pair is encoded by concatenating the feature vectors of its two proteins. The 20 amino acids are clustered into seven groups (Table 6) based on their dipoles and side-chain volumes.

Table 6 Amino acid groups used herein.

Figure 5 depicts the process of encoding a protein sequence. First, the protein sequence is transformed into a group sequence. The method then scans the predicted surface along the group sequence. Each scanned triad is counted in an occurrence vector O, in which each element o_i represents the number of occurrences of the i-th triad type in the predicted surface. The major contribution of this work is to ignore occurrences of conjoint triads outside the predicted surface. With seven groups there are 7³ = 343 triad types, so the two vectors of a protein pair are concatenated to form a 686-dimensional feature vector.
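Table 6 is not reproduced in this text. The sketch below assumes the seven-group partition used by Shen et al. [24] ({A, G, V}, {I, L, F, P}, {Y, M, T, S}, {H, N, Q, W}, {R, K}, {D, E}, {C}), which should be checked against Table 6. It further assumes that a triad is counted only when all three of its residues lie in the predicted surface, skips triads containing non-standard residues, and returns raw counts; Shen et al. additionally normalize the counts, which is omitted here.

```python
# Assumed seven-group partition (Shen et al.); verify against Table 6.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def triad_counts(seq, surface_mask):
    """343-element occurrence vector O of conjoint triads inside the surface.

    Triad (g1, g2, g3) of group indices maps to index 49*g1 + 7*g2 + g3.
    A triad is counted only if all three residues are surface residues.
    """
    counts = [0] * 343
    for i in range(len(seq) - 2):
        if not all(surface_mask[i:i + 3]):
            continue                       # ignore triads outside the predicted surface
        try:
            g1, g2, g3 = (GROUP_OF[aa] for aa in seq[i:i + 3])
        except KeyError:
            continue                       # skip non-standard residues such as 'X'
        counts[49 * g1 + 7 * g2 + g3] += 1
    return counts


def pair_features(seq_a, mask_a, seq_b, mask_b):
    """Concatenate the two proteins' vectors into one 686-dimensional vector."""
    return triad_counts(seq_a, mask_a) + triad_counts(seq_b, mask_b)


print(len(pair_features("AGVC", [True] * 4, "KRDE", [True] * 4)))  # 686
```

The surface mask could come from any per-residue surface prediction; with an all-True mask the encoding reduces to counting triads over the whole sequence, as in the original conjoint triad method.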

Relaxed variable kernel density estimator

The relaxed variable kernel density estimator (RVKDE) [25] is used as the classification tool for PPI prediction. A kernel density estimator is in fact an approximate probability density function. Let {s₁, s₂, ..., s_n} be a set of sampling instances randomly and independently drawn from the distribution governed by f_X in the m-dimensional vector space. With the RVKDE algorithm, the value of f_X at point v is then estimated as follows:

$$\hat{f}_X(\mathbf{v}) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\right)^{m}\exp\left(-\frac{\|\mathbf{v}-\mathbf{s}_i\|^{2}}{2\sigma_i^{2}}\right)$$

where

1) $\sigma_i = \beta\,\dfrac{R(\mathbf{s}_i)\sqrt{\pi}}{\sqrt[m]{(k_s+1)\,\Gamma\left(\frac{m}{2}+1\right)}}$;
2) R(s_i) is the maximum distance between s_i and its k_s nearest training instances;
3) Γ(·) is the Gamma function [48];
4) β and k_s are parameters to be set either through cross-validation or by the user.

When RVKDE is used to predict protein-protein interactions, two kernel density estimators are constructed to approximate the distributions of interacting and non-interacting protein pairs, respectively. A query protein pair (represented as the feature vector v) is assigned to the class that gives the larger of the two likelihood functions defined as follows:

$$L_j(\mathbf{v}) = \frac{|S_j|}{\sum_{c}|S_c|}\,\hat{f}_j(\mathbf{v})$$

where |S_j| is the number of class-j training instances, the sum in the denominator runs over both classes, and f̂_j(v) is the kernel density estimator corresponding to the class-j training instances. In this study, j is either 'interacting' or 'non-interacting'. To improve the efficiency of the predictor, the current RVKDE implementation includes only a limited number, denoted by k_t, of the nearest class-j training instances of v when computing f̂_j(v). The parameter k_t is also set either through cross-validation or by the user.

References

Ge H, Walhout AJM, Vidal M: Integrating 'omic' information: a bridge between genomics and systems biology. Trends Genet 2003, 19(10):551–560. 10.1016/j.tig.2003.08.009
Article CAS PubMed Google Scholar
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001, 98(8):4569–4574. 10.1073/pnas.061034498
Article PubMed Central CAS PubMed Google Scholar
Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415(6868):180–183. 10.1038/415180a
Article CAS PubMed Google Scholar
Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440(7084):631–636. 10.1038/nature04532
Article CAS PubMed Google Scholar
Tong AHY, Drees B, Nardelli G, Bader GD, Brannetti B, Castagnoli L, Evangelista M, Ferracuti S, Nelson B, Paoluzi S, et al.: A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295(5553):321–324. 10.1126/science.1064987
Article CAS PubMed Google Scholar
Han JDJ, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 2005, 23(7):839–844. 10.1038/nbt1116
Article CAS PubMed Google Scholar
Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein-interaction networks? Genome Biol 2006, 7(11):120. 10.1186/gb-2006-7-11-120
Article PubMed Central PubMed Google Scholar
Shoemaker BA, Panchenko AR: Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol 2007, 3(4):e43. 10.1371/journal.pcbi.0030043
Article PubMed Central PubMed Google Scholar
Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96(8):4285–4288. 10.1073/pnas.96.8.4285
Article PubMed Central CAS PubMed Google Scholar
Aloy P, Russell RB: InterPreTS: protein Interaction Prediction through Tertiary Structure. Bioinformatics 2003, 19(1):161–162. 10.1093/bioinformatics/19.1.161
Article CAS PubMed Google Scholar
Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A: PRISM: protein interactions by structural matching. Nucleic Acids Res 2005, 33: W331-W336. 10.1093/nar/gki585
Article PubMed Central CAS PubMed Google Scholar
Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402(6757):86–90. 10.1038/47056
Article CAS PubMed Google Scholar
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285(5428):751–753. 10.1126/science.285.5428.751
Article CAS PubMed Google Scholar
Huang TW, Tien AC, Lee YCG, Huang WS, Lee YCG, Peng CL, Tseng HH, Kao CY, Huang CYF: POINT: a database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics 2004, 20(17):3273–3276. 10.1093/bioinformatics/bth366
Article CAS PubMed Google Scholar
Espadaler J, Romero-Isart O, Jackson RM, Oliva B: Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships. Bioinformatics 2005, 21(16):3360–3368. 10.1093/bioinformatics/bti522
Article CAS PubMed Google Scholar
Valencia A, Pazos F: Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 2002, 12(3):368–373. 10.1016/S0959-440X(02)00333-0
Article CAS PubMed Google Scholar
Ben-Hur A, Noble WS: Kernel methods for predicting protein-protein interactions. Bioinformatics 2005, 21: I38-I46. 10.1093/bioinformatics/bti1016
Article CAS PubMed Google Scholar
Chen XW, Liu M: Prediction of protein-protein interactions using random decision forest framework. Bioinformatics 2005, 21(24):4394–4400. 10.1093/bioinformatics/bti721
Article CAS PubMed Google Scholar
Martin S, Roe D, Faulon JL: Predicting protein-protein interactions using signature products. Bioinformatics 2005, 21(2):218–226. 10.1093/bioinformatics/bth483
Article CAS PubMed Google Scholar
Chou KC, Cai YD: Predicting protein-protein interactions from sequences in a hybridization space. J Proteome Res 2006, 5(2):316–322. 10.1021/pr050331g
Article CAS PubMed Google Scholar
Pitre S, Dehne F, Chan A, Cheetham J, Duong A, Emili A, Gebbia M, Greenblatt J, Jessulat M, Krogan N, et al.: PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs. BMC Bioinformatics 2006., 7: 10.1186/1471-2105-7-365
Google Scholar
Shen JW, Zhang J, Luo XM, Zhu WL, Yu KQ, Chen KX, Li YX, Jiang HL: Predicting protein-protein interactions based only on sequences information. Proc Natl Acad Sci USA 2007, 104(11):4337–4341. 10.1073/pnas.0607879104
Article PubMed Central CAS PubMed Google Scholar
Guo YZ, Yu LZ, Wen ZN, Li ML: Using support vector machine combined with auto covariance to predict proteinprotein interactions from protein sequences. Nucleic Acids Res 2008, 36(9):3025–3030. 10.1093/nar/gkn159
Article PubMed Central CAS PubMed Google Scholar
Shen JW, Zhang J, Luo XM, Zhu WL, Yu KQ, Chen KX, Li YX, Jiang HL: Predictina protein-protein interactions based only on sequences information. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(11):4337–4341. 10.1073/pnas.0607879104
Article PubMed Central CAS PubMed Google Scholar
Oyang YJ, Hwang SC, Ou YY, Chen CY, Chen ZW: Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE Transactions on Neural Networks 2005, 16(1):225–236. 10.1109/TNN.2004.836229
Article PubMed Google Scholar
Chang DTH, Huang HY, Syu YT, Wu CP: Real value prediction of protein solvent accessibility using enhanced PSSM features. BMC Bioinformatics 2008, 9(Suppl 12):S12. 10.1186/1471-2105-9-432
Article PubMed Central PubMed Google Scholar
Kirchmair J, Markt P, Distinto S, Schuster D, Spitzer GM, Liedl KR, Langer T, Wolber G: The Protein Data Bank (PDB), Its Related Services and Software Tools as Key Components for In Silico Guided Drug Discovery. Journal of Medicinal Chemistry 2008, 51(22):7021–7040. 10.1021/jm8005977
Article CAS PubMed Google Scholar
Dohkan S, Koike A, Takagi T: Improving the Performance of an SVM-Based Method for Predicting Protein-Protein Interactions. In Silico Biol 2006, 6: 515–529.
CAS PubMed Google Scholar
Zhu J, Zhang B, Smith EN, Drees B, Brem RB, Kruglyak L, Bumgarner RE, Schadt EE: Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat Genet 2008, 40(7):854–861. 10.1038/ng.167
Article PubMed Central CAS PubMed Google Scholar
Nielsen J, Oliver S: The next wave in metabolome analysis. Trends Biotechnol 2005, 23(11):544–546. 10.1016/j.tibtech.2005.08.005
Article CAS PubMed Google Scholar
Rajagopalan D, Agarwal P: Inferring pathways from gene lists using a literature-derived network of biological relationships. Bioinformatics 2005, 21(6):788–793. 10.1093/bioinformatics/bti069
Article CAS PubMed Google Scholar
Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia YK, Juvik G, Roe T, Schroeder M, et al.: SGD: Saccharomyces Genome Database. Nucleic Acids Research 1998, 26(1):73–79. 10.1093/nar/26.1.73
Article PubMed Central CAS PubMed Google Scholar
Zhu J, Zhang MQ: SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 1999, 15(7–8):607–611. 10.1093/bioinformatics/15.7.607
Article CAS PubMed Google Scholar
Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R, et al.: The TRANSFAC system on gene expression regulation. Nucleic Acids Research 2001, 29(1):281–283. 10.1093/nar/29.1.281
Article PubMed Central CAS PubMed Google Scholar
Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Research 2002, 30(1):31–34. 10.1093/nar/30.1.31
Article PubMed Central CAS PubMed Google Scholar
Bairoch A, Consortium U, Bougueleret L, Altairac S, Amendolia V, Auchincloss A, Argoud-Puy G, Axelsen K, Baratin D, Blatter MC, et al.: The Universal Protein Resource (UniProt) 2009. Nucleic Acids Research 2009, 37: D169-D174. 10.1093/nar/gkn664
Article Google Scholar
Kabsch W, Sander C: Dictionary of Protein Secondary Structure - Pattern-Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22(12):2577–2637. 10.1002/bip.360221211
Article CAS PubMed Google Scholar
Nelson DL, Lehninger AL, Cox MM: Lehninger principles of biochemistry. 5th edition. New York: W.H. Freeman; 2008.
Kim WK, Ison JC: Survey of the geometric association of domain-domain interfaces. Proteins 2005, 61(4):1075–1088. 10.1002/prot.20693
Kim WK, Henschel A, Winter C, Schroeder M: The many faces of protein-protein interactions: A compendium of interface geometry. PLoS Computational Biology 2006, 2(9):e124. 10.1371/journal.pcbi.0020124
Lise S, Walker-Taylor A, Jones DT: Docking protein domains in contact space. BMC Bioinformatics 2006, 7:310. 10.1186/1471-2105-7-310
Jones S, Thornton JM: Principles of protein-protein interactions. Proceedings of the National Academy of Sciences of the United States of America 1996, 93(1):13–20. 10.1073/pnas.93.1.13
Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
Nam JW, Shin KR, Han JJ, Lee Y, Kim VN, Zhang BT: Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 2005, 33(11):3570–3581. 10.1093/nar/gki668
Nguyen MN, Rajapakse JC: Two-stage support vector regression approach for predicting accessible surface areas of amino acids. Proteins 2006, 63(3):542–550. 10.1002/prot.20883
Witten IH, Frank E: Data mining: practical machine learning tools and techniques. 2nd edition. Amsterdam; Boston, MA: Morgan Kaufmann; 2005.
Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
Artin E: The Gamma Function. New York: Holt, Rinehart and Winston; 1964.

Acknowledgements

The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract Nos. NSC 97-2627-P-001-002, NSC 96-2320-B-006-027-MY2 and NSC 96-2221-E-006-232-MY2. The authors also thank Ted Knoy for his editorial assistance.

This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S1.

Author information

Authors and Affiliations

Department of Electrical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
Darby Tien-Hao Chang, Yu-Tang Syu & Po-Chang Lin

Authors

Darby Tien-Hao Chang
Yu-Tang Syu
Po-Chang Lin

Corresponding author

Correspondence to Darby Tien-Hao Chang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Author DTHC designed the methodology and conceived of this study. YTS and PCL designed the experiments and performed all calculations and analyses. All authors have read and approved this manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chang, D.TH., Syu, YT. & Lin, PC. Predicting the protein-protein interactions using primary structures with predicted protein surface. BMC Bioinformatics 11 (Suppl 1), S3 (2010). https://doi.org/10.1186/1471-2105-11-S1-S3

Published: 18 January 2010
DOI: https://doi.org/10.1186/1471-2105-11-S1-S3

Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)

Predicting the protein-protein interactions using primary structures with predicted protein surface