- Research article
- Open Access
Interaction prediction and classification of PDZ domains
© Kalyoncu et al; licensee BioMed Central Ltd. 2010
- Received: 14 April 2010
- Accepted: 30 June 2010
- Published: 30 June 2010
The PDZ domain is a well-conserved structural protein domain found in hundreds of otherwise unrelated signaling proteins. PDZ domains can bind to the C-terminal peptides of different proteins and act as glue, clustering different protein complexes together, targeting specific proteins and routing these proteins in signaling pathways. These domains are classified into classes I, II and III, depending on their binding partners and the nature of the bonds formed. The binding specificities of PDZ domains are crucial for understanding the complexity of signaling pathways. How these domains recognize and bind their partners is still an open question.
The focus of the current study is twofold: 1) predicting which peptides a PDZ domain will bind and 2) classifying PDZ domains as Class I, II or I-II, given the primary sequences of the PDZ domains. Trigram and bigram amino acid frequencies are used as features in machine learning methods. Using 85 PDZ domains and 181 peptides, our model reaches a high prediction accuracy (91.4%) for binary interaction prediction, which outperforms previously investigated similar methods. We can also predict the classes of PDZ domains with an accuracy of 90.7%. We propose three critical amino acid sequence motifs that could have important roles in the specificity pattern of PDZ domains.
Our results on the PDZ interaction dataset show that the approach is encouraging. The method can be further used as a virtual screening technique to reduce the search space for putative candidate target proteins and drug-like molecules of PDZ domains.
- Cystic Fibrosis Transmembrane Conductance Regulator
- Area Under Curve
- Fluorescence Polarization
- Interaction Prediction
- Feature Vector Space
Protein-protein interactions play fundamental roles in signal transduction, formation of functional protein complexes and protein modification. One of the most common protein interaction domains in the cell is the PDZ domain, which is found in central signaling proteins of most species [2–4]. Among nearly 70 distinct recognition domains, PDZ domains are crucial because they are involved in the development of multicellular organisms by establishing cell polarity, coordinating intercellular signaling and directing the specificity of signaling proteins. They consist of 80 to 90 amino acids and have a compact globular fold composed of a core of six β strands (βA - βF) and two α helices (αA, αB). By binding the C-terminal motifs of their target proteins, PDZ domains target, cluster and route these proteins. However, some PDZ domains can also bind to internal motifs of target proteins, lipids and other PDZ domains [3, 7].
Although PDZ domains show selectivity toward their target ligands, they also display promiscuity, binding to more than one ligand, and degenerate specificity [18–21], so predicting the interactions of these domains can be challenging. Several studies have aimed to classify and predict the interaction specificity of PDZ domains, which could save time-consuming and expensive experiments. Chen et al. predicted PDZ domain-peptide interactions from the primary sequences of PDZ domains and peptides by using a statistical model and reported an area under the curve (AUC) value of 0.87 for extrapolations to both novel mouse peptides and PDZ domains. Bezprozvanny and Maximov used a classification method based on two critical positions of 249 PDZ domains and presented 25 different classes of PDZ domains. Stiffler et al. characterized the binding selectivity of PDZ domains by training a multi-domain selectivity model for 157 mouse PDZ domains with respect to 217 peptides and indicated that PDZ domains are distributed throughout selectivity space rather than falling into discrete specificity classes. Schillinger et al. used a new approach, Domain Interaction Footprint (DIF), to predict binding peptides of SH3 and PDZ domains by using only the sequences of the peptides; they reported an AUC value of 0.89 for a PDZ multi-domain model built from the sequence information of binding and non-binding peptides of four different PDZ domains. Tonikian et al. constructed a specificity map consisting of 16 unique specificity classes for 72 PDZ domains, which led to the prediction of PDZ domain interactions. Wiedemann et al. quantified the specificity of three PDZ domains by relating the last four C-terminal residues of their ligands to the corresponding dissociation constants, which can provide the selectivity pattern of PDZ domains and support the design of super-binding peptides. Eo et al. used an SVM classifier with amino acid contact matrices and a physicochemical distance matrix as the feature encoding in order to identify PDZ domain-ligand interactions.
In this study, we propose a method to predict PDZ domain-peptide interactions by using only the sequence information of PDZ domains and ligands. To construct a numerical feature vector for each interaction, trigram and bigram frequencies of the primary sequences of PDZ domains and peptides are calculated. We obtain a high prediction performance (accuracy of 91.4% and AUC of 0.97 for the trigram model) in distinguishing between binding and non-binding peptides of PDZ domains. We compare commonly used classifiers (SVM, Nearest Neighbor, Naïve Bayes, J48, Random Forest) and find that Random Forest gives the best prediction accuracy. Moreover, we show that our method can distinguish between Class I, Class II and Class I-II PDZ domains (the last binding to both Class I and Class II peptides) with an accuracy of 90.7% and an AUC of 0.90 for the trigram model.
For the interaction prediction part, our machine learning model needs a positive (binding) and a negative (non-binding) dataset. The PDZ interaction dataset is retrieved from the study of Stiffler et al., which is composed of interaction data of 85 mouse PDZ domains with respect to 217 mouse genome-encoded peptides [23, 24]. They used a combination of protein microarrays and fluorescence polarization (FP) methods to identify biological interactions of PDZ domains. In the current study, only binding and non-binding data confirmed by FP are used as the training set, because of the fidelity of FP. After selecting the FP-confirmed interactions, we obtained 731 binding and 1361 non-binding interactions between 85 PDZ domains and 181 peptides (See additional file 1: Table S1 for PDZ interaction data).
An independent validation dataset is also used in interaction prediction part in order to test the predictive performance of our model. The validation dataset is extracted from the previous study of Stiffler et al. and it is composed of 27 binding and 62 non-binding interactions of 16 PDZ domains and 20 peptides  (See additional file 2: Table S2 for validation interaction data).
For class prediction part, 86 PDZ domains are categorized, resulting in 45 Class I, 20 Class II, 21 Class I-II. These are retrieved from our interaction dataset and PDZBase  by looking at their interactions with different classes of peptides. PDZ domains are annotated as Class I and Class II according to the C terminus sequence of the interacting peptides, [Ser/Thr-X-Φ-COOH] for Class I peptides and [Φ-X-Φ-COOH] for Class II peptides, respectively. Class I-II PDZ domains are determined if they bind to both Class I and Class II peptides. (See additional file 2: Table S3 for class data).
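The annotation rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the dataset: the residue set standing in for Φ (hydrophobic) is an assumption, and the function names `peptide_class` and `domain_class` are hypothetical.

```python
# Residues treated as hydrophobic (Φ). This particular set is an assumption.
HYDROPHOBIC = set("AVILMFWC")

def peptide_class(peptide):
    """Return 'I', 'II' or None from the last three residues (positions -2, -1, 0)."""
    if len(peptide) < 3:
        return None
    p_minus2, _, p0 = peptide[-3:].upper()
    if p0 not in HYDROPHOBIC:
        return None          # position 0 must be hydrophobic in both classes
    if p_minus2 in "ST":
        return "I"           # [Ser/Thr-X-Φ-COOH]
    if p_minus2 in HYDROPHOBIC:
        return "II"          # [Φ-X-Φ-COOH]
    return None

def domain_class(bound_peptides):
    """Label a PDZ domain from the classes of the peptides it binds."""
    classes = {peptide_class(p) for p in bound_peptides} - {None}
    if classes == {"I"}:
        return "Class I"
    if classes == {"II"}:
        return "Class II"
    if classes == {"I", "II"}:
        return "Class I-II"
    return None
```

A domain whose known ligands all end in a Ser/Thr-X-Φ pattern is labeled Class I here, while a domain with ligands of both kinds is labeled Class I-II.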
To keep our interaction prediction model consistent, we took the last 10 residues of each peptide sequence, because PDZ domains show selectivity for peptide positions up to -10. The sequence data of PDZ domains and peptides can be seen in additional file 2: Table S4 and Table S5, respectively.
Seven amino acid classes used in our model.
Amino acid groups listed: (Ala, Gly, Val); (Ile, Leu, Phe, Pro); (Tyr, Met, Thr, Ser); (His, Asn, Gln, Trp).
Dipole ranges listed: 1.0 < Dip. < 2.0; 2.0 < Dip. < 3.0.
Here, X is the feature vector space of the PDZ sequence, and each feature xi represents the frequency of one trigram (i = 1, 2, ..., 343) or one bigram (i = 1, 2, ..., 49). Y is the feature vector space of the peptide sequence, where each feature yi represents the frequency of one trigram or bigram, and W is the corresponding label that contains binary data (w1: binding, w2: non-binding). Thus, a 686-dimensional vector for the trigram part and a 98-dimensional vector for the bigram part are constructed to represent each binding/non-binding interaction.
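A minimal sketch of this feature construction, under stated assumptions, is given here. Residue groups 1 to 4 follow the class table above; groups 5 to 7 (Arg/Lys, Asp/Glu, Cys) are an assumption borrowed from the conjoint-triad scheme of Shen et al. Dividing counts by the number of n-grams is also an assumed normalization, and the function names are hypothetical.

```python
from itertools import product

# Groups 1-4 come from the class table; groups 5-7 are an assumption
# taken from the conjoint-triad scheme of Shen et al.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: str(i) for i, group in enumerate(GROUPS, start=1) for aa in group}

def ngram_frequencies(sequence, n):
    """Return the 7**n relative frequencies of class n-grams, in a fixed key order."""
    # Map each residue to its class digit; unknown symbols are skipped.
    encoded = [RESIDUE_TO_CLASS[aa] for aa in sequence.upper() if aa in RESIDUE_TO_CLASS]
    keys = ["".join(p) for p in product("1234567", repeat=n)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(encoded) - n + 1):
        counts["".join(encoded[i:i + n])] += 1
    total = max(len(encoded) - n + 1, 1)   # avoid division by zero on short input
    return [counts[k] / total for k in keys]

def interaction_vector(pdz_seq, peptide_seq, n=3):
    """Concatenate PDZ features (X) and peptide features (Y) into one vector."""
    return ngram_frequencies(pdz_seq, n) + ngram_frequencies(peptide_seq, n)
```

With n = 3 the concatenated vector has 343 + 343 = 686 entries, and with n = 2 it has 49 + 49 = 98. Under this encoding a bigram key such as "12" denotes a class 1 residue followed by a class 2 residue, for example Gly followed by Leu.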
For the class prediction part, the peptide sequences are discarded and only the sequences of PDZ domains are used to construct the feature vector space, because peptide sequences are used as the label of the dataset. Therefore, a 343 dimensional vector space for trigram part and 49 for bigram part with three labels (w1: ClassI, w2: ClassII, w3: ClassI-II) are built to represent each class of PDZ domains.
There are several machine learning approaches to predict domain interactions [30–32]. We chose five classifiers, SVM (Support Vector Machine), Nearest Neighbor, Naïve Bayes, J48 and Random Forest, which have been commonly used in protein-protein interaction prediction problems. In the SVM algorithm, feature vectors are non-linearly mapped onto a high-dimensional feature space and a set of hyperplanes is constructed to be used for classification or regression. The simplest of the classifiers used is Nearest Neighbor, which classifies instances according to their closeness to the training examples. The basic idea behind Naïve Bayes is to predict the class of an instance by learning the conditional probability of each attribute. J48, also known as C4.5, grows an initial tree by using a divide-and-conquer algorithm and then ranks test instances. Random Forest, developed by Breiman, generates many classification trees, where each node uses a random subset of the features, and outputs the classification based on majority voting over all trees in the forest. After comparing these classifiers by using Weka 3.6, the Random Forest algorithm was found to outperform the other classifiers, including those previously reported as the best classification algorithm (e.g. SVM).
Each classifier is trained and evaluated by using 10-fold cross-validation. Cross-validation measures the prediction performance in a stable way by leaving out a subset of instances (about 10% for 10-fold cross-validation) to be used as the test set during the training process. The exclusion is repeated until every instance in the dataset has been left out exactly once. Compared with a single held-out test split, cross-validation uses every instance for testing and therefore gives a less biased estimate of predictive performance. Parameter selection for each classifier is done by varying its parameters step by step and comparing the resulting accuracy and AUC (Area Under the ROC Curve) values to find the parameters with the highest performance (See additional file 2: Table S7 for parameter values used for classifier trainings). At the end, the classifier with the best performance is chosen as the model classifier.
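The fold construction can be sketched as follows. This is a plain, unstratified split written for illustration; it is not the Weka routine used in the study, and the function name `k_fold_indices` and the fixed seed are assumptions.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For the 2092 interactions in the training set (731 binding and 1361 non-binding), each of the ten test folds holds 209 or 210 instances and the remaining instances form the training fold.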
The numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) are used to calculate the true positive rate (also called recall or sensitivity), TPR = TP/(TP + FN), the false positive rate, FPR = FP/(FP + TN), and precision, P = TP/(TP + FP). We measure the performance of each classifier by using a ROC curve, which plots TPR (Sensitivity) versus FPR (1-Specificity). The area under the ROC curve, referred to as AUC, represents the predictive power: a random predictive model has an AUC of 0.5 and a perfect one has an AUC of 1.0, so a larger AUC indicates better predictive power. However, ROC curves can sometimes be misleading when dealing with highly unbalanced datasets. Therefore, Precision versus Recall (PR) curves are also constructed to interpret the performance of the models in a more informative manner. PR curves show how many true positives are likely to be obtained in a prediction system.
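The rate definitions above translate directly into code. The sketch below also computes AUC through its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half), which equals the area under the ROC curve. The function names are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def rates(tp, tn, fp, fn):
    """Return (TPR, FPR, precision) from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # recall / sensitivity
    fpr = fp / (fp + tn)        # 1 - specificity
    precision = tp / (tp + fp)
    return tpr, fpr, precision

def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Labels are assumed to be 1 for binding and 0 for non-binding, and scores are any classifier outputs where higher means more likely to bind.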
Interaction prediction model
Random Forest is chosen to build our model because it gives the highest AUC and accuracy values (See additional file 2: Figure S1 and Figure S2 for comparison of classifiers for trigram and bigram models, respectively). To optimize the parameters of the Random Forest algorithm, we evaluate the effect of parameter changes on prediction performance by measuring the out-of-bag (OOB) error rate of each model tree. There are two parameters: the number of trees (numTree) and the number of randomly selected features (numFeature). The model is rather sensitive to the number of features used in random selection, which must be much lower than the total number of features. In contrast, changes in the number of trees result only in small decreases in the OOB error rate. The lowest OOB error rate is obtained when numTree = 200 and numFeature = 30 (See additional file 2: Figure S5 for parameter selection graph). In addition, resampling is applied as a pre-processing step to handle our imbalanced dataset, which could otherwise be overwhelmed by the majority class, and to derive robust estimates of standard errors. Resampling is a supervised filter producing a random subset of the dataset. In our study, the class distribution is left as-is and sampling is done with replacement by adjusting the parameters.
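The two ideas used here, sampling with replacement and out-of-bag evaluation, can be illustrated with a short sketch. It does not reproduce Weka's Resample filter or its Random Forest; the function names are hypothetical. Each tree in a forest is grown on a bootstrap sample, the instances that were never drawn form its out-of-bag set, and the forest's prediction is the majority vote of its trees.

```python
import random

def bootstrap_sample(n_instances, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_instances) for _ in range(n_instances)]
    # Instances never drawn can serve as a test set for this tree.
    out_of_bag = sorted(set(range(n_instances)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(votes):
    """Return the label predicted by the largest number of trees."""
    return max(set(votes), key=votes.count)
```

Sampling n indices with replacement leaves roughly a third of the instances out of bag on average, which is why the OOB error can be estimated without a separate test set.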
Prediction results for interaction prediction of PDZ domains for both trigram and bigram models.
Training set (10-fold cross validation)
Class prediction model
Class prediction is a multi-class classification problem because we want not only to discriminate between PDZ domains that bind to Class I or Class II peptides, but also to identify Class I-II domains, whose interaction specificity reflects the promiscuous pattern of PDZ domains. All five classifiers are trained on these classification datasets and again Random Forest gives the best predictive performance with the highest AUC and accuracy values (See additional file 2: Figure S3 and Figure S4 for comparison of classifiers for trigram and bigram models, respectively).
Prediction results for class prediction of PDZ domains for both trigram and bigram models.
ClassI, ClassII, Class I-II*
To make the resulting model faster and to extract important features, the dimensionality of our dataset is reduced by using feature selection methods. Selecting important features helps to remove redundant and/or irrelevant data. As the first step of feature selection, a correlation-based feature subset selection method is used to evaluate the individual performance of each feature in predicting the labels (wi) as well as the level of intercorrelation among all features. Successful feature subsets include features that are highly correlated with the label but uncorrelated with each other. In the second step, several search algorithms are run and the results of all of them are considered in order to reduce the features carefully. The search methods used are presented in additional file 2: Table S6.
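The criterion behind correlation-based feature subset selection can be sketched with the merit score k·r_cf / sqrt(k + k(k-1)·r_ff), where k is the subset size, r_cf the mean feature-label correlation and r_ff the mean feature-feature correlation. The sketch below uses absolute Pearson correlation as a stand-in measure, which is an assumption; the evaluator used in the study may use a different correlation measure, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(features, label):
    """Merit of a feature subset; `features` is a list of feature columns."""
    k = len(features)
    # Mean relevance: correlation of each feature with the label.
    r_cf = sum(abs(pearson(f, label)) for f in features) / k
    # Mean redundancy: correlation between every pair of features.
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

Adding an exact copy of a feature leaves the merit unchanged, while adding an equally relevant but less correlated feature raises it, which matches the stated preference for features that are correlated with the label but not with each other.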
Feature selection (dimension reduction) is applied to both trigram and bigram models because we want to observe the important features common to both models. For the trigram model, we obtained 23 features for PDZ domains and 23 features for peptides to be used in the interaction prediction part. In addition, 53 trigram features are obtained for the classification part (data not shown).
Prediction results after feature reduction.
Critical sequence motifs
As seen in Figure 3, the characteristic GLGF repeat of PDZ domains was recovered by extracting sequence motif "12" between the βA-βB loop and the αB helix. The other two frequently occurring sequence motifs were positioned at the end of αB ("25") and at the loop between αA and βD ("16"). When these sequence motifs are displayed on the 3D structure of PDZ domains, motif "25" is positioned near the binding groove (at the end of αB), while motif "16" is positioned far from the binding groove (at the αA-βD loop) (Figure 3).
The motif extracted on the αB helix could contribute to the specificity of PDZ domains. Songyang et al. investigated the importance of the αB helix for the peptide selectivity of PDZ domains by showing a high correlation between the first residue in the αB helix and peptide position -2. Below, we discuss some specific PDZ domains:
The specific interaction property of the α1-syntrophin PDZ domain was investigated by Schultz et al., who found that Leu 14, Gly 15 and Ile 16 showed a large chemical shift upon ligand binding. The PDZ domain of α1-syntrophin forms a hydrophobic pocket consisting of Leu 14, Ile 18 and Leu 71 to bury the side chain of Val -2 of the peptide. Motif "12" corresponds to Gly 15 and Ile 16, and "5" of motif "25" corresponds to Leu 71, which is an important part of the hydrophobic pocket.
The first PDZ domain of NHERF1 plays an important role in cellular localization by binding to the cystic fibrosis transmembrane conductance regulator (CFTR). Leu 0 of the ligand forms hydrophobic contacts with Phe 26 and Ile 79 and makes H-bonds with Gly 25, Phe 26 and Arg 80. These important residues were also extracted by our method: motif "12" in βB corresponds to Gly 25 and Phe 26, and motif "25" in αB corresponds exactly to Ile 79 and Arg 80.
Pan et al. tried to elucidate structural basis of binding pattern of Harmonin(2/3) and found that carboxyl group of cad 23 ligand forms hydrogen bonds with Leu 222, Glu 223, Cys 224 (GLGF motif) and is stabilized by Lys 279 . These important residues of Harmonin were also observed in our motifs as seen from Figure 3 (PDZ2 domain of Harmonin includes residues 208-299, but in the 3D structure it is between residues 9-100).
The carboxyl group of ligand forms hydrogen bonds with Ile 33, Gly 34 and Ile 35 of Pick1 PDZ domain . While Gly 34 and Ile 35 constitute motif "12", we observed motif "24" on αB helix instead of motif "25".
Gianni et al. investigated the allosteric property of the PTP-BL(2/5) domain by using structural and dynamical methods and found that binding is regulated by long-range interactions that correlate with ligand-induced structural rearrangements. There is a detectable conformational change, occurring mainly in the αB-βB interface, the L1 loop and the hydrophobic core, upon ligand binding to the PTP-BL domain. The plasticity and selectivity of the PTP-BL domain are usually determined by reorientation of the αB helix. The amides of Leu 25, Gly 26 and Ile 27 stabilize the charge of the C-terminus of the ligand, and there is a hydrophobic contact between the C-terminal valine of the peptide and positions Leu 85 and Val 82. In our study, motif "12" in βB corresponds to Gly 26 and Ile 27, and "5" of motif "25" in αB corresponds to Leu 85, as seen in Figure 3.
Our results show that our model can be used as a stable interaction prediction model for PDZ domains with higher accuracy than other similar methods [22, 24]. We also proposed a classification model for PDZ domains that, unlike other methods [15, 25], is based on the general classification pattern, and its high accuracy indicates that our classification model correlates well with the current classification pattern of PDZ domains. Although PDZ domains show a highly selective interaction pattern, some PDZ domains bind to both Class I and Class II peptides. We named these promiscuous PDZ domains Class I-II PDZ domains and obtained a very high performance when discriminating them from the other classes. Therefore, we conclude that there may be some characteristic pattern in the structure of Class I-II PDZ domains that provides their promiscuous property.
Some important characteristic features of PDZ domains were extracted. After selecting the most frequently occurring features along the same secondary structure region of PDZ domains, we obtained three critical sequence motifs. Two of them ("12" and "25") were previously shown to have an important role in ligand interaction. Motif "12" is on the conserved GLGF repeat, located at the βA-βB loop, and motif "25" is located on αB, which is part of the binding pocket. No previous study has investigated the importance of motif "16", which is positioned on the αA-βD loop. After multiple alignment of PDZ domain sequences, it was observed that motif "16" on the αA-βD loop is conserved, as shown in another study. Although this motif is not located near the canonical binding pocket, it could be involved in the dimerization of PDZ domains, which is a common characteristic of some PDZ domains [48–51]. In the study of Im et al., it was shown that the dimeric interface of the GRIP1 PDZ6 dimer includes a βA strand and an αA-βD loop from each domain, and motif "16" is located on this αA-βD loop of the GRIP1 PDZ6 domain. It could also have an allosteric effect regulating the binding specificity of PDZ domains. However, further study is needed to reveal the biological importance of this motif.
This study has two interrelated aims: prediction of PDZ domain-peptide interactions, and classification of PDZ domains as Class I, II and I-II. A statistical learning model was constructed by using an interaction dataset of PDZ domains (consisting of 85 PDZ domains and the corresponding 181 peptides). To convert primary sequence information into numerical feature input, trigram and bigram amino acid frequencies were calculated for each instance. We predicted binary interactions and classes of PDZ domains with accuracies of 91.4% and 90.7%, respectively. After feature extraction, three critical amino acid sequence motifs were proposed to have significant roles in PDZ domain specificity. With these encouraging results, this study could be an important step in the automated prediction of PDZ domain interactions.
The discovery of features within primary sequences of known protein interaction pairs could be subsequently developed by using other features (binding affinities, secondary/tertiary structure, etc.) in the learning model. Further improvements on these lines may generate a powerful computational virtual screening technique that significantly reduces the search space for putative candidate target proteins of PDZ domains.
This project has been supported by TUBITAK (Research Grant No 109T343 and 109E207).
- Keskin Z, Gursoy A, Ma B, Nussinov R: Principles of protein-protein interactions: What are the preferred ways for proteins to interact? Chemical Reviews 2008, 108(4):1225–1244. 10.1021/cr040409x
- Dev KK: PDZ domain protein-protein interactions: A case study with PICK1. Current Topics in Medicinal Chemistry 2007, 7(1):3–20. 10.2174/156802607779318343
- Nourry C, Grant SG, Borg JP: PDZ domain proteins: plug and play! Sci STKE 2003, 2003(179):RE7. 10.1126/stke.2003.179.re7
- Jemth P, Gianni S: PDZ domains: folding and binding. Biochemistry 2007, 46(30):8701–8708. 10.1021/bi7008618
- Dev KK: Making protein interactions druggable: Targeting PDZ domains. Nature Reviews Drug Discovery 2004, 3(12):1047–1056. 10.1038/nrd1578
- van Ham M, Hendriks W: PDZ domains-glue and guide. Mol Biol Rep 2003, 30(2):69–82. 10.1023/A:1023941703493
- Hung AY, Sheng M: PDZ domains: structural modules for protein complex assembly. J Biol Chem 2002, 277(8):5699–5702. 10.1074/jbc.R100065200
- Basdevant N, Weinstein H, Ceruso M: Thermodynamic basis for promiscuity and selectivity in protein-protein interactions: PDZ domains, a case study. J Am Chem Soc 2006, 128(39):12766–12777. 10.1021/ja060830y
- Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R: Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell 1996, 85(7):1067–1076. 10.1016/S0092-8674(00)81307-0
- Gerek ZN, Keskin O, Ozkan SB: Identification of specificity and promiscuity of PDZ domain interactions through their dynamic behavior. Proteins 2009, 77(4):796–811. 10.1002/prot.22492
- Fanning AS, Anderson JM: Protein-protein interactions: PDZ domain networks. Curr Biol 1996, 6(11):1385–1388. 10.1016/S0960-9822(96)00737-3
- Daniels DL, Cohen AR, Anderson JM, Brunger AT: Crystal structure of the hCASK PDZ domain reveals the structural basis of class II PDZ domain target recognition. Nat Struct Biol 1998, 5(4):317–325. 10.1038/nsb0498-317
- Niv MY, Weinstein H: A flexible docking procedure for the exploration of peptide binding selectivity to known structures and homology models of PDZ domains. Journal of the American Chemical Society 2005, 127(40):14072–14079. 10.1021/ja054195s
- Gerek ZN, Ozkan SB: A flexible docking scheme to explore the binding selectivity of PDZ domains. Protein Science 2010, 19(5):914–928.
- Bezprozvanny I, Maximov A: Classification of PDZ domains. FEBS Lett 2001, 509(3):457–462. 10.1016/S0014-5793(01)03214-8
- Song E, Gao S, Tian R, Ma S, Huang H, Guo J, Li Y, Zhang L, Gao Y: A high efficiency strategy for binding property characterization of peptide-binding domains. Mol Cell Proteomics 2006, 5(8):1368–1381. 10.1074/mcp.M600072-MCP200
- Songyang Z, Fanning AS, Fu C, Xu J, Marfatia SM, Chishti AH, Crompton A, Chan AC, Anderson JM, Cantley LC: Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 1997, 275(5296):73–77. 10.1126/science.275.5296.73
- Ferrer M, Maiolo J, Kratz P, Jackowski JL, Murphy DJ, Delagrave S, Inglese J: Directed evolution of PDZ variants to generate high-affinity detection reagents. Protein Eng Des Sel 2005, 18(4):165–173. 10.1093/protein/gzi018
- Kang BS, Cooper DR, Devedjiev Y, Derewenda U, Derewenda ZS: Molecular roots of degenerate specificity in syntenin's PDZ2 domain: reassessment of the PDZ recognition paradigm. Structure 2003, 11(7):845–853. 10.1016/S0969-2126(03)00125-4
- Reina J, Lacroix E, Hobson SD, Fernandez-Ballester G, Rybin V, Schwab MS, Serrano L, Gonzalez C: Computer-aided design of a PDZ domain to recognize new target sequences. Nat Struct Biol 2002, 9(8):621–627.
- Wiedemann U, Boisguerin P, Leben R, Leitner D, Krause G, Moelling K, Volkmer-Engert R, Oschkinat H: Quantification of PDZ domain specificity, prediction of ligand affinity and rational design of super-binding peptides. J Mol Biol 2004, 343(3):703–718. 10.1016/j.jmb.2004.08.064
- Chen JR, Chang BH, Allen JE, Stiffler MA, MacBeath G: Predicting PDZ domain-peptide interactions from primary sequences. Nat Biotechnol 2008, 26(9):1041–1045. 10.1038/nbt.1489
- Stiffler MA, Chen JR, Grantcharova VP, Lei Y, Fuchs D, Allen JE, Zaslavskaia LA, MacBeath G: PDZ domain binding selectivity is optimized across the mouse proteome. Science 2007, 317(5836):364–369. 10.1126/science.1144592
- Schillinger C, Boisguerin P, Krause G: Domain Interaction Footprint: a multi-classification approach to predict domain-peptide interactions. Bioinformatics 2009, 25(13):1632–1639. 10.1093/bioinformatics/btp264
- Tonikian R, Zhang Y, Sazinsky SL, Currell B, Yeh JH, Reva B, Held HA, Appleton BA, Evangelista M, Wu Y, et al.: A specificity map for the PDZ domain family. PLoS Biol 2008, 6(9):e239. 10.1371/journal.pbio.0060239
- Eo HS, Kim S, Koo H, Kim W: A machine learning based method for the prediction of G protein-coupled receptor-binding PDZ domain proteins. Mol Cells 2009, 27(6):629–634. 10.1007/s10059-009-0091-2
- Stiffler MA, Grantcharova VP, Sevecka M, MacBeath G: Uncovering quantitative protein interaction networks for mouse PDZ domains using protein microarrays. J Am Chem Soc 2006, 128(17):5913–5922. 10.1021/ja060943h
- Beuming T, Skrabanek L, Niv MY, Mukherjee P, Weinstein H: PDZBase: a protein-protein interaction database for PDZ-domains. Bioinformatics 2005, 21(6):827–828. 10.1093/bioinformatics/bti098
- Shen J, Zhang J, Luo X, Zhu W, Yu K, Chen K, Li Y, Jiang H: Predicting protein-protein interactions based only on sequences information. Proc Natl Acad Sci USA 2007, 104(11):4337–4341. 10.1073/pnas.0607879104
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2005, 21(8):1487–1494. 10.1093/bioinformatics/bti242
- Chen XW, Liu M: Prediction of protein-protein interactions using random decision forest framework. Bioinformatics 2005, 21(24):4394–4400. 10.1093/bioinformatics/bti721
- Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M: A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 2003, 302(5644):449–453. 10.1126/science.1087361
- Cortes C, Vapnik V: Support-Vector Networks. Machine Learning 1995, 20(3):273–297.
- Brazdil PB, Soares C, Da Costa JP: Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning 2003, 50(3):251–277. 10.1023/A:1021713901879
- Friedman N, Geiger D, Goldszmidt M: Bayesian network classifiers. Machine Learning 1997, 29(2–3):131–163. 10.1023/A:1007465528199
- Quinlan JR: C4.5: Programs for Machine Learning. San Mateo, CA, Morgan Kaufmann Publishers; 1993.
- Breiman L: Random forests. Machine Learning 2001, 45(1):5–32. 10.1023/A:1010933404324
- Witten IH, Frank E: Data Mining: Practical machine learning tools and techniques. 2nd edition. Morgan Kaufmann, San Francisco; 2005.
- Qi Y, Bar-Joseph Z, Klein-Seetharaman J: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 2006, 63(3):490–500. 10.1002/prot.20865
- Davis J, Goadrich M: The Relationship Between Precision-Recall and ROC Curves. Proceedings of the 23rd International Conference on Machine Learning (ICML) 2006.
- Jain AK, Duin RPW, Mao JC: Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(1):4–37. 10.1109/34.824819
- Hall MA, Smith LA: Feature subset selection: A correlation based filter approach. Progress in Connectionist-Based Information Systems, Vols 1 and 2 1998, 855–858.
- Schultz J, Hoffmuller U, Krause G, Ashurst J, Macias MJ, Schmieder P, Schneider-Mergener J, Oschkinat H: Specific interactions between the syntrophin PDZ domain and voltage-gated sodium channels. Nature Structural Biology 1998, 5(1):19–24. 10.1038/nsb0198-19
- Karthikeyan S, Leung T, Ladias JAA: Structural basis of the Na+/H+ exchanger regulatory factor PDZ1 interaction with the carboxyl-terminal region of the cystic fibrosis transmembrane conductance regulator. Journal of Biological Chemistry 2001, 276(23):19683–19686. 10.1074/jbc.C100154200
- Pan LF, Yan J, Wu L, Zhang MJ: Assembling stable hair cell tip link complex via multidentate interactions between harmonin and cadherin 23. Proceedings of the National Academy of Sciences of the United States of America 2009, 106(14):5575–5580. 10.1073/pnas.0901819106
- Pan L, Wu H, Shen C, Shi Y, Jin W, Xia J, Zhang M: Clustering and synaptic targeting of PICK1 requires direct interaction between the PDZ domain and lipid membranes. EMBO Journal 2007, 26(21):4576–4587. 10.1038/sj.emboj.7601860
- Gianni S, Walma T, Arcovito A, Calosci N, Bellelli A, Engstrom A, Travaglini-Allocatelli C, Brunori M, Jemth P, Vuister GW: Demonstration of long-range interactions in a PDZ domain by NMR, kinetics, and protein engineering. Structure 2006, 14(12):1801–1809. 10.1016/j.str.2006.10.010
- Wu JW, Yang YS, Zhang JH, Ji P, Du WJ, Jiang P, Xie DH, Huang HD, Wu M, Zhang GZ, et al.: Domain-swapped dimerization of the second PDZ domain of ZO2 may provide a structural basis for the polymerization of claudins. Journal of Biological Chemistry 2007, 282(49):35988–35999. 10.1074/jbc.M703826200
- Im YJ, Park SH, Rho SH, Lee JH, Kang GB, Sheng M, Kim E, Eom SH: Crystal structure of GRIP1 PDZ6-peptide complex reveals the structural basis for class II PDZ target recognition and PDZ domain-mediated multimerization. Journal of Biological Chemistry 2003, 278(10):8501–8507. 10.1074/jbc.M212263200
- Tochio H, Mok YK, Zhang Q, Kan HM, Bredt DS, Zhang MJ: Formation of nNOS/PSD-95 PDZ dimer requires a preformed beta-finger structure from the nNOS PDZ domain. Journal of Molecular Biology 2000, 303(3):359–370. 10.1006/jmbi.2000.4148
- Grembecka J, Cierpicki T, Devedjiev Y, Derewenda U, Kang BS, Bushweller JH, Derewenda ZS: The binding of the PDZ tandem of syntenin to target proteins. Biochemistry 2006, 45(11):3674–3682. 10.1021/bi052225y
- Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R: Surface sites for engineering allosteric control in proteins. Science 2008, 322(5900):438–442. 10.1126/science.1159052
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.