- Research article
- Open Access
Predicting PDZ domain mediated protein interactions from structure
© Hui et al.; licensee BioMed Central Ltd. 2013
- Received: 15 February 2012
- Accepted: 19 December 2012
- Published: 21 January 2013
PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors.
We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training-testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling.
We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training-testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW.
- Training Domain
- Domain Structure Feature
- Phage Display Experiment
- Multiple Cross Validation
- Binding Site Definition
Recent high throughput experiments have resulted in the availability of large data sets of PDZ domain-peptide interactions[7, 8]. As a result, several computational methods have been developed to predict PDZ domain-peptide interactions using sequence-based information only[8-12]. Previously, we developed a sequence-based predictor to scan proteomes of multiple organisms for binders of PDZ domains. Although this predictor is more accurate and precise at proteome scanning compared to previous sequence-based predictors, like others, it performs better on sequences similar to those in the training set. It is known that structure features within the domain binding pocket play important roles in determining binding specificity[13-15]. Since domain structure features capture different information about binding compared to sequence features, we hypothesized that training with such features would result in a predictor that is complementary to the sequence-based predictor. In particular, such a predictor would be less dependent on sequence similarity and would predict additional interactions not predicted by the sequence-based predictor. This would expand the coverage of PDZ domain C-terminal peptide interactions that can currently be predicted by sequence-based predictors alone.
Structure-based predictors have been developed to more generally predict protein-protein interactions. For instance, Hue et al. used a support vector machine (SVM) to predict PPIs using a structure kernel. Methods utilizing structure information to more specifically predict PPIs mediated by peptide recognition domains have also been developed. Sanchez et al. used an empirical force field to calculate structure-based energy functions for human SH2 domain interactions. Fernandez-Ballester et al. constructed position weight matrices of all possible SH3-ligand complexes in yeast using homology modelling. Smith et al. used protein backbone sampling to predict binding specificity for 85 human PDZ domains. Kaufmann et al. developed an optimized energy function to predict the binding specificity of PDZ domain-peptide interactions for 12 PDZ domains.
In this paper, we present a structure-based predictor for PDZ domain-peptide interactions that can be used for proteome scanning. Our predictor uses a variety of structure features that are known to play roles in protein structure stability and in facilitating PPIs. Through leave-12%-of-domains-out cross validation, we show that the structure-based predictor depends less on training-testing domain sequence similarity than our previous sequence-based predictor. Based on human proteome scanning results, we also show that the structure-based predictions correspond to known experimentally determined PDZ domain-peptide interactions and known PPIs involving PDZ domain containing proteins. A substantial number of the structure-based predictions correspond to known PPIs not previously predicted by the sequence-based predictor (48% increase), confirming that the structure-based predictor finds different interactions than the sequence-based predictor. Using predictions from both methods, we created a functional map of all predicted human PDZ mediated PPIs and identified xenobiotic metabolism as a novel biological process enriched in PDZ interactors.
Finally, we developed a website called POW! (PDZ domain-peptide interaction prediction website; http://webservice.baderlab.org/domains/POW), which enables users to run our sequence-based and structure-based predictors online for human, mouse, fly and worm.
Domain binding site definition
A number of positions in the PDZ domain that are in close contact with the peptide are important for binding[7, 8]. Following previous work, we defined the binding site using ten domain positions (core positions) that are in close contact with the peptide ligand (< 4.5 angstroms) across nine PDZ domain structures. In total, 218 of 267 human PDZ domains could be used: eight were excluded because they had gaps in their binding sites based on a PDZ family multiple sequence alignment, and 41 were excluded because we could not obtain structures or compute features for them. For mouse, fly and worm, respectively, 178 of 237, 85 of 117 and 64 of 81 known PDZ domains were supported, with 11, 14 and 7 of the remaining domains containing gaps. All PDZ domains were defined by HMMER 3.0 against UniProt defined PDZ proteins as of Apr 2012. Overall, the structure-based predictor supports the majority of known PDZ domains (82%, 74%, 73% and 79% for human, mouse, fly and worm, respectively).
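As an illustration of the distance criterion, the following minimal sketch flags domain positions that have at least one atom within 4.5 angstroms of any peptide atom in a single complex. The input layout (lists of positions and coordinates) and the function name are assumptions made for this example; how contacts were combined across the nine structures is not shown here.

```python
import math

CONTACT_CUTOFF = 4.5  # angstroms, the threshold quoted in the text


def contact_positions(domain_atoms, peptide_atoms, cutoff=CONTACT_CUTOFF):
    """Return sorted domain positions with any atom closer than `cutoff`
    to any peptide atom.

    domain_atoms: list of (position, (x, y, z)) tuples (hypothetical layout)
    peptide_atoms: list of (x, y, z) tuples
    """
    close = set()
    for position, coord in domain_atoms:
        if any(math.dist(coord, p) < cutoff for p in peptide_atoms):
            close.add(position)
    return sorted(close)


# Toy coordinates, not taken from any real structure.
domain_atoms = [
    (10, (0.0, 0.0, 0.0)),
    (11, (3.0, 0.0, 0.0)),
    (12, (20.0, 0.0, 0.0)),
]
peptide_atoms = [(0.0, 4.0, 0.0), (6.0, 0.0, 0.0)]
print(contact_positions(domain_atoms, peptide_atoms))
```

In practice the coordinates would come from a parsed PDB file; the toy values only demonstrate the cutoff logic.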
Although previous studies used a binding site definition of 16 domain positions (a superset of the ten we use), these positions were identified from only a single PDZ domain-peptide complex structure[9, 10] and many domains contain gaps using this larger 16-position binding site definition (based on a multiple sequence alignment with other PDZ domains). A comparison of cross validation performance (see section on Predictor Performance Evaluation) using ten versus 16 binding site positions showed that the ten positions were adequate for achieving good predictor performance (see Additional file 1: Table S1).
Domain structure data
The initial set of PDZ domain structures consists of one NMR and 17 X-ray structures for human collected from the Protein Data Bank (PDB) with corresponding interaction data from phage display or protein microarray experiments[7, 8]. Five NMR structures were collected from the PDB for mouse. For NMR structures, only the first model was used. Homology models were used to increase the number of structures available for domain structure feature encoding. In total, 11 human and 54 mouse PDZ domain models were modelled by SWISS-MODEL (downloaded Feb-Sep 2011) through the Protein Model Portal, which is a website providing access to structure models generated by different protein structure resources.
The quality of the homology models was estimated by computing the number of identical residues between the target and template sequence (i.e. template sequence identity). It has been shown that target-template sequence identity is positively correlated with model quality. In particular, state-of-the-art algorithms can always build high quality models (RMSD < 2 Å) if the target-template sequence identity is higher than 35-40%. Furthermore, there is no significant variation in model quality for targets with sequence similarity between 40-70%. If the similarity is 35%, there is no correlation[25, 26]. All training models have greater than 50% sequence similarity to their template structure (average 90%). At this threshold, models are expected to have the correct fold with most inaccuracies arising from structural variation in templates and incorrect reconstruction of loops[25, 26]. We also computed the QMEAN score, which is a scoring function measuring multiple geometrical aspects of protein structure including torsion angle potential, secondary structure-specific interaction potentials and solvation exposure potential. This score ranges from zero to one with scores closer to one indicating more reliable models. The minimum QMEAN score for our training models is 0.520 (average 0.836). Please see Additional file 2: Table S1 for details on all training domains.
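To make the identity measure concrete, here is a minimal sketch that computes percent identity from a pairwise alignment of target and template. Conventions for the denominator differ between tools; this example assumes identity is taken over aligned, non-gap columns, which may not match the convention used by the modelling pipeline.

```python
def percent_identity(target, template, gap="-"):
    """Percent of identical residues over aligned (non-gap) columns.

    Both inputs are rows of the same pairwise alignment, so they must have
    equal length. The denominator choice is an assumption of this sketch.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(target, template) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


# Toy alignment: four matches over five aligned columns.
print(percent_identity("ACDE-G", "ACDFHG"))
```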
Domain-peptide interaction data
PDZ domain-peptide interactions were collected from published high throughput phage display and protein microarray experiments for human and mouse, respectively[7, 8]. Since the phage display data consisted of only positive interactions (of which many could be non-genomic, meaning not similar to any genomic peptide), we used an established protocol to filter the interactions to enrich for genomic interactions and to generate artificial negative interactions. Briefly, this protocol involves creating a position weight matrix for a given training domain using its experimentally determined binders (positives) and then using the matrix to scan a pool of C-terminal peptides (last 5 positions) for low scoring binders (negatives). We adopted a minor modification of this procedure to allow for the inclusion of additional class II type PDZ domains to increase coverage of the PDZ family - the minimum number of genomic peptides required for inclusion was relaxed from ten to four. Only domains with both positive and negative interaction data were used for predictor training.
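The PWM-based negative generation can be sketched as follows. This is an illustrative simplification, not the published protocol: the pseudocount, the uniform background, and the rule of taking the `n` lowest-scoring peptides are assumptions chosen for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(positives, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length binders."""
    length = len(positives[0])
    pwm = []
    for i in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for peptide in positives:
            counts[peptide[i]] += 1.0
        total = sum(counts.values())
        # Log-odds against a uniform background of 1/20 (an assumption).
        pwm.append({aa: math.log((counts[aa] / total) * 20.0) for aa in AMINO_ACIDS})
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds for a peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def pick_negatives(pwm, pool, n):
    """Return the n lowest-scoring peptides in the pool as artificial negatives."""
    return sorted(pool, key=lambda p: score(pwm, p))[:n]


positives = ["ETSLV", "QTSLV", "ETRLV"]   # toy binders
pool = ["ETSLV", "GGGGG", "KTSLV"]        # toy C-terminal pool
pwm = build_pwm(positives)
print(pick_negatives(pwm, pool, 1))
```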
Domain structure feature encoding
Structure features were computed across the entire PDZ domain structure, and the values corresponding to the ten core binding site positions were extracted from the full list of features computed for all domain positions. Four types of structure features (detailed below) involved in protein folding and stability were computed to describe the PDZ domain structure (Figure 1). Three-dimensional geometric descriptors were investigated but were not included because they resulted in inferior cross validation performance (see Additional file 1: Figure S1). In total, the PDZ domain structure as defined by the core positions was represented by a vector of 240 features. Each value in the feature vector was scaled to lie between zero and one. Details regarding software parameters used to compute the following structure features are available in Additional file 1, section A.
Solvent accessibility, hydrogen bonding and positive phi angle properties
The first feature type consists of five values describing protein structure, computed using the JOY web server. Solvent accessibility indicates whether the protein surface in the area at the given core residue position is available to interact with ligands. Therefore, the first value indicates whether a given residue is solvent accessible or inaccessible. Patterns of hydrogen bonding are important in forming protein secondary and tertiary structure and are known to be important for canonical C-terminal peptide binding to the PDZ domain. The next three values indicate whether a residue side chain is hydrogen bonded to a main chain amide, a main chain carbonyl or another side chain. Finally, since positive main chain phi angles may restrict what types of residues may be accommodated at a given position, the last value indicates whether the residue has a positive phi angle. These binary features (i.e. absence is 0, presence is 1) were computed for each core residue position, resulting in a binary vector of length 50 (5 features x 10 core positions).
Solvent accessible area
The second feature type is a single value indicating how much surface (i.e. area) for a core residue is available for binding to a ligand residue. This feature was computed using the SURFV software for each residue resulting in a numeric vector of length 10 (1 feature x 10 core positions).
Electrostatic potential and hydrophobicity
Protein-protein interactions are facilitated by the electrostatic and hydrophobic complementarity of molecular surfaces. Therefore, the third and fourth feature types describe the electrostatic potential and hydrophobicity along the surface of the domain. At each core residue position, nine values were sampled from the surface resulting in a total of 90 electrostatic and 90 hydrophobicity values (9 features x 10 core positions). These features were generated by the VASCo software.
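The four feature blocks described in this section add up to 50 + 10 + 90 + 90 = 240 values per domain. The sketch below concatenates them and applies min-max scaling per feature across domains. The scaling scheme (per feature, across the training domains) and the block order are assumptions; the text states only that each value was scaled to lie between zero and one.

```python
def assemble(binary, area, electrostatic, hydrophobic):
    """Concatenate the four feature blocks into one 240-value vector."""
    assert len(binary) == 50 and len(area) == 10
    assert len(electrostatic) == 90 and len(hydrophobic) == 90
    return list(binary) + list(area) + list(electrostatic) + list(hydrophobic)


def scale_features(rows):
    """Min-max scale each feature (column) to [0, 1] across all rows.

    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    n = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n)]
    hi = [max(r[j] for r in rows) for j in range(n)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n)]
        for r in rows
    ]


# Two toy domains with made-up values.
domain_a = assemble([0] * 50, [12.5] * 10, [-1.0] * 90, [0.3] * 90)
domain_b = assemble([1] * 50, [40.0] * 10, [2.0] * 90, [0.3] * 90)
scaled = scale_features([domain_a, domain_b])
print(len(scaled[0]), min(scaled[0]), max(scaled[1]))
```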
Peptide sequence feature encoding
Peptides were encoded using a sparse binary vector encoding, as described in previous work. Briefly, each residue in a peptide of length five was represented using a binary vector of length 20 with each bit corresponding to an amino acid type. The vectors were concatenated to form the final feature vector of length 100.
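A minimal sketch of this sparse binary encoding is shown below. The ordering of the 20 amino acids within each block is an assumption of the example; any fixed ordering gives an equivalent encoding.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering


def encode_peptide(peptide):
    """Encode a five-residue peptide as a binary vector of length 100."""
    if len(peptide) != 5:
        raise ValueError("peptide must have exactly five residues")
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown residues
        vector.extend(bits)
    return vector


encoded = encode_peptide("ETSLV")
print(len(encoded), sum(encoded))
```

Each block of 20 bits contains exactly one set bit, so the vector always sums to five.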
Support vector machine
A grid search was used to find locally optimal values for γ and C. Instead of explicitly balancing the positive and negative training examples, weighted costs were used according to $C_{+} = (n_{+}/n_{-})\,C_{-}$, where $n_{+}$ is the number of positive training interactions and $n_{-}$ is the number of negative training interactions. The LibSVM software library was used to build the SVM.
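The cost weighting and the candidate parameter grid can be sketched without any SVM library. The function below applies the weighting formula exactly as written in the text; the exponent ranges of the grid are assumptions for illustration only, since the grid actually searched is not stated here.

```python
def positive_cost(c_negative, n_positive, n_negative):
    """Cost for the positive class, following the formula in the text."""
    return (n_positive / n_negative) * c_negative


def log2_grid(low_exp, high_exp, step):
    """Powers of two from 2**low_exp to 2**high_exp inclusive."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


# Hypothetical search ranges for C and gamma.
c_values = log2_grid(-5, 15, 2)
gamma_values = log2_grid(-15, 3, 2)
grid = [(c, g) for c in c_values for g in gamma_values]

print(positive_cost(1.0, 200, 400))
print(len(grid))
```

Each (C, γ) pair in `grid` would then be evaluated by cross validation, keeping the pair with the best score.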
Semi supervised negative training set expansion
Summary of the training data
Predictor performance evaluation
We carried out multiple cross validation strategies to provide an estimate of predictor performance. First, we performed ten-fold cross validation, which involves partitioning the training data into ten randomly selected interaction sets, independently holding out each set for testing against a predictor trained using the remainder of the data, and computing average performance across all ten runs. Following previous prediction methods and to better compare our results with previous work, we held out 12% of the domains (to estimate performance dependence on specific sets of domains), 8% of the peptides (to estimate predictor performance dependence on specific sets of peptides) and both 12% of the domains and 8% of the peptides (to estimate predictor performance dependence on specific sets of domains and peptides) and tested on the rest, again repeating this ten times. In general, the training domain features are more similar to each other (average 0.85 using normalized dot product similarity) than the peptide features are (average 0.13). Thus, we also performed leave-12%-of-domains-out cross validation with training set filtering based on domain sequence similarity and compared the performance of the structure-based predictor to our previously published sequence-based predictor. This involved holding out all data for 12% of domains for testing and training with only the remaining domains, and their interactions, that had sequence similarity less than a given threshold to all testing domains.
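The domain hold-out with similarity filtering can be sketched as below. The random seed, the toy similarity function (fraction of identical characters between equal-length strings) and all names are assumptions for illustration; a real run would use the domain sequence similarity measure of the study.

```python
import random


def split_domains(domains, holdout_fraction=0.12, seed=0):
    """Randomly hold out a fraction of domains; returns (train, test)."""
    rng = random.Random(seed)
    shuffled = list(domains)
    rng.shuffle(shuffled)
    n_test = max(1, round(holdout_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def fraction_identical(a, b):
    """Toy similarity: fraction of identical characters in equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_training(train, test, sequences, threshold):
    """Keep training domains whose similarity to every test domain is below threshold."""
    return [
        d for d in train
        if all(fraction_identical(sequences[d], sequences[t]) < threshold for t in test)
    ]


domains = [f"D{i}" for i in range(25)]
train, test = split_domains(domains)
print(len(train), len(test))

toy_sequences = {"A": "GLGF", "B": "GLGF", "C": "AAAA"}
print(filter_training(["B", "C"], ["A"], toy_sequences, 0.7))
```

All interactions of a held-out domain go to the test set, and the filter then removes training domains that are too close to any test domain.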
We computed the following statistics to measure predictor performance:
Sensitivity or Recall: TP/(TP + FN)
Specificity: TN/(TN + FP)
Precision: TP/(TP + FP)
where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives and FN is the number of false negatives. The overall performance was summarized by computing the area under the receiver operating characteristic (ROC) curves and Precision/Recall (PR) curves[37, 38].
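The three statistics follow directly from the confusion counts. The sketch below also computes the area under the ROC curve using its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). This pairwise formulation is chosen here for brevity and is not necessarily how the curves were computed in the study.

```python
def confusion_stats(tp, fp, tn, fn):
    """Sensitivity (recall), specificity and precision from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


stats = confusion_stats(tp=8, fp=2, tn=18, fn=2)
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(stats, auc)
```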
Functional enrichment analysis
A gene function enrichment analysis was performed on the predicted sequence-based and structure-based gene targets using Gene Ontology (GO) biological process terms. The BiNGO (Biological Network Gene Ontology tool) software library was used to determine the enriched terms. The hypergeometric test was used to compute a p-value assessing the GO term enrichment for a given set of predicted genes. Multiple testing correction was performed using the Benjamini and Hochberg False Discovery Rate (FDR) correction. GO v1.2 (downloaded Dec 7, 2011) and human GO annotations (downloaded Dec 7, 2011) were used. Only gene-sets with between five and 300 genes were used from the GO ontology (defined by the GMT file dated Dec 6, 2011 and available at http://www.baderlab.org/Data/StructurePDZProteomeScanning). Enriched terms (p-value < 0.05 and FDR < 0.1) with more than one gene interactor and associated with more than two domains were retained. To better interpret the structure-based and sequence-based enrichment results, we created an enrichment map, a network-based visual representation of enriched terms that groups similar terms and eases identification of functional themes. We used the Enrichment Map Cytoscape plugin software to create the enrichment map[41, 42], using the parameters p-value < 0.05, FDR Q value < 0.1 and “Jaccard + overlap similarity” cutoff = 0.517.
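For readers unfamiliar with the two statistical steps, the following sketch computes a one-sided hypergeometric enrichment p-value and Benjamini-Hochberg adjusted values using only the standard library. It is a stand-alone illustration of the tests named above, not BiNGO's implementation, and the argument names are chosen for this example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hypergeom_pvalue(1, 1, 1, 2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```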
The structure-based predictor achieves high cross validation results
Structure-based predictor achieves better cross validation results than the sequence-based predictor (p-value < 0.025)
10 Fold (95% CI): (0.957 ~ 0.962); (0.936 ~ 0.941); (0.932 ~ 0.940); (0.890 ~ 0.900)
Domain (95% CI): (0.839 ~ 0.862); (0.765 ~ 0.805); (0.747 ~ 0.779)
Peptide (95% CI): (0.929 ~ 0.941); (0.883 ~ 0.902); (0.898 ~ 0.918); (0.825 ~ 0.850)
Domain + Peptide (95% CI): (0.919 ~ 0.934); (0.862 ~ 0.877); (0.875 ~ 0.896); (0.783 ~ 0.804)
The structure-based predictor is less dependent on training-testing domain sequence similarity
Structure-based predictions are validated by known PDZ domain-peptide interactions
We used the predictor to scan the human C-terminal proteome (defined by genome assembly Ensembl:GRCh37.64) for binders of 45 PDZ domains with known interactions in PDZBase for which we could obtain structures and compute features. For each domain, this involved scanning 43827 unique C-termini of length five (including splice variants). Structures for these domains were obtained from the PDB or were homology modelled and have at least 35% sequence similarity (average over 80%) to their template structures. The minimum QMEAN score for these models is 0.36 (average 0.78). Please see Additional file 2: Table S3 for more details.
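Collecting the set of unique C-termini to scan can be sketched as follows. The input is assumed to be an iterable of protein sequences (for example, parsed from a proteome FASTA file); stripping a trailing `*` stop marker and skipping sequences shorter than five residues are choices made for this example.

```python
def unique_c_termini(sequences, length=5):
    """Return the set of unique C-terminal peptides of the given length."""
    termini = set()
    for sequence in sequences:
        sequence = sequence.rstrip("*")  # some FASTA files mark the stop with '*'
        if len(sequence) >= length:
            termini.add(sequence[-length:])
    return termini


# Toy sequences; two splice variants share the same C-terminus.
toy_proteome = ["MKTAYETSLV", "MAAETSLV", "MKQRIV*", "MKV"]
print(sorted(unique_c_termini(toy_proteome)))
```

Each unique peptide would then be encoded and scored against every supported domain.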
The structure-based predictor has a true positive rate (TPR) of 0.36 and a precision of 0.0033, and correctly predicted interactions for 22 of the 45 domains. For these domains, approximately 73% of known PDZ domain-peptide interactions in PDZBase, an independent data source not used for training, were predicted (see Additional file 2: Table S4). The sequence-based predictor had a higher TPR of 0.46 and correctly predicted interactions for 28 of the 45 domains. For these domains, 65% of known PDZ interactions were predicted and the precision was 0.0024. Although the sequence-based predictor has a higher TPR than the structure-based predictor, its precision (0.0024 versus 0.0033) and the fraction of known interactions it recovers for its correctly predicted domains (65% versus 73%) are lower. This is likely because the sequence-based predictor predicts, on average, more interactions per domain than the structure-based predictor (426.89 versus 239.71 per domain). The low precision of both predictors is due to the few known interactions per domain that are available from PDZBase (average 2.2 interactions per domain).
We also tested the false positive rate (FPR) of the predictor using two real negative data sets for human, which were used in a recent study to benchmark a sequence-based predictor developed by Chen et al. The first data set consists of 466 experimentally validated negative interactions from the literature involving peptides that contain a PDZ binding motif. The second data set consists of 133 negative literature-described interactions involving peptides with a non-binding PDZ motif caused by a mutation. The structure-based predictor made predictions for 410 negative interactions from the first data set and 126 negative interactions from the second data set, which resulted in an FPR of 0.145 and 0.0, respectively. The sequence-based predictor had an FPR of 0.09 and 0.0, and made predictions for 421 and 128 negative interactions for the first and second data sets, respectively. Compared to our structure-based and sequence-based predictors, the Chen sequence-based predictor has a much higher FPR of 0.482 and 0.256 for the first and second data sets, respectively (see Additional file 2: Table S5).
Many structure-based predictions correspond to known PDZ domain containing protein-protein interactions
To determine how many structure-based predicted interactions correspond to known PPIs, we scanned the human proteome to predict interactions for 218 human PDZ domains with known PPIs (for which we could obtain structures and compute structure features). Known PPIs were retrieved from iRefIndex, which is a database integrating interactions from different databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct and MINT. In total, 61 X-ray and nine NMR structures (only the first models used) were obtained from the PDB and 148 homology models were created. All models had a template sequence similarity of at least 22% (average 72%) and a QMEAN score of at least 0.36 (average 0.78). Please see Additional file 2: Table S3 for more details.
In total, 88 domains had predicted interactions that corresponded to known PPIs, with an average of greater than 21% of known PPIs being correctly predicted per domain. The number of PPIs successfully predicted per domain was significant (p-value < 0.05, Fisher’s exact test) for all but ten domains. A caveat of this result is that PDZ domain containing proteins may contain multiple PDZ domains and other domains, so it is not possible to uniquely assign a PPI to a PDZ domain. This could result in erroneous false negative or true positive statistics for the above tests. However, the results still serve as an estimate of predictor performance and show that the predictor is able to predict many known human PPIs.
The structure-based predictor is complementary to the sequence-based predictor
To better understand how unique predictions are made, we compared the results in more detail. The unique structure-based predictions arise for different reasons. Some domains (43 domains; e.g. APBA1-1, CNKSR2-1, IL16-1, IL16-3) are more challenging for the sequence-based predictor, which returns a low number of hits per domain (ten or fewer) with none corresponding to known PPIs (see Additional file 2: Table S8). The structure-based predictor fares better for nine of these domains (ARHGEF11-1, IL16-1, IL16-3, MPDZ-12, MPP6-1, PDZD2-3, PDZD2-5, RAPGEF6-1, SCRIB-3), predicting many more hits per domain (on average approximately 510) with, on average, approximately three known hits per domain. On the other hand, the structure-based predictor has difficulty predicting hits for 19 domains (e.g. DLG5-3, MPDZ-6, MPDZ-8), of which four are better predicted by the sequence-based predictor (MLLT4-1, MPDZ-8, MPP3-1, PDZD2-2; average 383 hits) with, on average, one known PPI hit per domain. In another scenario, two domains may have identical binding sites at the sequence level (e.g. DLG1-1 and DLG2-1) but differ at the structure level. The sequence-based predictor cannot distinguish between the two domains in this case, even though the domains may actually bind different proteins. While the structure-based predictor uses features corresponding to ten core positions, these features are computed by considering the entire domain structure. Therefore, even if two domains have the same binding site residues, the resulting features will be different if their whole domain structures are different. The structure-based predictor’s ability to distinguish between domains with highly similar binding site sequences helps explain why it is able to predict more unique interactions than the sequence-based predictor.
Overall, these results demonstrate situations where the structure-based predictor can make predictions for domains that could not easily be predicted by the sequence-based predictor, and thus show that the two methods are complementary.
Structure-based predicted binding specificities recapitulate experimental binding specificities
Predicted binding specificities are supported by known structural determinants of PDZ domain binding specificity
As noted above, there are many cases where the structure-based predicted binding specificity is closer to the experimental binding specificity than the sequence-based predicted binding specificity. In some examples, the structure-based predicted binding specificity better matches the experimental binding specificity at certain positions (e.g. MLLT4-1, TJP1-1 and DVL2-1). To examine whether this is caused by specific structural features used by the structure-based predictor, we searched the literature for known structure determinants influencing these specific amino acid preferences and compared them to our results. For MLLT4-1, the structure-based predictions indicate a preference for a hydrophilic Thr residue at position −2. This preference is explained by the findings of Chen et al. Their work showed that the Thr preference at position −2 is due to its interaction with Gln at position α2-1 of the domain, which forms a hydrophilic binding site pocket at position −2. This preference is reflected in the structure-based predicted binding specificity, whereas a completely different preference for a hydrophobic Ile residue at this position is predicted by the sequence-based predictor (Figure 5). The domain TJP1-1 is another example where the predicted structure-based and sequence-based binding specificities are very different (Figure 5). Appleton et al. showed that this domain has a bi-specific preference for Trp or Tyr at position −1. The Trp preference is accommodated through main chain interactions with β2 and β3 strands, while the Tyr preference is accomplished through hydrogen bonding with Asp at position β3-5 of the domain. The bi-specific preference for a Trp or Tyr at position −1 is reflected in the structure-based binding specificity, while only a preference for Tyr is indicated in the sequence-based binding specificity. Finally, the predicted binding specificities for domain DVL2-1 are very different (Figure 5). Zhang et al. found that the −2 binding site of the domain actually accommodates a Gly-Tyr pair. The preference for a Gly at position −2 is reflected in the predicted structure-based binding specificity whereas there is no obvious preference in the predicted sequence-based binding specificity. Since the binding specificities for these examples are determined by specific domain structure features, this helps explain why the structure-based predictor can better predict their binding preferences than the sequence-based predictor.
A functional map of PDZ domain biology highlights PDZ involvement in a variety of biological processes
In some cases, although the themes were also enriched in the iRefIndex map, only limited information about PDZ domain involvement in the associated process was found in the literature. These themes represent opportunities for our predictions to shed light on the role of PDZ domains where little is currently known. One example is ‘wound healing’, where both predictors predicted PDZ domains to interact with proteins involved in different stages of wound healing. These included platelet activators and aggregators (e.g. HGNC:CD9, P2RY12), growth factor receptors (e.g. HGNC:PDGFRA, TGFBR1, HGF), plasma membrane calcium-transporting ATPases (e.g. HGNC:ATP2B1, ATP2B2, ATP2B3, ATP2B4), calcium-activated potassium channels (e.g. HGNC:KCNMA1, KCNMB2), fibrinogen (HGNC:FGG), coagulation factors (e.g. HGNC:F8, F11), immune system proteins such as chemokines (e.g. HGNC:CXCR1, CXCR2, CCL19), tumour necrosis factors (e.g. HGNC:TNFAIP6, TNF) and inhibitor of nuclear factor kappa-β kinase (HGNC:IKBKB) (Additional file 2: Tables S9-10).
Finally, our predictions also suggested additional interactions for well studied processes that are known to involve PDZ domains. For ‘Wnt signalling’, both predictors predicted known interactions between the domain MAGI3-2 and frizzled-4 and 7 as well as domains DLG4-1,2 and frizzled-1,2,4 and 7. However, several other PDZ domains were also predicted to interact with frizzled family members. Some examples include AHNAK2-1, CAR14-1, CNKSR2-1 (structure-based) and MPDZ-13, PDZRN4-1, SYNJ2BP-1 (sequence-based), which are all predicted to interact with one or more frizzled family members (HGNC:FZD1, FZD2, FZD4, FZD7, FZD10). Interactions which may negatively regulate Wnt signalling were also predicted and involve F-box-like proteins (HGNC:TBL1X, TBL1XR1) and human colorectal mutant cancer protein (HGNC:MCC) (Additional file 2: Tables S9-10).
Many functional themes we identify consist of multiple different enriched terms containing multiple proteins, predicted to interact with several PDZ domains. These patterns involve many proteins and are unlikely to occur by chance. Thus, our functional analysis provides additional validation of our prediction methods and highlights novel PDZ interactors involved in a variety of biological processes.
We have presented a structure-based predictor of PDZ domain-peptide interactions that can be used to scan C-terminal proteomes to predict PDZ domain mediated PPIs. Our predictor utilizes domain structure features derived from the whole domain, focusing on a core peptide-binding site defined by ten highly conserved amino acid positions. Combined with our use of experimentally determined and computationally generated training negative interactions, our predictor achieves high cross validation results and is expected to generalize well to unseen interactions in practice. Compared to our previous sequence-based predictor, the structure-based predictor is less dependent on training-testing domain sequence similarity and predicts many new validated interactions in human. As a result, the structure-based predictor is complementary to the sequence-based predictor and both should be used to identify candidates for further biological experiments and to expand our knowledge of PDZ domain mediated PPIs.
An important technical result of our work is our use of computationally generated negatives to supplement training and reduce over-prediction. We showed that the negative interactions in current experimental data sets do not adequately cover the negative proteome space, resulting in a predictor that returns many hits that are likely false positives. While this problem is more apparent for the structure-based predictor, it also affects our sequence-based predictor, as there are several domains where sequence-based proteome scanning predicts thousands of hits, and it likely affects other sequence-based predictors. Since additional experimentally determined negatives for training are limited, using computationally generated negatives is required. While PWMs can be used to computationally generate such negatives as previously shown, such methods do not model dependencies between ligand positions and depend on a user-defined or naively defined cutoff to discriminate between positives and negatives. Here, we use a semi supervised learning approach utilizing an SVM to generate additional negatives, since SVMs can better address the limitations faced by PWMs. As a result, the proteome scanning performance was improved by reducing the number of false positive hits that would otherwise be returned. As this problem is not unique to the structure-based predictor, training with additional negatives is likely to benefit other predictors as well.
Comparing proteome scanning hits to known PPIs, there is only a moderate overlap in the hits predicted by the structure-based and sequence-based predictors. While this suggests that the predictors are complementary and thus should both be used, there are cases in which one predictor may be more appropriate than the other. For example, when the training-testing domain sequence similarity is < 0.7, the structure-based predictor may be more useful, since its performance is less dependent on sequence similarity at lower similarity levels. In fact, when the sequence similarity is very low, the sequence-based predictor may fail to return any predictions. For other domains, a reliable structure may not be obtained or modelled, or the required structure features cannot be successfully generated. In these cases, the sequence-based predictor may be the only predictor that can be used. However, for the majority of cases, both predictors should be used to find as many hits as possible for a given domain.
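The guidance above can be summarised as a small decision helper. This is only a sketch of the reasoning in the text: the function name, the return type, and the choice to express similarity as a single number relative to the nearest training domain are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SIMILARITY_CUTOFF = 0.7  # similarity level quoted in the text


@dataclass(frozen=True)
class Recommendation:
    run: Tuple[str, ...]   # predictors worth running
    prefer: Optional[str]  # predictor expected to be more useful, if any


def choose_predictors(similarity: Optional[float],
                      structure_features_available: bool) -> Recommendation:
    """Suggest which predictors to run for a query PDZ domain.

    similarity: assumed to be the sequence similarity (0 to 1) between the
    query domain and its nearest training domain, or None if unknown.
    """
    if not structure_features_available:
        # No reliable structure or features: only the sequence-based predictor applies.
        return Recommendation(run=("sequence",), prefer="sequence")
    if similarity is not None and similarity < SIMILARITY_CUTOFF:
        # Low similarity: the structure-based predictor may be more useful.
        return Recommendation(run=("structure", "sequence"), prefer="structure")
    # Default case: run both to find as many hits as possible.
    return Recommendation(run=("structure", "sequence"), prefer=None)
```

For example, `choose_predictors(0.5, True)` recommends running both predictors while favouring the structure-based one, whereas `choose_predictors(0.5, False)` falls back to the sequence-based predictor alone.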
Although PDZ domains can recognize motifs internal to a protein, most available data describe domain-C-terminal binding; our predictors have therefore been trained using these data and are best suited for the prediction of such interactions. Although other similar methods are also available on the web, they either predict only that a protein containing a PDZ domain interacts with another protein or are best suited for interactions between PDZ domains and specific types of proteins (e.g. membrane proteins). Thus, we expect our website will be useful to biologists in helping to further map the many processes mediated by PDZ domains.
While the current structure-based predictor performs well, other domain structure related features should be considered in the future. For example, it is known that the structural flexibility of the PDZ domain binding pocket can contribute to the domain's ability to bind specific ligands [15, 52]. Recently, a model of PDZ domain backbone flexibility was used to successfully predict domain binding specificity, but only for a subset of human PDZ domains. Thus, domain backbone flexibility features should be considered, as they may help to improve predictor performance. Another structure related feature that should be considered is binding pocket geometry and shape. Although we explored the use of 3D-Zernike descriptors, we found that they did not benefit our predictor. However, other shape descriptors, such as real spherical harmonic coefficients, could be investigated and may improve predictor performance. Although we have built an entirely structure-based predictor, additional features, including sequence features, can be combined to build a single predictor that uses all available types of information. Finally, since the predictor predicts in vitro interactions, incorporating contextual information such as co-expression and protein location will help to build a more physiologically relevant map of PDZ domain mediated protein-protein interactions.
We have presented a structure-based predictor of PDZ domain-peptide interactions that uses domain structure and peptide sequence information. Our predictor achieves high cross-validation performance and finds many interactions corresponding to known PDZ mediated PPIs not previously found by our sequence-based predictor. Using both predictors, we defined a functional map of PDZ domain biology and identified novel PDZ interactors involved in a variety of biological processes. As a result, our predictions will help expand the coverage of current PDZ mediated PPI networks and provide new insight into the molecular mechanisms underlying a variety of biological processes.
For web-based proteome scanning:
Project name: POW! PDZ domain-peptide interaction prediction website
Project home page: http://webservice.baderlab.org/domains/POW/
Operating systems: Platform independent (web-based)
For proteome scanning software:
Project name: PDZ Structure-based Proteome Scanning
Project home page: http://baderlab.org/Data/StructurePDZProteomeScanning
Operating systems: Platform independent
Programming language: Java 1.5
License: Source code is freely available under the GNU Lesser General Public License (LGPL)
We thank members of the Bader lab for helpful discussions and Thijs Beuming for early discussions. This work was supported by the Canadian Institutes of Health Research grant MOP-84324.
- Pawson T, Nash P: Assembly of cell regulatory systems through protein interaction domains. Science 2003, 300: 445-452. 10.1126/science.1083653
- Dev KK: Making protein interactions druggable: targeting PDZ domains. Nat Rev Drug Discov 2004, 3: 1047-1056. 10.1038/nrd1578
- Doorbar J: Molecular biology of human papillomavirus infection and cervical cancer. Clin Sci (Lond) 2006, 110: 525-541.
- Moyer BD, Denton J, Karlson KH, Reynolds D, Wang S, Mickle JE, Milewski M, Cutting GR, Guggino WB, Li M, Stanton BA: A PDZ-interacting domain in CFTR is an apical membrane polarization signal. J Clin Invest 1999, 104: 1353-1361. 10.1172/JCI7453
- Songyang Z, Fanning AS, Fu C, Xu J, Marfatia SM, Chishti AH, Crompton A, Chan AC, Anderson JM, Cantley LC: Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 1997, 275: 73-77. 10.1126/science.275.5296.73
- Zhang Y, Yeh S, Appleton BA, Held HA, Kausalya PJ, Phua DC, Wong WL, Lasky LA, Wiesmann C, Hunziker W, Sidhu SS: Convergent and divergent ligand specificity among PDZ domains of the LAP and zonula occludens (ZO) families. J Biol Chem 2006, 281: 22299-22311. 10.1074/jbc.M602902200
- Tonikian R, Zhang Y, Sazinsky SL, Currell B, Yeh JH, Reva B, Held HA, Appleton BA, Evangelista M, Wu Y, et al.: A specificity map for the PDZ domain family. PLoS Biol 2008, 6: e239. 10.1371/journal.pbio.0060239
- Stiffler MA, Chen JR, Grantcharova VP, Lei Y, Fuchs D, Allen JE, Zaslavskaia LA, MacBeath G: PDZ domain binding selectivity is optimized across the mouse proteome. Science 2007, 317: 364-369. 10.1126/science.1144592
- Chen JR, Chang BH, Allen JE, Stiffler MA, MacBeath G: Predicting PDZ domain-peptide interactions from primary sequences. Nat Biotechnol 2008, 26: 1041-1045. 10.1038/nbt.1489
- Hui S, Bader GD: Proteome scanning to predict PDZ domain interactions using support vector machines. BMC Bioinforma 2010, 11: 507.
- Shao X, Tan CS, Voss C, Li SS, Deng N, Bader GD: A regression framework incorporating quantitative and negative interaction data improves quantitative prediction of PDZ domain-peptide interaction from primary sequence. Bioinformatics 2011, 27: 383-390. 10.1093/bioinformatics/btq657
- Eo HS, Kim S, Koo H, Kim W: A machine learning based method for the prediction of G protein-coupled receptor-binding PDZ domain proteins. Mol Cells 2009, 27: 629-634. 10.1007/s10059-009-0091-2
- Appleton BA, Zhang Y, Wu P, Yin JP, Hunziker W, Skelton NJ, Sidhu SS, Wiesmann C: Comparative structural analysis of the Erbin PDZ domain and the first PDZ domain of ZO-1. Insights into determinants of PDZ domain specificity. J Biol Chem 2006, 281: 22312-22320. 10.1074/jbc.M602901200
- Skelton NJ, Koehler MF, Zobel K, Wong WL, Yeh S, Pisabarro MT, Yin JP, Lasky LA, Sidhu SS: Origins of PDZ domain ligand specificity. Structure determination and mutagenesis of the Erbin PDZ domain. J Biol Chem 2003, 278: 7645-7654. 10.1074/jbc.M209751200
- Chen Q, Niu X, Xu Y, Wu J, Shi Y: Solution structure and backbone dynamics of the AF-6 PDZ domain/Bcr peptide complex. Protein Sci 2007, 16: 1053-1062. 10.1110/ps.062440607
- Hue M, Riffle M, Vert JP, Noble WS: Large-scale prediction of protein-protein interactions from structures. BMC Bioinforma 2010, 11: 144. 10.1186/1471-2105-11-144
- Sanchez IE, Beltrao P, Stricher F, Schymkowitz J, Ferkinghoff-Borg J, Rousseau F, Serrano L: Genome-wide prediction of SH2 domain targets using structural information and the FoldX algorithm. PLoS Comput Biol 2008, 4: e1000052. 10.1371/journal.pcbi.1000052
- Fernandez-Ballester G, Beltrao P, Gonzalez JM, Song YH, Wilmanns M, Valencia A, Serrano L: Structure-based prediction of the Saccharomyces cerevisiae SH3-ligand interactions. J Mol Biol 2009, 388: 902-916. 10.1016/j.jmb.2009.03.038
- Smith CA, Kortemme T: Structure-based prediction of the peptide sequence space recognized by natural and synthetic PDZ domains. J Mol Biol 2010, 402: 460-474. 10.1016/j.jmb.2010.07.032
- Kaufmann K, Shen N, Mizoue L, Meiler J: A physical model for PDZ-domain/peptide interactions. J Mol Model 2011, 17: 315-324. 10.1007/s00894-010-0725-5
- Eddy SR: Accelerated Profile HMM Searches. PLoS Comput Biol 2011, 7: e1002195. 10.1371/journal.pcbi.1002195
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235-242. 10.1093/nar/28.1.235
- Arnold K, Bordoli L, Kopp J, Schwede T: The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 2006, 22: 195-201. 10.1093/bioinformatics/bti770
- Arnold K, Kiefer F, Kopp J, Battey JN, Podvinec M, Westbrook JD, Berman HM, Bordoli L, Schwede T: The Protein Model Portal. J Struct Funct Genomics 2009, 10: 1-8. 10.1007/s10969-008-9048-5
- Zhang Y: Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009, 19: 145-155. 10.1016/j.sbi.2009.02.005
- Fischer D: Servers for protein structure prediction. Curr Opin Struct Biol 2006, 16: 178-182. 10.1016/j.sbi.2006.03.004
- Benkert P, Tosatto SC, Schomburg D: QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008, 71: 261-277. 10.1002/prot.21715
- Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP: JOY: protein sequence-structure representation and analysis. Bioinformatics 1998, 14: 617-623. 10.1093/bioinformatics/14.7.617
- Sridharan S, Nicholls A, Honig B: A new vertex algorithm to calculate solvent accessible surface areas. J Biophys 1992, 61: A174.
- Steinkellner G, Rader R, Thallinger GG, Kratky C, Gruber K: VASCo: computation and visualization of annotated protein surface contacts. BMC Bioinforma 2009, 10: 32. 10.1186/1471-2105-10-32
- Boser B, Guyon I, Vapnik V: A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory (COLT 92). Pittsburgh: ACM Press; 1992:144-152.
- Cristianini N, Shawe-Taylor J: An introduction to support vector machines and other kernel-based learning methods. Cambridge; New York: Cambridge University Press; 2000.
- Turner B, Razick S, Turinsky AL, Vlasblom J, Crowdy EK, Cho E, Morrison K, Donaldson IM, Wodak SJ: iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database (Oxford) 2010, 2010: baq023. 10.1093/database/baq023
- Hsu C-W, Chang C-C, Lin C-J: A practical guide to support vector classification. National Taiwan University: Department of Computer Science; 2010.
- Chang C-C, Lin C-J: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2011, 2: 27:21-27:27.
- Wang C, Ding C, Meraz RF, Holbrook SR: PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 2006, 22: 2590-2596. 10.1093/bioinformatics/btl441
- Davis J, Goadrich M: The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML'06). Pittsburgh: ACM; 2006:233-240.
- Fawcett T: An introduction to ROC analysis. Pattern Recogn Lett 2006, 27: 861-874. 10.1016/j.patrec.2005.10.010
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25-29. 10.1038/75556
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551
- Merico D, Isserlin R, Stueker O, Emili A, Bader GD: Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 2010, 5: e13984. 10.1371/journal.pone.0013984
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13: 2498-2504. 10.1101/gr.1239303
- Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al.: Ensembl 2009. Nucleic Acids Res 2009, 37: D690-697. 10.1093/nar/gkn828
- Luck K, Fournane S, Kieffer B, Masson M, Nominé Y, Travé G: Putting into Practice Domain-Linear Motif Interaction Predictions for Exploration of Protein Networks. PLoS One 2011, 6: e25376. 10.1371/journal.pone.0025376
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003, 31: 248-250. 10.1093/nar/gkg056
- Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, et al.: The BioGRID Interaction Database: 2011 update. Nucleic Acids Res 2011, 39: D698-704. 10.1093/nar/gkq1116
- Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes HW: CORUM: the comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res 2010, 38: D497-501. 10.1093/nar/gkp914
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 2004, 32: D449-451. 10.1093/nar/gkh086
- Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al.: Human protein reference database-2006 update. Nucleic Acids Res 2006, 34: D411-414. 10.1093/nar/gkj141
- Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian AT, Kerrien S, Khadake J, et al.: The IntAct molecular interaction database in 2010. Nucleic Acids Res 2010, 38: D525-531. 10.1093/nar/gkp878
- Ceol A, Chatr Aryamontri A, Licata L, Peluso D, Briganti L, Perfetto L, Castagnoli L, Cesareni G: MINT, the molecular interaction database: 2009 update. Nucleic Acids Res 2010, 38: D532-D539. 10.1093/nar/gkp983
- Zhang Y, Appleton BA, Wiesmann C, Lau T, Costa M, Hannoush RN, Sidhu SS: Inhibition of Wnt signaling by Dishevelled PDZ peptides. Nat Chem Biol 2009, 5: 217-219. 10.1038/nchembio.152
- Razick S, Magklaras G, Donaldson IM: iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinforma 2008, 9: 405. 10.1186/1471-2105-9-405
- Omiecinski CJ, Vanden Heuvel JP, Perdew GH, Peters JM: Xenobiotic metabolism, disposition, and regulation by receptors: from biochemical phenomenon to predictors of major toxicities. Toxicol Sci 2011, 120(Suppl 1): S49-S75.
- Eling TE, Curtis JF: Xenobiotic metabolism by prostaglandin H synthase. Pharmacol Ther 1992, 53: 261-273. 10.1016/0163-7258(92)90012-O
- Zhang J, Dong J, Gu H, Yu S, Zhang X, Gou Y, Xu W, Burd A, Huang L, Miyado K, et al.: CD9 is critical for cutaneous wound healing through JNK signaling. J Invest Dermatol 2012, 132: 226-236. 10.1038/jid.2011.268
- Klepeis VE, Weinger I, Kaczmarek E, Trinkaus-Randall V: P2Y receptors play a critical role in epithelial cell communication and migration. J Cell Biochem 2004, 93: 1115-1133. 10.1002/jcb.20258
- Lynch SE, Nixon JC, Colvin RB, Antoniades HN: Role of platelet-derived growth factor in wound healing: synergistic effects with other growth factors. Proc Natl Acad Sci USA 1987, 84: 7696-7700. 10.1073/pnas.84.21.7696
- Liu J, Johnson K, Li J, Piamonte V, Steffy BM, Hsieh MH, Ng N, Zhang J, Walker JR, Ding S, et al.: Regenerative phenotype in mice with a point mutation in transforming growth factor beta type I receptor (TGFBR1). Proc Natl Acad Sci USA 2011, 108: 14560-14565. 10.1073/pnas.1111056108
- Bevan D, Gherardi E, Fan TP, Edwards D, Warn R: Diverse and potent activities of HGF/SF in skin wound repair. J Pathol 2004, 203: 831-838. 10.1002/path.1578
- Talarico EF Jr: Plasma membrane calcium-ATPase isoform four distribution changes during corneal epithelial wound healing. Mol Vis 2010, 16: 2259-2272.
- Becchetti A, Arcangeli A: Integrins and ion channels in cell migration: implications for neuronal development, wound healing and metastatic spread. Adv Exp Med Biol 2010, 674: 107-123. 10.1007/978-1-4419-6066-5_10
- Laurens N, Koolwijk P, de Maat MP: Fibrin structure and wound healing. J Thromb Haemost 2006, 4: 932-939. 10.1111/j.1538-7836.2006.01861.x
- Inbal A, Dardik R: Role of coagulation factor XIII (FXIII) in angiogenesis and tissue repair. Pathophysiol Haemost Thromb 2006, 35: 162-165. 10.1159/000093562
- Gillitzer R, Goebeler M: Chemokines in cutaneous wound healing. J Leukoc Biol 2001, 69: 513-521.
- Barrientos S, Stojadinovic O, Golinko MS, Brem H, Tomic-Canic M: Growth factors and cytokines in wound healing. Wound Repair Regen 2008, 16: 585-601. 10.1111/j.1524-475X.2008.00410.x
- Wawrzak D, Luyten A, Lambaerts K, Zimmermann P: Frizzled-PDZ scaffold interactions in the control of Wnt signaling. Adv Enzyme Regul 2009, 49: 98-106. 10.1016/j.advenzreg.2009.01.002
- Lagna G, Carnevali F, Marchioni M, Hemmati-Brivanlou A: Negative regulation of axis formation and Wnt signaling in Xenopus embryos by the F-box/WD40 protein beta TrCP. Mech Dev 1999, 80: 101-106. 10.1016/S0925-4773(98)00208-1
- Fukuyama R, Niculaita R, Ng KP, Obusez E, Sanchez J, Kalady M, Aung PP, Casey G, Sizemore N: Mutated in colorectal cancer, a putative tumor suppressor for serrated colorectal cancer, selectively represses beta-catenin-dependent transcription. Oncogene 2008, 27: 6044-6055. 10.1038/onc.2008.204
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al.: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011, 39: D561-568. 10.1093/nar/gkq973
- Bhardwaj N, Stahelin RV, Zhao G, Cho W, Lu H: MeTaDoR: a comprehensive resource for membrane targeting domains and their host proteins. Bioinformatics 2007, 23: 3110-3112. 10.1093/bioinformatics/btm395
- La D, Esquivel-Rodriguez J, Venkatraman V, Li B, Sael L, Ueng S, Ahrendt S, Kihara D: 3D-SURFER: software for high-throughput protein surface comparison and analysis. Bioinformatics 2009, 25: 2843-2844. 10.1093/bioinformatics/btp542
- Morris RJ, Najmanovich RJ, Kahraman A, Thornton JM: Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics 2005, 21: 2347-2355. 10.1093/bioinformatics/bti337
- Liu Y, Henry GD, Hegde RS, Baleja JD: Solution structure of the hDlg/SAP97 PDZ2 domain and its mechanism of interaction with HPV-18 papillomavirus E6 protein. Biochemistry 2007, 46: 10864-10874. 10.1021/bi700879k
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.