Structural assembly of two-domain proteins by rigid-body docking
© Cheng et al. 2008
Received: 29 May 2008
Accepted: 16 October 2008
Published: 16 October 2008
Modelling proteins with multiple domains is one of the central challenges in Structural Biology. Although homology modelling has been successfully applied to predict protein structures, domain-domain interactions often cannot be inferred from the structures of homologues, and their prediction requires ab initio methods. Here we present a new structural prediction approach for modelling two-domain proteins based on rigid-body domain-domain docking.
Here we focus on interacting domain pairs that are part of the same peptide chain and thus have an inter-domain peptide region (a so-called linker). We have developed a method called pyDockTET (tethered docking), which uses rigid-body docking to generate domain-domain poses that are then scored by binding energy and by a pseudo-energy term based on restraints derived from linker end-to-end distances. The method has been benchmarked on a set of 77 non-redundant pairs of domains with available X-ray structures. We have evaluated the docking method ZDOCK, which is able to generate acceptable domain-domain orientations in 51 out of the 77 cases. Among these, pyDockTET finds the correct assembly within the top 10 solutions in over 60% of the cases. As a further test, on a subset of 20 pairs in which the domains were built by homology modelling, ZDOCK generates acceptable orientations in 13 out of the 20 cases, and among these pyDockTET ranks the correct assembly within the top 10 in around 70% of the cases.
Our results show that a rigid-body docking approach, combined with energy scoring and linker-based restraints, is useful for modelling domain-domain interactions. These positive results should encourage the development of new methods for structural prediction of macromolecules with more than two domains.
It is estimated that two thirds of proteins in prokaryotes and four fifths of those in eukaryotes are multi-domain proteins [1, 2], many of which have important functions in cell regulation and signalling. From a structural point of view, they range from proteins with significant and stable interactions between domains, whose structures can usually be determined by X-ray crystallography or NMR, to those with flexible linkers and few domain-domain interactions, which endow them with large conformational freedom. Crystallography of multi-domain proteins with flexible linkers is more problematic. X-ray crystallography and NMR have tended to follow a "divide-and-conquer" approach, first determining the structures of individual domains, although such structures have often been determined within multi-protein complexes in which the relationships between domains are well defined.
For multi-domain proteins with no structural information, domain orientations may be predicted through homology modelling. However, homologous multi-domain templates are not always available. Furthermore, even if a homologous template exists, its domains might not interact in the same way as those of the protein being modelled (see the review of Aloy and Russell). To minimize the chance of inferring wrong interaction data from the templates, Aloy and Russell tried to model putative interactions by assessing residue contacts in the interfaces of known three-dimensional protein structures. Thus, in addition to homology modelling, there has been increasing focus on ab initio approaches. For instance, Wollacott and co-workers modelled domain-domain assemblies by placing the domains at the N- and C-terminal ends of the linker structure, whose conformation is sampled during the procedure. Their approach successfully identified near-native assemblies in 50% of the studied cases.
Another promising tool for ab initio modelling of multi-domain proteins is docking. Rigid-body docking approaches have already been successful in predicting interactions between relatively rigid globular protomers in protein complexes, as seen in the recent CAPRI (Critical Assessment of PRedicted Interactions; http://capri.ebi.ac.uk) blind tests. However, although protein-protein docking could be directly applied to model domain-domain interactions, only a few specific cases have been reported (perhaps because ranking domain-domain poses is still challenging). For example, vitronectin was reportedly modelled by docking two of its domains, but this required a strong inter-domain constraint from a disulfide cross-link. Lise and co-workers have developed an approach for docking two domains that are part of the same protein chain. It uses a pair-wise residue contact function, which includes structural, physicochemical and evolutionary information, to distinguish native-like domain assemblies from other solutions generated by standard docking procedures. Their work suggests that data-driven docking is also useful for modelling domain assembly. Furthermore, Inbar and co-workers have extended the docking approach to multi-domain and multi-molecular assemblies, using a heuristic that applies hierarchical construction to represent the assembly process and a greedy algorithm to select candidate complexes. Modelling multi-domain proteins also has promising applications in modelling protein-protein complexes in which any of the components has multiple domains. Instead of docking multi-domain proteins directly, the problem can be tackled through divide-and-conquer approaches. First, the orientation of the domains within each protein is modelled, where the domains have stable relationships. Each domain assembly is then treated as a single protomer for further docking. Along these lines, Ben-Zeev et al. applied docking between domains with residue conservation restraints to one of the CAPRI targets (T09) as part of a multi-docking protocol, although with limited success (the acceptable model was ranked 75th).
In this paper, we describe a new approach, pyDockTET, for pair-wise assembly of domains that are connected by an inter-domain linker. In addition to the electrostatics and desolvation energy in the original pyDock scoring function, which gave one of the best performances for protein-protein docking in the recent CAPRI test, an additional pseudo-energy term derived from the end-to-end distance of linkers is incorporated in pyDockTET to select the near-native pair-wise domain poses. We also discuss here the dependence of this scoring function on the linker length and on the quality of the domain models used for the docking.
From these average end-to-end distance values and their standard deviations, we derived the pyDockTET scoring function for docking of domain pairs (see Methods). Linkers longer than 17 residues are rare, and their average end-to-end distances vary correspondingly more. The docking sets used to benchmark pyDockTET therefore included only domain pairs whose inter-domain linkers are between 2 and 17 residues long.
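The derivation of per-length statistics, and one possible way to build a restraint from them, can be sketched in Python. This is a minimal illustration under stated assumptions, not the published pyDockTET term. Several elements are hypothetical: the `(length, distance)` input format, the use of one anchor atom at each end of the linker, and the flat-bottom quadratic penalty with parameters `k` and `weight`. Only the use of per-length means and standard deviations and the 2 to 17 residue range come from the text.

```python
import math
from statistics import mean, stdev


def linker_stats(observed, min_len=2, max_len=17):
    """Map linker length L -> (mean, sd) of end-to-end distances in Å.

    `observed` is a list of (L, distance) pairs, e.g. measured on crystal
    structures of two-domain proteins. Lengths outside [min_len, max_len]
    and lengths with fewer than two observations are skipped.
    """
    by_length = {}
    for length, dist in observed:
        if min_len <= length <= max_len:
            by_length.setdefault(length, []).append(dist)
    return {
        length: (mean(ds), stdev(ds))
        for length, ds in sorted(by_length.items())
        if len(ds) >= 2
    }


def end_to_end(anchor_dom1, anchor_dom2):
    """Distance (Å) between assumed anchor atoms flanking the linker in a pose."""
    return math.dist(anchor_dom1, anchor_dom2)


def linker_penalty(distance, mu, sd, k=1.0, weight=1.0):
    """HYPOTHETICAL flat-bottom restraint (not the published functional form).

    Zero while the distance lies inside mu ± k*sd; outside that window the
    penalty grows quadratically with the excess measured in units of sd.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    lower, upper = mu - k * sd, mu + k * sd
    if lower <= distance <= upper:
        return 0.0
    excess = lower - distance if distance < lower else distance - upper
    return weight * (excess / sd) ** 2
```

A flat-bottom shape is one common choice for distance restraints. It avoids penalizing poses whose linker ends are at a plausible, if not average, separation.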
Using pyDock to identify domain assemblies from docking sets performs clearly better than random scoring, and the linker-based restraints introduced in pyDockTET improve the results further. Of course, in a realistic case one has to rely on the docking procedure to generate near-native orientations. We have used the well-known FFT-based docking method ZDOCK here, but improvements in rigid-body docking methods are expected to raise the predictive rates of pyDockTET as well.
The scoring function of pyDockTET uses the average end-to-end distance for each linker length L (L = 2, 3, …, 17) as a restraint. As the linker length increases, this restraint is expected to become less informative for selecting docking poses, so the performance of the method will likely depend on linker length. Here we analyse the success rates of pyDockTET for different linker lengths, considering only those cases in our domain-domain set that have at least one acceptable solution.
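The per-length analysis amounts to a grouped success-rate count. This is a minimal sketch that summarizes each case as a hypothetical `(linker_length, best_rank)` pair. Here `best_rank` is the 1-based rank of the first acceptable pose after rescoring, and `None` means the docking set contained no acceptable pose. As in the text, such cases are excluded.

```python
from collections import defaultdict


def success_by_linker_length(cases, top_n=10):
    """Fraction of cases, per linker length, with an acceptable pose in the top_n.

    cases: iterable of (linker_length, best_rank); best_rank is None when the
    docking set had no acceptable pose, and those cases are left out entirely.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for length, best_rank in cases:
        if best_rank is None:
            continue  # no acceptable pose to find, so not counted
        totals[length] += 1
        if best_rank <= top_n:
            hits[length] += 1
    return {length: hits[length] / totals[length] for length in sorted(totals)}
```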
The pyDockTET scoring function consists of a pseudo-energy term derived from linker end-to-end distances, added to the original pyDock function, which comprises electrostatics and desolvation energies. We have already shown that pyDockTET generally performs better than pyDock, so we now analyse in which cases this improvement is most apparent.
As for the size of the domain-domain interface, we showed above (first section of Results) that cases with a low number of inter-domain contact residues did not affect the average linker distances and corresponding standard deviations derived from our data set. However, the docking results depended quite significantly on interface size (Figure 5b). Figure 5b shows the global success rates of pyDock and pyDockTET as a function of the number of contact residues in the interface (defined as residues within 5 Å of any atom of the other domain). It also shows the percentage of cases with acceptable solutions within the docking set, which limits the maximum success rate that can be expected from pyDock or pyDockTET. Strikingly, ZDOCK found acceptable solutions in only one of the 13 cases with fewer than 20 contact residues, and in none of the cases with fewer than 10 contact residues. This indicates a clear limitation of FFT-based docking generation and is in line with previous reports relating docking difficulty to interface size [18, 19]. For cases with acceptable docking poses, the pyDock scoring function also performed worse when the number of contact residues was small. For cases with fewer than 30 contact residues, pyDock had a success rate of 25%, whereas pyDockTET reached a significantly better 58%. In summary, the linker-based restraints of pyDockTET largely improved the predictive results in those cases that are particularly difficult for unrestricted docking (i.e. those with poor docking energies and/or a small number of contact residues).
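The interface-size definition used for Figure 5b can be made concrete with a brute-force contact count. This is an illustrative sketch that represents each domain as a hypothetical dictionary mapping residue identifiers to atom coordinates. A real implementation would read PDB files and use a spatial index rather than comparing all atom pairs.

```python
import math


def contact_residues(dom1, dom2, cutoff=5.0):
    """Residues (tagged 1 or 2 by domain) with any atom within `cutoff` Å
    of any atom of the other domain.

    dom1, dom2: dict mapping residue id -> list of (x, y, z) atom coordinates.
    Brute force over all atom pairs, so only suitable for small examples.
    """
    contacts = set()
    for r1, atoms1 in dom1.items():
        for r2, atoms2 in dom2.items():
            if any(math.dist(a, b) <= cutoff for a in atoms1 for b in atoms2):
                contacts.add((1, r1))
                contacts.add((2, r2))
    return contacts
```

The size of the returned set is the number of contact residues counted over both sides of the interface.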
[Table: Domain-domain assembly with pyDockTET using homology models or X-ray structures of the interacting domains. Column groups: docking from crystal structures; docking from models.]
We have evaluated the performance of pyDockTET with respect to other computational methods recently reported for domain-domain assembly. Lise et al. tested their contact prediction method by generating 10 domain-domain orientations with the docking server GRAMM-X. They found an acceptable solution (fraction of native contacts > 0.1) in 12 out of 20 cases. For 5 of these 12 cases, the best model (in terms of fraction of native contacts) was ranked first by their contact scoring function. We can test pyDockTET on this benchmark. However, most of the cases in their benchmark have two linkers between the domains. Our method is focused on two domains joined by a single linker (which in principle have more flexibility) and is not directly applicable to domain-domain interactions with two linkers. We therefore applied our method to the only three cases in their benchmark where the domains are joined by a single linker. When we used the closed configurations (with side chains remodelled by SCWRL), we found acceptable solutions (RMSD ≤ 10Å) for two cases, 13pk and 1tfb, which were ranked 1st and 678th, respectively. When we used the open configurations (with side chains remodelled by SCWRL), we found acceptable solutions for only one case, 1jmc. Lise et al. used the open configurations (without remodelling the side chains) and found an acceptable solution (fraction of native contacts > 0.1) for only one case, 1tfb, which was ranked 3rd. As a note of caution, the overall results of Lise et al. depended strongly on the ability of GRAMM-X to generate acceptable docking poses within such a small number of alternatives. Another difference that makes the two methods difficult to compare is the definition of an acceptable solution. Lise et al. required a fraction of native contacts above 0.1, whereas we require an RMSD (equivalent to the ligand RMSD as defined in CAPRI) below 10Å.
Both criteria are used in CAPRI, in combination with the interface RMSD, but not separately. Ligand RMSD is arguably a more restrictive parameter than fraction of native contacts. For instance, the latest CAPRI round (round 15; http://www.ebi.ac.uk/msd-srv/capri) contains a significant number of predictions that have a fraction of native contacts above 0.1 yet are incorrect by the global CAPRI criteria (average false positive rate of 9%). In contrast, virtually all predictions with ligand RMSD below 10 Å are correct (average false positive rate of 0%). There are even some solutions with ligand RMSD above 10 Å that are still acceptable (e.g. two cases in target T32 and one in target T36).
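To make the difference between the two criteria concrete, the fraction of native contacts can be computed as the share of native inter-domain residue pairs that are reproduced in a model. The sketch below assumes the common CAPRI-style definition of a residue-residue contact, in which any pair of atoms lies within 5 Å. It also uses a hypothetical input format of residue-to-coordinates dictionaries. It is not the scoring code used by Lise et al. or by the CAPRI assessors.

```python
import math


def contact_pairs(dom1, dom2, cutoff=5.0):
    """Set of (residue in dom1, residue in dom2) pairs with any atoms within cutoff Å."""
    return {
        (r1, r2)
        for r1, atoms1 in dom1.items()
        for r2, atoms2 in dom2.items()
        if any(math.dist(a, b) <= cutoff for a in atoms1 for b in atoms2)
    }


def fraction_native_contacts(native1, native2, model1, model2, cutoff=5.0):
    """Share of native inter-domain contacts that are also present in the model."""
    native = contact_pairs(native1, native2, cutoff)
    if not native:
        raise ValueError("native structure has no inter-domain contacts")
    model = contact_pairs(model1, model2, cutoff)
    return len(native & model) / len(native)
```

A model can keep a few native contacts, and so exceed a 0.1 threshold, while the mobile domain is still far from its native position. This is consistent with the false positives noted above for the contact-based criterion.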
Inbar et al. recently described their combinatorial docking approach (CombDock) for multi-domain and multi-molecular assembly. However, they reported only three cases of domain-domain docking: 1a47, 1b23 and 1d0n. The other reported cases involved either docking of secondary structure elements within a single domain or multi-molecular docking. For all three they found near-native assemblies within the top 10 solutions. Our method, however, is not directly applicable to these cases, since they have more than two domains. We could dock one domain onto the other two taken as a single rigid body, but that would not be a realistic test of our method.
[Table: Domain docking results on difficult cases for domain-domain assembly. Columns include the templates used for each domain (dom 1/dom 2).]
We have described here a procedure to build multi-domain proteins from the structures (experimental or modelled) of their individual domains, using a combination of rigid-body docking, binding energy scoring and linker-length-based distance restraints. The inclusion of linker-based distance restraints largely improves the structural predictions, especially in cases where binding energy alone is not sufficient to discriminate near-native conformations. Provided that the rigid-body generation method produces acceptable domain-domain orientations, our scoring function (docking energy plus restraints) finds the correct assembly within the top 10 solutions in about 60–70% of the cases.
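$$E = E_{\mathrm{elec}} + E_{\mathrm{desolv}} + E_{\mathrm{linker}} \qquad (1)$$
where $E_{\mathrm{elec}}$ represents electrostatics, $E_{\mathrm{desolv}}$ represents desolvation energy, and $E_{\mathrm{linker}}$ is the pseudo-energy term derived from linker end-to-end distances.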
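Equation (1) can be read as a simple additive rescoring of docking poses. The sketch below assumes that the per-pose energy terms have already been computed, with `Pose` as a hypothetical container, and that lower energies are more favourable. It ranks poses with and without the linker term, mirroring the pyDock versus pyDockTET comparison.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    pose_id: int
    e_elec: float          # electrostatics
    e_desolv: float        # desolvation
    e_linker: float = 0.0  # linker-based pseudo-energy (restraint)

    @property
    def pydock_energy(self):
        """E_elec + E_desolv, i.e. Eq. (1) without the linker term."""
        return self.e_elec + self.e_desolv

    @property
    def pydocktet_energy(self):
        """Full Eq. (1): E_elec + E_desolv + E_linker."""
        return self.e_elec + self.e_desolv + self.e_linker


def rank_poses(poses, with_restraints=True):
    """Sort poses from most to least favourable (lowest energy first)."""
    if with_restraints:
        return sorted(poses, key=lambda p: p.pydocktet_energy)
    return sorted(poses, key=lambda p: p.pydock_energy)
```

A pose with strong binding energy but an implausible linker geometry can drop below a slightly weaker pose once the linker term is added.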
For each pair of domain structures we generated 2,000 rigid-body docking orientations with ZDOCK 2.1. The scoring function was then tested by calculating the success rate of finding a near-native solution among the top N ranked solutions (N = 10, 20, 30, 40, 50, 100, 200, 300, 400, 500), as scored by pyDock (before restraints) and pyDockTET (after restraints). We followed the CAPRI criteria (http://capri.ebi.ac.uk) for assessing protein-protein docking results. The other domain (typically the larger one) is first superimposed onto its X-ray counterpart. A near-native solution is then considered acceptable if the RMSD of the remaining domain from its equivalent in the X-ray structure is ≤ 10Å, and good if this RMSD is ≤ 5Å.
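The acceptance criteria and the top-N success rate can be sketched as follows. For simplicity, the sketch assumes that the other (usually larger) domain has already been superimposed onto its X-ray counterpart. The RMSD is then computed directly over matched atoms of the mobile domain. The function names and input format are hypothetical.

```python
import math

# Cut-offs used in the text for evaluating rankings.
TOP_N = (10, 20, 30, 40, 50, 100, 200, 300, 400, 500)


def ligand_rmsd(model_atoms, xray_atoms):
    """RMSD (Å) over matched atoms of the mobile domain, assuming both sets
    are already in the frame of the superimposed reference domain."""
    if not model_atoms or len(model_atoms) != len(xray_atoms):
        raise ValueError("need equal, non-empty lists of matched atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model_atoms, xray_atoms))
    return math.sqrt(sq / len(model_atoms))


def classify(rmsd):
    """CAPRI-style label used in the text: good <= 5 Å, acceptable <= 10 Å."""
    if rmsd <= 5.0:
        return "good"
    if rmsd <= 10.0:
        return "acceptable"
    return "incorrect"


def success_rate(ranked_rmsds_per_case, n):
    """Fraction of cases with at least one pose of RMSD <= 10 Å in the top n.

    Each element is the list of pose RMSDs for one case, in ranked order.
    """
    hits = sum(any(r <= 10.0 for r in rmsds[:n]) for rmsds in ranked_rmsds_per_case)
    return hits / len(ranked_rmsds_per_case)
```

Usage: `{n: success_rate(cases, n) for n in TOP_N}` gives the success-rate curve for one scoring function.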
A benchmark of 77 non-redundant domain pair structures was compiled by selecting all crystal structures of multi-domain proteins in the PDB that satisfied the following criteria: i) since domain pairs that are not in direct contact cannot be predicted by domain-domain docking, each case was required to have at least one pair of residues with side-chain atoms within ≤ 5Å of each other in the crystal structure (see Additional file 2: The 77 non-redundant bound structures); ii) all crystal structures had a resolution ≤ 2.5Å and less than 30% sequence identity to each other (a standard sequence identity threshold in homologue searches); and iii) only proteins formed by two domains as defined by Pfam, with a single inter-domain linker, were considered (the linker regions were thus defined by the Pfam domain boundaries, and in all 77 non-redundant domain pairs the linkers covered the domain cutting sites defined by SCOP or lay within three residues of them). For a more realistic domain assembly test, we used SCWRL 3.0 to remodel all side chains of the individual domains before docking.
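The selection criteria can be expressed as a filter followed by a redundancy-removal pass. This sketch works over a hypothetical `Candidate` record. The field names, the greedy best-resolution-first ordering and the user-supplied `identity` function (percent sequence identity between two entries) are assumptions, not the procedure actually used to compile the benchmark. The 2 to 17 residue linker-length limit is taken from the benchmark description earlier in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pdb_id: str
    resolution: float              # X-ray resolution, Å
    n_pfam_domains: int            # number of Pfam domains in the chain
    n_linkers: int                 # number of inter-domain linkers
    min_sidechain_distance: float  # closest inter-domain side-chain atom pair, Å
    linker_length: int             # residues in the inter-domain linker


def passes_filters(c, max_resolution=2.5, max_contact=5.0, min_linker=2, max_linker=17):
    """Criteria (i) to (iii) plus the linker-length range, applied to one entry."""
    return (
        c.resolution <= max_resolution
        and c.n_pfam_domains == 2
        and c.n_linkers == 1
        and c.min_sidechain_distance <= max_contact
        and min_linker <= c.linker_length <= max_linker
    )


def non_redundant(candidates, identity, max_identity=30.0):
    """Greedy redundancy removal, best resolution first: keep an entry only if
    its identity (percent) to every entry already kept is below max_identity."""
    kept = []
    for c in sorted(candidates, key=lambda c: c.resolution):
        if all(identity(c, k) < max_identity for k in kept):
            kept.append(c)
    return kept
```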
The second benchmark set contained 20 non-redundant domain pairs in which each domain was modelled on the basis of a homologue. This subset was generated from the benchmark of 77 pairs described above by selecting those cases in which both domains had available templates and could thus be modelled independently. The modelling process used BLAST to search for template structures, considering only the homologous sequences with the best E-values provided they were below 10⁻²⁰. It used Baton (D. Burke, unpublished; based on the COMPARER algorithm) for multiple structural alignment of the templates. Fugue was used to find templates in those cases in which BLAST failed, and also to generate all the sequence-structure alignments. Finally, MODELLER was used to generate models for each domain. The modelled cases are listed in Table 1. The template structures and sequence identities (computed from the structural alignments) can be found in Additional file 1: The 20 unbound (modelled) structures.
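The BLAST-then-Fugue decision can be sketched as a small selection function. Here `hits` is a hypothetical list of `(template_id, e_value)` pairs and `top_k` is an illustrative parameter. The sketch only encodes the 10⁻²⁰ E-value cutoff and the fallback described above. It does not call BLAST, Fugue, Baton or MODELLER.

```python
def select_templates(hits, cutoff=1e-20, top_k=3):
    """Return up to top_k template ids with the lowest E-values below cutoff.

    hits: iterable of (template_id, e_value) pairs from a sequence search.
    An empty result signals that a fold-recognition fallback (Fugue in the
    text) would be needed. top_k is an illustrative, hypothetical parameter.
    """
    passing = sorted((h for h in hits if h[1] < cutoff), key=lambda h: h[1])
    return [template_id for template_id, _ in passing[:top_k]]


def template_source(hits, cutoff=1e-20):
    """Label which search route would supply templates for this domain."""
    return "BLAST" if select_templates(hits, cutoff) else "Fugue"
```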
We are grateful for the suggestions received from the anonymous reviewers, especially with regard to the analysis of domain-domain contacts. T.M.K.C. is recipient of a Cambridge Overseas Trust Fellowship. This work is supported by the Plan Nacional I+D+I grant BIO2005-06753 from the Spanish Ministry of Science.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.