Specific αα-hairpin loops determine the handedness of helix-helix packing
To identify typical hairpin motifs, we performed a statistical analysis of helix-loop-helix fragments and found that shorter loops occur more frequently (Additional file 1: Figure S1). To focus on hairpins rather than general helix-loop-helix fragments, we defined helix-orientation vectors [19] and calculated their crossing angles (Additional file 1: Figure S1). When we restricted the fragment dataset to helix-helix crossing angles θHH of less than 60°, the population of single-residue loops decreased, as such short helix-loop-helix fragments prefer corners or kinks rather than hairpins. We focused on the more frequent short hairpin fragments and extracted loops of 2, 3, and 4 residues for subsequent analysis.
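The crossing-angle filter can be illustrated with a minimal sketch. It assumes that each helix-orientation vector is estimated as the first principal axis of the Cα coordinates, and that θHH is measured against the reversed axis of the second helix so that an ideal antiparallel hairpin gives 0°. Both choices, and the function names, are assumptions made for illustration and may differ from the definition in [19].

```python
import numpy as np

def helix_axis(ca_coords):
    """Unit vector along a helix: first principal axis of its C-alpha atoms (assumed definition)."""
    coords = np.asarray(ca_coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    # The first right-singular vector points along the direction of largest variance.
    _, _, vt = np.linalg.svd(centered)
    axis = vt[0]
    # Orient the axis from the N-terminal end to the C-terminal end.
    if np.dot(axis, coords[-1] - coords[0]) < 0:
        axis = -axis
    return axis / np.linalg.norm(axis)

def crossing_angle(ca_helix1, ca_helix2):
    """Angle in degrees between helix 1 and the reversed helix 2 (0 = ideal antiparallel hairpin)."""
    v1 = helix_axis(ca_helix1)
    v2 = helix_axis(ca_helix2)
    cos_theta = np.clip(np.dot(v1, -v2), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def is_hairpin(ca_helix1, ca_helix2, cutoff_deg=60.0):
    """Apply the 60-degree crossing-angle condition described in the text."""
    return crossing_angle(ca_helix1, ca_helix2) < cutoff_deg
```

With this convention, a fragment whose second helix runs back alongside the first passes the filter, whereas a corner-like fragment with roughly perpendicular helices does not.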
To investigate the preferred loop conformations related to the specific handedness of helix-helix packing in naturally occurring protein structures, we assigned a five-state coarse-grained representation of the backbone torsion angles, i.e., the ABEGO representation (Additional file 1: Figure S2), to each fragment and evaluated their statistics [2]. ABEGO is a five-state coarse-grained representation of polypeptide backbone dihedral angles; the Ramachandran map is divided into four sections labelled by the single letters A, B, E, and G, so that a series of dihedral angles can be represented as a character string. The A region roughly corresponds to the α-helix conformation, and the B region roughly corresponds to the β-strand conformation. The G region corresponds to the left-handed α-helix, and the E region represents the rest of the Ramachandran map. The O state corresponds to the cis conformation of the peptide bond, which is almost negligible in this paper. We then sorted the backbone torsion types specified by the ABEGO representations by population and found that the hairpins adopted a limited set of conformations (Fig. 1). For example, the GB and BB loops occupied more than 90% of the top five frequent populations among two-residue loops.
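The ABEGO assignment can be sketched as a simple lookup on the backbone torsion angles. The region boundaries below follow the convention commonly used in Rosetta; they are an assumption here, since the exact boundaries used in this work are defined in Additional file 1: Figure S2, which is not reproduced in the text.

```python
def abego_state(phi, psi, omega=180.0):
    """Return the ABEGO letter for one residue (angles in degrees, in [-180, 180]).

    Boundaries are the commonly used Rosetta convention (an assumption here).
    """
    if abs(omega) < 90.0:
        return "O"  # cis peptide bond
    if phi >= 0.0:
        # Positive phi: left-handed helical region (G) or the remainder (E).
        return "G" if -100.0 <= psi < 100.0 else "E"
    # Negative phi: alpha-helical region (A) or beta-strand region (B).
    return "A" if -75.0 <= psi < 50.0 else "B"

def abego_string(torsions):
    """Convert a list of (phi, psi, omega) tuples into an ABEGO character string."""
    return "".join(abego_state(phi, psi, omega) for phi, psi, omega in torsions)
```

For example, a two-residue loop whose first residue has positive phi with psi near 30° and whose second residue is extended would be labelled GB, the string used to name the loop classes in the following analysis.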
To identify hairpins that show specific handedness in helix-helix packing, we defined helix-helix dihedral angles φHH and calculated the ratio of left- (L-) and right- (R-) types in the distribution of these angles (Fig. 2). In each loop-length class, we selected the most populated loop types with R/L or L/R ratios higher than 5.0 as representative hairpin species. This resulted in the selection of the GB, GBB, and BAAB loops; we did not select the BAB loop because its inter-helix dihedral angles were broadly distributed, resulting in both left- and right-handed helix-helix packing (Additional file 1: Figure S3).
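The selection criterion can be sketched as follows. The sketch assumes that φHH is the signed dihedral angle through four points (start and end of the first helix axis, then start and end of the second helix axis), and that positive angles are counted as R-type and negative angles as L-type. The choice of points and the sign convention are assumptions for illustration, not the definition used in this work.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees through four points (IUPAC sign convention)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def handedness_ratio(phi_hh_values):
    """Return (n_left, n_right, larger-over-smaller ratio) for signed dihedral angles.

    Assumption: positive angles are R-type, negative angles are L-type.
    """
    n_right = sum(1 for a in phi_hh_values if a > 0)
    n_left = sum(1 for a in phi_hh_values if a < 0)
    if n_left == 0 and n_right == 0:
        raise ValueError("no signed angles to classify")
    if min(n_left, n_right) == 0:
        return n_left, n_right, float("inf")
    return n_left, n_right, max(n_left, n_right) / min(n_left, n_right)
```

A loop class would then be retained as a representative species when the returned ratio exceeds 5.0, as stated in the text.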
We extracted the structures whose φHH was nearest to the median of the angle distribution as the class representatives. The representative structures and the distributions of φHH showed that GB loops are closely related to the L-type handedness of helix-helix packing motifs (Fig. 2). Similarly, the GBB loop was related to R-type packing and the BAAB loop to L-type packing. The handedness of helix packing for the GB, GBB, and BAAB loops was broadly consistent with a previous report [20], and the overall tendency did not change when we performed the same analysis on a different dataset (Additional file 1: Figure S4). We also checked the clustering quality by sequence alignments for each hairpin structure, and observed typical periodic patterns of hydrophobic residues in the flanking helix regions (Additional file 1: Figure S5). We concluded that classification using the ABEGO patterns worked well to extract hairpin motifs related to the specific handedness of helix-helix packing.
Previous studies on αα-hairpins reported that both L and R types of helix-helix packing can result from GB or GBB hairpins [1, 4]. However, we observed that these loops strongly bias the handedness of helix-helix packing. This does not imply that a single ABEGO-level representation can always specify a single handedness of helix-helix packing; for example, the BAABB loop can result in both L- and R-type packing (Additional file 1: Figure S3). However, certain hairpin conformations such as GB, GBB, and BAAB can strongly determine the handedness of the packing of two flanking α-helices, and are an example of a pair of local and nonlocal structural motifs that are consistently incorporated into a single tertiary structure. As the spatial arrangement and orientation of two α-helices connected by a loop region are stereochemically determined by the backbone dihedral angles in the loop region, the preference for a specific handedness of helix-helix packing can be attributed to the rigid conformation of the loop region. Hydrogen bond analysis using DSSP [21] revealed that an intra-loop backbone-backbone hydrogen-bond network energetically stabilizes such typical loop conformations, making the loop conformation rigid enough to relate the local conformation of loops to the specific geometry of helix-loop-helix fragments (Additional file 1: Figures S6–S8).
Sequence-independent backbone-building simulations clarify the condition for building compact single-chain three-helix bundles
The length of the second α-helix is expected to play a crucial role in the construction of compactly packed single-chain three-helical bundle structures, since extending an α-helix leads to a large repositioning of the following segments. As one turn of the α-helix requires 3.6 residues, compact bundle structures may appear for every increase of 3 or 4 residues. However, it remains unclear exactly which combinations of loops and helix lengths result in a compact single-chain three-helix bundle structure. Therefore, we performed comparative sequence-independent fragment-assembly simulations to identify which combinations of loops and helix lengths result in tight packing between the first and third α-helices.
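The periodicity argument can be made concrete with a small worked calculation. The sketch below assumes an ideal α-helix with exactly 3.6 residues per turn (100° of rotation about the helix axis per residue). It is only a geometric heuristic with hypothetical function names, not part of the simulations described here.

```python
DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn for an ideal alpha-helix

def helix_phase(n_residues):
    """Rotation about the helix axis accumulated over n residues, modulo 360 degrees."""
    return (n_residues * DEGREES_PER_RESIDUE) % 360.0

def phase_difference(n1, n2):
    """Smallest angular difference (degrees) between the phases of two helix lengths."""
    d = abs(helix_phase(n1) - helix_phase(n2))
    return min(d, 360.0 - d)

if __name__ == "__main__":
    # Adding 3 or 4 residues changes the phase by 60 or 40 degrees, respectively,
    # so neither step returns exactly to the starting orientation.
    for extra in (1, 2, 3, 4, 7, 18):
        print(extra, phase_difference(10, 10 + extra))
```

In this idealized picture, extending a helix by 3 or 4 residues leaves the exit direction within 60° or 40° of the original, whereas extending by 1 or 2 residues rotates it by 100° or 160°. This is why near-equivalent packing geometries are expected to recur every 3 or 4 residues rather than at a fixed integer period.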
The set of backbone dihedral angles of the fragments is roughly specified in the ABEGO representation (referred to as a “blueprint” [22]) and is used in fragment-picking before the fragment-assembly simulations. Hereafter, we refer to this type of fragment-assembly simulation guided by blueprints as a backbone-building simulation. Using the GB, GBB, and BAAB loops identified in the previous section, we constructed blueprint files for various types of single-chain three-helix structures and systematically scanned the length of the second α-helix. Next, 2500 trajectories of backbone-building simulations were performed for each of these blueprints [23, 24]. We prepared ideal single-chain three-helical bundle decoys using CC-builder as reference structures [25], and calculated the template modeling score (TM-score) of the final structure from each trajectory against these decoys to quantify the success ratio of the backbone-building simulations [26]. Importantly, we had two reference decoys for each blueprint-based folding simulation because single-chain three-helical bundles can take two types of helix configurations, i.e., a clockwise (CW) or counter-clockwise (CCW) arrangement of the three α-helices (Additional file 1: Figure S9).
The results of the backbone-building simulations are summarized in Fig. 3. The simulations showed three important features, which are summarized here using the helix-GB-helix-GB-helix simulations as examples. First, the length of the second α-helix plays a crucial role in the construction of compactly packed single-chain three-helical bundle structures; the success ratio of the backbone-building simulations was clearly related to the periodicity of the α-helix structure. For example, for helix-GB-helix-GB-helix blueprints, CW bundles can be efficiently generated when the second α-helix has a length of 10, 14, or 17 residues. Similarly, a second α-helix of 9, 12, or 16 residues resulted in CCW bundles. Here, the peaks were separated by three or four residues, which is consistent with the canonical α-helix structure that requires 3.6 residues per turn. Second, certain combinations of loops and helix lengths do not yield well-packed helix bundles. For example, the blueprint with a 15-residue helix in the middle cannot fold into a compact helical bundle. This is because, with this number of turns in the second α-helix, the first and third helices cannot pack closely and remain apart from each other (Additional file 1: Figure S10). Such a blueprint has a local conformation that is inconsistent with the global structure of compact single-chain three-helical bundles. Third, the position of the peaks oscillates between CW and CCW bundles as the length of the second helix increases. The switch from a CW bundle to the neighboring CCW bundle is very sharp and sometimes requires an increase or decrease of only a single residue. For example, the blueprint with a 16-residue helix in the middle preferentially results in the CCW bundle structure, and an increase of one residue results in a preference for the CW bundle.
Overall, the results of the backbone-building simulations agree with the qualitative expectations guided by the periodicity of the α-helix structure, and provide further detailed information on the exact combinations of loop types and helix lengths that result in compact bundle conformations. These results were not affected when different thresholds and reference structures were used for the analysis (Additional file 1: Figures S11 and S12).
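The success ratio for one blueprint can be sketched as below. The sketch assumes that each trajectory has already been scored against both the CW and CCW reference decoys, and that a trajectory counts as successful when its better TM-score reaches a threshold. The threshold of 0.5 and the function name are hypothetical, as the value used in this work is not stated in the text.

```python
def success_ratios(tm_cw, tm_ccw, threshold=0.5):
    """Fractions of trajectories assigned to the CW and CCW references.

    tm_cw, tm_ccw: TM-scores of each final structure against the CW and CCW decoys.
    threshold: hypothetical cutoff; trajectories below it for both references are failures.
    """
    if len(tm_cw) != len(tm_ccw) or len(tm_cw) == 0:
        raise ValueError("expected two non-empty score lists of equal length")
    n_cw = n_ccw = 0
    for cw, ccw in zip(tm_cw, tm_ccw):
        if max(cw, ccw) < threshold:
            continue  # neither reference is matched well enough
        if cw >= ccw:
            n_cw += 1
        else:
            n_ccw += 1
    total = len(tm_cw)
    return n_cw / total, n_ccw / total
```

Repeating the calculation with several thresholds is one simple way to check that the positions of the peaks do not depend on the cutoff.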
Taken together, these results indicate that appropriate combinations of local loop motifs and secondary-structure lengths are relatively rare among the possible combinations, especially under the approximation that the loops and α-helices are semi-rigid under ABEGO constraints on the backbone dihedral angles. In our simple simulations for single-chain three-helix bundle structures, approximately half of the blueprints were able to generate compact bundle conformations. As the foldable combinations of building blocks are rare and sparsely distributed even for simple single-chain three-helical bundles, valid combinations for more complicated topologies are expected to become rarer and more difficult to find. We expect that the probability of obtaining foldable combinations will decrease exponentially as the number of secondary structures increases, and that it will be difficult to predict which combinations may result in a compact, globular, and protein-like structure without exhaustive sampling of the conformational space.
Other types of blueprints, such as helix-GBB-helix-GBB-helix and helix-BAAB-helix-BAAB-helix, showed results similar to those of the GB-blueprint. Interestingly, the “phase” of the peak oscillation was inverted between the GBB- and BAAB-blueprints, whereas the positions of the peaks were similar to each other, reflecting the local handedness of the hairpin structures. The former results in a CCW bundle when the second helix has 13 residues, and the latter yields a CW bundle under the same conditions. The observation that the local handedness of hairpins can control the global chirality of the topology may be informative for efficiently diversifying the shapes of designed proteins. Additionally, the blueprints containing a mixture of hairpins with different handedness failed to pack the first and third α-helices because their crossing angles do not cancel out (Additional file 1: Figures S10 and S13–S17).
Amino acid sequence design suggests that the enumerated globular single-chain three-helix bundle structures may be designable
As the backbone models generated in the previous section lacked any information on amino acid sequences, we performed sequence design using Rosetta [24] to check whether the compact single-chain three-helix bundle structures are designable as concrete amino acid sequences. We selected the 27 backbone structures listed in Fig. 3 and performed amino acid sequence design for them. We designed ~7000–9000 sequences for each backbone structure and observed that the interfaces between the first and third α-helices recovered the sequence motifs for helix-helix packing (Additional file 1: Figure S18). The results show that the relative arrangements of the first and third α-helices sampled in the sequence-independent backbone-building simulations are realistic enough to accommodate the typical amino acid sequences observed in helix-helix packing motifs. The optimal combinations of local properties, such as the hairpin types and α-helix lengths, lead to the successful recovery of non-local features. From these ensembles of design models, we selected the most foldable sequences for each topology using sequence-dependent fragment-assembly simulations [27] (Fig. 4 and Additional file 1: Figures S19–S21). In most of the simulation settings, the lowest-score models agreed well with the design models and recovered the local hairpin structures well (Additional file 1: Figures S22–S27 and Table S1). We also performed negative-control designs in which the loop regions of the up-down helix bundles have atypical conformations, such as EE, BEB, and BEEE. The best-effort designs for these backbone models were indistinguishable in terms of per-residue Rosetta scores from the designs with typical hairpin motifs (Additional file 1: Table S2). However, they were not able to fold efficiently into the target topology in the sequence-dependent fragment-assembly simulations (Additional file 1: Figure S28).
This result suggests that the compact up-down bundle structures with typical hairpins have higher designability than those composed of atypical hairpins. For the best design models, we performed a BLAST search using blastp against a non-redundant sequence database [28, 29] and confirmed that there were no similar sequences in the database.
To confirm the stability of the designed proteins independently of the Rosetta score function and fragment-assembly simulations, we performed molecular dynamics simulations of the designed protein models and assessed their quality [30, 31]. We performed molecular dynamics simulations for the 27 best design models using GPU-accelerated GROMACS 2020.6 [32, 33] with the Amber 15FB force field [34]. For each design, we performed 10 trajectories of 100 ns molecular dynamics simulations with explicit TIP3P water models. After energy minimization and equilibration, we performed 100 ns production runs at a pressure of 1 bar and a temperature of 300 K. The simulations showed that most of the design models stayed within a Cα root-mean-square deviation (RMSD) of 5 Å from the designed structures for 100 ns (Additional file 1: Figures S29–S34). The only exceptions were the 7th trajectory of GB-CW7 and the 3rd and 8th trajectories of GBB-CCW6 (Additional file 1: Figures S29 and S31), which resulted in partial unfolding of the structures. These structures may be unstable because their hydrophobic cores are too small to maintain the designed topologies. Overall, the molecular dynamics simulations showed that the designed proteins were stable enough to keep the designed conformation in solution. These results provided independent validation of the designability of the backbone structures that we sampled by fragment assembly with the Rosetta score function.
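The 5 Å criterion can be illustrated with a minimal sketch that computes the Cα RMSD of each trajectory frame after optimal superposition onto the design model. The Kabsch algorithm is used here for illustration only. The analysis in this work was presumably carried out with standard trajectory tools, and the function names below are hypothetical.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """RMSD (same units as input) after optimal rigid-body superposition of mobile onto reference."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    # Remove translation by centering both coordinate sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix (Kabsch algorithm).
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))

def stays_within(trajectory, design, cutoff=5.0):
    """True if every frame stays within the RMSD cutoff (in Å) of the design model."""
    return all(kabsch_rmsd(frame, design) <= cutoff for frame in trajectory)
```

Here `trajectory` is assumed to be an iterable of N×3 arrays of Cα coordinates in Å, with the same atom ordering as `design`.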
As we designed up-down types of helical bundles, whose substructures can be regarded as antiparallel and parallel coiled coils, we examined whether the best-designed sequences can be recognized as coiled coils by DeepCoil [35, 36, 37]. Interestingly, the predicted probability of observing coiled-coil arrangements within the designed structure increased as the length of the designed proteins increased (Additional file 1: Figures S35–S40). When the second α-helix was longer than approximately 15 amino-acid residues, the probability of recognizing the sequence as a coiled coil became higher than the significance threshold. This suggests that these up-down helical bundles can be regarded as coiled coils when the chain lengths are large enough. Therefore, although we designed amino-acid sequences without considering that the structures are related to coiled coils, sequence design techniques for coiled coils can be repurposed for the sequence design of large helical bundles and may result in more optimized helix-helix packing, which may lower the computational cost of design and improve the yield of successful design sequences. On the other hand, smaller helical bundles were not predicted as coiled coils. This does not immediately imply that such small helical bundles are not designable; such small helical bundles were designed in a previous study [38]. Therefore, the sequence design of such small helical bundles should be performed without treating the structure as a coiled coil. This analysis suggests that we may be able to select optimal design methods depending on the size of the target helical bundles. It is also of interest whether parametrically designed multiple-chain coiled coils can be redesigned into single-chain helical bundles by designing the loops connecting the α-helices; the question is whether the designer can find appropriate loop conformations to connect the α-helices [39] without frustration between local and nonlocal interactions [40, 41].
Finally, to detect knobs-into-holes structures in our designs, we performed structure analysis using SOCKET [42]. According to SOCKET, knobs-into-holes structures were observed in roughly two-thirds of our designs (Additional file 1: Figures S41–S44). In addition, SOCKET detected coiled-coil structures in four of our designs, although we did not intend to design coiled-coil-like substructures in our design scheme. This also suggests that the design of helical bundles shares many aspects with the design of coiled coils, and that the rich and mature protocols for coiled-coil design can be imported into the design of single-chain helical bundles.