Structure prediction of polyglutamine disease proteins: comparison of methods

Background The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases.


Background
Knowledge of protein structure can be critical for devising therapeutic strategies for diseases in which protein dysfunction contributes to pathogenesis. For the polyglutamine (poly-Q) diseases, pathogenic poly-Q expansions typically cause gains of toxic functions associated with protein misfolding or aberrant interactions with RNAs or other proteins [1]. At least ten neurodegenerative disorders are caused by poly-Q expansions, including Huntington's disease (HD), dentatorubral and pallidoluysian atrophy (DRPLA), spinal and bulbar muscular atrophy (SBMA), and the poly-Q spinocerebellar ataxias [2] (SCA1, SCA2, SCA3, SCA6, SCA7, SCA8, and SCA17) [3][4][5]. The proteins involved in these diseases have no significant sequence, compositional or structural homologies [6,7], and numerous studies and observations have established that the length of the polyglutamine repeats plays a critical role in the progress and pathogenesis of these diseases [5,8]. Analysis of patient data reveals that the expansion of polyglutamine repeats beyond a certain pathological threshold causes the disease phenotype (Table 1) [9][10][11][12]. In addition, the number of glutamines in the polyglutamine region is inversely correlated with age of onset [9][13][14][15][16][17]. For instance, people with 32 or 33 repeats tend to first experience symptoms of SCA2 in late adulthood, while people with more than 45 repeats usually have symptoms by their teens [2].
One possible mechanism for the pathology of these diseases is the assembly of unfolded protein monomers into β-sheet amyloid fibers [18]. Both in vivo and in vitro studies have shown that the poly-Q expansion may lead to protein misfolding [19] and may cause a structure transition to form parallel β-helix and β-sheet folds [20]. Protein misfolding and aggregation have been shown to depend on the poly-Q length and the concentration of the protein [21][22][23]. As shown in [24], the poly-Q tract forms β-sheet structures as the number of Qs increases, which raises the chance of aggregation. Therefore, understanding the effect of lengthening the poly-Q repeat segment on protein folding can provide new insights and perhaps therapies for these diseases.
Although the association of the lengthening of the poly-Q repeats with the related polyglutamine diseases has been known for almost 20 years [25,26], high-resolution structural analysis of these proteins in their native context has eluded researchers [27] and only very limited experimental information exists. Kim et al. crystallized multiple structures of the N-terminal segment of huntingtin protein with 17 and 36 glutamine repeats, finding that the poly-Q regions exhibit conformational flexibility with α-helix, random coil, and extended loops [28,29]. These structures are the only crystal structures of poly-Q segments available in the RCSB PDB database. Computational modeling can provide valuable insights into this problem [23,30,31], but to our knowledge no comprehensive studies have been reported comparing the 3D structures predicted for these segments with the limited experimental data available.
The accuracy of the structures obtained using 3D structure prediction programs is improving rapidly, and some of the commonly available programs have shown excellent performance in the CASP competition [32]. However, all the 3D structure prediction programs are trained with a variety of proteins and their performance is usually evaluated on a general dataset [33]. There is no literature evidence reporting the performance of these programs on proteins containing poly-Q tracts. So it is necessary for us to evaluate the performance of these programs before we use them to predict the structure of polyglutamine disease proteins at large scale.
In this paper we present our results of the evaluation of the prediction performance of two efficient and popular 3D structure prediction programs, I-TASSER and Rosetta, on the N-terminal end of huntingtin protein with 17 glutamines (HTT17Q-EX1).

Predicted models
As evidence shows that the poly-Q region can adopt different structures [28,29] in the proteins of interest for poly-Q diseases, it is not appropriate to seek the 'best structure' of this region, but it is more appropriate to look for ensembles of structures (generated by multiple independent runs) which can show overall trends and represent the variety of structures observed by experimental methods.
Following this reasoning, both Rosetta and I-TASSER were run 10 times using different random seeds for each run of 3D structure prediction of the HTT17Q-EX1 sequence shown in Figure 1(b). For each run we kept the five best models, so a total of 50 I-TASSER models and 50 Rosetta models were retained for analysis.
Each structure prediction program will return some parameters to estimate the accuracy of the models. For I-TASSER, the C-score, which lies in the (-5,2) range, is calculated for each model [34]. The C-scores of the best 50 I-TASSER models, listed in Table 2, range from -2.62 to -4.72.
The clustering algorithm from Rosetta was used to identify the most frequently sampled conformations. For each run we selected the five lowest-energy structures from among the structures in five different clusters, each of which contained more than 10 structures. The energies of the 50 Rosetta structures, listed in Table 3, range from 16.06 to 20.13.

Secondary structure
For better visualization, WebLogo [35] was used to display secondary structure patterns. The WebLogos of the secondary structures of the experimental PDB structures and of the best I-TASSER and Rosetta models are shown in Figure 2. For ease of description, we divided the sequence into three regions: the 17-residue head region (residues 1 to 17), the poly-Q region (residues 18 to 34), and the C-terminal region (residues 35 to 60). As discussed in the original publication, most of the 21 PDB structures show α-helix in the head region, which is always well resolved, with only a few structures showing turns at the beginning and end of the head region. Both the I-TASSER and Rosetta best models reproduce the observed trends, showing a majority of helix structures in the head region. Unfortunately, as depicted in Figure 2(a), the number of well resolved structures rapidly decreases beyond the head region, making comparison with the experiments less reliable. Nonetheless, the overall experimental trends are reproduced by both I-TASSER and Rosetta, although the I-TASSER structures appear to show more loops than the experimental data.
Overall, I-TASSER appears to be superior, reproducing quite well the stable α-helix structure of the N-terminal region and showing increased diversity of structures in the poly-Q region and a predominance of coil structures in the C-terminal region.

Reproducibility of I-TASSER and Rosetta results
To test the sensitivity of I-TASSER and Rosetta to the selection of the random seeds, we calculated the structure similarity, measured by the TM-score, between models obtained with the same prediction program. Comparing the best 50 models pairwise gives 1225 TM-scores for each program. TM-scores between any two I-TASSER models range from 0.2781 to 0.7163, with an average of 0.4086 and a standard deviation of 0.0692, whereas the TM-scores between any two Rosetta models range from 0.2865 to 0.8236, with an average of 0.4979 and a standard deviation of 0.0892. The difference between the TM-scores of I-TASSER and Rosetta is statistically significant (t-test, p < 0.001, Figure 3). The number of TM-scores greater than 0.5 is more than four times greater for Rosetta/Rosetta pairs than for I-TASSER/I-TASSER pairs: 561 Rosetta pairs and 126 I-TASSER pairs have scores larger than 0.5.
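The pair counts used in this comparison follow from simple combinatorics, and the summary itself is a loop over unordered pairs. The sketch below is only an illustration: `pairwise_summary` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a real TM-align call, which is not reproduced here.

```python
from itertools import combinations
from math import comb


def pairwise_summary(models, score_fn, threshold=0.5):
    """Score every unordered pair of models and summarize the scores."""
    scores = [score_fn(a, b) for a, b in combinations(models, 2)]
    n_pairs = len(scores)
    return {
        "n_pairs": n_pairs,
        "mean": sum(scores) / n_pairs,
        "n_above": sum(1 for s in scores if s > threshold),
    }


# 50 models per program give C(50, 2) unordered pairs;
# the 10 best-of-run models give C(10, 2) pairs.
n_pairs_50 = comb(50, 2)
n_pairs_10 = comb(10, 2)


def toy_score(a, b):
    """Hypothetical stand-in for a TM-score between two models."""
    return 1.0 - abs(a - b)


# Three toy "models" represented by single numbers, for illustration only.
summary = pairwise_summary([0.0, 0.2, 0.9], toy_score)
```

With real data, `score_fn` would wrap whatever structure comparison tool is in use; only the counting logic is shown here.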
When comparing only the best model of each run, the TM-scores range from 0.4539 to 0.6813 for I-TASSER (Table 4) and from 0.2872 to 0.6879 for Rosetta (Table 5). The best models of each run from I-TASSER are therefore more similar among themselves than those from Rosetta: 33 of the 45 structure pairs have TM-scores greater than 0.5 for I-TASSER, whereas only 18 pairs of Rosetta best models have TM-scores greater than 0.5.
The sensitivity to the selected random seeds was also evaluated at the run level. TM-scores were calculated between each of the 5 models of one run and each of the 5 models of another run, giving 25 pairs for every pair of runs. The number of pairs with TM-score greater than 0.5 between any two runs is shown in Table 6 for I-TASSER and Table 7 for Rosetta. For I-TASSER, the number of pairs with TM-score greater than 0.5 ranges from 0 to 6: there are 6 such pairs between Run 4 and Run 7, but none between Run 1 and Run 8. For Rosetta, the number ranges from 5 to 20. The highest value, 20 of 25 pairs, occurs between Run 3 and Run 7, and the lowest value, 5, occurs for three pairs of runs: Run 1 and Run 6, Run 6 and Run 8, and Run 5 and Run 8. These results suggest that our ensemble approach to predicting the structure of proteins associated with poly-Q diseases is appropriate. Using multiple seeds, it is possible to obtain an ensemble of structures that show reasonable diversity but still retain the main features. We believe that this approach is quite promising because future analyses can incorporate the structural diversity that appears to be emerging from the limited experimental data on these proteins.

Figure 1 The sequence construction of HTT17Q-EX1. (a) sequence structure of the PDB records; (b) sequence used for structure prediction.

Validity evaluation of I-TASSER and Rosetta
As depicted in Figure 2(a), not all of the 21 PDB structures have been resolved in the poly-Q region, which is our main interest. For instance, the longest well resolved poly-Q region is in the B chain of the 3IOW [PDB: 3IOW] structure, in which all 17 Qs are resolved, whereas in the A chain of the 3IOT [PDB: 3IOT] structure only one Q has been resolved. There are also numerous gaps in several structures because some of the residues are not resolved. Taking this into account, and in order to make an accurate comparison with the experimental structures in the region of interest, only PDB structures in which at least 9 consecutive Qs (more than half the total number) in the poly-Q region are well resolved were used to evaluate the results produced with I-TASSER and Rosetta. Ten PDB structures meet this criterion (Table 8).
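The selection rule above (at least 9 consecutive resolved Qs) amounts to finding the longest run of resolved residues in the poly-Q region. The following is a minimal sketch, assuming each chain is represented by a list of 17 booleans for poly-Q positions 18 to 34; the function names and the example chains are hypothetical and do not correspond to specific PDB entries.

```python
def longest_resolved_run(resolved):
    """Return the length of the longest run of consecutive True values."""
    best = current = 0
    for flag in resolved:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def passes_filter(resolved_polyq, min_run=9):
    """True when at least `min_run` consecutive poly-Q residues are resolved."""
    return longest_resolved_run(resolved_polyq) >= min_run


# Hypothetical chains, one flag per poly-Q residue (17 in total).
fully_resolved = [True] * 17
one_resolved = [True] + [False] * 16
gapped = [True] * 8 + [False] + [True] * 8  # 16 resolved, but split by a gap
```

Note that the `gapped` example fails the filter even though 16 of its 17 residues are resolved, because the rule requires the residues to be consecutive.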
The best 50 I-TASSER and 50 Rosetta models were compared with these 10 PDB structures using the TM-align program. TM-scores, root-mean-square deviation (RMSD), number of aligned residues, sequence identity and the structure superposition were obtained from TM-align [36]; the number of exact matches and the number of exactly matched Qs were extracted from the structure alignment; and finally the exact structure overlap (ESO) and exact structure overlap of Qs (ESOP) were calculated using equation (2) and equation (3) given in the methods section. The values of each similarity parameter considered here are shown in Table 9, along with the p-values assessing the significance of the difference between the I-TASSER and Rosetta results. The average TM-score of I-TASSER/PDB superposition pairs is 0.50 and the average TM-score of Rosetta/PDB pairs is 0.45, reflecting the fact that 253 of the 500 I-TASSER/PDB pairs have TM-scores greater than 0.5, while only 87 of the 500 Rosetta/PDB pairs do. The average RMSD of I-TASSER/PDB pairs (1.53 Å) is also smaller than that of Rosetta/PDB pairs (1.74 Å). Other TM-align parameters in Table 9 also show that I-TASSER performs better than Rosetta in this test.
The structure overlap scores, ESOP and ESO, for I-TASSER models are also better than those for Rosetta models. For instance, more than 75% of the Rosetta models have no exact match either in the poly-Q region or over the entire sequence, whereas the 75% quantiles of the ESO and ESOP scores for I-TASSER are 53.13 and 9.09, respectively. The statistical tests have shown that these differences are significant (Table 9).
Fifty of the I-TASSER/PDB structure superpositions have ESOP values greater than or equal to 50, which means that in 50 pairs at least 50% of the Qs in the poly-Q region are exact matches. These 50 pairs include 9 of the 10 PDB structures, so 9 of the 10 structures have corresponding I-TASSER models with very good matches in the poly-Q region. In contrast, only 5 of these 10 structures have corresponding Rosetta/PDB structure superposition matches when the same criteria are used.
The best matches between the structures predicted by I-TASSER and Rosetta, respectively, and one of the PDB structures considered here are depicted in Figure 4.

Discussion
This study evaluated two software tools for predicting, from amino acid sequences, the 3D structures of the poly-Q regions of proteins related to polyglutamine diseases. Pathogenic neurodegenerative poly-Q proteins were used as a model because of their relevance to developing structure-specific therapeutics based on normal vs. poly-Q expanded protein structures. Two highly recognized and efficient 3D structure prediction programs, I-TASSER and Rosetta, were evaluated to assess their performance for structure prediction using segments of the huntingtin protein harboring poly-Q repeats. Both I-TASSER and Rosetta produced good results.
When tested for structure stability under changes of the initial random seed, Rosetta shows less variability than I-TASSER. This means that if we run Rosetta and I-TASSER several times each, we are likely to get less variance in the results from Rosetta than from I-TASSER. Nonetheless, both programs produce a reasonable ensemble of structures with sufficient diversity and without extreme deviations. Several studies have illustrated that the poly-Q repeat regions of these proteins are highly disordered and structurally flexible [31], but this has not been quantified experimentally. It is therefore challenging to discriminate between these two approaches using these criteria. Consequently, we conclude that both I-TASSER and Rosetta are suitable for the task of predicting ensemble structures of proteins containing poly-Q segments.
The accuracy of the prediction program is a very important factor that we evaluated here. In this study, the structure similarity between the predicted models and the available experimental PDB structures was used to evaluate the validity of the prediction programs. The root-mean-square deviation (RMSD) is the most often-used measure of structure similarity, but a drawback is that a relatively small local variation can result in a high RMSD [37]. The TM-score weights close atom pairs more strongly than distant ones, and it is more sensitive to the fold topology than the RMSD [37]. Besides the global similarity measured by the TM-score, stricter scores based on the exact match of two structures were also calculated. The exact structure overlap (ESO), derived from the structure overlap (SO) score [38], counts exact-match pairs instead of all aligned pairs; an exact match requires not only that two residues are aligned but also that they occupy the same position in the sequences of the predicted model and the PDB structure. The exact structure overlap of the poly-Q repeat (ESOP) is a special version of ESO that measures the prediction accuracy in the poly-Q region. Considering the TM-score, ESO and ESOP together gives a more comprehensive view of the similarity between the predicted models and the experimental PDB structures from both a global and a local perspective. The ESO and ESOP scores can be used for similarity comparison tasks, especially when some regions play more critical roles than others. Rosetta models have a larger number of aligned residues on average than I-TASSER models, but their average RMSD is higher and their average TM-score is lower than those of I-TASSER. So when the Rosetta models are aligned with the PDB structures, the distance between the models and the experimental structures is larger, which does not indicate good structural matches.
In contrast, I-TASSER models aligned better with the PDB structures, with not only better RMSD and TM-scores but also better ESO and ESOP scores. This can also be seen from the secondary structure patterns. When considering specific structure pairs, both I-TASSER and Rosetta have predicted models which match the PDB structures well both globally (TM>0.5) and locally (ESO>=50 and ESOP>=50). So both Rosetta and I-TASSER have the ability to find the native models, but in overall performance I-TASSER appears to be better than Rosetta.
As several models are returned by the structure prediction programs, it is important to have criteria to select the best models. However, the model with the lowest energy in the prediction program may not be the best model for reproducing the poly-Q regions. For instance for Rosetta, the two predicted models with TM-score greater than 0.5 and ESOP of 100 (Figure 4(b) and 4(c)) are not the models with the lowest energy in that Rosetta run. This is true also for the I-TASSER model with TM-score greater than 0.5 and ESOP of 100 (Figure 4(a)). In fact, of the 29 good models which have TM-score greater than 0.5 and ESOP score greater than 50, only one model is ranked as the best by I-TASSER.

Conclusions
Both I-TASSER and Rosetta can be used for in silico studies of the structures of proteins with poly-Q repeats related to neurodegenerative diseases. However, I-TASSER shows better performance than Rosetta when considering the overall agreement between the results produced by these two prediction programs and the limited experimental results available for comparison.
Both I-TASSER and Rosetta are computationally efficient as both applications can be easily parallelized by executing numerous jobs each with a unique random seed.
In our future studies we will attempt to predict the change of the structure as a function of the number of Qs in the poly-Q repeat segment for all the proteins involved in poly-Q neurological diseases. Ideally we could use both programs to predict structures of the poly-Q disease related proteins. This could provide a quasi "crowdsourcing" mechanism to cross-check the results, but may prove computationally too expensive (see Methods). Therefore the results presented here suggest that studies should be, at least initially, performed using I-TASSER.

Poly-Q segments
We searched the RCSB PDB database [39] for structures with more than 10 consecutive glutamines in their sequences in November 2012. A total of 11 structures were retrieved, including 7 of the first exon of the huntingtin protein with 17 glutamines (HTT17Q-EX1) [28] and 4 of the first exon of huntingtin protein with 36 glutamines (HTT36Q-EX1) [29]. Figure 1(a) shows the sequence construction for the X-ray diffraction experiment on HTT17Q-EX1, which was expressed and crystallized as a maltose-binding protein (MBP) fusion protein [28]. The same methods were used to obtain the crystal structure of HTT36Q-EX1, but the resolution of the HTT36Q-EX1 structures is so poor that only HTT17Q-EX1 structures were used in this study.
Each PDB crystal includes a trimer of MBP-HTT17Q-EX1, so a total of 21 structures of HTT17Q-EX1 were considered. Figure 1(b) shows the sequence of HTT17Q-EX1 used as the input of the 3D structure prediction.

Protein 3D structure prediction
Two protein structure prediction programs were used in this study, I-TASSER and Rosetta. Both I-TASSER and Rosetta have been used by thousands of users, and they are among the few programs which can handle large proteins with more than 1000 residues [34,40].
I-TASSER is a 3D structure prediction program based on multiple-threading alignments and iterative template fragment assembly simulations [41]. I-TASSER is a fully automated method and was used without further modifications, but we verified that none of the templates corresponding to the structures 3IO4 ... 3IOW was included in the knowledge data used by the version of I-TASSER used here. Rosetta is a flexible software suite for macromolecular modeling, which includes tools for structure prediction and design [42]. The Rosetta ab initio module was used in this study. For Rosetta, the quota protocol for fragment picking was used to generate 3-mer and 9-mer fragments, with the secondary structure predictions from PsiPred [43], the Jufo9D server [44] and SAM-T08 [45] as the quota pools. The weight given to each quota pool was assigned following reference [46], and 200 fragments were picked from the 700 candidates available for both 3-mer and 9-mer fragments. The default parameters were used for Rosetta ab initio modelling, with the number of output structures set to 5000; the default parameters were also used for the Rosetta cluster module.
We installed I-TASSER Version 2.1 and Rosetta Version 3.4 on a cluster at the Center for High Performance Computing (CHPC) of the University of Utah, where all computations were performed. Because I-TASSER is a fully automated program, its number of decoys to screen and its number of simulation jobs are fixed, whereas Rosetta is much more flexible: users can define the number of output structures and the number of parallel simulation jobs, making it much more adaptable to the hardware architecture used. It is therefore difficult to compare the computational cost of the two programs. However, for the modelling tasks with the parameters used in our simulation, the total CPU time for I-TASSER to finish one HTT17Q-EX1 (60 amino acid residues) prediction was, on average, 24.58 hours using one core of a 2.4 GHz dual-core Opteron processor, whereas the average total CPU time for Rosetta to finish one HTT17Q-EX1 prediction with 5000 predicted structures was about 50.91 hours in the same computing environment.

3D structure alignment
To assess 3D structure similarity, TM-align was used for structure comparison and alignment [36]. The TM-score calculated by TM-align, which lies in the (0,1] interval, is considered a good measure of the similarity of two structures [37]. A TM-score of less than 0.17 indicates a random alignment, whereas a TM-score greater than 0.5 indicates that the two structures generally have the same fold [37].
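For readers unfamiliar with the score, the sketch below evaluates the standard TM-score expression for one fixed superposition, using the usual length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. This is an illustration only: TM-align additionally searches for the superposition that maximizes the score, which is not reproduced here, and the function names and the short-chain guard are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Distance scale d0 (in Å) for a target chain of the given length."""
    if target_length < 19:
        # Guard chosen for this sketch: the formula gives a non-positive or
        # undefined d0 for very short chains.
        raise ValueError("d0 formula is not meaningful for very short chains")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    `distances` holds the distance in Å between each aligned residue pair;
    unaligned residues of the target contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A 60-residue target, matching the length of HTT17Q-EX1.
perfect = tm_score([0.0] * 60, 60)    # every residue aligned with zero deviation
half = tm_score([0.0] * 30, 60)       # only half of the residues aligned
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score cannot exceed 1, and pairs much farther apart than d0 contribute very little.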

Similarity measurement
Besides the TM-score, exact structure overlap (ESO) and exact structure overlap of poly-Qs (ESOP) were also used to measure the similarity of two structures. The word 'exact' here means that the aligned residues are within a certain distance threshold, 5 Å in this study, and that they are the same residue in the HTT17Q-EX1 sequence. For example, if a serine (SER) in the 16th position of the predicted structure of HTT17Q-EX1 is aligned, within the distance threshold, with the serine (SER) in the 16th position of the PDB experimental structure, then 16SER-16SER is an exact match. ESO and ESOP are derived from the Structure Overlap (SO), which is a standardized score to compare structure alignments and measure the local similarity of two structures [38]. The SO score is calculated as:

$$\mathrm{SO} = \frac{100 \times L(A)}{\min(L_m, L_e)} \tag{1}$$

where L(A) is the structure alignment length, and Lm and Le are the lengths of the predicted model and the experimental structure, respectively.
We modified Equation (1) to allow a stricter structure comparison, obtaining the ESO score:

$$\mathrm{ESO} = \frac{100 \times L(EA)}{\min(L_m, L_e)} \tag{2}$$

where L(EA) is the length of the exact match, and Lm and Le are the lengths of the predicted model and the PDB experimental structure, respectively.
The structure of the poly-Q region may play a more important role than other positions. In this study, the ESOP score is calculated to evaluate the structure similarity of the poly-Q regions. The ESOP is a special version of ESO, and it is calculated as:

$$\mathrm{ESOP} = \frac{100 \times L(EA_Q)}{\min(L_{Qm}, L_{Qe})} \tag{3}$$

where L(EAQ) is the length of the exact match of Qs, and LQm and LQe are the lengths of the poly-Q region in the predicted model and the PDB experimental structure, respectively.
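The exact-match counting described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes the score is a percentage normalized by the shorter of the two lengths, that the 5 Å threshold is inclusive, and that an alignment is available as (model position, experimental position, distance) triples. All names and the example alignment are hypothetical.

```python
POLYQ_POSITIONS = range(18, 35)  # poly-Q region, residues 18 to 34


def count_exact_matches(aligned_pairs, threshold=5.0, positions=None):
    """Count aligned pairs at the same sequence position within the threshold.

    Each pair is (model_position, experimental_position, distance_in_angstrom).
    When `positions` is given, only pairs at those positions are counted.
    """
    count = 0
    for model_pos, exp_pos, dist in aligned_pairs:
        if model_pos != exp_pos or dist > threshold:
            continue
        if positions is not None and model_pos not in positions:
            continue
        count += 1
    return count


def overlap_score(n_matches, len_model, len_exp):
    """Percentage overlap, normalized by the shorter length (assumed form)."""
    return 100.0 * n_matches / min(len_model, len_exp)


# Hypothetical alignment: one shifted pair and one pair beyond the threshold.
pairs = [(16, 16, 1.2), (17, 18, 0.8), (18, 18, 6.0), (19, 19, 2.0)]
n_exact = count_exact_matches(pairs)                                 # ESO numerator
n_exact_q = count_exact_matches(pairs, positions=POLYQ_POSITIONS)    # ESOP numerator
```

The same `overlap_score` function serves both scores: for ESO it takes the full model and experimental lengths, and for ESOP it takes the poly-Q lengths of the model and of the resolved experimental structure.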

Secondary structure calculation
The secondary structures of the predicted models and the PDB experimental structures were calculated using the DSSP algorithm, which standardizes secondary structure assignment [47]. DSSP assigns one of 8 conformational states: α-helix, β-bridge, strand, 3-helix, 5-helix, turn, bend, and random coil.
DSSP represents the secondary structure by one letter for each position. To get a better view of the results, WebLogo 3 [35] was used to plot the secondary structure logo at each position. The overall height of the stack indicates the secondary structure conservation at that position, and the height of the symbols within the stack indicates the relative frequency of each secondary structure type at that position.
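To make the logo description concrete, the sketch below computes the stack height and symbol heights for a single position from a column of secondary structure letters. It uses the uncorrected information content, log2(8) minus the Shannon entropy, for an assumed 8-letter alphabet; WebLogo itself may apply a small-sample correction that is not reproduced here, and the function name and example columns are hypothetical.

```python
from collections import Counter
from math import log2


def logo_column(states, alphabet_size=8):
    """Return (symbol_heights, stack_height) in bits for one position.

    `states` is the list of one-letter secondary structure codes observed at
    this position across all structures.
    """
    n = len(states)
    freqs = {s: c / n for s, c in Counter(states).items()}
    entropy = -sum(p * log2(p) for p in freqs.values())
    stack_height = log2(alphabet_size) - entropy
    # Each symbol takes a share of the stack proportional to its frequency.
    heights = {s: p * stack_height for s, p in freqs.items()}
    return heights, stack_height


# A fully conserved helix position and a position split between helix and turn.
conserved_heights, conserved_total = logo_column(["H", "H", "H", "H"])
mixed_heights, mixed_total = logo_column(["H", "H", "T", "T"])
```

A fully conserved position reaches the maximum of 3 bits for an 8-letter alphabet, while a position split evenly between two states loses one bit of height.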

3D structure visualization
The 3D structure and the 3D structure superposition were visualized in the UCSF Chimera software, a free program for molecular graphics and analysis [48].

Statistics
To describe the distribution of the parameters calculated here, the mean value ± standard deviation is listed for data with a normal distribution, whereas for data that do not follow the normal distribution, the 25% and 75% quantile values are listed.
The Student t test was applied to data with a normal distribution and the Wilcoxon rank test was performed on the other data sets to assess significance. The significance level was set at 0.05. All statistical analysis was done in the R environment, which is a free software environment for statistical computing and graphics [49].
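The reporting convention described above can be illustrated with a short sketch. It is written in Python rather than R purely for illustration; the `describe` function and the example values are hypothetical, and the "inclusive" quantile method is chosen here on the assumption that it corresponds to the default interpolation of R's `quantile` function.

```python
from statistics import mean, quantiles, stdev


def describe(values, normal):
    """Summarize a sample as mean and SD, or as 25% and 75% quantiles."""
    if normal:
        return ("mean_sd", mean(values), stdev(values))
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return ("quartiles", q1, q3)


# Hypothetical score samples, for illustration only.
normal_summary = describe([2, 4, 4, 4, 5, 5, 7, 9], normal=True)
skewed_summary = describe([1, 2, 3, 4, 5], normal=False)
```

Whether a sample is treated as normal would be decided beforehand by a normality check, which is outside the scope of this sketch.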