- Research article
- Open Access
Convergent evolution in structural elements of proteins investigated using cross profile analysis
© Tomii et al; licensee BioMed Central Ltd. 2012
- Received: 20 June 2011
- Accepted: 16 January 2012
- Published: 16 January 2012
Evolutionary relations of similar segments shared by different protein folds remain controversial, even though many examples of such segments have been found. To date, several methods, such as those based on structure comparisons, sequence-based classifications, and sequence-based profile-profile comparisons, have been applied to identify protein segments that possess local similarities in both sequence and structure across protein folds. However, no method reported to date captures more precise sequence-structure relations by combining structure-based profiles with sequence-based profiles derived from evolutionary information. Structure-based profiles, which are obtained from structural classifications of protein segments, are generally regarded as representing the amino acid preferences at each position of a specific segment conformation. They might reflect the nature of ancient short peptide ancestors.
This report describes the development and use of "Cross Profile Analysis" to compare sequence-based profiles and structure-based profiles based on amino acid occurrences at each position within a protein segment cluster. Using systematic cross profile analysis, we found structural clusters of 9-residue and 15-residue segments showing remarkably strong correlation with particular sequence profiles. These correlations reflect structural similarities among constituent segments of both sequence-based and structure-based profiles. We also report previously undetectable sequence-structure patterns that transcend protein family and fold boundaries, and present results of the conformational analysis of the deduced peptide of a segment cluster. These results suggest the existence of ancient short-peptide ancestors.
Cross profile analysis reveals the polyphyletic and convergent evolution of β-hairpin-like structures, which were verified both experimentally and computationally. The results presented here give us new insights into the evolution of short protein segments.
- Ancestral Sequence
- Sequence Profile
- Protein Segment
- Amino Acid Preference
- Turn Region
Abundant examples of similar segments (here, continuous structural fragments in native protein folds) appearing in different protein folds have been reported. Although some of those segments are believed to have originated from common ancestors, the evolutionary scenarios for many of them are not clear. In contrast to the monophyletic scenario for presently existing protein domains, Lupas et al. proposed the hypothesis of ancient short peptide ancestors. They found local sequence and structure similarities, such as P-loops, zinc finger motifs, and Asp boxes, in different protein folds based on the results of all-against-all structural comparisons of segments using their rigorous structure comparison method. They employed a structure comparison method because occurrences of such segments 'might not be expected to be meaningful from a sequence-only perspective'.
The profile method was originally developed by Gribskov et al. Since then, sequence profiles calculated from multiple alignments of protein families have been used to find distantly related protein sequences. Here, a profile is a table that lists amino acid preferences at each position of a given multiple sequence alignment. Subsequent work showed that including evolutionary information for both the query protein and the proteins in the database being searched improved the detection of related proteins. These profile-profile comparison methods, which are sequence-based, are fundamentally superior to the profile method both in their ability to identify related proteins and in alignment accuracy [3–5]. Friedberg and Godzik (2005) then constructed a segment dataset, called Fragnostic, by combining the scores of their profile-profile comparison method, FFAS03, with the Cα root mean square deviation (RMSD) of the structural alignment. They presented an alternative view of the protein structure universe in terms of the relations between interfold similarity and functional similarity of proteins via segments. They found functional commonalities among proteins with different folds that share similar segments, such as dimetal binding loops. Such segments are therefore shared by many different protein folds.
Profile-profile comparison methods have been developed and used for various purposes other than the original one. For instance, profile-profile comparison methods were applied in an attempt to establish evolutionary relations within protein superfolds. In this attempt, among three small β-barrel folds, intra-fold similarity scores calculated using profile-profile comparisons were used to identify functionally distinct sub-families. An amino acid sequence-order-independent profile-profile comparison method (SOIPPA) has been proposed and used for functional site comparison to find distant evolutionary relations by integrating local structural information. Some novel evolutionary relations across folds were detected automatically using SOIPPA. Recently, Remmert et al. proposed the possibility of divergent evolution of outer membrane β proteins from an ancestral ββ hairpin using their HMM-HMM comparison method. Using two atypical proteins as analogous reference structures, they argued that similarities of outer membrane β proteins are unlikely to be the result of sequence convergence.
However, no application of profile-profile comparison methods combines sequence-based profiles and structure-based profiles to capture more precise sequence-structure relations. Amino acid sequence patterns in proteins can be represented as profiles constructed using sequence and/or structural information. On one hand, comparison of sequence-based profiles based on evolutionary information is known to be highly effective for protein fold recognition, even when they are constructed without including explicit structural information, which indicates that they might harbor structural information. On the other hand, some amino acid substitution patterns, which reflect the physicochemical constraints of local conformations, are well known to correlate strongly with the protein structure at the local level. Profiles or position-specific amino acid propensities based on local structural classification have been used to study local sequence-structure relations for many years. Moreover, libraries of sequence patterns that correlate well with local structural elements have been constructed [13, 14]. Amino acid propensities were analyzed at each position of short protein segments within a structural cluster obtained by structural classification methods [15–18]. Position-specific amino acid propensities in protein segments with two consecutive secondary structure elements have also been investigated to support protein structure prediction. Pei and Grishin effectively combined evolutionary and structural information to improve local structure predictions.
Consequently, the aim of this study is to identify properties that are common to both profile types, and to find novel sequence-structure relations. To this end, we developed a method we call "Cross Profile Analysis" to compare structure-based profiles originating from the results of local structural classifications, with sequence-based profiles produced by PSI-BLAST using FORTE, our profile-profile comparison method [21, 22]. Using structure-based profiles derived from clusters of segment structures with 9-residue and 15-residue lengths as a starting point, we identified several structure-based profiles that correlate well with sequence-based profiles. These correlations indicate structural similarity between conformations of a segment cluster and the local structures corresponding to the segments of a protein family whose sequence-based profile exhibited strong correlation with a structure-based profile. This report describes previously undetectable sequence-structure patterns that transcend protein superfamily and fold boundaries, especially for segments that contain β-hairpin-like structures, shared by proteins with two distinct folds. Furthermore, through experimental measurements, we demonstrate that a deduced peptide corresponding to the segments, which has been shown to exhibit such sequence-structure correlation, is structurally stable in aqueous solution, suggesting the existence of ancient short peptide ancestors. We discuss the possibility of the convergent evolution of the protein short segments with patterns detected using our cross profile analysis.
Cross Profile Analysis
Table 1. Results of the cross profile analysis for 9-residue-long segments
Columns: Cluster ID (# of segments in the cluster); Amino acid preferences; # of hits in the FORTE library; SCOP ID of hits
Table 2. Results of the cross profile analysis for 15-residue-long segments
Columns: Cluster ID (# of segments in the cluster); Amino acid preferences; # of hits in the FORTE library; SCOP ID of hits
On average, Cα RMSDs between the medoid segments of structural clusters and the segments of hits (Z ≥ 8) in the FORTE library were 0.84 ± 0.89 Å for 9-residue-long segments and 1.94 ± 1.61 Å for 15-residue-long segments. Although some exceptions with large RMSDs, which might be false positives, exist, these results are clearly distinct from those for random matches of 9-residue-long and 15-residue-long segments reported by Du et al. They calculated RMSDs between randomly chosen fragments and reported their distributions. They found that the centers of the distributions for 9-residue-long and 15-residue-long segments were located at 3.5 Å and 5.0 Å, respectively. Their definitions of segments with respect to secondary structure content match the conformations of the segments examined here (see Additional file 1, Figures S1 and S2). These results clearly indicate the structural similarity between conformations of a segment cluster and the local structure of a protein family. Generally, significant correlation between profiles of two different types indicates not only similarity of amino acid substitution patterns but also structural similarity of the constituent segments of both sequence-based and structure-based profiles.
The 12 profiles derived from the structural clusters for 9-residue-long segments showed correlation with sequence profiles in seven different protein folds according to the SCOP classification. Half of them showed correlation with 18 sequence profiles of segments in proteins that possess an α-α superhelix fold (SCOP ID: a.118). In Table 1, the profile of cluster #181 was apparently similar to the profiles of clusters #184, #246, and #247. These were the 'adjacent-segment' effects described above. Similarly, the profile of cluster #140 was similar to that of cluster #313 in Table 1 (and also to that of #147 in Table 2). The profile derived from cluster #366 showed strong correlation with 14 sequence profiles of segments corresponding to Ca²⁺-coordinating loops in proteins of the EF-hand superfamily (SCOP ID: a.39.1). The 12 clusters of 15-residue-long segments show correlation with a more diverse set of proteins (Table 2) than the clusters of 9-residue-long segments do: correlation was observed in 11 different protein folds. However, most of the correlations above the threshold were observed between the sequence profiles of segments of the EF-hand superfamily and the profile derived from cluster #222, which clearly reflects the functional constraints on protein sequence evolution. Apparently, the profile of cluster #366 in Table 1 corresponds to part of the profile of cluster #222 in Table 2.
In principle, the method used for the structural classification of protein segments is expected to affect the structure-based profiles. However, our previous study demonstrated that a small change in parameters, such as the threshold variable for structural similarity Dth used for clustering, has little effect on the results. We observed that the shapes of the distribution of segment clusters were robust. For instance, we showed that the dependence of the clustering results on the threshold parameter is minimal in the range from Dth = 30°, the value used for this study, to 40° (see our previous study for more details).
Preserved sequence-structure patterns
In the cross profile analysis of the 15-residue-long segments, we identified preserved sequence-structure patterns that transcend protein superfamily or fold boundaries that were previously undetectable (cf. Table 2).
(i) 1p1lA:2-16, 1kr4A:7-21, and 1mwqA:58-72
(ii) 1jnrA:614-629 and 1kthA:16-31
In both 1jnrA and 1kthA, the sequence profiles of two consecutive 15-residue length segments show significant correlation (Z ≥ 8) with structure-based profiles of two clusters (Table 2). The N-terminal regions of 1jnrA:614-628 and 1kthA:16-30 showed correlation with cluster #235, whereas the C-terminal regions, 1jnrA:615-629 and 1kthA:17-31 showed correlation with cluster #159. The structure-based profiles reflect the results from the structural classifications of the protein segments. Therefore, we investigated the composition of the two clusters #235 and #159 to check whether segments similar to those of 1jnrA and 1kthA are included in them. Most of the segments in the two clusters mutually overlap. As expected, 61 out of the 84 segments in cluster #235 and 119 segments in cluster #159 are derived from adjacent positions in the same proteins. The clusters contain segments that mainly originate from all-β (ca. 40%) and α+β proteins (ca. 27%). However, it is unlikely that this suggests bias in the usage of the folds because the segments are derived from 58 folds (cluster #235) and 76 folds (cluster #159). Although the two proteins, 1g6x and 2knt, from the BPTI-like fold class (SCOP ID: g.8) are included in the clusters, no protein of the spectrin repeat-like fold class (SCOP ID: a.7) is incorporated. Consequently, at least for 1jnrA, no readily apparent evolutionary relation exists to explain the remarkable correlation between sequence-based and structure-based profiles. The segments of the two structural clusters are included in Additional file 2, Table S1.
Our classification results obtained using the SCOP 1.73 release (November 2007) show that there are 15 superfamilies with the spectrin repeat-like fold among the clusters. Of those, domain 1 of 1jnrA:503-643 contains the 1jnrA:614-629 segment belonging to the succinate dehydrogenase/fumarate reductase flavoprotein C-terminal domain superfamily. Of the 15 superfamilies, only three, succinate dehydrogenase/fumarate reductase flavoprotein C-terminal domain, ribosomal protein S20, and PhoU-like superfamilies, have an 'additional' β-sheet at the C-terminus portions. Compared to the β-sheet of 1jnr, the region corresponding to both the β-sheet at the C-terminus portion of ribosomal protein S20 and the PhoU-like superfamily is small. Moreover, according to SCOP, the region is assigned to other domains that belong to other folds, instead of to the spectrin repeat-like fold, as is true when other classification databases such as CATH and VAST  are used. According to the classification of both the CATH and SCOP database, the BPTI-like fold (or the factor Xa Inhibitor topology) consists of a single superfamily.
Sequence evolution of the segments in each family
We measured the 'direction' of the amino acid sequence evolution of the segments, including the FLVC-segment and BPTI-segment, as described above, in terms of their compatibility with the structure-based profiles. This compatibility might reflect the physicochemical constraints or preferences of segment conformations in clusters #235 and #159. We calculated the score S for a sequence against the structure-based profiles of clusters #235 and #159 (see eq. (2) in Methods), and postulated that a high score indicates high compatibility of the sequence with the profile. We compared the scores of existing and deduced ancestral sequences, and considered that the differences in the scores, ΔS (see eq. (3) in Methods), reflect the direction of sequence evolution. Here, a negative ΔS means that an existing sequence is less compatible than its ancestral sequence with the structure-based profile of the β-hairpin-like structure that we identified.
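The comparison of existing and ancestral sequences can be sketched in a few lines of Python. Because eqs. (2) and (3) are not reproduced in this text, the sketch assumes that S is the sum of per-position profile scores of the residues in a sequence, and that ΔS is the score of the existing sequence minus that of the ancestral one, which matches the sign convention stated above. The function names, the toy profile, and the `missing` default are illustrative assumptions, not the authors' implementation.

```python
def profile_score(sequence, profile, missing=0.0):
    """Assumed form of S: sum of the profile score of each residue at its position.

    `profile` is a list of dicts (one per position) mapping residue -> score.
    Residues absent from a column receive `missing` (an assumption).
    """
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column.get(aa, missing) for aa, column in zip(sequence, profile))


def delta_s(existing, ancestral, profile):
    """Assumed form of delta-S: S(existing) - S(ancestral)."""
    return profile_score(existing, profile) - profile_score(ancestral, profile)


# Toy 3-position profile with made-up scores (not taken from cluster #159 or #235).
toy_profile = [
    {"D": 2.0, "N": 1.0},
    {"P": 3.0},
    {"E": 1.5, "K": 0.5},
]

# A negative value means the existing sequence fits the profile less well
# than its deduced ancestor.
print(delta_s("NPK", "DPE", toy_profile))
```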
The results might be explainable using either of two evolutionary scenarios: divergent or convergent evolution. However, for the following reasons, we speculate that those segments originated from distinct ancestors in this case. First, we found similarities between the structure-based profiles and the sequence profiles of two distinct protein families rather than direct similarities between segments of two distinct families. Consequently, it is difficult to hypothesize that those segments originated from a common ancestor through an evolutionary mechanism that necessarily occurred before the divergence into two distinct families. Although sequences of the Pfam protein family ID: PF02910 are distributed mainly in bacteria, most sequences in the Pfam protein family ID: PF00014 are distributed in eukaryotes. In addition, the functions and localization of two protein families are completely different. Protein sequences of PF02910 are parts of reductases, dehydrogenases, and oxidases in a cell. In contrast, proteins of PF00014 are secreted proteins which function as protease inhibitors or toxins. Furthermore, for example, in humans, 1kthA (= CO6A3_HUMAN/3111-3163) is encoded in an exon, i.e. no exon boundaries exist in its portion. There are no introns in the gene that encodes 1jnrA (= O28603_ARCFU/519-641), which is a portion of a large archaeal protein. Finally, it is difficult to imagine that present proteins of PF00014 were derived originally from both the turn region of β-hairpin-like structures and the rest because these proteins are too small to be stable and functional without this region. Taken together, the similarity between segments presented here does not necessarily indicate common evolutionary ancestry. It is apparently a reflection of physicochemical constraints of local conformations, i.e., it seems probable that convergent evolution might have occurred for this case. 
The evolutionary directions analyzed in Figure 8 also support the scenario of convergent evolution.
Implications for short autonomous elements
We have identified several structural clusters with structure-based profiles that show remarkably strong correlation with sequence-based profiles. We have observed that most segments are structurally similar, and are similar also to other segments in the cluster(s). For example, 15-residue-long segments of 1jnrA:615-629 in the FLVC-segment and 1kthA:17-31 in the BPTI-segment are similar to one another. The two segments are also similar to segments in cluster #159, whose profile indicates significant correlation with their sequence-based profiles. Do segments fold into particular structures irrespective of their context? To ascertain this, we synthesized 15-residue peptides with the deduced sequence of cluster #159 (TIIMWYYDPETGEWW), which has the highest score, i.e. the most compatible sequence with the structure-based profile of cluster #159, and conducted several experiments to elucidate its 3D-structure in aqueous solution.
Such speculation can be inferred not only from our results but also from other experimental studies. The peptide described above is not the first short autonomous element derived from native proteins that exhibits high foldability and stability. Several short fragments, such as the C-peptide of ribonuclease A, a C-terminal helix of cytochrome c, the G-peptide of protein G [39, 40], and an N-terminal fragment of ubiquitin, form native-like conformations by themselves, although most isolated fragments cannot retain their original conformation without interactions with the rest of the protein. In addition, several pioneering works have succeeded in creating artificial assemblies that consist of a combination of short fragments as structural building blocks [42–48].
In 9-residue-long and 15-residue-long segments, we identified several segment clusters with structure-based profiles that show significant correlations (Z ≥ 8) with sequence-based profiles. We found significant correlation between a sequence-based profile and a structure-based profile, indicating structural similarity between the local structure of a protein family and representatives of a segment cluster. We found exceptionally strong correlation between amino acid preferences and local structures in all except one of the 42 9-residue-long segments (L = 9) and in 47 of the 50 15-residue-long segments (L = 15). These results suggest strong correlation between sequence substitution patterns and structures for some elements in proteins, in agreement with earlier results [13, 49]. Results also suggest that our method does not require calculation of the structural similarity between two segments to identify similar segments in both sequence and structure, in contrast to previous studies [1, 7].
Although many examples of significant correlations between sequence profiles and structural profiles of protein segments are apparently related to divergent evolution, several sequence-structure patterns that transcend protein family, superfamily, and even fold boundaries were identified. In those cases, the patterns found in the ferredoxin-like fold correspond to structurally equivalent segments within the fold. This example suggests the duplication of ancestral segments.
Through cross profile analysis, this report elucidates preserved sequence-structure patterns, which designate β-hairpin-like structures shared by different protein folds. Based on the evolutionary analysis of two distinct proteins, which used the sequence and structural information of consecutive segments, these segments might be examples of convergent evolution. These results present a clear contrast to those of an earlier study, which found exclusively distant evolutionary relations using an order-independent profile-profile method. Most examples reported in the present study are apparently not under functional constraints, except for the EF-hand motif. In general, sequence-function correlations such as the catalytic triads and the EF-hand motif are often prominent and are easier to detect than sequence-structure correlations. Our cross profile analysis method is able to detect subtle sequence-structure correlations.
Irrespective of residue environments in proteins, the segments whose sequence-based profiles show correlation with the structure-based profiles of specific clusters (#159 and #235) have well-preserved structures. Therefore, we examined the conformational properties, in aqueous solution, of a consensus peptide sequence from a cluster with these properties. CD spectral analysis of the peptide solution strongly suggests that the peptide has the property of a short autonomous element that exhibits high foldability and stability. This observation suggests that segments of the clusters that show good correlations with sequence-based profiles are autonomous elements, which are also local sequence/structure motifs, such as those in the I-sites library. Other reports have described the potential use of local sequence information to improve protein structure prediction. This report describes a new water-soluble β-hairpin-like peptide, which might support the hypothesis of polyphyletic origins of presently existing protein domains. Lupas et al. discussed the possibility of the evolution of proteins from peptides and argued that one candidate for such ancient peptides, or fundamental elements of proteins, is a β-hairpin-like peptide. The results presented here provide new insights into the evolution of short protein segments. Moreover, they are expected to be useful in improving our understanding of protein folding and evolutionary mechanisms.
Construction of profile libraries
Preparation of structure-based profiles
The local structures of 9-residue-long and 15-residue-long protein segments were classified to obtain structure-based profiles. A non-redundant dataset of protein structures was used for classification. Representative proteins were obtained from the PDB select dataset (Sep. 25, 2001, version) , which contains 1,614 chains (resolution < 3.0 Å; R-factor < 0.3; sequence identity < 25%). Representative proteins were divided into short segments using a sliding L-residue window. Segments can be mutually overlapping.
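As a minimal illustration of the sliding L-residue window, the following Python sketch enumerates all overlapping segments of a chain. The function name and the example chain are hypothetical; in the study the windows run over residues of the representative PDB chains, whereas here a plain string of one-letter codes stands in for a chain.

```python
def sliding_segments(residues, length):
    """Return (start, segment) for every window of `length` consecutive residues.

    `start` is a 0-based index. Chains shorter than `length` yield no segments.
    """
    return [
        (start, residues[start:start + length])
        for start in range(len(residues) - length + 1)
    ]


# Hypothetical 17-residue chain, used only to show how windows overlap.
chain = "TIIMWYYDPETGEWWAK"
for L in (9, 15):
    segments = sliding_segments(chain, L)
    print(L, len(segments), segments[0][1])
```

A chain of N residues yields N - L + 1 segments, and consecutive segments share L - 1 residues, which is why segments from adjacent positions of the same protein often fall into the same or neighboring clusters.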
Local structures of segments consisting of L (= 9, 15) consecutive amino acids were classified using a single-pass clustering method as follows: i) Choose a segment and declare it to be in a cluster of size one. ii) Choose the next segment and compute the distances from this segment to the centroids of all clusters. iii) Add the segment to the "nearest" cluster. If no cluster is sufficiently close (within a certain threshold), declare the segment to be in a new cluster. iv) Go back to (ii) and repeat the process until all segments are classified. All parameters characterizing the distribution of the local structures were determined directly by assigning an arbitrary value to a threshold variable for structural similarity, Dth, which is defined based on the backbone dihedral angles. In this study, clustering results were obtained by assigning 30° to Dth. Detailed explanations of the clustering method can be found in a related paper.
In the equation defining the structure-based profile, p_i(j) represents the probability of observing amino acid j at position i in the segments of a cluster, and p(j) signifies the composition of amino acid j. Although several methods exist to convert a multiple alignment into a score, we employed a simple amino acid propensity that was calculated with neither weights nor pseudo-counts for this study. This propensity corresponds to the ratio of the frequency count of a certain residue type appearing at a particular position to the global frequency count of the amino acid residues. The segments and information on amino acid preferences in each structural class were classified using ProSeg, a database of local structures of protein segments (http://riodb.ibase.aist.go.jp/proseg/index.html).
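Steps (i) to (iv) can be sketched as follows. The exact distance definition is given only in the related paper, so this sketch makes two labeled assumptions: the distance between a segment and a centroid is the largest angular deviation over their backbone dihedral angles, and a centroid is the per-angle circular mean of the cluster members. All function names are illustrative.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(seg_a, seg_b):
    """ASSUMED metric: largest per-angle deviation between two dihedral vectors."""
    return max(angle_diff(a, b) for a, b in zip(seg_a, seg_b))


def circular_mean(angles):
    """Mean direction of a list of angles in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def centroid(members):
    """ASSUMED centroid: circular mean of each dihedral angle over the members."""
    return [circular_mean(column) for column in zip(*members)]


def single_pass_cluster(segments, d_th=30.0):
    """Assign each segment to the nearest cluster within d_th, else open a new one."""
    clusters = []  # each cluster: {"members": [...], "centroid": [...]}
    for seg in segments:
        best, best_d = None, None
        for cluster in clusters:
            d = distance(seg, cluster["centroid"])
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= d_th:
            best["members"].append(seg)
            best["centroid"] = centroid(best["members"])
        else:
            clusters.append({"members": [seg], "centroid": list(seg)})
    return clusters


# Toy dihedral vectors (phi, psi, phi, psi): two helix-like and two strand-like.
toy = [
    [-60.0, -45.0, -62.0, -41.0],
    [-120.0, 130.0, -115.0, 135.0],
    [-65.0, -40.0, -58.0, -47.0],
    [-125.0, 128.0, -118.0, 140.0],
]
print([len(c["members"]) for c in single_pass_cluster(toy, d_th=30.0)])
```

Because the method is single-pass, the result depends on the order in which segments are presented; a segment is never reassigned after a centroid moves.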
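The propensity described here can be sketched in Python as the ratio of the positional frequency to the background composition, with no weights and no pseudo-counts. Whether the original equation additionally applies a logarithm or scaling is not recoverable from this text, so the sketch returns the plain ratio. For the demonstration only, the background composition is computed from the toy segments themselves, which is an assumption; the function names are illustrative.

```python
from collections import Counter


def composition(sequences):
    """Overall amino acid composition p(j) of a collection of sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def structure_based_profile(segments, background):
    """Per-position propensity p_i(j) / p(j) for equal-length cluster segments.

    No sequence weighting and no pseudo-counts are applied, so residues never
    observed at a position are simply absent from that column.
    """
    n_segments = len(segments)
    profile = []
    for position in range(len(segments[0])):
        counts = Counter(seg[position] for seg in segments)
        profile.append(
            {aa: (n / n_segments) / background[aa] for aa, n in counts.items()}
        )
    return profile


# Toy cluster of 2-residue "segments", used only to make the arithmetic visible.
cluster = ["AG", "AG", "GA", "AA"]
background = composition(cluster)
for position, column in enumerate(structure_based_profile(cluster, background)):
    print(position, column)
```

A value above 1 indicates that a residue is enriched at that position relative to its overall composition, and a value below 1 indicates depletion.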
Preparation of sequence profiles
The FORTE system (see below) holds the sequence profile library of representative proteins whose structures are known. The amino acid sequences of those proteins are derived mainly from the ASTRAL 40% identity list according to the SCOP classification. Representative sequences that are not in SCOP were selected from the PDB entries. The FORTE library includes 7,419 sequence-based profiles.
To generate the sequence PSSMs of the library, PSI-BLAST iterations with the nonredundant (NR) amino acid sequence database from NCBI were performed up to 20 times. The NR NCBI protein database was clustered using a 95% sequence identity threshold and the CD-HIT program to reduce computational time. The 95% representative sequences of the NR NCBI protein database were then masked using the pfilt program in the PSIPRED package. When we performed PSI-BLAST iterations, we set 5 × 10⁻⁴ as the e-value cutoff for inclusion in the next pass. We applied the makemat program of the IMPALA package to prepare the PSSMs from the PSI-BLAST outputs.
We have developed our own profile-profile comparison method, the Fold Recognition Technique (FORTE), which uses large amounts of sequence information, optimized gap penalties, and correlation coefficients as the scoring scheme to measure the similarity between two profile columns. Using FORTE, profile-profile comparisons were performed. To build an optimal alignment between two compared profiles, we used the global-local algorithm, which is based on the global alignment algorithm with no penalty for the terminal gaps. The significance of each alignment score is estimated by calculating Z-scores using a simple log-length correction. The FORTE server is available at http://www.cbrc.jp/forte/. Successful examples of its application can be found in the literature [11, 22, 60]. For the present study, we used position-specific matrices derived from local structural classifications as query PSSMs to find significant correlation with sequence profiles (Figure 1).
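The two ingredients named above, a correlation coefficient as the column score and a global alignment without terminal gap penalties, can be illustrated with a small Python sketch. This is not the FORTE implementation: the linear gap penalty, its value, and the function names are assumptions, FORTE's optimized gap penalties are not reproduced, and the Z-score estimation with log-length correction is omitted.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two profile columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0.0 or var_v == 0.0:
        return 0.0  # a constant column carries no preference information
    return cov / math.sqrt(var_u * var_v)


def global_local_score(query, subject, gap=-1.0):
    """Best alignment score with free terminal gaps (ASSUMED linear gap penalty).

    `query` and `subject` are lists of profile columns. The first row and
    column start at zero, and the best score is read from the last row or
    last column, so leading and trailing gaps cost nothing.
    """
    n, m = len(query), len(subject)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                h[i - 1][j - 1] + pearson(query[i - 1], subject[j - 1]),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
    return max(max(h[n]), max(row[m] for row in h))


# Toy 3-letter alphabet columns; real PSSM columns would have 20 entries.
a, b, x = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
print(global_local_score([a, b], [x, a, b, x]))
```

In the cross profile setting, the query would be a short structure-based profile (9 or 15 columns) and the subject a full-length sequence-based PSSM, so free terminal gaps let the short query settle anywhere along the longer profile.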
Score calculation of ancestral and existing sequences for a profile
Construction of ancestral sequences
To obtain the ancestral sequences of the two Pfam protein families, PF02910 and PF00014, we used the set of 40% representative sequences clustered by the CD-HIT program with 'full' members of the Pfam families (3,109 PF02910 sequences and 2,143 PF00014 sequences), and by adding 1jnrA (= O28603_ARCFU/519-641) to the 40% representative PF02910 sequences and 1kthA (= CO6A3_HUMAN/3111-3163) to the 40% representative PF00014 sequences. The root sequences were generated by ANCESCON  with the "Alignment-Based rate factor" method based on the Pfam alignments of selected sequences (209 sequences from PF02910 and 236 sequences from PF00014) described above. For the PF02910 family, we regarded the next root sequence (see Figure 7) as an ancestral sequence because the deduced root sequence lacked two amino acids in the segment that corresponds to the FLVC-segment. One branch comprising 22 sequences that lack most amino acids in the region of interest was excluded from the following calculation.
Calculation of scores for a structure-based profile
In this calculation, of the 187 representative PF02910 sequences, we excluded 43 sequences that have no amino acids in the segment that corresponds to the FLVC-segment.
The synthetic peptide (TIIMWYYDPETGEWW) was purchased from Biosynthesis Inc. (Lewisville, Texas, USA). The identity and purity of the peptide were confirmed using mass spectrometry with a MALDI-TOF MS instrument (Voyager; Applied Biosystems) and using reversed-phase chromatography with an AKTA purifier (GE Healthcare) and a C18 column. Both the N-terminus and the C-terminus of the peptide were in free form (not protected).
Peptide conformation analysis
Circular dichroism (CD) spectra were recorded on a J-805 spectropolarimeter. The synthetic peptide was dissolved at 0.26 mM in 70 mM sodium phosphate buffer (pH 8.0). Spectra were measured at several temperatures and represented in units of molecular ellipticity per mole of residue (MRE). Thermal denaturation of the peptide was almost reversible (ca. 100%), as judged by recovery of the spectra upon cooling.
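For readers unfamiliar with the unit, the conversion from an observed ellipticity to mean residue ellipticity can be sketched as below, using the standard relation [θ] = θ_obs / (10 · c · l · n), with θ_obs in millidegrees, c in mol/L, l in cm and n the number of residues, which gives deg cm² dmol⁻¹. The cuvette path length and the observed value in the example are hypothetical because they are not stated here, and some laboratories divide by the number of peptide bonds (n - 1) rather than by n.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to MRE in deg cm^2 dmol^-1.

    Uses [theta] = theta_obs / (10 * c * l * n) with c in mol/L and l in cm.
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Peptide concentration (0.26 mM) and length (15 residues) are from the text;
# the 0.1 cm path length and the -10 mdeg reading are HYPOTHETICAL values.
mre = mean_residue_ellipticity(
    theta_mdeg=-10.0, conc_molar=0.26e-3, path_cm=0.1, n_residues=15
)
print(round(mre, 1))
```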
We thank Ayako Ooishi for her assistance in CD measurements, and Raymond Wan for proofreading the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.