- Research article
- Open Access
A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins
BMC Bioinformatics volume 9, Article number: 274 (2008)
It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete.
In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some roles in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form α-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities.
We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in the protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or protein domain.
Symmetry of shapes is something that is found everywhere in nature. Human beings have always been attracted to symmetrical properties of natural phenomena, and their interest is reflected in art.
Creation of symmetrical sentences (i.e. "palindromes") goes back to at least 20 centuries ago. The Sator Square  contains probably the oldest known palindrome, which shows the Latin sentence "SATOR AREPO TENET OPERA ROTAS". Note that the word "tenet" is a palindrome itself. Since then, many other famous palindromic sentences have been constructed in different languages.
With the progress of molecular biology in the 20th century, a new level of symmetry was discovered in nature. From the study of restriction endonucleases in the 60's and the early 70's [2–5], it became clear that a certain type of palindrome, i.e. reverse palindrome, occur in DNA sequences. Restriction enzymes recognition sites which exist in double-stranded and not single stranded DNA, are usually "palindromic". For example:
G A A T T C
C T T A A G
is an example of a palindromic sequence recognized by restriction enzyme Eco RI. They are usually referred to as "reverse palindromes", because the sequence and polarity of both strands of the DNA molecule define their palindromic nature. Several studies suggest that reverse palindromes (or for simplicity, "palindromes") are statistically under-represented in some genomes [6–10], presumably because of the existence of restriction endonucleases in the host cells. Palindromic sequences are known to have roles in DNA replication  and RNA transcription .
In proteins, palindromes appear in a polypeptide chain. For example, the hypothetical protein sequence:
PQR S RQP
is a palindromic sequence. PQR and RQP will be referred to as the "sides" of this palindrome, while S will be called the "linker" (see Methods).
Since the original suggestion of the existence of important palindromic sequences in proteins [13–15], or simply "palindromic peptides", relatively little effort has been made to find the significance of such sequences. Inverse sequence similarity of proteins is not an exception by any means [16, 17], and some studies suggest that palindromes may appear in protein sequences more frequently than is expected by chance [18, 19]. Many have tried to find a relationship between palindromic sequences and protein structure. It has been suggested, directly or indirectly, that palindromic sequences are important for the structure and/or function of several classes of proteins, including DNA binding proteins [15, 18, 20], the Rhodopsin family and ion channels , prions [21, 22], metal binding proteins  and receptors . Some synthetic proteins which have structural characteristics as native proteins, as in the case of collagen protein model , can be added to this list.
In this work, we try to address the question of why palindromes are so frequent in proteins and to see if they have specific functions. In addition, we try to find a possible relationship between the symmetry of the sequence and the structure.
Results and Discussion
Linguistic complexity of palindromes
Ohno,  in one of his short papers on the importance of palindromic sequences in proteins, suggested that H1 histone is rich with palindromes only because of the high frequency of alanine and lysine (48% of all residues) in its sequence. In an effort to pursue this proposal, we try to test whether palindromes are a result of a biased amino acid usage, or in other words, a result of "low complexity". We use linguistic complexity (LC) as a measure of complexity of sequence. LC takes values between 0 and 1. The lower the LC value, the lower the complexity of the sequence. We compare the distribution of LC values in real palindromes and randomly generated palindromes, as explained in Materials and Methods.
Table 1 summarizes the results of this comparison. The Mann-Whitney test was used to assess the significance of differences observed between medians of distributions. The results clearly suggest that the linguistic complexity of real protein palindromes is significantly lower than what is observed in randomly constructed palindromes.
Palindromic peptides and their probable functional roles
Palindromic sequences are known to be present in a variety of proteins [14, 18], and different functions have been proposed to be associated with them. We tried to find roles of palindromic sequences in a systematic way, by comparing palindromes with conserved sequences recorded in the Blocks database [26, 27].
From our protein dataset of 1094 proteins, only 373 contained at least one reported Block. From these proteins, 54 Blocks overlapped with palindromic sequences in the corresponding proteins. These Blocks are listed in a file submitted with this article (see Additional file 1). It was interesting to find that a variety of functions were possibly associated with some palindromic sequences.
Figure 1 shows examples of Blocks that contain palindromic protein segments. Conserved palindromes in Figures 1A and 1B are presumably the result of low sequence complexity. As mentioned before, such sequences are prone to produce palindromes. The Block in Figure 1C clearly includes a palindromic consensus. This Block might have evolved from a palindromic ancestor with serine protease activity. Finally, there are palindromes in conserved Blocks like that in Figure 1D, which seem to be completely accidental.
Is the symmetry of palindromic sequences reflected in their structures?
Can palindromic protein sequences help in formation of structurally symmetrical folds? For answering this question one should test whether symmetry of palindromic peptides is reflected in their structure.
While the 3D structure of reverse palindromes in double-stranded DNA is symmetrical, there is much debate about the structural similarity of protein sequences which have reverse similarity. One might expect that reversing the sequence would result in folds that are mirror-images of the original fold . However, there exists theoretical and experimental evidences that sequence reversing results in the same rather than the mirror folding, presumably due to the fact that both native and reverse proteins have the same amino acid compositions and/or similar hydrophobic-hydrophilic patterns [23, 29, 30]. Evidence suggesting reverse peptide sequences result in different structures has also been presented in the literature. From analysis of Retro-inverso peptides (reversed peptides consisting of D instead of L amino acids), it became clear that, with few exceptions [31, 32], these peptides behave very similarly and are even recognized by the same antibodies. Contrary to these, reversed peptides generally behave differently [33–38]. Moreover, many research groups have reported that reversing a sequence can change its fold [39, 40], or even extinguish its folding ability . In general, the sequence similarity of reverse sequences may not imply a significant structural similarity [17, 42].
We studied the structural characteristics of the two sides of palindromic sequences. We considered very short peptides with perfect reverse-similarity in sequence. This condition assured that differences in structure cannot be due to differences in sequences. Since we did not allow long linker sequences between the two sides of the palindromes, peptide segments coded by the two sides are close to each other in space and therefore, are expected to form their fold in a similar environment.
If both sides of a palindrome appear in the same secondary structure, they will be considered structurally similar. Therefore, we grouped our palindromes into three classes: palindromes whose both sides have α-helical structure: "all-alpha", palindromes whose both sides have β-sheet structure: "all-beta", and other palindromes: "others".
Among the 980 palindromic sequences in our dataset, 120 (12.2%) were "all-alpha", 7 (0.7%) were "all-beta", and the remaining 853 (87.1%) were classified as "others". Among the 489720 randomly sequences occurring in the proteins, 16449 (3.4%) were "all-alpha", 2973 (0.6%) were "all-beta", and the remaining 470248 (96%) fall into "others" class. The results suggest that palindromes have significantly greater tendency to appear in α-helices compared to random sequences. There seems to be only a weak preference for palindromic sequences to appear in strands.
It is generally accepted that sequences of natural proteins are far from being random. It has been shown that, unless amino acid composition is restrained  or a binary patterning of polar and non-polar amino acids is defined [44, 45], proteins with random sequences rarely form secondary structures . Evidently, decreased amino acid composition complexity or binary patterning of polar and non-polar amino acids increases the chance for palindrome formation. Furthermore, it has been shown that proteins formed from repeats of non-natural peptides and palindromic peptides can form secondary structures in solution . Our results suggest that, even compared to (randomly-selected) natural protein fragments, palindromes have a greater tendency to form at least α-helices, which might be related to their frequent appearance in proteins. This finding might be of great importance for designing de novo secondary structures. Certainly, all palindromes do not form helices. It might be interesting to investigate other factors that alter the tendency of palindromic peptides to form regular secondary structures like helices.
We also compared the structural similarity (RMSD) between the two palindrome sides to the corresponding value in a set of randomly selected protein fragments. Since a few atoms in the PDB structures of randomly selected fragments were often missing, we additionally compared the Cα trace of those fragments. For the palindromic sequences, we compared the structures of backbones of the two sides of a palindrome, and also the structure of the backbone of the left side with the structure of the backbone of mirror image of the right side. In each case, an RMSD value was calculated to assess structural similarity. In order to see whether this alignment is significantly "good", we compare the RMSD values to the same values computed for randomly selected protein fragments of the same length. A good structural similarity of the two sides must result in smaller RMSD values for the palindrome sides as compared to randomly selected fragments.
Table 2 summarizes the results of the comparisons. Only a small number of palindromes showed significantly smaller RMSD values and the average RMSD of palindromes is almost in the middle of the RMSD values for random fragments. This implies that the two sides of a palindrome do not in general show an "exceptional" structural similarity. In other words, reverse similarity at the sequence level in proteins is not necessarily reflected at the level of structure.
We also focused on the short conserved palindrome shown in Figure 1C. The values of P for none of the four comparisons for this palindrome were less than 0.66. Furthermore, the two sides of this conserved palindrome have different secondary structures. This example confirms that the reverse similarity of the protein sequence is probably not reflected in the structure. Altogether, one may conclude that palindromic peptides can hardly help in forming symmetrical structures in proteins and this certainly is not a reason for their prevalence in proteins.
We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in the protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures.
It is not unusual to find palindromes within functionally conserved protein Blocks. However, the conserved Blocks have very different functions, which suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or domain. In addition, many of the conserved patterns are not "symmetrical", which confirms that Blocks and protein palindromes overlap accidentally.
We obtained sequences for all the proteins in the Protein Data Bank, PDB  in FASTA format (30 December 2006). This dataset contained 38224 proteins. Using the PISCES culling server [49, 50], the sequence dataset was filtered to obtain sequences with mutual similarity of less than 30%, with structure resolution <2 Å. The final dataset contained 1094 proteins. The structures of these proteins were obtained from PDB.
We define palindrome to be any sequence as XYXR, in which X, Y and XRare strings of the 20 standard amino acids, and XRis the reverse of string X. In this palindrome, X and XRwill be referred to as the palindrome "sides", while Y is the "linker". Length of a sequence S will be shown by |S|.
In this study we have considered those palindromes for which |X| ≥ |Y| ≥ 0 and |X| = |XR| ≥ 3. This means that there might be no linker sequence between the sides of a palindrome. In addition, we have assumed that in the standard single-letter representation of amino acids, D = E and K = R. This assumption has helped us to obtain more palindromes so as to be able to perform statistical tests.
In our palindrome dataset, many palindromes are substrings of other palindromes. For example, ABCDDCBA is a palindrome with the side length of four and linker length of zero. One may also consider it as a palindrome of side length three and linker length of two. However, in this study only the palindrome with the maximum side length was included in our palindrome dataset. We also removed 6-His tag sequences from our dataset. These sequences are added artificially to the N-terminal of recombinant proteins to facilitate their purification and are not present in native proteins.
Analysis of the complexity of palindromic sequences
For each palindromic sequence, we calculated its corresponding "linguistic complexity", LC , which is defined as the ratio of the number of distinct substrings present in the string of interest to the maximum possible number of substrings for a string of the same length over the same alphabets.
If palindromes principally appear as low-complexity protein segments, the complexity scores of these sequences will be significantly smaller than that of random palindromic sequences. Therefore, for a palindrome family such as XYXR, with certain values for |X| and |Y|, we constructed 10000 random palindromes. For each random palindrome, two random sequences with lengths |X| and |Y| were constructed, based on the average frequencies of the twenty amino acids in our dataset. The sequences were joined to create X+Y+XR. Finally, the LC distribution of real palindromes was compared to the LC distribution of randomly generated palindromes.
Finding overlaps between palindromes and functional sites
To investigate possible overlaps between conserved Blocks [26, 27] and palindromic sequences, we first tried to identify known Blocks of proteins in our dataset. If Blocks were found, we then determined whether the palindromes in the protein fall within the Blocks (or alternatively, the Blocks fall within any of the palindromes).
In order to visualize Blocks and get an overview of their conservation, WebLogo  was used to construct graphical representations of the patterns.
Analysis of three dimensional structures of palindromic peptides
It is rational to assume that protein segments with the same secondary structures may show greater structural similarity, regardless of their sequence similarity. Therefore, for the analysis of 3D structure of palindromic sequences, we first determined the secondary structure of palindromic segments using DSSP software . Then, based on secondary structure data, we categorized palindromes into "all-alpha" (where the whole structure was in α-helix), "all-beta" (where the whole structure was in β-sheet), and "others". Finally, we compared the structural coordinates of each palindromic peptide with a set of randomly selected peptides in the same category.
Some reports suggest that reversing the sequence will result in the same protein fold [23, 29, 30]. To test this, we compared the conformation of backbone atoms in the two palindrome sides. It has also been suggested that reversing the sequence can result in a structure which is the mirror image of the original structure . Therefore, we additionally compared the backbone of one palindrome side with the mirror image of the other side. Both of these comparisons were performed for all backbone atoms and for their Cα traces.
For each comparison an RMSD value was calculated. In order to see whether there was a significantly small value, from each protein in the dataset we chose 10 random fragments of the same length (i.e. with the length of 2|X|+|Y|) and considered the first |X| amino acids as the left side of the "pseudo-palindrome" and the last |X| amino acids as the right side, and then the structural comparison was performed. If the RMSD for structural alignment of the two palindrome sides is small (i.e. the two sides were structurally similar), then only a small fraction, say P, of randomly selected fragments will show a smaller value.
We used Mann-Whitney test for testing the significance of differences between medians of distributions. This test is useful when the two distributions are skewed and cannot be approximated by a normal distribution .
Root mean square deviation
Protein Data Bank.
Griffiths JG: 'Arepo' in the Magic 'Sator' Square. The Classical Rev 1971, 21: 6–8.
Danna K, Nathans D: Specific cleavage of simian virus 40 DNA by restriction endonuclease of Hemophilus influenzae. Proc Natl Acad Sci USA 1971, 68: 2913–2917. 10.1073/pnas.68.12.2913
Ford E, Boyer HW: Degradation of enteric bacterial deoxyribonucleic acid by the Escherichia coli B restriction endonuclease. J Bacteriol 1970, 104: 594–595.
Roulland-Dussoix D, Boyer HW: The Escherichia coli B restriction endonuclease. Biochim Biophys Acta 1969, 195: 219–229.
Yuan R, Meselson M: A specific complex between a restriction endonuclease and its DNA substrate. Proc Natl Acad Sci USA 1970, 65: 357–362. 10.1073/pnas.65.2.357
Burge C, Campbell AM, Karlin S: Over- and under-representation of short oligonucleotides in DNA sequences. Proc Natl Acad Sci USA 1992 , 89: 1358–1362. 10.1073/pnas.89.4.1358
Fuglsang A: Distribution of potential type II restriction sites (palindromes) in prokaryotes. Biochem Biophys Res Commun 2003, 310: 280–285. 10.1016/j.bbrc.2003.09.014
Fuglsang A: The relationship between palindrome avoidance and intragenic codon usage variations: a Monte Carlo study. Biochem Biophys Res Commun 2004, 316: 755–762. 10.1016/j.bbrc.2004.02.117
Gelfand MS, Koonin EV: Avoidance of palindromic words in bacterial and archaeal genomes: a close connection with restriction enzymes. Nucleic Acids Res 1997, 25: 2430–2439. 10.1093/nar/25.12.2430
Lisnic B, Svetec IK, Saric H, Nikolic I, Zgaga Z: Palindrome content of the yeast Saccharomyces cerevisiae genome. Curr Genetics 2005, 47: 289–297. 10.1007/s00294-005-0573-5
Willwand K, Mumtsidu E, Kuntz-Simon G, Rommelaere J: Initiation of DNA replication at palindromic telomeres is mediated by a duplex-to-hairpin transition induced by the minute virus of mice nonstructural protein NS1. J Biol Chem 1998, 273: 1165–1174. 10.1074/jbc.273.2.1165
Chu WM, Ballard RE, Schmid CW: Palindromic sequences preceding the terminator increase polymerase III template activity. Nucleic Acids Res 1997, 25: 2077–2082. 10.1093/nar/25.11.2077
Ohno S: Intrinsic evolution of proteins. The role of peptidic palindromes. Rivista di Biologia - Biology Forum 1990, 83: 287–291.
Ohno S: Of palindromes and peptides. Human Genetics 1992, 90: 342–345. 10.1007/BF00220455
Ohno S: A song in praise of peptide palindromes. Leukemia 1993, 7: S157-S159.
Nielsen ML, Savitski MM, Zubarev RA: Improving protein identification using complementary fragmentation techniques in Fourier transform mass spectrometry. Mol Cell Proteomics 2005, 4: 835–845. 10.1074/mcp.T400022-MCP200
Preißner R, Goede A, Michalski E, Frömmel C: Inverse sequence similarity in proteins and its relation to the three-dimensional fold. FEBS Lett 1997, 414: 425–429. 10.1016/S0014-5793(97)00907-1
Giel-Pietraszuk M, Hoffmann M, Dolecka S, Rychlewski J, Barciszewski J: Palindromes in proteins. J Protein Chem 2003, 22: 109–113. 10.1023/A:1023454111924
Hoffmann M, Rychlewski J: Searching for palindromic sequences in primary structure of proteins. Comput Methods Sci Tech 1999, 5: 21–24.
Suzuki M: DNA-bridging by a palindromic α-Helix. Proc Natl Acad Sci USA 1992, 89: 8726–8730. 10.1073/pnas.89.18.8726
Kazim AL: Identification of putative internalization signals in prion proteins. FEBS Lett 1993, 331: 1–3. 10.1016/0014-5793(93)80285-3
Alderfer JL, Kazim AL, Sulkowski E, Tabaczynski W, Tomasi TB: Aromatic palindrome peptide of prion proteins - A conformational study. FASEB J 1993, 7: A1283.
Pan PK, Zheng ZF, Lyu PC, Huang PC: Why reversing the sequence of the a domain of human metallothionein-2 does not change its metal-binding and folding characteristics. Eur J Biochem 1999, 266: 33–39. 10.1046/j.1432-1327.1999.00811.x
Jaseja M, Mergen L, Gillette K, Forbes K, Sehgal I, Copié V: Structure-function studies of the functional and binding epitope of the human 37 kDa laminin receptor precursor protein. J Peptide Res 2005, 66: 9–18. 10.1111/j.1399-3011.2005.00267.x
Stetefeld J, Frank S, Jenny M, Schulthess T, Kammerer RA, Boudko S, Landwehr R, Okuyama K, Engel J: Collagen stabilization at atomic level: Crystal structure of designed (GlyProPro)10 foldon. Structure 2003, 11: 339–346. 10.1016/S0969-2126(03)00025-X
Henikoff JG, Greene EA, Pietrokovski S, Henikoff S: Increased coverage of protein families with the blocks database servers. Nucleic Acids Res 2000, 28: 228–230. 10.1093/nar/28.1.228
Henikoff S, Henikoff JG, Pietrokovski S: Blocks+: A non-redundant database of protein alignment blocks dervied from multiple compilations. Bioinformatics 1999, 15: 471–479. 10.1093/bioinformatics/15.6.471
Guptasarma P: Reversal of peptide backbone direction may result in the mirroring of protein structure. FEBS Lett 1992, 310: 205–210. 10.1016/0014-5793(92)81333-H
Olszewski KA, Kolinski A, Skolnick J: Does a backwardly read protein sequence have a unique native state? Protein Eng 1996, 9: 5–14. 10.1093/protein/9.1.5
Cheley S, Braha O, Lu X, Conlan S, Bayley H: A functional protein pore with a 'retro' transmembrane domain. Protein Sci 1999, 8: 1257–1267.
Rai J: Retroinverso mimetics of S peptide. Chem Biol Drug Des 2007, 70: 552–556. 10.1111/j.1747-0285.2007.00595.x
Pal-Bhowmick I, Pati Pandey R, Jarori GK, Kar S, Sahal D: Structural and functional studies on Ribonuclease S, retro S and retro-inverso S peptides. Biochem Biophys Res Commun 2007, 364: 608–613. 10.1016/j.bbrc.2007.10.056
Benkirane N, Guichard G, Van Regenmortel MHV, Briand JP, Muller S: Cross-reactivity of antibodies to retro-inverso peptidomimetics with the parent protein histone H3 and chromatin core particle: Specificity and kinetic rate-constant measurements. J Biol Chem 1995, 270: 11921–11926. 10.1074/jbc.270.20.11921
Guichard G, Benkirane N, Zeder-Lutz G, Van Regenmortel MHV, Briand JP, Muller S: Antigenic mimicry of natural L-peptides with retro-inverso- peptidomimetics. Proc Natl Acad Sci USA 1994, 91: 9765–9769. 10.1073/pnas.91.21.9765
Nieddu E, Melchiori A, Pescarolo MP, Bagnasco L, Biasotti B, Licheri B, Malacarne D, Parodi S: Sequence specific peptidomimetic molecules inhibitors of a protein-protein interaction at the helix 1 level of c-Myc. FASEB J 2005, 19: 632–634.
Phan-Chan-Du A, Petit MC, Guichard G, Briand JP, Muller S, Cung MT: Structure of antibody-bound peptides and retro-inverso analogues. A transferred nuclear overhauser effect spectroscopy and molecular dynamics approach. Biochemistry 2001, 40 : 5720–5727. 10.1021/bi001151h
Sakurai K, Hak SC, Kahne D: Use of a retroinverso p53 peptide as an inhibitor of MDM2. J Am Chem Soc 2004, 126: 16288–16289. 10.1021/ja044883w
Verdoliva A, Ruvo M, Cassani G, Fassina G: Topological mimicry of cross-reacting enantiomeric peptide antigens. J Biol Chem 1995, 270: 30422–30427. 10.1074/jbc.270.51.30422
Shukla A, Raje M, Guptasarma P: A backbone-reversed form of an all-β α-crystallin domain from a small heat-shock protein (retro-HSP12.6) Folds and assembles into structured multimers. J Biol Chem 2003, 278: 26505–26510. 10.1074/jbc.M303123200
Mittl PRE, Deillon C, Sargent D, Liu N, Klauser S, Thomas RM, Gutte B, Grütter MG: The retro-GCN4 leucine zipper sequence forms a stable three-dimensional structure. Proc Natl Acad Sci USA 2000, 97: 2562–2566. 10.1073/pnas.97.6.2562
Lacroix E, Viguera AR, Serrano L: Reading protein sequences backward. Fold Des 1998, 3: 79–85. 10.1016/S1359-0278(98)00013-3
Lorenzen S, Gille C, Preissner R, Frömmel C: Inverse sequence similarity of proteins does not imply structural similarity. FEBS Lett 2003, 545: 105–109. 10.1016/S0014-5793(03)00450-2
Davidson AR, Lumb KJ, Sauer RT: Cooperatively folded proteins in random sequence libraries. Nature Struct Biol 1995, 2: 856–864. 10.1038/nsb1095-856
Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH: Protein design by binary patterning of polar and nonpolar amino acids. Science 1993, 262: 1680–1685. 10.1126/science.8259512
Roy S, Ratnaswamy G, Boice JA, Fairman R, McLendon G, Hecht MH: A protein designed by binary patterning of polar and nonpolar amino acids displays native-like properties. J Am Chem Soc 1997, 119: 5302 -55306. 10.1021/ja9700717
Yamauchi A, Yomo T, Tanaka F, Prijambada ID, Ohhashi S, Yamamoto K, Shima Y, Ogasahara K, Yutani K, Kataoka M, Urabe I: Characterization of soluble artificial proteins with random sequences. FEBS Lett 1998, 421: 147–151. 10.1016/S0014-5793(97)01552-4
Shiba K, Takahashi Y, Noda T: On the role of periodism in the origin of proteins. J Mol Biol 2002, 320: 833–840. 10.1016/S0022-2836(02)00567-3
Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, Green RK, Flippen-Anderson JL, Westbrook J, Berman HM, Bourne PE: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 2005, 33: D233-D237. 10.1093/nar/gki057
Wang GL, Dunbrack Jr. RL: PISCES. A protein sequence culling server. Bioinformatics 2003, 19: 1589–1591. 10.1093/bioinformatics/btg224
Wang GL, Dunbrack Jr. RL: PISCES: recent improvements to a PDB sequence culling server . Nucleic Acids Res 2005, 33: W94-W98. 10.1093/nar/gki402
Troyanskaya OG, Arbell O, Koren Y, Landau GM, Bolshoy A: Sequence complexity profiles of prokaryotic genomic sequences: a fast algorithm for calculating linguistic complexity. Bioinformatics 2002, 18: 679–688. 10.1093/bioinformatics/18.5.679
Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: A sequence logo generator. Genome Res 2004, 14: 1188–1190. 10.1101/gr.849004
Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. 10.1002/bip.360221211
Conover WJ: Practical Nonparametric Statistics. 2nd edition. N.Y., John Wiley & Sons; 2006.
This work was in part supported by a grant from IPM (No. CS1385-1-02).
S–AM presented the original idea. S–AM, MS, HP, CE and AK participated in the design of the methods and led the project. AS, MK, AK and SA implemented the methods. The manuscript was drafted by S–AM. All authors contributed to the discussions for the improvement of the original draft and approved the final manuscript.