ProtRepeatsDB: a database of amino acid repeats in genomes
- Mridul K Kalita†1,
- Gowthaman Ramasamy†1,
- Sekhar Duraisamy†2,
- Virander S Chauhan3 and
- Dinesh Gupta1
© Kalita et al; licensee BioMed Central Ltd. 2006
Received: 22 February 2006
Accepted: 07 July 2006
Published: 07 July 2006
Genome-wide and cross-species comparison of amino acid repeats is an intriguing problem in biology, mainly because of the highly polymorphic nature and diverse functions of amino acid repeats. Innate protein repeats constitute vital functional and structural regions in proteins. Repeats are of great consequence in the evolution of proteins, as is evident from analysis of repeats in different organisms. In the post-genomic era, the availability of protein sequences encoded in different genomes provides a unique opportunity to perform large-scale comparative studies of amino acid repeats. ProtRepeatsDB (http://bioinfo.icgeb.res.in/repeats/) is a relational database of perfect and mismatch repeats, designed as a resource and collection of tools for detection and cross-species comparison of different types of amino acid repeats.
ProtRepeatsDB (v1.2) consists of perfect as well as mismatch amino acid repeats in the protein sequences of 141 organisms, the genomes of which are now available. The web interface of ProtRepeatsDB provides tools to perform repeat searches based on protein IDs, organism name, repeat sequences, keywords as in FASTA headers, size, frequency, gene ontology (GO) annotation IDs and regular expressions (REGEXP) describing repeats. These tools also allow formulation of a variety of simple, complex and logical queries to facilitate mining and large-scale cross-species comparisons of amino acid repeats. In addition, the database contains sequence analysis tools to determine repeats in user input sequences.
ProtRepeatsDB is a multi-organism database of different types of amino acid repeats present in proteins. It integrates useful tools to perform genome wide queries for rapid screening and identification of amino acid repeats and facilitates comparative and evolutionary studies of the repeats. The database is useful for identification of species or organism specific repeat markers, interspecies variations and polymorphism.
The function, type and size of amino acid repeat regions in proteins are vastly diverse. Amino acid repeats (henceforth referred to as repeats, and repeat-containing proteins as repeat proteins) can be functionally neutral or active, tandem or scattered, and perfect or mismatch. Repeats may make up anything from a minuscule to a highly significant fraction of a protein. Though the functions of a few types of repeats are known, in general the function and evolution of a large variety of repeats are still poorly understood. Repeats have been broadly classified as perfect and mismatch repeats. The former class is further subdivided into homopeptide repeats (reiteration of any single amino acid, henceforth referred to as homo repeats) and heteropeptide repeats (repeats with different amino acids, henceforth referred to as hetero repeats). The mismatch repeats, consisting of repeats with substituted conserved amino acids, form the most diverse class of repeats.
Proteins containing homo repeats have been recognized as a cause of several neurodegenerative diseases and congenital malformations. Proteins containing polyglutamine stretches (polyQ) have been observed in at least eight neural diseases. Similarly, proteins with polyalanine tracts (polyA) are associated with various congenital malformations, skeletal dysplasia and nervous system anomalies. Most such homo repeats arise from trinucleotide slippage during replication [3, 4], leading to proteins with homopolymeric tracts. Such proteins are susceptible either to misfolding or to aggregation and subsequent degradation, rendering them impaired.
Hetero repeats in proteins are quite diverse. Examples include glycine-proline runs in eukaryotic genomes, polar zippers and prion-like glutamine/asparagine-rich stretches [5–7]. Mucins, in turn, contain arrays of tandem repeats that are rich in serine and threonine residues and correspond to the most O-glycosylated part of the mature protein.
Functional studies have shown that insertions of mismatch repetitive regions unique to a species can be indispensable to the functions of proteins, as shown for example in Plasmodium berghei glucose-6-phosphate dehydrogenase-6-phosphogluconolactonase, a bi-functional enzyme. The role of repeats in antigenic variation and immune evasion by pathogens (e.g. Plasmodium falciparum, Trypanosoma brucei and group B Streptococci) has been elucidated [10–14]. Certain repeats provide structural scaffolds for packing of functional groups of amino acids to facilitate molecular interactions and stabilization. For example, the yeast Sup35p protein contains five oligopeptide repeats, which stabilize its aggregation.
Identification of perfect repeats is simpler than that of mismatch repeats. All strategies to identify mismatch repeats have some limitation or other, on account of the polymorphism, low similarity and vast diversity of repeats. Most strategies utilize algorithms based on Smith-Waterman local alignment, using substitution matrices [17, 18], but some are instead aimed at locating low complexity regions of proteins. It is however clear that no single algorithm or strategy is sufficient to find all types of repeats in protein sequences.
Large-scale comparative studies of repeats across kingdoms will be required for a better understanding of the role, diversity and evolution of repetitive sequences in proteins. Such comparisons may also provide deeper insight into the role of repeats in folding of proteins, immunogenicity and relevance to disease etiology. However, most studies on repeats so far have focused either on a few classes of proteins in a limited number of organisms or on only a particular type of repeat in different proteins or organisms [21, 22]. Tools that can perform large-scale cross-species comparisons of different types of repeats have not yet been developed. Here we present our efforts to build a robust protein repeats database and tools for large-scale analysis of repeats in different organisms. ProtRepeatsDB is a simple relational database to facilitate complex and large-scale comparative studies of repeats amongst organisms.
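The distinction between homo and hetero perfect repeats can be illustrated with a minimal Python sketch. It is not the detection method used to build ProtRepeatsDB; the function names, the `min_len`, `unit_len` and `min_copies` parameters and the toy sequence are assumptions made for illustration, and only tandem (not scattered) repeats are handled.

```python
import re

def homo_repeats(seq, min_len=3):
    """Return (residue, start, length) for each run of one amino acid
    that is at least min_len residues long (0-based start)."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(seq)]

def tandem_hetero_repeats(seq, unit_len=2, min_copies=2):
    """Return (unit, start, copies) for tandem perfect repeats of a unit of
    unit_len residues containing at least two different amino acids."""
    pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip units made of a single residue type
            hits.append((unit, m.start(), len(m.group(0)) // unit_len))
    return hits

# Hypothetical toy sequence, not taken from the database.
toy = "MKQQQQQAAGPGPGPLS"
print(homo_repeats(toy))           # polyQ run
print(tandem_hetero_repeats(toy))  # tandem GP repeat
```

Because regular expression matches do not overlap, a hetero repeat that starts inside a previously matched run may be missed; a production scan would need to handle that case explicitly.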
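To make the local alignment idea concrete, the following Python sketch aligns a sequence against itself with a Smith-Waterman recurrence, ignoring the trivial main diagonal, so that the best-scoring cell marks a pair of similar internal segments. It is a simplified illustration and not the algorithm of any of the cited tools: a flat match/mismatch score stands in for a substitution matrix, gaps are scored linearly, the two copies are allowed to overlap, and the function name, parameters and toy sequence are assumptions.

```python
def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment of seq against itself, restricted to
    cells above the main diagonal (j > i) so the self-match is skipped.
    Returns (score, end_i, end_j): best score and the 1-based end positions
    of the two aligned segments."""
    n = len(seq)
    prev = [0] * (n + 1)
    best = (0, 0, 0)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            cur[j] = max(0,
                         prev[j - 1] + s,   # extend the alignment diagonally
                         prev[j] + gap,     # gap in the second copy
                         cur[j - 1] + gap)  # gap in the first copy
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best

# Hypothetical sequence: two copies of a 9-residue unit with one substitution.
print(best_internal_repeat("ACDEFGHIKACDQFGHIK"))
```

With the default scores, eight matches and one mismatch give 8 × 2 − 1 = 15, so the single substitution lowers the score without breaking the alignment, which is the behaviour that lets such methods recover mismatch repeats.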
Construction and contents
Utility and discussion
ProtRepeatsDB and its related tools facilitate retrieval and analysis of repeats in the database through different query pages and sections. On submitting a query, a summary page of repeats satisfying the search conditions is generated. The summary page gives a brief description of the sequences containing the repeat, including the organism name, protein ID (hyperlinked to the corresponding NCBI RefSeq entry), amino acids constituting the repeat, repeat size, repeat frequency, positions of the repeat in the protein sequence (referred to as coordinates, hyperlinked to a detailed view) and E-value (Figure 1b). Upon clicking the coordinates of a repeat in the summary page, detailed information on the corresponding repeat is displayed as a new page consisting of two frames. The top frame consists of a dynamically generated color graph illustrating the relative distribution of the repeat in the protein, followed by the sequence of the protein with repeats highlighted in red. In addition, links are provided to retrieve the sequence in FASTA format and to perform repeat analysis using PROSPERO, DOTMATCHER, PfScan and BLAST against ProtRepeatsDB. This section also provides links to display PROSITE matches, mismatch repeats and a graphical representation of all the identified repeats in the protein sequence. The lower frame lists all the ProtRepeatsDB sequence entries sharing the same repeat.
The comparative search section provides tools to compare repeats in multiple organisms using Boolean (AND or NOT) operators. For instance, by using the 'AND' operator, one can find repeats common to different sets of organisms. Similarly, the 'NOT' operator can be used to obtain a list of repeats which are mutually exclusive in user-defined sets of organisms. Using the tools in the PROSITE repeats search section, we investigated bacterial proteins with the tetratricopeptide repeat (TPR), a structural repeat motif present in a wide range of proteins [34, 35] and believed to be involved in protein-protein interactions and assembly of multi-protein complexes [36, 37]. We found that the TPR-containing orthologs of Vibrio vulnificus CMCP6 [RefSeq:27367908], Vibrio parahaemolyticus [RefSeq:28901310] and V. vulnificus YJ016 [RefSeq:37676035] contain tandem glutamine repeats downstream from the TPR motifs. However, multiple alignments of the sequences revealed that the glutamine run is smaller in Vibrio cholerae [RefSeq:15600941], or interspersed, as in Vibrio angustum [RefSeq:90577283] and Shewanella oneidensis MR-1 [RefSeq:24374614]. The significance of the glutamine homo repeats in Vibrio proteins is not clear; however, such glutamine repeats are known to induce formation of quasi-aggregates in the early stage of amyloid protein fibrillization and inhibition of protein degradation in various proteins associated with human neurodegenerative disease.
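The Boolean comparisons described above map naturally onto set operations. The Python sketch below illustrates that idea only; it does not reproduce the queries issued by the ProtRepeatsDB web interface. The function names, the dictionary layout and the organism and repeat labels are hypothetical, and treating 'NOT' as "present in any organism of one set and in no organism of the other" is one possible reading, stated here as an assumption.

```python
def repeats_common_to(repeats_by_organism, organisms):
    """'AND' style query: repeats present in every listed organism."""
    sets = [repeats_by_organism[o] for o in organisms]
    return set.intersection(*sets) if sets else set()

def repeats_exclusive_to(repeats_by_organism, include, exclude):
    """'NOT' style query (assumed semantics): repeats found in at least one
    organism of `include` and in no organism of `exclude`."""
    present = set().union(*(repeats_by_organism[o] for o in include))
    absent = set().union(*(repeats_by_organism[o] for o in exclude))
    return present - absent

# Hypothetical toy data: organism label -> set of repeat sequences.
toy = {
    "org_a": {"QQQQ", "GPGP", "AAAA"},
    "org_b": {"QQQQ", "AAAA"},
    "org_c": {"QQQQ", "SSSS"},
}
print(repeats_common_to(toy, ["org_a", "org_b", "org_c"]))
print(repeats_exclusive_to(toy, ["org_a"], ["org_b", "org_c"]))
```

In a relational database the same comparisons would typically be expressed as joins or subqueries over a repeat table keyed by organism; the in-memory sets here only show the logic.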
Table 1: Relative distribution of homo repeats (tandem or scattered) of size <10 (occurring one or more times) and ≥10 (occurring more than once). The figures within round brackets represent the number of homo repeat proteins of the corresponding amino acid and organism. The figures within square brackets represent the percentage of homo repeat proteins of the corresponding amino acid and organism, relative to all the homo repeat proteins in the organism. Column groups: proteins with homo repeats of size <10; proteins with homo repeats of size ≥10.
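As a small worked illustration of the two figures reported per amino acid and organism, the Python sketch below formats a count in round brackets and the corresponding percentage in square brackets. The function name and the example numbers are hypothetical and are not values from the table.

```python
def table_cell(n_proteins, n_homo_repeat_proteins):
    """Format one cell as '(count) [percent]'.

    n_proteins: homo repeat proteins for one amino acid in one organism.
    n_homo_repeat_proteins: all homo repeat proteins in that organism.
    """
    if n_homo_repeat_proteins <= 0:
        raise ValueError("organism must have at least one homo repeat protein")
    percent = 100.0 * n_proteins / n_homo_repeat_proteins
    return "(%d) [%.1f]" % (n_proteins, percent)

# Hypothetical numbers: 50 of 200 homo repeat proteins carry the repeat.
print(table_cell(50, 200))
```

Since one protein can carry homo repeats of several amino acids, the percentages for an organism need not sum to 100.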
Glutamine homo repeat proteins are most abundant in D. melanogaster, accounting for 1582 proteins, which is 34 percent of all the homo repeat proteins in the genome. The polar glutamine repeats are primarily present in proteins involved in transcriptional and translational activities, especially proteins interacting with DNA and other proteins. Proline homo repeats of size ten and greater are present in thirteen proteins of H. sapiens, ten A. thaliana proteins and four M. musculus proteins. One class of such proteins is the formin-like proteins, common to the above three organisms, which are involved in processes such as morphogenesis, embryonic differentiation, cell polarity and cytokinesis. The proline-rich regions of formin-like proteins are believed to be involved in protein-protein interactions, as exemplified by the crystal structure of the binding domain in formin binding protein (FBP11) with specific binding to PPLPp motifs in the formins.
The use of ProtRepeatsDB in comparative analysis of homo repeat proteins in related species is perhaps best illustrated by comparison of such repeats in H. sapiens, R. norvegicus and M. musculus. The most abundant homo repeat proteins in human, mouse and rat have repeats of glutamic acid, proline, leucine, alanine, glutamine, glycine, serine or lysine residues. Homo repeat proteins containing repeats of isoleucine, asparagine and valine are very few in H. sapiens and related species. The number of phenylalanine homo repeat proteins in M. musculus is relatively higher than in the rat and human proteomes. The percentage of cysteine, methionine and arginine homo repeat proteins is almost the same in all three organisms. There is no tyrosine homo repeat protein in rat and mouse; however, there is one tyrosine homo repeat in an H. sapiens helicase protein. The mixed-lineage leukemia proteins of human, which are also trithorax homologs of Drosophila, have homo repeats of serine, glutamic acid, proline and glycine. The human and mouse orthologs of the protein have serine runs of forty two and thirty amino acid residues respectively; however, the run is absent in the corresponding rat ortholog. The non-trithorax homologs among human mixed-lineage leukemia proteins also lack serine runs.
Table 1 also reveals that no histidine, cysteine, tyrosine, isoleucine or tryptophan homo repeat falling under the second group (size ≥10) is encoded in any of the genomes. The rare methionine homo repeats have been observed in proteins belonging to A. thaliana, A. gambiae and P. falciparum. In fact, a CHP-rich zinc finger protein (CHP-rich: cysteine, histidine and proline rich) of A. thaliana contains a methionine repeat of size 11, which also happens to be the longest homo repeat of methionine in the database. Phenylalanine and tyrosine repeats are present in several P. falciparum hypothetical proteins and putative proteins such as succinyl-CoA synthetase alpha subunit, syntaxin, Ser/Thr protein kinase, sequestrin and acid phosphatase. Glycine homo repeat proteins are present in almost all super kingdoms. The mycobacterial parasite M. bovis has the highest number of alanine homo repeat proteins, representing 64.5 percent of its homo repeat proteins and nearly 4.8 percent of annotated proteins in the genome.
Figure 3b–d is a graphical representation of perfect repeats in representative proteomes. Figure 3b gives the distribution of repeat proteins in the eukaryotes, which indicates that P. falciparum has an unusual distribution of repeats in comparison with other eukaryotes; it is the only organism in which proteins containing different kinds of repeats constitute more than 50% of the proteome. Since one of the important roles of repeat proteins is believed to be immune evasion by parasites, we investigated the nature of repeats in immunodominant proteins of P. falciparum and found that the parasite antigens are not only rich in homo repeats; several of these proteins are also marked by an abundance of hetero repeats. We also found that twenty one antigens of the parasite have hetero repeats of size ten or greater. Notably, most of these hetero repeat regions are rich in charged or polar amino acids, which are known to play an important role in antigen-antibody or antigen-carrier protein interactions. There are several such examples, including the S-antigen of P. falciparum, which is characterized by the presence of two similar repeats of sixty nine residues each, forming the coiled and surface-exposed regions of the protein. Similarly, other Plasmodium antigens such as LSA (liver stage antigen), antigen 332, MESA (mature-parasite-infected erythrocyte surface antigen) or PfEMP2 (P. falciparum erythrocyte membrane protein 2) and FIRA (interspersed repeat antigen) also have large inserts of hetero repeats. MESA and ring-infected erythrocyte antigen are surface-exposed antigens rich in glutamic acid and lysine residues. Some of these antigens are also being developed as vaccines.
Amongst other eukaryotes, we found abundant repeat proteins in O. sativa, H. sapiens, D. melanogaster and Neurospora crassa. Interestingly, N. crassa and O. sativa are the only organisms in the database in which the percentage of homo repeat proteins exceeds that of hetero perfect repeat proteins. In Guillardia theta and Encephalitozoon cuniculi, the percentage of all types of perfect repeat proteins is almost the same as that of perfect hetero repeat proteins alone, which implies that the number of homo repeat proteins is very low, and in fact the lowest amongst the eukaryotes studied here. It should be noted, however, that several RefSeq genomes are still under review and contain protein sequences yet to be verified; the G. theta and E. cuniculi genomes are examples.
Rat and mouse have fewer perfect repeat proteins than the human proteome (Figure 3b). However, the percentages of homo repeat proteins in rat and mouse are almost the same and only marginally lower than that of humans, implying that the human proteome is richer in proteins with hetero repeats. This might have some implications regarding the observation that the generation of repeats promotes protein evolution and the formation of novel functional variants [45, 46]. We observed differential distribution of repeats in other related species as well; for example, Saccharomyces cerevisiae and Schizosaccharomyces pombe have almost equal fractions of repeat proteins, but the percentage of homo repeat proteins is considerably higher in S. cerevisiae (11%) than in S. pombe (~7%) (Figure 3b).
The percentage of repeat proteins in prokaryotic genomes is lower than that in eukaryotes. The highest percentage of repeat proteins in prokaryotes is observed in Thermus thermophilus (~29%), followed by Mycobacterium tuberculosis Rv (~28%), whereas Salmonella typhi contains the lowest number of repeat proteins not only amongst prokaryotes but in all the proteomes studied here (Figure 3c). It is interesting to note that M. tuberculosis Rv has a higher percentage of repeat proteins than other mycobacterium species, namely M. bovis and Mycobacterium avium. All the mycobacterial species except Mycobacterium leprae predominantly have alanine, glycine and proline homo repeat proteins. Alanine repeats are abundant in the PPE family (proteins with proline-proline-glutamic acid motifs) and a few PE-PGRS proteins (proteins with proline-glutamic acid motifs and polymorphic GC-rich repetitive sequences) of M. tuberculosis Rv and M. bovis. However, all M. avium alanine homo repeat proteins are hypothetical proteins except two: one PE family protein and a PstA (phosphate-specific transport subunit A) protein. M. leprae has the lowest percentage of repeat proteins amongst the mycobacterium genomes in the database, mainly due to lower numbers of PPE family homologs. The glycine repeats are exclusively present in PE-PGRS proteins of M. tuberculosis Rv and M. bovis but absent in M. avium and M. leprae. The PE-PGRS is a newly identified family of fibronectin-binding proteins involved in antigenic variation. The number of glycine repeat proteins in M. avium and M. leprae is much lower than in the other two species. All four species of mycobacterium have proline repeats, mainly in the proline-rich antigens. Repetitive regions have been used as genetic markers for strain differentiation and epidemiology of mycobacterium species [48, 49].
It will be interesting to perform systematic comparative investigation of different types of repeats in the species and investigate possible roles of repeats in specific interactions with their hosts.
Amongst the archaeal genomes, the percentage of total repeat proteins is highest in Halobacterium (~21%), followed by Aeropyrum pernix, and lowest in Archaeoglobus fulgidus (~7%) (Figure 3d). Notably, we found that all archaeal proteomes have a lower number of homo repeat proteins. The highest percentage of homo repeat proteins is observed in Halobacterium sp. (~4%) and the lowest in A. fulgidus (~1%). During evolution, archaeal genomes mainly accumulated hetero repeat proteins, but the basis of the resistance towards evolution of homo repeat proteins, or the mechanism of specific selection of hetero duplication in these genomes, is not yet known. A detailed and comprehensive analysis of different types of repeats across the three super kingdoms, as well as within kingdoms, might provide leads to answers to many open questions, which are, however, beyond the scope of the present study.
ProtRepeatsDB contains tools for large-scale comparison of protein repeats across genomes to aid studies related to the evolution of repeat genesis and the functional roles of such repeats in different organisms. The differential propensity of repeats among the super kingdoms/lineages, and among organisms within the same kingdom/genus, emphasizes that different types of repeats have undergone different selection pressures and propagation mechanisms during evolution. However, conclusions derived from analysis of sequences emerging from sequencing projects with little or no manual curation or additional experimental validation of gene structure must be treated with additional care, as some of these sequences may be translations of incorrect gene predictions.
ProtRepeatsDB is a multi-organism database of protein repeats and the first database of its kind to incorporate different kinds of repeats, viz. perfect repeats (homopeptides and heteropeptides), mismatch repeats and profile patterns representing different families of repeats. The current version (v 1.2) consists of 120686 perfect repeats, 834621 mismatch repeats and 3673 profile repeats from 894890 protein sequences belonging to 141 genomes. The web interface of ProtRepeatsDB consists of unique tools which allow formulation of queries for retrieval and cross-species comparison of repeats.
Availability and requirements
ProtRepeatsDB is freely accessible on the Internet at http://bioinfo.icgeb.res.in/repeats/ (Figure 1b). The database is divided into integrated sections of different types of repeats for easy browsing and data retrieval. The web interface of ProtRepeatsDB is supported with PERL and PHP scripts which enable formulation of queries against the database. Results are displayed either in tabulated or graphical formats.
ProtRepeatsDB will be regularly updated with protein repeat sequences in emerging annotated sequences from various genome sequencing projects. ProtRepeatsDB will be developed further to include cross links with other databases, repeats detected by other repeat finding algorithms, 3-dimensional structures of repeat proteins, web based repeat finding servers, tools for phylogenetic analysis and ortholog based search for comparative analysis of repeats.
We thank WHO (World Health Organization) for the support to the bioinformatics facility at ICGEB. MKK acknowledges CSIR (Council for Scientific and Industrial Research, India) and GR acknowledges UGC (University Grants Commission, India) for financial support. We also thank Arnab Pain (The Wellcome Trust Sanger Institute) for reviewing the manuscript.
- Karlin S, Brocchieri L, Bergman A, Mrazek J, Gentles AJ: Amino acid runs in eukaryotic proteomes, disease associations. Proc Natl Acad Sci USA 2002, 99: 333–338. 10.1073/pnas.012608599
- Hofferbert S, Schanen NC, Chehab F, Francke U: Trinucleotide repeats in the human genome: size distributions for all possible triplets, detection of expanded disease alleles in a group of Huntington disease individuals by the repeat expansion detection method. Hum Mol Genet 1997, 6: 77–83. 10.1093/hmg/6.1.77
- Kazemi-Esfarjani P, Trifiro MA, Pinsky L: Evidence for a repressive function of the long polyglutamine tract in the human androgen receptor: possible pathogenetic relevance for the CAGn-expanded neuronopathies. Hum Mol Genet 1995, 4: 523–527.
- Brown LY, Brown SA: Alanine tracts: the expanding story of human illness, trinucleotide repeats. Trends Genet 2004, 20: 51–58. 10.1016/j.tig.2003.11.002
- Perutz MF, Johnson T, Suzuki M, Finch JT: Glutamine repeats as polar zippers: their possible role in inherited neurodegenerative diseases. Proc Natl Acad Sci USA 1994, 91: 5355–5358. 10.1073/pnas.91.12.5355
- Michelitsch MD, Weissman JS: A census of glutamine/asparagine-rich regions: implications for their conserved function, the prediction of novel prions. Proc Natl Acad Sci USA 2000, 97: 11910–11915. 10.1073/pnas.97.22.11910
- Stahl N, Prusiner SB: Prions, prion proteins. FASEB J 5: 2799–2807.
- Leikauf GD, Borchers MT, Prows DR, Simpson LG: Mucin apoprotein expression in COPD. Chest 1991, 121: 166S-182S. 10.1378/chest.121.5_suppl.166S
- Clarke JL, Sodeinde O, Mason PJ: A unique insertion in Plasmodium berghei glucose-6-phosphate dehydrogenase-6-phosphogluconolactonase: evolutionary and functional studies. Mol Biochem Parasitol 2003, 127: 1–8. 10.1016/S0166-6851(02)00298-0
- Reeder JC, Brown GV: Antigenic variation, immune evasion in Plasmodium falciparum malaria. Immunol Cell Biol 1996, 74: 546–554.
- Anders RF, Smythe JA: Polymorphic antigens in Plasmodium falciparum. Blood 1989, 74: 1865–1875.
- Verra F, Hughes AL: Biased amino acid composition in repeat regions of Plasmodium antigens. Mol Biol Evol 1999, 16: 627–33.
- Lee MG, Russell DG, D'Alesandro PA, Van der Ploeg LH: Identification of membrane-associated proteins in Trypanosoma brucei encoding an internal, EARLRAEE amino acid repeat. J Biol Chem 1994, 269: 8408–8415.
- Gravekamp C, Rosner B, Madoff LC: Deletion of repeats in the alpha C protein enhances the pathogenicity of group B streptococci in immune mice. Infect Immun 1998, 66: 4347–4354.
- Parham SN, Resende CG, Tuite MF: Oligopeptide repeats in the yeast protein Sup35p stabilize intermolecular prion interactions. EMBO J 2001, 20: 2111–2119. 10.1093/emboj/20.9.2111
- Andrade MA, Perez-Iratxeta C, Ponting CP: Protein repeats: structures, functions, evolution. J Struct Biol 2001, 134: 117–131. 10.1006/jsbi.2001.4392
- Heger A, Holm L: Rapid automatic detection, alignment of repeats in protein sequences. Proteins 2000, 41: 224–237. 10.1002/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z
- Andrade MA, Ponting CP, Gibson TJ, Bork P: Homology-based method for identification of protein repeats using statistical significance estimates. J Mol Biol 2000, 298: 521–537. 10.1006/jmbi.2000.3684
- Wan H, Li L, Federhen S, Wootton JC: Discovering simple regions in biological sequences associated with scoring schemes. J Comput Biol 2003, 10: 171–185. 10.1089/106652703321825955
- Alba MM, Santibanez-Koref MF, Hancock JM: The comparative genomics of polyglutamine repeats: extreme differences in the codon organization of repeat-encoding regions between mammals, Drosophila. J Mol Evol 2001, 52: 249–259.
- Alba MM, Guigo R: Comparative analysis of amino acid repeats in rodents, humans. Genome Res 2004, 14: 549–554. 10.1101/gr.1925704
- Depledge DP, Dalby AR: COPASAAR – A database for proteomic analysis of single amino acid repeats. BMC Bioinformatics 2005, 6: 196. 10.1186/1471-2105-6-196
- NCBI Reference sequences (RefSeq) [http://www.ncbi.nlm.nih.gov/RefSeq/]
- Karlin S: Statistical significance of sequence patterns in proteins. Curr Opin Struct Biol 1995, 5: 360–371. 10.1016/0959-440X(95)80098-0
- Mott R: Accurate formula for P-values of gapped local sequence, profile alignments. J Mol Biol 2000, 300: 649–659. 10.1006/jmbi.2000.3875
- Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res 2004, (32 Database): D134-D137. 10.1093/nar/gkh044
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16: 276–277. 10.1016/S0168-9525(00)02024-2
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Brunori M, Silvestrini MC, Pocchiari M: The scrapie agent, the prion hypothesis. Trends Biochem Sci 1988, 13: 309–313. 10.1016/0968-0004(88)90126-0
- Kuznetsov IB, Morozov PS, Matushkin YG: Prion proteins: evolution, preservation of secondary structure. FEBS Lett 1997, 412: 429–432. 10.1016/S0014-5793(97)00810-7
- Croes EA, Theuns J, Houwing-Duistermaat JJ, Dermaut B, Sleegers K, Roks G, Van den Broeck M, van Harten B, van Swieten JC, Cruts M, Van Broeckhoven C, van Duijn CM: Octapeptide repeat insertions in the prion protein gene, early onset dementia. J Neurol Neurosurg Psychiatry 2004, 75: 1166–1170. 10.1136/jnnp.2003.020198
- Ferlin A, Garolla A, Bettella A, Bartoloni L, Vinanzi C, Roverato A, Foresta C: Androgen receptor gene CAG, GGC repeat lengths in cryptorchidism. Eur J Endocrinol 2005, 152: 419–425. 10.1530/eje.1.01860
- Ding D, Xu L, Menon M, Reddy GP, Barrack ER: Effect of GGC glycine repeat length polymorphism in the human androgen receptor on androgen action. Prostate 2005, 62: 133–139. 10.1002/pros.20128
- Das AK, Cohen PW, Barford D: The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein interactions. EMBO J 1998, 17: 1192–1199. 10.1093/emboj/17.5.1192
- Goebl M, Yanagida M: The TPR snap helix: a novel protein repeat motif from mitosis to transcription. Trends Biochem Sci 1991, 16: 173–177. 10.1016/0968-0004(91)90070-C
- D'Andrea LD, Regan L: TPR proteins: the versatile helix. Trends Biochem Sci 2003, 28: 655–662. 10.1016/j.tibs.2003.10.007
- Main ER, Stott K, Jackson SE, Regan L: Local and long-range stability in tandemly arrayed tetratricopeptide repeats. Proc Natl Acad Sci USA 2005, 102: 5721–5726. 10.1073/pnas.0404530102
- Tanaka M, Machida Y, Nishikawa Y, Akagi T, Hashikawa T, Fujisawa T, Nukina N: Expansion of polyglutamine induces the formation of quasi-aggregate in the early stage of protein fibrillization. J Biol Chem 2003, 278: 34717–34724. 10.1074/jbc.M209852200
- de Pril R, Fischer DF, Maat-Schieman ML, Hobo B, de Vos RA, Brunt ER, Hol EM, Roos RA, van Leeuwen FW: Accumulation of aberrant ubiquitin induces aggregate formation and cell death in polyglutamine diseases. Hum Mol Genet 2004, 13: 1803–1813. 10.1093/hmg/ddh188
- Singh GP, Chandra BR, Bhattacharya A, Akhouri RR, Singh SK, Sharma A: Hyper-expansion of asparagines correlates with an abundance of proteins with prion-like domains in Plasmodium falciparum. Mol Biochem Parasitol 2004, 137: 307–19. 10.1016/j.molbiopara.2004.05.016
- Zeller R, Haramis AG, Zuniga A, McGuigan C, Dono R, Davidson G, Chabanis S, Gibson T: Formin defines a large family of morphoregulatory genes and functions in establishment of the polarising region. Cell Tissue Res 1999, 296: 85–93. 10.1007/s004410051269
- Pires JR, Parthier C, Aido-Machado R, Wiedemann U, Otte L, Bohm G, Rudolph R, Oschkinat H: Structural basis for APPTPPPLPP peptide recognition by the FBP11WW1 domain. J Mol Biol 2005, 348: 399–408. 10.1016/j.jmb.2005.02.056
- Chiang PK, Bujnicki JM, Su X, Lanar DE: Malaria: therapy, genes and vaccines. Curr Mol Med 2006, 6: 309–26. 10.2174/156652406776894545
- Tompa P: Intrinsically unstructured proteins evolve by repeat expansion. Bioessays 2003, 25: 847–55. 10.1002/bies.10324
- Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D: A census of protein repeats. J Mol Biol 1999, 293: 151–60. 10.1006/jmbi.1999.3136
- Williamson MP: The structure and function of proline-rich regions in proteins. Biochem J 1994, 297: 249–60.
- Espitia C, Laclette JP, Mondragon-Palomino M, Amador A, Campuzano J, Martens A, Singh M, Cicero R, Zhang Y, Moreno C: The PE-PGRS glycine-rich proteins of Mycobacterium tuberculosis: a new family of fibronectin-binding proteins? Microbiology 1999, 145: 3487–3495.
- Sun YJ, Lee AS, Ng ST, Ravindran S, Kremer K, Bellamy R, Wong SY, van Soolingen D, Supply P, Paton NI: Characterization of ancestral Mycobacterium tuberculosis by multiple genetic markers and proposal of genotyping strategy. J Clin Microbiol 2004, 42: 5058–5064. 10.1128/JCM.42.11.5058-5064.2004
- van Soolingen D, de Haas PE, Hermans PW, Groenen PM, van Embden JD: Comparison of various repetitive DNA elements as genetic markers for strain differentiation and epidemiology of Mycobacterium tuberculosis. J Clin Microbiol 1993, 31: 1987–1995.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.