AMYPdb: A database dedicated to amyloid precursor proteins
© Pawlicki et al; licensee BioMed Central Ltd. 2008
Received: 25 June 2007
Accepted: 10 June 2008
Published: 10 June 2008
Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases.
We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation.
AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible.
Amyloid deposits are abnormal in vivo extracellular aggregates of insoluble proteinaceous fibers exhibiting a cross-beta structure. The proteins or fragments found in these aggregates derive from diverse full-length precursors belonging to families without any obvious functional or structural resemblance. In addition to these quite typical extracellular deposits, other proteins can also form intracellular inclusions. Under the effect of diverse modifications, including interaction with chaperones, mutations, supraphysiological concentrations, post-translational modifications, and so on, amyloid proteins fail to fold properly, thus accumulating irreversibly over long periods, with toxic effect [2–4].
Protein misfolding is associated with a wide range of human diseases called amyloidoses. These may affect multiple tissues, in the case of systemic amyloidoses, or can be limited to a particular organ. These pathologies may have major health and social impacts, as in the case of Alzheimer's disease, or might be relatively benign, such as the amyloidosis that can occur among diabetics at the site of their insulin injections.
Prions are a special case among amyloid proteins because of their unusual properties. They originate from the conversion of a normal host protein into a fibrillar structure that then acts as an infectious particle. To date only one prion, PrP, has been discovered in vertebrates. It is involved in major neurodegenerative diseases including Creutzfeldt-Jakob disease, Gerstmann-Sträussler-Scheinker syndrome, and Kuru in humans, scrapie in sheep, and spongiform encephalopathy in cattle. Prion proteins are also described in eukaryotic microorganisms (yeasts and fungi). However, in these latter organisms, the prion isoform is not always toxic and can control normal cellular processes [8, 9]. The prion concept has recently been extended to include mammalian prion-like proteins, such as Tia-1. This is an RNA-binding protein implicated in the assembly of the cytoplasmic aggregates known as stress granules.
Schematically, the conversion of a normal soluble protein into insoluble amyloid fibers begins with a conformational change, resulting in an intermediate form, an amyloidogenic isoform. This new conformation favors self-association in small oligomers that act as nucleation units. The growth of the nucleation units leads to the formation of long protofilaments, which are wrapped to form mature fibers. Biophysical techniques have shown that protofilaments may have various morphologies, but that they share common properties at the molecular level. The amyloid proteins/peptides form either parallel or anti-parallel arrangements of beta-strands. Since these beta-strands are perpendicular to the fiber axis, this has been described as a cross-beta structure. Despite the difficulties of using experimental approaches to determine the precise 3D-structure of amyloid proteins in their fibrillar state, several models have recently been proposed. These discoveries benefit from the increasing use of computer simulations in biology.
Some authors have demonstrated that amyloid-like structures can be obtained in vitro with almost any protein, suggesting that the ability to form fibers is a common property of polypeptide chains. However, the number of proteins aggregating in vivo is low compared to the more than 3 million sequences stored in the Universal Protein Resource (UniProtKB), and includes only a few specific members of 31 families. The propensity of a protein to aggregate into amyloid fibrils varies greatly with the amino acid sequence and with the cellular environment. To take just two examples: the globular protein lysozyme is only associated with amyloid deposition in the kidney when it presents site-specific mutations (I56T, F57I, W64R, D67H), while phosphorylation of Huntingtin may modulate its cleavage and toxicity.
During the past few years, bioinformatic approaches have been dedicated to the discovery of sequence segments that are sensitive to self-aggregation or that promote protein destabilization [14, 17–27]. All methods presented in these papers are based on similar ideas. Each one tries to calculate various aggregation indexes and profiles by exploiting the information found in kinetic data, peptide/protein sequences, conformation space, and/or 3D-structures. However, a common problem encountered in these in silico experiments is the difficulty of finding and extracting accurate data from the existing literature and various molecular databases.
For studies focusing on sequence features, three databases are usually particularly useful: the UniProtKB, which provides sequences with functional annotations, comments, and cross-references; the PROSITE database, which consists of a large collection of sequence signatures; and the bibliographical database MEDLINE. However, extraction of information from such general databases can be complex and time-consuming due to the large amount of data available and because of the diversity of the gene or protein families. To compensate for this, Siepen and Westhead developed a specialized database, fibril_one, dedicated to the analysis of mutations associated with fibrillogenesis. Unfortunately, the usefulness of this resource is limited, as fibril_one contains few data and has never been updated.
To facilitate in silico comparison of proteins involved in the formation of beta-sheet-rich fibrils in vivo, we have created a new multi-user database, the AMYloid Protein database (AMYPdb). The main goal in developing this relational database was to provide regularly updated access to protein sequences and to the patterns describing each family. The 3,621 amino acid sequence patterns stored in the database can be screened to help assign new sequences to a particular family and to formulate hypotheses about their function(s). Patterns conserved in several families may also help in extracting rules about the mechanisms of fibril formation.
Results and Discussion
Working with AMYPdb
AMYPdb is freely accessible. Users can browse web pages to obtain descriptions of each family, visualize protein sequences enriched with links to both UniProtKB and the Protein DataBank (PDB), study multiple sequence alignments using the Jalview editor, or access bibliographic references. An identity card for each protein is available from the "protein menu". Links to Wikipedia provide further information on some families. Sequences can be selected and exported in FASTA format for further analysis.
The amino acid sequence patterns are accessible by browsing the pages from the "pattern" menu, or by using the search interface. The search page contains several menus, allowing the user to focus on particular data. For instance, they can interrogate AMYPdb with UniProtKB or PROSITE identifiers or patterns to determine whether a particular protein or pattern is stored in AMYPdb. They can also submit a personal signature to find any matching amyloid proteins, or inversely, they can submit a sequence to find matching AMYPdb patterns. It is also possible to select patterns using thresholds on quality scores. This method is useful for discovering patterns shared between families.
Beyond functioning as a pure repository of knowledge, AMYPdb also provides private workspaces to anyone interested in further analysis. This allows users to manage their own working sets of proteins and patterns, which can be easily manipulated and organized according to their research interests.
Below, we illustrate several ways in which AMYPdb can be useful in pattern research on amyloid proteins.
Protein family signatures
Amyloid families in AMYPdb
Involved in the coagulation cascade
Autosomal dominant hereditary hepatic or renal amyloidosis
Synucleinopathies, such as Alzheimer's and Parkinson's diseases
Amyloid Beta Precursor
Alzheimer's disease and aged Down's syndrome
Autosomal dominant systemic amyloidosis
Atrial Natriuretic Factor
Blood pressure and sodium balance
Isolated Atrial Amyloid
Class 1 human leukocyte antigen
Aggregation in the musculoskeletal system
Familial British Dementia (FBD)
Major component of lung surfactant
Pulmonary alveolar proteinosis
Amyloid deposits in case of medullary thyroid cancer
Cysteine protease inhibitor
Alzheimer's disease and cerebral amyloid angiopathy
Modulation of actin filament length
Gelsolin familial amyloidosis
Fast axonal trafficking
Metabolism of carbohydrates and fat
Localized amyloidosis at injection sites of type 1 diabetic patients
Islet Amyloid Polypeptide
Aggregates in pancreatic islets of type 2 diabetes and insulinomas
Aortic medial amyloidoses
Amyloid deposits in the cornea, seminal vesicles and brain
Non-neuropathic systemic amyloidosis
Regulation of the protein's activity
Hormone secreted by the pituitary gland
Amyloid deposits in pituitary glands of aging individuals
Serine protease inhibitors
Serum amyloid A
Cell adhesion, migration, and proliferation
Inflammation-associated reactive amyloidosis
Microtubule assembly and stability
Alzheimer's disease and dementias
Familial amyloid polyneuropathies
Prionization involved in the protein's normal function
No stable prion has been shown in vivo
Prion Protein (PrP)
Signal transmission, copper regulation?
Transmissible spongiform encephalopathies and dementias
Translation termination factor
Prionization might be advantageous in stress conditions
Loss of Ure2 function
Best patterns describing amyloid protein families
Amyloid Beta Precursor
Atrial Natriuretic Factor
Islet Amyloid Polypeptide
Serum amyloid A
Prion Protein (PrP)
There are many advantages in describing a protein family using several patterns rather than only one or two, as is done in PROSITE. First, the occurrence of more than one pattern increases confidence that a protein belongs to a specific family. Pattern distribution along sequences can also be used to assess conserved and variable regions in proteins. Indeed, highly specific patterns only describe conserved regions in proteins. Examples of this are the Tau and prolactin protein families. Human Tau protein is characterized by 13 patterns with CF ≥ 0.9, all found in the C-terminal region of the protein and covering barely 14% of the sequence. This suggests that the C-terminal part of Tau is the protein's main domain (indeed it is the microtubule-interacting region). On the other hand, 54 patterns with CF ≥ 0.9 are characteristic of human prolactin, and they are distributed all along the sequence, covering 32% of it. This suggests the presence of numerous important regions, which are likely correlated to the many known biological effects of prolactin.
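The coverage figures quoted above (14% for Tau, 32% for prolactin) can be read as the fraction of residues lying inside at least one pattern match. The following Python sketch shows one way such a value could be computed. The function name `pattern_coverage` and the 1-based, inclusive (start, end) convention are assumptions made for illustration; they are not taken from AMYPdb.

```python
def pattern_coverage(sequence_length, matches):
    """Return the fraction of residues covered by at least one match.

    matches: iterable of (start, end) positions, 1-based and inclusive
    (an assumed convention). Overlapping matches are counted only once.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = set()
    for start, end in matches:
        # Clip each match to the sequence bounds before recording it.
        covered.update(range(max(start, 1), min(end, sequence_length) + 1))
    return len(covered) / sequence_length


# Hypothetical example: two overlapping matches on a 100-residue protein.
print(pattern_coverage(100, [(1, 10), (5, 14)]))
```

Using a set of covered positions keeps the sketch simple; for very long sequences, merging sorted intervals would use less memory.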
Amino acid sequence pattern exploration
Signatures of biological interest
Although patterns in AMYPdb were created from precursor proteins, users can easily access signatures matching aggregation features and other biological annotations. Indeed, for each pattern, the AMYPdb interface displays its position in sequences, along with the corresponding UniProtKB features. There are 836 highly specific patterns (CF ≥ 0.9) covering annotated regions in proteins. Among these, 251 patterns match variants associated with keywords such as aggregation, amyloidosis, Alzheimer, Parkinson, and so on (FT variant lines in UniProtKB). We have successfully used AMYPdb for knowledge-rich data mining concerning three amyloid families: transthyretin; tau; and prion.
In AMYPdb, 97 patterns (CF ≥ 0.9), distributed over the entire sequence of human transthyretin (hTTR), map to 31 of the 37 single-site amyloidogenic variants described in UniProtKB. In the pattern G-E-[ILV]-H-[EGN]-L-x(0,1)-T-x(3,4)-F-x(2)-G-[ILV]-[IY]-[KR]-[ILV]-E, nine amino acids correspond to sites of pathogenic missense mutations in human TTR (positions 73–92). In particular, the variant I88L is associated with an amyloid cardiomyopathy. Interestingly, the multiple sequence alignment available in AMYPdb reveals that leucine exists in the wild-type sequence of seven organisms, including bovine and sheep. A comparative study of these proteins with hTTR could help explain the effect of the isoleucine-to-leucine mutation in human disease.
The human Tau protein sequence deduced from the gene is composed of 757 amino acids. However, it exists in the human brain as 6 alternatively spliced isoforms of 352 to 441 amino acids (Tau-A to F), with each isoform containing 3 or 4 repeat domains (R repeats). Using the AMYPdb search interface, we searched for each isoform, and found 8 patterns (CF ≥ 0.9) matching 1 to 6 isoforms. One of the patterns, P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY], is observed in all of the isoforms, and matches "hot spot" regions for nucleation, β-sheet aggregation and fibril formation both in vitro and in silico [19, 33, 34]. The pattern is located at the junction between R repeats. It matches 3 regions in human tau (PGGGKVQIVY, PGGGKVQIIN and PGGGSVQIVY) that correspond to 2 kinds of junctions: R1–R3 in Tau-A, Tau-B and Tau-C (3 R repeats); and R1–R2 and R2–R3 in Tau-D, Tau-E and Tau-F (4 R repeats). Moreover, the pattern includes variants described as being involved in tau pathogenicity: N596K, delV597, P618L, P618S and S622N (numbering according to UniProtKB P10636).
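To make the pattern notation concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and checks it against the three tau segments quoted above. It is only an illustration: the helper name `prosite_to_regex` is hypothetical, it supports just the subset of the syntax used in this article (single residues, bracketed residue sets, `x`, and repeat counts), and it is not the matching engine used by AMYPdb.

```python
import re

# One pattern element: "x", a residue letter, or a bracketed residue set,
# optionally followed by a repeat count such as (2) or (0,3).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset only) into a regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = match.groups()
        regex = "." if symbol == "x" else symbol
        if low is not None:
            # x(3) becomes .{3}; x(0,3) becomes .{0,3}
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return "".join(parts)


TAU_PATTERN = "P-G-G-G-[KNS]-V-Q-I-[FIV]-[DHNY]"
TAU_SEGMENTS = ["PGGGKVQIVY", "PGGGKVQIIN", "PGGGSVQIVY"]

tau_regex = re.compile(prosite_to_regex(TAU_PATTERN))
for segment in TAU_SEGMENTS:
    print(segment, bool(tau_regex.fullmatch(segment)))
```

The same helper also handles the flexible gaps found in longer patterns, for example `x(0,1)` and `x(3,4)` in the transthyretin pattern discussed earlier.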
In a recent study, Hamodrakas et al. predicted amyloidogenic determinants in several proteins by combining three methods. In the case of human prion (P04156), the authors pointed out the segments 175–183 (FVHDCVNIT), 209–215 (VVEQMCI) and 242–251 (LLISFLIFLI). Using AMYPdb we searched for amino acid sequence patterns and UniProtKB features matching these segments. Some results are summarized in Table 3. The high density of mutation/modification sites overlapping the first two amyloidogenic segments is intriguing. Indeed, these segments contain both the cysteines involved in the unique disulfide bond between helices 2 and 3, and a glycosylation site involved in prion strain propagation. Although other regions have been shown to be important for prion propagation, susceptibility, and other activities, our observations reinforce the idea that in silico investigations are more efficient when they combine several methods, such as sequence pattern discovery, aggregation prediction, bibliographical knowledge, and so on.
Table 3. PrP regions including amyloidogenic determinants according to Hamodrakas et al.
Segment of PrP sequence: SNQNNFVHDCVNITIKQHTF
Variants: D178N, V180I, T183A
Sites: N-glycosylation 181–184; kinase C 183–185; disulfide 179–214
Segment of PrP sequence: KMMERVVEQMCITQYER
Variants: R208H, V210I, E211Q, Q212P, Q217R
Sites: kinase II 216–219; disulfide 179–214
Sequence patterns conserved in several families
The patterns are distributed all along the sequences. In human huntingtin (3,144 residues), patterns N°1 to N°4 match at positions 212, 1501, 2092, and 2789, respectively. In human prolactin (227 residues), the positions are 91, 108, 200, and 214 for patterns N°3, 2, 1 and 4, respectively. Two of those patterns match known structural/functional features in human huntingtin and human prolactin. Pattern N°1 is located in the first "HEAT repeat" of huntingtin, belonging to the N-terminal part of the fragment found in amyloid aggregates. Patterns N°1 and N°4 contain cysteines known to be involved in disulfide bridges in prolactin. They are located in the 4th α-helix of prolactin, already established to be part of the site of interaction with one of the prolactin receptors. The segment of human huntingtin and prolactin corresponding to pattern N°2 is located in the middle of each protein and is predicted by the TANGO algorithm to be a β-aggregating segment. To our knowledge, this is the first time that patterns have been described as shared between huntingtin and prolactin.
Although it seems unlikely that these results were due to chance, we searched for another pattern to confirm our observations. We used PRATT with a new set of 99 sequences, corresponding to all full-length huntingtin and prolactin proteins. We discovered a new highly specific pattern (N°5), R-[DV]-S-x-K-x(2)-[ANSTV]-x(3)-[FILV]-[AGL]-x-[ACS], conserved in 100% of the data set (Sen = 1). In a recent version of UniProtKB (release 10.5, containing more than 4.7 million sequences), this pattern retrieves new prolactin and huntingtin sequences, and only about 50 false-positive sequences. Pattern N°5 is located at positions 710 and 205 in human huntingtin and prolactin, respectively, and includes a potential serine phosphorylation site.
This process can be applied to other families. However, it is clear that the quality of a prediction depends on the quality and number of patterns found in common. Experimental work should be undertaken to confirm our observations and to further understand the functional/structural significance of the conserved motifs shared between the huntingtin and prolactin families. These could reveal interaction sites with common cofactors such as chaperones, or common motifs involved in aggregation processes.
Amino acid repeats play an important structural role in proteins and are often associated with diseases. This is the case with huntingtin, which shows a polyglutamine tract in its N-terminal part. However, repeats are not limited to single amino acids, but can include domain repetitions. For example, repeats are thought to be involved in PrP prionization in mammals, since birds, reptiles, fish and amphibians do not show the same domain architecture [39, 40]. In AMYPdb, 41 distinct patterns cover the N-terminal domain of PrPs. We observed that the amino acid sequences of various patterns and their number of occurrences are closely linked to the phylogenetic differences described above, as for the pattern G-[GHKQRY]-[GNPSTY]-x-G-[GHQY]-G-x(0,3)-G-[QSWY]-[GHNPQ]-[GHPQRS]-[GNPQSTY]-x-[AGHNST]. In species from fishes to birds (whose PrP has not been shown to be pathogenic), only 1 occurrence of this pattern was found in the N-terminal region, while it is repeated 2 to 5 times in mammalian PrPs. The repetition might therefore act to facilitate the structural conversion of PrPc into PrPsc.
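Counting how many times a pattern occurs in a sequence, as done above for the PrP N-terminal repeats, can be sketched in Python as follows. The regular expression is a hand translation of the pattern quoted in the text. The test sequences are synthetic strings built only to exercise the code and are not real PrP sequences, and the names `PRP_REPEAT` and `count_occurrences` are hypothetical.

```python
import re

# Hand translation of the PROSITE-style pattern quoted in the text:
# "x" becomes "." and "x(0,3)" becomes ".{0,3}".
PRP_REPEAT = re.compile(
    r"G[GHKQRY][GNPSTY].G[GHQY]G.{0,3}G[QSWY][GHNPQ][GHPQRS][GNPQSTY].[AGHNST]"
)


def count_occurrences(sequence):
    """Count non-overlapping matches, scanning from left to right."""
    return sum(1 for _ in PRP_REPEAT.finditer(sequence))


# Synthetic sequences (assumptions for illustration, NOT real PrP sequences).
UNIT = "GQPHGGGWGQPHGGG"
one_copy = "MK" + UNIT + "EE"
three_copies = "MK" + UNIT + "LLLL" + UNIT + "LLLL" + UNIT + "EE"

print(count_occurrences(one_copy), count_occurrences(three_copies))
```

Because `finditer` reports non-overlapping matches only, tandem repeats that share residues would be undercounted; whether overlapping occurrences should be counted is a design choice that the text does not specify.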
In this paper, we present a knowledge database dedicated to amyloid precursor proteins and their amino acid sequence signatures. Our work sheds light on the signatures that best describe each amyloid family. Moreover, we have extracted several patterns of interest to demonstrate how users can easily take advantage of the database for their own research. Note that because there are only sparse data on sequences which can form fibrils in vivo, especially in non-human organisms, we cannot yet automatically predict aggregation regions. In the future we will continue to enrich the database with new families and new functionalities, ensuring that AMYPdb will remain a reference tool for researchers interested in bioinformatic approaches to protein misfolding and aggregation. A wiki system available in the "identity card" of the proteins allows experts to add high quality data.
Implementation and structure of the database
Amino acid sequence pattern discovery
We employed the commonly used software PRATT, developed to extract conserved patterns in a set of unaligned protein sequences. The "advanced PRATT" version, accessible at OUEST-Genopole®, allows users to specify amino acid clusters, thus orienting the discovery of interesting patterns. We selected most of the default parameters of the program, and limited the maximum pattern length to 20 amino acids. This choice is in agreement with data from the recent literature, which show that short protein stretches may be involved in the self-assembly process of amyloid proteins.
Various analyses were carried out with "advanced PRATT" using the default clusters, based on the physico-chemical properties of amino acids. Moreover, we defined two other sets of clusters, corresponding to criteria related to amyloid aggregation. The first set was designed based on the ability of amino acids to either form beta sheets (CIFTWYV), alpha helices (AREQLK) or other secondary structures (NDGHMPS). The second set was based on whether amino acids are found at protein-protein interfaces (CHILMFWYV) or not (ARNDEQGKPST) [47, 48]. A refinement parameter was systematically tested (on and off). When the parameter was switched on, ambiguous pattern positions were generalized using the groups of similar amino acids. Among the 31 known amyloid families, 9 could not be submitted to pattern matching because of their small number of known sequences. On the other hand, a few families had enough known sequences to design several data sets. Finally, we applied the pattern discovery method to 42 sets of sequences using 6 parameters, and selected patterns matching 100% of the sequences of each set. Thus, as described in Figure 2, and including the 38 PROSITE patterns, AMYPdb contains 3,621 patterns related to amyloid protein families.
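The two cluster sets can be written down as plain data, and the idea of generalizing an ambiguous position to a group of similar amino acids can be sketched as below. This is only an illustration under stated assumptions: the rule used here (replace the observed residues by the first cluster that contains all of them, otherwise fall back to the wildcard `x`) is a simplification chosen for the example, and the refinement actually performed by PRATT may differ. The names `SECONDARY_STRUCTURE`, `INTERFACE` and `generalize_position` are hypothetical.

```python
# Cluster sets quoted in the text.
SECONDARY_STRUCTURE = ("CIFTWYV", "AREQLK", "NDGHMPS")  # beta, alpha, other
INTERFACE = ("CHILMFWYV", "ARNDEQGKPST")                # interface, non-interface

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_partition(clusters):
    """Check that the clusters cover the 20 amino acids exactly once each."""
    return sorted("".join(clusters)) == sorted(AMINO_ACIDS)


def generalize_position(observed, clusters):
    """Generalize the residues observed at one pattern position.

    Assumed rule (illustration only): a single residue is kept as is; several
    residues from one cluster become that whole cluster; otherwise "x".
    """
    residues = set(observed)
    if len(residues) == 1:
        return next(iter(residues))
    for cluster in clusters:
        if residues <= set(cluster):
            return "[" + "".join(sorted(cluster)) + "]"
    return "x"


print(is_partition(SECONDARY_STRUCTURE), is_partition(INTERFACE))
print(generalize_position("IV", SECONDARY_STRUCTURE))
print(generalize_position("IA", SECONDARY_STRUCTURE))
```

Both cluster sets partition the 20 standard amino acids, so every residue belongs to exactly one group within a set.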
In addition to pattern comparison, we also scanned the UniProtKB database (release 6.1) for sequences matching any of the 3,621 patterns. To do this, we used WAPAM, specifically developed to parse a list of amino acid patterns and to search for those patterns in sequence databases. Compared to other pattern-matching tools, WAPAM has several advantages. It places no limit on a pattern's length, flexibility, or indetermination. It also uses Rdisk technology, a specialized architecture that can greatly accelerate a search. Using WAPAM with Rdisk, the scan of UniProtKB for the 3,621 patterns took less than 15 hours, instead of the estimated 2,000 hours it would have taken without it. The scan returned 267,490 UniProtKB sequences matching AMYPdb patterns, although this number is an underestimate due to WAPAM's retrieval limitation. The UniProtKB files of these sequences were stored as a non-amyloid group in AMYPdb and were used for the classification procedure described below.
Database updating procedure
Since the content of UniProtKB was evolving during the various development phases of our project (631,592 entries added from release 3.2 to 6.1), we updated AMYPdb by semi-automatically sorting the 267,490 sequences extracted with WAPAM. From this group, we picked out 421 protein sequences matching highly specific AMYPdb patterns, and we assigned these sequences to the corresponding amyloid families. This updating procedure increased the AMYPdb sequence group to 1,705 members (1,063 full-length sequences and 642 fragments, Figure 2), leaving 267,069 sequences in the non-amyloid protein group.
To measure pattern performance, we used three fitness scores commonly used in classification problems: sensitivity (Sen); specificity (Spe); and correlation (CF) [49, 50]. The scores of the 3,621 patterns were calculated for each family, using only full-length sequences. For each pattern, true positives (TP) and false negatives (FN) are sequences of a family that respectively match or do not match the pattern. False positives (FP) are either amyloid sequences not belonging to the considered family but which match the pattern, or non-amyloid sequences matching the pattern. True negatives (TN) are non-amyloid sequences not matching the pattern. Therefore, when a pattern is specific for one amyloid family, it has high Sen, Spe and CF scores for that family and low scores for other amyloid families. When a pattern is conserved in several amyloid families, the Spe of the pattern remains high for each family, but Sen and CF scores can decrease dramatically. Due to calculation limitations, non-amyloid sequences were obtained from 1 of 3 data sets: the 2,032,835 proteins of UniProtKB were used for patterns matching fewer than 5,000 non-amyloid proteins; the 267,069 non-amyloid sequences resulting from the WAPAM search were used for patterns matching between 5,000 and 10,000 non-amyloid proteins; and a random pool of 50,000 proteins was used for patterns matching more than 10,000 non-amyloid proteins.
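The three scores can be computed from the TP, FP, TN and FN counts as sketched below. The text does not spell out the formulas, so this example assumes the usual definitions: Sen = TP/(TP+FN), Spe = TN/(TN+FP), and CF taken to be the Matthews correlation coefficient. The function name `pattern_scores` and the example counts are hypothetical.

```python
import math


def pattern_scores(tp, fp, tn, fn):
    """Return (Sen, Spe, CF) for one pattern and one family.

    Assumed definitions (not stated in the text):
      Sen = TP / (TP + FN)
      Spe = TN / (TN + FP)
      CF  = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    A score whose denominator is zero is reported as 0.0.
    """
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spe = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cf = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sen, spe, cf


# Hypothetical counts: a pattern matching every member of a 10-sequence
# family and nothing else in a background of 100 sequences.
print(pattern_scores(tp=10, fp=0, tn=100, fn=0))

# Hypothetical counts: a less specific pattern.
print(pattern_scores(tp=8, fp=2, tn=88, fn=2))
```

With a very large pool of true negatives, Spe stays close to 1 even when a pattern picks up false positives, which is consistent with the observation above that Spe remains high for patterns shared between families while CF drops.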
The accuracy of hypotheses deduced from bioinformatic methods strongly depends on data set quality. In the present study, the sequence sets used in the pattern discovery method were those of protein families. In order to facilitate sequence extraction, especially for the discovery of patterns linked to misfolding and aggregation, all the proteins were sorted into one of the five following quality categories:
1. Amyloid in vivo: the precursor protein, or a specific sub-segment, forms fibrils in humans or animals, or is a yeast prion. Proteins of this class are unambiguously described in the literature and are identified by specific keywords in UniProtKB ("Amyloid" or "Prion").
2. Amyloid in vitro: the polypeptide forms fibrils under experimental conditions.
3. Amyloid in silico: the polypeptide is predicted to form fibrils by computational techniques, including protein threading and molecular dynamics simulations.
4. Putative amyloid protein: the protein is a member of an amyloid family, but the amyloid properties of that specific member have not been assessed.
5. Unclassified protein: the protein family does not fulfill the definition of amyloid, but sparse data show that at least one member of the family shares some amyloid properties.
At present, classes 2 and 3 are empty. Experts are welcome to improve the relevance of the biological information by changing the status of the proteins using the Wiki system.
We are grateful for support from the Région Bretagne to S. Pawlicki. We thank the CRITT Santé Bretagne for financial support. We also thank G. Georges, M. Giraud, L. Guillot, G. Ranchy and A-S. Valin, from the Ouest-Genopole®, for providing free bioinformatics services. We thank Juliana Berland for her careful reading of the manuscript, and Erwan Rio for aid in programming. We also acknowledge the anonymous referees for their useful suggestions.
- Selkoe DJ: Folding proteins in fatal ways. Nature 2003, 426(6968):900–904. 10.1038/nature02264View ArticlePubMedGoogle Scholar
- Sipe JD, Cohen AS: Review: history of the amyloid fibril. J Struct Biol 2000, 130(2–3):88–98. 10.1006/jsbi.2000.4221View ArticlePubMedGoogle Scholar
- Westermark P, Benson MD, Buxbaum JN, Cohen AS, Frangione B, Ikeda S, Masters CL, Merlini G, Saraiva MJ, Sipe JD: Amyloid: toward terminology clarification. Report from the Nomenclature Committee of the International Society of Amyloidosis. Amyloid 2005, 12(1):1–4.View ArticlePubMedGoogle Scholar
- Morishima-Kawashima M, Ihara Y: Alzheimer's disease: beta-Amyloid protein and tau. J Neurosci Res 2002, 70(3):392–401. 10.1002/jnr.10355View ArticlePubMedGoogle Scholar
- Dische FE, Wernstedt C, Westermark GT, Westermark P, Pepys MB, Rennie JA, Gilbey SG, Watkins PJ: Insulin as an amyloid-fibril protein at sites of repeated insulin injections in a diabetic patient. Diabetologia 1988, 31(3):158–161. 10.1007/BF00276849View ArticlePubMedGoogle Scholar
- Prusiner SB: Prions. Proc Natl Acad Sci U S A 1998, 95(23):13363–13383. 10.1073/pnas.95.23.13363PubMed CentralView ArticlePubMedGoogle Scholar
- Bieler S, Estrada L, Lagos R, Baeza M, Castilla J, Soto C: Amyloid formation modulates the biological activity of a bacterial protein. J Biol Chem 2005, 280(29):26880–26885. 10.1074/jbc.M502031200View ArticlePubMedGoogle Scholar
- Dalstra HJ, van der Zee R, Swart K, Hoekstra RF, Saupe SJ, Debets AJ: Non-mendelian inheritance of the HET-s prion or HET-s prion domains determines the het-S spore killing system in Podospora anserina. Fungal Genet Biol 2005, 42(10):836–847. 10.1016/j.fgb.2005.05.004View ArticlePubMedGoogle Scholar
- Gilks N, Kedersha N, Ayodele M, Shen L, Stoecklin G, Dember LM, Anderson P: Stress granule assembly is mediated by prion-like aggregation of TIA-1. Mol Biol Cell 2004, 15(12):5383–5398. 10.1091/mbc.E04-08-0715PubMed CentralView ArticlePubMedGoogle Scholar
- Dumoulin M, Dobson CM: Probing the origins, diagnosis and treatment of amyloid diseases using antibodies. Biochimie 2004, 86(9–10):589–600. 10.1016/j.biochi.2004.09.012View ArticlePubMedGoogle Scholar
- Sunde M, Serpell LC, Bartlam M, Fraser PE, Pepys MB, Blake CC: Common core structure of amyloid fibrils by synchrotron X-ray diffraction. J Mol Biol 1997, 273(3):729–739. 10.1006/jmbi.1997.1348View ArticlePubMedGoogle Scholar
- Nelson R, Eisenberg D: Recent atomic models of amyloid fibril structure. Curr Opin Struct Biol 2006, 16(2):260–265. 10.1016/j.sbi.2006.03.007View ArticlePubMedGoogle Scholar
- Pawar AP, Dubay KF, Zurdo J, Chiti F, Vendruscolo M, Dobson CM: Prediction of "aggregation-prone" and "aggregation-susceptible" regions in proteins associated with neurodegenerative diseases. J Mol Biol 2005, 350(2):379–392. 10.1016/j.jmb.2005.04.016View ArticlePubMedGoogle Scholar
- Merlini G, Bellotti V: Lysozyme: a paradigmatic molecule for the investigation of protein structure, function and misfolding. Clin Chim Acta 2005, 357(2):168–172. 10.1016/j.cccn.2005.03.022View ArticlePubMedGoogle Scholar
- Schilling B, Gafni J, Torcassi C, Cong X, Row RH, LaFevre-Bernt MA, Cusack MP, Ratovitski T, Hirschhorn R, Ross CA, Gibson BW, Ellerby LM: Huntingtin phosphorylation sites mapped by mass spectrometry. Modulation of cleavage and toxicity. J Biol Chem 2006, 281(33):23686–23697. 10.1074/jbc.M513507200View ArticlePubMedGoogle Scholar
- Dima RI, Thirumalai D: Proteins associated with diseases show enhanced sequence correlation between charged residues. Bioinformatics 2004, 20(15):2345–2354. 10.1093/bioinformatics/bth245View ArticlePubMedGoogle Scholar
- DuBay KF, Pawar AP, Chiti F, Zurdo J, Dobson CM, Vendruscolo M: Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains. J Mol Biol 2004, 341(5):1317–1326. 10.1016/j.jmb.2004.06.043View ArticlePubMedGoogle Scholar
- Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L: Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 2004, 22(10):1302–1306. 10.1038/nbt1012View ArticlePubMedGoogle Scholar
- Galzitskaya OV, Garbuzynskiy SO, Lobanov MY: Prediction of amyloidogenic and disordered regions in protein chains. PLoS Comput Biol 2006, 2(12):e177. 10.1371/journal.pcbi.0020177
- Hamodrakas SJ, Liappa C, Iconomidou VA: Consensus prediction of amyloidogenic determinants in amyloid fibril-forming proteins. Int J Biol Macromol 2007, 41(3):295–300. 10.1016/j.ijbiomac.2007.03.008
- Lopez de la Paz M, Serrano L: Sequence determinants of amyloid fibril formation. Proc Natl Acad Sci U S A 2004, 101(1):87–92. 10.1073/pnas.2634884100
- Rousseau F, Schymkowitz J, Serrano L: Protein aggregation and amyloidosis: confusion of the kinds? Curr Opin Struct Biol 2006, 16(1):118–126. 10.1016/j.sbi.2006.01.011
- Sanchez de Groot N, Pallares I, Aviles FX, Vendrell J, Ventura S: Prediction of "hot spots" of aggregation in disease-linked polypeptides. BMC Struct Biol 2005, 5: 18. 10.1186/1472-6807-5-18
- Tartaglia GG, Cavalli A, Pellarin R, Caflisch A: Prediction of aggregation rate and aggregation-prone segments in polypeptide sequences. Protein Sci 2005, 14(10):2723–2734. 10.1110/ps.051471205
- Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D: The 3D profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A 2006, 103(11):4074–4078. 10.1073/pnas.0511295103
- Yoon S, Welsh WJ: Detecting hidden sequence propensity for amyloid fibril formation. Protein Sci 2004, 13(8):2149–2160. 10.1110/ps.04790604
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, 34(Database issue):D187–91. 10.1093/nar/gkj161
- Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic Acids Res 2006, 34(Database issue):D227–30. 10.1093/nar/gkj063
- Siepen JA, Westhead DR: The fibril_one on-line database: mutations, experimental conditions, and trends associated with amyloid fibril formation. Protein Sci 2002, 11(7):1862–1866. 10.1110/ps.0204302
- Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java alignment editor. Bioinformatics 2004, 20(3):426–427. 10.1093/bioinformatics/btg430
- Li W, Lee VM: Characterization of two VQIXXK motifs for tau fibrillization in vitro. Biochemistry 2006, 45(51):15692–15701. 10.1021/bi061422+
- Rojas Quijano FA, Morrow D, Wise BM, Brancia FL, Goux WJ: Prediction of nucleating sequences from amyloidogenic propensities of tau-related peptides. Biochemistry 2006, 45(14):4638–4652. 10.1021/bi052226q
- Aguzzi A, Sigurdson C, Heikenwaelder M: Molecular mechanisms of prion pathogenesis. Annu Rev Pathol 2008, 3: 11–40. 10.1146/annurev.pathmechdis.3.121806.154326
- Li H, Li SH, Johnston H, Shelbourne PF, Li XJ: Amino-terminal fragments of mutant huntingtin show selective accumulation in striatal neurons and synaptic toxicity. Nat Genet 2000, 25(4):385–389. 10.1038/78054
- Teilum K, Hoch JC, Goffin V, Kinet S, Martial JA, Kragelund BB: Solution structure of human prolactin. J Mol Biol 2005, 351(4):810–823. 10.1016/j.jmb.2005.06.042
- Andrade MA, Perez-Iratxeta C, Ponting CP: Protein repeats: structures, functions, and evolution. J Struct Biol 2001, 134(2–3):117–131. 10.1006/jsbi.2001.4392
- Burns CS, Aronoff-Spencer E, Dunham CM, Lario P, Avdievich NI, Antholine WE, Olmstead MM, Vrielink A, Gerfen GJ, Peisach J, Scott WG, Millhauser GL: Molecular features of the copper binding sites in the octarepeat domain of the prion protein. Biochemistry 2002, 41(12):3991–4001. 10.1021/bi011922x
- Flechsig E, Shmerling D, Hegyi I, Raeber AJ, Fischer M, Cozzio A, von Mering C, Aguzzi A, Weissmann C: Prion protein devoid of the octapeptide repeat region restores susceptibility to scrapie in PrP knockout mice. Neuron 2000, 27(2):399–408. 10.1016/S0896-6273(00)00046-5
- Zdobnov EM, Lopez R, Apweiler R, Etzold T: The EBI SRS server--recent developments. Bioinformatics 2002, 18(2):368–373. 10.1093/bioinformatics/18.2.368
- ExPASy Proteomics tools [http://www.expasy.org/tools/]
- Jonassen I, Collins JF, Higgins DG: Finding flexible patterns in unaligned protein sequences. Protein Sci 1995, 4(8):1587–1595.
- Plate-forme bio-informatique GENOUEST [http://www.genouest.org]
- Esteras-Chopo A, Serrano L, Lopez de la Paz M: The amyloid stretch hypothesis: recruiting proteins toward the dark side. Proc Natl Acad Sci U S A 2005, 102(46):16672–16677. 10.1073/pnas.0505905102
- Kallberg Y, Gustafsson M, Persson B, Thyberg J, Johansson J: Prediction of amyloid fibril-forming proteins. J Biol Chem 2001, 276(16):12945–12950. 10.1074/jbc.M010402200
- Bahadur RP, Chakrabarti P, Rodier F, Janin J: Dissecting subunit interfaces in homodimeric proteins. Proteins 2003, 53(3):708–719. 10.1002/prot.10461
- Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins 2002, 47(3):334–343. 10.1002/prot.10085
- Brazma A, Jonassen I, Eidhammer I, Gilbert D: Approaches to the automatic discovery of patterns in biosequences. J Comput Biol 1998, 5(2):279–305.
- Via A, Helmer-Citterich M: A structural study for the optimisation of functional motifs encoded in protein sequences. BMC Bioinformatics 2004, 5: 50. 10.1186/1471-2105-5-50
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.