ConSole: using modularity of Contact maps to locate Solenoid domains in protein structures
© Hrabe and Godzik; licensee BioMed Central Ltd. 2014
Received: 3 February 2014
Accepted: 17 April 2014
Published: 27 April 2014
Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins.
ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy as compared to Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families.
The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as an interactive web server and for download at http://console.sanfordburnham.org.
Current estimates suggest that approximately 30% of human proteins contain multiple repeats of short motifs and could be classified as “periodic proteins”. In many cases, proteins with such motifs fold into three-dimensional structures resembling solenoids (Greek solen (pipe) eidos (form)) and thus are called solenoid or solenoid-like proteins. Well-known examples of solenoid proteins are those containing Leucine Rich Repeats (LRRs), which are present in innate immunity receptors (NLRs and TLRs) and in thousands of other proteins with various other functions and extremely variable consensus sequences. Other examples include Ankyrin repeats involved in various protein–protein interactions and Armadillo repeats that, together with other homologous classes, such as HEAT repeats, form helical solenoids and are found in proteins involved in cell adhesion [3, 4].
Solenoid proteins evolved by a series of duplications of an ancestral motif, but the precise order of duplications is often unknown and may differ between and sometimes even within families. Accumulated mutations, deletions, and insertions lead to increasing divergence between individual repeats. For many proteins, this divergence can be quite extreme, with almost no sequence similarity between individual copies of the ancestral motif. As a result, solenoid repeats are often difficult to recognize in sequence; for instance, Pfam Hidden Markov Models recognize fewer than half of the repeats in NLR and TLR proteins. Hence, automated detection of subtle motif variations from sequence is often impossible.
Because protein structures tend to be more conserved than sequences, similarity is retained on the structural level and recognition of the repeats is thus easier. Still, repeats have significant variations of length and shape, making the precise recognition of individual solenoid units highly nontrivial. For instance, in LRR proteins the length of the individual repeats varies between 18 and 34 residues, and not a single position, including the leucines forming the telltale pattern, is universally conserved in all repeats. The local divergence of the motifs has consequences on the global-structure level. In LRR proteins, the curvature of the entire domain varies from an ideal curvature in Ribonuclease Inhibitors (RIs) or NLRs to an irregular curvature in TLRs, with consequences for the binding properties within the inner cavity of the protein.
Detection of repeats in proteins, both on the sequence and structure level, has gained importance as structures of more proteins with solenoid repeats have become known. Almost simultaneously, the sensitivity of sequence-based recognition has improved. Both these trends resulted in better appreciation of the relative number of proteins with repeats and the importance of the detection problem. Various detection algorithms of repeated motifs in protein sequences have been developed, with Gibbs sampling and RADAR as some of the first, and many others have followed [11, 12]. Some of them are focused specifically on solenoid repeats, for which Fourier-based analysis seems to produce the best results [13, 14].
To the best of our knowledge, only four detectors of repetitive units in protein structures have been described in the literature: (i) DAVROS was probably the first method for this task, with detection based on a self-alignment matrix, (ii) ProStrip performs repeat detection based on the Cα backbone angles, (iii) Raphael is specifically devoted to the detection of solenoid repeats and is based on repeated Fourier analysis of Cα coordinates with appended machine-learning classification, and (iv) a hierarchical approach is based on successive bisection of the structure into tiles for self-alignment. Raphael is the best solenoid classifier to date as it significantly exceeds the solenoid detection performance of sequence-based methods, while the hierarchical structural analysis is the most versatile approach to detect all possible types of structural repeats.
Here we present ConSole, a new method to determine the presence and specific positions of individual solenoid repeat units within protein structures. Template matching, a popular image-processing procedure, applied to contact maps determines whether individual residues are part of a solenoid domain or part of a non-solenoid segment. This approach is further generalized to classify whether a whole protein structure under scrutiny is solenoid or non-solenoid. ConSole is assessed on a benchmark dataset and directly compared to Raphael, the only publicly available solenoid detection algorithm. We furthermore demonstrate how accurate detection and subsequent structural alignment of solenoid units can be used to automatically retrieve the solenoid sequence motif from structure. Finally, as an example of a large-scale analysis enabled by the development of the ConSole algorithm, we analyze the length distribution of solenoid units in a large number of solenoid-like protein structures to automatically detect subtle variations of solenoid units in three solenoid protein families.
Pattern of solenoid units in contact maps
Contact maps (CMs) provide a simple but powerful means for protein structure comparison and alignment [18, 19], prediction [20, 21] and visualization of protein structural features. Here we show how CMs can be used to identify solenoid proteins and to calculate lengths of individual units, even for very divergent repeats.
Argmax returns the argument with the maximum value of a function. Sampling the complete CM to obtain λ is not required since λ is expected to be in the interval [6; 60] of potentially contacting residues. These boundaries are based on the fact that contacts spanning fewer than 6 residues are within the α-helical contact range and that solenoid unit lengths beyond 60 residues are virtually nonexistent. Lengths of solenoid repeats are typically in the [12; 45] interval. The repeat length λ_i of an individual solenoid unit spanning a short segment [i; i + λ_i] can also be calculated when the detection in equation 2 is confined to [i; i + λ].
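The search for λ described above can be illustrated with a minimal sketch. It assumes that each candidate offset k in [6; 60] is scored by the number of contacts on the k-th off-diagonal of a binary contact map and that the best-scoring offset is returned; this scoring, the function names and the toy contact map are illustrative assumptions and not necessarily the exact form of equation 2.

```python
def estimate_repeat_length(cm, k_min=6, k_max=60, rows=None):
    """Return the offset k in [k_min, k_max] with the most contacts cm[i][i + k].

    cm   : square binary contact map as a list of lists (assumed input format)
    rows : optional sequence of residue indices i, to confine the search
           to a local segment; defaults to all residues
    """
    n = len(cm)
    if rows is None:
        rows = range(n)
    best_k, best_score = None, -1
    for k in range(k_min, min(k_max, n - 1) + 1):
        # Count contacts on the k-th off-diagonal (assumed scoring function).
        score = sum(cm[i][i + k] for i in rows if i + k < n)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def toy_contact_map(n, period):
    """Hypothetical test map: contacts at |i - j| <= 1 and |i - j| == period."""
    return [[1 if abs(i - j) <= 1 or abs(i - j) == period else 0
             for j in range(n)] for i in range(n)]


cm = toy_contact_map(100, 24)
lam = estimate_repeat_length(cm)                            # global estimate
local_lam = estimate_repeat_length(cm, rows=range(10, 35))  # local estimate
```

Restricting `rows` to a short segment mimics the local estimate λ_i; on real structures the off-diagonal signal is noisier than in this toy map.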
Rule-based classification of solenoids vs. non-solenoids
Then, we reassess each solenoid unit by determining the individual unit length λ_i with equation 2. We annotate a solenoid unit starting at i and ending at i + λ_i if solenoid(i) is true. The algorithm then continues at i = i + λ_i + 1. If solenoid(i) is false, however, we continue either at i = i + 1 or at the end of a gap.
Template matching and SVM-based classification of solenoids vs. non-solenoids
The core algorithm implemented in ConSole is based on image-processing methods to detect solenoids and non-solenoid regions in protein structures. For this, we apply a template-matching algorithm to the contact map and classify the resulting scores with a trained Support Vector Machine (SVM).
Template matching in a contact map
where S̄ is the mean and σ_S the standard deviation of S, and Ī(x, y) is the mean and σ_I(x, y) the standard deviation of a region around x, y in I with the same size as S. NCC returns a matrix containing correlation coefficients in the range of [−1; 1]. A result of 1 indicates a perfect match, 0 indicates no similarity, and −1 indicates inverse similarity.
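For readers unfamiliar with normalized cross-correlation, the following minimal sketch evaluates the coefficient for one position of a pattern S in an image I. It uses the standard textbook definition; the function name, the top-left anchoring of the region and the handling of flat regions are assumptions made for illustration and may differ from the PyTom routines used by ConSole.

```python
from math import sqrt


def ncc_at(image, template, x, y):
    """Normalized cross-correlation between `template` and the equally sized
    region of `image` whose top-left corner is at row x, column y."""
    h, w = len(template), len(template[0])
    region = [image[x + u][y + v] for u in range(h) for v in range(w)]
    templ = [template[u][v] for u in range(h) for v in range(w)]
    n = len(templ)
    mean_r, mean_t = sum(region) / n, sum(templ) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(region, templ))
    var_r = sum((r - mean_r) ** 2 for r in region)
    var_t = sum((t - mean_t) ** 2 for t in templ)
    if var_r == 0 or var_t == 0:
        return 0.0  # assumption: flat regions are treated as uncorrelated
    return cov / sqrt(var_r * var_t)


pattern = [[1, 0], [0, 1]]
match = ncc_at([[1, 0], [0, 1]], pattern, 0, 0)    # identical region
inverse = ncc_at([[0, 1], [1, 0]], pattern, 0, 0)  # inverted region
```

Evaluating `ncc_at` for every admissible position yields the correlation matrix; the identical and inverted toy regions reproduce the two extremes of the [−1; 1] range.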
Both patterns are generated dynamically at runtime. The pattern sizes in the x and y dimensions are set to 2λ in order to accommodate d_2 fully in the solenoid pattern. This way, both patterns used in the analysis are adapted to the specific solenoid length of the given structure.
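The dynamic generation of patterns can be sketched as follows. Only the pattern size of 2λ is taken from the text; the exact pattern shapes are not reproduced here, so this sketch assumes that the solenoid pattern contains the main diagonal plus a parallel diagonal at offset λ, that the non-solenoid pattern contains the main diagonal only, and that each diagonal has a hypothetical half-width `band`.

```python
def make_pattern(lam, with_repeat_diagonal, band=1):
    """Return a 2*lam x 2*lam binary pattern as a list of lists.

    with_repeat_diagonal : if True, add diagonals at offset lam on both sides
                           of the main diagonal (assumed solenoid signature)
    band                 : hypothetical half-width of each diagonal
    """
    size = 2 * lam
    pattern = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            d = abs(i - j)
            on_main = d <= band
            on_repeat = with_repeat_diagonal and abs(d - lam) <= band
            if on_main or on_repeat:
                pattern[i][j] = 1
    return pattern


solenoid_pattern = make_pattern(12, with_repeat_diagonal=True)
non_solenoid_pattern = make_pattern(12, with_repeat_diagonal=False)
```

With a side length of 2λ, the off-diagonal at offset λ spans one full unit length inside the pattern.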
Support vector classification of correlation features
The Support Vector Machine is a machine-learning method used for supervised classification in many computational disciplines. It is especially renowned for being able to classify multidimensional data while maintaining a low error rate based on its maximum margin hyperplane determined during training.
In ConSole, we make use of the SVM to assign residues to solenoid or non-solenoid classes according to their previously determined correlation coefficients. We therefore collect correlation coefficients around the main diagonal from the NCC results as shown in Figure 2. Feature vectors are generated by concatenating 20 correlation coefficients from M_1 with 20 correlation coefficients from M_2 (Figure 2). All coefficients were extracted from their respective matrices at the positions [(i,i – 10);(i,i + 10)]. Visual inspection of the feature vectors clearly indicated significant differences between both correlation features for the same CM regions. We observed smooth correlation peaks for solenoid segments, while correlation features of globular proteins had rather noisy shapes. We conclude that the shape of the correlation peaks provides a characteristic feature for automated classification. Class labels were available from the corresponding benchmark annotation and assigned to each feature vector prior to SVM training.
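The construction of a feature vector for residue i can be sketched as below. To obtain exactly 20 coefficients per matrix, the sketch assumes a half-open column range from i − 10 up to but excluding i + 10, and it pads positions outside the matrix with 0.0; both choices, as well as the function name, are assumptions for illustration.

```python
def feature_vector(m1, m2, i, half_width=10):
    """Concatenate 2*half_width coefficients from row i of m1 and of m2.

    m1, m2 : square correlation matrices as lists of lists (assumed format)
    Columns i - half_width .. i + half_width - 1 are used; columns outside
    the matrix are padded with 0.0 (assumption).
    """
    def band(m):
        n = len(m)
        return [m[i][j] if 0 <= j < n else 0.0
                for j in range(i - half_width, i + half_width)]

    return band(m1) + band(m2)


# Toy matrices whose entries encode the column index, to make slicing visible.
m1 = [[float(j + 1) for j in range(30)] for _ in range(30)]
m2 = [[-float(j + 1) for j in range(30)] for _ in range(30)]
fv = feature_vector(m1, m2, 15)
fv_edge = feature_vector(m1, m2, 2)
```

Vectors of this kind, together with the benchmark class labels, could then be passed to a support vector classifier such as `sklearn.svm.SVC`; the kernel and parameters used in ConSole are not stated in this section.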
Final classification of structures
Setting the t value to 0.5 provided the best agreement with benchmark results. A detailed assessment of different t values is presented in Additional file 1.
Detection of solenoid sequence-motifs by solenoid unit alignment
We extended the solenoid detection algorithm with an automated feature to recognize individual solenoid motifs. It is based on the local λ_i value, where units include all residues with the indexes in [i; i + λ_i − 1]. We extend the usage of equation 6 to measure the quality of each solenoid unit and accept units as being solenoids only if their solenoid abundance solenoid([i; i + λ_i − 1]) is larger than 0.75. Otherwise, we continue with the next residue i + 1. This condition prevents beginnings or ends of non-solenoid regions from contributing to the motif detection.
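The acceptance rule for units can be illustrated with a short sketch. It assumes that the solenoid abundance of a segment is the fraction of its residues classified as solenoid, that per-residue labels are given as 0/1 values, and that the scan resumes directly after an accepted unit; `unit_length` is a hypothetical callable returning λ_i for position i.

```python
def solenoid_abundance(labels, start, stop):
    """Fraction of residues labelled solenoid (1) in the inclusive range
    [start; stop] (assumed reading of the abundance measure)."""
    window = labels[start:stop + 1]
    return sum(window) / len(window) if window else 0.0


def accept_units(labels, unit_length, threshold=0.75):
    """Scan the chain and return accepted units as (start, end) tuples."""
    units = []
    n = len(labels)
    i = 0
    while i < n:
        end = i + max(1, unit_length(i)) - 1
        if end < n and solenoid_abundance(labels, i, end) > threshold:
            units.append((i, end))
            i = end + 1  # assumption: continue right after the accepted unit
        else:
            i += 1       # rejected: continue with the next residue
    return units


# Toy labels: 5 non-solenoid, 20 solenoid, 5 non-solenoid residues.
labels = [0] * 5 + [1] * 20 + [0] * 5
units = accept_units(labels, unit_length=lambda i: 10)
```

In the toy example the first accepted unit begins slightly before the solenoid region, because a window with 80% solenoid residues already exceeds the 0.75 threshold.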
In order to improve identification of consensus motifs, we perform structural alignment of all units using rigid alignment in the FATCAT and POSA pipeline. We extract the common core determined by POSA to build a sequence alignment from the respective common core overlaps. Finally, we use WebLogo to visualize the consensus motif for the repeat.
Solenoid benchmark data
We used a previously published test dataset to assess the accuracy of ConSole. This dataset was originally established for testing sequence repeat detectors and has since been used as a benchmark for both sequence-based and structure-based repeat detectors.
The benchmark comprises 105 solenoid structures for which λ and the solenoid and non-solenoid residues have been manually annotated. A total of 247 non-solenoid structures were also included in this dataset to provide a large variety of non-solenoid samples. The dataset contains 80,347 residues in total, out of which 19,197 were annotated as being part of solenoid repeats.
All the algorithms described here were implemented in Python, utilizing additional packages such as Biopython for accessing PDB files, PyTom for correlation functions and parallel processing on multiple CPUs, and Scikit-learn to interface with the machine-learning algorithms. The algorithm used on the server is also available for download from the server page http://console.sanfordburnham.org. Residue classification results are available in XML format containing solenoid unit boundaries for further analysis.
Results and discussion
Figures of merit
Solenoid unit length estimates
The fidelity of our solenoid detector was tested on each structure from the benchmark dataset. Each automatically detected λ was compared to the manually annotated value. The accuracy, expressed as a mean standard deviation, was 2.6 residues. The original annotators postulate that an error tolerance of up to 5 residues is acceptable for structural solenoid detectors, so the accuracy of our method is well within the tolerance level.
Assessing automated solenoid classification
First, classification of residues to the solenoid or non-solenoid class was assessed for a random classifier. The underlying random distribution was adjusted to the distribution in the benchmark annotation of all residues, resulting in a distribution such that ~23% of all residues were annotated as solenoids while the remaining ~77% were non-solenoid residues.
A total of 80,347 random draws from this distribution were used to calculate the baseline performance for both of our classifiers. In the random assignments, an average of ~63% of all residues were assigned correctly while ~37% were false assignments. More precisely, 4,515 solenoids and 46,695 non-solenoids were predicted correctly. The MCC of the random residue classification was ~0.008. Extending this random assignment with equation 6 and t = 0.33 or t = 0.5 to the level of whole structures failed to detect any solenoid structure correctly.
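The baseline figures above can be checked with a short calculation. The sketch uses the standard definition of the Matthews correlation coefficient and derives the false negatives and false positives by subtracting the quoted correct predictions from the benchmark totals; since the quoted counts are averages over random draws, a single set of counts is only expected to give an MCC close to zero rather than the exact reported value.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


total_residues = 80347       # residues in the benchmark
solenoid_residues = 19197    # residues annotated as solenoid
tp, tn = 4515, 46695         # correct random predictions quoted in the text

fn = solenoid_residues - tp                     # solenoid residues missed
fp = (total_residues - solenoid_residues) - tn  # non-solenoids mislabelled
accuracy = (tp + tn) / total_residues
random_mcc = mcc(tp, tn, fp, fn)

print(f"accuracy = {accuracy:.3f}, MCC = {random_mcc:.4f}")
```

The accuracy of roughly 0.64 also agrees with the expectation p² + (1 − p)² for a random classifier that assigns the solenoid class with the benchmark frequency p.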
Table 1: Benchmark results of various solenoid classifiers (rows include Raphael S > 0 and Raphael S > 1).
In order to compare ConSole classification to other methods, we generalized classification to entire protein structures based on equation 6. Results of this generalized classification are also presented in Table 1, and the Matthews correlation coefficient was determined here to be 0.91. Based on the results published for Raphael, the MCC was determined to be 0.87 for SVM value > 0 and 0.89 for SVM value > 1.
Additional file 2: Figure S1 presents the ROC curve of whole-structure classification and provides an additional means for direct comparison to Raphael.
Solenoid consensus motif from unit alignments
Detecting solenoid motifs in sequence is difficult because (i) a solenoid repeat is typically short, lowering the signal-to-noise ratio as compared to typical domains and full-length proteins, and (ii) sequence similarity may be too weak for detection of very divergent repeats. Pfam profiles for solenoid families such as LRR, Armadillo, or Ankyrin try to address these problems by defining HMMs consisting of several repeat units for divergent family members. For instance, the LRR profile for LRR1 (PF00560) has a length of 22 residues, which is in accordance with the primary repeat interval [12–45] of class 3 repeats. However, the HMM for LRR5 (PF13306) has a length of 129 residues, encompassing approximately five repeats of the actual motif. This approach is used for other solenoid families: Ankyrin HMMs PF00023 (33 residues) and PF12796 (89 residues, 3× motif repeat), Armadillo/HEAT HMMs PF02985 (31 residues) and PF13646 (88 residues, 3× motif repeat), and others. While improving the recognition sensitivity, this approach is inconsistent and leads to confusing results, where simultaneous high-significance matches to overlapping HMMs of different lengths are possible.
Unknown solenoid structures in the PDB
Many protein coordinate sets in the PDB are not described in a peer-reviewed manuscript and also often lack any significant annotations. To identify such proteins, we parsed all PDB headers for the keywords “JRNL REF TO BE PUBLISHED,” which resulted in a large set (16,114) of structures. In the next step, we applied ConSole to identify novel, perhaps unrecognized solenoid protein structures within this set. Indeed, 132 structures from this set were classified as solenoids.
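The header filter described above can be sketched in a few lines. The sketch assumes PDB-format text files in which the journal reference is stored in a `JRNL` record with the sub-record name `REF`; the function name and the toy header lines are hypothetical and only serve as illustration.

```python
def is_unpublished(pdb_lines):
    """Return True if a JRNL REF record reads 'TO BE PUBLISHED'."""
    for line in pdb_lines:
        tokens = line.split()
        # Whitespace-insensitive match on 'JRNL REF TO BE PUBLISHED'.
        if (len(tokens) >= 5 and tokens[0] == "JRNL" and tokens[1] == "REF"
                and tokens[2:5] == ["TO", "BE", "PUBLISHED"]):
            return True
    return False


# Toy header records (hypothetical entries, not taken from real PDB files).
unpublished = is_unpublished([
    "HEADER    HYPOTHETICAL PROTEIN",
    "JRNL        AUTH   A.AUTHOR",
    "JRNL        REF    TO BE PUBLISHED",
])
published = is_unpublished([
    "JRNL        AUTH   A.AUTHOR",
    "JRNL        REF    J.MOL.BIOL.                   V. 233   123 1993",
])
```

Splitting on whitespace avoids any dependence on the exact column layout of the record.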
Next, the sequence similarity of each protein against the complete collection of PDB proteins was determined. Here we ruled out homologs of proteins that have already been annotated as solenoids. The search for already known solenoid homologs was furthermore extended to the Pfam database, eliminating proteins mapping to known solenoid families such as Ankyrin (PF00023), Armadillo (PF00514), or Leucine Rich Repeat (PF00560).
Nineteen solenoid structures remained after these steps. Many of them were TIM barrels, identified here as solenoids because the torus-like structure also produces the second diagonal feature in contact maps. Hence, they are sometimes referred to as “closed” solenoids [1, 36].
Another unrecognized solenoid protein structure was a hypothetical protein from B. thetaiotaomicron (Uniprot: A7LZL0, PDB: 3N6Z). Interestingly, a domain homologous to this protein is found in one of the classes of immunoglobulin A1 proteases, where it overlaps with an N-terminal immunoglobulin A1 protease domain. This domain was not known to consist of repeats, but detailed analysis of the automatically identified repeats performed as described in the previous paragraph suggests that repeats in this domain are distantly related to GLUG repeats. GLUG is found in other classes of immunoglobulin A1 proteases, suggesting that the different classes could actually be distantly homologous.
Analysis of solenoid unit length distributions in solenoid families
Solenoid-like protein structures, by their very nature, generally show a high degree of structural regularity. However, subtle variations at the level of individual solenoid units are possible, with accumulated mutations, deletions or insertions altering the length and shape of individual units. Such small local irregularities can add up to very significant structural differences between entire proteins and are important for functional adaptations of individual proteins.
Reliable and reproducible detection of such subtle irregularities in unit lengths for whole protein families is impossible by manual analysis. Hence, we used ConSole to automatically analyze the Leucine rich repeat, Ankyrin repeat and Armadillo repeat families for length irregularities of solenoid units. The structures were obtained from a representative set of PDB structures clustered at 90% sequence identity, a total of 140 structures (396 chains) for the LRR family, 107 structures (281 chains) for the Ankyrin and 37 structures (100 chains) for the Armadillo repeats.
We show that Ankyrin repeats are the most regular among the three families we analyzed here, with no deviation from λ for approximately 75% of all solenoid units (Figure 7). On the other hand, the Armadillo repeats turn out to be the most irregular, with 23% of all solenoid units being at least two residues off from the average length λ.
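The two regularity measures quoted above can be computed from a list of detected unit lengths as in the following sketch. It assumes that λ is the rounded average unit length of the family; the function name and the toy lengths are hypothetical and do not correspond to the analyzed structures.

```python
def length_irregularity(unit_lengths):
    """Summarize deviations of unit lengths from the average length.

    Returns the rounded average (lambda), the fraction of units with no
    deviation, and the fraction at least two residues off.
    """
    lam = round(sum(unit_lengths) / len(unit_lengths))
    deviations = [abs(length - lam) for length in unit_lengths]
    n = len(deviations)
    return {
        "lambda": lam,
        "exact": sum(d == 0 for d in deviations) / n,
        "off_by_two_or_more": sum(d >= 2 for d in deviations) / n,
    }


# Toy unit lengths for illustration only.
stats = length_irregularity([33, 33, 33, 34, 33, 31, 33, 33])
```

Applied per family, such a summary gives the fraction of perfectly regular units and the fraction of strongly deviating units.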
In this work we present ConSole, an algorithm based on a novel combination of contact map analysis and image-processing algorithms that focuses on recognition of solenoid repeats in structures of periodic proteins.
Contact maps are naturally suited for solenoid recognition because of the presence of a characteristic line parallel to the main diagonal in the contact map. Although it is the most intuitive approach for solenoid unit detection, direct analysis of contacts did not provide the desired accuracy of repeat recognition.
To improve the recognition, we used a standard technique of template matching in image analysis based on successive cross correlations of dynamically generated solenoid and non-solenoid patterns. The further classification of the computed correlation coefficients with a support vector machine allowed high-accuracy solenoid classification as measured by the MCC on the standard solenoid recognition benchmark.
ConSole is both more accurate and much faster than any available solenoid classifier. However, there are still a few examples of solenoid protein structures that pose challenges for the current implementation. Most notable are protein structures with non-solenoid segments running close to the solenoid domain. Such non-solenoid segments alter the contact patterns in a way that leads ConSole to classify neighboring solenoid residues as non-solenoids. An example is the structure of gamma carbonic anhydrase (1QRL). Another source of misclassification was the false-negative classification of complete solenoid units encapsulating long insertions (4ECO). While the insertion segment was classified correctly as non-solenoid, residues in solenoid units in contact with the insertion were wrongly classified as non-solenoids.
One interesting application of ConSole is to analyze individual solenoid units and retrieve their consensus motifs from structural alignments. As we demonstrated, this application is robust enough to be integrated in a completely automated pipeline. We showed that separation of individual solenoid units and subsequent multiple structure alignment reliably detects solenoid-specific motifs. Consensus motifs stemming from distinct solenoid families were retrieved successfully for individual structures and indicate that current Pfam HMMs for solenoids were trained using sequences that were too long.
Finally, we extended ConSole analysis from individual structures to large groups of proteins in order to analyze the extent of structural irregularities within each family. Such local irregularities are correlated with functional differences between homologs from the same family, such as the difference between the regular Ribonuclease Inhibitor-like proteins and the irregular TLRs within the LRR family. We were also able to compare the irregularity patterns and show that Ankyrin structures generally are more regular than LRRs and Armadillo repeats.
Thus, we believe that ConSole would be useful for further sequence- or structure-based analysis of solenoid proteins as it allows the user to reliably identify consensus motifs and to detect structural irregularities, leading either to developing more accurate motif definitions or to structure analysis of individual units and detecting their variations.
Availability of supporting data
The data sets supporting the results of this article are available online at http://console.sanfordburnham.org in XML format and as 3D visualizations.
The authors would like to thank Dr. Lukasz Jaroszewski for valuable discussions and Cindy Cook for help with editing. The project was supported by NIH R01 GM101457.
1. Kajava AV: Tandem repeats in proteins: from sequence to structure. J Struct Biol. 2012, 179: 279-288. 10.1016/j.jsb.2011.08.009.
2. Kobe B, Kajava AV: The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol. 2001, 11: 725-732. 10.1016/S0959-440X(01)00266-4.
3. Sedgwick SG, Smerdon SJ: The ankyrin repeat: a diversity of interactions on a common structural framework. Trends Biochem Sci. 1999, 24: 311-316. 10.1016/S0968-0004(99)01426-7.
4. Tewari R, Bailes E, Bunting KA, Coates JC: Armadillo-repeat protein functions: questions for little creatures. Trends Cell Biol. 2010, 20: 470-481. 10.1016/j.tcb.2010.05.003.
5. Kobe B, Kajava AV: When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem Sci. 2000, 25: 509-515. 10.1016/S0968-0004(00)01667-4.
6. Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE: RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics. 2012, 28: 3257-3264. 10.1093/bioinformatics/bts550.
7. Proell M, Riedl SJ, Fritz JH, Rojas AM, Schwarzenbacher R: The Nod-like receptor (NLR) family: a tale of similarities and differences. PLoS One. 2008, 3: e2119. 10.1371/journal.pone.0002119.
8. Kawai T, Akira S: Toll-like receptors and their crosstalk with other innate receptors in infection and immunity. Immunity. 2011, 34: 637-650. 10.1016/j.immuni.2011.05.006.
9. Neuwald AF, Liu JS, Lawrence CE: Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci. 1995, 4: 1618-1632. 10.1002/pro.5560040820.
10. Heger A, Holm L: Rapid automatic detection and alignment of repeats in protein sequences. Proteins. 2000, 41: 224-237. 10.1002/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z.
11. Biegert A, Söding J: De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics. 2008, 24: 807-814. 10.1093/bioinformatics/btn039.
12. Newman AM, Cooper JB: XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics. 2007, 8: 382. 10.1186/1471-2105-8-382.
13. Marsella L, Sirocco F, Trovato A, Seno F, Tosatto SCE: REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform. Bioinformatics. 2009, 25: i289-i295. 10.1093/bioinformatics/btp232.
14. Vo A, Nguyen N, Huang H: Solenoid and non-solenoid protein recognition using stationary wavelet packet transform. Bioinformatics. 2010, 26: i467-i473. 10.1093/bioinformatics/btq371.
15. Murray KB, Taylor WR, Thornton JM: Toward the detection and validation of repeats in protein structure. Proteins. 2004, 57: 365-380. 10.1002/prot.20202.
16. Sabarinathan R, Basu R, Sekar K: ProSTRIP: a method to find similar structural repeats in three-dimensional protein structures. Comput Biol Chem. 2010, 34: 126-130. 10.1016/j.compbiolchem.2010.03.006.
17. Parra R, Espada R, Sánchez I: Detecting repetitions and periodicities in proteins by tiling the structural space. J Phys Chem B. 2013, 117: 12887-12897. 10.1021/jp402105j.
18. Holm L, Sander C: Mapping the protein universe. Science. 1996, 273: 595-603. 10.1126/science.273.5275.595.
19. Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993, 233: 123-138. 10.1006/jmbi.1993.1489.
20. Fariselli P, Olmea O: Progress in predicting inter-residue contacts of proteins with neural networks and correlated mutations. Proteins. 2001, 162: 157-162.
21. Bartoli L, Capriotti E, Fariselli P, Martelli PL, Casadio R: The pros and cons of predicting protein contact maps. Methods Mol Biol. 2008, 413: 199-217.
22. Vehlow C, Stehr H, Winkelmann M, Duarte JM, Petzold L, Dinse J, Lappe M: CMView: interactive contact map visualization and analysis. Bioinformatics. 2011, 27: 1573-1574. 10.1093/bioinformatics/btr163.
23. Godzik A, Skolnick J, Kolinski A: Regularities in interaction patterns of globular proteins. Protein Eng. 1993, 6: 801-810. 10.1093/protein/6.8.801.
24. Kumar BVKV, Mahalanobis A, Juday RD: Correlation Pattern Recognition. 2006, Cambridge: Cambridge University Press, http://www.cambridge.org/us/academic/subjects/engineering/image-processing-and-machine-vision/correlation-pattern-recognition?format=HB
25. Boser B, Guyon I, Vapnik V: A Training Algorithm for Optimal Margin Classifiers. Proc. of the 5th Ann. ACM Workshop on Comp. Learning Theory. 1992, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.3818
26. Ye Y, Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics. 2003, 19 (Suppl 2): 246-255.
27. Ye Y, Godzik A: Multiple flexible structure alignment using partial order graphs. Bioinformatics. 2005, 21: 2362-2369. 10.1093/bioinformatics/bti353.
28. Altman RB, Gerstein M: Finding an average core structure: application to the globins. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 19-27.
29. Crooks G, Hon G: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
30. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL: Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009, 25: 1422-1423. 10.1093/bioinformatics/btp163.
31. Hrabe T, Chen Y, Pfeffer S, Cuellar LK, Mangold A-V, Förster F: PyTom: a python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J Struct Biol. 2012, 178: 177-188. 10.1016/j.jsb.2011.12.003.
32. Pedregosa F, Varoquaux G: Scikit-learn: machine learning in python. J Mach Learn Res. 2011, 12: 2825-2830.
33. Baldi P, Brunak S, Chauvin Y: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics. 2000, 16: 412-424. 10.1093/bioinformatics/16.5.412.
34. Kajava AV: Review: proteins with repeated sequence structural prediction and modeling. J Struct Biol. 2001, 134: 132-144. 10.1006/jsbi.2000.4328.
35. Bella J, Hindle KL, McEwan PA, Lovell SC: The leucine-rich repeat structure. Cell Mol Life Sci. 2008, 65: 2307-2333. 10.1007/s00018-008-8019-0.
36. Alvarez M: Triose-phosphate Isomerase (TIM) of the Psychrophilic Bacterium Vibrio marinus. Kinetic and structural properties. J Biol Chem. 1998, 273: 2199-2206. 10.1074/jbc.273.4.2199.
37. Medzhitov R: Toll-like receptors and innate immunity. Nat Rev Immunol. 2001, 1: 135-145. 10.1038/35100529.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.