- Methodology article
- Open Access
Identifying tandem Ankyrin repeats in protein structures
© Chakrabarty and Parekh; licensee BioMed Central. 2014
Received: 9 July 2014
Accepted: 18 December 2014
Published: 30 December 2014
Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, so structure-based approaches are desirable. The proposed graph spectral approach represents a protein structure as a graph to detect one of the most frequently observed structural repeats, the Ankyrin repeat.
A large number of studies have shown that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a distinctive repetitive pattern for contiguous repeating units, enabling detection of both the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and 245 non-solenoid proteins, and the predictions are compared with UniProt annotation, the sequence-based approach RADAR, and the structure-based approach ConSole. To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a distinct eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. The method is implemented as a web server, AnkPred, freely available at ‘bioinf.iiit.ac.in/AnkPred’.
AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
In this study we address an important pattern recognition problem in protein structures, viz. the identification of structural tandem repeats. Repeats are ubiquitous in protein sequences and range from single amino acid repetitions, e.g., runs of glutamine in the protein huntingtin, to large globular domains of 100 or more residues that fold independently. Kajava (2001) classified repetitive proteins structurally on the basis of repeat length and the possible 3-dimensional structures of these proteins. Repeats of intermediate length, 20–50 amino acids, are the most commonly observed; they form integrated assemblies that provide multiple binding sites. This class of non-globular repeat proteins forms various 3-dimensional folds, viz., spirals, solenoids, closed structures, etc. Examples of such repeats include the leucine-rich repeat (LRR), Ankyrin repeat (ANK), armadillo (ARM)/HEAT repeat, tetratricopeptide repeat (TPR), Kelch repeat, etc.
Earlier approaches for the identification of repeats in protein sequences range from methods based on Fourier analysis of amino acid sequences to short-string searches, sequence-alignment based approaches, and HMM-profile based methods. The Fourier transform methods fail in the presence of insertions between the repeating units, while the alignment-based methods fail when the similarity between the repeating units is weak or undetectable. A comprehensive review of sequence-based approaches for the detection of protein repeats is given by Kajava (2012). As each copy of a repeat accumulates independent, uncoordinated mutations over evolution, the weak similarity between repeated copies makes them difficult, and in certain cases impossible, to identify by sequence-based approaches. The low similarity implies that the functional constraints on individual repeats are relatively weak compared to the constraints imposed on the repeat assembly as a whole. Since proteins are more conserved at the structure level, identifying repeats at the structural level is desirable. Over the last decade various structure-based methods have been proposed to predict internal repeats in protein structures. The earlier approaches, DAVROS and OPAAS, are based on structure-structure alignment of a protein to itself and are computationally very intensive. Neither algorithm is maintained any longer, nor is the Propeat database constructed using the OPAAS algorithm. The Swelfe and ProSTRIP methods detect internal structural repeats by self-alignment, using dynamic programming, of the sequence of α characters (the backbone dihedral angles). These methods rely on the periodicity of dihedral angles (a result of the repetition of secondary structure elements) and fail in the case of large insertions/deletions.
The Internal Repeat Identification System (IRIS) combines sequence-based and structure-based approaches for identifying internal repeats. When structural information for a protein is available, it verifies the sequence-based prediction by comparing the structure with itself or with its benchmark database of internal repeat units. Another recent method, based on the distribution of structural alignments of continuous fragments, is given by Parra et al. RAPHAEL, based on Fourier analysis of Cα coordinates in combination with a support vector machine (SVM), mimics the visual inspection of a human expert and classifies a protein as solenoid or non-solenoid. A large number of novel solenoid repeat proteins have been identified by this approach. With a large number of protein structures now available, reliable automatic methods of analysis are required. Algorithms from computer science have been widely used for identifying biological patterns, and concepts from graph theory have been promising. ConSole is one such recent method; it transforms the protein structure into a network and applies a rule-based machine learning technique to identify solenoid repeats in proteins. Its authors show that it performs better than RAPHAEL on benchmark datasets. Here we propose a computationally efficient algorithm, based on graph theory and the secondary structure architecture of the repeating unit, for the identification of an important protein repeat family, viz., Ankyrin (ANK) repeats. On the dataset of known ANK repeat proteins, the proposed approach performs better than ConSole, especially in identifying the terminal repeats. The complete set of Protein Data Bank (PDB) structures is also analyzed to identify previously unrecognized ANK repeat proteins.
Because repeat proteins act as binding molecules in innumerable biological processes, viz., cell-cycle regulation, transcriptional regulation, cell differentiation, apoptosis, plant defence and bacterial invasion, their particular architecture makes them attractive targets for protein engineering. The design and engineering of repeat proteins may help to elucidate their structural and biophysical properties, such as the dependence of stability and folding on the number of repeats and the importance of key intra- and inter-repeat interactions; hence considerable effort is being made in this direction, with potentially important biotechnological and medical applications. So far, repeat proteins have been designed from consensus sequences based on multiple sequence alignments of homologous proteins. However, because of the low sequence similarity among repeating motifs and their occurrence in non-homologous proteins, identifying repeats at the structural level and building the consensus from a structure-based multiple alignment of repeating units would be more reliable. Identifying repeats at the structural level is the first step in this direction.
A number of repeat proteins have been designed and engineered, e.g., ANK (called DARPins), TPR and LRR. In this paper we present in detail the training and performance of the proposed approach on ANK repeat proteins and then show that it can be extended to the detection of other repeat proteins: TPR, LRR, ARM/HEAT and Kelch. Below we briefly discuss the features and functions of ANK and other repeats.
The analysis of the proposed graph-based approach is also discussed for other repeat types, namely tetratricopeptide repeat (TPR), armadillo (ARM), leucine-rich repeat (LRR) and Kelch. The example repeats were chosen to represent the different structural classes α, β and α/β: some, such as TPR, are very similar to the ANK motif, while LRR and Kelch have very different repeat lengths and architectures. A TPR motif is a 34-residue α-domain motif in which two anti-parallel α helices form the helix-turn-helix core of the repeat, and contiguous repeating units stack to form a super-helical structure. ARM is another example of an α-domain repeat, typically 42 residues long, with each repeat unit comprising three helices; contiguous repeats form a super-helical structure. The Kelch repeat forms a β-propeller structure, with 44–56 residues constituting 4–6 anti-parallel β strands in each repeat unit. A typical LRR unit, an anti-parallel helix-strand motif 24 residues long, is an example of an α/β-domain motif, and contiguous copies form a horseshoe-like structure. A repeat protein typically contains 4–8 copies of the repeat, but more copies are observed depending on the function of the protein. The arrangement of repeating units in each repeat type provides structural stability and forms binding pockets for a wide range of protein-protein interactions.
$$A X = \lambda X \quad\Longleftrightarrow\quad x_i = \frac{1}{\lambda} \sum_{j} A_{ij}\, x_j$$

where X is the eigenvector of the adjacency matrix A with eigenvalue λ. The component $x_i$, which depends on both the number and the quality of the connections of node i, is thus proportional to the sum of the centralities of its adjacent neighbours and is called the eigenvector centrality of node i. It assigns relative scores to all nodes in the network on the principle that connections to high-scoring nodes contribute more. In our earlier work we carried out a comparative analysis of various graph centrality measures for identifying tandemly repeated structural motifs, and observed that the principal eigenvector of the adjacency matrix reliably captures the repetitive pattern of the structural units. Below we present our proposed approach based on this eigenvector centrality.
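As an illustration of the eigenvector centrality defined above, the following minimal sketch builds a binary Cα contact network and takes the principal eigenvector of its adjacency matrix. The 7 Å contact cutoff is a hypothetical choice (the cutoff used by AnkPred is not stated in this section), and `contact_adjacency` and `eigenvector_centrality` are illustrative names, not part of AnkPred.

```python
import numpy as np

def contact_adjacency(ca_coords, cutoff=7.0):
    """Binary Cα contact map; the 7 Å cutoff is a hypothetical choice."""
    coords = np.asarray(ca_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def eigenvector_centrality(adj):
    """Principal eigenvector of a symmetric adjacency matrix, scaled to max 1."""
    eigvals, eigvecs = np.linalg.eigh(adj)  # eigenvalues in ascending order
    x = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    # For a connected graph the Perron vector has entries of a single sign;
    # abs() removes the arbitrary overall sign returned by the solver.
    x = np.abs(x)
    return float(eigvals[-1]), x / x.max()
```

For three Cα atoms spaced 5 Å apart on a line, the network is a path; the middle residue gets centrality 1 and the two ends 1/√2. Plotting `x` against residue index gives the kind of per-residue profile the method inspects.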
In this study, a total of 125 Ankyrin repeat proteins were manually collected from the Pfam, PROSITE (release 20.103) and UniProt (release 2014_05) databases, together with 5 designed Ankyrin proteins from the SCOP database (release 1.75). This is a redundant set, with one or more structures corresponding to the same UniProt sequence. From this set, the training dataset of 58 proteins was constructed by selecting, from each cluster of structures corresponding to a unique UniProt entry, the structure with the highest resolution and maximum sequence coverage. To test the performance of the algorithm, we used a set of 370 proteins, the benchmark dataset used by ConSole (http://console.sanfordburnham.org/GT/index.html), comprising 125 known ANK repeat proteins (positive test set) and 245 non-solenoid proteins (negative test set). The complete set of 98,341 protein structures in the Protein Data Bank (as of June 17, 2014) was downloaded for detecting new Ankyrin repeat proteins.
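The non-redundant training set selection described above can be sketched as picking one representative per UniProt entry. The text does not say how resolution and coverage are combined; this sketch assumes resolution is compared first and sequence coverage breaks ties. `Structure` and `pick_representatives` are hypothetical names, and the records in the usage test are placeholders, not real PDB entries.

```python
from dataclasses import dataclass

@dataclass
class Structure:
    pdb_chain: str      # structure/chain identifier
    uniprot: str        # UniProt accession the chain maps to
    resolution: float   # in Å; lower is better
    coverage: float     # fraction of the UniProt sequence observed (0-1)

def pick_representatives(structures):
    """Keep one structure per UniProt entry.

    Assumed priority: best (lowest) resolution first, then highest coverage.
    """
    best = {}
    for s in structures:
        key = (s.resolution, -s.coverage)  # smaller tuple = preferred structure
        current = best.get(s.uniprot)
        if current is None or key < (current.resolution, -current.coverage):
            best[s.uniprot] = s
    return sorted(best.values(), key=lambda s: s.uniprot)
```

Swapping the tuple order to `(-s.coverage, s.resolution)` would give coverage priority instead; which ordering matches the original selection is not stated.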
Characteristic features of Ankyrin repeat proteins
The two helices are anti-parallel. Since the two helices of the ANK motif have comparable lengths, a simple distance measure between the helix end points is used to confirm that they are anti-parallel, where $S_1$ is the start of $H_1$ and $E_1$, $E_2$ are the ends of $H_1$, $H_2$.
In the $A_{levc}$ profile, $\bar{A}_{H_1} > \bar{A}_{H_2}$, where $\bar{A}_{H_1}$ and $\bar{A}_{H_2}$ are the average magnitudes of $A_{levc}$ in $H_1$ and $H_2$. This condition confirms that the first helix, $H_1$, of the ANK motif is buried inside, while the second helix, $H_2$, lies on the outer surface.
The distance between the peak positions of the two helices in the $A_{levc}$ profile ranges from 5 to 15 residues, to accommodate insertions/deletions in the individual secondary structure elements (Table 1).
The helix-turn-helix core is at least 13 residues long, as each helix spans at least 4–6 residues with a coil of at least 2 residues in between (Table 1).
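Three of the criteria above can be checked directly on a per-residue centrality profile, as in the following minimal sketch. `satisfies_ank_criteria` is a hypothetical helper; helix spans are assumed to be 0-based inclusive `(start, end)` indices taken from a secondary structure assignment, the "peak" of a helix is assumed to be its maximum $A_{levc}$ value, and the anti-parallel distance test and the minimum coil length are not reproduced because their exact formulas are not given here.

```python
def satisfies_ank_criteria(profile, h1, h2):
    """Check a candidate helix pair against three ANK criteria.

    profile: per-residue A_levc values (list of floats).
    h1, h2: (start, end) residue indices, inclusive, with h1 preceding h2.
    The anti-parallel distance test and coil-length check are omitted.
    """
    (s1, e1), (s2, e2) = h1, h2
    seg1 = profile[s1:e1 + 1]
    seg2 = profile[s2:e2 + 1]
    # Assumed peak definition: position of the maximum A_levc within each helix.
    peak1 = s1 + max(range(len(seg1)), key=seg1.__getitem__)
    peak2 = s2 + max(range(len(seg2)), key=seg2.__getitem__)
    peaks_ok = 5 <= peak2 - peak1 <= 15      # peak distance of 5-15 residues
    core_ok = (e2 - s1 + 1) >= 13            # helix-turn-helix core length
    # H1 is buried: its mean centrality exceeds that of H2.
    buried_ok = sum(seg1) / len(seg1) > sum(seg2) / len(seg2)
    return peaks_ok and core_ok and buried_ok
```

A helix pair whose core spans only 10 residues is rejected even if its peaks are well placed.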
Once the presence of an ANK motif is predicted by the above criteria, we define its start and end boundaries as follows. In earlier studies on constructing the consensus for designed ANK repeat proteins, the first helix is considered to start at the 5th residue of the Ankyrin repeat. We observed this to be true in 78% of the ANK motifs annotated in the UniProt database (of the remaining 22%, ~53% are terminal repeats, which are generally incomplete). Hence, we take the position 4 residues before the start of the first helix (based on STRIDE annotation) as the start of the Ankyrin repeat. To define the end of the ANK motif, we look for a β-turn between the end of the second helix and either the start of the next repeat or 15 residues downstream, whichever comes first. The position in the turn with the lowest $A_{levc}$ value is taken as the end of the repeat. In some cases the terminal repeat lies at the end of the protein chain and has no β-turn; in such cases the end of the chain is taken as the end of the terminal repeat. Finally, if at least two contiguous ANK repeats are identified within a threshold distance (≤17 residues, half of a typical ANK motif), a tandem ANK repeat region is reported.
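The boundary and grouping rules above can be sketched as two small functions. Both names are hypothetical, indices are 0-based and inclusive, and `turn_residues` is assumed to be the set of residues assigned as turns (e.g. from STRIDE). Two details are assumptions not given in the text: the distance between consecutive repeats is measured from the end of one to the start of the next, and when no turn is found away from the chain end, the end of the search window is used.

```python
def repeat_bounds(h1_start, h2_end, profile, turn_residues, next_start=None):
    """Start/end (0-based, inclusive) of one ANK repeat."""
    n = len(profile)
    start = max(0, h1_start - 4)  # helix 1 begins at the repeat's 5th residue
    limit = h2_end + 15           # search at most 15 residues past helix 2
    if next_start is not None:
        limit = min(limit, next_start - 1)
    limit = min(limit, n - 1)
    turn = [i for i in range(h2_end + 1, limit + 1) if i in turn_residues]
    if turn:
        end = min(turn, key=lambda i: profile[i])  # lowest A_levc in the turn
    elif limit == n - 1:
        end = n - 1   # terminal repeat without a beta-turn: end of chain
    else:
        end = limit   # assumption: fall back to the end of the search window
    return start, end

def tandem_regions(repeats, max_gap=17, min_copies=2):
    """Group (start, end) repeats into tandem regions of >= min_copies."""
    regions, current = [], []
    for rep in sorted(repeats):
        if current and rep[0] - current[-1][1] > max_gap:
            if len(current) >= min_copies:
                regions.append(current)
            current = []
        current.append(rep)
    if len(current) >= min_copies:
        regions.append(current)
    return regions
```

An isolated repeat separated from its neighbours by more than 17 residues is dropped, matching the requirement of at least two contiguous copies.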
We have developed a web server for the identification of tandem repeats in protein structures by implementing the above algorithm. Python scripts were developed to automate the entire process: construction of the protein contact network, computation of the eigen spectra, and secondary structure assignment using STRIDE. An implementation of the algorithm with a simple graphical output is deployed as a web server for detecting ANK repeats, named Ankyrin Repeat Identification by Graph Spectral Analysis (AnkPred), and is freely accessible at http://bioinf.iiit.ac.in/AnkPred/. A user can enter a PDB ID or upload a structure in PDB format and identify ANK repeats in a chosen chain (default A). The output gives the number of predicted repeat copies, the coordinates of the repeat units, and the 3-dimensional structure of the protein with each repeat unit highlighted in a different color and the non-repeat region in grey. The overall complexity of the algorithm is O(2|n| + |e| + |h|), where n is the number of nodes (Cα atoms) in the network, e the number of edges and h the total number of helices in the protein structure. The algorithm is computationally very efficient: analyzing the dataset of 370 proteins takes about 6.5 minutes on an Intel Core i5 processor with 4 GB RAM.
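The linear terms in the complexity above are consistent with computing the principal eigenvector by power iteration on a sparse contact network, where each iteration visits every node and edge once. The following is a minimal sketch under that assumption; it is not known whether AnkPred uses power iteration or a dense eigensolver, and `power_iteration_centrality` is a hypothetical name. The iteration uses A + I instead of A: the identity shift leaves the eigenvectors unchanged but prevents the oscillation that plain power iteration shows on bipartite graphs.

```python
def power_iteration_centrality(neighbours, iters=1000, tol=1e-10):
    """Eigenvector centrality by power iteration on adjacency lists.

    neighbours: dict mapping each node to a list of adjacent nodes (undirected).
    Iterates x <- (A + I) x and rescales so the largest entry is 1.
    """
    nodes = list(neighbours)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # One sweep touches every node and every edge once: O(|n| + |e|).
        y = {v: x[v] + sum(x[u] for u in neighbours[v]) for v in nodes}
        norm = max(y.values())
        y = {v: val / norm for v, val in y.items()}
        if max(abs(y[v] - x[v]) for v in nodes) < tol:
            return y
        x = y
    return x
```

On the three-node path graph this converges to the same centralities as the dense eigendecomposition: 1 for the middle node and about 0.707 for the ends.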
Results and discussion
Prediction of repeat regions for a representative set of 15 proteins compared with UniProt annotation, RADAR and ConSole output
3-35, 36–68, 69-92
3-33, 34–67, 68-93
1-33, 34–66, 67–99, 100-125
1-33, 34–66, 67–100, 101-126
Protein phosphatase 1 regulatory subunit 12A
39-68, 72–101, 105–134, 138–164, 198–227, 231-260
47-73, 77–106, 110–139, 203–232, 236-265
44-74, 75–105, 106–136, 137–167, 175–205, 206–236, 237-267
36-71, 72–104, 105–138, 198–231, 232-267
Mouse GABP α/β domain
5-34, 37–66, 70–99, 103–132, 136-166
13-34, 40–67, 73–100, 106-133
16-47, 48–79, 80-111
5-36, 37–69, 70–103, 104–136, 137-157
TRPV6 Ankyrin repeat domain
44-74, 78–107, 116–145, 162–191, 195–236, 238-267
46-91, 93–129, 164-208
44-77, 78–114, 116–142, 162–194, 195–237, 238-265
Yeast Nas6p complex with proteasome subunit, rpt3
1-30, 35–64, 71–100, 106–135, 139–168, 173-203
5-37, 38–70, 74–106, 109–141, 142-175
13-44, 45–76, 77–108, 112–143, 144–175, 176-207
3-34, 35–70, 71–105, 106–138, 139–172, 173–207, 208-228
D34 of human Ankyrin-R
1N11 (A)
403-432, 436–465, 469–498, 502–531, 535–564, 568–597, 601–630, 634–663, 667–696, 700–729, 733–762, 766-795
406-431, 438–458, 471–487, 504–530, 537–563, 570–596, 603–629, 636–662, 669–695, 702–728, 735-761
415-446, 447–478, 479–510, 511–542, 543–574, 575–606, 607–638, 639–670, 671–702, 703–734, 735–766, 767-802
405-432, 436–458, 469–487, 503–535, 536–567, 568–600, 601–633, 634–666, 667–699, 700–733, 734–766, 767-796
3-36, 37–69, 70–102, 103–135, 136–168, 169–201, 202-226
49-78, 82–111, 115–144, 148–177, 181-210
23-53, 54–84, 85–115, 116–146, 147–177, 178-208
5-38, 39–71, 72–105, 106–137, 138–170, 171–204, 205-226
926-957, 958–990, 991–1024, 1025-1067
Tumor Suppressor P15(INK4B)
5-34, 38–66, 71–100, 104-130
24-44, 56–86, 88-119
1927-1956, 1960–1990, 1994–2023, 2027–2056, 2060-2089
1884-1930, 1931–1963, 1964–1997, 1998–2030, 2031–2063, 2064-2096
1919-1950, 1951–1982, 1983–2014, 2015–2046, 2047–2078, 2079-2110
1928-1960, 1961–1994, 1995–2027, 2028–2060, 2061–2094, 2095-2122
S. cerevisiae Swi6 Ankyrin repeat fragment
318-346, 347–383, 384–469, 470–498, 499-514
316-323, 324–331, 470-477
Human Osteoclast Stimulating Factor
72-101, 105–135, 139-168
83-114, 115–147, 148-180
72-104, 105–138, 139-177
Ankyrin repeat domain of Huntingtin interacting protein 14
89-118, 123–155, 156–188, 189–219, 224-253
64-90, 94–123, 128–157, 161–190, 194–225, 229-252
70-101, 102–133, 134–165, 166–197, 198–229, 230-261
57-88, 89–122, 123–156, 157–189, 190–223, 224–257, 258-281
148-180, 181–213, 214–246, 247–279, 280-313
29-60, 61–92, 93–124, 125-156
149-180, 181–213, 214–246, 247–280, 281-310
Performance of the proposed algorithm
Performance of the proposed approach
a) Number of ANK proteins predicted on a dataset of 370 proteins (annotated in UniProt is 125)
b) Number of ANK motifs predicted in 125 known ANK repeat proteins (annotated in UniProt is 584)
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

where TP is the number of correctly predicted known Ankyrin repeat proteins, FN the number of known Ankyrin repeat proteins missed by our approach, FP the number of proteins predicted by our approach as containing tandem ANK repeats but not annotated as Ankyrin proteins, and TN the number of proteins correctly predicted by our approach as non-Ankyrin proteins. As there were only three false negatives (1SW6, 2ETB and 3ZRH) and no false positives, the sensitivity (122/125 ≈ 0.98) and specificity (245/245 = 1) of the algorithm are very high.
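The protein-level figures follow directly from the counts stated above: 125 known ANK proteins with 3 missed, and 245 negatives with no false positives. A short check, using a hypothetical helper name:

```python
def protein_level_metrics(tp, fn, tn, fp):
    """Sensitivity and specificity from protein-level counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Counts implied by the text: 125 positives with 3 FN, 245 negatives with 0 FP.
sens, spec = protein_level_metrics(tp=125 - 3, fn=3, tn=245, fp=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
# prints: sensitivity=0.976 specificity=1.000
```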
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$

where TP is the number of ANK motifs correctly predicted by the method in the known dataset of 125 proteins, FP the number of ANK motifs predicted by the method but not annotated in the UniProt database, and FN the number of annotated ANK motifs missed by the method. Both the sensitivity and the precision of the proposed approach, AnkPred, are ~0.88, higher than those of ConSole (0.72 and 0.79, respectively) and RADAR (0.68 and 0.86, respectively). The terminal copies are known to have low sequence conservation, resulting in the lower sensitivity of RADAR. We recognise that the sensitivity of our algorithm, given its dependence on secondary structure assignment, might be further improved.
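Motif-level evaluation requires deciding when a predicted motif "matches" an annotated one, and the criterion is not stated here. The following is a minimal sketch that assumes a hypothetical overlap rule: a prediction counts as a true positive if it covers at least half of a not-yet-matched annotated motif. `overlap` and `motif_level_metrics` are illustrative names, and the intervals in the test are toy values.

```python
def overlap(a, b):
    """Residues shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def motif_level_metrics(predicted, annotated, min_frac=0.5):
    """Return (sensitivity, precision) for predicted vs annotated motifs.

    Hypothetical matching rule: a prediction is a true positive if it covers
    at least `min_frac` of an annotated motif not matched before; each
    annotated motif can be matched at most once.
    """
    matched = set()
    for p in predicted:
        for i, a in enumerate(annotated):
            if i not in matched and overlap(p, a) >= min_frac * (a[1] - a[0] + 1):
                matched.add(i)
                break
    tp = len(matched)
    fp = len(predicted) - tp
    fn = len(annotated) - tp
    sensitivity = tp / (tp + fn) if annotated else 0.0
    precision = tp / (tp + fp) if predicted else 0.0
    return sensitivity, precision
```

A stricter `min_frac` penalises boundary shifts more heavily, which matters most for the incomplete terminal repeats discussed above.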
This is in very good agreement with the consensus ANK motifs proposed by Kohl et al. and Mosavi et al. The conserved tetrapeptide motif TPLH at positions 4–7, glycine at positions 2 and 13, and leucine at positions 21–22 confirm the accuracy of the repeat boundaries predicted by the proposed approach.
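A simple way to compare predicted repeat units with the published consensus is to take the most frequent residue in each column of the aligned repeats and check the conserved positions. The following is a minimal sketch, assuming the repeats have already been aligned to equal length (gaps as `-`); `column_consensus` and `has_tplh` are hypothetical helpers, and the sequences in the test are toy strings, not real ANK repeats.

```python
from collections import Counter

def column_consensus(aligned_repeats):
    """Most frequent residue per column of equal-length aligned repeats."""
    length = len(aligned_repeats[0])
    if any(len(r) != length for r in aligned_repeats):
        raise ValueError("repeats must be aligned to the same length")
    consensus = []
    for col in zip(*aligned_repeats):
        counts = Counter(c for c in col if c != "-")  # ignore gap characters
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)

def has_tplh(consensus):
    """True if TPLH occupies positions 4-7 (1-based), as in the ANK consensus."""
    return consensus[3:7] == "TPLH"
```

A full analysis would use a proper multiple alignment and position-specific frequencies rather than a bare majority vote.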
Analysis of the Protein Data Bank
Functional analysis of previously unrecognized ankyrin proteins
Example proteins with binding sites in the predicted Ankyrin repeat region
Predicted Ankyrin region
Binding partner in predicted region obtained from PDBsum
285-319, 324–381, 397–444, 458-504
Kifunensine and Sulphate
Kifunensine, N-Acetyl-D-Glucosamine and Mannose
894-923, 929–959, 965-995
FAD and MHF
FAD, IMD and MES
We predicted Ankyrin repeats in two mannosidase structures, 1FO3 (human) and 1KRF (P. citrinum). Kifunensine (KIF) is an inhibitor of mannosidases and regulates the activity of these proteins. In PDBsum, the KIF binding sites of 1FO3 and 1KRF are annotated within the region predicted as Ankyrin repeat by our approach. This suggests novel interactions mediated by these Ankyrin repeat regions. A systematic analysis of other previously unrecognized Ankyrin proteins could similarly identify their interacting partners, leading to an understanding of their functional roles.
Analysis of modelled ankyrin proteins
Analysis of other structural repeats
Most proteins with tandemly repeated structural motifs 20–50 residues long interact with other proteins, so identifying these repeats can inform our understanding of their structure and function. Here we show that structural repeat motifs exhibit a specific pattern in the eigen spectra profile of the adjacency matrix of a protein structure represented as a graph. Graph-spectral analysis thus provides an efficient tool for detecting repeats of different shapes and forms. Analysis of the results suggests that the proposed method discovers more Ankyrin repeat proteins, and more repeats per protein, than existing sequence- and structure-based methods. The annotation of the complete repeat region has been improved in 53 proteins, and 641 previously unrecognized Ankyrin repeat proteins have been identified by the proposed approach. The performance of the approach does depend on secondary structure assignment by STRIDE/DSSP, but only when a secondary structure element is missed entirely; small insertions/deletions within secondary structure elements do not affect prediction accuracy.
In our preliminary analysis of some of the newly predicted Ankyrin proteins, we observe that the reported binding sites lie in the predicted repeat regions, supporting our predictions. A systematic analysis of these proteins can identify their interacting partners and help in their functional annotation. We also show that the proposed approach can be used successfully on modelled proteins to identify repeats and can help improve their annotation. Since a large number of proteins have no experimentally determined structure, and sequence-based repeat detection methods are limited by detectable similarity between repeating units, the ability to work on modelled structures greatly extends the reach of the algorithm. As shown, the proposed approach can be easily extended and automated for the identification of other protein repeat families using information available on known repeat families.
The authors acknowledge the support of CSIR for funding this work (37(1468)/11/EMR-II).
- Andrade MA, Bork P: HEAT repeats in the Huntington’s disease protein. Nat Genet. 1995, 11: 115-116. 10.1038/ng1095-115.
- Kajava AV: Review: proteins with repeated sequence - structural prediction and modeling. J Struct Biol. 2001, 134: 132-144. 10.1006/jsbi.2000.4328.
- Kajava AV: Tandem repeats in proteins: from sequence to structure. J Struct Biol. 2012, 179: 279-288. 10.1016/j.jsb.2011.08.009.
- McLachlan AD, Stewart M: The 14-fold periodicity in alpha-tropomyosin and the interaction with actin. J Mol Biol. 1976, 103: 271-298. 10.1016/0022-2836(76)90313-2.
- Coward E, Drabløs F: Detecting periodic patterns in biological sequences. Bioinformatics. 1998, 14: 498-507. 10.1093/bioinformatics/14.6.498.
- Gruber M, Söding J, Lupas AN: REPPER-repeats and their periodicities in fibrous proteins. Nucleic Acids Res. 2005, 33 (Web Server issue): W239-243.
- Marsella L, Sirocco F, Trovato A, Seno F, Tosatto SCE: REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform. Bioinformatics. 2009, 25: i289-295. 10.1093/bioinformatics/btp232.
- Newman AM, Cooper JB: XSTREAM: a practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. BMC Bioinformatics. 2007, 8: 382.
- Jorda J, Kajava AV: T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics. 2009, 25: 2632-2638. 10.1093/bioinformatics/btp482.
- Pellegrini M, Marcotte EM, Yeates TO: A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins. 1999, 35: 440-446. 10.1002/(SICI)1097-0134(19990601)35:4<440::AID-PROT7>3.0.CO;2-Y.
- Heger A, Holm L: Rapid automatic detection and alignment of repeats in protein sequences. Proteins. 2000, 41: 224-237. 10.1002/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z.
- Szklarczyk R, Heringa J: Tracking repeats using significance and transitivity. Bioinformatics. 2004, 20 (Suppl 1): i311-317. 10.1093/bioinformatics/bth911.
- Biegert A, Söding J: De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics. 2008, 24: 807-814. 10.1093/bioinformatics/btn039.
- Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA. 1987, 84: 4355-4358. 10.1073/pnas.84.13.4355.
- Bucher P, Karplus K, Moeri N, Hofmann K: A flexible motif search technique based on generalized profiles. Comput Chem. 1996, 20: 3-23. 10.1016/S0097-8485(96)80003-9.
- Murray KB, Taylor WR, Thornton JM: Toward the detection and validation of repeats in protein structure. Proteins. 2004, 57: 365-380. 10.1002/prot.20202.
- Shih ESC, Gan RR, Hwang M-J: OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res. 2006, 34 (Web Server issue): W95-98.
- Shih ESC, Hwang M-J: Alternative alignments from comparison of protein structures. Proteins. 2004, 56: 519-527. 10.1002/prot.20124.
- Abraham A-L, Rocha EPC, Pothier J: Swelfe: a detector of internal repeats in sequences and structures. Bioinformatics. 2008, 24: 1536-1537. 10.1093/bioinformatics/btn234.
- Sabarinathan R, Basu R, Sekar K: ProSTRIP: A method to find similar structural repeats in three-dimensional protein structures. Comput Biol Chem. 2010, 34: 126-130. 10.1016/j.compbiolchem.2010.03.006.
- Kao H-Y, Shih T-H, Pai T-W, Lu M-D, Hsu H-H: A Comprehensive System for Identifying Internal Repeat Substructures of Proteins. In 2010 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS). IEEE; 2010: 689-693. 10.1109/CISIS.2010.92.
- Parra RG, Espada R, Sánchez IE, Sippl MJ, Ferreiro DU: Detecting repetitions and periodicities in proteins by tiling the structural space. J Phys Chem B. 2013, 117: 12887-12897. 10.1021/jp402105j.
- Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE: RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics. 2012, 28: 3257-3264. 10.1093/bioinformatics/bts550.
- Hrabe T, Godzik A: ConSole: using modularity of Contact maps to locate Solenoid domains in protein structures. BMC Bioinformatics. 2014, 15: 119.
- Forrer P, Binz HK, Stumpp MT, Plückthun A: Consensus design of repeat proteins. Chembiochem. 2004, 5: 183-189. 10.1002/cbic.200300762.
- Mosavi LK, Cammett TJ, Desrosiers DC, Peng Z: The ankyrin repeat as molecular architecture for protein recognition. Protein Sci. 2004, 13: 1435-1448. 10.1110/ps.03554604.
- Li J, Mahajan A, Tsai M-D: Ankyrin repeat: a unique motif mediating protein-protein interactions. Biochemistry. 2006, 45: 15168-15178. 10.1021/bi062188q.
- Leite RC, Basseres DS, Ferreira JS, Alberto FL, Costa FF, Saad ST: Low frequency of ankyrin mutations in hereditary spherocytosis: identification of three novel mutations. Hum Mutat. 2000, 16: 529.
- Blatch GL, Lässle M: The tetratricopeptide repeat: a structural motif mediating protein-protein interactions. Bioessays. 1999, 21: 932-939. 10.1002/(SICI)1521-1878(199911)21:11<932::AID-BIES5>3.0.CO;2-N.
- Tewari R, Bailes E, Bunting KA, Coates JC: Armadillo-repeat protein functions: questions for little creatures. Trends Cell Biol. 2010, 20: 470-481. 10.1016/j.tcb.2010.05.003.
- Adams J, Kelso R, Cooley L: The kelch repeat superfamily of proteins: propellers of cell function. Trends Cell Biol. 2000, 10: 17-24. 10.1016/S0962-8924(99)01673-6.
- Kajava AV: Structural diversity of leucine-rich repeat proteins. J Mol Biol. 1998, 277: 519-527. 10.1006/jmbi.1998.1643.
- Vishveshwara S, Brinda KV, Kannan N: Protein structure: insights from graph theory. J Theor Comput Chem. 2002, 01: 187-211. 10.1142/S0219633602000117.
- Kannan N, Vishveshwara S: Identification of side-chain clusters in protein structures by a graph spectral method. J Mol Biol. 1999, 292: 441-464. 10.1006/jmbi.1999.3058.
- Patra SM, Vishveshwara S: Backbone cluster identification in proteins by a graph theoretical method. Biophys Chem. 2000, 84: 13-25. 10.1016/S0301-4622(99)00134-9.
- Chakrabarty B, Parekh N: Analysis of graph centrality measures for identifying Ankyrin repeats. In 2012 World Congress on Information and Communication Technologies (WICT). IEEE; 2012: 156-161. 10.1109/WICT.2012.6409067.
- Sonnhammer EL, Eddy SR, Durbin R: Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins. 1997, 28: 405-420. 10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L.
- Sigrist CJA, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N: PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2010, 38 (Database issue): D161-166. 10.1093/nar/gkp885.
- UniProt Consortium: The universal protein resource (UniProt) in 2010. Nucleic Acids Res. 2010, 38 (Database issue): D142-148. 10.1093/nar/gkp846.
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995, 247: 536-540.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- Frishman D, Argos P: Knowledge-based protein secondary structure assignment. Proteins. 1995, 23: 566-579. 10.1002/prot.340230412.
- Oliphant TE: Python for scientific computing. Comput Sci Eng. 2007, 9: 10-20. 10.1109/MCSE.2007.58.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998, 11: 739-747. 10.1093/protein/11.9.739.
- Laskowski RA, Hutchinson EG, Michie AD, Wallace AC, Jones ML, Thornton JM: PDBsum: a Web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci. 1997, 22: 488-490. 10.1016/S0968-0004(97)01140-7.
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983, 22: 2577-2637. 10.1002/bip.360221211.
- Gouy M, Guindon S, Gascuel O: SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010, 27: 221-224. 10.1093/molbev/msp259.
- Kohl A, Binz HK, Forrer P, Stumpp MT, Plückthun A, Grütter MG: Designed to be stable: Crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci U S A. 2003, 100: 1700-1705. 10.1073/pnas.0337680100.
- Mosavi LK, Minor DL, Peng Z: Consensus-derived structural determinants of the ankyrin repeat motif. Proc Natl Acad Sci U S A. 2002, 99: 16029-16034. 10.1073/pnas.252537899.
- Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L, Schwede T: SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014, 42 (Web Server issue): W252-258.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.