An evolutionary method for learning HMM structure: prediction of protein secondary structure
© Won et al; licensee BioMed Central Ltd. 2007
Received: 28 April 2007
Accepted: 21 September 2007
Published: 21 September 2007
The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).
In the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update the model parameters. The final HMM captures several features of protein sequence and structure with its own HMM grammar. In contrast to neural network based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM-based predictor under both the single- and multiple-sequence conditions.
We have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improves on previously suggested evolutionary methods and increases the prediction quality. In particular, it shows good performance under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is available online at http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.
Prediction of protein secondary structure is an important step towards understanding protein structure and function from protein sequences. This task has attracted considerable attention and consequently represents one of the most studied problems in bioinformatics. Early prediction methods were developed based on stereochemical principles  and statistics [2, 3]. Since then the prediction rate has steadily risen due to both algorithmic development and the proliferation of the available data. The first machine learning predictions of secondary structure were done using neural networks [4, 5]. Later methods using neural networks include PHD , PSIPRED , SSpro , SSpro8  and YASPIN . Support vector machines have also been used and show promising results . Recently, the prediction accuracy has been improved by cascading a second layer of support vector machines [12, 13]. The currently used machine learning methods typically improve their performance by combining several predictors and using evolutionary information obtained from PSI-BLAST . Combining results from different predictors has been shown to improve the performance of secondary structure prediction [15, 16].
Even though Hidden Markov Models (HMMs) have been successfully applied to many problems in biological sequence modelling, they have not been used much for protein secondary structure prediction. Asai et al. suggested the first HMM for the prediction of protein secondary structure . Later, an HMM with a hierarchical structure was suggested . However, both predictors had limited accuracy.
HMMSTR  is a successful HMM predictor for this problem. It was constructed by identifying recurring protein backbone motifs (called invariant/initiation sites or I-sites) and representing them as a Markov chain. Consequently, the topology of HMMSTR can be interpreted as a description of the protein backbone in terms of consecutive I-sites. YASPIN , which is one of the most recent methods, builds on a combination of hidden Markov models and neural networks .
In this paper, we report a new method for optimizing the structure of HMMs for secondary structure prediction. Over the last couple of years we have developed a method for optimizing the structure of HMMs automatically using Genetic Algorithms (GAs) [21, 22]. In previous work, we applied this method to promoter finding in DNA. Here, we use the evolutionary method to optimize the structure of an HMM for secondary structure prediction. During the evolutionary optimization, the HMM's structure is assembled from biologically meaningful building blocks . Hence, we call our evolutionary method Block-HMM. The HMM evolved with Block-HMM models the training protein sequences and provides, for each amino acid, the prediction probability of each secondary structure conformation.
In the literature, we have found a few HMM structure learning methods. Stolcke developed a state merging method, which starts from an HMM with a large number of states . Conversely, state splitting methods were suggested in [24, 25]. A structure evolving method using GAs was first suggested to change the structure of a TATA box HMM . Later, this method was improved by taking statistical significance into account . A structure evolving method using genetic programming was also suggested, in which the HMM structures are represented by probabilistic trees [28, 29]. The evolving method was also applied to protein secondary structure prediction: Thomsen suggested a GA very similar to that of Yada et al. and achieved 49% prediction accuracy .
Our structure learning method differs from previous methods in that we use block models inspired by HMM applications in biological sequence analysis. Instead of crossing over an arbitrary number of states, we cross over a number of blocks. This allows different numbers of states to be exchanged through the crossover operation. Mutation is restricted to changes in which adding or deleting transitions does not break the block property. As a result, our approach exploits the modularity of HMMs more strategically than previously suggested genetic methods. Genetic programming methods [28, 29] encode HMM networks with probabilistic trees, and linguistic representations are derived from each particular HMM topology. Similar to the genetic programming methods, our approach encodes several types of blocks into linguistic forms . The basic shapes of the linguistic blocks differ between the two methods. These encoding differences affect the search space of the topology evolution. This also suggests that various types of topological encoding may be useful for other problems.
We analyze one of the evolved HMM structures under the single-sequence condition. We also test it under the multiple-sequence condition after designing a complete predictor using an ensemble of three independently trained predictors as well as simple neural networks.
Block-HMM for labelled sequences
Linear blocks consist of N states (labelled from 1 to N) where state n is only connected to state n + 1 (with 1 ≤ n < N). Self-loop blocks are linear blocks in which each state has an additional loop to itself. A forward-jump block is a linear block where the first state is also connected to the last M states (with 1 ≤ M < N). Zero blocks are empty blocks with no states: they can replace other block types during the GA procedure and thus allow the exploration of simpler topologies.
The self-loop and forward-jump blocks can be either tied (in the figures, tied blocks are shaded) or untied. When a block is tied, all the emission and transition probabilities of states inside the block are equal. In the case of linear blocks we did not consider tying, because a tied linear block is equivalent to a single-state self-loop block.
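To make the block vocabulary concrete, the following is a minimal sketch (our own illustration, not the authors' implementation) of how the within-block transition structure of the three non-empty block types could be built; the helper name and the convention of an extra "exit" column are assumptions made for this example.

```python
import numpy as np

def block_transitions(n_states, block_type, m=1):
    """Sketch the within-block transition matrix for one block.

    Returns an (n_states x n_states + 1) matrix; the extra column holds the
    exit transition to the next block. block_type is 'linear', 'self-loop'
    or 'forward-jump' (m = number of final states the first state may reach).
    A 'zero' block simply contributes no states at all.
    """
    trans = np.zeros((n_states, n_states + 1))
    for i in range(n_states):
        if i + 1 < n_states:
            trans[i, i + 1] = 1.0        # linear backbone: state i -> state i+1
        else:
            trans[i, n_states] = 1.0     # last state leaves the block
        if block_type == 'self-loop':
            trans[i, i] = 1.0            # each state also loops to itself
    if block_type == 'forward-jump':
        for j in range(n_states - m, n_states):
            trans[0, j] = 1.0            # first state also reaches the last m states
    return trans / trans.sum(axis=1, keepdims=True)   # rows sum to one
```

A tied block would additionally constrain all states in the block to share the same emission vector and transition probabilities.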
Genetic operators for Block-HMM
Genetic algorithms evolve a population of solutions with genetic operators. Inside the genetic cycle, genetic operators select members of the population (called parents) and evolve them to produce new members (called children). The new children, together with the remaining old members of the population, are then evaluated to calculate their fitness. Based on this fitness, the selection procedure chooses a number of members of the population for the next genetic cycle.
We ran a GA that hybridizes the parameter learning method with these genetic operators, which train the structure of the HMMs. A detailed description of the whole procedure is given in Methods.
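The sketch below shows, under our own simplifying assumptions, the control flow of one such hybrid cycle; the operator and training functions are passed in as callables and stand in for the crossover, mutation and Baum-Welch steps described in Methods.

```python
import random

def genetic_cycle(population, crossover, mutate, baum_welch, fitness,
                  n_crossovers=2, n_mutations=2):
    """One hybrid GA iteration: apply operators, retrain, select survivors."""
    children = [crossover(*random.sample(population, 2))
                for _ in range(n_crossovers)]
    children += [mutate(random.choice(population))
                 for _ in range(n_mutations)]

    candidates = population + children
    for hmm in candidates:
        baum_welch(hmm)                  # update emission/transition probabilities

    # rank by fitness on held-out sequences; the best member always survives
    # (the paper selects the rest by stochastic universal sampling; plain
    # sampling is used here only to keep the sketch short)
    ranked = sorted(candidates, key=fitness, reverse=True)
    return ranked[:1] + random.sample(ranked[1:], len(population) - 1)
```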
Analysis of the evolved HMM
The evolved model
Information of all the trained states
At state11 and state32 we found a strong probability of Pro. Among 13637 visits to state11 we found Pro 3765 times (27.6%) in the generated sequences, which closely matches the emission probability of 27.7%. State11 usually modelled 'xxx' (2783 times, 73.9%), 'xxH' (685 times, 18.2%), 'xxE' (286 times, 7.6%), and the end of the sequence ('xx', 11 times). This indicates that state11 is used to link a coil with other compositions. In the case of state32, Pro was usually used to model 'xHH' (1828 cases out of 2084, 87.7%) or 'HHH' (205 cases out of 2084, 9.8%).
Gly was strongly represented on state37 and state45. We found that Gly on state37 occurs only between two coil conformations (3570 times). Gly on state45 worked in the opposite way to Pro on state11, producing 'Hxx' (710 times out of 2018, 35.2%), 'xxx' (1300 times, 64.4%), 'Exx' (2 times, 0.1%), and the end of the sequences.
The HMM's grammar
Block transitions (table columns: percentage used on each state; number of times used in the generated sequences):
state0 (H) → state17 (H)
state0 (H) → state39 (H)
state3 (H) → state1 (H)
state6 (E) → state33 (E)
state9 (E) → state27 (E)
state9 (E) → state43 (E)
state11 (x) → state15 (x)
state11 (x) → state31 (x)
state11 (x) → state50 (x)
state14 (H) → state51 (x)
state16 (x) → state4 (E)
state16 (x) → state31 (x)
state16 (x) → state50 (x)
state18 (H) → state50 (x)
state19 (E) → state7 (E)
state21 (E) → state7 (E)
state23 (H) → state12 (H)
state23 (H) → state48 (H)
state23 (H) → state51 (x)
state26 (H) → state51 (x)
state27 (E) → state15 (x)
state27 (E) → state28 (x)
state29 (x) → state28 (x)
state29 (x) → state36 (x)
state30 (H) → state48 (H)
state31 (x) → state0 (H)
state31 (x) → state10 (x)
state31 (x) → state31 (x)
state31 (x) → state32 (H)
state31 (x) → state39 (H)
state31 (x) → state50 (x)
state32 (H) → state17 (H)
state38 (x) → state4 (E)
state40 (H) → state41 (H)
state42 (H) → state48 (H)
state44 (E) → state27 (E)
state49 (H) → state22 (H)
state49 (H) → state24 (H)
state49 (H) → state30 (H)
state50 (x) → state10 (x)
state50 (x) → state31 (x)
state50 (x) → state50 (x)
state51 (x) → state15 (x)
state51 (x) → state31 (x)
state51 (x) → state50 (x)
state51 (x) → state51 (x)
We checked whether the model has a grammar for short secondary structure elements. We found 57 'H-x-H' cases of linked helices. For this sequence grammar we found the HMM grammar 'state18-state50-state32' 44 times (77.2%). We also checked how each coil state contributes to this grammar: for the coil region, state50, state51 and state31 are used 44, 6 and 7 times, respectively. In the case of the 666 'H-x-E' occurrences in the generated sequences, the dominant grammar is 'state18-state50-state43' (27.6%). For the coil region, state51 and state50 were used 90 times (13.5%) and 576 times (86.5%), respectively. Interestingly, state31 was not used for this grammar. For the grammar 'E-x-H', on the other hand, state51 was never used: about 97.2% of the HMM grammar uses state31 (923 times out of 950) and 2.8% (27 times) uses state50. This indicates that state50, state51 and state31 are used in different ways when they compose a sequence grammar. For the grammar 'H-xx-H', the dominant HMM grammars used for the coil region were 'state51-state31' (1175 times out of 2146, 54.8%), 'state50-state31' (19.5%) and 'state10-state11' (22.2%).

We also checked how the HMM is organized to model hairpin structures. For the grammar 'E-x-E', state50 was dominant (81.1%, 310 out of 382); state51 and state31 covered 3.7% and 15.2%, respectively. For the structure 'E-xx-E', 'state28-state29' is mostly used (58.2%, 1830 out of 3142), followed by 'state15-state16' (16.9%) and 'state36-state38' (11.0%). The single-state blocks (state50, state51 and state31) are rarely used here. In the case of the structure 'E-xxx-E', 'state36-state37-state38' covered 68.2% (1937 out of 2842) and 'state15-state15-state16' occupied 14.0%; each of the other compositions accounts for less than 5%. No sequence was generated for the grammar 'x-H-x', which violates the grammar of protein secondary structure.
Prediction results with posterior decoding
$$P(y_i = l \mid x, \Theta) = \sum_{k \in \mathcal{S}} P(y_i = l \mid \pi_i = k)\, P(\pi_i = k \mid x, \Theta) \qquad (1)$$

where x is an amino acid sequence and y is the accompanying sequence of protein secondary structure labels, $\pi_i$ is the state occupied at position i, $\Theta$ is the evolved HMM, and $\mathcal{S}$ is the set of all the states in the HMM.
We assign each state to one of the classes in the secondary structure. That is, we take the probability of a label given a state to be 1 if the state is assigned to that class and 0 otherwise. Thus the sum in equation (1) only gets contributions from states that have been assigned to class l.
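As a hedged illustration (variable names and array shapes are our own), the following shows how equation (1) collapses per-position state posteriors from the forward-backward algorithm into label posteriors when each state is assigned to exactly one class.

```python
import numpy as np

def label_posteriors(state_posteriors, state_class, classes=('H', 'E', 'x')):
    """Collapse state posteriors into secondary structure label posteriors.

    state_posteriors : (L, K) array with P(state k at position i | x, model)
    state_class      : length-K sequence assigning each state to 'H', 'E' or 'x'
    Returns an (L, len(classes)) array of P(label | x, model) per position.
    """
    state_class = np.asarray(state_class)
    return np.stack([state_posteriors[:, state_class == c].sum(axis=1)
                     for c in classes], axis=1)

# The predicted conformation at each position is the most probable label:
# prediction = ''.join('HEx'[i] for i in label_posteriors(sp, sc).argmax(axis=1))
```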
Prediction under single-sequence condition
Prediction under the single-sequence condition
We compared the performance of the best HMM topology, trained on all 1662 training sequences, with other predictors under the single-sequence condition. As a test set we used the data set published in October 2002 on the EVA server . From this we prepared two sets. First, from the 1828 sequences we removed those in common with our training set, retaining 1584 sequences (non-common set). Second, we used only the sequences that are common to our training set and the PSIPRED training set, which yielded 153 sequences (common set). Table 3 shows the comparison with PSIPRED for the two tests. These tests at least show that Block-HMM performs well as a secondary structure predictor.
Prediction under multi-sequence condition
We designed a complete secondary structure predictor using multiple-sequence information. A structure-to-structure layer is added to further improve the prediction. See Methods for more detail.
Cross-validation & comparison
Prediction under the multiple-sequence condition
In an attempt to benchmark our method against existing predictors, we compare our prediction results with those of YASPIN  and PSIPRED. We asked Dr. Kuang Lin to train YASPIN with the same data set we used. The same data set was used when running PSIPRED, which has already been trained and is publicly available. We used the two data sets from the test under the single-sequence condition. Table 4 shows the benchmarking result. On the non-common data set, the Q3 rate of our method is about 1% better than that of YASPIN, though the SOV of YASPIN is about 0.5% higher. The Q_E of YASPIN is impressive, showing better performance than PSIPRED. Overall, PSIPRED showed the best performance. Next, we used the common set. Again, PSIPRED showed the best performance, and the performance of Block-HMM is about 1% better than YASPIN. This result is interesting, considering that the performance of Block-HMM was better on the same data set under the single-sequence condition.
The predictor using HMMs inherits all the advantages of HMMs. Artificial protein sequences with secondary structure can be generated, and the generated sequences match the training data set in the content and length distributions of the secondary structure conformations. Also, the probabilistic reasoning behind the prediction result is easy to inspect. The analysis of the evolved model and the generated sequences shows that the evolving method successfully interprets the grammar of the protein sequences and converts it into the grammar of HMMs. This is all the more noteworthy considering that the grammar and the biological information are constructed automatically, without human intervention.
Recently, a hand-designed HMM-based protein secondary structure predictor showed good performance in predicting β-strands under the single-sequence condition . Also, a structure learning method using the Bayesian information criterion has been introduced . It increases the number of states while checking for the optimal balance between fit to the data and HMM size. Our method has more operations for changing the HMM structure, and we penalized large HMM structures by evaluating the trained model on a separate set. As shown in the test under the single-sequence condition, the overall prediction performance of an evolved HMM is very good. We do not claim that our HMM is better under the single-sequence condition; the test set we used may be biased towards HMMs. However, the result at least shows that the evolving method is a good way to design an HMM for this problem and for further applications. In the test under the multiple-sequence condition, the performance of PSIPRED is clearly better than that of Block-HMM. PSIPRED's way of incorporating multiple-sequence information, as well as its structure-to-structure layer, works far better than our approach, and incorporating multiple-sequence information remains an area for further study. However, our result is still comparable to YASPIN's.
Our method does not require a sliding window as most other secondary structure prediction methods do. The size of the window is chosen in order to obtain good performance (for example, PSIPRED has a window size of 15 ). The evolving HMM method uses the whole sequence as input, which avoids the use of a fixed sequence window that might affect performance in specific cases.
At present the Block-HMM method is relatively slow because it has to train and calculate fitness for all the HMM members in the population. Fortunately, the method is well suited to parallel computation. To evolve an HMM using GAs with 30 members in a population, we used 31 2.4 GHz P4 processors, each with 512 MB RAM, running in parallel. Each processor trains one HMM. Ideally, the CPU time consumed by each processor is the time to train and evaluate one HMM multiplied by the number of iterations. It took about 7 hours to produce an HMM with 40 states. Prediction using three trained HMMs without evolutionary information takes about 30 seconds.
Optimizing HMM structures using an evolutionary algorithm has several benefits. First of all, the structure of an HMM is automatically evolved without prior knowledge. The success is remarkable given that other methods for secondary structure prediction require considerable calibration. Compared to the hand-designed HMMSTR , the evolutionary method produced good results with a smaller number of states. In the case of neural networks, the selection of the number of units needs careful attention. Here again, the evolving HMM method is an attractive alternative.
Compared to other HMM structure evolving methods, our approach performs very well. Thomsen's result for secondary structure prediction (49%) indirectly indicates that our method is very effective for the secondary structure prediction problem.
The P.S.HMM (Protein Secondary structure predictor using HMMs) server is online, providing the secondary structure prediction and the probability of each secondary structure conformation. The protein data set used in the test can be found at http://binf.ku.dk/~won/proseq.tar.gz.
The SABmark Twilight Zone data set (version 1.63)  provides a set of representative structures. This data set consists of 2230 high-quality structures partitioned into 236 folds. Although many proteins in the data set share a common fold, no pair of protein sequences can be aligned with a BLAST E-value below 1 or a sequence identity above 25%. Hence, even for proteins with a common fold in the data set, it is not possible to identify a traceable common evolutionary origin.
Structures that caused problems with the DSSP program (see below) or that had chain breaks were removed, which resulted in a final data set of 1662 structures belonging to 234 fold groups. Two fold groups were removed by this process because no structures remained in them. With these 234 groups we performed a five-fold cross-validation test. In order to create a stringent test set, we made sure that proteins with a common fold did not appear in both the training and test sets.
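A minimal sketch, assuming a mapping from fold group to protein identifiers, of how such a fold-aware five-fold split can be made so that no fold group spans the training and test sets; the function name and data layout are ours, not the authors'.

```python
import random

def fold_aware_cv_splits(fold_to_proteins, n_folds=5, seed=0):
    """Partition fold groups (not individual proteins) into n_folds subsets.

    fold_to_proteins : dict mapping a fold-group id to a list of protein ids
    Returns a list of n_folds protein-id lists; in each cross-validation
    round one list serves as the test set and the rest form the training set.
    """
    groups = sorted(fold_to_proteins)
    random.Random(seed).shuffle(groups)
    subsets = [[] for _ in range(n_folds)]
    for i, group in enumerate(groups):
        subsets[i % n_folds].extend(fold_to_proteins[group])
    return subsets
```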
The secondary structure was calculated using the program DSSP . DSSP assigns secondary structure to eight different classes: α-helix (H), isolated β-bridge (B), β-strand (E), 3₁₀-helix (G), π-helix (I), turn (T), bend (S) and other. In this study, we used three classes: helix (consisting of DSSP classes H and G), strand (classes B and E) and coil (all other classes). The DSSP results were retrieved using the DSSP front end in the Biopython toolkit .
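The three-class reduction described above amounts to a simple lookup; this snippet is a straightforward restatement of the mapping in the text (using 'x' for coil, as elsewhere in the paper), not code from the original implementation.

```python
# DSSP eight-class to three-class reduction used in this study
DSSP_TO_THREE = {
    'H': 'H', 'G': 'H',                       # alpha- and 3-10 helices -> helix
    'B': 'E', 'E': 'E',                       # beta-bridge and beta-strand -> strand
    'I': 'x', 'T': 'x', 'S': 'x', '-': 'x',   # all other classes -> coil
}

def reduce_dssp(dssp_string):
    """Map a DSSP secondary structure string to helix/strand/coil (H/E/x)."""
    return ''.join(DSSP_TO_THREE.get(c, 'x') for c in dssp_string)
```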
Training with Block-HMM
We have used a hybrid GA with traditional GA operators to explore the space of HMM topologies in combination with Baum-Welch optimization of the transition and emission probabilities.
Block-HMM parameters used in the experiment
Number of blocks in an HMM
The initial length of a block
Number of crossovers per iteration
Number of mutations per iteration
Number of type-mutations per iteration
where σ is the standard deviation of the fitness in the population and s is a constant that controls the strength of the selection. In the work reported here, we used a value of s equal to 0.3.
The best member of a population is always selected, and a subset of other members are selected by using stochastic universal sampling . Some of the members are mutated or subjected to crossover. Then, all the members of the generation undergo Baum-Welch optimization using the training data set.
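For reference, a compact sketch of stochastic universal sampling (Baker, 1987) as it could be applied at this step, assuming the scaled fitness values serve as selection weights.

```python
import random

def stochastic_universal_sampling(members, weights, n):
    """Select n members with probability proportional to weight, using a
    single random offset and n equally spaced pointers (low selection noise)."""
    step = sum(weights) / n
    pointer = random.uniform(0, step)
    selected, cumulative, idx = [], 0.0, 0
    for _ in range(n):
        while cumulative + weights[idx] < pointer:
            cumulative += weights[idx]   # advance to the member covering this pointer
            idx += 1
        selected.append(members[idx])
        pointer += step
    return selected
```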
We saved the best HMM at each of the 400 generations, i.e. during the whole run of the GA. At the end of the run, the best HMM is selected and trained again with the Baum-Welch algorithm, this time using all the sequences used for training and evaluation. This is done because the last HMM is not always the best HMM generated during the whole GA run. Finally, the HMM is trained further using the discriminative training method . The Baum-Welch algorithm maximizes the likelihood of the training sequences (in our case containing amino acid and secondary structure labels). However, we are more interested in maximizing the probability of obtaining correct secondary structure labels for the amino acid sequences (rather than maximizing the probability of the full sequences themselves). Discriminative training is used to increase the probability of obtaining correct labels given the sequences and a specific HMM structure.
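In symbols, the difference between the two criteria (a standard formulation of conditional maximum likelihood, consistent with the discriminative training of Krogh cited above) is:

$$\mathcal{L}_{\mathrm{ML}}(\Theta) = \sum_{j} \log P\bigl(x^{(j)}, y^{(j)} \mid \Theta\bigr), \qquad
\mathcal{L}_{\mathrm{CML}}(\Theta) = \sum_{j} \log P\bigl(y^{(j)} \mid x^{(j)}, \Theta\bigr) = \sum_{j} \Bigl[\log P\bigl(x^{(j)}, y^{(j)} \mid \Theta\bigr) - \log P\bigl(x^{(j)} \mid \Theta\bigr)\Bigr]$$

Baum-Welch maximizes the first (joint) likelihood, whereas discriminative training maximizes the second, the probability of the correct labels given the amino acid sequences.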
Incorporating evolutionary information
Secondary structure prediction rates can be boosted by using evolutionary information. In most systems, a position-specific scoring matrix (PSSM) is used as the input to the predictor. Instead of using a PSSM, we ran our predictor on a set of homologous sequences and then combined the results. To obtain the homologous sequences we ran PSI-BLAST  against the UniProt 90 protein sequence database  downloaded on Feb. 17th 2005. We used 3 iterations of PSI-BLAST and an E-value threshold of 0.001. The posterior label probabilities (PLPs) were calculated by decoding each of the homologous sequences against the trained HMM. After aligning the decoding results, we calculated the weight of each sequence according to the position-based sequence weighting scheme .
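As an illustration of the weighting step, here is a minimal sketch of the Henikoff and Henikoff position-based sequence weights computed from an alignment; treating gaps as an ordinary symbol type is one of several possible conventions and is an assumption of this example.

```python
from collections import Counter

def position_based_weights(alignment):
    """Henikoff & Henikoff (1994) position-based sequence weights.

    alignment : list of equal-length aligned sequences (strings)
    Returns one weight per sequence, normalised to sum to one.
    """
    n_seq, n_col = len(alignment), len(alignment[0])
    weights = [0.0] * n_seq
    for col in range(n_col):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        r = len(counts)                          # number of distinct symbols in the column
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (r * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]
```

The per-sequence PLPs can then, for example, be combined as a weighted average at each aligned position.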
The second (structure-to-structure) layer
We would like to thank L.G.T. Joergensen for providing his HMM structure drawing tool and Dr. V.A. Simossis for providing the YASPIN code. In particular, we would like to thank Dr. Kuang Lin for kindly training YASPIN with the data set we provided. KJW was supported by a grant from the Novo Nordisk Foundation. TH is supported by a Marie Curie Intra-European Fellowship within the 6th European Community Framework Programme.
- Lim VI: Algorithms for prediction of alpha helices and structural regions in globular proteins. J Mol Biol 1974, 88: 873–894. doi:10.1016/0022-2836(74)90405-7
- Chou PY, Fasman GD: Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol 1978, 47: 45–148.
- Garnier J, Osguthorpe DJ, Robson B: Analysis and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978, 120: 97–120. doi:10.1016/0022-2836(78)90297-8
- Qian N, Sejnowski TJ: Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 1988, 202: 865–884. doi:10.1016/0022-2836(88)90564-5
- Bohr H, Bohr J, Brunak S, Cotterill R, Lautrup B, Nørskov L, Olsen O, Petersen S: Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 1988, 202: 865–884. doi:10.1016/0022-2836(88)90564-5
- Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 1993, 232: 584–599. doi:10.1006/jmbi.1993.1413
- Jones DT: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J Mol Biol 1999, 292: 195–202. doi:10.1006/jmbi.1999.3091
- Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G: Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999, 15(11): 937–946. doi:10.1093/bioinformatics/15.11.937
- Pollastri G, Przybylski D, Rost B, Baldi P: Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles. Proteins 2002, 47: 228–235. doi:10.1002/prot.10082
- Lin K, Simossis VA, Taylor WR, Heringa J: A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics 2005, 21(2): 152–159. doi:10.1093/bioinformatics/bth487
- Hua S, Sun Z: A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach. J Mol Biol 2001, 308: 397–407. doi:10.1006/jmbi.2001.4580
- Ward JJ, McGuffin LJ, Buxton BF, Jones DT: Secondary structure prediction with support vector machines. Bioinformatics 2003, 19(13): 1650–1655. doi:10.1093/bioinformatics/btg223
- Guo J, Chen H, Sun Z, Lin Y: A Novel Method for Protein Secondary Structure Prediction Using Dual-Layer SVM and Profiles. Proteins 2004, 54: 738–743. doi:10.1002/prot.10634
- Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 1997, 25: 3389–3402. doi:10.1093/nar/25.17.3389
- Cuff J, Barton G: Application of Multiple Sequence Alignment Profiles to Improve Protein Secondary Structure Prediction. Proteins 2000, 40: 502–511. doi:10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q
- Albrecht M, Tosatto S, Lengauer T, Valle G: Simple consensus procedures are effective and sufficient in secondary structure prediction. Protein Eng 2003, 16(7): 459–462. doi:10.1093/protein/gzg063
- Asai K, Hayamizu S, Handa K: Prediction of protein secondary structure by the hidden Markov model. Comput Appl Biosci 1993, 9: 141–146.
- Yoshikawa H, Ikeguchi M, Nakamura S, Shimizu K, Doi J: Prediction of Protein Structure Classes and Secondary Structure by Means of Hidden Markov Models. Systems and Computers in Japan 1999, 30(13): 13–22. doi:10.1002/(SICI)1520-684X(19991130)30:13<13::AID-SCJ2>3.0.CO;2-7
- Bystroff C, Thorsson V, Baker D: HMMSTR: a Hidden Markov Model for Local Sequence-Structure Correlations in Proteins. J Mol Biol 2000, 301: 173–190. doi:10.1006/jmbi.2000.3837
- Krogh A, Riis SK: Hidden Neural Networks. Neural Computation 1999, 11: 541–563. doi:10.1162/089976699300016764
- Won KJ, Prügel-Bennett A, Krogh A: Training HMM Structure with Genetic Algorithms for Biological Sequence Analysis. Bioinformatics 2004, 20(18): 3613–3627. doi:10.1093/bioinformatics/bth454
- Won KJ, Prügel-Bennett A, Krogh A: Evolving the Structure of Hidden Markov Models. IEEE Transactions on Evolutionary Computation 2006, 10: 39–49. doi:10.1109/TEVC.2005.851271
- Stolcke A: Bayesian Learning of Probabilistic Language Models. PhD thesis. University of California at Berkeley; 1994.
- Fujiwara Y, Asogawa M, Konagaya A: Stochastic motif extraction using hidden Markov model. ISMB 1994, 121–129.
- Fujiwara Y, Asogawa M, Konagaya A: Motif Extraction using an Improved Iterative Duplication Method for HMM Topology Learning. Pacific Symposium on Biocomputing '96 1995, 713–714.
- Yada T, Ishikawa M, Tanaka H, Asai K: DNA Sequence Analysis using Hidden Markov Model and Genetic Algorithm. Genome Informatics 1994, 5: 178–179.
- Yada T, Totoki Y, Ishikawa M, Asai K, Nakai K: Automatic extraction of motifs represented in the hidden Markov model from a number of DNA sequences. Bioinformatics 1998, 14: 317–325. doi:10.1093/bioinformatics/14.4.317
- Yada T: Generation of hidden Markov model describing complex motif in DNA sequences. IPSJ Trans 1995, 40: 750–767.
- Yada T: Stochastic models representing DNA sequence data construction algorithms and their applications to prediction of gene structure and function. PhD thesis. University of Tokyo; 1998.
- Thomsen R: Evolving the Topology of Hidden Markov Models Using Evolutionary Algorithms. LNCS 2002, 2439: 861–870.
- Rost B, Eyrich V: EVA: large-scale analysis of secondary structure prediction. Proteins 2001, 5: 192–199. doi:10.1002/prot.10051
- Aydin Z, Altunbasak Y, Borodovsky M: Protein secondary structure prediction for a single-sequence using hidden semi-Markov models. BMC Bioinformatics 2006, 7: 178. doi:10.1186/1471-2105-7-178
- Martin J, Gibrat JF, Rodolphe F: Analysis of an optimal hidden Markov model for secondary structure prediction. BMC Structural Biology 2006, 6: 25. doi:10.1186/1472-6807-6-25
- Van Walle I, Lasters I, Wyns L: SABmark – a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 2005, 21(7): 1267–1268. doi:10.1093/bioinformatics/bth493
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. doi:10.1002/bip.360221211
- Hamelryck T, Manderick B: PDB file parser and structure class implemented in Python. Bioinformatics 2003, 19: 2308–2310. doi:10.1093/bioinformatics/btg299
- Baker JE: Reducing Bias and Inefficiency in the Selection Algorithm. In Proceedings of the Second International Conference on Genetic Algorithms. Lawrence Erlbaum Associates (Hillsdale); 1987: 14–21.
- Krogh A: Two methods for improving performance of an HMM and their application for gene finding. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology 1997, 179–186.
- Leinonen R, Diez F, Fleischmann W, Lopez R, Apweiler R: UniProt archive. Bioinformatics 2004, 20(17): 3236–3237. doi:10.1093/bioinformatics/bth191
- Henikoff S, Henikoff J: Position-based Sequence Weights. J Mol Biol 1994, 243: 574–578. doi:10.1016/0022-2836(94)90032-9
- Rumelhart D, Hinton G, Williams R: Learning representations by back-propagating error. Nature 1986, 323: 533–536. doi:10.1038/323533a0
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.