Protein structure prediction with local adjust tabu search algorithm
- Xiaoli Lin†^{1, 2},
- Xiaolong Zhang^{1} and
- Fengli Zhou^{2}
https://doi.org/10.1186/1471-2105-15-S15-S1
© Lin et al.; licensee BioMed Central Ltd. 2014
Published: 3 December 2014
Abstract
Background
Protein folding structure prediction is one of the most challenging problems in bioinformatics. Because real protein structures are so complex, research must rely on simplified structural models and computational methods. The AB off-lattice model is one such simplified model; it considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues.
Results
This paper discusses how to find the lowest-energy configurations in the 2D and 3D off-lattice models, using Fibonacci sequences and real protein sequences. To avoid becoming trapped in local minima and to converge faster to the global minimum, we introduce a novel method (SATS) for the protein structure problem that combines the simulated annealing algorithm with the tabu search algorithm. Several strategies, namely a new encoding strategy, an adaptive neighborhood generation strategy and a local adjustment strategy, are adopted to speed up the search for the optimal conformation, that is, the one corresponding to the lowest energy of the protein sequence. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we are confident that the lowest-energy folding states for the short Fibonacci sequences have been found.
Conclusions
Although the off-lattice models are not very realistic, they reflect some important characteristics of real proteins. The 3D off-lattice model resembles the native folding structure of a real protein more closely than the 2D off-lattice model does. In addition, compared with some previous work, the proposed hybrid algorithm searches for the spatial folding structure of a protein chain more effectively and more quickly.
Background
Understanding molecular conformations is one of the crucial issues in computational biology. Incorrect protein folding is associated with illnesses such as Alzheimer's disease, bovine spongiform encephalopathy and Creutzfeldt-Jakob disease. The biological functions of proteins are determined by their three-dimensional folding structures, and their spatial structures are determined by their primary structures [1]. Traditional experimental methods for determining protein folding structure, such as X-ray crystallography and NMR spectroscopy, are expensive. Because real proteins are so complex, it is extremely difficult to analyze the protein folding process.
Because a polypeptide chain can form such a large number of different spatial structures, it is still difficult to find the global minimum-energy conformation of a protein from its amino acid sequence [2]. The most important problem is therefore how to establish a highly simplified but effective model that reflects the relation between the free energy and the tertiary structure of the protein. One such simplified model is the hydrophobic-polar (HP) model, which has been widely used to study protein structure and to understand the protein folding process. The HP lattice model represents the amino acid chain of a protein with two types of residue, non-polar or hydrophobic (H) residues and polar or hydrophilic (P) residues, placed on the vertices of a simple cubic lattice [3]. The HP lattice model abstracts the hydrophobic interaction process in protein folding by reducing a protein to a heteropolymer that represents a predetermined pattern of hydrophobicity in the protein [4]. Non-polar amino acids are classified as hydrophobic and polar amino acids as hydrophilic, and this classification is used to force the formation of a compact hydrophobic core as observed in real proteins [5]. However, being so simple, the HP lattice model does not reveal all the secrets of the protein. The main reason is that local interactions are neglected in such simplified models, although they might be important for the local structure of the chains [6].
To reflect the native attributes of proteins more realistically, Stillinger studied a similar AB off-lattice protein model in two dimensions [7]. In the AB off-lattice model the 20 amino acids are again reduced to two classes, hydrophobic (A) and hydrophilic (B). For AA, BB and AB pairs respectively, there is an intramolecular mix of strong attraction, weak attraction and weak repulsion, roughly analogous to the situation in real proteins [8]. The interactions considered in the AB off-lattice model include both sequence-independent local interactions and the sequence-dependent Lennard-Jones term that favours the formation of a hydrophobic core [9]. Irback et al. extended the AB off-lattice model from two dimensions (2D) to three dimensions (3D), taking the torsional energy into account implicitly [6].
In recent years, many studies have been devoted to finding the lowest-energy conformations in the AB off-lattice model [10–12]. Because searching the whole conformational space of a protein has been proved to be an NP-complete problem, heuristic optimization algorithms are needed, such as the energy landscape paving minimizer (ELP) [9], the genetic tabu search algorithm (GATS) [10], conformational space annealing (CSA) [13], the pruned-enriched-Rosenbluth method (PERM) [14] and the local adjust genetic algorithm (LAGAA) [15]. This paper describes a protein structure prediction method based on the AB off-lattice model in two and three dimensions, which combines the tabu search algorithm with locally adjusted simulated annealing. The improved hybrid algorithm (SATS) is applied to find the spatial conformations of Fibonacci sequences and real proteins.
Methods
AB off-lattice model in 2D
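As a minimal sketch, assuming the 2D energy takes the form given by Stillinger [7] (unit bond lengths, a bending term, and a species-dependent Lennard-Jones term), the energy of a conformation can be written as $E=\sum_{i=2}^{n-1}\frac{1}{4}\left(1-\cos\theta_i\right)+4\sum_{i=1}^{n-2}\sum_{j=i+2}^{n}\left[r_{ij}^{-12}-C(\xi_i,\xi_j)\,r_{ij}^{-6}\right]$ with $C(\xi_i,\xi_j)=\frac{1}{8}\left(1+\xi_i+\xi_j+5\xi_i\xi_j\right)$, where $\xi_i=1$ for A and $\xi_i=-1$ for B. The function name `ab_energy_2d`, the convention that a bend angle of 0 continues the chain straight, and the placement of the first two residues are illustrative assumptions.

```python
import math

def ab_energy_2d(sequence, bend_angles):
    """Energy of an AB chain in 2D with unit bond lengths (assumed Stillinger form).

    sequence    -- string over {'A', 'B'} of length n
    bend_angles -- the n - 2 bend angles theta_2..theta_{n-1} in radians
                   (0 means the chain continues straight; an assumption)
    """
    n = len(sequence)
    if len(bend_angles) != n - 2:
        raise ValueError("expected n - 2 bend angles")
    xi = [1.0 if s == "A" else -1.0 for s in sequence]

    # Place residue 1 at the origin and residue 2 one bond length along x,
    # then turn by each bend angle before taking the next unit step.
    xs, ys = [0.0, 1.0], [0.0, 0.0]
    heading = 0.0
    for theta in bend_angles:
        heading += theta
        xs.append(xs[-1] + math.cos(heading))
        ys.append(ys[-1] + math.sin(heading))

    # Bending term: (1/4) * (1 - cos(theta_i)) summed over all bend angles.
    bending = sum(0.25 * (1.0 - math.cos(t)) for t in bend_angles)

    # Lennard-Jones term over all non-adjacent residue pairs.
    lennard_jones = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r2 = (xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2
            # C is 1 for AA, 0.5 for BB and -0.5 for AB pairs.
            c = (1.0 + xi[i] + xi[j] + 5.0 * xi[i] * xi[j]) / 8.0
            lennard_jones += 4.0 * (r2 ** -6 - c * r2 ** -3)
    return bending + lennard_jones
```

Under these assumptions, scanning the single bend angle of the sequence AAA gives a minimum close to the value listed for AAA in the table of short-sequence energies in the Results section.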
AB off-lattice model in 3D
If σ_{ i } = 1, the ith residue is A; if σ_{ i } = −1, the ith residue is B. The formation of the hydrophobic core depends on C(σ_{ i }, σ_{ j }). In addition, the strength of the species-independent local interactions is set by the parameters K_{1} and K_{2}. Different values of (K_{1}, K_{2}) were tested repeatedly in [6], and Irback et al. found that the spatial structure is more stable when (K_{1}, K_{2}) is set to (−1, 0.5).
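As a rough illustration, assuming the 3D energy has the form commonly given for the model of Irback et al. [6], namely $E=-K_1\sum_{i=1}^{N-2}\mathbf{b}_i\cdot\mathbf{b}_{i+1}-K_2\sum_{i=1}^{N-3}\mathbf{b}_i\cdot\mathbf{b}_{i+2}+\sum_{i=1}^{N-2}\sum_{j=i+2}^{N}4\,C(\sigma_i,\sigma_j)\left(r_{ij}^{-12}-r_{ij}^{-6}\right)$ with unit bond vectors $\mathbf{b}_i$, and $C(\sigma_i,\sigma_j)=1$ for an AA pair and $0.5$ otherwise, the energy can be evaluated from residue coordinates as below. The function name `ab_energy_3d` and the coordinate-based interface are hypothetical choices made for this sketch.

```python
def ab_energy_3d(sequence, coords, k1=-1.0, k2=0.5):
    """Energy of an AB chain in 3D (assumed Irback-type form).

    sequence -- string over {'A', 'B'} of length n
    coords   -- list of n (x, y, z) tuples; consecutive residues are
                assumed to be one unit bond length apart
    k1, k2   -- local interaction strengths, (-1, 0.5) as in the text
    """
    n = len(sequence)
    if len(coords) != n:
        raise ValueError("one coordinate triple per residue is required")

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Bond vectors b_i pointing from residue i to residue i + 1.
    bonds = [tuple(coords[i + 1][d] - coords[i][d] for d in range(3))
             for i in range(n - 1)]

    # Species-independent local terms.
    local1 = -k1 * sum(dot(bonds[i], bonds[i + 1]) for i in range(n - 2))
    local2 = -k2 * sum(dot(bonds[i], bonds[i + 2]) for i in range(n - 3))

    # Species-dependent Lennard-Jones term over non-adjacent pairs.
    lennard_jones = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r2 = sum((coords[i][d] - coords[j][d]) ** 2 for d in range(3))
            c = 1.0 if sequence[i] == "A" and sequence[j] == "A" else 0.5
            lennard_jones += 4.0 * c * (r2 ** -6 - r2 ** -3)
    return local1 + local2 + lennard_jones
```

For example, `ab_energy_3d("AAAA", [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)])` evaluates a fully stretched chain, for which the two local terms contribute 2 and −0.5 and the Lennard-Jones term is slightly negative.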
Improved strategies
Tabu search, created by Glover [16], is a meta-heuristic search method that can be used for complex mathematical optimization and combinatorial optimization problems. Tabu search uses a local or neighborhood search procedure to move iteratively from one potential solution to an improved solution in its neighborhood until some stopping criterion has been satisfied. Local search procedures often become stuck in poor-scoring areas or in areas where scores plateau. To avoid these pitfalls and to explore regions of the search space that other local search procedures would leave unexplored, tabu search carefully explores the neighborhood of each solution as the search progresses. The solutions admitted to the new neighborhood are determined through the use of memory structures. Adaptive memory helps the search process to avoid local optima and to explore the solution space economically and effectively without becoming trapped in cycles [17]. These memory structures form what is known as the tabu list, a set of rules and banned solutions used to filter which solutions will be admitted to the neighborhood to be explored by the search. To enhance efficiency, the following strategies are used in the algorithm for predicting the protein folding structure.
Encoding
How the individual is encoded is very important for the algorithm, because different encodings affect the effectiveness and performance of the search for the whole spatial structure. Individuals are often encoded as binary strings. Binary encoding, however, is not well suited to the protein folding structure prediction problem. Instead, for an n-residue chain, the individual can be expressed as ${h}_{i}=\left\{{\theta}_{2}^{i},{\theta}_{3}^{i},\dots ,{\theta}_{n-1}^{i}\right\}$ in the 2D AB off-lattice model and as ${h}_{i}=\left\{{\theta}_{2}^{i},{\theta}_{3}^{i},\dots ,{\theta}_{n-1}^{i},{\alpha}_{3}^{i},{\alpha}_{4}^{i},\dots ,{\alpha}_{n-1}^{i}\right\}$ in the 3D AB off-lattice model. This encoding is possible because the optimization is performed on the amino acid chain of the protein.
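The encoding above implies n − 2 values per individual in 2D and (n − 2) + (n − 3) = 2n − 5 values in 3D. A minimal sketch of generating such an individual follows; the helper names `encoding_length` and `random_individual`, the interpretation of θ as bend angles and α as torsion angles, and the sampling range of [−π, π] are assumptions made for illustration.

```python
import math
import random

def encoding_length(n, dims):
    """Number of angles in one individual for an n-residue chain."""
    if n < 3:
        raise ValueError("at least three residues are needed")
    if dims == 2:
        return n - 2                  # theta_2 .. theta_{n-1}
    if dims == 3:
        return (n - 2) + (n - 3)      # theta_2..theta_{n-1}, alpha_3..alpha_{n-1}
    raise ValueError("dims must be 2 or 3")

def random_individual(n, dims, rng=None):
    """Draw every angle uniformly from [-pi, pi] (range is an assumption)."""
    rng = rng or random.Random()
    return [rng.uniform(-math.pi, math.pi)
            for _ in range(encoding_length(n, dims))]
```

For the 13-residue Fibonacci sequence this gives 11 angles in 2D and 21 angles in 3D.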
Annealing mechanism
Simulated annealing (SA) [18] is a probabilistic method for global optimization that searches for an approximation to the global optimum of a cost function. It mimics the cooling of a solid: at a high initial temperature the solid is in a disordered state, and it becomes more and more ordered as the temperature drops, until it reaches the frozen state [19]. The core mechanism of SA is the Metropolis criterion, which decides whether a new state should be accepted. The acceptance probability depends on the energy E and the temperature T. If the change in energy is negative, the new state is accepted. If the change in energy is positive, the new state is accepted with a probability given by the Boltzmann factor. That is, worse solutions as well as better ones can be accepted, which helps the search avoid becoming trapped in a local optimum. The annealing algorithm simulates this process: it starts from a given parameter called the start temperature and terminates when the temperature drops to the end temperature or the global optimum is found. In this paper, the cooling schedule is the same simple rule as in [20]: T_{ i+1 } = σT_{ i }, 0 < σ < 1; the closer σ is to 1, the more slowly the temperature declines.
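The Metropolis criterion and the cooling rule T_{ i+1 } = σT_{ i } can be sketched as follows. The function names, the `rng` parameter and the use of a generator for the schedule are illustrative choices, and a zero energy change is treated here as acceptable, which the text does not specify.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Decide whether to accept a move that changes the energy by delta_e."""
    if delta_e <= 0:
        return True                   # downhill (or equal) moves are accepted
    # Uphill moves are accepted with the Boltzmann probability exp(-dE / T).
    return rng.random() < math.exp(-delta_e / temperature)

def cooling_schedule(start_temperature, end_temperature, sigma):
    """Yield T_0, T_1 = sigma * T_0, ... while T stays above the end temperature."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie strictly between 0 and 1")
    temperature = start_temperature
    while temperature > end_temperature:
        yield temperature
        temperature *= sigma
```

With `start_temperature=1.0`, `end_temperature=0.1` and `sigma=0.5`, the schedule yields 1.0, 0.5, 0.25 and 0.125 before stopping; a value of `sigma` nearer to 1 produces many more, smaller steps.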
Adaptive neighborhood generation
Here scale is the original neighborhood size of the tabu search, CurState is the current annealing state, and InitState is the initial annealing state, so the neighborhood gradually narrows as annealing proceeds.
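One way to realize a neighborhood that narrows during annealing is sketched below. It assumes that the annealing states are temperatures and that the neighborhood radius is scale multiplied by the ratio CurState / InitState; this specific formula, the uniform perturbation and the function names are assumptions made for illustration.

```python
import random

def neighbourhood_radius(scale, cur_state, init_state):
    """Shrink the original neighbourhood in proportion to the annealing state.

    Assumed form: radius = scale * CurState / InitState.
    """
    return scale * cur_state / init_state

def generate_neighbours(individual, radius, size, rng=None):
    """Perturb every angle of the individual uniformly within +/- radius."""
    rng = rng or random.Random()
    return [[angle + rng.uniform(-radius, radius) for angle in individual]
            for _ in range(size)]
```

At the initial state the radius equals scale, and it falls towards zero as the current state approaches the end of the annealing run, so late neighbors stay close to the current individual.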
Local adjustment strategy
The vector φ^{ lmin }(θ) − φ(θ) confines the offspring of φ(θ) to a cubic region; if φ^{ gmin }(θ) lies in this region, the probability of finding it increases.
The algorithm
The SATS algorithm is based on the AB off-lattice model. As the process below illustrates, the algorithm first generates a list of hypotheses using the same initial conformation mechanism as in [21] (the idea is to pick out all A-monomers and place them at certain spots in space, with all B-monomers wrapped around this hydrophobic core). It then calculates the energy of every individual in the list with the AB off-lattice model and stores the individual with the best energy as the temporary best solution. Next, the temperature starts to descend. During this period, a new list of individuals within a small range around the individuals of the hypotheses list is produced as the neighborhood list. Once the energies of its individuals have been calculated, the neighborhood list is sorted by energy. Several top individuals of the neighborhood list are selected as candidates, and the deprecated principle is used to judge whether to add each of them to the tabu list. As the tabu list is refreshed, the local adjustment principle is applied to optimize the elements of the list in order to find the best possible solutions. If the current temperature is lower than the given stop condition, the algorithm terminates and outputs the best solution found. The steps of the SATS algorithm are as follows.
SATS Begin
Parameter:
neighbourhood_size
candidate_size
initial_temperature
end_temperature
descend_rate
Process:
Step 1: Create hypotheses using the initial conformation mechanism;
Step 2: Generate neighborhood solutions;
Step 3: Calculate the energy value by the off-lattice Model;
Step 4: Select candidate solutions from neighborhood;
Step 5: Use deprecated principle to accept the solution and add it to the Tabu List;
Step 6: Apply the local adjustment strategy;
Step 7: Decrease the temperature; if it is still greater than the end temperature, go to Step 2;
Step 8: Output the Result.
SATS End.
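The steps above can be sketched in simplified form as follows. This is not the authors' implementation: it uses a single current solution rather than a hypotheses list, starts from random angles instead of the mechanism of [21], assumes the Stillinger form of the 2D energy, treats the "deprecated principle" as an aspiration-style rule (a tabu candidate is admitted only if it beats the best energy so far), and models local adjustment as a random step towards the best solution. All names other than the five listed parameters are hypothetical.

```python
import math
import random

def energy_2d(sequence, angles):
    """2D AB energy (Stillinger form, assumed); unit bonds, angles in radians."""
    xi = [1.0 if s == "A" else -1.0 for s in sequence]
    xs, ys, heading = [0.0, 1.0], [0.0, 0.0], 0.0
    for theta in angles:
        heading += theta
        xs.append(xs[-1] + math.cos(heading))
        ys.append(ys[-1] + math.sin(heading))
    total = sum(0.25 * (1.0 - math.cos(t)) for t in angles)
    n = len(sequence)
    for i in range(n - 2):
        for j in range(i + 2, n):
            # Clamp tiny distances so overlapping residues give a large finite penalty.
            r2 = max((xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2, 1e-6)
            c = (1.0 + xi[i] + xi[j] + 5.0 * xi[i] * xi[j]) / 8.0
            total += 4.0 * (r2 ** -6 - c * r2 ** -3)
    return total

def sats_sketch(sequence, neighbourhood_size=20, candidate_size=5,
                initial_temperature=1.0, end_temperature=1e-3,
                descend_rate=0.95, scale=math.pi / 4, tabu_length=50, seed=0):
    rng = random.Random(seed)
    # Step 1 (simplified): one random starting conformation.
    current = [rng.uniform(-math.pi, math.pi) for _ in range(len(sequence) - 2)]
    current_e = energy_2d(sequence, current)
    best, best_e = list(current), current_e
    tabu = []
    temperature = initial_temperature
    while temperature > end_temperature:
        radius = scale * temperature / initial_temperature
        # Steps 2-3: generate neighbours and evaluate their energies.
        neighbours = []
        for _ in range(neighbourhood_size):
            cand = [a + rng.uniform(-radius, radius) for a in current]
            neighbours.append((energy_2d(sequence, cand), cand))
        neighbours.sort(key=lambda pair: pair[0])
        # Steps 4-5: try the top candidates, skipping tabu ones unless they
        # improve on the best energy, and accept by the Metropolis criterion.
        for cand_e, cand in neighbours[:candidate_size]:
            key = tuple(round(a, 2) for a in cand)
            if key in tabu and cand_e >= best_e:
                continue
            delta = cand_e - current_e
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_e = cand, cand_e
                tabu.append(key)
                if len(tabu) > tabu_length:
                    tabu.pop(0)
                break
        # Step 6 (assumed form): move a random fraction of the way to the best.
        trial = [a + rng.random() * (b - a) for a, b in zip(current, best)]
        trial_e = energy_2d(sequence, trial)
        if trial_e < current_e:
            current, current_e = trial, trial_e
        if current_e < best_e:
            best, best_e = list(current), current_e
        # Step 7: descend the temperature.
        temperature *= descend_rate
    return best_e, best   # Step 8
```

Calling `sats_sketch("AAA")` returns the lowest energy found and the corresponding bend angle; longer sequences would typically need a larger neighborhood and a slower cooling rate.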
Results and discussion
The SATS algorithm was implemented in Python on Windows 7. The experiments searching for the optimum-energy conformation in the AB off-lattice model fall into two parts.
Results for Fibonacci sequences
The minimum energies obtained by SATS for the short Fibonacci sequences.
| SEQUENCE | ENERGY | SEQUENCE | ENERGY |
|---|---|---|---|
| AAA | -0.65821 | AAAAA | -2.84828 |
| AAB | 0.03223 | AAAAB | -1.58944 |
| ABA | -0.65821 | AAABA | -2.44493 |
| ABB | 0.03223 | AAABB | -0.54688 |
| BAB | -0.03027 | AABAA | -2.53170 |
| BBB | -0.03027 | AABAB | -1.34774 |
| | | AABBA | -0.92662 |
| AAAA | -1.67633 | AABBB | 0.04017 |
| AAAB | -0.58527 | ABAAB | -1.37647 |
| AABA | -1.45098 | ABABA | -2.22020 |
| AABB | 0.06720 | ABABB | -0.61680 |
| ABAB | -0.64938 | ABBAB | -0.00565 |
| ABBA | -0.03617 | ABBBA | -0.39804 |
| ABBB | 0.00470 | ABBBB | -0.06596 |
| BAAB | 0.06172 | BAAAB | -0.52108 |
| BABB | -0.00078 | BAABB | 0.09621 |
| BBBB | -0.13974 | BABAB | -0.64803 |
| | | BABBB | -0.18266 |
| | | BBABB | -0.24020 |
| | | BBBBB | -0.45266 |
The minimum energies obtained by different algorithms for Fibonacci sequences with 13 ≤ N ≤ 55 in 2D.
N | SEQUENCE | E_{min} | E_{PERM} | E_{EPSO} | E_{SATS} |
---|---|---|---|---|---|
13 | ABBABBABABBAB | -3.224 | -3.217 | -3.294 | -3.294 |
21 | BABABBABABBABBABABBAB | -5.288 | -5.750 | -6.198 | -6.198 |
34 | ABBABBABABBABBABABBABABBABBABABBAB | -8.975 | -9.220 | -9.834 | -10.707 |
55 | BABABBABABBABBABABBABABBABBABABBABBABABBABABBABBABABBAB | -14.409 | -14.905 | -16.447 | -18.467 |
The minimum energies obtained by different algorithms for Fibonacci sequences with 13 ≤ N ≤ 55 in 3D.
N | SEQUENCE | E_{ELP} | E_{ACMC} | E_{CSA} | E_{LAGA} | E_{SATS} |
---|---|---|---|---|---|---|
13 | ABBABBABABBAB | -26.498 | -26.507 | -26.471 | -26.498 | -26.507 |
21 | BABABBABABBABBABABBAB | -52.917 | -51.757 | -52.787 | -52.917 | -52.917 |
34 | ABBABBABABBABBABABBABABBABBABABBAB | -92.746 | -94.043 | -97.732 | -98.765 | -99.876 |
55 | BABABBABABBABBABABBABABBABBABABBABBABABBABABBABBABABBAB | -172.696 | -154.505 | -173.980 | -176.542 | -178.986 |
Results for real protein sequences
The four short sequences of real protein.
PDB ID | SEQUENCE |
---|---|
1BXP | MRYYESSLKSYPD |
1BXL | GQVGRQLAIIGDDINR |
1EDP | CSCSSLMDKECVYFCHL |
1EDN | CSCSSLMDKECVYFCHLDIIW |
The lowest optimum energies of the short real protein sequences.
PDB ID | E_{GAA} | E_{LAGAA} | E_{EPSO} | E_{SATS} |
---|---|---|---|---|
1BXP | -2.24484 | -2.24484 | -4.392713 | -4.42913 |
1BXL | -8.74685 | -8.81260 | -8.847081 | -8.907082 |
1EDP | -5.60713 | -6.64530 | -10.06692 | -11.06572 |
1EDN | -7.09609 | -7.81925 | -11.13420 | -13.15426 |
The lowest optimum energies of the two long real protein sequences.
PDB ID | E_{PSO} | E_{SA} | E_{GAA} | E_{SATS} |
---|---|---|---|---|
1AGT | -19.61686 | -17.36282 | -19.07243 | -19.50661 |
1AHO | -15.19110 | -14.96127 | -17.93291 | -18.37535 |
Conclusions
This paper has shown that predicting the protein folding conformation based only on Anfinsen's thermodynamic hypothesis is feasible with the SATS method, which combines the simulated annealing and tabu search algorithms. To verify the efficiency of the algorithm, both the 2D and the 3D off-lattice models were applied to Fibonacci sequences and real protein sequences. In addition, the local adjustment strategy is used to improve the accuracy and speed of the search for the protein native state. Some of our lowest-energy results are better than those of other methods, which suggests that SATS is effective in solving the protein folding structure problem. In future work, one of the most important tasks is to make the algorithm more effective and accurate for real protein sequence prediction in 3D space. Moreover, the AB off-lattice model considers only two kinds of residues and two kinds of interaction energy, so it cannot reflect many important properties of real proteins. Therefore, other models should be studied to explore further interaction energies among protein amino acids.
Declarations
Acknowledgements
The authors thank the members of Machine Learning and Artificial Intelligence Laboratory, School of Computer Science and Technology, Wuhan University of Science and Technology, for their helpful discussion within seminars. This work was supported in part by Program for Outstanding Young Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province, China (No.T201202), National Natural Science Foundation of China (61273225, 61273303, 61373109).
Declarations
Publication charges for this article have been funded by the Project (No. 61273225) from National Natural Science Foundation of China.
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 15, 2014: Proceedings of the 2013 International Conference on Intelligent Computing (ICIC 2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S15.
References
- Anfinsen CB: Principles that govern the folding of protein chains. Science. 1973, 181: 223-227. 10.1126/science.181.4096.223.
- Lopes HS: Evolutionary algorithms for the protein folding problem: A review and current trends. Studies in Computational Intelligence. Springer, Berlin. 2008, 151: 297-315. 10.1007/978-3-540-70778-3_12.
- Dill KA: Theory for the folding and stability of globular proteins. Biochemistry. 1985, 24: 1501-1509. 10.1021/bi00327a032.
- Irback A, Sandelin E: Local interactions and protein folding: Model study on the square and triangular lattices. J Chem Phys. 1998, 108 (5): 2245-2250. 10.1063/1.475605.
- Hart WE, Newman A: Protein structure prediction with lattice models. Handbook of Molecular Biology. 2006, 1-24.
- Irback A, Peterson C, Potthast F, Sommelius O: Local interactions and protein folding: A three-dimensional off-lattice approach. J Chem Phys. 1997, 107: 273-282. 10.1063/1.474357.
- Stillinger FH, Head-Gordon T, Hirshfeld CL: Toy model for protein folding. Physical Review E. 1993, 48: 1469-1477.
- Stillinger FH: Collective aspects of protein folding illustrated by a toy model. Physical Review E. 1995, 52: 2872-2877.
- Bachmann M, Arkin H, Janke W: Multicanonical study of coarse-grained off-lattice models for folding heteropolymers. Physical Review E. 2005, 71: 031906.
- Zhang XL, Wang T, Luo HP: 3D protein structure prediction with genetic tabu search algorithm. BMC Systems Biology. 2010, 4: 6. 10.1186/1752-0509-4-6.
- Liu J, Wang LH, He LL, Shi F: Analysis of toy model for protein folding based on particle swarm optimization algorithm. ICNC. 2005, 3: 636-645.
- Zhu HB, Pu CD, Lin XL: Protein Structure Prediction with EPSO in Toy Model. Second International Conference on Intelligent Networks and Intelligent Systems. 2009, 673-676.
- Kim SY, Lee SB, Lee J: Structure optimization by conformational space annealing in an off-lattice protein model. Physical Review E. 2005, 72: 011916.
- Hsu HP, Mehra V, Grassberger P: Structure optimization in an off-lattice protein model. Physical Review E. 2003, 68: 037703.
- Zhang XL, Lin XL: Protein folding prediction using an improved genetic-annealing algorithm. The 19th Australian Joint Conference on Artificial Intelligence. 2006, 1196-1200.
- Cucu L, Idoumghar L, Schott R: Proceedings of the 10th IASTED International Conference on Artificial Intelligence and Applications: 15-17 Feb 2010; Innsbruck. 2010, Acta Press, Innsbruck.
- Kasperski A, Makuchowski M, Zielinski P: A tabu search algorithm for the minmax regret minimum spanning tree problem with interval data. Journal of Heuristics. 2012, 18: 593-625. 10.1007/s10732-012-9200-z.
- Andrea LV, John L, Jan M: Simulated annealing: Rigorous finite-time guarantees for optimization on continuous domains. Advances in Neural Information Processing Systems. Edited by: Platt JC, Koller D, Singer Y, Roweis ST. 2007, 20 (NIPS 2007).
- Gatti CJ, Hughes RE: Optimization of muscle wrapping objects using simulated annealing. Annals of Biomedical Engineering. 2009, 37: 1342-1347. 10.1007/s10439-009-9710-5.
- Lin XL, Zhu HB: Structure Optimization by an Improved Tabu Search in the AB Off-lattice Protein Model. First International Conference on Intelligent Networks and Intelligent Systems. 2008, 123-126.
- Zhang XL, Lin XL, Wan CP: Genetic-Annealing Algorithm for 3D Off-lattice Protein Folding Model. PAKDD Workshops. 2007, 4819: 186-193.
- Liang F: Annealing contour Monte Carlo algorithm for structure optimization in an off-lattice protein model. J Chem Phys. 2004, 120: 6756. 10.1063/1.1665529.
- Zhang XL, Lin XL: Effective 3D protein structure prediction with local adjustment genetic-annealing algorithm. Interdiscip Sci Comput Life Sci. 2010, 2: 1-7. 10.1007/s12539-010-0001-5.
- David WM: Bioinformatics: Sequence and Genome Analysis.
- Wang L, Zhou H: Perspective roles of short-and long-range interactions in protein folding. Wuhan University Journal of Natural Sciences. 2004, 9: 182-187.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.