An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem
- Alena Shmygelska^{1} and
- Holger H Hoos^{1}Email author
https://doi.org/10.1186/1471-2105-6-30
© Shmygelska and Hoos; licensee BioMed Central Ltd. 2005
Received: 16 July 2004
Accepted: 14 February 2005
Published: 14 February 2005
Abstract
Background
The protein folding problem is a fundamental problems in computational molecular biology and biochemical physics. Various optimisation methods have been applied to formulations of the ab-initio folding problem that are based on reduced models of protein structure, including Monte Carlo methods, Evolutionary Algorithms, Tabu Search and hybrid approaches. In our work, we have introduced an ant colony optimisation (ACO) algorithm to address the non-deterministic polynomial-time hard (NP-hard) combinatorial problem of predicting a protein's conformation from its amino acid sequence under a widely studied, conceptually simple model – the 2-dimensional (2D) and 3-dimensional (3D) hydrophobic-polar (HP) model.
Results
We present an improvement of our previous ACO algorithm for the 2D HP model and its extension to the 3D HP model. We show that this new algorithm, dubbed ACO-HPPFP-3, performs better than previous state-of-the-art algorithms on sequences whose native conformations do not contain structural nuclei (parts of the native fold that predominantly consist of local interactions) at the ends, but rather in the middle of the sequence, and that it generally finds a more diverse set of native conformations.
Conclusions
The application of ACO to this bioinformatics problem compares favourably with specialised, state-of-the-art methods for the 2D and 3D HP protein folding problem; our empirical results indicate that our rather simple ACO algorithm scales worse with sequence length but usually finds a more diverse ensemble of native states. Therefore the development of ACO algorithms for more complex and realistic models of protein structure holds significant promise.
Keywords
Background
Ant Colony Optimisation (ACO) is a population-based stochastic search method for solving a wide range of combinatorial optimisation problems. ACO is based on the concept of stigmergy – indirect communication between members of a population through interaction with the environment. An example of stigmergy is the communication of ants during the foraging process: ants indirectly communicate with each other by depositing pheromone trails on the ground and thereby influencing the decision processes of other ants. This simple form of communication between individual ants gives rise to complex behaviours and capabilities of the colony as a whole.
From the computational point of view, ACO is an iterative construction search method in which a population of simple agents ('ants') repeatedly constructs candidate solutions to a given problem; this construction process is probabilistically guided by heuristic information on the given problem instance as well as by a shared memory containing experience gathered by the ants in previous iterations ('pheromone trails'). Following the seminal work by Dorigo et al. [1, 2], ACO algorithms have been successfully applied to a broad range of hard combinatorial problems, including the traveling salesman problem, the graph colouring problem, the quadratic assignment problem and vehicle routing problems (see, e.g., [3–5]).
The research presented in this paper builds on an ACO algorithm first proposed in [6] (and later improved in [7]) for ab-initio protein folding under a widely studied abstract model – the hydrophobic polar (HP) model. In particular, we extend our previous ACO algorithm to the 3D HP model and improve its performance by modifying the subsidiary local search procedure.
The protein folding problem is one of the most challenging problems in computational biology, molecular biology, biochemistry and physics. Even under simplified lattice models, the protein folding problem is non-deterministic polynomial-time hard (NP-hard) [8]. The ab-initio protein folding problem can be broken down into three sub-problems: 1) design of a model (with a desired level of accuracy); 2) definition of an energy function that can effectively discriminate between native and non-native states; and 3) design of a search algorithm that can efficiently find minimal-energy conformations. A number of search (or sampling) methods have been proposed in the literature to solve the protein folding problem, including Monte Carlo algorithms, Evolutionary Algorithms, Tabu Search and hybrid approaches. ACO, which has been very successfully applied to other combinatorial problems, appears to be a very attractive computational method for solving the protein folding problem, since it combines aspects of chain-growth and permutation-based search with ideas closely related to reinforcement learning. These concepts and ideas apply rather naturally to protein folding: By folding from multiple initial folding points, guided by the energy function and experience from previous iterations of the algorithm, an ensemble of promising, low-energy complete conformations is obtained. These conformations are further improved by a subsidiary local search procedure and then evaluated to update the accumulated pheromone values that are used to bias the generation of conformations in future iterations of the algorithm.
In this paper, we ask and address the following questions: Is ACO a competitive method for solving the ab-initio protein folding problem under the 2D and 3D HP models? How does its performance scale with sequence length? What is the role of the parameters of the ACO algorithm for the efficiency of the optimisation process? Which classes of structures (if any) are solved more efficiently by ACO than by any other known algorithms? Finally, it should be noted that our ACO algorithm for this problem is based on very simple design choices, in particular with respect to the solution components reinforced in the pheromone matrix and of the subsidiary local search procedure. We discuss which of the many design choices underlying our algorithm should be reconsidered in order to achieve further performance improvements.
The hydrophobic polar model
Due to the complexity of the protein folding problem, simplified models such as Dill's hydrophobic-polar (HP) model have become one of the major tools for studying protein structure [9]. The HP model is based on the observation that the hydrophobic force is the main force determining the unique native conformation (and hence the functional state) of small globular proteins [9, 10].
The HP Protein Folding Problem can be formally defined as follows: Given an HP sequence s = s_{1} s_{2}...s_{ n }, find an energy-minimising conformation of s, i.e., find c* ∈ C(s) such that E(c*) = min{E(c) | c ∈ C}, where C(s) is the set of all valid conformations for s. It has been proved recently that this problem and several variations of it are NP-hard [8].
Existing 2D and 3D HP protein folding algorithms
A number of well-known heuristic optimisation methods have been applied to the 2D and 3D HP Protein Folding Problem, including Evolutionary Algorithms (EAs) [11–15] and Monte Carlo (MC) algorithms [16–22]. The latter have been found to be particularly robust and effective for finding high-quality solutions to the HP Protein Folding Problem [18].
Besides general optimisation methods, there are other heuristic methods that rely on specific heuristics that are based on intuitions or assumptions about the folding process, such as co-operativity of folding or the existence of a hydrophobic core. Co-operativity is believed to arise from local conformational choices that result in a globally optimal state without exhaustive search [23]. Among these methods are the hydrophobic zipper method (HZ) [23], the contact interactions method (CI) [24], the core-directed chain growth method (CG) [25], and the constraint-based hydrophobic core construction method (CHCC) [26].
The hydrophobic zipper (HZ) strategy developed by Dill et al. is based on the hypothesis that once a hydrophobic contact is formed it cannot be broken, and other contacts are formed in accordance with already folded parts of the chain (co-operativity of folding) [23]. The contact interactions (CI) algorithm by Toma and Toma [24] combines the idea of HZ with a Monte Carlo search procedure that assigns different conformational freedom to the different residues in the chain, and thus allows previously formed contacts to be modified according to their computed mobilities. The core-directed chain growth method (CG) by Beutler and Dill [25] biases construction towards finding a good hydrophobic core by using a specifically designed heuristic function and by approximating the hydrophobic core with a square (in 2D) or a cube (in 3D). The constraint-based hydrophobic core construction method (CHCC) by Yue and Dill [26] is complete, i.e., always guaranteed to find a global optimum; it attempts to find the hydrophobic core with the minimal possible surface area by systematically introducing geometric constraints and by pruning branches of a conformational search tree. A similar, but more efficient complete constraint satisfaction search method has been proposed by Backofen et al. [27] for the more complex face-centred cubic lattice.
An early application of Evolutionary Algorithms to protein structure prediction was presented by Unger and Moult [14, 15]. Their non-standard EA incorporates characteristics of Monte Carlo methods. Currently among the best known algorithms for the HP Protein Folding problem are various Monte Carlo algorithms, including the 'pruned-enriched Rosenbluth method' (PERM) of Grassberger et al. [16, 18]. PERM is a biased chain growth algorithm that evaluates partial conformations and employs pruning and enrichment strategies to explore promising partial solutions.
Other methods for solving protein folding problems include the dynamic Monte Carlo algorithm by Ramakrishnan et al. [21], which introduced long-range moves involving disconnection of the chain, and the evolutionary Monte Carlo (EMC) algorithm by Liang and Wong [19], which works with a population of individuals that each perform Monte Carlo optimisation; a variant of EMC also reinforces certain secondary structures (alpha-helices and beta-sheets).
Finally, Chikenji et al. introduced the multi-self-overlap ensemble (MSOE) Monte Carlo method [17], which considers overlapping chain configurations.
Other Monte Carlo methods that have been particularly useful in off-lattice protein folding include generalised ensemble methods, such as umbrella sampling [28] (with replica exchange sampling [29, 30] being the most common variant) and multi-canonical (entropic) sampling [30, 31]. Replica exchange Monte Carlo (parallel tempering) has also been applied to the off-lattice HP model [32].
Currently, when applied to the square and cubic lattice HP model, none of these algorithms appears to completely dominate the others in terms of solution quality and run-time.
Our ACO algorithm for the 2D and 3D HP protein folding problem
In previous work, we have applied ACO to the 2D HP Protein Folding Problem [6, 7]; in the following, we briefly summarise the main features of our ACO algorithm and the improvements introduced in this work. Details on our ACO framework and the new ACO-HPPFP-3 algorithm developed in the context of this work are given in the 'Methods' section.
Since conformations are rotationally invariant, the position of the first two amino acids can be fixed without loss of generality. Hence, we represent candidate conformations for a protein sequence of length n by a sequence of local structure motifs of length n - 2. For example, the conformation of Sequence S1-1 shown in Figure 1 corresponds to the motif sequence LSLLRRLRLLSLRRLLSL.
During the construction phase, ants fold a protein from an initial folding point by probabilistically adding one amino acid at a time based on the two sources of information: pheromone matrix values τ (which represent previous search experience and reinforce certain structural motifs) and heuristic function values η (which reflect current energy of the considered structural motif); details of this process are given in the 'Methods' section. The relative importance of τ and η is determined by parameters α and β, respectively, whose settings are detailed in the 'Discussion' section. Similar to other ACO algorithms known from the literature, our algorithm for the HP Protein Folding Problem incorporates a local search phase that takes the initially built protein conformation and attempts to optimise its energy further, using probabilistic long-range moves that are described in detail in the 'Methods' section.
Finally, the pheromone update procedure is based on two mechanisms: Uniform pheromone evaporation is modelled by decreasing all pheromone levels by a constant factor ρ (where 0 <ρ ≤ 1), and pheromone reinforcement is achieved by increasing the pheromone levels associated with the local folding motifs used in a fraction of the best conformations (in terms of energy values) obtained during the preceding construction and local search phase. Furthermore, to prevent search stagnation when all of the pheromone is accumulated on very few structural motifs, we introduce an additional renormalisation mechanism for the pheromone levels (controlled by a parameter θ where 0 ≤ θ < 1; details are given in the 'Methods' section).
Different from our previous ACO algorithms for the HP Protein Folding Problem, our new algorithm, ACO-HPPFP-3, supports the 3D HP cubic lattice model in addition to the 2D HP square lattice model. Furthermore, it uses a different iterative improvement strategy, a modified long-range move operator and a less restrictive termination criterion in its local search phase. ACO-HPPFP-3 was used in all ACO experiments described in the following.
Results
To compare ACO-HPPFP-3 with algorithms for the 2D and 3D HP Protein Folding Problem described in the literature, we tested it on a number of standard benchmark instances as well as on two newly created data sets, one of which was obtained by randomly generating amino acid sequences with hydrophobicity value characteristic of globular proteins, while the other consists of biological sequences that were translated into HP strings using a standard hydrophobicity scale. (These new data sets will be described in more detail later in this section.)
Results for standard benchmark instances
2D and 3D HP standard benchmark instances. Benchmark instances for the 2D and 3D HP Protein Folding Problem used in this study with optimal or best known energy values E*. Most instances for 2D and 3D HP can also be found in [44]; Sequence S1-9 (2D) is taken from [45], and the last two instances (2D) are from [21]. H_{ i }and P_{ i }indicate a string of i consecutive H's and P's, respectively; likewise, (s)_{ i }indicates an i-fold repetition of string s.
ID | Length | E* | Protein Sequence |
---|---|---|---|
2D HP | |||
S1-1 | 20 | -9 | (HP) _{2} PH _{2} PHP _{2} HPH _{2} P _{2} HPH |
S1-2 | 24 | -9 | H_{2}(P_{2}H)_{7}H |
S1-3 | 25 | -8 | P_{2}HP_{2}(H_{2}P_{4})_{3}H_{2} |
S1-4 | 36 | -14 | P _{3} H _{2} P _{2} H _{2} P _{5} H _{7} P _{2} H _{2} P _{4} H _{2} P _{2} HP _{2} |
S1-5 | 48 | -23 | P_{2}H(P_{2}H_{2})_{2}P_{5}H_{10}P_{6}(H_{2}P_{2})_{2}HP_{2}H_{5} |
S1-6 | 50 | -21 | H_{2}(PH)_{3}PH_{4}PH(P_{3}H)_{2}P_{4}H(P_{3}H)_{2}PHPH_{4}(HP)_{3}H_{2} |
S1-7 | 60 | -36 | P _{2} H _{3} PH _{8} P _{3} H _{10} PHP _{3} H _{12} P _{4} H _{6} PH _{2} PHP |
S1-8 | 64 | -42 | H_{12}(PH)_{2}(P_{2}H_{2})_{2}P_{2}HP_{2}H_{2}PPH_{2}P_{2}HP_{2}(H_{2}P_{2})_{2}(HP)_{2}H_{12} |
S1-9 | 85 | -53 | H_{4}P_{4}H_{12}P_{6}(H_{12}P_{3})_{3}HP_{2}(H_{2}P_{2})_{2}HPH |
S1-10 | 100 | -50 | P_{3}H_{2}P_{2}H_{4}P_{2}H_{3}(PH_{2})_{2}PH_{4}P_{8}H_{6}P_{2}H_{6}P_{9}HPH_{2}PH_{11}P_{2}H_{3}PH_{2}PHP_{2}HPH_{3}P_{6}H_{3} |
S1-11 | 100 | -48 | P _{6} HPH _{2} P _{5} H _{3} PH _{5} PH _{2} P _{4} H _{2} P _{2} H _{2} PH _{5} PH _{10} PH _{2} PH _{7} p _{11} H _{7} P _{2} HPH _{3} P _{6} HPH _{2} |
3D HP | |||
S2-1 | 48 | -32 | HPH _{2} P _{2} H _{4} PH _{3} P _{2} H _{2} P _{2} HPH _{2} PHPH _{2} P _{2} H _{2} P _{3} HP _{8} H _{2} |
S2-2 | 48 | -34 | H _{4} PH _{2} PH _{5} P _{2} HP _{2} H _{2} P _{2} HP _{6} HP _{2} HP _{3} HP _{2} H _{2} P _{2} H _{3} PH |
S2-3 | 48 | -34 | PHPH_{2}PH_{6}P_{2}HPHP_{2}HPH_{2}(PH)_{2}P_{3}H(P_{2}H_{2})_{2}P_{2}H_{ P }HP_{2}HP |
S2-4 | 48 | -33 | PHPH_{2}P_{2}HPH_{3}P_{2}H_{2}PH_{2}P_{3}H_{5}P_{2}HPH_{2}(PH)_{2}P_{4}HP_{2}(HP)_{2} |
S2-5 | 48 | -32 | P_{2}HP_{3}HPH_{4}P_{2}H_{4}PH_{2}PH_{3}P_{2}(HP)_{2}HP_{2}HP_{6}H_{2}PH_{2}PH |
S2-6 | 48 | -32 | H_{3}P_{3}H_{2}PH(PH_{2})_{3}PHP_{7}HPHP_{2}HP_{3}HP_{2}H_{6}PH |
S2-7 | 48 | -32 | PHP_{4}HPH_{3}PHPH_{4}PH_{2}PH_{2}P_{3}HPHP_{3}H_{3}(P_{2}H_{2})_{2}P_{3}H |
S2-8 | 48 | -31 | PH_{2}PH_{3}PH_{4}P_{2}H_{3}P_{6}HPH_{2}P_{2}H_{2}PHP_{3}H_{2}(PH)_{2}PH_{2}P_{3} |
S2-9 | 48 | -34 | (PH)_{2}P_{4}(HP)_{2}HP_{2}HPH_{6}P_{2}H_{3}PHP_{2}HPH_{2}P_{2}HPH_{3}P_{4}H |
S2-10 | 48 | -33 | PH_{2}P_{6}H_{2}P_{3}H_{3}PHP_{2}HPH_{2}(P_{2}H)_{2}P_{2}H_{2}P_{2}H_{7}P_{2}H_{2} |
Most studies of EA and MC methods in the literature, including [12, 14, 15, 19], report the number of valid conformations scanned during the search. This makes a performance comparison difficult, since run-time spent for backtracking and the checking of partial or infeasible conformations, which may vary substantially between different algorithms, is not accounted for. We therefore compared ACO to the best-performing algorithm from the literature for which performance data in terms of CPU time is available – PERM [18] (we used the most recent implementation, which was kindly provided by P. Grassberger). We note that the most efficient PERM variant for the HP Protein Folding Problem uses an additional penalty of 0.2 for H-P contacts [34]. Since this corresponds to an energy function different from that of the standard HP model underlying our ACO algorithm as well as other algorithms developed in literature, we used the best performing variant of PERM [18] based on the standard energy function in our experiments. It may be noted that the chain growth process in PERM can start from the N- or C-terminus of the given HP sequence, and in many cases, this results in substantial differences in the performance of the algorithm. To capture this effect, we always ran PERM in both directions, and in addition to the respective average run-times, t_{1} and t_{2}, we report the expected time for solving a given problem instance when performing both runs concurrently, t_{ exp }= 2·(1/t_{1} + 1/t_{2})^{-1}. For all runs of PERM, the following parameter settings were used: inverse temperature γ: = 26 and q: = 0.2.
Performance comparison of various algorithms for the 2D HP Protein Folding Problem. Comparison of the solution quality obtained in 2D by the evolutionary algorithm of Unger and Moult (EA) [14], the evolutionary Monte Carlo algorithm of Liang and Wong (EMC) [19], the multi-self-overlap ensemble algorithm of Chickenji et al. (MSOE) [17], the pruned-enriched Rosenbluth method (PERM) and ACO-HPPFP-3. For EA and EMC, the reported energy values are the lowest among five independent runs, and the values in parentheses are the numbers of valid conformations scanned before the lowest energy values were found. Missing entries indicate cases where the respective method has not been tested on a given instance. The CPU times reported in parentheses for MSOE have been determined on a 500 MHz CPU, and those for PERM and ACO-HPPFP-3 are based on 100 – 200 runs per instance on our reference 2.4 GHz Pentium IV machine. The energy values shown in bold face correspond to currently best known solution qualities.
ID | E | GA | EMC | MSOE | PERM t_{1} | PERM t_{2} | PERM t_{ exp } | ACO |
---|---|---|---|---|---|---|---|---|
S1-1 | -9 | -9 (30 492) | -9 (9 374) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | |
S1-2 | -9 | -9 (30 491) | -9 (6 929) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | |
S1-3 | -8 | -8 (20 400) | -8 (7 202) | -8 (6 sec) | -8 (< 1 sec) | -8 (2 sec) | -8 (< 1 sec) | |
S1-4 | -14 | -14 (301 339) | -14 (12 447) | -14 (< 1 sec) | -14 (< 1 sec) | -14 (< 1 sec) | -14 (4 sec) | |
S1-5 | -23 | -23 (126 547) | -23 (165 791) | -23 (3 min) | -23 (< 1 sec) | -23 (2 sec) | -23 (1 min) | |
S1-6 | -21 | -21 (592 887) | -21 (74 613) | -21 (3 sec) | -21 (3 sec) | -21 (3 sec) | -21 (15 sec) | |
S1-7 | -36 | -34 (208 781) | -35 (203 729) | -36 (7 sec) | -36 (3 sec) | -36 (4 sec) | -36 (20 min) | |
S1-8 | -42 | -37 (187 393) | -39 (564 809) | -39 | -42 (78 hrs) | -42 (78 hrs) | -42 (78 hrs) | -42 (1.5 hrs) |
S1-9 | -53 | -52 (44 029) | -53 (64 sec) | -53 (60 sec) | -53 (1 min) | -53 (20% of runs 1 days) | ||
S1-10 | -50 | -50 (50 hrs) | -50 (50% of runs 1 hrs) | -50 (20 min) | -50 | -49 (12 hrs) | ||
S1-11 | -48 | -47 | -48 (9 min) | -48 (7 min) | -48 (8 min) | -47 (10 hrs) |
Performance comparison of various algorithms for the 3D HP Protein Folding Problem. Comparison of the solution quality obtained in 3D by the hydrophobic zipper (HZ) algorithm [23], the constraint-based hydrophobic core construction method (CHCC) [26], the core-directed chain growth algorithm (CG) [25], the contact interactions (CI) algorithm [24], the pruned-enriched Rosenbluth method (PERM) and ACO-HPPFP-3. For CI, only the best energies obtained are shown. For HZ, CHCC and CG, the reported CPU times are taken from [25]; these are the expected times for finding optimal solutions on a Sparc 1 workstation. In the case of HZ, the reported CPU times are based on an extrapolation from the measured times required for finding suboptimal conformations with the energy values listed here. The CPU times for PERM and ACO-HPPFP-3 were determined on our reference 2.4 GHz Pentium IV machine based on 50 – 100 runs per instance. The energy values shown in bold face correspond to currently best known solution qualities.
ID | E | HZ | CHCC | CG | CI | PERM t_{1} | PERM t_{2} | PERM t_{ exp } | ACO |
---|---|---|---|---|---|---|---|---|---|
S2-1 | -32 | -31(4 hrs) | -32 (30 min) | -32 (9.4 min) | -32 | -32 (0.1 min) | -32 (0.5 min) | -32 (0.2 min) | -32 (30 min) |
S2-2 | -34 | -32 (18 hrs) | -34 (2.3 min) | -34 (35 min) | -33 | -34 (0.3 min) | -34 (48 min) | -34 (0.6 min) | -34 (420 min) |
S2-3 | -34 | -31 (23 hrs) | -34 (30 min) | -34 (62 min) | -32 | -34 (0.1 min) | -34 (4 days) | -34 (0.2 min) | -34 (120 min) |
S2-4 | -33 | -30 (19 days) | -33 (71 min) | -33 (29 min) | -32 | -33 (2 min) | -33 (4 min) | -33 (3 min) | -33 (300 min) |
S2-5 | -32 | -30 (1.3 days) | -32 (32 min) | -32 (12 min) | -32 | -32 (0.5 min) | -32 (19 min) | -32 (1 min) | -32 (15 min) |
S2-6 | -32 | -29 (2.1 days) | -32 (80 min) | -32 (460 min) | -30 | -32 (0.5 min) | -32 (0.1 min) | -32 (0.2 min) | -32 (720 min) |
S2-7 | -32 | -29 (2.5 days) | -32 (110 min) | -32 (64 min) | -30 | -32 (0.5 min) | -32 (2 days) | -32 (1 min) | -32 (720 min) |
S2-8 | -31 | -29 (4 hrs) | -31 (530 min) | -31 (38 min) | -30 | -31 (0.3 min) | -31 (8 min) | -31 (0.6 min) | -31 (120 min) |
S2-9 | -34 | -31(4.5 hrs) | -34 (8.3 min) | -33 | -32 | -34 (5 min) | -34 (10 min) | -34 (7 min) | -34 (450 min) |
S2-10 | -33 | -33 (1.1 hr) | -33 (4.8 min) | -33(1.1 min) | -32 | -33 (0.01 min) | -33 (0.01 min) | -33 (0.01 min) | -33 (60 min) |
Result for new biological and random data sets
To thoroughly test the performance of ACO-HPPFP-3, we created two new data sets of random and biological sequences of length ≈ 30 and ≈ 50 amino acids (ten sequences for each length; for details, see additional data file 1). Random sequences were generated based on the observation that most globular proteins have a fairly uniform amino acid profile, and the percent of hydrophobic residues of majority of globular proteins falls in the range of 40–50% [35]. Thus, the probability of generating character H at each position of a sequence was chosen to be 0.45, and in the remaining cases (i.e., with probability 0.55), we generated a P.
For the biological test-sets, ten sequences were taken from the PDBSELECT data set with homology < 25% from the Protein Data Bank (PDB) in order to obtain a non-redundant representative set of proteins. These protein sequences were translated into HP strings using the hydrophobicity scale classification of RASMOL [36], according to which the following amino acids were considered hydrophobic: Ala, Leu, Val, Ile, Pro, Phe, Met, Trp, Gly and Tyr. Non-standard amino acid symbols, such as X and Z, were skipped in this translation.
As can be seen from these results, in 2D, ACO-HPPFP-3 performs roughly comparably to PERM (PERM'S t_{ exp }was calculated as described in the previous subsection): ACO-HPPFP-3 reaches the same energies as PERM, but on some instances, particularly of length 50, requires more run-time. In 3D, ACO-HPPFP-3 generally requires a comparable amount of run-time on sequences of length 30 and outperforms PERM on one random sequences of length 30, but performs noticeably worse on sequences of length 50 and in some cases does not reach the same energy. We also generated longer sequences of length 75; for these, ACO-HPPFP-3 failed to reach the minimal energy values obtained by PERM in a number of cases. The run-times for both algorithms are reported in detail in Additional file 1; we note that on some sequences, the performance of PERM depends significantly on the direction of folding. Interestingly, there is no significant difference in performance between the biological and random test-sets for either PERM or ACO-HPPFP-3.
In summary, the performance of ACO-HPPFP-3 is comparable with that of PERM (the best known algorithm for the 2D and 3D HP Protein Folding Problem) on biological and random sequences of length 30–50, but worse on longer sequences. This scaling effect is significantly more pronounced in 3D than in 2D. We note that neither ACO-HPPFP-3 nor PERM were optimised for short sequences (n ≤ 30), but by using parameter settings different from the ones specified earlier, the performance of both algorithms can be significantly improved in this case.
Characteristic performance differences between ACO and PERM
The left side of Figure 7 shows an example of a relatively short biological sequence (B50-7, 45 amino acids) with a unique native hydrophobic core in the 2D HP model. (This is rare for HP sequences, which usually have a high ground state and hydrophobic core degeneracy: According to our observations, of the 11 standard benchmark instances in 2D, only Sequences S1-1, S1-3, S1-4 have a unique hydrophobic core; in 3D, none of the sequences studied here have a unique hydrophobic core.) This sequence has no structural nuclei at its ends; instead, the two ends interact with each other. ACO-HPPFP-3 outperforms PERM by a factor of 2 on this sequence in terms of CPU time: using a cut-off time of 1 CPU hour per run, PERM found the optimum with energy -17 in an average run-time of 284.06 CPU seconds (t_{1} = 271 sec, t_{2} = 299 sec), while using the same cut-off time and machine, ACO-HPPFP-3 found the optimum in an average run-time of 130 CPU seconds.
We also designed two additional sequences, D-1 and D-2, of length 50 and 60, respectively, that have a unique native state in which both ends of the sequence interact with each other (see Figure 8). Sequence D-1 also has a structural nucleus near its C-terminus. When testing the performance of PERM and ACO-HPPFP-3 on these sequences, we found that on D-1, ACO-HPPFP-3 requires a mean run-time of 236 CPU seconds, compared to t_{1} = 3 795, t_{2} = 1, t_{ exp }= 2 CPU seconds for PERM (values are based on 100 successful runs). When this sequence was reversed, PERM started folding the sequence from the structural nucleus, and its mean run-time dropped to 1 CPU second. A result similar to that for sequence B50-7 was obtained for Sequence D-2, which has no structural nuclei at the ends, but a native state in which the ends interact with each other. Here, ACO-HPPFP-3 was found to require a mean run-time of 951 CPU seconds (again, mean run-times were obtained from 100 successful runs), compared to t_{1} = 9 257, t_{2} = 19 356, t_{ exp }= 12 525 CPU seconds for PERM; as expected, in this case, reversing the folding order of the sequence did not cause a decrease in PERM'S run-time.
We also analysed native conformations of sequences on which PERM outperforms ACO and observed that the end from which PERM starts folding is relatively compact and forms a structural nucleus in the resulting conformation.
An example of a conformation with the structural nucleus at the beginning of the sequence (near the N-terminus, i.e., residue 1) is shown in the right panel of Figure 7. For this biological sequence (B50-5, 53 amino acids), PERM finds an optimal conformation with an energy of -22 in t_{1} = 5, t_{2} = 118, t_{ exp }= 9 CPU seconds, while the average run-time for ACO-HPPFP-3 is 820 CPU seconds. Our ACO algorithm generally performs worse than PERM on sequences that have structural nuclei at the ends, because it tends to spend substantial amounts of time compacting local regions in the interior of the sequence, while PERM folds more systematically from one end. These observations also hold in 3D, as seen from two random sequences folded in 3D (see Figure 9).
To further investigate our hypothesis, we studied differences between the distributions of native conformations found by ACO-HPPFP-3 and PERM, respectively. For this purpose, we introduced the notion of relative H-H contact order, which captures arrangement of H residues in the core of the folded protein, and thus determines the topology of the conformation (the closely related concept of contact order was first defined in [38]). Relative H-H contact order is defined as follows:
where l is the number of H-H contacts, n is the number of H residues in the sequence, and i and j are interacting H residues that are not neighbours in the chain. Intuitively, CO_{ H-H }specifies the average sequence separation between H-H residues in contact per H in the sequence.
Figure 10 shows cumulative frequency distributions of relative H-H contact order values for sets of native conformations of a 2D (the left panel) and 3D (the right panel) standard benchmark instance, respectively, found by ACO-HPPFP-3 and PERM over 500 independent runs, each of which was terminated as soon as a native conformation had been found. These results show that the ACO algorithm finds a set of native conformations with a wider range of H-H contact order values than PERM; in particular, ACO-HPPFP-3 finds conformations with high relative H-H contact oder as compared to PERM (more distant parts of the chain interact; for example, relative CO_{ H-H }= 0.324 for Sequence S1-7 in 2D and relative CO_{ H-H }= 0.75 for Sequence S2-5 in 3D are not found by PERM; similar results were obtained for other sequences), which further supports our hypothesis that both, in 2D and 3D, PERM is biased toward a more restricted set of native conformations. We performed analogous experiments for the case where PERM is allowed to keep certain statistics from one run to another as in [18] (runs are no longer independent) and found no significant differences in the set of conformations obtained (data not shown).
As seen in Figure 11, ACO-HPPFP-3 is less greedy than PERM, both in 2D (left side) and in 3D (right side), and it tends to leave more lattice sites around H residues accessible for future contacts with other H residues that appear later in the chain. This is also reflected in the mean number of H-H contacts formed when folding prefixes of increasing length; ACO-HPPFP-3 tends to form fewer H-H contacts than PERM for short and medium size prefixes (see Figure 12). By examining the dependence of absolute H-H contact order (defined as , the average sequence separation per H-contact) on prefix length, we furthermore observed that different from PERM, ACO-HPPFP-3 realises the bulk of its local H-H interactions in the middle of the given sequence (see Figure 13). This further confirms that ACO is capable of finding native conformations with structural folding nuclei that are not located at or near the end of a given protein sequence. The results illustrated in Figures 11, 12 and 13 are typical for all 2D and 3D HP instances we studied.
Discussion
Although conceptually rather simple, our ACO algorithm is based on a number of distinct components and mechanisms. A natural question to ask is whether and to which extent each of these contributes to the performance reported in the previous section. A closely related questions concerns the impact of parameter settings on the performance of ACO-HPPFP-3; further details concerning parameters can be found in the 'Methods' section. To address these questions, we conducted several series of experiments. In this context, we primarily used three standard test sequences: Sequence S1-7 of length 60 and Sequence S1-8 of length 64 (long sequences) in 2D, as well as Sequence S2-5 of length 48 in 3D (all standard benchmark sequences for 3D are 48 amino acids in length); these sequences were chosen because the CPU time required to find the best known solutions was sufficiently small to perform a large number of runs (100–200 per instance).
Following the methodology of Hoos and Stützle [39], we measured run-time distributions (RTDs) of our ACO algorithm, which represent the (empirical) probability distribution over the run-time required to reach (or exceed) a given solution quality; the solution qualities used here are the known optima or best known energies for the respective sequences.
Pheromone values and heuristic information
Two important components of any ACO algorithm are the heuristic function, which indicates the desirability of using particular solution components during the construction phase, and the pheromone values, which represent information learned over multiple iterations of the algorithm. Three parameters control the influence of the pheromone information versus heuristic information on the construction of candidate solutions: the relative weight of the pheromone information, α; the relative weight of the heuristic information, β; and the pheromone persistence, ρ (see also 'Methods' section).
Ant colony size and length of local search phase
Our results also indicate that the performance of ACO-HPPFP-3 is more sensitive to the number of non-improving steps than to ant colony size. The optimal value for the maximum number of non-improving steps tolerated (per ant) before the local search phase terminates was found to be around 1 000 for short 2D sequences (n ≤ 50) and around 10 000 for long 2D sequences (n > 50); the latter value also appeared to be optimal for all 3D sequences considered here. This observation follows the intuition that more degrees of freedom, as present for longer sequences and in higher dimensions, require more time for local optimisation, since for any conformation, improving neighbours tend to be rarer and hence harder to find.
Selectivity and persistence of local search
As described in the 'Methods' section, our ACO algorithm uses selective local search, i.e., local search is only performed on a certain fraction of the lowest energy conformations. We observed that ACO-HPPFP-3 is fairly robust with respect to the fraction of conformations to which local search is applied; good performance was obtained for local search selectivity values between 5% and 50%, but performance was found to deteriorate when local search is performed by all ants. Intuitively, similar to colony size, local search selectivity has an impact on search diversification. If too few ants perform local search, insufficient diversification is achieved, which typically leads to premature stagnation of the search process. On the other hand, if local search is performed by too many ants, the resulting substantial overhead in run-time can no longer be amortised by increased search efficiency.
Similarly to selective local search, pheromone update was performed only by a certain fraction of so-called 'elitist ants' whose solution quality after the local search phase is highest within the population. As in the case of local search selectivity, ACO-HPPFP-3 shows robustly high performance for elitist fractions between 1% and 50% (results are not shown here), but performance deteriorates markedly when all ants in the colony are allowed to update the pheromone matrix.
Conclusions
In this work, we have shown that ant colony optimisation (ACO) can be applied in a rather straight-forward way to the 2D and 3D HP Protein Folding Problems. Even though our ACO-HPPFP-3 algorithm is based on very simple structure components (single relative directions) and a simple subsidiary local search procedure (iterative first improvement), it performs fairly well compared to other algorithms and specialised heuristics on the benchmark instances considered here, particularly in 2D. The only non-specialised algorithm that typically performs better than our ACO algorithm, both in 2D and 3D, is PERM. We observed that, particularly in 3D, the run-time required by ACO-HPPFP-3 for finding minimum (or best known) energy conformations scales worse with sequence length than PERM. However, our results show that our ACO algorithm finds a different ensemble of native conformations compared to PERM, and has less difficulty folding sequences whose native states contain structural nuclei located in the middle rather than at the ends of a given sequence, as well as sequences with structures in which the ends interact. We found that two major components of ACO-HPPFP-3 – the pheromone values, which capture experience accumulated over multiple iterations of the search process and from multiple conformations, as well as the heuristic information that provides myopic guidance to the folding process – play a significant role for longer 2D sequences and, to a lesser extent, for 3D sequences, which suggests that in 3D, it may be preferable to associate pheromone values with more complex solution components.
We also found that the subsidiary local search procedure is crucial for the performance of the algorithm; in particular, to ensure that high-quality conformations are obtained, it is very important to allow the local search procedure to run sufficiently long. In an earlier version of our algorithm [7], we used substantially more stringent termination criteria, which forced us to additionally use non-greedy local search (probabilistic iterative improvement, which accepts worsening steps) in addition to the greedy local search procedure used here. The results presented in this study indicate that by using a new and simpler local search procedure, ACO-HPPFP-3 achieves better performance; this is probably due to the fact that the new local search procedure is based on a type of long-range move that leads to a larger effective search neighbourhood.
We have shown that all components of our ACO algorithms contribute to its performance. In particular, its performance is affected by the following components and parameters (listed in the order of decreasing impact): pheromone values, termination criterion for local search, persistence of long-range moves, ant colony size, pheromone persistence, heuristic function, selectivity of local search, and selectivity of pheromone update (i.e., fraction of elitist ants).
Because of its ability to find more balanced ensembles of minimum (or close to minimum) energy conformations, our new ACO algorithm can greatly facilitate investigations of the topology and location of structural nuclei, which we plan to undertake in future work. Finally, while HP protein folding problems are of considerable interest because of their conceptual simplicity, ultimately, most applications of protein folding algorithms require the use of more realistic models of protein structure. Since it does not rely on heuristics and properties that are specific to the HP model and yet performs very well on this restrictive, but not entirely unrealistic abstract model, we believe that relatively straight-forward extensions of our ACO algorithm to more complex and realistic models of protein structure hold significant promise.
Methods
Our new ACO algorithm, ACO-HPPFP-3, iterates construction, local search, and pheromone update phases until a termination condition is satisfied; in the context of this work, we mostly terminated the algorithm upon reaching a given energy threshold. In the following, we describe the three search phases in detail.
Construction phase, pheromone and heuristic values
During the construction phase of ACO-HPPFP-3, each ant first determines a starting point within the given protein sequence; this is done by uniform random choice. From this starting point, the sequence is folded in both directions, adding one residue at a time. Each ant performs probabilistic chain-growth construction of the protein conformation, where in every step, the structure is extended either to the left or to the right, such that the ratio of unfolded residues at each end of the protein remains (roughly) unchanged.
Here, we assume that folding is performed in 3D (the 2D case is handled analogously by considering three relative directions {S, L, R} instead of five {S, L, R, U, D}, see also [6]). The relative directions in which the conformation is extended in each construction step are determined probabilistically based on a heuristic function η_{i,d}and pheromone values τ_{i,d}, according to the formula:
The pheromone values τ_{i,d}indicate the desirability of using the local structure motif with relative direction d ∈ {S, L, R, U, D} at sequence position i. Initially, all τ_{i,d}are equal, such that local structure motifs are chosen in an unbiased way; but throughout the search process, the pheromone values are updated to bias folding towards the use of local motifs that occur in low-energy structures (the updating mechanism will be described in more detail later). The heuristic values η_{i,d}are based on the energy function E. They are defined according to the Boltzman distribution as η_{i,d}: = , where γ is a parameter called the inverse temperature (as in [18]), and h_{i,d}is the number of new H-H contacts achieved by placing amino acid i at the position specified by direction d.
During construction, it may happen that the chain cannot be extended without running into itself. This situation is called attrition, and our algorithm overcomes it as follows: First, starting at the end at which attrition occurred, half of the sequence that has been folded up to this point is unfolded. Then, this segment of the chain is refolded; the first residue (i.e., the last one that was unfolded) is placed such that its relative direction differs from what it had been when attrition occurred, while all of the subsequent residues are folded in a feasible direction that is chosen uniformly at random. This backtracking mechanism is particularly important for longer protein sequences in 2D, where infeasible conformations are frequently encountered during the construction phase.
Local search
The local search phase is based on a long-range mutation move that has been designed to avoid infeasible conformations. It also has a number of important advantages over the more commonly used point mutation moves or Monte Carlo moves (i.e., the end, crankshaft and corner moves [40]): It is easy to implement; it decreases the number of infeasible conformations encountered, even when the protein is very compact (at high densities); it considers a larger neighbourhood that subsumes the single point mutation neighbourhood; and it has some validity in terms of the physical processes taking place during the protein folding process. Similar attempts have been previously undertaken, but these involved disconnection of the chain [21].
From studies of protein folding dynamics, it is known that proteins display a broad range of motions that range from localised motions to slow large-scale movements [37]. Inspired by this complex process, we designed a long-range mutation move that starts by selecting a residue whose relative direction is randomly mutated and then adapts the rest of the chain by probabilistically changing relative directions starting from this initial position [7]. During this adaptation, for each residue, with a probability (0 ≤ ≤ 1) its previous relative direction, if it is still feasible, is left unchanged, and otherwise (i.e., with probability 1 - , or if the previous direction has become infeasible), a different relative direction is chosen, where the probability for each direction d is proportional to the corresponding heuristic value η_{i,d}. Formally, this can be written as follows:
where P[d_{ i }: = ] is the probability of choosing direction as the relative direction d_{ i }at sequence position i. Unlike in our previous implementation [7], the local search phase of our new ACO algorithm is a simple iterative first improvement procedure that is based on this long-range mutation move. The outline of this local search procedure is shown in Figure 18. Iterative first improvement accepts a new conformation generated via long-range mutation only if the solution quality of a new conformation c' improves over the current solution quality (energy) of c. This search process is greedy in the sense that it does not allow worsening steps, and it is terminated when no improving steps have been found after a specific number of scans through the chain (this number is a parameter of the algorithm). Since this local search procedure has a relatively high time-complexity, in each iteration of ACO-HPPFP-3 it is only applied to a certain fraction of the highest-quality conformations constructed by the ants in the preceding construction phase.
Update of the pheromone values
After each construction and local search phase pheromones are updated according to
τ_{i,d}:= ρ·τ_{i,d}, (4)
where 0 <ρ ≤ 1 is the pheromone persistence, a parameter that determines how much of the information gathered in previous iterations is retained. Subsequently, selected ants with low-energy conformations update the pheromone values according to
τ_{i,d}:= τ_{i,d}+ Δ_{i,d,c}, (5)
where Δ_{i,d,c}is the relative solution quality of the given ant's candidate conformation c if that conformation contains local structure motif d at sequence position i, and zero otherwise.
As a further mechanism for preventing search stagnation, we use an additional renormalisation of the pheromone values that is conceptually similar to the method used in MAX - MIN Ant System [41]: After the standard pheromone updates according to Equations 3 and 4, all τ values are normalised such that ∑_{d∈{S,L,R,U,D}}τ_{i,d}= 1 for every residue i; additionally, whenever for a given sequence position i the minimal normalised pheromone value (min_{d∈{S,L,R,U,D}}τ_{i,d})/(∑_{d∈{S,L,R,U,Dr}}τ_{i,d}) falls below a threshold θ (which is a parameter of the algorithm), the minimal τ_{i,d}value is set to θ, while the maximal τ_{i,d}value is decreased by θ - min_{d∈{S,L,R,U,D}}τ_{i,d}. (If there is more than one minimal τ_{i,d}value, all of these are increased to θ, and if there is more than one maximal τ_{i,d}value, one of them is chosen uniformly at random.) This guarantees that the probability of selecting an arbitrary local structure motif for the corresponding sequence position does not become arbitrarily small, and hence ensures the probabilistic approximate completeness of our algorithm (see [42]).
Implementation details and availability
ACO-HPPFP-3 has been implemented in C++ and compiled using gcc (version 3.3.3) for the Linux operating system; a Linux executable is available from http://www.cs.ubc.ca/labs/beta/Projects/ACO-HPPFP.
Declarations
Acknowledgements
This work has been supported by an NSERC Postgraduate Scholarship (PGS-A) held by AS and by HH's NSERC Individual Research Grant #238788. We thank Peter Grassberger for kindly providing us with his implementation of PERM and for very useful feedback on earlier versions of this paper. We also thank the anonymous reviewers for helpful suggestions.
Authors’ Affiliations
References
- Dorigo M, Maniezzo V, Colorni A: Positive feedback as a search strategy. Tech rep., 91–016, Dip Elettronica, Politecnico di Milano, Italy 1991.Google Scholar
- Dorigo M, Maniezzo V, Colorni A: The Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics-Part B 1996, 26: 29–41. 10.1109/3477.484436View ArticleGoogle Scholar
- Dorigo M, Di Caro G: New Ideas in Optimization. In New Ideas in Optimization. Edited by: Corne D, Dorigo M, Glover F. McGraw-Hill; 1999.Google Scholar
- Dorigo M, Di Caro G, Gambardella LM: Ant Algorithms for Discrete Optimization. Artificial Life 1999, 5(2):137–172. 10.1162/106454699568728View ArticlePubMedGoogle Scholar
- Dorigo M, Stützle T: Ant Colony Optimization. The MIT Press; 2004.View ArticleGoogle Scholar
- Shmygelska A, Hernandez R, Hoos HH: An Ant Colony Optimization Algorithm for the 2D HP Protein Folding Problem. In Proc of the 3rd Intl Workshop on Ant Algorithms, ANTS LNCS 2463. Springer Verlag; 2002:40–52.Google Scholar
- Shmygelska A, Hoos HH: An Improved Ant Colony Optimisation Algorithm for the 2D HP Protein Folding Problem. In Proc of the 16th Canadian Conference on Artificial Intelligence, LNCS 2671. Springer Verlag; 2003:400–17.Google Scholar
- Unger R, Moult J: Finding the lowest Free-Energy Conformation of a protein is an NP-hard problem – Proof and Implications. Bull Math Biol 1993, 55(6):1183–1198.View ArticlePubMedGoogle Scholar
- Lau KF, Dill KA: lattice statistical mechanics model of the conformation and sequence space of proteins. Macromolecules 1989, 22: 3986–3997. 10.1021/ma00200a030View ArticleGoogle Scholar
- Richards FM: Areas, volumes, packing, and protein structures. Annu Rev Biophys Bioeng 1977, 6: 151–176. 10.1146/annurev.bb.06.060177.001055View ArticlePubMedGoogle Scholar
- Krasnogor N, Pelta D, Lopez PM, Mocciola P, de la Canal E: Genetic algorithms for the protein folding problem: a critical view. In Proc of Engineering of Intelligent Systems. Edited by: Alpaydin C. ICSC Academic Press; 1998:353–360.Google Scholar
- Krasnogor N, Hart WE, Smith J, Pelta DA: Protein structure prediction with evolutionary algorithms. Proc of the Genetic and Evolutionary Computation conference 1999, 1596–1601.Google Scholar
- Patton AWP, Goldman E: A standard GA approach to native protein conformation prediction. In Proc of the 6th Intl Conf Genetic Algorithms. Morgan Kaufmann Publishers; 1995:574–581.Google Scholar
- Unger R, Moult J: Genetic algorithms for protein folding simulations. J of Mol Biol 1993, 231: 75–81. 10.1006/jmbi.1993.1258View ArticleGoogle Scholar
- Unger R, Moult J: A genetic algorithm for three dimensional protein folding simulations. In Proc of the 5th Intl Conf on Genetic Algorithms. Morgan Kaufmann Publishers; 1993:581–588.Google Scholar
- Bastolla U, Fravenkron H, Gestner E, Grassberger P, Nadler W: Testing a New Monte Carlo algorithm for the protein folding problem. Proteins 1998, 32: 52–66.View ArticlePubMedGoogle Scholar
- Chikenji G, Kikuchi M, Iba Y: Multi-Self-Overlap Ensemble for protein folding: ground state search and thermodynamics. Condensed Materials Archive 1999, 27.Google Scholar
- Hsu HP, Mehra V, Nadler W, Grassberger P: Growth Algorithm for Lattice Heteropolymers at Low Temperatures. J Chem Phys 2003, 118: 444–51. 10.1063/1.1522710View ArticleGoogle Scholar
- Liang F, Wong WH: Evolutionary Monte Carlo for protein folding simulations. J Chem Phys 2001, 115(7):3374–3380. 10.1063/1.1387478View ArticleGoogle Scholar
- O'Toole EM, Panagiotopoulos AZ: Monte Carlo simulation of folding transitions of simple model proteins using a chain growth algorithm. J Chem Phys 1992, 97(11):8644–8652. 10.1063/1.463383View ArticleGoogle Scholar
- Ramakrishnan R, Ramachandran B, Pekny JF: A dynamic Monte Carlo algorithm for exploration of dense conformational spaces in heteropolymers. J Chem Phys 1997, 106(6):2418–2424. 10.1063/1.473791View ArticleGoogle Scholar
- Sali A, Shakhnovich E, Karplus M: How does a protein fold? Nature 1994, 369: 248–251. 10.1038/369248a0View ArticlePubMedGoogle Scholar
- Dill KA, Fiebig KM, Chan HS: Cooperativity in Protein-Folding Kinetics. Proc Natl Acad Sci USA 1993, 90: 1942–1946.PubMed CentralView ArticlePubMedGoogle Scholar
- Toma L, Toma S: Contact interactions method: A new algorithm for protein folding simulations. Protein Sci 1996, 5: 147–153.PubMed CentralView ArticlePubMedGoogle Scholar
- Beutler T, Dill K: A fast conformational search strategy for finding low energy structures of model proteins. Protein Sci 1996, 5: 2037–2043.PubMed CentralView ArticlePubMedGoogle Scholar
- Yue K, Dill KA: Forces of Tertiary Structural Organization in Globular Proteins. Proc Natl Acad Sci USA 1995, 92: 146–150.PubMed CentralView ArticlePubMedGoogle Scholar
- Backofen R, Will S: A Constraint-Based Approach to Structure Prediction for Simplified Protein Models that Outperforms Other Existing Methods. Proc XIX Intl Conf on Logic Programming 2003, 49–71.Google Scholar
- Torrie GM, Valleau JP: Nonphysical sampling distributions in MC free energy estimation: Umbrella sampling. J Comput Phys 1977, 23: 187–199. 10.1016/0021-9991(77)90121-8View ArticleGoogle Scholar
- Gront D, Kolinski A, Skolnick J: Comparison of three Monte Carlo conformational search strategies for a proteinlike homopolymer model: Folding thermodynamics and identification of low-energy structures. J Chem Phys 2000, 113(12):5065–5071. 10.1063/1.1289533View ArticleGoogle Scholar
- Mitsutake A, Sugita Y, Okamoto Y: Replica-exchange multicanonical and multicanonical replica-exchange Monte Carlo simulations of peptides. I. Formulation and benchmark test. J Chem Phys 2003, 118(14):6664–6675. 10.1063/1.1555847View ArticleGoogle Scholar
- Berg BA, Neuhaus T: Multicanonical ensemble: A new approach to simulate first-order phase transitions. Phys Rev Lett 1992, 68: 9–12. 10.1103/PhysRevLett.68.9View ArticlePubMedGoogle Scholar
- Irbäck A: Dynamic-parameter algorithms for protein folding. In Monte Carlo Approach to Biopolymers and Protein Folding. Edited by: Grassberger P, Barkema GT, Nadler W,. World Scientific, Singapore; 1998:98–109.Google Scholar
- Backofen R, Will S, Clote P: Algorithmic approach to quantifying the hydrophobic force contribution in protein folding. Proc of the 5th Pacific Symposium on Biocomputing 2000, 92–103.Google Scholar
- Hsu HP, Mehra V, Nadler W, Grassberger P: Growth-based Optimisation Algorithm for Lattice Heteropolymers. Phys Rev E 2003, 68: 021113–1-021113–4.Google Scholar
- Nandi T, B-Rao C, Ramachandran S: Comparative Genomics using Data Mining tools. J Bioscience 2002, 27: 15–25.View ArticleGoogle Scholar
- Sayle R, Milner-White EJ: RASMOL – Biomolecular Graphics for All. Trends Biochem Sci 1995, 20(9):374–376. 10.1016/S0968-0004(00)89080-5View ArticlePubMedGoogle Scholar
- Creighton TE: Protein Folding. W H Freeman and Company; 1992.Google Scholar
- Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domainproteins. J Mol Biol 1998, 277: 985–994. 10.1006/jmbi.1998.1645View ArticlePubMedGoogle Scholar
- Hoos HH, Stützle T: On the empirical evaluation of Las Vegas algorithms. In Proc of the 14th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers; 1998:238–245.Google Scholar
- Sali A, Shakhnovich E, Karplus M: Kinetics of protein folding – A lattice model study of the requirements for folding tothe native state. J Mol Biol 1994, 235: 1614–1636. 10.1006/jmbi.1994.1110View ArticlePubMedGoogle Scholar
- Stützle T, Hoos HH: MAX-MIN Ant System. Future Generation Computer Systems 2000, 16(8):889–914. 10.1016/S0167-739X(00)00043-1View ArticleGoogle Scholar
- Hoos HH, Stützle T: Stochastic Local Search: Foundations and Applications. Morgan Kaufmann Publishers /Elsevier; 2004.Google Scholar
- Parkes A, Walser JP: Tuning Local Search for Satisfiability Testing. In Proc of the Applications of Artificial Intelligence Conf. MIT Press; 1996:356–362.Google Scholar
- HP Benchmarks[http://www.cs.sandia.gov/tech_reports/compbio/tortilla-hp-benchmarks.html]
- Konig R, Dandekar T: Improving Genetic Algorithms for Protein Folding simulations by systematic crossover. Biosystems 1999, 50: 17–25. 10.1016/S0303-2647(98)00090-2View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.