A replica exchange Monte Carlo algorithm for protein folding in the HP model
- Chris Thachuk^{1},
- Alena Shmygelska^{2} and
- Holger H Hoos^{3}Email author
https://doi.org/10.1186/1471-2105-8-342
© Thachuk et al; licensee BioMed Central Ltd. 2007
Received: 28 February 2007
Accepted: 17 September 2007
Published: 17 September 2007
Abstract
Background
The ab initio protein folding problem consists of predicting protein tertiary structure from a given amino acid sequence by minimizing an energy function; it is one of the most important and challenging problems in biochemistry, molecular biology and biophysics. The ab initio protein folding problem is computationally challenging and has been shown to be $\mathcal{N}\mathcal{P}$-hard even when conformations are restricted to a lattice. In this work, we implement and evaluate the replica exchange Monte Carlo (REMC) method, which has already been applied very successfully to more complex protein models and other optimization problems with complex energy landscapes, in combination with the highly effective pull move neighbourhood in two widely studied Hydrophobic Polar (HP) lattice models.
Results
We demonstrate that REMC is highly effective for solving instances of the square (2D) and cubic (3D) HP protein folding problem. When using the pull move neighbourhood, REMC outperforms current state-of-the-art algorithms for most benchmark instances. Additionally, we show that this new algorithm provides a larger ensemble of ground-state structures than the existing state-of-the-art methods. Furthermore, it scales well with sequence length, and it finds significantly better conformations on long biological sequences and sequences with a provably unique ground-state structure, which is believed to be a characteristic of real proteins. We also present evidence that our REMC algorithm can fold sequences which exhibit significant interaction between termini in the hydrophobic core relatively easily.
Conclusion
We demonstrate that REMC utilizing the pull move neighbourhood significantly outperforms current state-of-the-art methods for protein structure prediction in the HP model on 2D and 3D lattices. This is particularly noteworthy, since so far, the state-of-the-art methods for 2D and 3D HP protein folding – in particular, the pruned-enriched Rosenbluth method (PERM) and, to some extent, Ant Colony Optimisation (ACO) – were based on chain growth mechanisms. To the best of our knowledge, this is the first application of REMC to HP protein folding on the cubic lattice, and the first extension of the pull move neighbourhood to a 3D lattice.
Background
The ab initio protein folding problem concerns the prediction of the three dimensional functional state, i.e., the native fold, of a protein given only its sequence information. A successful method for solving this problem would have far reaching implications in many fields including structural biology, genetics and medicine. Current laboratory techniques for protein structure determination are both costly and time consuming. In the current era of high throughput sequencing, it is infeasible to rely exclusively on time and labor-intensive experimental structure determination techniques, such as X-ray crystallography and nuclear magnetic resonance, for characterizing the protein products of newly discovered genes; there is a clear need for effective and efficient computational protein structure prediction programs. However, even for simplified protein models that use lattices to discretize the conformational space, the ab initio protein structure prediction problem has been shown to be $\mathcal{N}\mathcal{P}$-hard [1–3], and a polynomial-time algorithm is therefore unlikely to exist.
One of the most prevalently studied abstractions of the ab initio protein structure prediction problem is Dill's hydrophobic polar (HP) model. Many algorithms have been formulated to address the protein folding problem using two dimensional (2D) and three dimensional (3D) HP models on a variety of lattices (see, e.g., [4–12]). In this study, we restrict our attention to those HP models that embed all protein folds into the 2D square lattice or the 3D cubic lattice. Many of these algorithms can be classified primarily as construction based (or chain growth) algorithms, which determine folds by sequentially placing residues onto the lattice. Among these, the pruned enriched Rosenbluth method (PERM) [13] has been particularly successful in finding optimal conformations for standard benchmark sequences in both 2D and 3D. PERM is a Monte Carlo based chain growth algorithm that iteratively constructs partial conformations; it is heavily based on mechanisms for pruning unfavourable folds and for enriching promising partial conformations, to facilitate their further exploration.
Despite being one of the most successful algorithms for ab initio protein structure prediction in the 2D and 3D HP models, PERM – like all other currently known algorithms for this problem – is not dominant in every instance. In the work of Shmygelska and Hoos [9] it was shown that PERM has great difficulty folding proteins which have a hydrophobic core located in the middle and not at one of the ends of the sequence, as is the case when the core is formed from interacting termini. We note that an earlier version of PERM [14], capable of initiating search at non-terminus positions, was previously proposed and may be more effective in folding these types of sequences. However, to the best of our knowledge, no comparison has been made with the most recent version of PERM or other protein folding algorithms.
Shmygelska and Hoos proposed an ant colony optimization algorithm, ACO-HPPFP-3, which employs both construction and local search phases on complete conformations [9]. Ant Colony Optimisation (ACO) is a population-based stochastic search method for solving a wide range of combinatorial optimisation problems. ACO is based on the concept of stigmergy – indirect communication between members of a population through interaction with the environment. From the computational point of view, ACO is an iterative construction search method in which a population of simple agents ('ants') repeatedly constructs candidate solutions to a given problem instance; this construction process is probabilistically guided by heuristic information on the problem instance as well as by a shared memory containing experience gathered by the ants in previous iterations of the search process ('pheromone trails') [15]. The ACO-HPPFP-3 algorithm combines a relatively straight-forward application of the general ACO method to the 2D and 3D HP protein structure prediction problem with specific local search procedures that are used to optimize the conformations constructed by the ants.
In the 2D case, ACO-HPPFP-3 was shown to be competitive with PERM on many benchmark instances and dominant on proteins whose hydrophobic core is located in the middle of the sequence. Other attempts at the problem use local search methods on complete conformations, including the GTabu algorithm [7]. This method utilizes the generic tabu search algorithm from the Human Guided Search (HuGS) framework [16]. GTabu was shown to find conformations with the lowest known energy for several benchmark instances in the 2D case. This was primarily made possible by using a newly introduced neighbourhood consisting of so-called pull moves, which is also utilized in our work.
In addition to PERM, many other Monte Carlo algorithms have been devised to address the problem of ab initio protein structure prediction using lattice models [17–20]. A class of Monte Carlo methods known as generalized ensemble algorithms have been shown to be particularly effective for more complex lattices and for the off-lattice case [5, 21–24]. Classical Monte Carlo search methods for protein structure prediction typically sample conformations according to the Boltzmann distribution in energy space. In generalized ensemble algorithms, random walks in other dimensions, such as temperature, can also be realized. This is the case for replica exchange Monte Carlo (REMC) algorithms, which maintain many independent replicas of potential solutions, i.e., protein conformations. Each replica is set at a different temperature and locally runs a Markov process sampling from the Boltzmann distribution in energy space. A random walk in temperature space is achieved by periodic exchanges of conformations at neighbouring temperatures. REMC appears to have been discovered independently by various researchers [25–28] and is also known as parallel tempering, multiple Markov chain Monte Carlo and exchange Monte Carlo search. REMC has been shown to be highly effective in high dimensional search problems with rugged landscapes containing many local minima. Initially this was demonstrated in an application to spin glass systems [26, 29]. REMC has also been applied to the off-lattice protein folding problem [21–24, 30–34]. Furthermore, it was previously used for folding proteins on the 2D square lattice in a study by Irbäck [23] and to the face-centred cubic lattice in the work of Gront et al [5]. However, to the best of our knowledge, no extensive study of the REMC algorithm in the HP model on the cubic lattice has been undertaken. The remainder of this paper is structured as follows. First, we formally introduce the hydrophobic polar model and describe in detail the two search neighbourhoods (move sets) utilized later in this work. Next, we present the general REMC method followed by the three instantiations we have developed for the 2D and 3D HP protein folding problem. Then, we report results from a comparative empirical performance analysis of our new algorithms vs PERM and ACO-HPPFP-3. The respective computational experiments are run on standard benchmark instances as well as on two new sequence sets, which we introduced to evaluate the performance of REMC when folding long sequences and sequences which have a provably unique optimal structure. We also report results from experiments involving proteins with termini interacting to form a hydrophobic core. Next, we compare the performance of our new REMC algorithms with that of GTabu. A discussion follows, in which we report empirical results regarding the effects of various parameters on the performance of our new algorithms. We close with a high-level summary of our major findings and a brief discussion of potential future work.
The hydrophobic polar model
In this model, we search for a conformation c* that minimizes the objective energy function E(c_{ i }). Such a conformation is considered a solution and is also called a ground-state conformation of the given protein sequence. However, many instances of the HP protein folding problem exhibit solution degeneracy, i.e., have more than one minimum-energy conformation. In this sense, our definition of ground-state conformation does not imply a unique solution, but simply one that satisfies the following equation:
E(c*) = min{E(c_{ i }) | c_{ i }∈ C_{ s }}
Although ground-state structures in this model typically do not closely resemble the known native conformations of the respective proteins, a close correspondence has been observed in some cases [37]; this is particularly true for higher resolution lattices such as the face-centred cubic lattice.
Generally, simplified models, such as the ones considered here, are widely considered to be useful in studying certain aspects of protein folding and structure prediction, including the formation of conformations exhibiting a hydrophobic core [38, 39].
Search neighbourhoods
Local search methods (including REMC and simple Monte Carlo search) are based on the idea of iteratively improving a given candidate solution by exploring its local neighbourhood. In the protein folding problem as it is presented here, the neighbourhood of a conformation can be thought to consist of slight perturbations of the respective structure. The neighbourhoods (move sets) used in solving this problem specify a perturbation as a feasible change from a current conformation c at time t to a valid conformation at time t + 1. Thus, the neighbourhood of a conformation c is a set of valid conformations c' that are obtained by applying a specific set of perturbations to c. In this study we consider two such neighbourhoods, the so-called VSHD moves and pull move neighbourhoods, for both, the 2D and 3D HP models.
VSHD moves
VSHD moves, as we will refer to them in this study, appeared early on in the simulation of polymer chains by Verdier and Stockmayer [40]. In this early work, only single residue moves were used, and the single residue end and corner moves were introduced. That work was later critiqued in a study by Hilhorst and Deutch [18], which also introduced the two residue crankshaft move. Gurler et al. combined all three types of moves into one search neighbourhood [41], which we call the VSHD neighbourhood.
End moves
Corner moves
A corner move can potentially be performed on any residue excluding the end residues. For a corner move to be possible, the two connected neighbours of some residue i must be mutually adjacent to another, unoccupied position on the lattice. Note that for both, the 2D square and the 3D cubic lattices, any two residues i - 1 and i + 1 can share at most one adjacent lattice position. When this situation occurs, a corner is formed by residues i - 1, i and i + 1. If the mutually adjacent position is empty, residue i can be moved to it. This is illustrated in Figure 2b for the 2D case. Overall, in 2D as well as in 3D, there are at most n - 2 possible corner moves for any conformation of a n-residue chain.
Crankshaft moves
A crankshaft move can occur if some residue i is part of a u-shaped bend in the chain, as shown in Figure 2c. Referring to this figure, the crankshaft move can be performed in 2D if positions i' and i + 1' are empty. Crankshaft moves in 2D always involve a 180° rotation of a u-shaped structure consisting of four connected neighbours on the chain. The 3D case is handled analogously, except that the motif is rotated by either 90° or -90°, provided the appropriate positions are empty. (If both rotations are feasible, one of them is chosen uniformly at random). Note that in the figure, the same crankshaft move can be initiated from residue i and i + 1.
Pull moves
Pull moves have been introduced relatively recently by Lesh et al. [7], who used them in the context of a generic tabu search algorithm for the 2D HP protein folding problem. In the following, we will briefly introduce the central idea behind this type of move. For a formal treatment of the pull move neighbourhood and the proof of its completeness (i.e., the fact that any two valid sequence conformations on the 2D square lattice can be transformed into each other by a sequence of pull moves), the reader is directed to the original paper by Lesh et al. [7].
The simplest case occurs when C is occupied by residue i - 1, in which case the entire move consists of moving residue i to location L. Note that this move, which is illustrated in Figure 3a, is equivalent to the previously introduced corner move. When C is not occupied by residue i - 1, i is moved to L and i - 1 is moved to C. If residue i - 2 is adjacent to position C, this second operation completes the pull move. This case is illustrated in Figure 3b.
If, however, residue i - 2 is adjacent to position C, the chain is still not in a valid conformation at this point, and in this case, the following procedure is used. Using the notation by Lesh et al. [7], starting with residue j = i - 2, let (x_{ j }(t + 1), y_{ j }(t + 1)) = (x_{j+2}(t), y_{j+2}(t)) until a valid conformation has been found or residue 1 has been moved. Informally speaking, residues are successively pulled into positions that have just been vacated (as a consequence of pulling another residue) until a valid conformation has been obtained or one end of the chain is reached. Figure 3c illustrates this situation where residues i to i - 3 were pulled successively, until the valid conformation shown on the right was obtained. Note that pull moves have been described as pulling from residue i down to residue 1, if needed. Pulling in the opposite direction is equivalent and also valid.
When they introduced pull moves, Lesh et al. claimed that the resulting neighbourhood could be generalized to the 3D case. However, to the best of our knowledge no algorithm implementing pull moves for the 3D case has been published. For the 2D case, valid choices of L and C are restricted to a single plane. The generalization to 3D can consider choices of L and C in any plane containing both i and i + 1; in the case of the 3D cubic lattice, there are two such planes. In our study presented here, we have implemented this generalization of pull moves in the context of a standard REMC algorithm, which will be described in the following.
Replica exchange Monte Carlo search
In the following, we provide a brief introduction to replica exchange Monte Carlo search. For an in-depth description of the algorithm including its historical aspects, the reader is referred to the review of extended ensemble Monte Carlo algorithms by Iba [42], which also provides details related to simulated tempering [43] and replica Monte Carlo search [44].
Replica exchange Monte Carlo (REMC) search maintains χ independent replicas of a potential solution. Each of the χ replicas has an associated temperature value (T_{1}, T_{2},..., T_{ χ }). Each temperature value is unique and the replicas are numbered such that T_{1} <T_{2} < ... <T_{ χ }. In our description of the algorithm, we will label the χ conformations maintained by the algorithm at any given time with the replica numbers (1 ,... χ,) and always associate temperature T_{ j }with replica j (for all j such that 1 ≤ j ≤ χ). Thus, the exchange of replicas is equivalent to (and is commonly implemented as) the swap of replica labels.
where ΔE : = E(c') - E(c) is the difference in energy between conformations c' and c, and T denotes the temperature of the replica.
The value Δ is the product of the energy difference and inverse temperature difference:
Δ : = (β_{ j }- β_{ i })(E(c_{ i }) - E(c_{ j })).
where ${\beta}_{i}=\frac{1}{{T}_{i}}$ is the inverse of the temperature of replica i.
Potential replica exchanges are only performed between neighbouring temperatures, since the acceptance probability of the exchange drops exponentially as the temperature difference between replicas increases.
Our REMC algorithms
Details of our implementation of REMC search are presented in the 'Methods' section. We have experimented with three variants of the REMC algorithm for both the 2D and 3D case, which differ only in the neighbourhoods used in the subsidiary Monte Carlo local search procedure. REMC_{ vshd }folds protein sequences using exclusively the VSHD neighbourhood. Likewise, REMC_{ pm }is based on the pull move neighbourhood. Our third variant, REMC_{ m }, makes use of a hybrid neighbourhood that allows both, pull moves and VSHD moves to be performed; more precisely, in each local search step, the pull move neighbourhood will be used with probability ρ (where ρ is a configurable parameter of the algorithm) and otherwise, the VSHD neighbourhood will be used.
Results
To evaluate the performance of our REMC algorithms we directly compared results against those for two state-of-the-art folding algorithms, ACO-HPPFP-3 and PERM. In the same manner in which the parameters for REMC remain fixed for all experiments, the PERM and ACO-HPPFP-3 parameters have been fixed to the values suggested by their authors. The parameter values for ACO-HPPFP-3 have been taken from Shmygelska and Hoos [9], and those for PERM were optimized by P. Grassberger and his group and pre-configued in the code kindly provided to us. For all runs of PERM, the parameter settings β : = 26 and q : = 0.2 were used [13].
In our experiments we conducted a number of runs with a given energy or CPU time cut-off on a standard set of benchmark instances for both the 2D and 3D HP protein folding problems. Furthermore, several new benchmark sets were created to evaluate the performance of REMC on long, biologically inspired sequences as well as on sequences with provably unique optimal conformations. A direct comparison between ACO-HPPFP-3 and PERM has been previously reported by Shmygelska and Hoos [9]. In this earlier work it has been shown through experiments on artificially designed as well as on known biological sequences that PERM has inherit difficulties with folding proteins where the termini interact in the formation of the hydrophobic core. Here, we performed analogous experiments to determine the performance differences between ACO-HPPFP-3, PERM and our REMC algorithms for these cases. We further tested our 2D algorithms using the pull move neighbourhood, REMC_{ m }and REMC_{ pm }, against the first algorithm based on this neighbourhood, GTabu, by means of a computational experiment analogous to that performed by Lesh et al. [7].
Results for standard benchmark sequences
Standard benchmark sequences
ID | Length | E* | Protein Sequence |
---|---|---|---|
2D HP | |||
S1-1 | 20 | -9 | (HP)_{2}PH_{2}PHP_{2}HPH_{2}P_{2}HPH |
S1-2 | 24 | -9 | H_{2}(P_{2}H)_{7}H |
S1-3 | 25 | -8 | P_{2}HP_{2}(H_{2}P_{4})_{3}H_{2} |
S1-4 | 36 | -14 | P_{3}H_{2}P_{2}H_{2}P_{5}H_{7}P_{2}H_{2}P_{4}H_{2}P_{2}HP_{2} |
S1-5 | 48 | -23 | P_{2}H(P_{2}H_{2})_{2}P_{5}H_{10}P_{6}(H_{2}P_{2})_{2}HP_{2}H_{5} |
S1-6 | 50 | -21 | H_{2}(PH)_{3}PH_{4}PH(P_{3}H)_{2}P_{4}H(P_{3}H)_{2}PH_{4}(PH)_{4}H |
S1-7 | 60 | -36 | P_{2}H_{3}PH_{8}P_{3}H_{10}PHP_{3}H_{12}P_{4}H_{6}PH_{2}PHP |
S1-8 | 64 | -42 | H_{12}(PH)_{2}(P_{2}H_{2})_{2}P_{2}HP_{2}H_{2}PPH_{2}P_{2}HP_{2}(H_{2}P_{2})_{2}(HP)_{2}H_{12} |
S1-9 | 85 | -53 | H_{4}P_{4}H_{12}P_{6}(H_{12}P_{3})_{3}HP_{2}(H_{2}P_{2})_{2}HPH |
S1-10 | 100 | -50 | P_{3}H_{2}P_{2}H_{4}P_{2}H_{3}(PH_{2})_{2}PH_{4}P_{8}H_{6}P_{2}H_{6}P_{9}HPH_{2}PH_{11}P_{2}H_{3}PH_{2}PHP_{2}HPH_{3}P_{6}H_{3} |
S1-11 | 100 | -48 | P_{6}HPH_{2}P_{5}H_{3}PH_{5}PH_{2}P_{4}H_{2}P_{2}H_{2}PH_{5}PH_{10}PH_{2}PH_{7}P_{11}H_{7}P_{2}HPH_{3}P_{6}HPH_{2} |
3D HP | |||
S2-1 | 48 | -32 | HPH_{2}P_{2}H_{4}PH_{3}P_{2}H_{2}P_{2}HPH_{3}PHPH_{2}P_{2}H_{2}P_{3}HP_{8}H_{2} |
S2-2 | 48 | -34 | H_{4}PH_{2}PH_{5}P_{2}HP_{2}H_{2}P_{2}HP_{6}HP_{2}HP_{3}HP_{2}H_{2}P_{2}H_{3}PH |
S2-3 | 48 | -34 | PHPH_{2}PH_{6}P_{2}HPHP_{2}HPH_{2}(PH)_{2}P_{3}H(P_{2}H_{2})_{2}P_{2}HPHP_{2}HP |
S2-4 | 48 | -33 | PHPH_{2}P_{2}HPH_{3}P_{2}H_{2}PH_{2}P_{3}H_{5}P_{2}HPH_{2}(PH)_{2}P_{4}HP_{2}(HP)_{2} |
S2-5 | 48 | -32 | P_{2}HP_{3}HPH_{4}P_{2}H_{4}PH_{2}PH_{3}P_{2}(HP)_{2}HP_{2}HP_{6}H_{2}PH_{2}PH |
S2-6 | 48 | -32 | H_{3}P_{3}H_{2}PH(PH_{2})_{3}PHP_{7}HPHP_{2}HP_{3}HP_{2}H_{6}PH |
S2-7 | 48 | -32 | PHP_{4}HPH_{3}PHPH_{4}PH_{2}PH_{2}P_{3}HPHP_{3}H_{3}(P_{2}H_{2})_{2}P_{3}H |
S2-8 | 48 | -31 | PH_{2}PH_{3}PH_{4}P_{2}H_{3}P_{6}HPH_{2}P_{2}H_{2}PHP_{3}H_{2}(PH)_{2}PH_{2}P_{3} |
S2-9 | 48 | -34 | (PH)_{2}P_{4}(HP)_{2}HP_{2}HPH_{6}P_{2}H_{3}PHP_{2}HPH_{2}P_{2}HPH_{3}P_{4}H |
S2-10 | 48 | -33 | PH_{2}P_{6}H_{2}P_{3}H_{3}PHP_{2}HPH_{2}(P_{2}H)_{2}P_{2}H_{2}P_{2}H_{7}P_{2}H_{2} |
Every run was performed independently with a unique random seed. In the 2D case, for sequences of length n ≤ 50, 500 independent runs were performed; for 50 <n ≤ 64, 100 runs; and for n > 64, 20 runs. In the case of 3D, 100 independent runs were performed for each sequence. Results for ACO-HPPFP-3 and PERM were taken from the study of Shmygelska and Hoos [9], which used the same experimental environment and protocol. Expected run-times for PERM are computed as ${t}_{exp}=2\cdot {\left(\frac{1}{{t}_{1}}+\frac{1}{{t}_{2}}\right)}^{-1}$, where t_{1} and t_{2} are the average run-times when folding from the N-terminus and C-terminus of the given protein sequence, respectively; as noted by Shmygelska and Hoos, the performance of PERM often varies substantially between folding directions [9].
Results on 2D benchmark sequences
ID | E* | ${\text{PERM}}_{{t}_{exp}}$ | ACO-HPPFP-3 | REMC_{ vshd } | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|---|---|
S1-1 | -9 | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) |
S1-2 | -9 | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) | -9 (< 1 sec) |
S1-3 | -8 | -8 (2 sec) | -8 (2 sec) | -8 (< 1 sec) | -8 (< 1 sec) | -8 (< 1 sec) |
S1-4 | -14 | -14 (< 1 sec) | -14 (4 sec) | -14 (15 sec) | -14 (< 1 sec) | -14 (< 1 sec) |
S1-5 | -23 | -23 (2 sec) | -23 (1 min) | -23 (91% of runs 18 min) | -23 (< 1 sec) | -23 (< 1 sec) |
S1-6 | -21 | -21 (3 sec) | -21 (15 sec) | -21 (98% of runs 19 min) | -21 (< 1 sec) | -21 (< 1 sec) |
S1-7 | -36 | -36 (4 sec) | -36 (20 min) | -34 (33% of runs 33 min) | -36 (10 sec) | -36 (13 sec) |
S1-8 | -42 | -42 (78 hrs) | -42 (1.5 hrs) | -35 (11% of runs 40 min) | -42 (5 sec) | -42 (6 sec) |
S1-9 | -53 | -53 (1 min) | -53 (20% of runs 1 day) | -50 (5% of runs 19 min) | -53 (2 min) | -53 (38 sec) |
S1-10 | -50 | -50 (20 min) | -49 (12 hrs) | -46 (5% of runs 41 min) | -50 (3.5 min) | -50 (8 min) |
S1-11 | -48 | -48 (8 min) | -47 (10 hrs) | -46 (5% of runs 97 min) | -48 (1 min) | -48 (1.2 min) |
Results on 3D benchmark sequences
ID | E* | ${\text{PERM}}_{{t}_{exp}}$ | ACO-HPPFP-3 | REMC_{ vshd } | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|---|---|
S2-1 | -32 | -32 (0.1 min) | -32 (30 min) | -32 (0.75 min) | -32 (0.1 min) | -32 (0.1 min) |
S2-2 | -34 | -34 (0.3 min) | -34 (420 min) | -34 (8.1 min) | -34 (0.2 min) | -34 (0.2 min) |
S2-3 | -34 | -34 (0.1 min) | -34 (120 min) | -34 (3.3 min) | -34 (0.1 min) | -34 (0.1 min) |
S2-4 | -33 | -33 (2 min) | -33 (300 min) | -33 (2.2 min) | -33 (0.2 min) | -33 (0.1 min) |
S2-5 | -32 | -32 (0.5 min) | -32 (15 min) | -32 (1.2 min) | -32 (0.1 min) | -32 (0.1 min) |
S2-6 | -32 | -32 (0.1 min) | -32 (720 min) | -32 (1.5 min) | -32 (0.1 min) | -32 (0.1 min) |
S2-7 | -32 | -32 (0.5 min) | -32 (720 min) | -32 (3.9 min) | -32 (0.4 min) | -32 (0.3 min) |
S2-8 | -31 | -31 (0.3 min) | -31 (120 min) | -31 (2.3 min) | -31 (0.2 min) | -31 (0.1 min) |
S2-9 | -34 | -34 (5 min) | -34 (450 min) | -34 (14 min) | -34 (0.7 min) | -34 (0.9 min) |
S2-10 | -33 | -33 (0.01 min) | -33 (60 min) | -33 (2 min) | -33 (0.1 min) | -33 (0.1 min) |
REMC_{ pm }and REMC_{ m }also outperform other algorithms found in the literature. Shmygelska and Hoos compared PERM and ACO-HPPFP-3 against other methods with previously published results on the standard benchmark sets [9]. For the 2D square lattice, this comparison included the genetic algorithm of Unger and Moult [11], the evolutionary Monte Carlo algorithm of Liang and Wong [8], and the multi-self-overlap ensemble algorithm of Chikenji et al. [47]. Furthermore, a previous application of replica exchange Monte Carlo search (parallel tempering) on the 2D square lattice [23] has failed to reach ground-state configurations for the two largest standard benchmark sequences (here referred to as S1-10 and S1-11) [47]. For the 3D cubic lattice, the hydrophobic zipper algorithm [49], the constraint-based hydrophobic core construction method [37], the core-directed chain growth algorithm [46] and the contact interactions algorithm [10] were considered. Considering these previously published results in combination with the results reported here, REMC_{ pm }and REMC_{ m }both perform better than any of the earlier methods mentioned above in terms of the energy of the conformations found or the CPU time required for reaching a given energy level (where differences in CPU speed are taken into account).
Due to their superior performance, only the REMC_{ pm }and REMC_{ m }algorithms were considered in the remainder of our study.
Characteristic performance of REMC
Results for biological and designed sequences
ID | E* | ${\text{PERM}}_{{t}_{1}}$ | ${\text{PERM}}_{{t}_{2}}$ | ${\text{PERM}}_{{t}_{exp}}$ | ACO-HPPFP-3 | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|---|---|---|
B50-5 | -22 | 5 sec | 118 sec | 9 sec | 820 sec | 6 sec | 5 sec |
B50-7 | -17 | 271 sec | 299 sec | 284 sec | 130 sec | 1 sec | 2 sec |
D-1 | -19 | 3 795 sec | 1 sec | 2 sec | 236 sec | 1 sec | 1 sec |
D-2 | -17 | 9 257 sec | 19 356 sec | 12 524 sec | 951 sec | 44 sec | 41 sec |
For B50-7, REMC beats all variants of PERM by more than two orders of magnitude. As B50-7 involves significant interaction between both termini, the folding direction of PERM appears to be inconsequential. This is not the case for D-1. When folding from the C-terminus, PERM has no difficulty folding the sequence within 1 CPU second, as a significant part of the C-terminus forms the hydrophobic core of this protein. This performance is matched by both REMC algorithms. However, when folding from the N-terminus, PERM requires a mean run-time of over one CPU hour. The D-2 sequence is highly symmetric in its core formation. PERM reports the worst run-times of all algorithms in this instance with a mean run-time of over 2.5 CPU hours in the best case. This is more than 200 times worse than either of the REMC algorithms. Overall, these results clearly indicate that, compared to PERM, REMC is much more effective in finding low-energy structures whose termini interact to form hydrophobic cores.
where l is the number H-H contacts, n is the number of hydrophobic residues, and i and j are hydrophobic residues in contact that are not neighbours in the chain. This measure can be employed to compare the quantity and diversity of structures returned by one or more algorithms. Since identical conformations have the same relative H-H contact order value, the number of unique structures in a set is bounded from below by the number of unique contact order values. Furthermore, a larger range of relative contact order values is indicative of a more diverse set of structures.
In the 3D case, the REMC algorithms also find a more diverse set of ground-state structures than ACO-HPPFP-3 and PERM. REMC_{ m }and REMC_{ pm }return 82 and 83 unique values, respectively, compared to 74 found by ACO-HPPFP-3 and 69 by PERM. Furthermore, REMC finds conformation with larger relative contact order values than ACO-HPPFP-3 and PERM; the largest values are 0.789 for REMC_{ m }, 0.776 for REMC_{ m }, 0.75 for ACO-HPPFP-3 and 0.737 for PERM.
Results for longer sequences
To evaluate how REMC's performance scales with sequence length, a new, biologically motivated test set was constructed. All sequences were taken from the Protein Data Bank and have length between 200 and 250 residues at a sequence similarity of less than 10%. Sequences were translated into HP strings based on the RASMOL hydrophobicity classification scale [50], except for non-standard amino acid symbols, such as X and Z, which were skipped (the same protocol has been previously used by Shmygelska and Hoos [9]). The resulting HP sequences are listed in Supplemental Table 2 [see Additional file 1]. As ACO-HPPFP-3 scaled poorly with sequence length on the benchmark sequences compared with PERM and REMC, it has been omitted from this evaluation. PERM was run in both directions for each instance.
In the 3D case (Figure 5, right side), the performance difference is more pronounced. REMC finds better conformations on average and in the best case for every sequence. Considering the best conformations found among the ten independent runs for each algorithm and for each initial direction in the case of PERM, REMC_{ m }reaches significantly lower energies; the same holds with respect to average energy values. REMC_{ pm }achieves similar performance in all cases (results not shown).
Results for sequences with unique ground-state conformations
Results on stable Z-structures
ID | E* | ${\text{PERM}}_{{t}_{exp}}$ | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|
Z-4 | -3 | -3 (< 1 sec) | -3 (< 1 sec) | -3 (< 1 sec) |
Z-8 | -7 | -7 (< 1 sec) | -7 (< 1 sec) | -7 (< 1 sec) |
Z-12 | -11 | -11 (< 1 sec) | -11 (< 1 sec) | -11 (< 1 sec) |
Z-16 | -15 | -15 (3 sec) | -15 (< 1 sec) | -15 (< 1 sec) |
Z-20 | -19 | -19 (51 min) | -19 (< 1 sec) | -19 (< 1 sec) |
Z-24 | -23 | -23 (49 hrs†) | -23 (< 1 sec) | -23 (< 1 sec) |
Z-28 | -27 | -26 | -27 (< 1 sec) | -27 (< 1 sec) |
Z-32 | -31 | -29 | -31 (< 1 sec) | -31 (< 1 sec) |
Z-36 | -35 | -31 | -35 (1 sec) | -35 (< 1 sec) |
Z-40 | -39 | -34 | -39 (2 sec) | -39 (1 sec) |
Results on stable L0-structures
ID | E* | ${\text{PERM}}_{{t}_{exp}}$ | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|
L0-1 | -4 | -4 (< 1 sec) | -4 (< 1 sec) | -4 (< 1 sec) |
L0-2 | -8 | -8 (< 1 sec) | -8 (< 1 sec) | -8 (< 1 sec) |
L0-3 | -12 | -12 (< 1 sec) | -12 (< 1 sec) | -12 (< 1 sec) |
L0-4 | -16 | -16 (32 sec) | -16 (7 sec) | -16 (5 sec) |
L0-5 | -20 | -20 (3 hrs†) | -20 (1.1 min) | -20 (55 sec) |
L0-6 | -24 | -23 | -24 (16 min) | -24 (13 min) |
L0-7 | -28 | -26 (33 sec) | -28 (3.2 hrs) | -28 (2.5 hrs) |
L0-8 | -32 | -30 (3 min) | -32 (50 hrs) | -32 (16 hrs) |
L0-9 | -36 | -34 (22 min) | -35 (99 hrs) | -35 (100 hrs) |
L0-10 | -40 | -38 (40 min) | -38 (9.6 hrs) | -39 (100 hrs) |
Results on stable L1-structures
ID | E* | ${\text{PERM}}_{{t}_{exp}}$ | REMC_{ pm } | REMC_{ m } |
---|---|---|---|---|
L1-1-3 | -16 | -16 (120 sec) | -16 (6 sec) | -16 (6 sec) |
L1-2-2 | -16 | -16 (57 sec) | -16 (3 sec) | -16 (2 sec) |
L1-3-1 | -16 | -16 (28 sec) | -16 (3 sec) | -16 (3 sec) |
L1-1-5 | -24 | -24 (100 hrs†) | -24 (17 min) | -24 (14 min) |
L1-2-4 | -24 | -24 (49 hrs†) | -24 (9 min) | -24 (7 min) |
L1-3-3 | -24 | -23 | -24 (7 min) | -24 (5 min) |
L1-4-2 | -24 | -24 (49 hrs†) | -24 (5 min) | -24 (4 min) |
L1-3-7 | -40 | -38 | -38 (20 hrs) | -38 (20 hrs) |
L1-5-5 | -40 | -38 | -38 (16 hrs) | -38 (14 hrs) |
L1-8-2 | -40 | -38 | -38 (16 hrs) | -38 (14 hrs) |
Comparison with GTabu
The variants of REMC utilizing pull moves significantly outperform REMC_{ vshd }for the 2D and 3D HP models. This clearly demonstrates the effectiveness of the pull move neighbourhood. To address the question to which extent the REMC search strategy contributes to the excellent performance of REMC_{ pm }and REMC_{ m }, we compared the performance of these algorithms with that of GTabu, the first algorithm for the HP model to use pull moves. In their paper describing GTabu and pull moves, Lesh et al. reported performance results for the standard benchmark sequences S1-8 to S1-11 [7]. To ensure the comparability of results, we used the same experimental protocol as Lesh et al. for evaluating our REMC algorithms on these sequences. Two hundred independent runs were performed for each sequence and the rate of successful completion (i.e., fraction of runs in which the best known energy was reached) after 30 minutes and 60 minutes was reported.
Lesh et al. pointed out that the performance of their implementation of GTabu could be improved by a factor of 2 to 5 through relatively straightforward optimizations. Furthermore, GTabu was evaluated on different hardware (based on the 1000 MHz Alpha processor). Therefore, optimistically assuming our hardware is three times faster and GTabu performance could be improved by a factor of five, we also report run-times for GTabu if it were faster by a factor of fifteen.
Discussion
In all experiments reported so far, the parameters of the REMC algorithms have remained fixed at the values listed in the 'Methods' section. These parameter sets were chosen separately for the 2D case and for the 3D case using the automatic algorithm configuration tool of Hutter et al. [53], which performs a local search in parameter space to optimize a given performance criterion (here: mean run-time). Attempts to manually configure the algorithm parameters failed to produce settings as robust as those found by the automated tool. Experiments using manually tuned parameter configurations yielded performance results that were biased in favour of either short or long sequences.
To better understand the impact of parameter settings on the performance of our REMC algorithms, we performed a series of additional experiments, in which we varied one parameter at a time, while leaving all others fixed at their default values (as specified in the 'Methods' section), i.e., (ϕ, τ_{ min }, τ_{ max }, χ, ρ) : = (500, 160, 220, 5, 0.4) in 2D and (ϕ, τ_{ min }, τ_{ max }, χ, ρ) : = (500, 160, 220, 2, 0.5) in 3D, where ϕ is the number of local steps in a Monte Carlo search, τ_{ min }, and τ_{ max }, are the minimum and maximum temperature values respectively, χ is the number of replicas to simulate and ρ is the probability of performing a pull move.
Two test sequences were selected from the standard benchmark set for this purpose. The first sequence, S1-7, was selected for the 2D case as it does not involve significant interaction of the sequence termini in hydrophobic core formation. We did not choose a sequence with a symmetric optimal fold, such as S1-8, since in that case, REMC always appeared to find an optimal conformation fast (compared to the time required for solving other sequences of similar length), irrespectively of the parameter settings used. For the 3D case, sequence S2-7 was chosen. Neither sequence demands extensive CPU time to solve, therefore 100 independent runs were conducted for each parameter value being evaluated. In the following, we always report the mean run-time required for reaching the target energy level. Results for REMC_{ vshd }have been omitted, because they are always significantly inferior to those achieved by REMC_{ m }and REMC_{ pm }.
Number of replicas
A particularly important parameter of any REMC algorithm is the number of replicas, i.e., the number of conformations on which Monte Carlo search is concurrently performed. In the literature, the use of χ : = $\sqrt{N}$ replicas has been suggested, where N is the number of degrees of freedom within the system [42].
Temperature distribution
Number of MC steps
The parameter φ specifies the number of Monte Carlo steps performed by each replica between any two (proposed) replica exchanges. To determine the effect of this parameter on the run-time of our REMC algorithms, we conducted experiments using a number of values ranging from 5 to 5000 MC steps between replica exchanges.
Probability of performing pull moves
Conclusion
In this work we have demonstrated the effectiveness of an extended ensemble algorithm, replica exchange Monte Carlo search, when applied to the protein structure prediction problem for the HP model on the two dimensional square lattice and the three dimensional cubic lattice. A direct comparison with two state-of-the-art algorithms, ACO-HPPFP-3 and PERM, on a standard set of benchmark sequences has shown that when using the pull move neighbourhood, REMC performs exceptionally well. In the 2D case, REMC ties or outperforms ACO-HPPFP-3 on every problem instance we studied. Furthermore, the performance of REMC_{ m }matches or exceeds that of PERM on ten out of the eleven benchmark instances.
In 3D, we have shown that REMC outperforms ACO-HPPFP-3 on all commonly studied benchmark instances. Moreover, REMC variants based on pull moves find ground-state conformations as fast or faster than PERM for nine of ten instances and with a mean run-time of 0.1 CPU seconds on the remaining instance (which PERM solves in 0.01 CPU seconds on average).
We have demonstrated that in the context of REMC search, using pull moves – as opposed to VSHD moves only – results in substantial performance improvements. We have also shown that by combining pull moves and VSHD moves into a hybrid search neighbourhood, better (and more robust) performance can be obtained in some cases. At the same time, the use of REMC search also contributes to the overall effectiveness of our new algorithms, as can be seen from the fact that our REMC algorithms using pull moves performs better than the GTabu algorithms, which is also based on the pull move neighbourhood. While GTabu introduced pull moves on the square lattice, (to the best of our knowledge) our study is the first to use pull moves on a 3-dimensional lattice model.
REMC proved to be very effective in folding proteins whose hydrophobic cores are formed by interacting termini, such as S1-8 from the standard benchmark set – a class of sequences that are very difficult for PERM. Similarly, we have shown that REMC finds ground-state conformations for sequences with provably unique optimal structures more effectively than PERM. We also presented evidence indicating that when applied to sequences with degenerate ground-states, REMC finds a larger and more diverse set of ground-state conformations in both 2D and 3D. Finally, we have demonstrated that REMC performs better than PERM on long biological sequences (in 2D and 3D), which suggests that REMC's performance scales quite well with sequence length. We expect, however, that for very long sequences it may be beneficial to use a chain-growth method to generate a compact conformation from which REMC search is started. Overall, we have demonstrated that REMC algorithms using the pull move neighbourhood show excellent performance on the HP model. Considering the generality of REMC and the possibility of adapting the concept of pull moves to more complex lattice structures, we see much promise in developing similar algorithms for models that can represent protein structure more accurately.
Methods
In this section, specific details of our algorithm, experimental protocol and experimental environment are listed.
Naming of problem instances
We have followed the naming conventions of problem instances established by Shmygelska and Hoos [9]. The instances with unique ground-state conformations were named analogously. For the long biologically-inspired sequences we retained the respective Protein Data Bank identification codes.
Details of our REMC algorithm
In order to demonstrate the effectiveness of the REMC algorithms without prior knowledge of problem instances, we have fixed parameters across all experiments including long sequences. For the 2D case, we use the parameter configuration (ϕ, τ_{ min }, τ_{ max }, χ, ρ) : = (500, 160, 220, 5, 0.4); in 3D, (ϕ, τ_{ min }, τ_{ max }, χ, ρ) : = (500, 160, 220, 2, 0.5) where ϕ is the number of local steps in a Monte Carlo search, τ_{ min }, and τ_{ max }, are the minimum and maximum temperature values respectively, χ is the number of replicas to simulate and ρ is the probability of performing a pull move. In the case of REMC_{ vshd }, where pull moves are not considered, we use ρ : = 0.0; likewise, ρ : = 1.0 is used in the case of REMC_{ pm }. A detailed description of these parameters and their effects on run-time can be found in the 'Discussion' section. The REMC algorithms are always run on one processor and have not been parallelized.
Experimental protocol
In all experiments, runs were conducted independently and with unique random seeds. All runs were terminated immediately upon discovering the best known energy of some sequence, or when a pre-specified maximum run-time was exceeded in the case of fixed time runs. When less than 100% of runs were successful, i.e.. reached the target energy level, the expected run-time was calculated as detailed by Parkes and Walser [54] as ${t}_{exp}:={t}_{s}+\left(\frac{1}{sr}-1\right)\cdot {t}_{f}$, where t_{ s }and t_{ f }are the mean run-time of successful and unsuccessful runs, respectively, and sr is the ratio of successful to unsuccessful runs. All timing of runs was performed measuring CPU time and started with the first search step.
Experimental environment
All experiments were performed on PCs with 2.4Ghz Pentium IV processors with 256KB cache and 1GB of RAM, running SUSE Linux version 9.2, unless explicitly stated otherwise.
Implementation
All versions of our REMC protein folding algorithms were coded in C++ and compiled using g++ (GCC version 3.3.3). The source code is freely available under the GNU General Public License (GPL) and can be downloaded from our project website [55].
Declarations
Acknowledgements
We would like to thank Neil Lesh, Michael Mitzenmacher and Sue Whitesides for providing us with their GTabu code, and Peter Grassberger for providing us with an implementation of PERM. CT was funded by the CIHR/MSFHR Bioinformatics Training Program for Health Research. This research was also supported by funding from the Mathematics of Information Technology and Complex Systems (MITACS) Network of Centres of Excellence held by HH. We would like to thank the anonymous reviewers for their helpful comments.
Authors’ Affiliations
References
- Berger B, Leighton T: Protein folding in the hydrophobic-hydrophilic (HP) is NP-complete. Proceedings of the second annual international conference on Computational molecular biology. 1998, 5 (1): 27-40.Google Scholar
- Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M: On the complexity of protein folding. Proceedings of the second annual international conference on Computational molecular biology. 1998, 61-62.Google Scholar
- Hart W, Istrail S: Robust proofs of NP-hardness for protein folding: general lattices and energy potentials. Journal of Computational Biology. 1997, 4: 1-22.View ArticlePubMedGoogle Scholar
- Grassberger P: Pruned-enriched Rosenbluth method: Simulations of θ polymers of chain length up to 1 000 000. Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics. 1997, 56 (3): 3682-3693.Google Scholar
- Gront D, Kolinski A, Skolnick J: A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics. The Journal of Chemical Physics. 2001, 115 (3): 1569-1574.View ArticleGoogle Scholar
- Konig R, Dandekar T: Improving genetic algorithms for protein folding simulations by systematic crossover. Biosystems. 1999, 50: 17-25.View ArticlePubMedGoogle Scholar
- Lesh N, Mitzenmacher M, Whitesides S: A complete and effective move set for simplified protein folding. RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular biology. 2003, New York, NY, USA: ACM Press, 188-195.View ArticleGoogle Scholar
- Liang F, Wong WH: Evolutionary Monte Carlo for protein folding simulations. The Journal of Chemical Physics. 2001, 115: 3374-3380.View ArticleGoogle Scholar
- Shmygelska A, Hoos H: An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics. 2005, 6: 30-PubMed CentralView ArticlePubMedGoogle Scholar
- Toma L, Toma S: Contact interactions method: A new algorithm for protein folding simulations. Protein Science. 1996, 5: 147-153.PubMed CentralView ArticlePubMedGoogle Scholar
- Unger R, Moult J: Genetic Algorithms for Protein Folding Simulations. Journal of Molecular Biology. 1993, 231: 75-81.View ArticlePubMedGoogle Scholar
- Unger R, Moult J: Genetic Algorithm for 3D Protein Folding Simulations. Proceedings of the 5th International Conference on Genetic Algorithms. 1993, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 581-588.Google Scholar
- Hsu HP, Mehra V, Nadler W, Grassberger P: Growth-based optimization algorithm for lattice heteropolymers. Physical review. E, Statistical, nonlinear, and soft matter physics. 2003, 68 (2): 021113-View ArticlePubMedGoogle Scholar
- Bastolla U, Frauenkron H, Grassberger P: Phase Diagram of Random Heteropolymers: Replica Approach and Application of a New Monte Carlo Algorithm. Journal of Molecular Liquids. 2000, 84: 111-129.View ArticleGoogle Scholar
- Dorigo M, Gambardella LM: Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation. 1997, 1: 53-66.View ArticleGoogle Scholar
- Klau GW, Lesh N, Marks J, Mitzenmacher M: Human-guided tabu search. Eighteenth national conference on Artificial intelligence. 2002, Menlo Park, CA, USA: American Association for Artificial Intelligence, 41-47.Google Scholar
- Gront D, Kolinski A, Skolnick J: Comparison of three Monte Carlo conformational search strategies for a proteinlike homopolymer model: Folding thermodynamics and identification of low-energy structures. The Journal of Chemical Physics. 2000, 113 (12): 5065-5071.View ArticleGoogle Scholar
- Hilhorst HJ, Deutch JM: Analysis of Monte Carlo results on the kinetics of lattice polymer chains with excluded volume. The Journal of Chemical Physics. 1975, 63 (12): 5153-5161.View ArticleGoogle Scholar
- Kremer K, Binder K: Monte Carlo simulation of lattice models for macromolecules. Computer Physics Reports. 1988, 7 (6): 259-310.View ArticleGoogle Scholar
- Ramakrishnan R, Ramachandran B, Pekny JF: A dynamic Monte Carlo algorithm for exploration of dense conformational spaces in heteropolymers. The Journal of Chemical Physics. 1997, 106 (6): 2418-2425.View ArticleGoogle Scholar
- Hansmann UHE: Parallel tempering algorithm for conformational studies of biological molecules. Chemical Physics Letters. 1997, 281: 140-150.View ArticleGoogle Scholar
- Irbäck A, Sandelin E: Monte Carlo study of the phase structure of compact polymer chains. The Journal of Chemical Physics. 1999, 110 (24): 12256-12262.View ArticleGoogle Scholar
- Irbäck A: Dynamic Parameter Algorithms for Protein Folding. Monte Carlo Approach to Biopolymers and Protein Folding. Edited by: Grassberger P, Barkema G, Nadler W. 1998, World Scientific, Singapore, 98-109.Google Scholar
- Mitsutake A, Sugita Y, Okamoto Y: Generalized-ensemble algorithms for molecular simulations of biopolymers. Peptide Science. 2001, 60 (2): 96-123.View ArticlePubMedGoogle Scholar
- Geyer C: Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. Edited by: Keramidas E. 1991Google Scholar
- Hukushima K, Nemoto K: Exchange Monte Carlo Method and Application to Spin Glass Simulations. Journal of the Physical Society of Japan. 1996, 65: 1604-1608.View ArticleGoogle Scholar
- Iba Y: Review of Extended Ensemble Algorithms. Proceedings of the Institute of Statistical Mathematics. 1993, 41: 65-Google Scholar
- Kimura K, Taki K: Time-homogeneous parallel annealing algorithm. Proceedings of the 13th IMACS World Congress on Computation and Applied Mathematics (IMACS'91). Edited by: Vichnevetsky R, Miller JJH. 1991, 2: 827-828.Google Scholar
- Hukushima K, Takayama H, Yoshino H: Exchange Monte Carlo Dynamics in the SK Model. Journal of the Physical Society of Japan. 1998, 67: 12-15.View ArticleGoogle Scholar
- Lin CY, Hu CK, Hansmann UH: Parallel tempering simulations of HP-36. Proteins. 2003, 52 (3): 436-445.View ArticlePubMedGoogle Scholar
- Sugita Y, Kitao A, Okamoto Y: Multidimensional replica-exchange method for free-energy calculations. The Journal of Chemical Physics. 2000, 113 (15): 6042-6051.View ArticleGoogle Scholar
- Sugita Y, Okamoto Y: Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters. 1999, 314 (1–2): 141-151.View ArticleGoogle Scholar
- Sugita Y, Okamoto Y: Free-Energy Calculations in Protein Folding by Generalized-Ensemble Algorithms. Lecture Notes in Computational Science and Engineering. 2001Google Scholar
- Sugita Y, Okamoto Y: Replica-exchange multicanonical algorithm and multicanonical replica-exchange method for simulating systems with rough energy landscape. Chemical Physical Letters. 2000, 329 (3–4): 261-270.View ArticleGoogle Scholar
- Dill KA: Theory for the folding and stability of globular proteins. Biochemistry. 1985, 24 (6): 1501-1509.View ArticlePubMedGoogle Scholar
- Lau KF, Dill KAD: A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules. 1989, 22 (10): 3986-3997.View ArticleGoogle Scholar
- Yue K, Dill K: Forces of Tertiary Structural Organization in Globular Proteins. Proceedings of the National Academy of Sciences of the United States of America. 1995, 92: 146-150.PubMed CentralView ArticlePubMedGoogle Scholar
- Kolinski A, Skolnick J: Reduced models of proteins and their applications. Polymer. 2004, 45 (2): 511-524.View ArticleGoogle Scholar
- Dill KA, Bromberg S: Molecular Driving Forces. 2003, New York and London: Garland ScienceGoogle Scholar
- Verdier PH, Stockmayer WH: Monte Carlo Calculations on the Dynamics of Polymers in Dilute Solution. The Journal of Chemical Physics. 1962, 36: 227-235.View ArticleGoogle Scholar
- Gurler MT, Crabb CC, Dahlin DM, Kovac J: Effect of bead movement rules on the relaxation of cubic lattice models of polymer chains. Macromolecules. 1983, 16 (3): 398-403.View ArticleGoogle Scholar
- Iba Y: Extended Ensemble Monte Carlo. International Journal of Modern Physics C. 2001, 12: 623-View ArticleGoogle Scholar
- Marinari E, Parisi G: Simulated tempering: a new Monte Carlo scheme. Europhysics Letters. 1992, 19: 451-458.View ArticleGoogle Scholar
- Swendsen R, Wang J: Replica Monte Carlo simulation of spin glasses. Physical Review Letters. 1986, 57: 2607-2609.View ArticlePubMedGoogle Scholar
- Bastolla U, Frauenkron H, Gerstner E, Grassberger P, Nadler W: Testing a new Monte Carlo algorithm for protein folding. Proteins. 1998, 32 (1): 52-66.View ArticlePubMedGoogle Scholar
- Beutler TC, Dill KA: A fast conformational search strategy for finding low energy structures of model proteins. Protein Science. 1996, 5 (10): 2037-2043.PubMed CentralView ArticlePubMedGoogle Scholar
- Chikenji G, Kikuchi M, Iba Y: Multi-Self-Overlap Ensemble for Protein Folding: Ground State Search and Thermodynamics. Physical Review Letters. 1999, 83 (9): 1886-1889.View ArticleGoogle Scholar
- Krasnogor N, Hart WE, Smith J, Pelta DA: Protein Structure Prediction With Evolutionary Algorithms. Proceedings of the Genetic and Evolutionary Computation Conference. Edited by: Banzhaf W, Daida J, Eiben AE, Garzon MH, Honavar V, Jakiela M, Smith RE. 1999, Morgan Kaufmann, 2: 1596-1601.Google Scholar
- Dill K, Fiebig K, Chan H: Cooperativity in Protein-Folding Kinetics. Proceedings of the National Academy of Sciences of the United States of America. 1993, 90 (5): 1942-1946.PubMed CentralView ArticlePubMedGoogle Scholar
- Sayle R, Milner-White EJ: RASMOL – Biomolecular Graphics for All. Trends in biochemical sciences. 1995, 20 (9): 374-376.View ArticlePubMedGoogle Scholar
- Aichholzer O, Bremner D, Demaine ED, Meijer H, Sacristan V, Soss M: Long proteins with unique optimal foldings in the H-P model. Computational Geometry. 2003, 25 (1–2): 139-159.View ArticleGoogle Scholar
- Gupta A, Manuch J, Stacho L: Structure-Approximating Inverse Protein Folding Problem in the 2D HP Model. Journal of Computational Biology. 2005, 12 (10): 1328-1345.View ArticlePubMedGoogle Scholar
- Hutter F, Hoos HH, Stützle T: Automatic Algorithm Configuration based on Local Search. Proceedings of the Twenty-Second Conference on Artifical Intelligence (AAAI '07). 2007, 1152-1157.Google Scholar
- Parkes A, Walser J: Tuning Local Search for Satisfiability Testing. Proceedings of the Application of Artifical Intelligence Conference. 1996, MIT Press, 356-362.Google Scholar
- A replica exchange Monte Carlo algorithm for protein folding in the HP model: Project website. [http://www.cs.ubc.ca/labs/beta/Projects/REMC-HPPFP]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.