Research Article  Open  Published:
Using the multiobjective optimization replica exchange Monte Carlo enhanced sampling method for protein–small molecule docking
BMC Bioinformaticsvolume 18, Article number: 327 (2017)
Abstract
Background
In this study, we extended the replica exchange Monte Carlo (REMC) sampling method to protein–small molecule docking conformational prediction using RosettaLigand. In contrast to the traditional Monte Carlo (MC) and REMC sampling methods, these methods use multiobjective optimization Pareto front information to facilitate the selection of replicas for exchange.
Results
The Pareto front information generated to select lower energy conformations as representative conformation structure replicas can facilitate the convergence of the available conformational space, including available nearnative structures. Furthermore, our approach directly provides minmin scenario Pareto optimal solutions, as well as a hybrid of the minmin and maxmin scenario Pareto optimal solutions with lower energy conformations for use as structure templates in the REMC sampling method. These methods were validated based on a thorough analysis of a benchmark data set containing 16 benchmark test cases. An indepth comparison between MC, REMC, multiobjective optimizationREMC (MOREMC), and hybrid MOREMC (HMOREMC) sampling methods was performed to illustrate the differences between the four conformational search strategies.
Conclusions
Our findings demonstrate that the MOREMC and HMOREMC conformational sampling methods are powerful approaches for obtaining protein–small molecule docking conformational predictions based on the binding energy of complexes in RosettaLigand.
Background
Simulating the interactions between a macromolecule and small molecule (ligand) is important for understanding the molecular basis of the mechanisms found in healthy and diseased cells [1]. The complex conformational search problem has been investigated in recent decades in order to predict the conformations of protein–small ligand docking [2]. Given the importance of conformational search, several software systems have been developed over the past 20 years, including Dock [3], FlexX [4, 5], GOLD [6, 7], Autodock [8–10], Glide [11] and others [12–14]. These software systems and sampling methods can efficiently predict realistic complex protein–ligand docking structures according to predefined sets of criteria [15]. In general, a protein–ligand docking conformational search method uses either Monte Carlo (MC) [16] search strategies or genetic algorithms [17]. However, in order to improve the sampling procedure, various advanced sampling approaches have been developed in recent years [18–20].
The MC method comprises a class of numerical methods based on random sampling and estimating the desired outputs using this sample. Integration by MC simulation evaluates E[f(x)] by drawing samples {X _{ t },t=1,…,n} from the state space Ω and then approximating
Thus, the function mean of f(X) is estimated based on a sample mean. When the samples {X _{ t }} are independent, the law of large numbers ensures that the approximation can be as accurate as required by increasing the sample size n.
The replica exchange MC (REMC) method [21] implemented using independent Markov chains ${X_{n}^{i}}$(n≥0) is defined on the same state space Ω and it can be used to test several replicas in parallel in order to explore the same stationary normalized distributions ρ _{ i }(x)(x∈Ω,1≤i≤N) (due to the central limit theorem) at different “temperatures” [22, 23]. Replicas at sufficiently high temperatures are sampled broadly so the barriers will be crossed, whereas low temperature replicas can used to deeply explore the local energy minima. In the REMC method, frequent exchanges are attempted between states $X_{n}^{i}$ and $X_{n}^{j}$ of two “neighboring” Markov chains with indices i and j, which belong to different thermodynamic states, and the configurations can be identified that cross the local energy barriers more easily.
Many versions of the REMC sampling method have been used in studies related to simulation [24–26]. These search methods provide significant improvements in terms of computational efficiency compared with the traditional MC search methods. Hamiltonian [27–29] and welltempered ensemble [30, 31] methods are used widely as MC search methods. Hamiltonian MC is a Markov chains MC method that uses the physical system dynamics rather than a probability distribution to estimate future states in the Markov chain. This allows the Markov chain to explore the target distribution much more efficiently, thereby resulting in faster convergence in Ω. The welltempered ensemble can be designed to have approximately the same average energy as the canonical ensemble but much larger fluctuations. An even greater advantage is obtained when a welltempered ensemble is combined with parallel tempering [32]. Using a welltempered ensemble, it is possible to observe transitions between states, which would be impossible to study using the standard MC method [33]. In this study, we present novel multiobjective optimization (MO)REMC sampling methods.
A multiobjective optimization problem (MOP) comprises several conflicting objectives that need to be optimized. In general, a MOP is defined mathematically as presented in [34].
Definition 1
(General MOP): A MOP minimizes $F(\vec {x})$ = ($f_{1}(\vec {x})$, …, $f_{k}(\vec {x})$) subject to $g_{i}(\vec {x})\leq 0, i=1,\ldots,m,\vec {x}\in \Omega $. A MOP solution minimizes the component functions of a vector function $F(\vec {x})$, where $\vec {x}$ is an ndimensional decision variable vector ($\vec {x}$ = x _{1},…,x _{ n }) from some space Ω, the vector $\vec {x}$ minimizes every component of $F(\vec {x})$, or at least one, and the component functions of the vector function $F(\vec {x})$ should be computable for every $\vec {x}$.
The objectives of DEFINITION 1 contradict each other because no point in Ω maximizes all of the objectives simultaneously. Thus, in order to balance them, the best tradeoffs among the objectives can be defined in terms of Pareto optimality. Using the MOP presented in DEFINITION 1, the key Pareto concepts of Pareto dominance, Pareto optimality, Pareto optimal set, and the Pareto front (nondominated solutions set) are defined mathematically as presented in [34, 35]. The multiobjective optimization approach finds the Pareto optimal set of the population, which comprises a set of solutions that are nondominated with respect to each other. In the objective space, the set of nondominated solutions lie on a surface known as the Pareto front. Nondominated solution sets are those in which no other solutions are superior in terms of all attributes (objectives). Pareto optimality is effective for facilitating the convergence of the population in a lowdimensional search space [36]. By comparing every solution in the Pareto optimal set, it is always possible to improve one attribute to achieve a better gain without another becoming worse. However, each objective can be minimized or maximized when considering optimization problems with two objectives. The Pareto front approach offers a method based on attributes for finding the subset of promising solutions. This method also considers the solution attributes directly without converting them into a standard form initially. Figure 1 illustrates the case of a Pareto front with two objectives (colored points), where there is a tradeoff between minimizing and maximizing the Pareto optimal points of both the x and y coordinate values in minmax, maxmax, minmin, and maxmin scenarios. The scatter plots indicate the Pareto optimal set with discrete points for four different scenarios and two objectives. In each case, the Pareto optimal set always comprises solutions from a particular edge of the feasible search space for discrete points [37].
In recent studies, protein–small ligand docking prediction has focused on improving the convergence speed using sampling methods. A form of solution is used as an important component of evolutionary multiobjective optimization algorithms. It has been shown that using an elitist solution improved the convergence speed for various sampling algorithms. Therefore, in this study, we developed MOREMC methods by using multiple nondominated solutions as replicas for exchange during optimization at different temperatures, thereby improving the REMC sampling algorithm convergence speed associated with replica selection. We also developed methods for choosing replicas to enhance search and to improve exploration of the state space by using the Pareto front energy information. We demonstrated that the MOREMC methods could enhance the performance of sampling methods based on a suite of benchmark test sets using the RosettaLigand protocol [38, 39]. We also performed an extensive comparative study of the proposed methods with traditional MC (detailed implementation is presented in the “Sampling methods” section in reference Algorithm 1) and REMC (see Algorithms 3 and 2) sampling algorithms based on 16 benchmark test cases. As part of this investigation, the RosettaLigand energy function total score (TScore), binding energy interface delta (IFDelta), and ligand of RMSD(Lrmsd) obtained with the proposed MOREMC algorithms were compared with those produced by MC and REMC sampling methods, which showed that the proposed methods generally performed better than MC and REMC. The MOREMC (see Algorithms 3, 4 and 5) and hybrid MOREMC(HMOREMC, see Algorithms 3, 4 and 6) methods were found to enhance the convergence to solutions compared with the MC and REMC sampling methods.
Methods
Test data set
The RosettaLigand protocol yielded better results with the classic MC sampling method when using a data set of 100 native proteinligand complexes. In 71/100 cases, the lowest energy model had an Lrmsd less than 2Å [39]. We suggest that the RosettaLigand protocol cannot obtain satisfactory results in the remaining cases mainly because the MC sampling technique employed in docking is not sufficiently efficient for sampling or optimization in challenging cases. In the present study, we considered cases where satisfactory result could not be obtained with the MC approach. In all of these cases, the native complex was not recognized as a particularly low energy pose even after minimization. The 16 complexes used in this study are summarized in the “Summary of the docking results obtained using different sampling methods and scales” section.
Preparation of the protein and ligand
A validated receptor is crucial for the successful prediction of targets. In this study, we performed repacking of the sidechain of the receptor near the initial ligand position in a similar manner to the RosettaLigand protocol [38]. Placing a ligand near clashing residues allowed the sidechains to be repacked stochastically. We generated 10 structures per receptor and the receptor structure was directly derived based on the RosettaLigand TScore to select the protein conformation with top minor TScore value. This selection process used the RosettaLigand protocol to generate 10 structures per receptor and we only selected that with the lowest energy. This procedure can resolve any preexisting clashes between the protein sidechains and ligand, thereby gaining a large energy increase [39].
Alternatively, we treated ligand conformations as “rotamers,” which were sampled at the same time as the protein sidechains were repacked. Ligands were represented as a set of discrete conformations. To generate these conformations, all the torsional degrees of freedom in the ligand were identified and each of the torsion angles with probable conformations was compiled based on the atom type and hybridization state of the linked atoms. Next, each torsion angle was placed in one of the states considered, but conformations with internal clashes in ligand atoms were not considered, especially the conformations where the closed ring systems were not altered. Finally, we evaluated the internal ligand energy and energy minimization was applied [40]. At present, ligand conformers are generated externally in the RosettaLigand protocols. Thus, we used the Omega program (v2.3.2, OpenEye) [41] with its default settings and restrained the ligand torsions with a harmonic potential during minimization.
Scoring function for docking
In the coarsegrained sampling stage, the coarsegrained complementary score S _{ cg } is defined as
where R denotes ligand atoms within 2.25Å of the receptor backbone or C ^{β}s (repulsive clashes), A denotes ligand atoms between 2.25Å and 4.75Å of any protein atom (attractive contacts), and N denotes the total ligand atoms. The bestscoring poses were filtered by stochastic elimination of near duplicates with a threshold of $0.65\sqrt {N}$ Å, where N is the number of nonhydrogen ligand atoms [39].
In the highresolution refinement stage, the fullatom score is a linear combination of the different scoring items. These scoring items include the attractive LennardJones score, repulsive LennardJones score, implicit LazaridisJarplus solvation score, reference energy for each amino acid, proline ring closure energy score, backbonebackbone Hbonds distant and close scores in the primary sequence, hydrogen bond energy score, probability of an amino acid at phi and psi angles, residue–residue pair probability score, and omega dihedral in the backbone. The highresolution refinement scoring function S _{ fa } is defined as
where s _{ i } denotes different scoring items and w _{ i } denotes alternative weights. The full details are described in Table 1, reference [42]. In this research, we are simply using coarsegrained sampling stage and highresolution refinement stage scoring functions for docking, including TScore and IFDelta functions, as implemented in RosettaLigand [39].
Sampling methods
Our docking methods are based on the RosettaLigand(v3.4) protocol, where we use the repacking sidechain method in ROSETTA suites to generate the receptor and represent ligands as a set of discrete conformations generated by the Omega program. Finally, we examined the capability of the RosettaLigand docking protocol based on MC, REMC, MOREMC, and HMOREMC sampling methods.
MC sampling method
The MC method approximates an expectation based on the sample mean of a function of simulated random variables. The term MC generally applies to all simulations that utilize random sampling to obtain numerical solutions for a system of interest. In the general RosettaLigand protocol, MC refers to MetropolisHastings sampling, which samples from the Boltzmann distribution, and it was developed by Metropolis et al. in the Los Alamos team [43]. In the present study, MC simulations were performed as follows. Starting from an initial conformation of the protein–ligand interaction, a perturbation of rotamerTrialMover() or packRotamersMover() was attempted that changed the conformation of the complex. This trail Mover() from state last accepted (old) to state perturbed (new) is accepted based on an acceptance probability such that [39]
where the boltz_factor=(last_accepted_score−score)/k _{ B } T, last_accepted_score denotes the energy value of the last accepted structure of the complex, score denotes the energy value of the perturbed structure of the complex, T denotes the current temperature, and k _{ B } denotes the Boltzmann constant, which is considered to be one. In order to decide whether to accept or reject the trail Mover(), we generate a random number, denoted by mc_RG_uniform, from a uniform distribution in the interval [ 0,1].
Clearly, the probability that mc_RG_uniform[ 0,1] is less than prob[ old→new] is equal to prob[old→new]. We now accept the trail Mover() if mc_RG_uniform[ 0,1]<prob[ old→new] or prob[old→new]≥1 and reject it otherwise. The transition probability for the MC sampling method from conformation p to a perturbed conformation p ^{′} depends on the difference in last_accepted_score−score between the last accepted (old) conformation and the perturbed (new) conformation, which is determined such that
where prob[ old→new] is the acceptance probability between conformations p ^{′} and p. This rule guarantees that the probability to accept a trail Mover() from the last accepted conformation to perturbed conformation is indeed equal to prob[ old→new] [44]. If the current conformation structure is rejected, MC can retain an additional duplicate of the previous sampling structure as the sample accepted by the system. Figure 2 (left and upper panel) shows that the last sampling structure (red point) is accepted by the MC method as the exclusive solution. After many iterations, an accurate average energy value can be obtained for a complex structure. Algorithm 1 shows the pseudocode for the RosettaLigand MC Boltzmann sampling method implementation.
In RosettaLigand, the efficiency of the MC Boltzmann sampling method can be improved by avoiding the computation of the exponential function (line 4, Algorithm 1). A more detailed interpretation is given in reference [44].
REMC sampling method
In current protocols, replica exchange is the most widely used method for enhancing sampling in biomolecular simulations, where it can be viewed as a parallel version of simulation tempering, and it is also known as parallel tempering or multiple Markov chains. In the proposed method, REMC search maintains M identical copies of replicas as M sampled canonical ensembles at different temperatures. Each temperature value is unique and each of the M replicas has an associated temperature value (T _{1}, T _{2},..., T _{ M }). Each of the M replicas independently performs a simple MCBoltzmann(p,T) search at the respective temperature setting. In addition, in our REMC algorithm, each replica pi′ is perturbed and the associated energy value E(pi′) is archived in ensembles P ^{′} and E ^{′}. The elite replicas in the archives are selected using a procedure called select_REMC_Replicas(E ^{′},P ^{′}). In this procedure, we select the last “numR” conformations that have been pushed into the queue in the archives as replicas for exchange, as shown in Fig. 2 (right and upper panels), where the last “numR” sampling structures are used as replicas(red points) for exchange in the REMC method. Algorithm 2 presents the pseudocode for the selection of replicas from the archives in the implementation of the REMC sampling method.
We can represent the current state of the “numR” replicas selected from the archives as a protein conformation ensemble pe ^{′}: = ($pe^{\prime }_{1}$,..., $pe^{\prime }_{numR}$), as follows, where $pe^{\prime }_{j}$ is the conformation of replica j, which (as stated previously) runs at temperature T _{ j }. During replica exchange, the temperature values of neighboring replicas are exchanged at a probability proportional to their energy value and difference in temperature. The transition probability from some current conformation $pe^{\prime }_{i}$ to a perturbed (trail Mover()) conformation pei″ is determined using the socalled Metropolis criterion, as shown in the MC sampling method section.
Exchanges are performed between neighboring temperatures, T _{ i } and T _{ j }. The probability of an exchange depends on the energy values, E(pei′) and E(pej′), and the inverse temperatures, β _{ i } and β _{ j }. An exchange of temperatures, and thus the relabeling of replicas, affects the state of the replica ensemble pe ^{′}. Therefore, we define an exchange between two replicas i and j more generally as a transition from the current ensemble state pe ^{′} to an exchanged state pe ^{″}. We define l(pei′) = i, the current label or replica number, for all $pe^{\prime }_{i}$. The probability of a transition from the current ensemble state pe ^{′} to an altered state pe ^{″} by exchanging replicas i and j is defined as [45]:
The value Δ is the product of the energy difference and inverse temperature difference:
where β _{ i }=1/T _{ i } is the inverse of the temperature of replica i. Potential replica exchanges are only performed between neighboring temperatures because the acceptance probability of the exchange decreases exponentially as the temperature difference between replicas increases.
The pseudocode for Algorithm 3 illustrates the details of our REMC search procedure performed for “numR” replicas and a predetermined temperature range between minT and maxT. In the “ while i+1<numR do” loop, which runs over the pairs of replicas to be swapped, it can be seen that the swaps being attempted include pairs (0,1), (2,3), (4,5), etc., but never pairs (1,2), (3,4), (5,6), etc. This scheme will not satisfy the “detailed balance condition”(transition probabilities i→j≠j→i). Moreover, in the condition structure for Δ, it is obvious that the swap is rejected if Δ is larger than some threshold number (often 75, but also depends on the computer architecture), then the swap is rejected because e ^{−Δ} can never be larger than any random number mc_RG_uniform[ 0,1], and hence one call of the random number generator is saved, making the algorithm computationally more efficient.
MOREMC sampling method
The REMC method involves a group of MC moves that generate a Markov chain of states. This Markov process has no dependence on history in the sense that new configurations are generated with a probability that depends only on the current configuration and not on any previous configurations. In this study, we developed the MOREMC sampling method where the random configuration process is not Markovian so the “detailed balance criterion” is not satisfied. In contrast to the traditional REMC algorithm, which typically samples a canonical ensemble of states, we introduce a dependence on history into the REMC method and use historic multiobjective optimal Pareto front information to facilitate the selection of critical replicas of current states, which comprise a set of replicas that are similar to lower energy states but also as diverse possible. Using the generated Pareto front as representative conformation structure templates can improve the convergence of the available conformational space including possible nearnative structures.
The aim of the MOREMC sampling method is to enhance the speed of convergence for the available conformational space. The MOREMC method employs a historydependent Pareto frontier list to explicitly maintain a limited number of nondominated conformations found by the REMC sampling method. Each individual in the archives generated by the REMC sampling method is evaluated using binary objectives: the sampling search steps (MC steps) and the TScore values of the perturbed conformations. The objective MC steps denote the time series for the search process and the TScore values for the perturbed conformations in RosettaLigand denote a historydependent information map of the available conformational space. The MOREMC sampling method is inspired by evolutionary, populationbased algorithms. In the traditional REMC method, replicas at sufficiently high temperatures are sampled broadly so the barriers will be crossed, whereas lowtemperature replicas can used to deeply explore the local energy minima principle. Included in multiobjective optimal method critical replicas of current states are similar greedy states, dominated nonPareto frontier list replicas, and diverse possible characteristics. This method is effectively a combination of the REMC sampling method and historic multiobjective optimal Pareto front critical conformation structures. The experimental results show that the elite replicas generated by the historic multiobjective optimal Pareto front can enhance the speed of convergence of the available conformational space.
Algorithm 4 presents the pseudocode for calculating the binary objectives based on the Pareto front of archives in the implementation of the MOREMC sampling method. Each objective can be minimized or maximized according to the values of Boolean variables maxX and maxY. In this procedure, in the first step (lines 1–6), all of the solutions x _{0},…,x _{ n−1} in the archives are the alternatives sorted in order of increasing/decreasing objective X, which can be minimized or maximized. Let pf ^{′}:= {x _{0},y _{0}} and i:=1, where {x _{0},y _{0}} denotes the combination containing the first nondominated front. In the second step (lines 8–17), for each combination in the archives {x _{ i },y _{ i }}∈{X,Y}, let pf ^{′}:= pf ^{′}∪{x _{ i },y _{ i }}, If {x _{ i },y _{ i }} is not dominated by any combination according to objective Y that has been be minimized or maximized already in pf ^{′}, then add {x _{ i },y _{ i }} to pf ^{′}. In the third step (lines 7–18), repeat from the step second until no more combinations can be added to pf ^{′}. In the last step, iteration stops when i=N, where N denotes the number of combinations in the archives.
In addition, in the middle of each iteration of the MOREMC sampling method, a set of conformations is provided instead of the last set of conformations using the select_MO−REMC_Replicas(E ^{′},P ^{′}) procedure, whereas the REMC sampling method uses select_REMC_Replicas(E ^{′},P ^{′}). The select_MO−REMC_Replicas function is obviously designed to select the conformations from the archived and the last “numR” minmin scenario Pareto optimal solutions set that are nondominated relative to the other conformations, as shown in Fig. 2 (left and lower panel), where in the last circle, the last “numR” sampling structures are used as replicas(red points) for exchange in the MOREMC method, and the minmin scenario Pareto optimal solutions set is denoted by yellow points (partial points are covered by red points in Fig. 2). These minmin scenario Pareto optimal solutions from the archives provide a natural and rapid convergence source, which is used to obtain alternative comparison sets from the archives. The pseudocode in Algorithm 5 describes the procedure for determining whether to accept or reject the Pareto front, as well as for deciding whether to select replicas for exchange or not.
HMOREMC sampling method
The pseudocode of our implemented method for selecting HMOREMC replicas is presented in Algorithm 6. We experimented using this variant of the MOREMC algorithm with 16 protein–small ligand docking cases, which differed only in terms of the procedure used for selecting elite solutions in the MOREMC sampling method. Updating of the replicas occurs in the MOREMC method, which ensures that it only contains nondominated solutions where both the objective MC steps and TScore can be minimized. Thus, the replicas for exchange cover a diverse range of individuals so the minmin scenario nondominated solutions assigned to replicas truly reflect the quality of the MOREMC sampling method. The MOREMC sampling method exclusively uses replicas from the archives where both the objective MC steps and TScore are minimized.
Similarly, in the HMOREMC sampling method, the replica selection method is based on the solutions in the archives where the nondominated solutions from both the objective MC steps and TScore are minimized, as well as the maximized objective MC steps and minimized objective TScore values. Figure 2 (right and lower panel) shows that lower energy nondominated solutions are used in minmin and maxmin scenarios Pareto optimal solutions as replicas(red points) for exchanging in the HMOREMC method. The minmin scenario Pareto optimal solutions set is denoted by yellow points and the maxmin scenario Pareto optimal solutions set by green points. Obviously, the replicas do not include all of the lower energy nondominated solutions in the MOREMC sampling method. Our MOREMC variant, the HMOREMC sampling method, uses hybrid nondominated solutions to select the solutions where both the objective MC steps and TScore are minimized, as well as the maximized objective MC steps and minimized objective TScore nondominated solutions. In particular, in each replica selection step, all the lower energy nondominated solutions in both the minmin and maxmin scenarios will be used preferentially as replicas for exchange. If the number of solutions is less than numR, which is the number of replicas used for exchanging, the nondominated solutions set is hybridized, where both the minmin and maxmin scenarios nondominated solutions are used iteratively to fill the replica set in order of the TScore value sequence. Replica selection in the MC, REMC, MOREMC, and HMOREMC sampling methods is illustrated in Fig. 2.
Implementation in Rosetta
All versions of our MC protein–ligand docking sampling methods were coded in C++ and compiled using g++ (GCC v4.4.7). Algorithm 1 presents the pseudocode to illustrate the details of our MC search procedure for a single replica with N times MC runs (N=numR×numC) and a predetermined number of temperatures (T=2.0). Algorithm 3, presents the pseudocode for the implementations of our REMC sampling methods. In order to demonstrate the effectiveness of the REMC algorithms, including REMC, MOREMC, and HMOREMC, and without prior knowledge of the problem instances, we fixed the parameter configuration in all of the experimental cases (numR,numC,repackNth,minT,maxT) : = (16,16,5,2,4), where numR is the number of replicas simulated, numC is the number of local circle steps in REMC search, repackNth is the number of iterative steps performed by a packRotamersMover() mover, and minT and maxT are the minimum and maximum temperature values, respectively. All versions of our REMC algorithms were run on 16 processors and they were parallelized.
Multiple independent trajectories were used to generate an ensemble of docking models near the native complex using the MC, REMC, MOREMC, and HMOREMC sampling methods. In all of the tests in this study, we performed 5000 docking trajectories (runs), 16×16×5000 MC steps, for each receptor–ligand pair in the predictive structures, which required 30–50 processorhours on a 1.9 GHz CPU and 2 GB memory per core Linux cluster. The results of these docking calculations were typically evaluated based on the “energy versus rmsd” plot where IFDelta scores were plotted versus Lrmsd values, and the effectiveness of each sampling method was judged according to the “funnellike” character of the plot. In this procedure, we first discarded any structures where the ligand was not touching the protein (scoring function item ligand_is_touching=0). Second, we took the top 5% of structures based on the total energy. Finally, we ranked the remaining decoys based on the RosettaLigand IFDelta between the protein and ligand. We obtained better results with these ranking scheme and parameters.
Results and discussion
Comparison of different sampling methods
In the procedure using different sampling algorithms, for each crystal structure target in the test data set, the ligand was extracted from the native complex and redocked into the binding pocket. The Lrmsd value was calculated between the predicted positions C ^{α} of the ligand and the ligand C ^{α} in the experimental crystal structure, and Lrmsd ≤2Å was used as the criterion for success. Using the classic MC sampling method, the protein included backbone translation and rotation as well as repacking of the sidechain of the receptor, and we only selected the lowest pose in terms of energy with the traditional RosettaLigand docking protocol. As shown in Fig. 3, for the 1K3U, and 1OWE targets, the MC sampling method could not produce better experimental binding poses for the ligand in these complexes compared with those reported previously [39] even after 1.28 ×10^{6} MC steps. For 1K3U, and 1OWE, the docking results did not satisfy the requirement in terms of Lrmsd ≤2Å, but they converged based on “IFDelta versus Lrmsd,” as shown by the “funnellike” character of the plot at the lower left. Successful predictions were made for the 1AQ1 and 2PRG targets using the MC sampling method, but the predictions were not sufficiently good for all of the target protein structures using the four sampling methods (see the docking results obtained using the REMC, MOREMC, and HMOREMC sampling methods in the figure).
The aim of REMC sampling methods is to increase the scope and depth of sampling by exchanging configurations between replicas characterized by slightly different temperature parameters. The REMC sampling method has been employed widely to enhance sampling methods by crossing energy barriers and accelerating the convergence of MC simulations. For a specific target, the MC sampling method may not be sufficient to cover some important regions of the conformational space that can be recognized by a number of ligands. However, enhanced sampling methods such as REMC, MOREMC, and HMOREMC can be used to generate a large number of receptor conformations for protein–ligand docking. Thus, in this study, in order to sample more of the receptor backbone and sidechain flexibility in each case, we tested 5000 decoys with each enhanced sampling method and only selected the lowest energy pose from these trajectories based on the IFDelta function as implemented in RosettaLigand [38, 39]. As shown in Fig. 3, the RosettaLigand protocol based on the REMC method obtained the lower energy pose (1OWE), faster convergence of the lower energy pose (2PRG), crosslocal energy minima (1K3U), and the binding poses of the alternative ligand for the first pose within 2Å Lrmsd. By contrast, for 2PRG, the MOREMC and HMOREMC sampling algorithms obtained nearly perfect results within 1Å Lrmsd as well as faster convergence for more of the predicted structures with the lowest IFDelta scores.
Comparison of different sampling scales
The evolution of sampling in terms of the IFDelta and Lrmsd scores with different sampling scales is shown for one representative target (2PRG) in Fig. 4. For 2PRG, the four sampling methods could progressively sample lower (more favorable) IFDelta values as the number of MC steps increased from 2.56 ×10^{5} to 1.28 ×10^{6}. However, the enhanced sampling methods obtained faster convergence in terms of IFDelta, as well as the HMOREMC method compared with the MOREMC method for Lrmsd <=2Å. The MC sampling method successfully sampled solutions with Lrmsd <=2Å after 1.28 ×10^{6} steps, whereas the REMC, MOREMC, and HMOREMC sampling methods could reach nearnative solutions, particularly the MOREMC method, which obtained Lrmsd <1Å solutions after only 7.68 ×10^{5} MC steps. In terms of the IFDelta scores, after 1.28 ×10^{6} MC steps, the MC sampling algorithm successfully sampled nearnative solutions with Lrmsd of 1.42Å and the IFDelta score value was –18.8. By contrast, after only 2.56 ×10^{5} MC steps, the REMC, MOREMC, and HMOREMC methods obtained Lrmsd scores within 1.20Å, 1.14Å, and 1.33Å, respectively, and the IFDelta scores were –18.4, –18.9, and –17.2, respectively. Furthermore, after 1.28 ×10^{6} MC steps, the three enhanced sampling algorithms sampled nearnative solutions with Lrmsd scores of 1.20Å, 0.79Å, and 0.69Å, respectively. In addition, the IFDelta scores converged around –18.6 ±0.3. Similar trends were also observed in all the other test cases.
Summary of the docking results obtained using different sampling methods and scales
In general, better docking results are achieved by sampling lower docking score value conformations. So, the first parameter that we evaluated was the global performance of the docking results in terms of the IFDelta score. For all 16 cases, the evolution in terms of IFDelta using different sampling scales in the four sampling methods is shown in Fig. 5. As shown by the histogram of IFDelta values for the 16 individual targets, the four sampling methods could sample nearnative docking solutions with more negative IFDelta scores at three sampling scales in 2.56 ×10^{5}, 7.68 ×10^{5}, and 1.28 ×10^{6} MC steps. However, using the same number of MC steps (2.56 ×10^{5}, 7.68 ×10^{5}, or 1.28 ×10^{6}), the enhanced sampling methods could sample solutions with lower IFDelta scores than the classic MC sampling method. The MOREMC, and HMOREMC enhanced sampling methods obtained better docking results in 9/16 cases (1AQ1, 1DBJ, 1JJE, 1TOW, 1V48, 1Y6B, 2DBL, 2PRG, and 7CPA) with lower final IFDelta scores compared with the standard MC and REMC sampling methods after 1.28 ×10^{6} MC steps. It was interesting that the MOREMC sampling method obtained better docking results in 7/16 cases, with lower IFDelta scores compared with the HMOREMC sampling method. However, in 3/16 cases (1OWE, 1PQ6, and 4TIM), the REMC method obtained configurations, which were closer to the lower binding energy form compared with the MOREMC methods. By contrast, the MC sampling algorithm succeeded also in the cases of 1JD0, 1K3U, 1W2G, and 6TIM after 1.28 ×10^{6} MC steps. The results based on the 16 test cases indicate that the MOREMC and HMOREMC enhanced sampling methods performed better than the MC and REMC sampling methods. The results also showed that the IFDelta values could vary dramatically for different targets and sampling methods, whereas the IFDelta scores obtained for the same target with the REMC, MOREMC, and HMOREMC enhanced sampling methods varied only slightly. For example, for 1PQ6, the MC method achieved an IFDelta value of around –28.8, whereas the REMC, MOREMC, and HMOREMC sampling methods obtained a binding energy value of around −29.8±0.2. In addition, for 1DBJ, the four sampling methods achieved similar IFDelta scores of around 13.2±0.2 after 1.28 ×10^{6} MC steps.
The second parameter that we analyzed was the overall performance of the docking results in terms of the Lrmsd value. An overview of the Lrmsd values obtained for the individual targets is shown in Fig. 6 at three sampling scales of 2.56 ×10^{5}, 7.68 ×10^{5}, or 1.28 ×10^{6} MC steps. For each target, the Lrmsd values are presented after docking the ligand into alternative receptor structures using the MC, REMC, MOREMC, and HMOREMC sampling methods. For each of the 16 targets, the bars from left to right correspond to the results for the protein based on the MC, REMC, MOREMC, and HMOREMC sampling methods, respectively. For 1OWE, and 1W2G, the MC sampling method achieved better (slightly) solutions within 2Å Lrmsd compared with the enhanced sampling methods after 2.56 ×10^{5} MC steps. However, for 14/16 cases, excluding 1JD0, and 1TOW, using the enhanced sampling methods achieved better minimum Lrmsd values for docking with the protein than the MC sampling method after 1.28 ×10^{6} MC steps. In particular, for 1W2G, and 2PRG, the MOREMC enhanced sampling method obtained Lrmsd values that were close to the perfect results within 1Å Lrmsd. These results have never been obtained before using MC sampling methods, and we showed that the MC sampling method could not obtain satisfactory samples of complicated protein flexibility after 1.28 ×10^{6} MC steps. Finally, the Lrmsd values for individual protein structures could vary dramatically using different sampling methods and they changed greatly after 1.28 ×10^{6} MC steps, thereby suggesting that in structurebased protein–ligand docking experiments, different sampling methods can significantly affect the docking results in terms of both depth and breadth. For example, for 2PRG, the best performing target obtained using the MC sampling method only achieved an Lrmsd value of around 8.29Å after 7.68 ×10^{5} MC steps, whereas the REMC enhanced sampling method gave an Lrmsd value within 1.20Å, but the MOREMC and HMOREMC sampling methods obtained the best docking results within 0.79Å and 1.76Å Lrmsd, respectively.
Convergence with different sampling methods
Next, we briefly discuss how different sampling methods can affect the rate of convergence. Firstly, in order to demonstrate that the MOREMC and HMOREMC sampling methods proposed here provide an efficient sampling technique in temperature space, we calculate the probability of finding each replica at different temperatures. For the RosettaLigand docking protocol, the probability value with energy score in heat capacity and temperature T is described by Eq. (4), but no exponential calculation. In Fig. 7, for target 2PRG, we show that using the MC, REMC, MOREMC, and HMOREMC sampling methods, the probability of finding each replica at different temperatures progressively flattened over numR=16 replicas simulated through numC=16 local circles(numR×numC MC steps). On each subfigure, the red circle points correspond to the probability average values of numR=16 replicas simulated through numC=16 local circles. The results obtained by the MC sampling method show that after numR×numC MC steps, the probability average values converged slowly to 1.56 with a wider fluctuation variance value of 8.78. However, using the enhanced sampling methods, REMC, MOREMC, and HMOREMC, the results show that the probability average values converged faster to 0.78, 0.73, and 0.99, with a narrow margin fluctuation variance value of 0.22, 0.18, and 0.07, respectively. Especially, for the HMOREMC sampling method, the probability values of finding each replica at different temperatures show a fairly flat probability distribution. The probability results show that a strong temperature dependence of energy for complex protein–ligand docking systems.
Secondly, in Fig. 8, for the 2PRG and 4TIM targets, we show how the estimated TScore and IFDelta scores obtained using the MC, REMC, MOREMC, and HMOREMC sampling methods converged over a simulation of 1.28 ×10^{6} MC steps. For comparison, we also show the same Lrmsd values calculated after 1.28 ×10^{6} MC steps. For 2PRG, the results obtained by the enhanced sampling methods are shown that after 7.68 ×10^{5} MC steps, which demonstrate that the IFDelta score converged almost exactly to –18.2, with small fluctuations in the order of ≈0.8. However, using the classic MC sampling method, almost all of the 7.68 ×10^{5} MC steps were required to obtain a converged result with an IFDelta value in the order of –17.3, as shown in Fig. 8 (left and middle panels). After 1.28 ×10^{6} MC steps, however, four sampling methods could obtain better convergence in terms of IFDelta, TScore, and Lrmsd. This represents an improvement in the sampling efficiency by one order of magnitude and it is very likely that this could be improved further, such as by incorporating information from the Hamiltonian. One of the most important tests of convergence for a protein–ligand interaction when sampling a complex transformation is the sensitivity of the results to different sampling methods. Thus, to exclude any dependence on different sampling methods, we also calculated the Lrmsd values for the four sampling methods after 1.28 ×10^{6} MC steps and found that the estimated Lrmsd values agreed very well with the results based on the IFDelta values. These results are presented in Fig. 8 (left and lower panel). To facilitate a comparison with other targets, we also performed sampling for 4TIM using the four sampling methods, as shown in Fig. 8 (right panels), which clearly demonstrate that running 1.28 ×10^{6} MC steps for 4TIM was sufficient to obtain a converged estimate of the IFDelta score. In particular, using the MOREMC sampling method, the estimate of the IFDelta score converged rapidly. However, the MC sampling method might obtain better convergence in terms of IFDelta and TScore as well as Lrmsd, but the rate of convergence was slower. The REMC sampling method achieved better convergence after 1.28 ×10^{6} MC steps, but the results indicated that the convergence rate was slower than that using the MOREMC and HMOREMC sampling methods in terms of speed and depth. In addition, the HMOREMC sampling method performed slightly better than the MOREMC sampling method in 9/16 cases after 1.28 ×10^{6} MC steps, as shown in Fig. 6 (lower panel).
Finally, we present further evidence that the MOREMC and HMOREMC sampling methods are effective sampling techniques in temperature space. For 2PRG, we conduct more than one simulation run using different initial parameters((numR,numC,repackNth,minT,maxT) : = (8, 8, 3, 2, 4), (8, 8, 3, 2, 6), (8, 16, 5, 2, 4), (8, 16, 5, 2, 6), (16, 8, 3, 2, 4), (16, 8, 3, 2, 6), (16, 16, 5, 2, 4), and (16, 16, 5, 2, 6), respectively.). In this way, the summaries of IFDelta and Lrmsd through creating different trajectories(decoys) of configurations for each initial parameter are compared in Fig. 9. We show that using different initial parameters, the new methods (including MOREMC and HMOREMC) proposed in this research can efficiently converge to possible nearnative solutions. In addition, we also compare the necessary number of sampling decoys to reach convergence in simulation runs using different initial parameters. The results show that when using different initial parameters, better nearnative conformations can be achieved after sampling 5 ×10^{3} decoys(numR×numC×N MC steps, the N is the number of decoys). However, for different initial parameters, the necessary number of sampling decoys may be conspicuously different when using different sampling methods.
Conclusions
In this study, we developed REMC sampling methods based on multiobjective optimization for predicting conformations in protein–small ligand docking with RosettaLigand. We used temperature replica exchange to enhance conformational sampling between Pareto optimal solutions, and the concept of nondominated solutions was applied to solve the replica selection problem in our REMC enhanced sampling methods. In contrast to most other MC and REMC methods, the MOREMC method selects nondominated solutions, which depend on archived solutions measured in terms of the objective MC steps and TScore values, in order to find a set of similar replicas with lower energy conformations but that are also as diverse as possible. The MOREMC and HMOREMC methods achieve better integration of the REMC sampling method and critical conformation structures of the current sampling state. Using a benchmark data set of 16 proteinligand test cases with different chain lengths in terms of amino acids, we assessed various comparison measures, i.e., TScore, IFDelta, and Lrmsd. We also considered the funnellike character of the energy landscape, the probability of finding each replica at different temperatures, and the rate of convergence in the TScore, IFDelta, and Lrmsd scores.
For the targets tested in our benchmark data set, we found that the ligand pose was correctly positioned within 2Å Lrmsd for 11/16 of these targets using the HMOREMC sampling method after 1.28 ×10^{6} MC steps. The performance of the proposed MOREMC sampling method was better than that of the MC and REMC methods in most cases, whereas MC generally performed better but converged slowly. The MOREMC sampling method achieved significantly faster convergence of the lower energy poses and identified more correct docked complexes with nearnative decoys than the MC and REMC sampling methods. Moreover, the results showed that HMOREMC obtained faster convergence and more distinct solutions than MOREMC in each run for most of the targets. The MOREMC and HMOREMC methods required the same or slightly more time than MC and REMC for the same number of sampling steps. Moreover, for the 1DBJ, 1JD0, 1K3U, 1PQ6, 1Y6B, 2PRG, 4TIM, 6TIM, and 7CPA targets, the performance of HMOREMC was much better than that of MOREMC. The HMOREMC sampling method captured much of the possible variation in the conformations for most of the test cases, and it also sampled lower binding energy conformations within an Lrmsd of 2Å for the conformations of these targets. The HMOREMC hybridizes two scenario combinations for the Pareto optimal solutions with the rankingbased MOREMC method and it worked well for many targets. An interesting feature of the MOREMC method compared with other REMC algorithms is that many nondominated solutions are chosen as the current replicas for exchange. Thus, sampling at a lower energy is a much more greedy process, which leads to better protein–ligand conformational sampling performance. Clearly, this feature can also be incorporated in addition to the concept of Pareto front solutions in other ensemblebased sampling methods in order to improve their performance. In addition, experimental results showed that faster convergence to the global optimal solution does not necessarily provide an efficient algorithm for enhancing conformational sampling of the phase space. Use of temperature replica exchange to enhance conformational sampling between nondominated solutions can also provide good convergence of the available conformational space including available nearnative structures.
In the future, the proposed MOREMC method may be extended in several ways. Even though detailed balance is not satisfied in the MOREMC and HMOREMC sampling methods, some balance condition may still be efficient if it is proved that it provides a good sampling method. We can still generate an algorithm that may satisfy the balance criteria, for example, instead of selecting the conformation ensemble of Pareto optimal solutions, the configurations to be swapped from the history archives can be randomly selected instead. Moreover, in order to obtain performance improvements, several enhanced sampling techniques, including Hamiltonian replica exchange and welltempered ensemble approaches, or even a dynamic temperature selection strategy, can be incorporated in the MOREMC and HMOREMC methods.
Abbreviations
 HMOREMC:

Hybrid MOREMC
 IFDelta:

Binding energy interface delta
 Lrmsd:

Ligand of RMSD
 MC:

Monte Carlo
 MOREMC:

Multiobjective optimizationREMC
 MOP:

Multiobjective optimization problem
 REMC:

Replica exchange Monte Carlo
 TScore:

The RosettaLigand energy function total score
References
 1
Maximova T, Moffatt R, Ma BY, Nussinov R, Shehu A. Principles and overview of sampling methods for modeling macromolecular structure and dynamics. Plos Comput Biol. 2016; 12(4):e1004619. doi:10.1371/journal.pcbi.1004619.
 2
Dror RO, Dirks RM, Grossman JP, Xu HF, Shaw DE. Biomolecular simulation: A computational microscope for molecular biology. Annu Rev Biophys, Vol 41. 2012; 41:429–52. doi:10.1146/annurevbiophys042910155245.
 3
Ewing TJA, Makino S, Skillman AG, Kuntz ID. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J Comput Aid Mol Des. 2001; 15(5):411–28. doi:10.1023/A:1011115820450.
 4
Kramer B, Rarey M, Lengauer T. Evaluation of the FLEXX incremental construction algorithm for proteinligand docking. Proteins. 1999; 37(2):228–41. doi:10.1002/(SICI)109701341999110137:2%3C228::AIDPROT8%3E3.0.CO;28.
 5
Wagener M, de Vlieg J, Nabuurs SB. Flexible proteinligand docking using the Fleksy protocol. J Comput Chem. 2012; 33(12):1215–7. doi:10.1002/jcc.22948.
 6
Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD. Improved proteinligand docking using GOLD. ProteinsStructure Funct Genet. 2003; 52(4):609–23. doi:10.1002/prot.10465.
 7
Verdonk ML, Chessari G, Cole JC, Hartshorn MJ, Murray CW, Nissink JWM, Taylor RD, Taylor R. Modeling water molecules in proteinligand docking using GOLD. J Med Chem. 2005; 48(20):6504–15. doi:10.1021/jm050543p.
 8
Goodsell DS, Morris GM, Olson AJ. Automated docking of flexible ligands: applications of AutoDock. J Mol Recognit. 1996; 9(1):1–5. doi:10.1002/(SICI)109913521996019:1%3C1::AIDJMR241%3E3.0.CO;26.
 9
Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, Olson AJ. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem. 2009; 30(16):2785–91. doi:10.1002/jcc.21256.
 10
Trott O, Olson AJ. Software news and update AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010; 31(2):455–61. doi:10.1002/jcc.21334.
 11
Vass M, Tarcsay A, Keseru GM. Multiple ligand docking by Glide: implications for virtual secondsite screening. J Comput Aid Mol Des. 2012; 26(7):821–34. doi:10.1007/s1082201295786.
 12
Grosdidier A, Zoete V, Michielin O. SwissDock, a proteinsmall molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011; 39:W270–W277. doi:10.1093/nar/gkr366.
 13
Rao L, Chi B, Ren YL, Li YJ, Xu X, Wan J. DOX: A new computational protocol for accurate prediction of the proteinligand binding structures. J Comput Chem. 2016; 37(3):336–44. doi:10.1002/jcc.24217.
 14
Huang SY, Li M, Wang JX, Pan Y. HybridDock: A hybrid proteinligand docking protocol integrating protein and ligandbased approaches. J Chem Inf Model. 2016; 56(6):1078–87. doi:10.1021/acs.jcim.5b00275.
 15
Pan LL, Zheng Z, Wang T, Merz KM. Free energybased conformational search algorithm using the movable type sampling method. J Chem Theory Comput. 2015; 11(12):5853–64. doi:10.1021/acs.jctc.5b00930.
 16
Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science. 1983; 220(4598):671–80. doi:10.1126/science.220.4598.671.
 17
Goldberg D. Genetic algorithms in search, optimization and machine learning. New York: AddisonWesley Publishing Company, Inc.; 1989.
 18
Luitz M, Bomblies R, Ostermeir K, Zacharias M. Exploring biomolecular dynamics and interactions using advanced sampling methods. J PhysCondens Mat. 2015; 27(32):323101. doi:10.1088/09538984/27/32/323101.
 19
Valsson O, Parrinello M. Variational approach to enhanced sampling and free energy calculations. Phys Rev Lett. 2014; 113(9):090601. doi:10.1103/Physrevlett.113.090601.
 20
Bernardi RC, Melo MC, Schulten K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim Biophys Acta. 2015; 1850(5):872–7. doi:10.1016/j.bbagen.2014.10.019.
 21
Swendsen RH, Wang JS. Replica Monte Carlo simulation of spinglasses. Phys Rev Lett. 1986; 57(21):2607–9.
 22
Geyer CJ. Markov chain Monte Carlo maximum likelihood. In: Computing Science and Statistics, Proceedings of the 23rd Symposium on the Interface. Fairfax Station: Interface Foundation of North America: 1991. p. 156–63.
 23
Earl DJ, Deem MW. Parallel tempering: Theory, applications, and new perspectives. Phys Chem Chem Phys. 2005; 7(23):3910–6. doi:10.1039/b509983h.
 24
Zhang Z, Lange OF. Replica exchange improves sampling in lowresolution docking stage of RosettaDock. Plos One. 2013; 8(8):e72096. doi:10.1371/journal.pone.0072096.
 25
Sambridge M. A parallel tempering algorithm for probabilistic sampling and multimodal optimization. Geophys J Int. 2014; 196(1):357–74. doi:10.1093/gji/ggt342.
 26
Russo A, Scognamiglio PL, Enriquez RPH, Santambrogio C, Grandori R, Marasco D, Giordano A, Scoles G, Fortuna S. In silico generation of peptides by replica exchange Monte Carlo: dockingbased optimization of Maltosebindingprotein ligands. Plos One. 2015; 10(8):e0133571. doi:10.1371/journal.pone.0133571.
 27
Fukunishi H, Watanabe O, Takada S. On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: Application to protein structure prediction. J Chem Phys. 2002; 116(20):9058–67. doi:10.1063/1.1472510.
 28
Luitz MP, Zacharias M. Proteinligand docking using Hamiltonian replica exchange simulations with soft core potentials. J Chem Inf Model. 2014; 54(6):1669–75. doi:10.1021/ci500296f.
 29
Ostermeir K, Zacharias M. Hamiltonian replicaexchange simulations with adaptive biasing of peptide backbone and side chain dihedral angles. J Comput Chem. 2014; 35(2):150–8. doi:10.1002/jcc.23476.
 30
Barducci A, Bussi G, Parrinello M. Welltempered metadynamics: A smoothly converging and tunable freeenergy method. Phys Rev Lett. 2008; 100(2):020603. doi:10.1103/Physrevlett.100.020603.
 31
Bonomi M, Parrinello M. Enhanced sampling in the welltempered ensemble. Phys Rev Lett. 2010; 104(19):190601. doi:10.1103/Physrevlett.104.190601.
 32
Valsson O, Parrinello M. Welltempered variational approach to enhanced sampling. J Chem Theory Comput. 2015; 11(5):1996–2002. doi:10.1021/acs.jctc.5b00076.
 33
Zhang Z, Schindler CEM, Lange OF, Zacharias M. Application of enhanced sampling Monte Carlo methods for highresolution proteinprotein docking in Rosetta. Plos One. 2015; 10(6):e0125941. doi:10.1371/journal.pone.0125941.
 34
Li BD, Li JL, Tang K, Yao X. Manyobjective evolutionary algorithms: A survey. Acm Comput Surv. 2015; 48(1):13. doi:10.1145/2792984.
 35
von Lücken C, Barán B, Brizuela C. A survey on multiobjective evolutionary algorithms for manyobjective problems. Comput Optim Appl. 2014; 58(3):707–56. doi:10.1007/s1058901496441.
 36
Deb K. Multiobjective optimization In: Burke KE, Kendall G, editors. Search Methodologies: Introductory Tutorials in Optimization and Decision. Support Techniques. Boston: Springer: 2014. p. 403–49.
 37
Deb K, Kalyanmoy D. Multiobjective optimization using evolutionary algorithms. Chichester: John Wiley & Sons, Inc; 2001. pp. 389–400.
 38
Meiler J, Baker D. ROSETTALIGAND: Proteinsmall molecule docking with full sidechain flexibility. Proteins. 2006; 65(3):538–48. doi:10.1002/prot.21086.
 39
Davis IW, Baker D. ROSETTALIGAND docking with full ligand and receptor flexibility. J Mol Biol. 2009; 385(2):381–92. doi:10.1016/j.jmb.2008.11.010.
 40
Hawkins PCD, Skillman AG, Warren GL, Ellingson BA, Stahl MT. Conformer generation with OMEGA: Algorithm and validation using high quality structures from the protein databank and Cambridge structural database. J Chem Inf Model. 2010; 50(4):572–84. doi:10.1021/ci100031x.
 41
Hawkins PCD, Nicholls A. Conformer generation with OMEGA: Learning from the data set and the analysis of failures. J Chem Inf Model. 2012; 52(11):2919–36. doi:10.1021/ci300314k.
 42
Gray JJ, Moughon S, Wang C, SchuelerFurman O, Kuhlman B, Rohl CA, Baker D. Proteinprotein docking with simultaneous optimization of rigidbody displacement and sidechain conformations. J Mol Biol. 2003; 331(1):281–99. doi:10.1016/S0022283603006703.
 43
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953; 21(6):1087–92. doi:10.1063/1.1699114.
 44
Daan F, Berend S. Understanding molecular simulation from algorithms to applications. San Diego: Academic Press; 2002, pp. 111–38.
 45
Thachuk C, Shmygelska A, Hoos HH. A replica exchange Monte Carlo algorithm for protein folding in the HP model. BMC Bioinforma. 2007; 8:342. doi:10.1186/147121058342.
Acknowledgements
The authors thank W. Yang and C.Y. Cao for advice, valuable ideas, and comments. The authors thank Y. Su for help with preparing the paper and H.Zhou for research support. The authors would like to thank Soochow University for support to complete this study.
Funding
Not applicable.
Availability of data and materials
The datasets generated and/or analyzed in the current study are available in the RCSB PDB repository, http://www.rcsb.org.
Authors’ contributions
HRW conceived the study. HRW and HWL developed the MOREMC approach and wrote the statistical software. LXC was involved with data analysis. CXW ran analyses and prepared the text, data, and figures. HRW and QL drafted the manuscript. All of the authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author information
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Monte Carlo
 Enhanced sampling method
 Multiobjective optimization
 Protein–small molecule docking
 Complex structure prediction