Improved packing of protein side chains with parallel ant colonies
BMC Bioinformatics volume 15, Article number: S5 (2014)
Abstract
Introduction
The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design and ligand docking. Many existing solutions are formulated as computational optimisation problems. Besides the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search finds the lowest energy, there is no guarantee of obtaining protein structures with correct side chains.
Methods
We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers by rotamer minimisation to construct subrotamers, which reduces the effect of the discreteness of the rotamer library.
Results
We focused on improving the accuracy of side-chain conformation prediction. For a test set of 442 proteins, 87.19% of $\chi_1$ and 77.11% of $\chi_{1+2}$ angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with the state-of-the-art methods CISRR and SCWRL4, and analysed the results from two perspectives: whole protein chains and individual residues. In this comprehensive benchmark, for 51.5% of the proteins of up to 400 amino acids, pacoPacker's predictions were better than those of both CISRR and SCWRL4. Finally, we showed the advantage of the subrotamer strategy. These results indicate that our parallel approach is competitive with state-of-the-art solutions for packing side chains.
Conclusions
This parallel approach combines various sources of search intelligence and energy functions to pack protein side chains. It provides a framework, built on parallel heuristic search algorithms, for combining different objective functions that share the inaccuracy/usefulness property.
Introduction
The accurate packing of side chains plays a very important role in modelling protein structures. In ab initio structure prediction, the goal is to choose a rotamer for each position so that the molecule is close to the native structure. In homology modelling, the goal is to predict the structure of a protein that is homologous to another protein of known structure [1, 2]. In protein design, the goal is to find an amino acid sequence that will fold into a particular backbone [3]. In flexible ligand docking, the goal is to model structural changes ranging from large movements of entire domains to small side-chain rearrangements in the binding site [4–6]. Based on Anfinsen's hypothesis [7], side-chain packing is usually formulated as a combinatorial optimisation problem, which can be solved in a number of ways. Whatever the solver, this widely studied formulation always rests on three foundations: a fixed backbone, an energy function and a set of candidate rotamers. Existing algorithms for the side-chain problem can be divided into two categories: heuristic and deterministic.
The side-chain packing problem has been proven to be non-deterministic polynomial-time hard (NP-hard) [8–10]. Even an approximate solution within a factor of O(cnR) of the optimum, where c is a constant, n is the number of residues and R is the average number of rotamers per residue, cannot be found efficiently in general [11, 12]. Computational complexity analysis suggests that any global optimisation algorithm for this problem may, in the worst case, run in exponential time [11]. Dead-end elimination (DEE) algorithms [13, 14] are designed to find the global minimum energy conformation when they converge. Heuristics are not guaranteed to find a global minimum, but they almost always find a low-energy conformation in a reasonable time [15]. Therefore, heuristic algorithms are a natural choice for tackling the side-chain modelling problem. Traditionally, heuristic approaches solve the side-chain problem as a single-objective optimisation problem (SOP), using Monte Carlo (MC) [16], Ant Colony (AC) [17] or Simulated Annealing (SA) [18]. Some methods combine multiple strategies, such as DEE with the A* algorithm [19], or SA with MC [20–22]. The common feature of these approaches is that they all optimise a single objective function.
Another approach to the side-chain problem decomposes the underlying graph of residue interactions. One such method is SCWRL [23–25, 15], which is widely used because of its speed, accuracy and ease of use. SCWRL3 decomposes the residue graph into biconnected components, i.e. connected subgraphs that cannot be disconnected by removing a single vertex, and finds the global minimum energy conformation for the residues in these subgraphs [25]. The SCWRL authors also observed that residues with a single rotamer or a single neighbour can be eliminated from the residue graph. SCWRL4 [15] then converts the residue graph into a tree to speed up the solver. However, in the tightly packed interior of proteins, these methods tend to produce atomic clashes, which limit prediction accuracy. To address this, CISRR performs clash-detection-guided iterative searches (CIS) of side-chain rotamers while continuously optimising side-chain conformations with a conjugate gradient method [26].
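The decomposition step attributed to SCWRL3 above can be illustrated with the textbook Hopcroft-Tarjan algorithm for biconnected components. This is a minimal sketch, assuming the residue interaction graph is given as an edge list of residue indices; it is not SCWRL3's implementation, and the function name `biconnected_components` is our own.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def biconnected_components(edges: Iterable[Edge]) -> List[Set[Hashable]]:
    """Split an undirected residue graph into biconnected components (vertex sets)."""
    graph: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc: Dict[Hashable, int] = {}  # DFS discovery time of each residue
    low: Dict[Hashable, int] = {}   # earliest discovery time reachable from the subtree
    edge_stack: List[Edge] = []
    components: List[Set[Hashable]] = []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # next unused discovery time
        for v in graph[u]:
            if v == parent:
                continue
            if v not in disc:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: close one component
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(component)
            elif disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return components
```

For a graph made of two triangles joined through the edge 3-4, the function returns {1, 2, 3}, {3, 4} and {4, 5, 6}; residues 3 and 4 are articulation points shared between components, which is where per-component solutions have to agree.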
In general, methods for predicting side chains seem to be limited not only by the quality of their search algorithms but also by the quality of the energy functions they employ [23]. An energy function typically consists of a weighted combination of energy terms, and different approaches develop distinctive energy functions. For example, SCWRL3 uses an energy function based on the log-probabilities of rotamers and a simple repulsive steric energy term [25]. SCWRL4 instead uses a short-range, soft van der Waals interaction potential between atoms rather than the linear, repulsive-only function used in SCWRL3, as well as an anisotropic hydrogen-bond function similar to that used in Rosetta [15, 27]. The energy function of CISRR is also a modified version of the SCWRL3 energy function, with two improvements: it adds an attractive component and weights to the van der Waals potential, and it modifies the original rotamer term to penalise side-chain dihedral angles that drift away from the nearest rotamer library values. The existence of so many different energy functions implies that all of them are inaccurate in a universal sense (inaccuracy), although each is very useful in some specific sense (usefulness). This hypothesis is referred to as the inaccuracy/usefulness property [28]. Approaches based on an SOP all model side chains with a single, inaccurate energy function, so their results are sometimes quantitatively unreliable for discriminating native or near-native conformations.
In this study, we propose a novel approach that combines the usefulness and reduces the inaccuracy of different energy functions. We believe that it is more reasonable to model side-chain packing as a multi-objective optimisation problem (MOP), in which different energy functions are combined to the best possible extent. Because this idea has been successfully applied to de novo prediction of protein backbones [28, 29], we also used parallel ant colony optimisation based on SHOP (SHaring One Pheromone matrix) [30]. Our parallel strategy is not intended to speed up the predictor; instead, it hybridises the usefulness of different energy functions, each of which is adopted by an individual colony. In this way, we avoid sensitivity to the optimised parameters of any single energy function, so we expect better generality from our predictor. This parallel strategy was validated experimentally.
Methods
We propose a novel parallel ant colony optimisation (ACO) metaheuristic framework for packing protein side chains using single-heuristic multi-objective algorithms (SHMO) to reduce the inaccuracy of any single energy function. We denote a heuristic algorithm by h and the different energy functions by $\varepsilon = \{E_1, \dots, E_k\}$, where k is the number of threads. This type of algorithm is generally denoted by $\prod_{h}(E_i \mid \Theta)$, where $\Theta$ refers to the control parameters of the heuristic search algorithm, which are usually tuned empirically before starting or adaptively during the run [28]. In the pacoPacker algorithm, h is ACO, and $\Theta$ contains two kinds of variables, private and public. Specifically, all ant colonies share one common pheromone matrix T as a public variable, and each ant colony i has private variables: a heuristic matrix $H_i$ and two parameters, $\alpha_i$ and $\beta_i$. $A = \{\alpha_1, \dots, \alpha_k\}$ determines the importance of the pheromone, and $B = \{\beta_1, \dots, \beta_k\}$ determines the importance of the heuristic matrices $H = \{H_1, \dots, H_k\}$. Our method can therefore be written as $\prod_{AC}(E_i \mid \alpha_i, \beta_i, H_i, T)$. The Rosetta3.4 platform [31] is mature and supports the object-oriented paradigm, so pacoPacker uses Rosetta3.4 for building rotamer libraries, constructing interaction graphs and scoring structures. With Rosetta3.4 and OpenMP [32], our scheme is easy to implement.
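To make the split between public and private variables concrete, the sketch below models $\prod_{AC}(E_i \mid \alpha_i, \beta_i, H_i, T)$ as plain Python data. It is an illustration under assumptions, not pacoPacker's data layout: the class names, the list-of-lists matrices and the uniform initial pheromone value `tau0` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Conformation = Sequence[int]  # rotamer index chosen for each residue


@dataclass
class SharedState:
    """Public variable: one pheromone matrix T shared by every colony."""
    pheromone: List[List[float]]  # T[i][j] for residue i, rotamer j


@dataclass
class ColonyParams:
    """Private variables of one colony: its energy E_i, alpha_i, beta_i and H_i."""
    energy: Callable[[Conformation], float]  # E_i, this colony's objective
    alpha: float                             # importance of the pheromone
    beta: float                              # importance of the heuristic
    heuristic: List[List[float]]             # H_i[i][j]


def make_shared_state(rotamers_per_residue: Sequence[int], tau0: float = 1.0) -> SharedState:
    """Initialise T with a uniform value tau0 (an assumed starting value)."""
    return SharedState([[tau0] * m for m in rotamers_per_residue])
```

Colonies built this way can differ in every private field while all holding a reference to the same `SharedState`, so a pheromone deposit made by one colony is visible to the others.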
Search space
For an amino acid sequence t of n residues, the side chains are packed so as to minimise the free energy. Let the rotamer library for t be $R = \{R_1, \dots, R_n\}$, where $R_i = \{r_1, \dots, r_{m_i}\}$ is the rotamer set for residue i of t and $m_i$ is the number of rotamers in $R_i$; different rotamer sets generally contain different numbers of rotamers. Rotamers were read from the Dunbrack backbone-dependent rotamer library (2010 version), in which rotamer frequencies and dihedral angles vary with the backbone dihedral angles $\phi$ and $\psi$ [33].
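Because a conformation picks exactly one rotamer from each $R_i$, the search space defined above contains $\prod_{i=1}^{n} m_i$ conformations. The helpers below simply evaluate that product (and its base-10 logarithm for long chains); the rotamer counts used with them are made up for illustration.

```python
import math
from typing import Sequence


def search_space_size(rotamers_per_residue: Sequence[int]) -> int:
    """Number of distinct conformations: the product of m_i over all residues."""
    return math.prod(rotamers_per_residue)


def log10_search_space(rotamers_per_residue: Sequence[int]) -> float:
    """log10 of the search-space size, which stays readable for long chains."""
    return sum(math.log10(m) for m in rotamers_per_residue)
```

For example, 100 residues with 10 rotamers each give $10^{100}$ conformations, so exhaustive enumeration is out of reach even for modest proteins.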
Energy function
We adopted the energy functions used by Rosetta. These scores are weighted combinations of different energy terms, such as residue-environment and residue-residue interactions, secondary structure packing, chain density and excluded volume [28]. Which function is more accurate does not matter, because all the energy functions share the inaccuracy/usefulness property; the Rosetta energy functions are adopted here to illustrate the implementation of our parallel approach. We forked eight threads that run separately using different energy functions, from which any side-chain-independent energy terms were excluded. Different threads have different private variables, which are listed in Table 1. Table 1 shows the weight of each score term in the different score functions; each score term is represented by a letter (A, B, etc.) that corresponds to an entry in Table 2.
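Each thread scores conformations with its own weighted sum of score terms, $E_g = \sum_t w_{g,t}\, e_t$. The sketch below assumes the per-term values $e_t$ of a conformation have already been computed (in pacoPacker this is Rosetta's job) and uses made-up weights; the letter keys only mimic the A, B, ... labelling of Table 1 and do not reproduce its values.

```python
from typing import Dict, List

TermValues = Dict[str, float]  # hypothetical per-term energies of one conformation


def weighted_energy(terms: TermValues, weights: Dict[str, float]) -> float:
    """E_g = sum over score terms t of w_{g,t} * e_t; terms without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


def energies_per_thread(terms: TermValues, thread_weights: List[Dict[str, float]]) -> List[float]:
    """Score the same conformation with every thread's score function."""
    return [weighted_energy(terms, w) for w in thread_weights]
```

Dropping side-chain-independent terms is harmless for ranking: with the backbone fixed, such terms add the same constant to every conformation scored by a thread, so they cannot change which conformation that thread considers lowest in energy.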
Implementation of the algorithm
Eight parallel threads were created in our SHMO implementation. Figure 1 depicts the design of pacoPacker. pacoPacker takes a protein backbone as input and generates the rotamer library for the target sequence using the Rosetta platform. The outputs are protein structures with side chains predicted by the ant colonies. As Figure 1 shows, eight different ant colonies share a single common pheromone matrix T to exchange their search experience asynchronously. Each colony is directed by its own energy function, and all colonies coevolve towards a better state.
Next, we focus on how a single ant colony packs side chains. Each colony proceeds as follows:

1. Construct side chains for each ant using the selection equation.

2. Perform a local search on each ant in odd-numbered iterations.

3. Replace the global-best ant $s_{gb}$ with the iteration-best ant $s_{ib}$ if $E(s_{ib})$ is lower.

4. Update the pheromone matrix T based on $s_{gb}$.

5. If the termination criterion is met, return $s_{gb}$; otherwise, repeat steps 1 to 5.
In this workflow, each colony terminates when either of the following criteria is met: the colony has run for a specified number of iterations, or there has been no energy improvement during the last several iterations. The two key equations, the selection equation and the pheromone update equation, are explained below.
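The five steps and the two termination criteria can be sketched as a single loop. This is a hedged Python outline rather than pacoPacker's own code: the callables `construct`, `local_search` and `update_pheromone`, and the defaults for the number of ants, the iteration cap and `patience` (our stand-in for "the last several iterations"), are hypothetical placeholders.

```python
import random
from typing import Callable, List, Optional, Sequence

Conformation = List[int]


def run_colony(
    construct: Callable[[random.Random], Conformation],       # step 1: one ant's conformation
    local_search: Callable[[Conformation], Conformation],     # step 2: improve a conformation
    energy: Callable[[Sequence[int]], float],                 # this colony's energy function
    update_pheromone: Callable[[Conformation, float], None],  # step 4: deposit on s_gb
    n_ants: int = 10,
    max_iterations: int = 100,
    patience: int = 10,
    seed: int = 0,
) -> Optional[Conformation]:
    rng = random.Random(seed)
    best: Optional[Conformation] = None  # s_gb
    best_energy = float("inf")
    stale = 0  # iterations since the last improvement
    for iteration in range(1, max_iterations + 1):
        ants = [construct(rng) for _ in range(n_ants)]
        if iteration % 2 == 1:  # step 2 on odd-numbered iterations
            ants = [local_search(a) for a in ants]
        iter_best = min(ants, key=energy)  # s_ib
        iter_energy = energy(iter_best)
        if iter_energy < best_energy:  # step 3
            best, best_energy, stale = iter_best, iter_energy, 0
        else:
            stale += 1
        update_pheromone(best, best_energy)  # step 4
        if stale >= patience:  # step 5: stagnation criterion
            break
    return best  # iteration cap reached or colony stagnated
```

With a local search that jumps straight to the optimum, the global best is fixed in the first iteration and the loop stops as soon as `patience` further iterations bring no improvement.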
Each ant constructs a conformation by assembling rotamers from R, picking a rotamer $r_j$ from the rotamer set $R_i \in R$ for each residue i. For the g-th thread, rotamer selection is determined by the current heuristic and historical knowledge, as described by the following selection equation (Equation 1):
where $\tau_{ij}$, defined in Equation 3, denotes the useful experience accumulated by previous searches, and $\eta_{ij}$ denotes the heuristic value. Let the heuristic matrix be $H_g = \prod_{i \in n, j \in m_i} \eta_{ij}$, where $\eta_{ij}$ is the energy difference induced by residue i picking rotamer $r_j$, standardised according to Equation 2.
The parameter $q_0$ tunes the bias between the two selection policies: a random number q is drawn whenever a rotamer is needed and compared with $q_0$ to decide which policy applies. Once rotamer $r_j^{*}$ is picked, it is inserted into the protein backbone at the position of residue i.
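Because Equations 1 and 2 are not reproduced in this text, the sketch below assumes the common Ant Colony System form, which matches the roles described for q, $q_0$, $\alpha$ and $\beta$: if $q \le q_0$ the ant takes the rotamer maximising $\tau_{ij}^{\alpha}\eta_{ij}^{\beta}$, otherwise it samples rotamers in proportion to $\tau_{ij}^{\alpha}\eta_{ij}^{\beta}$. The min-max mapping in `standardise` is likewise an assumed stand-in for Equation 2, not the paper's formula.

```python
import random
from typing import List, Sequence


def standardise(energy_deltas: Sequence[float]) -> List[float]:
    """Assumed min-max standardisation: lower energy -> larger eta, all values in [0.1, 1]."""
    lo, hi = min(energy_deltas), max(energy_deltas)
    if hi == lo:
        return [1.0] * len(energy_deltas)
    return [1.0 - 0.9 * (e - lo) / (hi - lo) for e in energy_deltas]


def select_rotamer(tau: Sequence[float], eta: Sequence[float], alpha: float, beta: float,
                   q0: float, rng: random.Random) -> int:
    """Pick a rotamer index j for one residue with the pseudo-random-proportional rule."""
    scores = [(t ** alpha) * (h ** beta) for t, h in zip(tau, eta)]
    if rng.random() <= q0:  # exploitation: best pheromone/heuristic trade-off
        return max(range(len(scores)), key=scores.__getitem__)
    # exploration: sample j with probability proportional to its score
    return rng.choices(range(len(scores)), weights=scores)[0]
```

Keeping every $\eta_{ij}$ strictly positive matters for the exploration branch: a zero heuristic value would make that rotamer unreachable no matter how much pheromone it accumulates.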
The second equation updates the pheromone matrix T after all the ants have finished their work in an iteration. Let the pheromone matrix be $T = \prod_{i \in n, j \in m_i} \tau_{ij}$, where $\tau_{ij}$ is the pheromone value accumulated by residue i packing rotamer $r_j$. For each rotamer $r_j$ of residue i in $s_{gb}$, the value is updated using Equation 3.
where $\rho \in [0, 1)$ is the pheromone evaporation factor and $\Delta\tau_{ij}$ is calculated by a quality function that converts the energy value into an amount of pheromone, as given in Equation 4.
Our SHMO scheme is simple to implement with OpenMP. The pheromone matrix is taken out of the individual AC and made a shared variable, and multiple colonies run as parallel threads, each with its own private variables, coevolving through the common pheromone matrix.
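The SHOP arrangement can be mimicked with Python threads. This is a hedged analogue of the OpenMP design, not pacoPacker's code: the class `SharedPheromone`, the functions `colony_worker` and `run_parallel`, the initial value $\tau_0 = 0.1$, the constant deposit of 1.0 and $\rho = 0.1$ are all illustrative assumptions, and the local search and termination test are omitted for brevity. Each colony keeps its energy function, $\alpha$, $\beta$ and heuristic matrix private, while every read and write of T goes through one lock.

```python
import random
import threading
from typing import Callable, List, Optional, Sequence, Tuple


class SharedPheromone:
    """The single pheromone matrix T that every colony reads and reinforces."""

    def __init__(self, rotamers_per_residue: Sequence[int], tau0: float = 0.1) -> None:
        self._T = [[tau0] * m for m in rotamers_per_residue]
        self._lock = threading.Lock()

    def snapshot(self) -> List[List[float]]:
        with self._lock:
            return [row[:] for row in self._T]

    def reinforce(self, best: Sequence[int], delta: float, rho: float) -> None:
        with self._lock:
            for i, j in enumerate(best):
                self._T[i][j] = (1.0 - rho) * self._T[i][j] + rho * delta


# Private variables of one colony: (energy function E_g, alpha_g, beta_g, heuristic H_g)
ColonySpec = Tuple[Callable[[Sequence[int]], float], float, float, List[List[float]]]


def colony_worker(shared, spec, n_iter, seed, results, idx):
    energy, alpha, beta, eta = spec
    rng = random.Random(seed)
    best, best_e = None, float("inf")
    for _ in range(n_iter):
        T = shared.snapshot()  # includes experience deposited by the other colonies
        ant = [
            rng.choices(range(len(row)),
                        weights=[(t ** alpha) * (h ** beta) for t, h in zip(row, eta[i])])[0]
            for i, row in enumerate(T)
        ]
        e = energy(ant)  # judged only by this colony's own energy function
        if e < best_e:
            best, best_e = ant, e
        shared.reinforce(best, delta=1.0, rho=0.1)  # deposit on this colony's s_gb
    results[idx] = (best, best_e)


def run_parallel(specs: Sequence[ColonySpec], rotamers_per_residue: Sequence[int],
                 n_iter: int = 50):
    shared = SharedPheromone(rotamers_per_residue)
    results: List[Optional[Tuple[List[int], float]]] = [None] * len(specs)
    threads = [threading.Thread(target=colony_worker, args=(shared, spec, n_iter, k, results, k))
               for k, spec in enumerate(specs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # the best conformation found under each energy function
```

Because of CPython's global interpreter lock these threads do not run truly in parallel, which is acceptable for this illustration: the purpose of the parallel strategy, as stated in the Introduction, is to let colonies guided by different energy functions share experience, not to gain speed.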
Rotamer minimisation
Rotamer minimisation was implemented in two ways. In the first, pacoPacker packs the normal rotamers as they are placed and then runs a global minimisation on the side chains at all packable positions. We do not describe this method in detail, as it relies on the existing Rosetta3.4 mechanism. In the second, pacoPacker runs a gradient minimisation on each rotamer as it is placed and keeps the minimised rotamers. For this second method, we devised a new data structure to store the minimised rotamers (Figure 2). Suppose there are $M = m_1 + m_2 + \cdots + m_n$ rotamers and each normal rotamer has its own alternatives obtained by minimising it; these alternatives are called subrotamers. We denote the set of subrotamers for $r_j$ in $R_i$ by $A_{ij}$, which can be calculated by Equation 5, where $i \in n$, $j \in m_i$ and $r_j \in R_i$.
Figure 3 explains this procedure in detail. An ant selects rotamer $r_j$ for the i-th residue based on Equation 1, finds its subrotamers $A_{ij}$ (step 5 in Figure 3), and randomly picks a subrotamer from $A_{ij}$ to replace the primary rotamer at position i. Step 9 attempts to optimise the subrotamer using Rosetta. All minimisation algorithms in Rosetta choose a vector as the descent direction, determine a step along that vector, then choose a new direction and repeat [31]. We selected "dfpmin", which uses an exact line search, for these steps. If the minimised subrotamer lowers the energy, it is kept and assigned to residue i. Minimisation takes more time, so this option is a good choice for researchers who have sufficient time and want more accurate results.
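The bookkeeping described above, a table $A_{ij}$ of subrotamers per rotamer plus an accept-if-lower rule, can be sketched as follows. The class and function names are hypothetical, `minimise` stands in for Rosetta's dfpmin minimiser, `energy` is a caller-supplied placeholder, and storing each accepted subrotamer back into $A_{ij}$ is our reading of "keeps the minimised rotamers"; how pacoPacker first populates $A_{ij}$ (Equation 5, Figure 2) is not reproduced.

```python
import random
from typing import Callable, Dict, List, Tuple

Chi = Tuple[float, ...]  # side-chain dihedral angles of one (sub)rotamer


class SubrotamerStore:
    """A_ij: minimised alternatives (subrotamers) of rotamer j at residue i."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, int], List[Chi]] = {}

    def add(self, i: int, j: int, chis: Chi) -> None:
        self._store.setdefault((i, j), []).append(chis)

    def subrotamers(self, i: int, j: int) -> List[Chi]:
        return self._store.get((i, j), [])


def place_with_subrotamer(i: int, j: int, primary: Chi, store: SubrotamerStore,
                          energy: Callable[[int, Chi], float],
                          minimise: Callable[[Chi], Chi], rng: random.Random) -> Chi:
    """Pick a random subrotamer of r_j, minimise it, and keep it only if the energy drops."""
    candidates = store.subrotamers(i, j)
    if not candidates:
        return primary  # no subrotamers yet: keep the primary rotamer
    trial = minimise(rng.choice(candidates))
    if energy(i, trial) < energy(i, primary):
        store.add(i, j, trial)  # remember the minimised subrotamer for later ants
        return trial
    return primary
```

With a toy energy $(\chi - 60)^2$ and a minimiser that moves $\chi$ halfway towards 60°, a primary rotamer at 30° is replaced by the subrotamer 40° minimised to 50°, and $A_{00}$ grows to two entries.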
Results
The principal idea behind pacoPacker is to make parallel ant colonies share a single pheromone matrix, which combines different energy functions to guide each ant in constructing protein side-chain conformations. We tested pacoPacker against two popular side-chain modelling programs, CISRR and SCWRL4. CISRR combines a novel clash-detection-guided iterative search (CIS) algorithm with continuous torsion-space optimisation of rotamers (RR) [26]. SCWRL4 is an improved version of SCWRL3 [25] that uses a new rotamer library, more efficient search algorithms and a scoring function based on a soft van der Waals potential plus hydrogen bonding [15]. All these predictors are based on discrete rotamers.
Experimental settings
We performed all tests on a computer cluster of 20 nodes, each with a 16-core 1.9 GHz AMD Opteron CPU, running Linux 2.6.18 and GCC 4.1.2. CISRR and SCWRL4 were run with their default settings to produce one prediction per test instance. We ran pacoPacker, with eight ant colonies running in parallel, on the same test instances. Because the eight threads ran concurrently, each producing its own predictions, and each is a non-deterministic search, different numbers of decoys were generated for each test instance, ranging from 2130 ([PDB:1CBN], 46 residues) to 4650 ([PDB:1B9O], 635 residues). We selected the highest-accuracy prediction for each test instance from pacoPacker to compare with CISRR and SCWRL4.
The benchmark instances were taken directly from previous studies and comprise 442 protein targets of 46 to 1184 amino acid residues [26, 15]. Because [PDB:2QOL] cannot be predicted by CISRR and Rosetta treats [PDB:1G8Q] as missing main-chain atoms, we excluded them from the benchmark. Because fair evaluation is difficult, we used two criteria to assess the accuracy of side-chain packing. The first is the percentage of $\chi_1$ and $\chi_{1+2}$ angles predicted correctly within thresholds of 40° and 20° of the native structures. The second is the root mean square deviation (RMSD) of the side-chain heavy atoms [34]. Both evaluation methods are adapted from third-party software [26, 35] and account for residues with symmetric terminal groups or with possibly flipped terminal groups.
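The two criteria can be computed as below. This is a minimal sketch under stated assumptions, not the third-party evaluation code [26, 35]: angular differences are wrapped to [0°, 180°]; the set `SYMMETRIC_CHI` of 180°-symmetric terminal dihedrals (Phe/Tyr $\chi_2$, Asp $\chi_2$, Glu $\chi_3$) is our assumption; flipped Asn/Gln/His groups are not handled; $\chi_{1+2}$ counts a residue as correct only when both $\chi_1$ and $\chi_2$ are within the threshold; and the RMSD is taken over already-matched heavy atoms without superposition, since the backbone is fixed.

```python
import math
from typing import Sequence, Tuple

# Assumed (residue, chi index) pairs whose terminal group is 180-degree symmetric.
SYMMETRIC_CHI = {("PHE", 2), ("TYR", 2), ("ASP", 2), ("GLU", 3)}


def chi_difference(pred: float, native: float, resname: str = "", k: int = 1) -> float:
    """Smallest absolute angular difference in degrees, honouring 180-degree symmetry."""
    d = abs((pred - native + 180.0) % 360.0 - 180.0)  # wrapped into [0, 180]
    if (resname, k) in SYMMETRIC_CHI:
        d = min(d, 180.0 - d)
    return d


def chi_correct(pred: Sequence[float], native: Sequence[float], resname: str,
                n_chi: int, threshold: float = 40.0) -> bool:
    """True if the first n_chi angles all lie within `threshold` (n_chi=2 for chi1+2)."""
    return all(chi_difference(p, q, resname, k + 1) <= threshold
               for k, (p, q) in enumerate(zip(pred[:n_chi], native[:n_chi])))


def rmsd(pred: Sequence[Tuple[float, float, float]],
         native: Sequence[Tuple[float, float, float]]) -> float:
    """RMSD over matched side-chain heavy atoms, in the same units as the coordinates."""
    if len(pred) != len(native) or not pred:
        raise ValueError("need equal, non-empty atom lists")
    sq = sum((a - b) ** 2 for p, q in zip(pred, native) for a, b in zip(p, q))
    return math.sqrt(sq / len(pred))
```

The wrap-around in `chi_difference` is what makes 170° and -170° count as 20° apart rather than 340°, and the symmetry rule makes a Phe $\chi_2$ of 90° agree exactly with -90°.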
Protein chain based evaluation performance
First, we compared pacoPacker with CISRR and SCWRL4. As shown in Table 3, in terms of correct $\chi$ dihedral angles and RMSD, pacoPacker is comparable to these recently developed side-chain programs. Because SCWRL4 performed relatively poorly, we present a detailed comparison only between pacoPacker and CISRR. Within 40°, $\chi_1$ accuracy over whole proteins improved by 2.31 percentage points with pacoPacker (87.19% versus 84.88% for CISRR), while $\chi_{1+2}$ accuracy was comparable (77.11% versus 77.13% for CISRR). A similar trend was seen for $\chi_1$ and $\chi_{1+2}$ accuracy within 20°. For RMSD, pacoPacker performed best, with the lowest value.
We made further comparisons between the three predictors. In Figures 4 to 7, each symbol represents a single protein target: a red cross denotes a better result from pacoPacker and a blue crisscross a worse one. Green asterisks denote targets for which the two methods differ by less than 0.5% in $\chi$ accuracy or 0.005 Å in RMSD. As shown in Figures 4 and 6, pacoPacker outperformed CISRR on 342, 210 and 242 targets for $\chi_1$, $\chi_{1+2}$ and RMSD, respectively. Figures 5 and 7 show that pacoPacker outperformed SCWRL4 on 332, 211 and 267 targets for $\chi_1$, $\chi_{1+2}$ and RMSD, respectively. These results support the reliability of pacoPacker based on SHOP.
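The per-target classification behind the red, blue and green symbols can be expressed as a small function. The tolerances are the ones stated above (0.5 percentage points for $\chi$ accuracy, 0.005 Å for RMSD); the function name and the strict "less than" tie test are our assumptions. Note the sign flip for RMSD, where lower is better.

```python
from typing import Dict, Sequence


def compare_targets(ours: Sequence[float], theirs: Sequence[float], tol: float,
                    lower_is_better: bool = False) -> Dict[str, int]:
    """Count targets where `ours` is better, worse, or within `tol` of `theirs`."""
    counts = {"better": 0, "worse": 0, "tie": 0}
    for a, b in zip(ours, theirs):
        diff = (b - a) if lower_is_better else (a - b)  # positive means `ours` is better
        if abs(diff) < tol:
            counts["tie"] += 1
        elif diff > 0:
            counts["better"] += 1
        else:
            counts["worse"] += 1
    return counts
```

For $\chi$ accuracy the call uses `tol=0.5` on percentages; for RMSD it uses `tol=0.005` with `lower_is_better=True`.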
Individual residues based evaluation performance
Next, we evaluated how pacoPacker performs on different types of amino acids. Figure 8 shows that pacoPacker improved the percentage of correct $\chi_1$ and $\chi_{1+2}$ dihedral angles. For $\chi_1$, excluding Ala and Gly, pacoPacker ranks first for 15 amino acid types, and for Glu, Lys and Ser the average increase was more than 5%. pacoPacker's largest gains in $\chi_1$ accuracy came from accurate prediction of Ser and Thr. The accurately predicted residues were predominantly aliphatic and aromatic. For $\chi_{1+2}$, pacoPacker leads for 6 amino acid types, CISRR for 5 and SCWRL4 for 3. Previous research has shown that CISRR performs less well on the short polar amino acids (Asp, Asn and Ser), which could be due to differences in scoring functions [26]. pacoPacker improves these residues in both $\chi_1$ and $\chi_{1+2}$, which again shows the importance of combining different energy functions.
Effects of rotamer minimisation
The results in the previous two sections show pacoPacker's strength on $\chi_1$, whereas its performance on $\chi_2$ is weaker. For example, comparing the red crosses in Figure 4(A) with those in Figure 4(B), pacoPacker has 342 best-performing proteins for $\chi_1$ but only 210 for $\chi_{1+2}$. In addition, Cys, Ser, Thr and Val, which have only a $\chi_1$ angle, clearly dominate the $\chi_1$ results. High-quality $\chi_1$ prediction is significant for side-chain prediction because $\chi_1$ is the foundation of the residue conformation. On the other hand, there is still room for improvement in $\chi_2$, so we optimised each rotamer as it was placed (rotamer minimisation). An overview of how this method performs is given below.
Figure 9 shows the effect of minimisation by comparing the RMSD of three different models on test instances selected randomly from the benchmark described above. Model 1 (blue asterisks) uses gradient minimisation on each rotamer as it is placed (the method presented in this paper); model 2 (red solid boxes) packs in the same way as model 1 but then runs a global minimisation on the side chains at all packable positions; and model 3 (green boxes) uses normal rotamers optimised by global minimisation only. Figure 9 shows that models 1 and 2 both decrease the RMSD relative to model 3, which means that our method can improve the quality of repacking. Model 1 is usually comparable to model 2, so our method alone achieves optimisation comparable to that obtained with an additional global minimisation. However, 18 proteins (data not shown) had a higher RMSD with rotamer minimisation. These fall into two groups: those that already have high $\chi_1$ and $\chi_{1+2}$ accuracies within 20° (approximately 80%), and large proteins, including [PDB:2OTU] (976 residues), [PDB:1OK7] (739 residues), [PDB:1YTL] (631 residues) and [PDB:2EPI] (388 residues). This suggests that structural integrity is important for large proteins, for which rotamer minimisation cannot play its full role.
Discussion
Under the inaccuracy/usefulness hypothesis, an SOP is not an ideal computational model for protein structure prediction [28]: even if the corresponding SOP is solved completely, its answer may not be correct, and in most cases it will not be perfect. pacoPacker is a novel hybrid parallel approach to repacking protein side chains based on SHOP [28, 30].
Table 4 shows how the best conformation for each protein found by pacoPacker is distributed across threads. The best conformations are constructed on different threads, reflecting that each energy function is very useful in some specific sense but inaccurate in a universal sense. Therefore, we need an approach based on MOP. When an MOP is used for protein structure prediction, the Pareto-based approach, which focuses on dominance analysis of the solutions found by the search, will probably yield a large Pareto front in which no single energy function dominates. pacoPacker differs in that it does not construct a Pareto front but collects the best solutions found by parallel search procedures directed by different energy functions. The SHOP strategy was proposed as a useful parallel ACO method [30]. Using SHOP, the multiple colonies of pacoPacker exchange their search experience asynchronously and coevolve towards better solutions, while each colony is guided by its own objective function and algorithm parameters [28]. On the 442-structure test set, pacoPacker gave the most accurate prediction of the three programs for close to half of the targets. Why does the pacoPacker approach perform well?
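The difference between the two strategies is easy to see on a toy example with two energy functions to be minimised. The functions below are a generic sketch of Pareto dominance and of the "best solution per energy function" collection described above; the names and the scores in the example are made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every energy and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of solutions not dominated by any other solution."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def best_per_energy(scores: Sequence[Sequence[float]]) -> List[int]:
    """One winner per energy function: the lowest-scoring solution under each E_k."""
    n_obj = len(scores[0])
    return [min(range(len(scores)), key=lambda i: scores[i][k]) for k in range(n_obj)]
```

For the energy pairs (1, 5), (2, 4), (3, 3), (4, 2), (5, 1) and (5, 5), the first five are mutually non-dominated, whereas collecting the best solution of each energy function yields just two candidates. This illustrates why a Pareto front can grow large when no single energy function dominates.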
First, from the viewpoint of an individual colony, the pheromone matrix accumulates the search experience of the ants, describing which rotamer should be considered a priori for each residue. This experience bias is established by evaluating the conformations found by the previous generation of ants with the corresponding energy function. By sharing T, each colony asynchronously acquires search experience from the other colonies while still being directed by its own energy function, so that all colonies coevolve towards a better state. Sharing one T accumulates the search experience of all the parallel ant colonies and propagates the bias among them. Because the pheromone matrix T provides a non-deterministic bias for all running colonies, better solutions may be easier to find.
For example, [PDB:2FLU] was one of pacoPacker's most accurate predictions, with an RMSD of 0.98 Å, whereas the next most accurate prediction, from CISRR, had an RMSD of 1.33 Å. The best conformation appeared in the 27th generation of thread 8, which ended in that generation; the other threads ended successively after the 29th generation. In this case, almost all threads stopped at the same time, which gave the pheromone matrix T enough time to learn fairly from the different threads. There were also some poor solutions, such as [PDB:1WVH], where pacoPacker increased the RMSD by 1.23 Å. In this case, pacoPacker's best conformation was constructed by thread 6 in the 40th generation, while the other threads stopped after the 25th generation. This may be because some threads finished too early, so that T learned a biased search experience; running for longer may solve this. From a user's perspective, Table 5 summarises when pacoPacker performs well: the proportion of proteins for which it gives the best repacking increases as sequence length decreases. Therefore, pacoPacker can provide the highest accuracy for packing side chains when the sequence length is below 400 amino acids.
Conclusions
In summary, pacoPacker makes each heuristic search work with its own energy function, so that the searches complement each other qualitatively. Different energy functions train search trajectories to acquire different search intelligence. Our parallel strategy diffuses this intelligence to all the parallel searches through SHOP, so that all ant colonies share their accumulated hybridised intelligence. Such coevolution, guided simultaneously by multiple objective functions, may reflect the natural folding process of native proteins [28]. The prediction accuracy of side-chain packing was improved for most proteins, which demonstrates the feasibility and practical value of pacoPacker, albeit at the cost of increased CPU time. An important advantage of pacoPacker is that it does not require training or tuning of energy function parameters before the predictor can be used.
References
Smith CA, Kortemme T: Backrublike backbone simulation recapitulates natural protein conformational variability and improves mutant sidechain prediction. Journal of molecular biology. 2008, 380 (4): 742756. 10.1016/j.jmb.2008.05.023.
Davis IW, Arendall WB, Richardson DC, Richardson JS: The backrub motion: how protein backbone shrugs when a sidechain dances. Structure. 2006, 14 (2): 265274. 10.1016/j.str.2005.10.007.
Kingsford CL, Chazelle B, Singh M: Solving and analyzing sidechain positioning problems using linear and integer programming. Bioinformatics. 2005, 21 (7): 10281039. 10.1093/bioinformatics/bti144.
Gaudreault F, Chartier M, Najmanovich R: Sidechain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding. Bioinformatics. 2012, 28 (18): 423430. 10.1093/bioinformatics/bts395.
Raveh B, London N, Zimmerman L, SchuelerFurman O: Rosetta, flexpepdock abinitio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS One. 2011, 6 (4): 1893410.1371/journal.pone.0018934.
Wang C, SchuelerFurman O, Baker D: Improved sidechain modeling for proteinprotein docking. Protein Science. 2005, 14 (5): 13281339. 10.1110/ps.041222905.
Anfinsen CB, Haber E, Sela M, White F: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proceedings of the National Academy of Sciences of the United States of America. 1961, 47 (9): 130910.1073/pnas.47.9.1309.
Pierce NA, Winfree E: Protein design is nphard. Protein Engineering. 2002, 15 (10): 779782. 10.1093/protein/15.10.779.
Unger R, Moult J: Finding the lowest free energy conformation of a protein is an nphard problem: proof and implications. Bulletin of Mathematical Biology. 1993, 55 (6): 11831198. 10.1007/BF02460703.
Hart WE, Istrail S: Robust proofs of nphardness for protein folding: general lattices and energy potentials. Journal of Computational Biology. 1997, 4 (1): 122. 10.1089/cmb.1997.4.1.
Xie W, Sahinidis NV: Residuerotamerreduction algorithm for the protein sidechain conformation problem. Bioinformatics. 2006, 22 (2): 188194. 10.1093/bioinformatics/bti763.
Chazelle B, Kingsford C, Singh M: A semidefinite programming approach to side chain positioning with new rounding strategies. INFORMS Journal on Computing. 2004, 16 (4): 380392. 10.1287/ijoc.1040.0096.
Desmet J, De Maeyer M, Hazes B, Lasters I: The deadend elimination theorem and its use in protein sidechain positioning. Nature. 1992, 356 (6369): 539542. 10.1038/356539a0.
Desmet J, De Maeyer M, Lasters I: Theoretical and algorithmical optimization of the deadend elimination theorem. Pac Symp Biocomput. 1997, 2: 122133.
Krivov GG, Shapovalov MV, Dunbrack RL: Improved prediction of protein sidechain conformations with scwrl4. Proteins: Structure, Function, and Bioinformatics. 2009, 77 (4): 778795. 10.1002/prot.22488.
Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D: Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology. 2003, 331 (1): 281-299. 10.1016/S0022-2836(03)00670-3.
Hsin JL, Yang CB, Huang KS, Yang CN: An ant colony optimization approach for the protein side chain packing problem. Proceedings of the 6th WSEAS International Conference on Microelectronics, Nanoelectronics, Optoelectronics. 2007, 44-49.
Roitberg A, Elber R: Modeling side chains in peptides and proteins: application of the locally enhanced sampling and the simulated annealing methods to find minimum energy conformations. The Journal of Chemical Physics. 1991, 95 (12): 9277-9287. 10.1063/1.461157.
Leach AR, Lemon AP: Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins: Structure, Function, and Genetics. 1998, 33 (2): 227-239. 10.1002/(SICI)1097-0134(19981101)33:2<227::AID-PROT7>3.0.CO;2-F.
Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences. 2000, 97 (19): 10383-10388. 10.1073/pnas.97.19.10383.
Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W: Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011, 487: 545-574.
Holm L, Sander C: Fast and simple Monte Carlo algorithm for side chain optimization in proteins: application to model building by homology. Proteins: Structure, Function, and Bioinformatics. 1992, 14 (2): 213-223. 10.1002/prot.340140208.
Bower MJ, Cohen FE, Dunbrack RL: Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. Journal of Molecular Biology. 1997, 267 (5): 1268-1282. 10.1006/jmbi.1997.0926.
Dunbrack RL: Comparative modeling of CASP3 targets using PSI-BLAST and SCWRL. Proteins: Structure, Function, and Bioinformatics. 1999, 37 (S3): 81-87. 10.1002/(SICI)1097-0134(1999)37:3+<81::AID-PROT12>3.0.CO;2-R.
Canutescu AA, Shelenkov AA, Dunbrack RL: A graph-theory algorithm for rapid protein side-chain prediction. Protein Science. 2003, 12 (9): 2001-2014. 10.1110/ps.03154503.
Cao Y, Song L, Miao Z, Hu Y, Tian L, Jiang T: Improved side-chain modeling by coupling clash-detection guided iterative search with rotamer relaxation. Bioinformatics. 2011, 27 (6): 785-790. 10.1093/bioinformatics/btr009.
Kortemme T, Morozov AV, Baker D: An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. Journal of Molecular Biology. 2003, 326 (4): 1239-1259. 10.1016/S0022-2836(03)00021-4.
Lü Q, Xia XY, Chen R, Miao DJ, Chen SS, Quan LJ, Li HO: When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences. PLoS ONE. 2012, 7 (9): e44967. 10.1371/journal.pone.0044967.
Lv Q, Wu H, Wu J, Huang X, Luo X, Qian P: A parallel ant colonies approach to de novo prediction of protein backbone in CASP8/9. Science China Information Sciences. 2013, 56 (10): 1-13.
Lv Q, Xia X, Qian P: A parallel ACO approach based on one pheromone matrix. Ant Colony Optimization and Swarm Intelligence. 2006, Springer, 4150: 332-339. 10.1007/11839088_30.
Rohl CA, Strauss CE, Misura KM, Baker D: Protein structure prediction using Rosetta. Methods in Enzymology. 2004, 383: 66-93.
Dagum L, Menon R: OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE. 1998, 5 (1): 46-55. 10.1109/99.660313.
Shapovalov MV, Dunbrack RL: A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011, 19 (6): 844-858. 10.1016/j.str.2011.03.019.
Miao Z, Cao Y, Jiang T: RASP: rapid modeling of protein side chain conformations. Bioinformatics. 2011, 27 (22): 3117-3122. 10.1093/bioinformatics/btr538.
Eyal E, Najmanovich R, McConkey BJ, Edelman M, Sobolev V: Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. Journal of Computational Chemistry. 2004, 25 (5): 712-724. 10.1002/jcc.10420.
Acknowledgements
The authors thank Rong Chen for help with the analysis of the experiments and Caixia Wang for help with the preparation of the paper. The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the paper.
Declarations
This study was supported by a grant from the National Natural Science Foundation of China (No. 61170125).
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 12, 2014: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S12.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Q Lü designed and developed the pacoPacker framework. LJ Quan implemented and improved pacoPacker. LJ Quan, HO Li and HJ Wu performed the experiments. LJ Quan and XX Xia drafted the manuscript. All of the authors read and approved the manuscript.
Lijun Quan, Qiang Lü contributed equally to this work.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Quan, L., Lü, Q., Li, H. et al. Improved packing of protein side chains with parallel ant colonies. BMC Bioinformatics 15 (Suppl 12), S5 (2014). https://doi.org/10.1186/1471-2105-15-S12-S5
DOI: https://doi.org/10.1186/1471-2105-15-S12-S5