Skip to main content

Improved packing of protein side chains with parallel ant colonies

Abstract

Introduction

The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains.

Methods

We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library.

Results

We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X 1 and 77.11% of X 1 2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains.

Conclusions

This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.

Introduction

The accurate packing of side chains plays a very important role in modelling protein structures. In ab initio structure prediction, the goal is to choose a rotamer for each position so that the molecule is close to the natural structure. In homology modelling, the goal is to predict the structure of a protein that is homologous to another of a known structure [1, 2]. In protein design, the goal is to find an amino acids sequence that will fold into a particular backbone [3]. In flexible ligand docking, the goal is to display a structural change ranging from large movements of entire domains to small side-chain rearrangements in the binding site [46]. Based on Anfinsen's hypothesis [7], the problem of packing side chains is usually mapped into a combinatorial optimisation problem and can be solved in a number of ways. However, a fixed backbone, an energy function and a possible rotamer set are always foundations of this widely studied formulation. All the current existing algorithms for the side-chain problem can be divided into two categories, heuristic and deterministic.

The side-chain problems have been proven as non-deterministic polynomial-time hard (NP-hard) [810]. Even when an approximate solution is sought within O(cnR) from the optimum, where c is a constant, n is the number of residues and R is the average number of rotamers per residue [11, 12], the packing side chains cannot be solved successfully. Computational complexity analysis suggests that any global optimisation algorithms for this problem may, in the worst case, run in exponential time [11]. When they converge, dead-end elimination (DEE) algorithms [13, 14] are designed to find the global minimum energy. Heuristics are not guaranteed to find a global minimum, but they almost always find a low-energy conformation in a reasonable time [15]. Therefore, heuristic algorithms become a natural choice for tackling the side-chain modelling problem. Traditionally, all heuristic approaches solve such side-chain problems as a single-objective optimisation Problem (SOP), using Monte Carlo (MC) [16], Ant Colony (AC) [17], and Simulated Annealing (SA) [18]. Some of the heuristic methods combine multiple strategies, such as a combination of DEE and the A* algorithm [19], and combination of SA and MC [2022]. The common feature of these heuristic approaches is that they all use an optimisation based on a single objective function.

Another method for solving the side-chain problem was by using the theory of decomposing the underlining residue relationship. One such method is SCWRL [2325, 15], which is widely used because of its speed, accuracy and ease of use. SCWRL3 decomposes original residue graphs to connected subgraphs, which cannot be disconnected by the removal of a single vertex. They find the global minimal energy conformation for the residues in these subgraphs [25]. The authors who proposed the SCWRL methods also observed that residues with a single rotamer or a single neighbour can be eliminated from the residue graph. Then SCWRL4 [15] transfers the original residue graphs to a tree for speeding up the solver. However, in the tightly packed environments of protein interiors, these methods will inherently lead to atomic clashes and hinder the prediction accuracy. Therefore, a new method, CIS-RR, performs clash detection-guided iterative searches (CIS) of side-chain rotamers whilst continuously optimising side-chain conformations using a conjugate gradients method [26].

In general, methods for predicting side chains seem to be limited not by the quality of search algorithms, but also by the quality of the energy functions employed [23]. An energy function typically consists of a combination of weighted energy terms. It is not hard to find different approaches, which develope distinctive kinds of energy functions. For example, SCWRL3 use an energy function based on logarithmic probabilities of rotamers and a simple repulsive steric energy term [25]. However, SCWRL4 also uses a short-range, soft van der Waals interaction potential between atoms rather than the linear repulsive-only function used in SCWRL3, as well as an anisotropic hydrogen bond function similar to that used in Rosetta [15, 27]. The energy function of CIS-RR is also a modified the energy function of SCWRL3. The first improvement is to add attractive energy and weights to the van der Waals potential. The second improvement is to penalise the drifting of side chain dihedral angles away from the nearest rotamer library values for the original rotamer term. The existence of different energy functions implies that all energy functions are inaccurate in a universal sense (inaccuracy), but each of them is very useful in some specific sense (usefulness). This hypothesis is referred to as the inaccuracy/usefulness property [28]. The approaches based on SOP all use a single inaccuracy energy function to model side chains, so the results are sometimes inaccurate in a quantitative sense for discriminating native or near-native conformations.

In this study, a novel approach is proposed to assemble the usefulness and decrease the inaccuracy of different energy functions. We believe that it is more reasonable to model packing side chains as a multi-objective optimisation problem (MOP). Different energy functions should be combined to the best possible extent. As this idea has been successfully applied to de novo prediction of protein backbone [28, 29], we also used parallel ant colony optimisation based on SHOP (SHaring One Pheromone matrix) [30]. Our parallel strategy is not for speeding up the predictor, but can be used to hybridise the usefulness of different energy functions. All energy functions can be adopted by an individual colony. In this way, we can avoid the sensitivity of the optimised parameters of energy functions, so we expect to obtain better generality of our predictor. This parallel strategy has been validated experimentally.

Methods

We propose a novel parallel ant colony optimisation (ACO) metaheuristic frame-work for packing protein side chains by single-heuristic multi-objective algorithms (SHMO) to reduce the inaccuracy of a single energy. We denote a heuristic algorithm by h and different energy functions by ε = {E1 , . . . , E k }, where the number of threads amount to k. This type of algorithm is generally denoted by h ( E i | Θ ) where Θ refers to the control parameters in terms of heuristic search algorithms and can usually be tuned empirically before starting, or adaptively during the algorithm [28]. In the pacoPacker algorithm, h adopts ACO, and Θ contains two variables, private and public. To be more specific, all ant colonies share one common pheromone matrix T as a public variable, and each ant colony has a private variable including heuristic matrix H i and two other parameters, α i and β i . A = 1 , . . . , α k }, determines the importance of the pheromone and B = 1 , . . . , β k }, determines the importance of the heuristic matrix H = {H1 , . . . , H k }. This paper's method can be described as A C ( E i | α i , β i , H i , T ) . The Rosetta3.4 platform [31] is quite mature and supports the object-oriented paradigm, therefore pacoPacker uses Rosetta3.4 for building rotamer libraries, constructing interaction graphs, and scoring structures. Using Rosetta3.4 and OpenMP [32], our scheme is easy to implement.

Search space

For an amino-acid sequence t with n length of residues, its side chains are packed with the lowest free energy. Let the rotamer library for t be R = {R1 , . . . , R n }, where the rotamer set is R i = { r 1 , . . . , r m i } for each residue i in t, the number of rotamers belonging to R i amount to m i , and different rotamer sets have a different quantity of rotamers. Rotamers were read from Dunbrack backbone dependent rotamer library (2010 version), such that frequencies and dihedral angles varied with the backbone dihedral angles Φ and ψ [33].

Energy function

We adopted the same energy functions used by Rosetta. These scores are combinations of different weights and energy items, such as residue-environment and residue-residue interactions, secondary structure packing, chain density and excluded volume [28]. It does not matter which function is more accurate as all the energy functions share the inaccuracy/usefulness property. The Rosetta energy functions are adopted here to illustrate the implementation of our parallel approach. We forked eight threads to run separately using different energy functions, which rule out any side-chain-independent energy terms. Different threads have different private variables, which are listed in Table 1. Table 1 shows the weight of each score term on different score functions. Each score term is represented by letter (A, B, etc.), which correspond to Table 2.

Table 1 Score function and ACO parameters.
Table 2 Score terms.

Implementation of the algorithm

Eight parallel threads were created in our SHMO implementation. Figure 1 depicts the design of pacoPacker. Using a protein backbone as the input of pacoPacker, the rotamer library is generated based on the target sequence by using the Rosetta platform. The outputs are proteins with side chains predicted by ant colonies. From the information shown in Figure 1, eight different ant colonies share a single common pheromone matrix T to exchange their search experience asynchronously. Each colony is directed by its own energy functions, which both co-evolve towards a better state.

Figure 1
figure 1

pacoPacker schematic flowchart.

Next, we will focus on a single ant colony to pack side chains. Construction by an ant colony is described as follows:

  1. 1.

    Conduct side chains based on the selection equation for each ant.

  2. 2.

    Perform the local search on each odd-numbered iteration ant.

  3. 3.

    Update global best ant s gb with iteration best ant s ib if E(s ib ) is lower.

  4. 4.

    Update the pheromone matrix T based on s gb .

  5. 5.

    If the termination criterion is met, let's return to s gb , or repeat steps 1 to 5.

In this workflow, each colony terminates when one of the following criteria is met: the colony runs for a specified number of iterations; and there is no energy improvement during the last several iterations. Two important equations, the selection equation and the update pheromone matrix equation are explained below.

Each ant conducts the conformation by assembling rotamers from R. The ant picks up a rotamer r j from the rotamer set R i R for residue i. For gth thread, the rotamer selection is determined by the current heuristic and historical knowledge, described by the following selection equation (Equation 1):

r j * = max r j R i [ τ i j ] α g [ η i j ] β g , if q < q 0 ; randomly pick up r j from R i , otherwise .
(1)

Where τ ij is defined later in Equation 3, which denotes the useful experience accumulated by previous searches, η ij denotes the heuristic value. Let the heuristic matrix be: H g = i n , j m i η i j , where η ij is the energy difference induced by residue i picking up rotamer r j , which is standardised according to Equation 2.

η i j = π 2 - arctan Δ E .
(2)

q0 tunes the bias between the two selection policies. A random probability q will be generated when a rotamer is needed. Once the rotamer is picked, r j * is inserted into the protein backbone from the position of residue i.

The second formula updates the pheromone matrix T after all the ants have finished their work in an iteration. Let the pheromone matrix be: T= i n , j m i τ i j , where τ ij is the pheromone value accumulated by residue i packing rotamer r j . For each r j of residue i in s gb , the value is updated using Equation 3.

τ i j = ( 1 - ρ ) τ i j + ρ Δ τ i j .
(3)

Where ρ [ 0 , 1 ) is the pheromone evaporation factor, and Δτ ij is calculated by a quality function which converts the energy value to a certain amount of pheromone. We describe this situation in Equation 4.

Δ τ i j = π 2 - arctan E ( s g b ) n , if r j of residue i s g b ; τ i j , otherwise .
(4)

Our SHMO scheme is simple with the help of OpenMP. The pheromone matrix is extracted from AC, and multiple colonies are run as parallel threads with private variables in each colony to co-evolve with the common pheromone matrix.

Rotamer minimization

Rotamer minimisation was implemented in two ways. First, the pacoPacker runs on each normal rotamer as it is placed; after that, the pacoPacker runs a global minimisation on the side chains at all the packable positions. We will not provide much detail about this method, as the Rosetta3.4 mechanism was adopted to achieve it. Second, pacoPacker runs a gradient minimisation on each rotamer as it is placed and keeps the minimised rotamers. To use this second method, we devised a new data structure to remember minimised rotamers (Figure 2). If there are M = m1 + m2 +· · ·+m n rotamers, and each normal rotamer has its own alternative obtained by minimising itself, they are called subrotamers. We describe the set of subrotamers for r j from R i as A ij , which can be calculated quantitatively by Equation 5, where in,j m i , r j R i

Figure 2
figure 2

Data structures of rotamers and subrotamers.

A i j = { r j } A i j = { min ( r j ) , r j } A i j n = { min ( R a n d o m P ( A i j n - 1 ) ) , A i j n - 1 }
(5)

A detailed explanation of this equation is shown in Figure 3. An ant selects the rotamer r j for the ith residue based on Equation 1, then find its subrotamers A ij as shown in step 5 in Figure 3, and randomly picks up a subrotamer from A ij to replace the primary rotamer at position i. The 9th step attempts to optimise the subrotamer achieved by Rosetta. All minimisation algorithms in Rosetta choose a vector as the descent direction, determine a step along that vector, then choose a new direction and repeat [31]. We selected "dfpmin" as an exact line search for these steps. If this minimised subrotamer results in a drop in energy, it was kept and made into the residue i. Minimisation needs more time, so for researches with sufficient time who want to obtain more accurate results, this application would be a good choice.

Figure 3
figure 3

Ant constructed side chains by minimising each placed rotamer.

Results

The principal idea behind pacoPacker is to make the parallel ant colonies share only one pheromone matrix, which can combine different energies to guide each ant in constructing protein side-chain conformations. We tested pacoPacker by making comparisons with two popular side-chain modelling programs, CIS-RR and SCWRL4. CIS-RR combines a novel clash-detection guided iterative search (CIS) algorithm with continuous torsion space optimisation of rotamers (RR) [26]. SCWRL4 is an improved version of SCWRL3 [25] which uses the new rotamer library, more efficient search algorithms and a soft Vander Waals potential plus hydrogen bonding based scoring function [15]. All these predictors are based on discrete rotamers.

Experimental settings

We performed all the tests on a computer cluster containing 20 nodes with 16-core 1.9 GHz AMD Opteron CPU per node under Linux 2.6.18 and GCC 4.1.2. CIS-RR and SCWRL4 were ran using their default settings to produce one prediction for each test instance. We ran pacoPacker, with eight ant colonies running in parallel, on the same test instances. As all these threads were synchronised to work out eight predictions and each is a nondeterministic approach, different numbers of decoys for each test instance were generated. The number of predictions for each test instance ranged from 2130 ([PDB:1CBN] 46 residues) to 4650 ([PDB:1B9O] 635 residues). We selected the highest accuracy rate of each test instance from pacoPacker to compare with CIS-RR and SCWRL4.

The benchmark instances were taken directly from other research, which contained 442 protein targets with lengths of 46 to 1184 amino acid residues [26, 15]. Because [PDB:2QOL] cannot be predicted by CIS-RR and [PDB:1G8Q] is considered as a missing main chain atom by Rosetta, we excluded them from this benchmark. A fair evaluation is a difficult task, so we used two criteria to assess the accuracy of side chain packing. One was defined as the percentage of correctly predicted X 1 and X 1 2 angles within thresholds of 40° and 20° compared with the native structures. The second criterion was the root mean square deviation (RMSD) of the side-chain heavy atoms [34]. Both evaluation methodologies are adapted from third-party software [26, 35], where they consider residues with symmetric terminal groups, or with a possibly flipped terminal group.

Protein chain based evaluation performance

Firstly, we compared pacoPacker with CIS-RR and SCWRL4 in side-chain modelling. As shown in Table 3 for the accuracy improvement in terms of correct X dihedral angles and RMSD, pacoPacker is comparable to the recently developed side-chain programs. As SCWRL4 showed relatively poor performance, so we only present a detailed comparison between pacoPacker and CIS-RR. Within 40°, the X 1 of the whole protein was improved by 2.31% with pacoPacker (87.19% by pacoPacker versus 84.88% by CIS-RR), and the χ 12 was comparable (77.11% by pacoPacker versus 77.13% by CIS-RR). A similarly consistent trend was also seen for the accuracy rate of X 1 and X 1 2 within 20°. In case of the other metrics, pacoPacker is the best with its lowest RMSD.

Table 3 Comparison of pacoPacker, CIS-RR and SCWRL4 in the 442 structure set.

We made further comparisons between the three predictors. In Figures 4 to 7, each symbol represents a single protein target, a red cross denotes a better pacoPacker yield and a blue criss-cross denotes a worse yield. Some differences between the two methods were less than 0.5% for the accuracy of X dihedral angles and 0.005Å for RMSD, respectively. These are denoted by a green asterisk. As shown in Figures 4 and 6, when compared with CIS-RR, there were 342, 210 and 242 targets predicted by pacoPacker for X 1 , X 1 2 and RMSD respectively, showing that it has the advantage over CIS-RR. Moreover, Figures 5 and 7 show that pacoPacker was better than SCWRL4 for 332, 211 and 267 targets for X 1 , X 1 2 and RMSD respectively. These results clearly show that pacoPacker has a high reliability based on SHOP.

Figure 4
figure 4

XComparison of pacoPacker and CIS-RR for angles in the 442 structures set. Each symbol corresponds to a single protein target. Red crosses (blue criss-crosses) denote pacoPacker yields better (worse) results. Targets marked by green asterisks mean pacoPacker is comparable with CIS-RR. All X angles are within 40°.

Figure 5
figure 5

XComparison of pacoPacker and SCWRL4 for angles in the 442 structures set. Each symbol corresponds to a single protein target. Red crosses (blue criss-crosses) denote pacoPacker yields better (worse) results. Targets marked by green asterisks mean pacoPacker is comparable with SCWRL4. All X angles are within 40°.

Figure 6
figure 6

Comparison of pacoPacker and CIS-RR for RMSD in the 442 structures set. Each symbol corresponds to a single protein target. Red crosses (blue criss-crosses) denote pacoPacker yields better (worse) results. Targets marked by green asterisks mean pacoPacker is comparable with CIS-RR.

Figure 7
figure 7

Comparison of pacoPacker and SCWRL4 for RMSD in the 442 structures set. Each symbol corresponds to a single protein target. Red crosses (blue criss-crosses) denote pacoPacker yields better (worse) results. Targets marked by green asterisks mean pacoPacker is comparable with SCWRL4.

Individual residues based evaluation performance

Next, we sought to evaluate how pacoPacker works on different types of amino acids. Figure 8 shows that pacoPacker improved the percent correct of both X 1 and X 1 2 dihedral angles. For X 1 , excluding Ala and Gly, pacoPacker has 15 types of amino acids holding the top spot. In Glu, Lys and Ser, they had an average increase of more than 5%. PacoPacker made the greatest contribution to the accuracy of X 1 . It also can be proven from the situation that pacoPacker made the greatest contribution to the accuracy of X 1 via its accurate prediction of Ser and Thr. The residues, which were predicted accurately, were predominantly aliphatic and aromatic residue types. For X 1 2 , pacoPacker accounted for 6 types of amino acids in the lead, whilst CIS-RR accounded for 5 and SCWRL4 accounted for 3. Previous research has shown that for the short polar amino acids (Asp, Asn and Ser), CIS-RR shows lower performance, which could be due to the difference in scoring functions [26]. However, pacoPacker improves them both in X 1 and X 1 2 , which has again shows the importance of combining different energies.

Figure 8
figure 8

Analysis of the performance of pacoPacker, CIS-RR and SCWRL4 on different amino acids. (A) Percent correct within 40° for the X 1 angle. (B) Percent correct within 40° for the X 1 2 angle. The test data used the 442 proteins set. Red, back slash and forward slash indicate the test results of pacoPacker, CIS-RR and SCWRL4, respectively.

Effects of rotamer minimisation

From the results presented in the previous two sections, we show the superiority of X 1 while the performance of X 2 is not strong. For example, when compare the number of red crosses on Figure 4(A) with Figure 4(B), pacoPacker has 342 best-performing proteins for X 1 , which is more than the 210 best-performing proteins for X 1 2 . In addition, Cys, Ser, Thr and Val only on wing X 1 , clearly dominate the area of X 1 . High quality X 1 is significant for side-chain prediction, because it is a foundation of residue. On the other side, there is still room for improvement of X 2 , so we naturally optimised each rotamer as it was placed (rotamer minimization). An overview of how this method performs is given below.

Figure 9 shows the effects of minimisation by comparing RMSD among three different models, and test instances is randomly from the benchmark as above. Model 1 (blue asterisk) uses gradient minimisation on each rotamer when it is placed (the method presented in this paper), model 2 (red solid box) packs the same way as model 1 but then runs a global minimisation on the side chains at all packable positions, and model 3 (green box) with normal rotamers is optimised by global minimisation only. Figure 9 shows that models 1 and 2 both decrease the RMSD compared with model 3, which means that our method can contribute to the quality of repacking. Most of time model 1 is comparable with model 2, so we can only use our method to gain optimisation as well as global minimisation. However, there were 18 proteins (data not shown), which had higher RMSD predicted by rotamer minimization. These can be classified into two groups: Those which already have high accuracies of X 1 and X 1 2 within 20°( with approximately 80% accuracy) and those which are large in size, including [PDB:2OTU] (976 residues), [PDB:1OK7] (739 residues), [PDB:1YTL] (631 residues), [PDB:2EPI] (388 residues). This means that structural integrity is important for proteins that are large in size, because rotamer minimisation cannot play a full role.

Figure 9
figure 9

Comparisons between the three models for RMSD. Model 1 (blue asterisk) use a gradient minimisation on each rotamer when it is placed (this paper's method); model 2 (red solid box) packs using the same method as model 1, but then after the fact, runs a global minimisation on the side chains at all packable positions; and model3 (green box) with normal rotamers is optimised by global minimisation.

Discussion

Under the inaccuracy/usefulness property hypothesis, SOP is not an ideal computational model for protein structure prediction [28]. This means that even if the corresponding SOP is completely solved, the SOP answer may not be correct, and in most cases it will not be perfect. PacoPacker proposes a novel hybrid parallel approach to repack protein side chains based on SHOP [28, 30].

Table 4 shows the distribution of best conformations for each protein from pacoPacker on different threads. The best conformations are constructed on different threads, where each energy is very useful in some specific sense, but is inaccurate in a universal sense. Therefore, we need an approach based on MOP. For using MOP to solve protein structure prediction problems, the Pareto-based approach, which focuses on the dominance analysis of the solutions found by the search, will probably result in a large Pareto front with solutions where no single energy function can be dominant. PacoPacker is different as it does not construct a Pareto front, but collects the best solutions found by parallel search procedures directed by different energy functions. The SHOP strategy was proposed as a useful parallel ACO method [30]. Using SHOP, these multiple colonies of pacoPacker can exchange their search experiences asynchronously and co-evolve towards better solutions while each colony is guided by its own objective function and algorithm parameters [28]. In 442 structures test set, the close half targets of pacoPacker maintain optimum accuracy, unlike that in the other two programs. Why does the pacoPacker approach have a good performance?

Table 4 Best conformations of pacoPacker distributed on different threads.

Firstly, from the view of an individual colony, the pheromone matrix accumulates the search experience of ants, which describes which rotamer should be a priori considered as the choice for each residue. Such an experience bias is established by evaluating the conformations found by the previous generation of ants using the corresponding energy function. Then by sharing T , each colony can achieve different search experiences from other colonies asynchronously, and each colony is also directed by their own energy functions to co-evolve towards a better state. The process of sharing one T can accumulate the search experience of all parallel ant colonies and propagate the bias among them. As the pheromone matrix T provides an indeterministic bias for all the running colonies, it may be easier to find better solutions.

For example, [PDB:2FLU] was one of the most accurate predictions from paco-Packer with a RMSD of 0.98, while the second most accurate prediction was 1.33 from CIS-RR. The best conformation appeared in the 27th generation of thread 8, which ends on this generation. The other threads ended incrementally after the 29th generation. In this situation, almost all threads stop at the same time, which gives pheromone matrix T enough time to learn experiences fairly from different threads. There were some poor solutions, such as [PDB:1WVH] where the RMSD was increased by 1.23 with pacoPacker. In this case, the best conformation of pacoPacker was structured by thread 6 on the 40th generation, and other threads stopped after 25th generation. This may be because some threads accomplish too early so that the pheromone matrix T learns search experiences with bias, which may be solved with more time. From a user perspective, we summarise when pacoPacker performs well in Table 5. This shows that the proportion of proteins repacked increased as the sequence length decreased. Therefore pacoPacker can provide the highest accuracy for packing side chains when the sequence length is lower than 400 amino acids.

Table 5 The proportion of proteins repacked by pacoPacker with lower RMSD compared with other predictors.

Conclusions

In summary, pacoPacker makes each heuristic search work with its own energy function and they complement each other in a qualitative way. Different energy functions train search trajectories to obtain different search intelligences. Our parallel strategy diffuses the intelligence to all the parallel searches by SHOP, so that all ant colonies can share their accumulated hybridised intelligence. Such co-evolvement guided by multiple objective functions simultaneously has an impact on the nature folding procedure of native proteins [28]. The prediction accuracy of packing side chains was improved for most of the proteins, which proves that pacoPacker has feasibility and practical value, but at a cost of increased CPU time. However, an important reason for using pacoPacker is that it does not need training and tuning of the energy function parameters before the predictor can work.

References

  1. Smith CA, Kortemme T: Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. Journal of molecular biology. 2008, 380 (4): 742-756. 10.1016/j.jmb.2008.05.023.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  2. Davis IW, Arendall WB, Richardson DC, Richardson JS: The backrub motion: how protein backbone shrugs when a sidechain dances. Structure. 2006, 14 (2): 265-274. 10.1016/j.str.2005.10.007.

    Article  CAS  PubMed  Google Scholar 

  3. Kingsford CL, Chazelle B, Singh M: Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics. 2005, 21 (7): 1028-1039. 10.1093/bioinformatics/bti144.

    Article  CAS  PubMed  Google Scholar 

  4. Gaudreault F, Chartier M, Najmanovich R: Side-chain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding. Bioinformatics. 2012, 28 (18): 423-430. 10.1093/bioinformatics/bts395.

    Article  Google Scholar 

  5. Raveh B, London N, Zimmerman L, Schueler-Furman O: Rosetta, flexpepdock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS One. 2011, 6 (4): 18934-10.1371/journal.pone.0018934.

    Article  Google Scholar 

  6. Wang C, Schueler-Furman O, Baker D: Improved side-chain modeling for protein-protein docking. Protein Science. 2005, 14 (5): 1328-1339. 10.1110/ps.041222905.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  7. Anfinsen CB, Haber E, Sela M, White F: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proceedings of the National Academy of Sciences of the United States of America. 1961, 47 (9): 1309-10.1073/pnas.47.9.1309.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  8. Pierce NA, Winfree E: Protein design is np-hard. Protein Engineering. 2002, 15 (10): 779-782. 10.1093/protein/15.10.779.

    Article  CAS  PubMed  Google Scholar 

  9. Unger R, Moult J: Finding the lowest free energy conformation of a protein is an np-hard problem: proof and implications. Bulletin of Mathematical Biology. 1993, 55 (6): 1183-1198. 10.1007/BF02460703.

    Article  CAS  PubMed  Google Scholar 

  10. Hart WE, Istrail S: Robust proofs of np-hardness for protein folding: general lattices and energy potentials. Journal of Computational Biology. 1997, 4 (1): 1-22. 10.1089/cmb.1997.4.1.

    Article  CAS  PubMed  Google Scholar 

  11. Xie W, Sahinidis NV: Residue-rotamer-reduction algorithm for the protein side-chain conformation problem. Bioinformatics. 2006, 22 (2): 188-194. 10.1093/bioinformatics/bti763.

    Article  CAS  PubMed  Google Scholar 

  12. Chazelle B, Kingsford C, Singh M: A semidefinite programming approach to side chain positioning with new rounding strategies. INFORMS Journal on Computing. 2004, 16 (4): 380-392. 10.1287/ijoc.1040.0096.

    Article  Google Scholar 

  13. Desmet J, De Maeyer M, Hazes B, Lasters I: The dead-end elimination theorem and its use in protein side-chain positioning. Nature. 1992, 356 (6369): 539-542. 10.1038/356539a0.

    Article  CAS  PubMed  Google Scholar 

  14. Desmet J, De Maeyer M, Lasters I: Theoretical and algorithmical optimization of the dead-end elimination theorem. Pac Symp Biocomput. 1997, 2: 122-133.

    Google Scholar 

  15. Krivov GG, Shapovalov MV, Dunbrack RL: Improved prediction of protein side-chain conformations with scwrl4. Proteins: Structure, Function, and Bioinformatics. 2009, 77 (4): 778-795. 10.1002/prot.22488.

    Article  CAS  Google Scholar 

  16. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D: Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of molecular biology. 2003, 331 (1): 281-299. 10.1016/S0022-2836(03)00670-3.

    Article  CAS  PubMed  Google Scholar 

  17. Hsin JL, Yang CB, Huang KS, Yang CN: An ant colony optimization approach for the protein side chain packing problem. Proceedings of the 6th WSEAS International Conference on Microelectronics, Nanoelectronics, Optoelectronics. 2007, 44-49.

    Google Scholar 

  18. Roitberg A, Elber R: Modeling side chains in peptides and proteins: Application of the locally enhanced sampling and the simulated annealing methods to find minimum energy conformations. The Journal of chemical physics. 1991, 95 (12): 9277-9287. 10.1063/1.461157.

    Article  CAS  Google Scholar 

  19. Leach AR, Lemon AP: Exploring the conformational space of protein side chains using dead-end elimination and the a* algorithm. Proteins Structure Function and Genetics. 1998, 33 (2): 227-239. 10.1002/(SICI)1097-0134(19981101)33:2<227::AID-PROT7>3.0.CO;2-F.

    Article  CAS  Google Scholar 

  20. Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences. 2000, 97 (19): 10383-10388. 10.1073/pnas.97.19.10383.

    Article  CAS  Google Scholar 

  21. Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W: Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011, 487: 545-574.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  22. Holm L, Sander C: Fast and simple monte carlo algorithm for side chain optimization in proteins: application to model building by homology. Proteins: Structure, Function, and Bioinformatics. 1992, 14 (2): 213-223. 10.1002/prot.340140208.

    Article  CAS  Google Scholar 

  23. Bower MJ, Cohen FE, Dunbrack RL: Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. Journal of molecular biology. 1997, 267 (5): 1268-1282. 10.1006/jmbi.1997.0926.

    Article  CAS  PubMed  Google Scholar 

  24. Dunbrack RL: Comparative modeling of casp3 targets using psi-blast and scwrl. Proteins: Structure, Function, and Bioinformatics. 1999, 37 (S3): 81-87. 10.1002/(SICI)1097-0134(1999)37:3+<81::AID-PROT12>3.0.CO;2-R.

    Article  Google Scholar 

  25. Canutescu AA, Shelenkov AA, Dunbrack RL: A graph-theory algorithm for rapid protein side-chain prediction. Protein science. 2003, 12 (9): 2001-2014. 10.1110/ps.03154503.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  26. Cao Y, Song L, Miao Z, Hu Y, Tian L, Jiang T: Improved side-chain modeling by coupling clash-detection guided iterative search with rotamer relaxation. Bioinformatics. 2011, 27 (6): 785-790. 10.1093/bioinformatics/btr009.

    Article  CAS  PubMed  Google Scholar 

  27. Kortemme T, Morozov AV, Baker D: An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. Journal of molecular biology. 2003, 326 (4): 1239-1259. 10.1016/S0022-2836(03)00021-4.

    Article  CAS  PubMed  Google Scholar 

  28. Lu¨ Q, Xia XY, Chen R, Miao DJ, Chen SS, Quan LJ, Li HO: When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences. PloS one. 2012, 7 (9): 44967-10.1371/journal.pone.0044967.

    Article  Google Scholar 

  29. Lv Q, Wu H, Wu J, Huang X, Luo X, Qian P: A parallel ant colonies approach to de novo prediction of protein backbone in casp8/9. Science China Information Sciences. 2013, 56 (10): 1-13.

    Article  Google Scholar 

  30. Lv Q, Xia X, Qian P: A parallel aco approach based on one pheromone matrix. Ant Colony Optimization and Swarm Intelligence. 2006, Springer, 4150: 332-339. 10.1007/11839088_30.

    Chapter  Google Scholar 

  31. Rohl CA, Strauss CE, Misura KM, Baker D: Protein structure prediction using rosetta. Methods in enzymology. 2004, 383: 66-93.

    Article  CAS  PubMed  Google Scholar 

  32. Dagum L, Menon R: Openmp: an industry standard api for shared-memory programming. Computational Science & Engineering, IEEE. 1998, 5 (1): 46-55. 10.1109/99.660313.

    Article  Google Scholar 

  33. Shapovalov MV, Dunbrack RL: A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011, 19 (6): 844-858. 10.1016/j.str.2011.03.019.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  34. Miao Z, Cao Y, Jiang T: Rasp: rapid modeling of protein side chain conformations. Bioinformatics. 2011, 27 (22): 3117-3122. 10.1093/bioinformatics/btr538.

    Article  CAS  PubMed  Google Scholar 

  35. Eyal E, Najmanovich R, Mcconkey BJ, Edelman M, Sobolev V: Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. Journal of computational chemistry. 2004, 25 (5): 712-724. 10.1002/jcc.10420.

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

The authors acknowledge the support received from Rong Chen for helping with the analysis of the experiments and Caixia Wang for helping with the preparation of the paper. Funder had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.

Declarations

This study was supported by a grant from the National Natural Science Foundation of China (No. 61170125).

This article has been published as part of BMC Bioinformatics Volume 15 Supplement 12, 2014: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S12.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qiang Lü.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Q Lü designed and developed the pacoPacker framework. LJ Quan implemented and improved pacoPacker. LJ Quan, HO Li and HJ Wu performed the experiments. LJ Quan and XX Xia drafted the manuscript. All of the authors read and approved the manuscript.

Lijun Quan, Qiang Lü contributed equally to this work.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Quan, L., Lü, Q., Li, H. et al. Improved packing of protein side chains with parallel ant colonies. BMC Bioinformatics 15 (Suppl 12), S5 (2014). https://doi.org/10.1186/1471-2105-15-S12-S5

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/1471-2105-15-S12-S5

Keywords