Skip to main content

An effective docking strategy for virtual screening based on multi-objective optimization algorithm



Development of a fast and accurate scoring function in virtual screening remains a hot issue in current computer-aided drug research. Different scoring functions focus on diverse aspects of ligand binding, and no single scoring can satisfy the peculiarities of each target system. Therefore, the idea of a consensus score strategy was put forward. Integrating several scoring functions, consensus score re-assesses the docked conformations using a primary scoring function. However, it is not really robust and efficient from the perspective of optimization. Furthermore, to date, the majority of available methods are still based on single objective optimization design.


In this paper, two multi-objective optimization methods, called MOSFOM, were developed for virtual screening, which simultaneously consider both the energy score and the contact score. Results suggest that MOSFOM can effectively enhance enrichment and performance compared with a single score. For three different kinds of binding sites, MOSFOM displays an excellent ability to differentiate active compounds through energy and shape complementarity. EFMOGA performed particularly well in the top 2% of database for all three cases, whereas MOEA_Nrg and MOEA_Cnt performed better than the corresponding individual scoring functions if the appropriate type of binding site was selected.


The multi-objective optimization method was successfully applied in virtual screening with two different scoring functions that can yield reasonable binding poses and can furthermore, be ranked with the potentially compromised conformations of each compound, abandoning those conformations that can not satisfy overall objective functions.


With the thriving development and confirmative significance of computational chemistry in drug discovery, more and more medicinal chemists and pharmacologists are using computational methods in their drug discovery research[1, 2], and numerous drugs developed based on the clues provided by computations (modeling and simulation) have entered clinical trials or have been launched into the market already[3]. For the computational chemist, an attractive goal is to develop computer programs capable of automatically evaluating large-scale chemical libraries (databases). These computational methods are referred to as virtual screening (VS)[4]. In general, two strategies have been employed in virtual screening: (1), using pharmacophore-based database searching (PBDS) methods to identify potential hits from chemical libraries, mostly in the cases where three-dimensional (3D) structures of the targets are unknown; and (2), using molecular docking approaches to screen the libraries in cases where the 3D structures of the targets are available[4, 5]. Because more and more 3D structures of drug target proteins are available, VS with molecular docking approaches has become promising, as demonstrated by numerous recent examples[2, 610].

The core steps of structure-based virtual screening (SBVS) are docking and scoring. Since Kuntz et al.[11] published the first docking algorithm DOCK in 1982, numerous docking programs have been developed during the past two decades [1225]. Several comprehensive reviews of the advances of docking algorithms and applications have been published [2630]. Scoring (ranking) the compounds retrieved from a database is performed simultaneously with the docking simulation. Molecular docking is a typical optimization problem, for it is difficult to obtain the global optimum solution. As the fitness during the optimization process, scoring function should be fast and accurate enough, allowing simultaneous ranking of the retrieved poses in the optimization process. Based on this idea, several scoring functions have been developed [3133]. Unfortunately, there is no scoring function developed so far that can reliably and consistently predict a ligand-protein binding mode and the binding affinity at the same time[31, 32, 34]. Therefore, heuristic docking and consensus score strategies are often used in virtual screening [3436].

Since a huge number of compounds in a database have to be thoroughly tested in the virtual screening process, several crucial issues have to be addressed. One is the computational cost; only docking programs capable of docking a flexible ligand within a reasonable time scale are acceptable for virtual screening[34]. The other one is the ability to discriminate between true actives and inactive compounds; only those docking approaches able to distinguish the active molecules rapidly and accurately, are suitable for virtual screening applications in practice. Although the consensus score strategy has demonstrated an advantage over single scores, it is actually based on the results of single scoring optimization. This means that consensus scoring only re-scores a limited molecules or conformations (generally 30 or more), thereby inevitably losing a number of true positives[33, 34]. To some extent, consensus scoring seems to be far-fetched and artificial [37].

Most conformational optimization methods in docking program can only deal with a single objective, such as binding energy, shape complementarity, or chemical complementarity. However, real-world problems normally involve multiple objectives (possibly conflicting ones) or optimization criteria, which should be satisfied simultaneously, and suitable solutions to the overall problem cannot be found by using individual optimization algorithms with single objectives[38]. For example, an optimization solution for the binding affinity (energy) between a ligand and a receptor is usually not the optimization solution for other criteria (e.g. shape or chemical complementarity, etc.). Similar problems in combinatorial library design and structural superposition of three-dimensional molecules have been reported[39, 40]. Thus, there is a need for an optimization algorithm compromising several objectives, which may result in more reasonable and robust binding modes between ligands and receptors. In fact, it is a problem of multi-objective optimization (MO)[41], which tends to find a set of alternative good compromises, generally known as pareto-optimal solutions. These solutions are optimal because no other solutions in the search space superior to them when all objectives are considered. Then, the 'optimum' is chosen by the design which fits better in a certain application.[42] There are more than twenty mathematical multi-objective optimization techniques[43, 44]. However, due to their inherent parallelism, evolutionary algorithms (EAs) and genetic algorithms (GAs) are still the top priority in terms of finding multiple pareto-optimal solutions for multi-objective optimization problems[45].

In this paper, two sets of MO methods, denoted MOSFOM (Multi-Objective Scoring Function Optimization Methodology), were adopted for the binding conformation search of a small molecule within the binding site of a protein using two scoring functions as the objectives. Different from consensus score, MOSFOM does not re-score or re-rank the candidates from the primary virtual screening with one or several other scoring functions, but scores all the molecules in a chemical database with two or more scoring functions simultaneously during the binding pose optimization. Testing results indicate that MOSFOM, which is able to enhance hit rates and greatly reduces the false-positive rate, is more robust and reasonable than the consensus strategy as an alternate tool for large-scale library virtual screening. Here, MOSFOM emphasizes a new strategy to obtain the most reasonable binding conformation[46] and increase hit rates with several scoring functions rather than to accurately predict the binding free energy or the combination of several scoring functions. Consequently, MOSFOM can be used in the prioritization of ligands in high-throughput virtual screening.


Preparation of target proteins

Thrombin, the estrogen receptor alpha, and cyclooxygenase-2 (COX-2) have been used as target proteins for testing the newly developed docking algorithms in this study. The coordinates of the X-ray crystal structures of these three proteins were retrieved from the Protein Data Bank (PDB)[47], including thrombin in complex with Mqpa at 2.2 Å resolution (PDB entry 1ETR)[48], estrogen receptor(ER) in complex with 4-hydroxytamoxifen at 1.90 Å resolution (PDB entry 3ERT)[49], and cyclooxygenase-2(COX-2) in complex with Sc-558 at 3.0 Å resolution (PDB entry 6COX)[50]. All water molecules were removed from the protein structures. After extraction of bound ligands, all hydrogen atoms and the Kollman all-atom charges were assigned to the proteins using the BIOPOLYMER module of Sybyl v6.8 (Tripos Associates, Inc. St. Louis, MO). Finally, for each protein target, the binding site was defined as the residues around the bound ligand within 6.5 Å. Gasteiger and Marsili charges[51] were assigned to the extracted ligand of each protein.

Preparation of compounds libraries

For all three targets (thrombin, ER and COX-2), the active compounds were selected from the MDDR (MDL ISIS/HOST software, MDL Information Systems, Inc.). Active compounds with molecular weights between 200 and 600 were selected as drug-like compounds, and those containing water molecules and ions were eliminated. Table 1 shows the number of active compounds for each target. Another 10,000 randomly 'varied' compounds with molecular weights between 200 and 600 were selected as drug-concerned decoys from the Available Chemical Directory (ACD) after eliminating chemical reagents and inorganic compounds by means of CORINA. All compounds of different test libraries were stored as SDF format file using the MDL ISIS_Base program (MDL ISIS/HOST software, MDL Information Systems, Inc.), and their three-dimensional coordinates were converted using a script written in the Sybyl programming language (SPL), and Gasteiger-Marsili atomic charges were assigned to each molecule. The final coordinates of each molecule were then stored in multi-mol2 files. Protonation states were correctly given for all the compounds.

Table 1 Number of the active compounds for each target used in this study

Scoring functions

The energy score and contact score of DOCK [5254] were used in this work. Energy score based on the AMBER force field[55] was composed of steric and electrostatic terms. ε(r) = 4r was used for the coefficient of the dielectric for the Coulomb potential, and Lennard-Jones 6–12 was used for Van der Waals (VDW) potential. Contact score is a summation of the heavy atom contacts between the ligand and the receptor, which provides a simple assessment of shape complementarity; if two atoms approach close enough to bump into one another, then the interaction can be penalized by a certain amount. In this study, a 4.5-Å contact distance cutoff was used, with 50 penalized for each clash.

Calculation of the enrichment factor

The enrichment factor is a key parameter to evaluate the quality of the docking and scoring compared to a random selection[33, 56]. The enrichment factor (EF) is defined as

EF(subset size) = Hits s N s / Hits t N t MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeeyrauKaeeOrayKaeeikaGIaee4CamNaeeyDauNaeeOyaiMaee4CamNaeeyzauMaeeiDaqNaeeiiaaIaee4CamNaeeyAaKMaeeOEaONaeeyzauMaeeykaKcccaGae8xpa0tcfa4aaSaaaeaacqqGibascqqGPbqAcqqG0baDcqqGZbWCdaWgaaqaaiabbohaZbqabaaabaGaeeOta40aaSbaaeaacqqGZbWCaeqaaaaakiabc+caVKqbaoaalaaabaGaeeisaGKaeeyAaKMaeeiDaqNaee4Cam3aaSbaaeaacqqG0baDaeqaaaqaaiabb6eaonaaBaaabaGaeeiDaqhabeaaaaaaaa@54F5@

where Hitss is the number of active compounds in the sampled subset, Hitst is the total number of active compounds in the database and N is the number of compounds. In general, the enrichment factor is against random screening; it is evident that the maximum enrichment is determined by the total number of active compounds and the total number of molecules in the database. For instance, there are 695 active compounds among the total 10695(10000+695) molecules in the database for the COX-2 case, i.e. the achievable maximum is 10695/695 ≈ 15. If 5% of active compounds were found among the top 1% of the database, then the enrichment factor would be fivefold over random (EF = 5) at the 1% of the database.

There are three criteria to evaluate the effectiveness of a docking program as indicated by Wei et al. [56]: the value, the location of the vertex, and the percentage of active compounds retrieved, which represent the ability to find active compound of the docking program and scoring function.

ε-MOEA Method

A steady-state MOEA based on the ε-dominance concept[57] is a pragmatic and fast multi-objective evolutionary algorithm for finding well-spread pareto-optimal solutions. An EA population P(t) and an archival population E(t) (t is the iteration counter) were used as two co-evolving populations in ε-MOEA. After initialization, two solutions from P(t) and E(t) were chosen for mating. Then, each of these offspring solutions was compared with the archive and the EA population for possible inclusion. For the case of j objectives, the search space and objective space were divided into hyper-boxes (a number of grids, each having the size ε j in the j th objective) to ensure that a hyper-box could be occupied by only one solution through comparing with an identification array of the archive, which guarantees the diversity of the archive. The total number of pareto-optimal solutions, i.e. the final size of suitable solutions, can be controlled approximately by adjusting the ε j value (see refs [58] and [57] for more details).

Two scoring functions, the energy score and the contact score of DOCK, were considered in practical virtual screening. Traditional optimization methods, such as the simplex method used in DOCK, are unsuitable for multi-objective optimization. For the impartiality of the comparison, GAsDock, which also uses a stochastic algorithm based on modified multi-population genetic algorithm[25], was adopted for single scoring function optimization. The contact score also performed reasonably well in this study, which coincides with other studies [31, 59, 60]. Energy score and contact score were considered as the two objectives in ε-MOEA (Figure 1a), that is, a set of pareto-optimal solutions, which were satisfied simultaneously with energy score and shape complementarity, were obtained consequently (see additional file 1: Docking example and proof of method).

Figure 1

The flowchart of Multi-Objective Scoring Function Optimization Methodology. (a) ε-MOEA method for virtual screening. (b) EFMOGA method for virtual screening.

The multi-objective optimization model of ε-MOEA consists of a set of n parameters (design variables), a set of l objective functions, and a set of m constraints. Objective functions and constraints are functions of the decision variables. It can be formulated mathematically as follows:

m i n y = f ( x ) = ( f 1 ( x ) , f 2 ( x ) , ... , f l ( x ) ) s . t . g ( x ) = ( g 1 ( x ) , g 2 ( x ) , ... , g m ( x ) ) 0 w h e r e x = ( x 1 , x 2 , ... , x n ) X y = ( y 1 , y 2 , ... , y l ) Y MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabqGaaaaabaWexLMBbXgBcf2CPn2qVrwzqf2zLnharyavP1wzZbItLDhis9wBH5gaiqGacaWFTbGaa8xAaiaa=5gaaeaaieWacqGF5bqEiiaacqqF9aqpcqGFMbGzcqqGOaakcqGF4baEcqqGPaqkcqqF9aqpcqqGOaakcqWGMbGzdaWgaaWcbaGaeeymaedabeaakiabcIcaOiab+Hha4jabcMcaPiabcYcaSiabdAgaMnaaBaaaleaacqaIYaGmaeqaaOGaeiikaGIae4hEaGNaeiykaKIaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemOzay2aaSbaaSqaaiabdYgaSbqabaGccqGGOaakcqGF4baEcqGGPaqkcqGGPaqkaeaacqWGZbWCcqGGUaGlcqWG0baDcqGGUaGlaeaacqGFNbWzcqGGOaakcqGF4baEcqGGPaqkcqGH9aqpcqGGOaakcqWGNbWzdaWgaaWcbaGaeGymaedabeaakiabcIcaOiab+Hha4jabcMcaPiabcYcaSiabdEgaNnaaBaaaleaacqaIYaGmaeqaaOGaeiikaGIae4hEaGNaeiykaKIaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaem4zaC2aaSbaaSqaaiabd2gaTbqabaGccqGGOaakcqGF4baEcqGGPaqkcqGGPaqkcqGHKjYOcqaIWaamaeaacqWG3bWDcqWGObaAcqWGLbqzcqWGYbGCcqWGLbqzaeaacqGF4baEcqGH9aqpcqGGOaakcqWG4baEdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabdIha4naaBaaaleaacqaIYaGmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemiEaG3aaSbaaSqaaiabd6gaUbqabaGccqGGPaqkcqGHiiIZcqGFybawaeaaaeaacqGF5bqEcqGH9aqpcqGGOaakcqWG5bqEdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabdMha5naaBaaaleaacqaIYaGmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemyEaK3aaSbaaSqaaiabdYgaSbqabaGccqGGPaqkcqGHiiIZcqGFzbqwaaaaaa@B190@

and x is the design vector x= {T x , T y , T z , R x , R y , R z , Tb 1, T bn }T, in which (T x , T y , T z ) and (R x , R y , R z ) are respectively the state variables of translation and rotation of the entire ligand for the orientation search; and T b1 , , T bn are the torsion angles of the n rotatable bonds of the ligand for the conformation search. Accordingly, the constraints for the design variables (g ( x )s) can be represented as

{ X ¯ T x X ¯ Y ¯ T y Y ¯ Z ¯ T z Z ¯ π R x , R y , R z , T b 1 , , T b n π MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaiqaaeaafaqabeabbaaaaeaadaadaaqaaiabdIfaybaacqGHKjYOcqWGubavdaWgaaWcbaGaemiEaGhabeaakiabgsMiJoaanaaabaGaemiwaGfaaaqaamaamaaabaGaeeiiaaIaemywaKfaaiabgsMiJkabdsfaunaaBaaaleaacqWG5bqEaeqaaOGaeyizIm6aa0aaaeaacqWGzbqwaaaabaWaaWaaaeaacqqGGaaicqWGAbGwaaGaeyizImQaemivaq1aaSbaaSqaaiabdQha6bqabaGccqGHKjYOdaqdaaqaaiabdQfaAbaaaeaacqGHsislcqaHapaCcqGHKjYOcqWGsbGudaWgaaWcbaGaemiEaGhabeaakiabcYcaSiabdkfasnaaBaaaleaacqWG5bqEaeqaaOGaeiilaWIaemOuai1aaSbaaSqaaiabdQha6bqabaGccqGGSaalcqWGubavdaWgaaWcbaGaemOyaiMaeGymaedabeaakiabcYcaSiabl+UimjabcYcaSiabdsfaunaaBaaaleaacqWGIbGycqWGUbGBaeqaaOGaeyizImQaeqiWdahaaaGaay5Eaaaaaa@67FB@

y is the objective vector, which consists of energy score f1(x) and contact score f2(x), respectively. Of course, more scoring function can be used in this method, but no more scoring function source can be accessed. Here the scoring functions of DOCK were just used to deal with this problem. X is denoted as the decision space, and Y is called the objective space.

Selection of the optimum and ranking in ε-MOEA case

As stated above, a set of pareto-optimal solutions were obtained by using ε-MOEA, which simultaneously satisfied energy score and shape complementarity. There are two ways to select an optimal solution from the set of pareto-optimal solutions: MOEA_Nrg or 'energy score contact score' with the lowest energy conformation and acceptable shape complementarity; and MOEA_Cnt or 'contact score energy score' with the best shape complementarity conformation and acceptable energy score. The results of MOEA_Nrg and MOEA_Cnt were all compared with their corresponding individual scoring functions, respectively (see Results)


Briefly, a new fast flexible docking program (GAsDock) [25, 61] was developed using a multi-population genetic algorithm based on information entropy[62, 63]. In comparison with other docking methods, information entropy was employed in the genetic algorithm of GAsDock and the size of the narrowed space was used as the convergence criterion, ensuring that GAsDock can converge rapidly and steadily. A detailed description of the algorithm has been presented elsewhere[25].

In this paper, EFMOGA-based GAsDock was applied to solve the above-mentioned multiple scoring function problem. According to GAsDock, the optimization problem (Eq.2) can be transformed into the following evaluation function model

m i n h ( F ) = 1 s ln i = 1 l exp ( s λ i f i ( x ) ) s . t . g ( x ) = ( g 1 ( x ) , g 2 ( x ) , ... , g m ( x ) ) 0 w h e r e x = ( x 1 , x 2 , ... , x n ) X y = ( y 1 , y 2 , ... , y l ) Y MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaqbaeaabqGaaaaabaGaemyBa0MaemyAaKMaemOBa4gabaGaemiAaGMaeiikaGIaemOrayKaeiykaKIaeyypa0tcfa4aaSaaaeaacqaIXaqmaeaacqWGZbWCaaGccyGGSbaBcqGGUbGBdaaeWbqaaiGbcwgaLjabcIha4jabcchaWjabcIcaOiabdohaZjabeU7aSnaaBaaaleaacqWGPbqAaeqaaOGaemOzay2aaSbaaSqaaiabdMgaPbqabaGccqGGOaakcqWG4baEcqGGPaqkcqGGPaqkaSqaaiabdMgaPjabg2da9iabigdaXaqaaiabdYgaSbqdcqGHris5aaGcbaGaem4CamNaeiOla4IaemiDaqNaeiOla4cabaacbmGae83zaCMaeiikaGIae8hEaGNaeiykaKIaeyypa0JaeiikaGIaem4zaC2aaSbaaSqaaiabigdaXaqabaGccqGGOaakcqWF4baEcqGGPaqkcqGGSaalcqWGNbWzdaWgaaWcbaGaeGOmaidabeaakiabcIcaOiab=Hha4jabcMcaPiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdEgaNnaaBaaaleaacqWGTbqBaeqaaOGaeiikaGIae8hEaGNaeiykaKIaeiykaKIaeyizImQaeGimaadabaGaem4DaCNaemiAaGMaemyzauMaemOCaiNaemyzaugabaGae8hEaGNaeyypa0JaeiikaGIaemiEaG3aaSbaaSqaaiabigdaXaqabaGccqGGSaalcqWG4baEdaWgaaWcbaGaeGOmaidabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdIha4naaBaaaleaacqWGUbGBaeqaaOGaeiykaKIaeyicI4Sae8hwaGfabaaabaGae8xEaKNaeyypa0JaeiikaGIaemyEaK3aaSbaaSqaaiabigdaXaqabaGccqGGSaalcqWG5bqEdaWgaaWcbaGaeGOmaidabeaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdMha5naaBaaaleaacqWGSbaBaeqaaOGaeiykaKIaeyicI4Sae8xwaKfaaaaa@A822@

where λ i is the weight of each objective. (λ i ≥ 0 and i = 1 l λ i = 1 ( i = 1 , 2 , ... , l ) MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaabCaeaacqaH7oaBdaWgaaWcbaGaemyAaKgabeaakiabg2da9iabigdaXiabcIcaOiabdMgaPjabg2da9iabigdaXiabcYcaSiabikdaYiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdYgaSjabcMcaPaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemiBaWganiabggHiLdaaaa@44A6@ ). By varying the weights, a set of noninferior solutions was generated. s is a positive real variable, when s → ∞, the minimization problem converges to the maximal f i (x), and the minimum is the pareto solution (see additional file 1: Docking example and proof of method)

A knowledge-based method was adopted to obtain appropriate weights for virtual screening, i.e. a set of conformations of the ligand, which are obtained through adjusting the weights, is compared with the ligand in the X-ray crystal structure, and the weights with minimum RMSD are selected. These are considered to best reflect the test values for the current target system, i.e. the most reasonable solution. During this process, useful information can be acquired simultaneously such as the adaptability of different scoring functions to the current system, that is, which scoring function contributes more to the current system, and which scoring function is more sensitive.

As a result, various weights are obtained for different test systems in virtual screening (Figure 1b). For virtual screening of a database, each system only adopts one set of weights, which makes the comparison more equitable. Different from ε-MOEA, EFMOGA yields one solution corresponding to the weight in the pareto-optimal solutions rather than a set of pareto-optimal solutions. Certainly, compared with single objective optimization, the solution obtained here may be worse than any single objective solution, but more reasonable than any extremum obtained by a single objective method because it has considered multiple-objective functions. During the preparation of this manuscript, Grosdidier et al. have developed a new docking software, EADock, based on a multi-objective optimization algorithm, and the results indicate that EADock can accurately predict binding modes for ligand-protein complexes with two fitness functions[64].


As mentioned above, our study is not aimed at calculating absolute values for the free energy of binding and for the affinity, but focuses on the ranking of acceptable conformations for the multiple solution space. Because of the different preferences in the selection of pareto-optimal solutions with ε-MOEA, the results of three different approaches will be compared with individual scoring functions, namely one with preferred energy score for ε-MOEA (MOEA_Nrg), the other with preferred contact score for ε-MOEA (MOEA_Cnt), and the third for EFMOGA.

Docking to the buried lipophilic site of COX-2

The COX-2 ligand binding site is a completely buried, narrow, confined and predominately lipophilic cavity (Figure 2a)[50, 65], which does not accommodate many orientations or conformations of a ligand. Consequently, shape complementarity was expected to be very important. MOEA_Nrg efficiently eliminates conformations that could not satisfy shape complementarity although they displayed good energy scores. The maximum enrichment was 2.8-fold over random, which was reached at the top 8.8% of the database when using a single energy score for optimization, but the maximum enrichment was 3.9-fold over random, and moved up to the top 2.6% of the database when using MOEA_Nrg (Figure 3a). MOEA_Cnt also got a good enrichment (EF = 3.4485) at the top 0.4% of the database, but it was worse than single contact score with EF = 6.9 at the top 0.2%. There are several possible explanations; first, the conformation with highest contact score cannot satisfy energy score, that is, the conformation that has the best contact score but a bad energy score not among the pareto-optimal solutions will be eliminated; second, the scoring function is imprecise which is the common disadvantage for all the scoring functions. Interestingly, although contact score maintained higher enrichment among the top 2% of the database, the same enrichment as that of MOEA_Nrg was obtained at the top 2% of the database, and from this point, the enrichment peak of MOEA_Nrg became broader. EFMOGA has a similar curve as that of MOEA_Cnt, and a maximum of 6.1-fold over random was reached at the top 0.2% of the database.

Figure 2

The characteristics of three binding sites. COX-2 with Sc-558. (b) ER with 4-Hydroxytamoxifen. (c) Thrombin with Mqpa.

Figure 3

Enrichment of 695 active compounds of COX-2 in docking screens. (a) The enrichment factor of docking the database by MOFOM and individual scoring function. (b) The percentage of the active compounds found by MOFOM and individual scoring function. The plot shows the results using five optimization methods: Energy score (black), Contact score (red), EFMOGA (cyan), MOEA_Nrg (green), and MOEA_Cnt (blue), respectively.

In the real application, only a small number (< 2%) was our interesting section within a large source library. MOEA_Nrg exhibited a good performance in this case (Figure 3b, Figure 4a), MOEA_Nrg maintained the best performance among all the methods, and retrieved 7.8% of the COX-2 active compounds within the top 2% of the total library (Figure 3b, Figure 4a). EFMOGA outperformed single energy score among the top 5% of the library, and retrieved 6% of active compounds among the top 2% of the database, although it performed worse than single contact score. Whereas, MOEA_Cnt represents a relatively weak ability than single contact score before the top 5% of the database, but greatly increases thereafter.

Figure 4

Histogram of active compounds found with five MOFSOM and single scoring function for assaying between the 2%, 5%, 10% and 15% of the ranked database against each system. (a) COX-2 system, (b) Estrogen receptor system, (c) Thrombin system with five optimization methods: Energy score (red), Contact score (cyan), EFMOGA (blue), MOEA_Nrg (green), and MOEA_Cnt (pink), respectively.

Docking to the relatively large hydrophobic site of Estrogen Receptor

As an example of a well-understood target with therapeutic relevance whose crystallographic structural data were publicly available, estrogen receptor was chosen. Like COX-2, the binding site of estrogen receptor is a relatively larger, fully-enclosed lipophilic cavity (Figure 2b), which is little opened to solvent, there are acceptor groups at either end that can form hydrogen bonds with ligands, but it is predominantly hydrophobic on the whole[49, 65].

For this case, EFMOGA performed well among the top 5% of the database, especially before the top 2%, and reached the maximum enrichment (EF = 14.4) at the top 0.2% of the database (Figure 5a). Single energy score exhibited the poorest performance in all methods, the maximum enrichment is less than fivefold over random at the top 0.2% of the database, although MOEA_Nrg reached the same enrichment, MOEA_Nrg outperformed single energy score after that. Different from COX-2, MOEA_Cnt indicated a very good performance than single contact score for this case, it is more reasonable that we preferred contact score from pareto-optimal solutions to select the better shape satisfaction with slight difference in energy score. Using single contact score, the maximum enrichment (EF = 7.2) was reached at the top 0.4% of the database, but the maximum enrichment rose to 9.6, and moved up to the top 0.2% of the database when using MOEA_Cnt.

Figure 5

Enrichment of 105 active compounds of ER in docking screens. (a) The enrichment factor of docking the database by MOFOM and individual scoring function. (b) The percentage of active compounds found by MOFOM and individual scoring function. The plot shows the results using five optimization methods: Energy score (black), Contact score (red), EFMOGA (cyan), MOEA_Nrg (green), and MOEA_Cnt (blue), respectively.

Similar to COX-2, approximately 5.7% of active compounds were retrieved using MOEA_Nrg at the top 2% of the database (Figure 5b, Figure 4b), which was about twofold over that of single energy score. Relative to single contact score, MOEA_Cnt retrieved 9.5% of the active compounds at the top 2% of the database, but only less than 5% of the active compounds were retrieved using single contact score. It is encouraging that EFMOGA represents better performance than energy score and contact score at the top 2% of the database, with 8% of active compounds retrieved.

Docking to the intermediate polarity site of Thrombin

In contrast to COX-2 and estrogen receptor, the binding site of thrombin is an intermediate polarity site[65], with a large hydrophobic pocket (a smaller proximal and a larger distal pocket) which is a more exposed binding site to solvent[48], in addition, there is a S1 specificity pocket, which is a narrow and restricted pocket comprising the carboxylate group of Asp189 at the bottom, most thrombin inhibitors form hydrogen bonds with Asp189 and also to Gly216 (Figure 2c). Due to many polar groups in the binding site, it is difficult to distinguish active compounds for DOCK energy score which is most reliable for the apolar active site, unlike those scoring functions with extra consideration of hydrogen bonding interactions such as PMF or FlexX score[18, 34, 6567].

As expected, although the maximum enrichment was 2 at the top 6.4% of the database for MOEA_Nrg, and the maximum enrichment was 2.2 at the top 5.5% of the database when using single energy score, there is a slight increase in quantity using MOEA_Nrg against single energy score at the top 2% of the database (Figure 6a). MOEA_Nrg slightly underperformed than single energy score after the top 2% of the database (Figure 6b). Like ER case, MOEA_Cnt performed the best in all the methods, the maximum enrichment (EF = 8.3) was reached at the top 0.2% of the database, approximately twofold over contact score, with a maximum value of 4.9 at the top 0.2% of the database (Figure 6a), at the same time, 9.4% of the active compounds were retrieved with MOEA_Cnt at the top 2% of the database (Figure 4c). Same as other two systems, EFMOGA continues a good performance at the top 2% of the database, with a 3.0 enrichment reached at the top 0.6% of the database, and retrieved approximately 5% of the active compounds at the top 2% of the database, which is less than that of contact score, but better than that of energy score, EFMOGA and MOEA_Nrg performed not so well compared with single score function after the top 5% of the database.

Figure 6

Enrichment of 646 active compounds of thrombin in docking screens. (a) The enrichment factor of docking the database by MOFOM and individual scoring function. (b) The percentage of active compounds found by MOFOM and individual scoring function. The plot shows the results using five optimization methods: Energy score (black), Contact score (red), EFMOGA (cyan), MOEA_Nrg (green), and MOEA_Cnt (blue), respectively.


Performance of MOSFOM and Characteristics of binding site and preferential selection of scoring function

Simple as it is, contact score performed unexpectedly well for all the three systems in our study. Especially at the top 5%, contact score performed better than energy score, possibly arising from molecular diversity of the randomly selected ACD database, which distributes well not only in heavy atoms but also molecular torsions. Contact score can rapidly seek out those whose geometry shape can satisfy the binding site (containing shape and volume), which is especially obvious to those with completely buried and narrow cavity. However, energy score presented a weak ability to distinguish the active compounds from the decoys with shape satisfaction.

It is encouraging that compared with individual score, every method of MOSFOM presented an increase not only in enrichment but also hit rates of the active compounds retrieved in some ways. EFMOGA with different weights in different systems obtained better results than single score at the top 5% (especially at the top 2%) of the database for all three cases in this study (see Results), but gradually faded afterwards. It can be easily understood that those active compounds with high affinity and good shape complementarity were first retrieved, and then those emphasizing particularly on the larger weight in different test system were obtained, consecutively. Therefore, if not knowing which scoring function will work better for the test system in advance, more enrichment can be obtained with EFMOGA in the most interesting section (generally 2%) of the database. It should be noted that EFMOGA, with different weights for different test systems through experimental protein-ligand complex, is more reasonable and different from the simple linear combination (LC)[68] of several scoring functions or ScreenScore through a combination of scoring terms[69]. EFMOGA seems more pertinent and advantageous, and will be a tendency for scoring and ranking to consider different scoring functions focusing on diverse aspects for different systems.

Different from EFMOGA, ε-MOEA selected the preferential solution from a pareto-optimal solutions family (parts of the pareto frontier) through multi-objective optimization using multiple scoring functions (see Methods). With different preferences, MOEA_Nrg and MOEA_Cnt all outperformed their corresponding individual scoring function, i.e., MOEA_Nrg corresponding to energy score and MOEA_Cnt corresponding to contact score. A higher enrichment and hit rates of the active compounds were obtained using MOEA_Nrg than using individual energy score in COX-2 and ER system, but there is an inverse phenomenon for thrombin case. There are several strong hydrogen bond interactions between the ligand and the residues in S1 pocket for thrombin system. If we select the one with lowest energy score from the docked conformations with acceptable shape complementarity, more compounds will have good shape complementarity conformations but are only slightly different in energy score, since the energy score does not pay more additional attention to the H-bond interaction. In this case, it will be more difficult to distinguish the active compounds. On the contrary, MOEA_Cnt performed well for the thrombin and ER case, where an open cavity exists, accommodating enough orientations or conformations of a ligand, therefore, it seems more reasonable that MOEA_Cnt selected the best shape complementarity conformation with differences in energy score. So, if one does know which scoring function will work better for the test system in advance, more enrichment and better performance can be obtained using ε-MOEA through relative preference.

Limitations of MOSFOM, further improvements and prospect in bioinformatics and drug design

More scoring functions, which pay special attention to hydrogen bonding interaction, or chemical complementarity, solvent effect, should be involved in multi-objective optimization, however, there are only energy score and contact score available for us. Although MOSFOM performed better than individual score, there will be more choices for MOFSOM in specific test system if more interactions were considered. We can determine different weights of scoring functions because EFMOGA can be determined to deal with different cases and maybe prefer others (e.g. chemical complementarity or hydrogen bonds) for ε-MOEA if we know which scoring function will work well for specific test system in advance. Therefore, to develop scoring functions with focus on hydrogen bond, icon or solvent effect and to utilize multi-objective optimization to acquire more reasonable conformations is one of our future research interest.

At the same time, ignorance of the impact of protein flexibility is another limitation for MOSFOM, small changes of the receptor flexibility result in larger variety of binding affinities[70], and docking to a single receptor conformation will significantly reduce the chance of finding the correct pose. Considering protein flexibility to acquire correct protein-ligand binding conformation is crucial for medicinal chemists to find out the structure-activity relationship. We will fulfil this work in the future. Also, computational time of multi-objective optimization is another issue to be solved. There are numerous multi-objective optimization methods, however, they are not suitable for large-scale virtual screening because of the huge time consumption, in this paper, ε-MOEA, which is a steady-state MOEA based on the ε-dominance concept, fulfilled virtual screening quickly within about one minute for one molecule, and EFMOGA, which is a multi-population MOGA based on information entropy, accomplished one docking within 40 seconds using one CPU on SGI origin3800 hardware. However, more improvements are needed to find a set of well-distributed solutions close to the true pareto frontier pragmatically and efficiently.

As stated above, multi-objective optimization approach has been adopted in the library design program MoSELECT[39, 7173] and quantitative structure activity relationships program MoQSAR[74]. We believe that MOEA or MOGA also can be applied to biological sequence alignment, protein fold recognition, conformational generation and ADME/TOX (absorption, distribution, metabolism, excretion and toxicity) with more and more factors taken into account in bioinformatics and drug design.


Development of a fast and accurate scoring function in virtual screening is still a hot and pending issue in current research, different scoring functions focus on different aspects of ligand binding, and no single scoring can satisfy all the systems well, therefore, consensus score was put forward.[35, 36] With several other scoring functions, consensus score re-assessed those conformations optimized using a primary scoring function, but it is not really robust from the viewpoint of optimization. All of these give rise to another heuristic thinking for us, is it possible and more rational to find a most reasonable conformation in the process of optimization using two or more scoring functions in virtual screening? Since present scoring functions can not satisfy all the cases, multi-objective optimization method can be a good compromise.

In this study, we present two kinds of multi-objective optimization method, called MOSFOM, in virtual screening which simultaneously considers energy score and contact score. A set of pareto-optimal solutions were obtained which can simultaneously satisfy energy and shape complementarity using ε-MOEA method, then through two different preferences, the binding conformations of pareto-optimal solutions with lowest energy scoring or best shape complementarity were ranked among all selected conformations. However, EFMOGA acquires different weights of each scoring function for different systems based on experimental X-ray complex structure, that is, for different system, EFMOGA regards the varying weight as the degree of contribution of individual scoring function, which generally focuses on one aspect for a special type of binding sites, this means the scoring function with bigger weight is more suitable for the binding site. We used 10000 random compounds as the decoys, active compounds selected from MDDR database were randomly merged into the decoys, respectively. To ensure justness of comparison, GAsDock, based on an improved genetic algorithm, was used as a benchmark for single-objective optimization. Results show that MOSFOM can enhance the enrichment and greatly increase the hit rates compared with individual score (see Results). For three different kinds of binding sites, MOSFOM represents excellent ability of distinction of active compounds with energy and shape complementarity. EFMOGA specially performed well at the top 2% of the database, MOEA_Nrg and MOEA_Cnt performed better than individual scoring function if a proper type of binding site was selected.

Multi-objective optimization method was successfully applied in virtual screening with two different scoring functions, which can gain reasonable binding pose, and can be furthermore ranked with those potentially compromised conformations of each compound which abandon those conformations that can not satisfy overall objective functions. By analyzing the characteristics of binding site in advance, the most effective multi-objective optimization method is adopted, which is meaningful for all current scoring functions since they can not suit all cases. More specific scoring functions and protein flexibility will be taken into account in our future work.


  1. 1.

    Jorgensen WL: The many roles of computation in drug discovery. Science 2004, 303(5665):1813–1818. 10.1126/science.1096361

    CAS  Article  PubMed  Google Scholar 

  2. 2.

    Shoichet BK: Virtual screening of chemical libraries. Nature 2004, 432(7019):862–865. 10.1038/nature03197

    PubMed Central  CAS  Article  PubMed  Google Scholar 

  3. 3.

    Congreve M, Murray CW, Blundell TL: Keynote review: Structural biology and drug discovery. Drug Discov Today 2005, 10(13):895–907. 10.1016/S1359-6446(05)03484-7

    CAS  Article  PubMed  Google Scholar 

  4. 4.

    Xiong B, Gui CS, Xu XY, Luo C, Chen J, Luo HB, Chen LL, Li GW, Sun T, Yu CY, et al.: A 3D model of SARS_CoV 3CL proteinase and its inhibitors design by virtual screening. Acta Pharmacol Sin 2003, 24(6):497–504.

    CAS  PubMed  Google Scholar 

  5. 5.

    Lengauer T, Lemmen C, Rarey M, Zimmermann M: Novel technologies for virtual screening. Drug Discov Today 2004, 9(1):27–34. 10.1016/S1359-6446(04)02939-3

    CAS  Article  PubMed  Google Scholar 

  6. 6.

    Bohdan Waszkowycz, Perkins TimDJ, Sykes RichardA, Li J: Large-scale virtual screening for discovering leads in the postgenomic era. IBM Systems J 2001, 40(2):360–378.

    Article  Google Scholar 

  7. 7.

    Doman TN, McGovern SL, Witherbee BJ, Kasten TP, Kurumbail R, Stallings WC, Connolly DT, Shoichet BK: Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J Med Chem 2002, 45(11):2213–2221. 10.1021/jm010548w

    CAS  Article  PubMed  Google Scholar 

  8. 8.

    Chen L, Gui C, Luo X, Yang Q, Gunther S, Scandella E, Drosten C, Bai D, He X, Ludewig B, et al.: Cinanserin is an inhibitor of the 3C-like proteinase of severe acute respiratory syndrome coronavirus and strongly reduces virus replication in vitro. J Virol 2005, 79(11):7095–7103. 10.1128/JVI.79.11.7095-7103.2005

    PubMed Central  CAS  Article  PubMed  Google Scholar 

  9. 9.

    Liu H, Li Y, Song MK, Tan XJ, Cheng F, Zheng SX, Shen JH, Luo XM, Ji RY, Yue JM, et al.: Structure-based discovery of potassium channel blockers from natural products: Virtual screening and electrophysiological assay testing. Chem Biol 2003, 10(11):1103–1113. 10.1016/j.chembiol.2003.10.011

    CAS  Article  PubMed  Google Scholar 

  10. 10.

    Alvarez JC: High-throughput docking as a source of novel drug leads. Curr Opin Chem Biol 2004, 8(4):365–370. 10.1016/j.cbpa.2004.05.001

    CAS  Article  PubMed  Google Scholar 

  11. 11.

    Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE: A geometric approach to macromolecule-ligand interactions. J Mol Biol 1982, 161(2):269–288. 10.1016/0022-2836(82)90153-X

    CAS  Article  PubMed  Google Scholar 

  12. 12.

    Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ: Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function. J Comp Chem 1998, 19(14):1639–1662. Publisher Full Text 10.1002/(SICI)1096-987X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-B

    CAS  Article  Google Scholar 

  13. 13.

    Jones G, Willett P, Glen RC, Leach AR, Taylor R: Development and validation of a genetic algorithm for flexible docking. J Mol Biol 1997, 267(3):727–748. 10.1006/jmbi.1996.0897

    CAS  Article  PubMed  Google Scholar 

  14. 14.

    Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL: Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem 2004, 47(7):1750–1759. 10.1021/jm030644s

    CAS  Article  PubMed  Google Scholar 

  15. 15.

    Dixon JS: Evaluation of the CASP2 docking section. Proteins 1997, 1: 198–204. Publisher Full Text 10.1002/(SICI)1097-0134(1997)1+<198::AID-PROT26>3.0.CO;2-I

    Article  PubMed  Google Scholar 

  16. 16.

    Perola E, Xu K, Kollmeyer TM, Kaufmann SH, Prendergast FG, Pang YP: Successful virtual screening of a chemical database for farnesyltransferase inhibitor leads. J Med Chem 2000, 43(3):401–408. 10.1021/jm990408a

    CAS  Article  PubMed  Google Scholar 

  17. 17.

    Claussen H, Buning C, Rarey M, Lengauer T: FlexE: efficient molecular docking considering protein structure variations. J Mol Biol 2001, 308(2):377–395. 10.1006/jmbi.2001.4551

    CAS  Article  PubMed  Google Scholar 

  18. 18.

    Rarey MKB, Lengauer T, Klebe G: A fast flexible docking method using an incremental construction algorithm. J Mol Biol 1996, 261: 470–489. 10.1006/jmbi.1996.0477

    CAS  Article  PubMed  Google Scholar 

  19. 19.

    Baxter CA, Murray CW, Clark DE, Westhead DR, Eldridge MD: Flexible docking using Tabu search and an empirical estimate of binding affinity. Proteins 1998, 33(3):367–382. 10.1002/(SICI)1097-0134(19981115)33:3<367::AID-PROT6>3.0.CO;2-W

    CAS  Article  PubMed  Google Scholar 

  20. 20.

    Colin McMartin RSB: QXP: Powerful, rapid computer algorithms for structure-based drug design. J Comput Aided Mol Des 1997, 11: 333–344. 10.1023/A:1007907728892

    Article  Google Scholar 

  21. 21.

    Welch W, Ruppert J, Jain AN: Hammerhead: fast, fully automated docking of flexible ligands to protein binding sites. Chem Biol 1996, 3(6):449–462. 10.1016/S1074-5521(96)90093-9

    CAS  Article  PubMed  Google Scholar 

  22. 22.

    Miller MD, Kearsley SK, Underwood DJ, Sheridan RP: FLOG: a system to select 'quasi-flexible' ligands complementary to a receptor of known three-dimensional structure. J Comput Aided Mol Des 1994, 8(2):153–174. 10.1007/BF00119865

    CAS  Article  PubMed  Google Scholar 

  23. 23.

    Venkatachalam CMJX, Oldfield T, Waldman M: LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites. J Mol Graph Model 2003, 21(4):289–307. 10.1016/S1093-3263(02)00164-X

    CAS  Article  PubMed  Google Scholar 

  24. 24.

    Liu M, Wang S: MCDOCK: a Monte Carlo simulation approach to the molecular docking problem. J Comput Aided Mol Des 1999, 13(5):435–451. 10.1023/A:1008005918983

    CAS  Article  PubMed  Google Scholar 

  25. 25.

    Li H, Li C, Gui C, Luo X, Chen K, Shen J, Wang X, Jiang H: GAsDock: a new approach for rapid flexible docking based on an improved multi-population genetic algorithm. Bioorg Med Chem Lett 2004, 14(18):4671–4676. 10.1016/j.bmcl.2004.06.091

    CAS  Article  PubMed  Google Scholar 

  26. 26.

    Buckley PA, Baz Jackson J, Schneider T, White SA, Rice DW, Baker PJ: Protein-protein recognition, hydride transfer and proton pumping in the transhydrogenase complex. Structure Fold Des 2000, 8(8):809–815. 10.1016/S0969-2126(00)00171-4

    CAS  Article  PubMed  Google Scholar 

  27. 27.

    Lyne PD: Structure-based virtual screening: an overview. Drug Discov Today 2002, 7(20):1047–1055. 10.1016/S1359-6446(02)02483-2

    CAS  Article  PubMed  Google Scholar 

  28. 28.

    Kitchen DB, Decornez H, Furr JR, Bajorath J: Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 2004, 3(11):935–949. 10.1038/nrd1549

    CAS  Article  PubMed  Google Scholar 

  29. 29.

    Bossuyt J, Taylor BE, James-Kracke M, Hale CC: The cardiac sodium-calcium exchanger associates with caveolin-3. Ann N Y Acad Sci 2002, 976: 197–204.

    CAS  Article  PubMed  Google Scholar 

  30. 30.

    Shoichet BK, McGovern SL, Wei BQ, Irwin JJ: Lead discovery using molecular docking. Curr Opin Chem Biol 2002, 6(4):439–446. 10.1016/S1367-5931(02)00339-3

    CAS  Article  PubMed  Google Scholar 

  31. 31.

    Ferrara P, Gohlke H, Price DJ, Klebe G, Brooks CL 3rd: Assessing scoring functions for protein-ligand interactions. J Med Chem 2004, 47(12):3032–3047. 10.1021/jm030489h

    CAS  Article  PubMed  Google Scholar 

  32. 32.

    Gohlke HKG: Approaches to the description and prediction of the binding affinity of small-molecule ligands to acromolecular receptors. Angew Chem Int Ed Engl 2002, 41(15):2644–2676. 10.1002/1521-3773(20020802)41:15<2644::AID-ANIE2644>3.0.CO;2-O

    CAS  Article  PubMed  Google Scholar 

  33. 33.

    Boehm H-J, Stahl M: The Use of Scoring Functions in Drug Discovery Applicationg.New York: Wiley-VCH; 2002., 18: []

    Google Scholar 

  34. 34.

    Bissantz C, Folkers G, Rognan D: Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. J Med Chem 2000, 43(25):4759–4767. 10.1021/jm001044l

    CAS  Article  PubMed  Google Scholar 

  35. 35.

    Charifson PS, Corkery JJ, Murcko MA, Walters WP: Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J Med Chem 1999, 42(25):5100–5109. 10.1021/jm990352k

    CAS  Article  PubMed  Google Scholar 

  36. 36.

    Clark RD, Strizhev A, Leonard JM, Blake JF, Matthew JB: Consensus scoring for ligand/protein interactions. J Mol Graph Model 2002, 20(4):281–295. 10.1016/S1093-3263(01)00125-5

    CAS  Article  PubMed  Google Scholar 

  37. 37.

    Xing L, Hodgkin E, Liu Q, Sedlock D: Evaluation and application of multiple scoring functions for a virtual screening experiment. J Comput Aided Mol Des 2004, 18(5):333–344. 10.1023/B:JCAM.0000047812.39758.ab

    CAS  Article  PubMed  Google Scholar 

  38. 38.

    Fonseca CM, Fleming PJ: An overview of evolutionary algorithms in multiobjecitive optimization. Evol Comput 1995, 3(1):1–16. 10.1162/evco.1995.3.1.1

    Article  Google Scholar 

  39. 39.

    Gillet VJKW, Willett P, Fleming PJ, Green DV: Combinatorial library design using a multiobjective genetic algorithm. J Chem Inf Comput Sci 2002, 42(2):375–385.

    CAS  Article  PubMed  Google Scholar 

  40. 40.

    Handschuh S, Wagener M, Gasteiger J: Superposition of three-dimensional chemical structures allowing for conformational flexibility by a hybrid method. J Chem Inf Comput Sci 1998, 38(2):220–232.

    CAS  Article  PubMed  Google Scholar 

  41. 41.

    Zitzler E: Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis. Zurich, Switzerland,: Swiss Federal Institute of Technology (ETH); 1999.

    Google Scholar 

  42. 42.

    Coello CAC: An Empirical Study of Evolutionary Techniques for Multiobjective Optimization in Engineering Design. PhD thesis. New Orleans, Louisiana,: Tulane University; 1996.

    Google Scholar 

  43. 43.

    Zitzler E, Deb K, Thiele L: Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evol Comput 2000, 8(2):173–195. 10.1162/106365600568202

    CAS  Article  PubMed  Google Scholar 

  44. 44.

    Coello CAC: An undated survey of GA-based multiobjective optimization techniques. ACM Comput surveys: 2000 2000, 109–143. 10.1145/358923.358929

    Google Scholar 

  45. 45.

    Deb K: Multi-Objective Optimization using Evolutionary Algorithms. Chichester, UK,: John Wiley & Sons; 2001.

    Google Scholar 

  46. 46.

    Brooijmans N, Kuntz ID: Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct 2003, 32: 335–373. 10.1146/annurev.biophys.32.110601.142532

    CAS  Article  PubMed  Google Scholar 

  47. 47.

    Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al.: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 6 No 1):899–907. 10.1107/S0907444902003451

    Article  PubMed  Google Scholar 

  48. 48.

    Brandstetter H, Turk D, Hoeffken HW, Grosse D, Sturzebecher J, Martin PD, Edwards BF, Bode W: Refined 2.3 A X-ray crystal structure of bovine thrombin complexes formed with the benzamidine and arginine-based thrombin inhibitors NAPAP, 4-TAPAP and MQPA. A starting point for improving antithrombotics. J Mol Biol 1992, 226: 1085. 10.1016/0022-2836(92)91054-S

    CAS  Article  PubMed  Google Scholar 

  49. 49.

    Shiau AK, Barstad D, Loria PM, Cheng L, Kushner PJ, Agard DA, Greene GL: The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen. Cell 1998, 95: 927. 10.1016/S0092-8674(00)81717-1

    CAS  Article  PubMed  Google Scholar 

  50. 50.

    Kurumbail RG, Stevens AM, Gierse JK, McDonald JJ, Stegeman RA, Pak JY, Gildehaus D, Miyashiro JM, Penning TD, Seibert K, Isakson PC, Stallings WC: Structural basis for selective inhibition of cyclooxygenase-2 by anti-inflammatory agents. Nature 1996, 384: 644. 10.1038/384644a0

    CAS  Article  PubMed  Google Scholar 

  51. 51.

    Gasteiger J, Marsili M: Iterative partial equalization of orbital electronegativity – a rapid access to atomic charges. Tetrahedron 1980, 36: 3219–3228. 10.1016/0040-4020(80)80168-2

    CAS  Article  Google Scholar 

  52. 52.

    Meng EC, Shoichet BK, Kuntz ID: Automated docking with grid-based energy evaluation. J Comp Chem 1992, 13: 504–524.

    Article  Google Scholar 

  53. 53.

    Ewing TJA, Kuntz ID: Critical evaluation of search algorithms used in automated molecular docking. J Comp Chem 1997, 18(9):1175–1189. Publisher Full Text 10.1002/(SICI)1096-987X(19970715)18:9<1175::AID-JCC6>3.0.CO;2-O

    CAS  Article  Google Scholar 

  54. 54.

    Ewing TJ, Makino S, Skillman AG, Kuntz ID: DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 2001, 15(5):411–428. 10.1023/A:1011115820450

    CAS  Article  PubMed  Google Scholar 

  55. 55.

    Weiner SJ, Kollman PA, Nguyen DT, Case DA: A New force filed for molecular mechanical simulation of nucleic acids and proteins. J Comp Chem 1986, 7: 230–252. 10.1002/jcc.540070216

    CAS  Article  Google Scholar 

  56. 56.

    Wei BQ, Baase WA, Weaver LH, Matthews BW, Shoichet BK: A model binding site for testing scoring functions in molecular docking. J Mol Biol 2002, 322(2):339–355. 10.1016/S0022-2836(02)00777-5

    CAS  Article  PubMed  Google Scholar 

  57. 57.

    Laumanns M, Thiele L, Deb K, Zitzler E: Combining convergence and diversity in evolutionary multi-objective optimization. Evol Comput 2002, 10(3):263–282. 10.1162/106365602760234108

    Article  PubMed  Google Scholar 

  58. 58.

    Deb K, Mohan M, Mishra S: A Fast Multi-objective Evolutionary Algorithm for Finding Well-Spread Pareto-Optimal Solutions. KanGAL Report 2003., No. 2003002:

    Google Scholar 

  59. 59.

    Gohlke H, Hendlich M, Klebe G: Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol 2000, 295(2):337–356. 10.1006/jmbi.1999.3371

    CAS  Article  PubMed  Google Scholar 

  60. 60.

    Knegtel RM, Wagener M: Efficacy and selectivity in flexible database docking. Proteins 1999, 37(3):334–345. 10.1002/(SICI)1097-0134(19991115)37:3<334::AID-PROT3>3.0.CO;2-9

    CAS  Article  PubMed  Google Scholar 

  61. 61.

    Kang L, Li H, Jiang H, Wang X: An improved adaptive genetic algorithm for protein-ligand docking. J Comput Aided Mol Des 2009, 23(1):1–12. 10.1007/s10822-008-9232-5

    CAS  Article  PubMed  Google Scholar 

  62. 62.

    Mackay DJC: Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press; 2003.

    Google Scholar 

  63. 63.

    Adami C: Information Theory in Molecular Biology. Phys Life Rev 2004, 1: 3–22. 10.1016/j.plrev.2004.01.002

    Article  Google Scholar 

  64. 64.

    Grosdidier A, Zoete V, Michielin O: EADock: docking of small molecules into protein active sites with a multiobjective evolutionary optimization. Proteins 2007, 67(4):1010–1025. 10.1002/prot.21367

    CAS  Article  PubMed  Google Scholar 

  65. 65.

    Schulz-Gasch T, Stahl M: Binding site characteristics in structure-based virtual screening: evaluation of current docking tools. J Mol Model 2003, 9(1):47–57.

    CAS  PubMed  Google Scholar 

  66. 66.

    Muegge I, Martin YC, Hajduk PJ, Fesik SW: Evaluation of PMF scoring in docking weak ligands to the FK506 binding protein. J Med Chem 1999, 42(14):2498–2503. 10.1021/jm990073x

    CAS  Article  PubMed  Google Scholar 

  67. 67.

    Muegge I, Martin YC: A general and fast scoring function for protein-ligand interactions: a simplified potential approach. J Med Chem 1999, 42(5):791–804. 10.1021/jm980536j

    CAS  Article  PubMed  Google Scholar 

  68. 68.

    Marsden PM, Puvanendrampillai D, Mitchell JB, Glen RC: Predicting protein-ligand binding affinities: a low scoring game? Org Biomol Chem 2004, 2(22):3267–3273. 10.1039/b409570g

    CAS  Article  PubMed  Google Scholar 

  69. 69.

    Stahl M, Rarey M: Detailed analysis of scoring functions for virtual screening. J Med Chem 2001, 44(7):1035–1042. 10.1021/jm0003992

    CAS  Article  PubMed  Google Scholar 

  70. 70.

    Murray CW, Baxter CA, Frenkel AD: The sensitivity of the results of molecular docking to induced fit effects: application to thrombin, thermolysin and neuraminidase. J Comput Aided Mol Des 1999, 13(6):547–562. 10.1023/A:1008015827877

    CAS  Article  PubMed  Google Scholar 

  71. 71.

    Wright T, Gillet VJ, Green DV, Pickett SD: Optimizing the size and configuration of combinatorial libraries. J Chem Inf Comput Sci 2003, 43(2):381–390.

    CAS  Article  PubMed  Google Scholar 

  72. 72.

    Gillet VJ, Willett P, Fleming PJ, Green DV: Designing focused libraries using MoSELECT. J Mol Graph Model 2002, 20(6):491–498. 10.1016/S1093-3263(01)00150-4

    CAS  Article  PubMed  Google Scholar 

  73. 73.

    Gillet VJ: Designing combinatorial libraries optimized on multiple objectives. Methods Mol Biol 2004, 275: 335–354.

    CAS  Article  PubMed  Google Scholar 

  74. 74.

    Nicolotti O, Gillet VJ, Fleming PJ, Green DV: Multiobjective optimization in quantitative structure-activity relationships: deriving accurate and interpretable QSARs. J Med Chem 2002, 45(23):5069–5080. 10.1021/jm020919o

    CAS  Article  PubMed  Google Scholar 

Download references


The authors thank Prof. K. Deb for the source code of ε-MOEA. We also thank Prof. Rolf Hilgenfeld for careful reading this manuscript, and the referees for helpful suggestions and comments. This work was supported by the Special Fund for Major State Basic Research Project (2004CB518900, 2009CB918501 and 2009CB918502), the National Natural Science Foundation of China (grant 20803022, 10572033, 30672539 and 20721003), the Shanghai Committee of Science and Technology(Grant 07dz22004), and the 863 Hi-Tech Program of China (grant 2007AA02Z304). H.L. was also supported by Knowledge Innovation Program of the Chinese Academy of Sciences (grant SIMM0709QN-09).

Author information



Corresponding authors

Correspondence to Honglin Li or Xicheng Wang or Hualiang Jiang.

Additional information

Authors' contributions

HL, HZ and JL contributed to the development and validation of the method, MZ, LK, and XL contributed to the analysis and data interpretation, XW and HJ conceived the idea of the method, and provided guidance for its development and revised the subsequent drafts of this manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Docking example and method proof.

Additional file 1: A molecular docking example for reproduction of Thymidine Kinase (TK) complex, and the mathematical proof for evaluation function multi-objective optimization genetic algorithm (EFMOGA). (PDF 88 KB)

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and Permissions

About this article

Cite this article

Li, H., Zhang, H., Zheng, M. et al. An effective docking strategy for virtual screening based on multi-objective optimization algorithm. BMC Bioinformatics 10, 58 (2009).

Download citation


  • Enrichment Factor
  • Virtual Screening
  • Pareto Frontier
  • Docking Program
  • Energy Score