KiDoQ: using docking based energy scores to develop ligand based model for predicting antibacterials

Background Identification of novel drug targets and their inhibitors is a major challenge in the field of drug designing and development. Diaminopimelic acid (DAP) pathway is a unique lysine biosynthetic pathway present in bacteria, however absent in mammals. This pathway is vital for bacteria due to its critical role in cell wall biosynthesis. One of the essential enzymes of this pathway is dihydrodipicolinate synthase (DHDPS), considered to be crucial for the bacterial survival. In view of its importance, the development and prediction of potent inhibitors against DHDPS may be valuable to design effective drugs against bacteria, in general. Results This paper describes a methodology for predicting novel/potent inhibitors against DHDPS. Here, quantitative structure activity relationship (QSAR) models were trained and tested on experimentally verified 23 enzyme's inhibitors having inhibitory value (Ki) in the range of 0.005-22(mM). These inhibitors were docked at the active site of DHDPS (1YXD) using AutoDock software, which resulted in 11 energy-based descriptors. For QSAR modeling, Multiple Linear Regression (MLR) model was engendered using best four energy-based descriptors yielding correlation values R/q2 of 0.82/0.67 and MAE of 2.43. Additionally, Support Vector Machine (SVM) based model was developed with three crucial descriptors selected using F-stepping remove-one approach, which enhanced the performance by attaining R/q2 values of 0.93/0.80 and MAE of 1.89. To validate the performance of QSAR models, external cross-validation procedure was adopted which accomplished high training/testing correlation values (q2/r2) in the range of 0.78-0.83/0.93-0.95. Conclusions Our results suggests that ligand-receptor binding interactions for DHDPS employing QSAR modeling seems to be a promising approach for prediction of antibacterial agents. To serve the experimentalist to develop novel/potent inhibitors, a webserver "KiDoQ" has been developed http://crdd.osdd.net/raghava/kidoq, which allows the prediction of Ki value of a new ligand molecule against DHDPS.


Background
An escalating magnitude of drug resistance among bacterial pathogens has been installing a serious threat on the public health and economy of the developed world. A survey report has suggested that the direct cost to US economy alone due to drug resistant bacterial infection is around $4-$5 billion annually [1][2][3]. Even for pharmaceuticals companies, it turns out to be a heart-dying situation that after investing~$800 million and about 15 years of atrocious labor to introduce a drug in the market, the pathogens already attains resistance against the drug. Therefore, there is an urgent need to recognize new inhibitors against novel and/or known targets. Undoubtedly, well-established bacterial targets i.e. cell wall and membrane biosynthesis, protein biosynthesis, nucleic acid etc always the first choice for developing antibacterials. The recent trend in this direction indicates that researchers are looking for novel targets alongside to discover new classes of inhibitors/antibiotics.
The amino acids biosynthetic pathways specifically lysine pathway has gained special attention because of its potential role in bacterial cell wall and protein synthesis [4,5]. The D, L-diaminopimelic acid (meso-DAP), an important intermediate in the biosynthetic pathway of lysine is crucial in cross-linking peptidoglycan chains to provide strength and rigidity to the bacterial cell wall (known as DAP pathway). The absence of this pathway in mammalian system suggests that specific inhibitors of this biosynthetic pathway may be a valuable for developing novel classes of antibacterial agents. In this study, we explored DHDPS enzyme of the pathway, which catalysis condensation of pyruvate and aspartate semialdehyde to form DHDP. Figure 1 shows the established DAP pathway for DAP and lysine biosynthesis. The enzyme is encoded by dapA gene, which has been cloned and expressed from several strains, including Thermatoga maritima, Corynebacterium glutamicum, Mycobacterium tuberculosis and Bacillus anthracis. The three-dimensional structures of DHDPS enzyme from Escherichia coli, Staphylococcus aureus, M. tuberculosis and B. anthracis enzymes with substrate pyruvate and without have been reported [6][7][8][9][10][11][12][13][14][15][16][17][18].
The antibacterial identification using experimental techniques is invariably very expensive, requires extensive pains and labor. Therefore, in silico techniques, which have the power to cut down these unavoidable steps, would be valuable. In recent years, in silico techniques like quantitative structure activity relationship (QSAR) and molecular docking are gaining high popularity in the drug discovery [19][20][21]. Both these methodologies allow the identification of probable lead candidates expeditiously prior to chemical synthesis and characterization, thereby, making the process more cost effective [22,23].
In the present study, we attempt to integrate power of two in silico potential techniques: QSAR and molecular docking by using docking generated energy-based descriptors for building QSAR models. Using this strategy, the information regarding binding mode of ligands in the active site is accumulated which would in turn assist the accurate prediction of better inhibitor with improved K i values. To facilitate this we also developed a web-interface to help experimentalist working in the field of designing novel inhibitors against DHDPS enzyme.

Results
For the docking of 23 inhibitors, E. coli DHDPS crystal structure stored in the PDB file 1YXD was retrieved. The crystal structure of DHDPS consisted of two similar chains (A and B) with inhibitor bound at allosteric site [13]. The water molecules and inhibitor were removed using PYMOL software and chain A was considered for the docking purpose. The python scripts were used for carrying out automated flexible docking of 23 inhibitors on the predefined and experimentally characterized binding pocket, where the residue LYS161 being particularly very important. Hence, it's important to consider the flexibility of LYS161 and the inhibitors, while performing docking. Figure 2 shows the docking of two inhibitors: Inh-6 (having minimum K i value) and Inh-10 (with maximum K i value) at the active site of DHDPS enzyme. In order to validate our docking methodology another crystal structure of E. coli DHDPS (3DU0) with substrate bound at the active site was obtained from PDB. The enzyme 1YXD could not be used as it enclosed bound conformation of an allosteric inhibitor (S)-Lysine. Since crystals were remarkably similar (RMSD value of 0.15Å), therefore, the same procedure for the docking of pyruvate was adopted which resulted in very slight variation in the RMSD value of 0.31 Å. Hence, the docking protocol adopted in the present Figure 1 Enzymatic action of DHDPS leads to the biosynthesis of bacterial cell wall and protein components. Figure 1 shows the action of DHDPS enzyme involved in protein and cell wall synthesis process. study was able to reproduce the conformation comparable to the crystal structure with substrate at active site. Additionally, analysis of 10 docked poses of substrate generated by AutoDock software was also carried out. In Additional File 1: table S1 we have shown the values of free binding energies and RMSDs in the inreasing order of ranking. It was observed that RMSD value for the fifth ranked pose was lesser in comparison with the pose with best and minimum free binding energy. We also calculated the pair-wise corelation between free binding energy and RMSD, resulted in R value of 0.81, which reveals that there exists correlation between free binding energy and RMSD values, however not the ideal or perfect one. Therefore, it's not always true that the pose with the lowest binding energy is the one with the lowest RMSD to the crystal structure. Ofcourse, one can validate or check the RMSD values for a single ligand system with bound crystal structure known. However, during virtual screening procedure with large number of unknown structures to dock, it's practically impossible to obtain the RMSD values. Therefore, in such cases, it has been shown in the past that the compounds with the lowest binding energies are generally considered as potential hits.
It To obtain significant and non-correlated variables from the above-mentioned 11 descriptors, a statistical  Figure 2 shows the docked conformation of inh-10 and inh-6 in active site of DHDPS where protein is shown in secondary structure and inh-10, inh-6 is represented in ball and stick model. package, STATISTICA, was used. Indeed all the descriptors were highly significant, showing the p < 0.05. To filter out correlated descriptors, pair-wise correlation coefficient at the cut-off value of 0.9 was imposed. The two variables namely E Unb and E IntL yielded the pairwise correlation values > 0.9 with E FTot and therefore filtered out from the further analysis (Table 1).
Using MLR, a QSAR based model was generated using 4 variables namely, E FreeBind , E Elec , E IntR , and E Tors which accomplished correlation (R/q 2 ) values of 0.81/0.65 with MAE of 2.61 ( Table 2). Though model was able to obtain good correlation values however, q 2 value was observed to be very low ( Figure 3). Next, SVM along with F-stepping variable selection approach was employed. During first cycle of F-stepping remove-one, an elimination of fifth descriptor i.e. E MLFR from the set of n = 9 and the development of SVM model using 8 remaining variables attained the best correlation R/q 2 values of 0.87/0.75 and MAE of 2.24 (C = 50, γ = 45) listed in Table 2. For the second cycle, the removal of 9 th descriptor i.e. E IntR further improved the correlation value to 0.91/0.81 showing reduction in MAE to 2.01 (C = 50, γ = 50). The next cycle however did not enhance the correlation values significantly as exclusion of E MLMR offered correlation values 0.90/0.81 and MAE of 2.16 (C = 75, γ = 50) therefore, making its absence or presence to elicit no influence on correlation values. The correlation between predicted and actual activity values is shown in Figure 4. In the subsequent cycles of variables selection, no improvement in correlation values was observed. Therefore, it can be deduced that 6 descriptors i.e. E FreeBind , E InterMol , E VHD , E Elec , E FTot and E Tors are important to predict the inhibitory activity values for the present dataset of 23 inhibitors against DHDPS. Further, an external cross-validation was carried out by randomly dividing 23 inhibitors dataset into three different sizes of training and test sets such as 21 and 2; 19 and 4; 17 and 6 respectively. The highest correlation q 2 /r 2 values of 0.81/0.97 (an average of 8-9 best models) was obtained for the largest training and smallest test sets of 21 and 2 inhibitors (Table 3). An increase in the size of test set with corresponding decrease in training set size reduced the r 2 values along with slight reduction in q 2 values. The notion behind this splitting was to appraise a high predictive correlation values on the test set even when the size of training set was very low.
Besides, pair-wise correlation coefficient values listed in Table 4 between K i and energy-based descriptors for 23 inhibitors were calculated. Surprisingly, three descriptors such as E FreeBind , E InterMol , and E VHD from the finally selected 6 energy-based descriptors (described earlier) showed high fluctuations with respect to K i values ( Figure 5), which in turn have higher pair-wise correlation coefficient values (irrespective of signs). On the other hand, the variables i.e. E FTot and E Tors observed to be neutral, indeed provided low correlation values of 0.20 and 0.075 respectively. The E MLFR , which showed high correlation value of 0.62 with K i , was removed in the first cycle of variables selection. These deviations prompted us to carry out the clustering of the dataset of 23 inhibitors using JChem software http:// www.chemaxon.com/. As shown in Figure 6, all 23 Comparison between actual and predicted Ki values for MLR model generated using descriptors selected at pair-wise correlation cut-off value 0.9. Figure 3 depict the experimental and predicted Ki value in X and Y direction respectively with q 2 value 0.6531 using MLR model.

Figure 4
The correlation between actual and predicted Ki values for SVM model generated using variables selected at pair-wise correlation cut-off value 0.9. Figure 4 illustrate the experimental and predicted Ki value in X and Y direction respectively with q 2 value 0.811 using SVM model.   Figure 7 and Table 2).
Additionally, QSAR modeling was also carried out using six non-correlated descriptors i.e. E FreeBind , E Elec , E Ftot , E IntR , E MLMR and E Tors having pair-wise correlation value less than 0.5. Using MLR, the QSAR model was developed with four types of input variables-E FreeBind , E MLMR , E IntR , and E Tors , which accomplished correlation (R/q 2 ) values of 0.82/0.67 and MAE of 2.43 (Table 5). Hence, a small increase in correlation values in comparison to earlier MLR model (0.81/0.65 with MAE of 2.61), Figure 5 The variations in the values of 6 energy-based input variables with respect to experimental Ki values. Figure 5 shows the variation in energy descriptors with respect to Ki values.    (Table 5). Next, filtering of E Tors enhanced the correlation (R/q 2 ) value of model to 0.84/0.69 with reduction in MAE to 2.51 (C = 75, γ = 125). Then, QSAR modeling was carried out on the remaining four descriptors and the removal of E FTot augmented the correlation value to 0.93/0.80 with attenuation of MAE value to 1.89, a noteworthy enhancement. The correlation between predicted activities for the inhibitors and their actual experimental values is depicted in Figure 8A. There exists a good agreement between predicted and experimental activity values hence suggesting the robustness of QSAR model. Therefore, the three energy-based descriptors such E FreeBind , E Elec , and E IntR were imperative to predict the inhibitory activity values. Interestingly, the performance of three descriptor based QSAR model was found to be better in comparison to the six descriptors based SVM model (0.90/0.81 and MAE of 2.16) which was described earlier using cut-off value of 0.9. Additionally, the removal of three outliers like Inh-7, Inh-16 and Inh-23, further enhanced the prediction efficiency of QSAR model by increasing the correlation (R/q 2 ) values to 0.94/0.87 and reduction in MAE to 1.45 ( Figure 8B). Further, high external training/testing cross-validated correlation values (q 2 /r 2 ) in the range of 0.78-0.83/0.93-0.95 was attained by randomly splitting the dataset into several training sets for model building and independent testing on corresponding test sets (Table 6).
In order to assess robustness and validation of the finally developed three descriptors based QSAR model, a bootstrap analysis for 100 runs by statistical sampling of the original dataset was also performed which yielded a higher q 2 bootstap value of 0.88 ± 0.029. Thus, higher and lower values of q 2 bootstap and standard deviation parameters comprehensively support the statistical validity of the presently developed QSAR models. Further, Y-randomization test was also carried out using shuffled activity dataset which resulted in poor performance i.e. nearly all of the q 2 values were < zero (q 2 ranged from -0.15 to -0.41 and MAE from 4.15 to 5.13), thereby signifying the consistency of QSAR model.

2D descriptors based QSAR modeling
To compare the performance of three energy-based QSAR model with simple 2D QSAR models, the 2D  QSAR modeling with 14 non-correlated descriptors was also performed. The study was commenced by MLR modeling, which tried to establish structure-activity relationship using five descriptors i.e. MSD, PJI2, Jhetm, ALOGP2, and Me by attaining correlation R/q 2 values of 0.78/0.61 and MAE of 3.08 ( Table 7). The performance of 2D descriptors based model was found to be lower in comparison with four energy-based

Implementation of webserver
We attempted to develop efficient QSAR model and based on these models, a web server "KiDoQ" (available at http://crdd.osdd.net/raghava/kidoq) using CGI-PERL and python scripts was developed. User can draw the structure of ligand molecule using JME editor incorporated on the server. The server also accepts input as mol/mol2 structure files pasted or uploaded on the server ( Figure 9). The working flow of KiDoQ server is shown in Figure 10.

Discussion
The QSAR modeling has been accepted as a promising methodology for lead identification. Nevertheless, if high-resolution target structure is available, then receptor structure based approach is often a first choice. The recent studies have shown the better performance of QSAR models even in the presence of target structure. However, simple QSAR approach can sometimes lead to false prediction if the collected data does not cover the complete property space or the selected 2D/3D descriptors are not reliable. Therefore, both techniques have their own advantages and limitations [19][20][21][22][23]. Keeping in view, the importance of docking and better performance of QSAR, we integrated both approaches by using docking generated energy-based scores as descriptors for QSAR modeling. The major benefit presumed by this integration would be an additional validation of the docking predicted inhibitors as bioactive or inactive by prediction of their bioactivity values using QSAR models henceforth, would facilitate in reduction of false positives.
In the present study, docking of the 23 experimentally known inhibitors of DHDPS at its active binding site resulted in 11 energy-based descriptors. For valid statistical results, it was imperative to restrict the maximal number of descriptors or to remove highly correlated ones, as presence of redundancy reduced the discriminating power of input variables, thereby reducing their worth in model development. Ideally, a regression model with n training set compounds and k descriptors may be acceptable only if n > 4k and for any of the k descriptors-i) the significance level p is < 0.05; ii) the pair-wise correlation coefficient should be < 0.9 [24,25]. Therefore, we also looked for statistically significant and non-correlated energy-based descriptors. All 11 types of energy-based descriptors were statistically significant but  Further, it was also noticed that the structural diversity of 23 inhibitors and redundancy among finally selected six descriptors resulted in wrong selection of input variables. Therefore, we removed highly diverse three structures and used remaining 20 inhibitors for QSAR modeling. Interestingly, the performance of model was found to be enhanced in comparison to the model developed using 23 inhibitors. Further, QSAR modeling carried out with highly non-correlated 3 descriptors i.e E FreeBind , E Elec , and E IntR (selected at the cut-off pair-wise correlation value of 0.5) provided better correlation values in comparison to the earlier six descriptors (selected at the cut-off pair-wise correlation value of 0.9) based QSAR model. Therefore, removal of redundant descriptors reduced the noise and enabled the better training of QSAR models. The three non-correlated descriptors appeared to be governing factors in establishing structure actvity relationship for DHDPS enzyme. One of the possible reasons for their selection was a higher pair-wise correlation value with respect to K i in comparison with other descriptors i.e. E Ftot , E MLMR and E Tors removed during QSAR modeling.
Among three descriptors, the value of E FreeBind based descriptor was found to be dependent on other two Figure 9 A snapshot of submission page of KiDoQ webserver. Figure 9 shows the screen shot of KiDoQ webserver.  descriptors i.e. E Elec , and E IntR as well on other correlated energy based descriptors (such as E VHD ). Generally, in the absence of receptor's flexibility E IntR remains constant and does not make any significant contribution to E FreeBind , however, the flexibility incorporates transformations leading to internal energy changes. In the present study, the changes in the value of E IntR were observed as the LYS161 was kept flexible. It was noticed that inhibitors with lower K i value were characterized by high negative E IntR values. These inhibitors included aliphatic compounds generally the pyruvate and aspartate semialdehyde analogues. On the other hand, inhibitors such as Inh-17, Inh-14, Inh-23 and Inh-10 with higher K i values exhibited lower negative E IntR . In view of this, we suggest E IntR based descriptor is an important discriminating variable for developing robust QSAR models. Further, E Elec was also found to be imperative as variations in the E Elec values was highly dependent on the number and type of receptor residues involved in establishing charge interactions with inhibitors. We observed that inhibitors such as Inh-9, Inh-11, Inh-18 and Inh-20 with strong electrostatic interactions with receptor exhibited strong binding, resulting in higher negative E FreeBind and E Elec values that in turn provided lower K i values. A few inhibitors such as Inh-17 and Inh-14 characterized by aromaticity, the strong electrostatic or π-cationic interactions though provided higher negative E Elec values however, at the cost of reasonable reduction in the E IntR values, which in turn provided higher K i values, in comparison to the aliphatic inhibitors and the ones with weak electrostatic interactions. In addition, it was also figured out that inhibitors i.e. Inh-1, Inh-3, Inh-4, Inh-5, Inh-6, Inh-12, Inh-13, Inh-15 and Inh-19 exhibited strong affinity to receptor albeit no or very weak electrostatic interactions were observed. This suggest that binding of inhibitors to DHDPS is not specifically dependent on electrostatic interactions, however other bonded and non-bonded interactions appeared to be playing important role, which in turn provided higher negative E FreeBind values and lower K i values (For details see Additional File 1).
As we have employed a complex procedure of using docking generated energy-based descriptors for QSAR modeling; therefore, it became imperative to compare the model performance with simple conventional 2D QSAR models. The SVM based 2D QSAR model achieved a poor correlation value of 0.84/0.67 in comparison with docking energy-based SVM model (0.93/0.80) indicating inadequacy of 2D descriptors in providing acceptable and robust QSAR model for dataset of 23 inhibitors. This low performance of 2D QSAR models may be due to presence of high structural diversity among the inhibitors that was not easily captured using simple 2D descriptors.
To conclude, the present strategy of predicting K i values using docking generated energy-based descriptors for QSAR modeling is a promising approach to predict potent inhibitors against DHDPS enzyme.

Conclusions
In this study, we describe a new approach for prediction of antibacterial compounds that both take QSAR and docking strategy into its consideration. By using this approach, we get promising results instead of using these two strategies individually and develop a webserver called KiDoQ. This webserver will be helpful for better prediction of antibacterial compounds against dihydrodipicolinate synthase (DHDPS).

Inhibitors Dataset
The information regarding the experimentally known 23 inhibitors, classified as potent, moderate and slightly weak, was obtained from the literature [6][7][8][9][10][11][12][13][14][15][16][17][18]. The IUPAC names of these inhibitors along with K i values are shown in Table 8. Chem3D Ultra (v11.0), windowsbased software was used for sketching the 2D structures for all inhibitors followed by cleaning and refinement in order to correct the accidentally distorted or unrealistic bond angles and lengths. The 2D structures were converted into 3D structures using CORINA software. Then each structure was energy minimized to give energetically preferred 3D structures.

Docking energy-based descriptors
were calculated using automated docking software Auto-Dock (v.4.0) (AD) [26]. It is a suite of three C programs: i) AutoTors, which facilitates input of ligand coordinates; ii) AutoGrid, which precalculates a three-dimensional grid based on macromolecular coordinates; and iii) Auto-Dock, which performs a actual docking simulations. Before docking process, several separate pre-docking steps: ligand preparation, receptor preparation and grid map calculations were performed. The ligand and receptor preparation stage involved the addition of hydrogen atoms, computing charges, merging non-polar hydrogen atoms and defining AD4 atom types to ensure that atoms conformed to the AutoDock atom types. The information about rotatable torsion bonds that defines the bond flexibility was acquired. The ligands and receptor molecule preparation was followed by grid construction using AutoGrid module. During grid construction, atom types of the ligand, which acted as probes in the calculation of grid maps, were identified. The grid with default volume of 40 × 40 × 40 Å with a spacing of 0.375Å centered on the receptor was prepared. For conformational searches, the docking calculations using the genetic algorithm (GA) procedure with default parameters was performed.

2D QSAR modeling
DRAGON software was used for the calculation of 2D descriptors. For our dataset, the software calculated~848 types of 2D descriptors categorized into different descriptor blocks such as constitutional descriptors, topological descriptors, walk and path count descriptor, connectivity indices, information indices, 2D autocorrelation, edge adjacency, burden eigenvalues, topological charge indices, functional groups, molecular properties and eigenvalues based indices. Initially, the descriptors with zero or unassigned values were excluded and then pair-wise correlation test to remove highly correlated descriptors at a cut-off value of 0.50 was executed. This procedure resulted in 14 descriptors for 2D QSAR modeling.

QSAR Model Construction
QSAR methodology quantitatively correlates the structural molecular properties (descriptors) with functions (biological activities) for a set of compounds by means of linear or non-linear statistical methods. In the present study, we exploited both linear (MLR) and non-linear (SVM) statistical methods for flourishing the robust QSAR models [27,28]. Retrospectively, for QSAR modeling, both linear and non-linear models have been extensively used [29][30][31][32][33][34][35][36]. MLR tries to model the relationship between two or more independent descriptors and dependent variable such as K i by fitting a linear regression equation to the observed data with corresponding parameters (constants) and an error term. On the other hand, SVM based on statistical and optimization theory, handles complex structural features. In the present study, SVM_light http://www.cs.cornell.edu/People/tj/svm_light/, which is an implementation of SVM, was used for QSAR modeling.

Evaluation of QSAR models
To assess the predictive performance of QSAR models, different cross-validation procedures were adopted. First, in leave-one-out strategy (LOOCV), one molecule was removed from the dataset as a test compound and the remaining 22 molecules were used to build the model. This process was repeated 23 times with each inhibitor as a test molecule. Once a regression model was constructed, goodness about the fit and statistical significance was assessed using the statistical parameters outlined below where, x i and y i represents the actual and predicted K i values for the i th compound. N is the total number of compounds, x represents the averaged value of the actual K i for the entire dataset.
Here, it was equally important to use an independent test set to check the real predictive accuracy of trained QSAR models. However, 23 compounds were not expected to be sufficient for independent testing using existing QSAR models. Therefore, an alternative strategy, external cross-validation, was adopted, where different number of inhibitors i.e. 2, 4, and 6 were randomly selected as independent test sets. The models were then trained on the remaining inhibitors i.e. 21, 19, and 17 Inh-23 (1R)-cyclohex-3-ene-1,3-dicarboxylic acid using LOOCV procedure followed by independent testing on the corresponding test sets. This cycle of randomly separating test and training sets was repeated.
Here, to determine the predictive accuracy of models on the test set, predictive r 2 value was used where, SD is the sum of the squared deviations between the activities of the test set and mean activities of the training molecules.
Then, Y-randomization test was performed in order to appraise high training and testing correlation values observed during QSAR modeling, were not occurred incidentally. Here, the shuffled activity dataset was derived by randomly shuffling the dependent variables K i and keeping the descriptors original, afterward using this randomly shuffled dataset to develop new QSAR models. The process of shuffling was carried out many times with subsequent generation of corresponding models nevertheless, with an assumption that the resulting models should give low performance, which would obviously imply the rigorous robustness of the original models.

Input variables selection
The selection of best descriptors that establish the relationship between chemical structure and an inhibitory property is crucial for the success of QSAR modeling. In the present study, we adopted F-stepping remove-one approach for variable selection. Accordingly, each input variable was removed one-by-one from the set of n variables followed by QSAR modeling using the remaining n-1 variables. However, if the correlation value was increased, the particular variable was permanently removed from the analysis. These cycles were repeated until no further improvement in the correlation values was observed and stopped if n-1 removal resulted in reduction of correlation values.

Availability and requirements
Project name: A webserver for predicting DHDPS inhibitors Project home page: http://crdd.osdd.net/raghava/kidoq Operating system(s): Platform independent; Programming language: PERL, CGI-PERL; License: None; Any restrictions to use by non-academics: No restrictions Additional file 1: Textbox S1: Selection of descriptors based on the chemical structures and activities. Additional file shows the descriptor selection on the basis of similarity in chemical structure of inhibitors and their activity. Click here for file [ http://www.biomedcentral.com/content/supplementary/1471-2105-11-125-S1.DOC ]