Sanjeevini: a freely accessible web-server for target directed lead molecule discovery
© Jayaram et al.; licensee BioMed Central Ltd. 2012
Published: 13 December 2012
Computational methods utilizing the structural and functional information help to understand specific molecular recognition events between the target biomolecule and candidate hits and make it possible to design improved lead molecules for the target.
Sanjeevini is a large, ongoing scientific effort to provide users with a freely accessible, state of the art software suite for protein and DNA targeted lead molecule discovery. It includes automated detection of active sites, screening of a million-compound library to identify hit molecules, all atom docking and scoring, and various other utilities for designing molecules with the desired affinity and specificity against biomolecular targets. Each module has been validated on a large dataset of protein/DNA drug targets.
The article presents Sanjeevini, a freely accessible, user friendly web-server to aid in drug discovery. It is implemented on a teraflop cluster and made accessible via a web-interface at http://www.scfbio-iitd.res.in/sanjeevini/sanjeevini.jsp. We provide a brief description of the various modules, their scientific basis and validation, and explain how to use the server to develop in silico suggestions of lead molecules.
One of the main challenges in structure based drug discovery is to use the structural and chemical information of drug targets and their ligand binding sites to create new molecules with high affinity and specificity, good bioavailability and minimal toxicity. Computer aided drug discovery is proving particularly valuable in this context [2–89]. The rapid ascent and acceptance of this methodology has been made possible by advances in software and hardware. The Sanjeevini server has been developed to enable drug designers to address the affinity and selectivity of candidate molecules against drug targets with known structures. Sanjeevini comprises several modules with different functions: automated identification of potential ligand binding sites (active sites) on the biomolecular target; rapid screening of a million-molecule database/natural product library to identify good candidates for any target protein; optimization of candidate geometries and determination of partial atomic charges using quantum chemical methods [92, 93]; assignment of force field parameters to the ligand and the target protein/DNA; docking of the candidates in the active site of the drug target via Monte Carlo methods [90, 96]; estimation of binding free energies through empirical scoring functions [97–99]; and rigorous analyses of the structure and energetics of binding [100, 101] for further lead optimization. If desired, this computational pathway runs as an automated pipeline for lead design. The software takes the three dimensional structure of the target protein or the nucleotide sequence of the DNA as input; the remaining functionality is built into the suite, which arrives at the structure and binding free energy of the protein/DNA-candidate molecule complex. The methodology treats the biomolecular target and candidate molecules at the atomic level and the solvent as a dielectric continuum.
Validation studies on a large number of protein-ligand and DNA-ligand complexes suggest that the performance of Sanjeevini is at the state of the art. The software is freely accessible over the internet. We describe here how to use the server to accelerate lead molecule discovery.
Target-molecule complexes with high binding affinity can, in favorable cases, be subjected to molecular dynamics simulations to investigate the effects of conformational flexibility, solvent, salt and entropic factors. About 100 or more structures may be collected along the trajectories to obtain converged average binding free energies of the complexes. Post facto energy component analyses of the target-ligand complex can then guide chemical modifications of the candidate molecule to enhance binding affinity. The modules described above work together in a pipeline, as depicted in the architecture (Figure 2).
The Sanjeevini software comprises several high accuracy modules that work in a pipeline. Given a protein or DNA drug target and, optionally, a ligand molecule, it helps in designing lead molecules.
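The averaging of binding free energies over trajectory snapshots can be sketched as follows. This is a minimal illustration, not Sanjeevini's implementation: the function names, the window of 20 snapshots and the 0.1 kcal/mol tolerance are assumptions chosen only to show one simple way of judging whether a running average has settled.

```python
def running_average(values):
    """Return the cumulative mean after each snapshot."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def is_converged(values, window=20, tolerance=0.1):
    """Heuristic convergence test (assumed, not from the article).

    The running mean is treated as converged when it varies by less than
    `tolerance` (kcal/mol) over the last `window` snapshots.
    """
    averages = running_average(values)
    if len(averages) < window:
        return False
    tail = averages[-window:]
    return max(tail) - min(tail) < tolerance
```

With per-snapshot binding free energies in a list `energies`, `running_average(energies)[-1]` is the trajectory average, and `is_converged(energies)` indicates whether collecting more snapshots is likely to change it appreciably.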
Sanjeevini comprises three scoring functions, christened Bappl, Bappl-Z and PreDDICTA, for protein-ligand complexes, Zn containing metalloproteinase-ligand complexes and DNA-ligand complexes respectively. Bappl is an all atom energy based empirical scoring function comprising electrostatics, van der Waals, desolvation and loss of conformational entropy of protein side chains upon ligand binding. Bappl-Z scores protein-ligand complexes with Zn as the metal ion in the binding site. It uses a non-bonded approach to model the interactions of the zinc ion with all other atoms of the protein-ligand complex, along with the four terms described for Bappl. PreDDICTA is an all atom energy based scoring function which computes the binding affinity of a DNA oligomer with a drug molecule bound non-covalently in the minor groove. The function combines electrostatics, steric complementarity, entropic effects and solvent effects, including hydrophobicity. Very few high accuracy scoring functions have been reported in the literature for DNA-ligand complexes, and PreDDICTA thus provides a strong platform for designing molecules that bind specifically to DNA. The program takes a DNA-ligand complex as input and outputs the binding free energy associated with the complex.
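The structure of a four-term empirical scoring function of the kind described for Bappl can be sketched as a weighted sum of energy components. This is an assumption-laden illustration only: the class and function names are hypothetical, the unit weights and zero intercept are placeholders, and the fitted coefficients and functional forms of Bappl itself are not given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    """Per-complex energy terms in kcal/mol (values supplied by the caller)."""
    electrostatics: float
    van_der_waals: float
    desolvation: float
    entropy_loss: float  # loss of side chain conformational entropy on binding


def empirical_score(components, weights=(1.0, 1.0, 1.0, 1.0), intercept=0.0):
    """Weighted sum of the four terms.

    The default weights and intercept are placeholders, not Bappl's
    fitted parameters.
    """
    w_el, w_vdw, w_desolv, w_ent = weights
    return (
        w_el * components.electrostatics
        + w_vdw * components.van_der_waals
        + w_desolv * components.desolvation
        + w_ent * components.entropy_loss
        + intercept
    )
```

In an empirical scheme of this type, the weights would be obtained by fitting predicted scores to experimental binding free energies for a training set of complexes.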
The docking module of Sanjeevini comprises three programs, christened ParDOCK, AADS and DNADock [96, 99]. ParDOCK is an all atom energy based Monte Carlo protein-ligand docking algorithm. The module requires a reference protein-ligand complex (the target protein bound to a reference ligand at its binding site) as input, along with the candidate molecule to be docked. The algorithm docks the ligand molecule to the reference protein and outputs five docked structures representing different poses of the ligand, along with the binding free energies of the docked poses predicted by the Bappl/Bappl-Z scoring function. The program is built into Sanjeevini for docking ligand molecules to target proteins for which a crystal structure of the protein-ligand complex is available in the literature. AADS (an automated active site identification, docking and scoring protocol for protein targets based on physico-chemical descriptors) predicts all potential binding sites in a protein and docks the input ligand molecule at the top ten predicted binding sites. Eight docked structures are generated at each of these ten sites and scored using the Bappl/Bappl-Z scoring function. The five energetically most favorable of the eighty structures are emailed back to the user along with their binding free energy values. The program has been tested previously on more than 600 protein-ligand complexes with known binding site information. AADS predicted the true binding sites within the top ten sites with 100% accuracy. A blind docking study on 170 protein targets with known binding sites and known experimental binding free energies of the complexed ligands was also performed.
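The final selection step of the AADS protocol (ten sites, eight poses per site, five poses returned) can be sketched as below. This is an illustrative sketch under stated assumptions: `DockedPose` and `select_best_poses` are hypothetical names, and the energies are synthetic placeholders generated by a formula, not output from Bappl or Bappl-Z.

```python
import heapq
from typing import NamedTuple


class DockedPose(NamedTuple):
    site: int      # rank of the predicted binding site (1 to 10)
    pose: int      # index of the docked structure at that site (1 to 8)
    energy: float  # predicted binding free energy, kcal/mol (lower is better)


def select_best_poses(poses, n_best=5):
    """Return the n_best poses with the lowest binding free energy."""
    return heapq.nsmallest(n_best, poses, key=lambda p: p.energy)


# Synthetic placeholder energies for 10 sites x 8 poses = 80 docked structures.
poses = [
    DockedPose(site, pose, -(site * 0.5) - (pose * 0.1))
    for site in range(1, 11)
    for pose in range(1, 9)
]
best = select_best_poses(poses)
```

Ranking across all eighty structures, rather than taking the best pose per site, matches the description above, in which the five energetically most favorable structures overall are reported.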
The methodology restored the ligands to their native binding sites in the above 170 complexes with an accuracy of 90% for the top ranked docked structure, and the predicted binding free energies of the top ranked docked structures correlated well with experiment (correlation coefficient ~ 0.82; see Figure F4 of the AADS publication). The RMSD (Root Mean Square Deviation) between the crystal and docked structures is within 2 Å in more than 80% of the cases (Figure F5 of the same publication). DNADock is an all atom Monte Carlo based docking algorithm which has been implemented in parallel mode and incorporated into the software suite. The program takes a nucleotide sequence and the candidate ligand molecule as input; generates canonical A or B DNA, or an average molecular dynamics B DNA structure [124, 125], based on the user's choice; docks the candidate ligand molecule in the minor groove of the DNA; and scores the docked structures with the PreDDICTA scoring function. Five docked structures with their binding free energy values are reported back to the user.
RASPD (a rapid identification of hit molecules for target proteins via physico-chemical descriptors) is a computationally fast protocol for identifying hit molecules for any target protein. The methodology establishes complementarity between the target protein and the candidate molecule in physico-chemical descriptor space via a QSAR type approach and rapidly generates a reasonable estimate of the binding energy. The accuracies of RASPD are discussed elsewhere (Mukherjee and Jayaram, manuscript in preparation).
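The RMSD criterion used to compare a docked pose with the crystal pose can be computed as in the sketch below. It assumes that both poses list the same ligand atoms in the same order and are already expressed in the coordinate frame of the target, so no superposition is applied; the function name and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples in angstroms.
    No fitting or superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom ligand: the docked pose is shifted 1 angstrom along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in crystal]
within_cutoff = rmsd(crystal, docked) <= 2.0
```

A pose with an RMSD of at most 2 Å from the crystal structure is the success criterion quoted above.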
Published results for protein-ligand scoring functions originating in physics based or knowledge based methods include DFIRE (r = 0.63), X-Score (r = 0.77), SMoG (r = 0.79), BLEEP (r = 0.74), PMF (r = 0.78), SCORE (r = 0.81), LUDI (r = 0.83), ChemScore (r = 0.84), Ligscore (r = 0.87) and KGS, which combines X-Score and PLP (r = 0.82). The Sanjeevini scoring function for protein-ligand complexes yielded a correlation coefficient (r) of 0.87. Very few scoring functions have been reported in the literature for DNA-ligand complexes; one of them is the KS score (r = 0.68). The Sanjeevini scoring function for DNA-ligand complexes was tested, without any training, on 39 DNA-ligand complexes and yielded a correlation coefficient of 0.90. PreDDICTA has been reported to perform better than some of the existing scoring functions for DNA-ligand complexes in the literature. Scoring functions for zinc containing metalloprotein-ligand complexes reported in the literature include the work of Raha et al. (R² = 0.69), Hou et al. (R² = 0.85), Hu et al. (0.50), Rizzo et al. (R² = 0.74) and Khandelwal et al. (R² = 0.90). Sanjeevini yielded a correlation coefficient R² = 0.82 on zinc containing metalloprotein-ligand complexes. The overall correlation coefficient of Sanjeevini for protein/DNA-ligand complexes (Figure 3) is 0.88.
When designing new molecules for a target protein/DNA, the user may have experimental (Ki/IC50/Kd) values of known binders reported in the literature. Before designing new candidate molecules against the target, we recommend that the Sanjeevini user first predict the binding free energies of the known binders and plot the correlation between the experimental and predicted binding free energies. This gives a relative understanding of the predicted binding free energies vis-a-vis experiment and helps in discriminating between drug-like and non-drug-like molecules against a given target. With this proposal, we present a few case studies on an important class of drug targets, which can serve as examples for Sanjeevini users to apply the same methodology to various drug targets and come up with suggestions of hit molecules.
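The calibration step recommended above can be sketched as follows. This is a minimal illustration, assuming that experimental Ki or Kd values in mol/L are converted to binding free energies with the standard relation ΔG = RT ln K at 298.15 K and that agreement is summarized with the Pearson correlation coefficient; the function names and example numbers are hypothetical, and IC50 values are not covered because their conversion depends on assay conditions.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol K)


def binding_free_energy(k_molar, temperature=298.15):
    """Convert a dissociation or inhibition constant (mol/L) to dG in kcal/mol."""
    if k_molar <= 0:
        raise ValueError("constant must be positive")
    return GAS_CONSTANT * temperature * math.log(k_molar)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical known binders: experimental Ki values and predicted energies.
experimental = [binding_free_energy(k) for k in (1e-9, 1e-7, 1e-5)]
predicted = [-11.0, -9.0, -7.0]  # placeholder server predictions, kcal/mol
r = pearson_r(experimental, predicted)
```

A high value of `r` for the known binders suggests that predicted energies for new candidates against the same target can be compared with each other with reasonable confidence.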
[Table: Docking and scoring studies of experimentally reported trypsin binding molecules using Sanjeevini. First column: Ligand (molecular formula).]
In bovine pancreatic trypsin, the amino acids mainly involved in interactions with ligand molecules are reported to be Ser 172, Asp 171 and Gly 196 in the target protein (PDB ID 1S0R). We visualized the docked structures obtained from the above blind docking studies of trypsin inhibitors against the target (PDB ID 1S0Q) to check whether the top ranked docked structures restore the native ligand pose in the native binding site of the target. In both case studies, the Sanjeevini protocol gave good estimates of the binding free energies, as shown by the high correlation coefficients obtained (Figures 6 and 7) with two different methodologies, one for inputs with known binding site information and one for inputs without it. This illustrates the strength of the Sanjeevini software.
Improvements planned for future versions of Sanjeevini are: (i) consideration of the flexibility of the candidate ligand molecules and of the active site amino acids of the target, (ii) docking and scoring of the candidate molecules in the presence of a cofactor or multiple metal ions, (iii) extension of the DNA docking and scoring methodology to DNA binding intercalators and, eventually, (iv) creation of an assembly line from genomes to hits.
This article presents Sanjeevini, a state of the art, structure based computer aided drug discovery (SBDD/CADD) software suite implemented on an 80 processor cluster and presented to the user as a freely accessible server. The high accuracy of the modules and a user friendly environment should help the user in designing novel lead compounds.
Project name: Sanjeevini
Project home page: http://www.scfbio-iitd.res.in/sanjeevini/sanjeevini.jsp
Operating systems: Linux
Programming languages: C++ and Java
Any restrictions to use by non-academics: none
A detailed tutorial with various inputs and outputs of Sanjeevini in the form of snapshots is available at the following link http://www.scfbio-iitd.res.in/sanjeevini/example/Tutorial.pdf. The coordinates of the validation dataset of 335 protein/DNA targets are available at the following link http://www.scfbio-iitd.res.in/sanjeevini/dataset.jsp.
This work is carried out under programme support to computational biology from the Department of Biotechnology, Govt. of India. Ms. Tanya Singh is a recipient of Senior Research Fellowship from Council of Scientific & Industrial Research, Govt. of India. Goutam Mukherjee is a recipient of Senior Research Fellowship from the University Grants Commission. The authors are thankful to Mr. Bharat Lakhani, for help in web-enabling the current version of Sanjeevini.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 17, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S17.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.