High-Throughput parallel blind Virtual Screening using BINDSURF
© Sánchez-Linares et al.; licensee BioMed Central Ltd. 2012
Published: 7 September 2012
Virtual Screening (VS) methods can considerably aid clinical research by predicting how ligands interact with drug targets. Most VS methods assume a single binding site for the target, usually derived from the interpretation of the protein crystal structure. However, it has been demonstrated that in many cases diverse ligands interact with unrelated parts of the target, a relevant fact that many VS methods fail to take into account.
We present BINDSURF, a novel VS methodology that scans the whole protein surface in order to find new hotspots where ligands might potentially interact, and which is implemented on last-generation massively parallel GPU hardware, allowing fast processing of large ligand databases.
BINDSURF is an efficient and fast blind methodology for the ligand-dependent determination of protein binding sites. It uses the massively parallel architecture of GPUs for fast pre-screening of large ligand databases. Its results can also guide the subsequent application of more detailed VS methods to specific binding sites of proteins, and its use can aid in drug discovery, design and repurposing, and therefore help considerably in clinical research.
In clinical research, it is crucial to determine the safety and effectiveness of current drugs and to translate findings from basic research (discovery of new leads and active compounds) into meaningful health outcomes. Both objectives require processing the large set of protein structures available in biological databases such as the PDB, as well as those derived from genomic data using techniques such as homology modeling. Laboratory screening and compound optimization are expensive and slow, but bioinformatics can greatly help clinical research for these purposes by predicting the toxicity of drugs and their activity on untested targets, and by evolving discovered active compounds into drugs ready for clinical trials.
This can be achieved thanks to the availability of bioinformatics tools and Virtual Screening (VS) methods that allow the required hypotheses to be tested before clinical trials. Nevertheless, current VS methods, such as docking, fail to make good toxicity and activity predictions because they are constrained by the available computational resources; even the fastest VS methods of today cannot process large biological databases in a reasonable time frame. These constraints therefore impose serious limitations on many areas of translational research.
The use of last-generation massively parallel hardware architectures such as Graphics Processing Units (GPUs) can help overcome this problem. The GPU has become increasingly popular in the high-performance computing arena by combining impressive computational power with the demanding requirements of real-time graphics and the lucrative mass market of the gaming industry. Scientists have exploited this power in arguably every computational domain, and the GPU has emerged as a key resource in applications where parallelism is the common denominator. To maintain this momentum, new hardware features have been progressively added by NVIDIA to its range of GPUs, with the Fermi architecture being the most recent milestone on this path.
Therefore, GPUs are well suited to overcoming the lack of computational resources in VS methods, accelerating the required calculations and allowing improvements in the biophysical models that were not affordable in the past. We have previously worked in this direction, showing how VS methods can benefit from the use of GPUs [7–9].
Moreover, another important shortcoming of VS methods is that they usually assume that the binding site derived from a single crystal structure will be the same for different ligands, whereas it has been shown that this is not always the case, so it is crucial to avoid this very basic assumption. In this work we present a novel VS methodology called BINDSURF, which takes advantage of the massive parallelism and high arithmetic intensity of GPUs to speed up the required calculations on low-cost, low-consumption desktop machines, providing new and useful information about targets and thus improving key toxicity and activity predictions. In BINDSURF a large ligand database is screened against the target protein over its whole surface simultaneously. Afterwards, the information obtained about novel potential protein hotspots is used to perform more detailed calculations using any particular VS method, but only for a reduced, selected set of ligands. Other authors have also performed VS studies over whole protein surfaces using different approaches and screening small ligand databases, but as far as we know none of them has been implemented on GPUs and used in the same fashion as BINDSURF.
The main idea underlying our VS method BINDSURF is the protein surface screening method, implemented in parallel on GPUs. Essentially, VS methods screen a large database of molecules in order to find which ones fit some established criteria. In the discovery of new leads, compound optimization, toxicity evaluation and other stages of the drug discovery process, we screen a large compound database to find a small molecule that interacts in a desired way with one or many different receptors. Among the many VS methods available for this purpose we decided to use protein-ligand docking [13, 14]. These methods try to obtain rapid and accurate predictions of the 3D conformation a ligand adopts when it interacts with a given protein target, as well as the strength of this binding, in terms of its scoring function value. Docking simulations are typically carried out in a very specific part of the protein surface in methods such as Autodock, Glide and DOCK, to name a few. This region is commonly derived from the position of a particular ligand in the crystal structure, or from the crystal structure of the protein without any ligand. The former is possible when the protein is co-crystallized with the ligand, but it may happen that no crystal structure of this ligand-protein pair is available. Nevertheless, the main problem is the assumption that, once the binding site is specified, many different ligands will interact with the protein in the same region, completely discarding the other areas of the protein.
We propose to overcome this problem by dividing the whole protein surface into defined regions. Next, docking simulations for each ligand are performed simultaneously in all the specified protein spots. Following this approach, new hotspots may be found after examining the distribution of scoring function values over the entire protein surface, and this information could lead to the discovery of novel binding sites. Compared with a typical docking simulation performed in only one region of the surface, the main drawback of this approach lies in its increased computational cost. We decided to pursue this direction and show how this limitation can be overcome thanks to GPU hardware and new algorithmic designs.
In essence, in a docking simulation we calculate the ligand-protein interaction energy for a given starting configuration of the system, represented by a scoring function. In BINDSURF the scoring function comprises electrostatic (ES), Van der Waals (VDW) and hydrogen bond (HBOND) terms. Furthermore, docking methods normally assume that the minimum of the scoring function over all ligand-protein conformations accurately represents the conformation the system adopts when the ligand binds to the protein. Thus, when the simulation starts, we try to minimize the value of the scoring function by continuously performing random or predefined perturbations of the system, calculating the new value of the scoring function at each step, and accepting it or not following approaches such as the Monte Carlo minimization method.
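The minimization loop described above can be sketched as follows. This is a minimal CPU-side Python illustration of a Metropolis-style Monte Carlo minimizer; the names (score, perturb, temperature) are ours rather than BINDSURF's, and the quadratic toy scoring function merely stands in for the ES + VDW + HBOND terms.

```python
import math
import random

def monte_carlo_minimize(score, pose, perturb, steps=500, temperature=1.0, seed=0):
    """Metropolis-style minimization: propose a random perturbation of the
    pose, keep it if it lowers the score, and otherwise accept it with
    probability exp(-dE/T) so the search can escape local minima."""
    rng = random.Random(seed)
    current_pose, current_e = pose, score(pose)
    best_pose, best_e = current_pose, current_e
    for _ in range(steps):
        candidate = perturb(current_pose, rng)
        e = score(candidate)
        dE = e - current_e
        if dE <= 0 or rng.random() < math.exp(-dE / temperature):
            current_pose, current_e = candidate, e
            if e < best_e:
                best_pose, best_e = candidate, e
    return best_pose, best_e

# Toy usage: minimize a 1D quadratic "scoring function" with random shifts.
pose, energy = monte_carlo_minimize(
    score=lambda x: (x - 2.0) ** 2,
    pose=10.0,
    perturb=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    steps=2000,
)
```

In BINDSURF the same accept/reject logic runs in parallel on the GPU for many simulations at once, with the perturbations being random ligand shifts and rotations.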
One of the main computational bottlenecks of docking simulation methods is the calculation of the scoring function. We have already implemented non-bonded interaction kernels on GPUs for the case where direct summation is used to calculate the electrostatic interaction term, achieving speedups of up to 260 times over the sequential counterpart; we will refer to this kernel as DIRECT_KERNEL. When dealing with rigid or mixed flexible-rigid systems we can further improve the speed of the calculations using precomputed grids; this kernel will be referred to as GRID_KERNEL. We studied the influence and suitability of both kernels for the design of BINDSURF, which was implemented entirely from scratch on the GPU.
NVIDIA GPU platforms can be programmed using the Compute Unified Device Architecture (CUDA) programming model, which makes the GPU operate as a highly parallel computing device. Each GPU device is a scalable processor array consisting of a set of SIMT (Single Instruction, Multiple Threads) multiprocessors (SMs), each of them containing several stream processors (SPs). Different memory spaces are available on each GPU in the system. The global memory (also called device or video memory) is the only space accessible by all multiprocessors; it is the largest and slowest memory space and is private to each GPU. Moreover, each multiprocessor has its own private memory space called shared memory, which is smaller but has lower access latency than global memory.
The CUDA programming model is based on a hierarchy of abstraction layers: the thread is the basic execution unit and is mapped to a single SP. A block is a batch of threads that can cooperate because they are assigned to the same multiprocessor, and therefore share all the resources of that multiprocessor, such as the register file and shared memory. A grid is composed of several blocks, which are equally distributed and scheduled among all multiprocessors. Finally, the threads of a block are divided into batches of 32 threads called warps. The warp is the scheduling unit, so the threads of the same block are scheduled on a given multiprocessor warp by warp. The programmer declares the number of blocks, the number of threads per block and their distribution to arrange parallelism given the program constraints (i.e., data and control dependencies).
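The mapping from this hierarchy to data can be illustrated with a short sequential emulation in Python; the helper names (launch, global_thread_id) are ours, and the index computation mirrors the standard CUDA expression blockIdx.x * blockDim.x + threadIdx.x.

```python
def global_thread_id(block_idx, thread_idx, threads_per_block):
    """CUDA-style flat index: each block of the grid contributes
    threads_per_block consecutive thread ids."""
    return block_idx * threads_per_block + thread_idx

def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate a 1D CUDA kernel launch: run `kernel`
    once per thread of the grid."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(global_thread_id(b, t, block_dim), *args)

# Toy kernel: each thread doubles one element of a vector.
data = list(range(8))
out = [0] * 8
launch(2, 4, lambda tid, src, dst: dst.__setitem__(tid, 2 * src[tid]), data, out)
# out == [0, 2, 4, 6, 8, 10, 12, 14]
```

On real hardware the two loops run concurrently across SMs and SPs; the emulation only shows how each thread finds the data element it owns.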
Algorithm 1 BINDSURF overview
1: Read main simulation configuration file bindsurf_conf.inp
2: Generate ES and VDW grids (es_grid, vdw_grid) of the protein using GEN_GRID
3: Generate ligand_conformations with GEN_CONF
4: Read protein and calculate surface_spots using GEN_SPOTS
5: for all ligand_conformations do
6: Calculate initial_configuration of the system on GPU (protein, surface_spots, ligand_conformation) using GEN_INI
7: Surface Screening using SURF_SCREEN (initial_configuration, ligand_conformation, protein, surface_spots, es_grid, vdw_grid)
8: end for
9: Process results
This section introduces the underlying design of BINDSURF, which is summarized in Algorithm 1. All necessary simulation parameters are specified in a configuration file called bindsurf_conf.inp, which contains the following information:
Molecular data: file names for the protein, the ligand and the input-output directories, as well as the number of ligand conformations to generate from the input one.
Force field data: ES, VDW and HBOND force field parameters used in the scoring function. By default we provide the OPLS force field.
Monte Carlo minimization parameters: number of simulation steps, energy cutoff, and maximum values for random shifts and rotations.
Output-related data: file names for the different output files, such as those for graphical display through PyMOL, energy distributions, detailed information about the poses with the lowest scoring function values, etc.
Once bindsurf_conf.inp has been read and processed, we generate an ensemble of ligand conformations from the input ligand using GEN_CONF, an ad hoc version of the FlexScreen docking program. Given the modular structure of BINDSURF, any other program that generates ligand conformational ensembles can be used for the same purpose.
Next, we need the information describing the areas of the protein surface (spots) where each individual simulation will take place. It is important to find an optimal number of spots, since too many spots would unnecessarily increase the total computation time, while too few would not completely cover the protein surface. We have found that a good strategy, implemented in GEN_SPOTS, consists in calculating the coordinates of the alpha carbon of each residue of the input protein. All parallel simulations take place in spherical regions, defined by these spot centers, with a cutoff radius of 10 Angstroms. This module can easily be replaced by any other that simply outputs the list of spot coordinates.
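The GEN_SPOTS strategy can be sketched as follows. This Python illustration (not BINDSURF's actual code) collects one spot per residue from the alpha-carbon records of a PDB file, relying on the fixed-column layout of the PDB format; the coordinates below are made up for the example.

```python
def read_ca_spots(pdb_lines):
    """Collect one spot per residue: the coordinates of its alpha carbon.
    Uses the fixed columns of the PDB format (atom name in columns 13-16,
    x/y/z in columns 31-54)."""
    spots = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            spots.append((x, y, z))
    return spots

CUTOFF_RADIUS = 10.0  # Angstroms: each simulation runs inside this sphere

pdb = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "ATOM      3  C   ALA A   1      10.675   5.424  -4.155  1.00  0.00           C",
    "ATOM      7  CA  GLY A   2       8.640   4.700  -2.851  1.00  0.00           C",
]
spots = read_ca_spots(pdb)
# two residues, hence two spherical simulation regions
```

Each returned center, together with CUTOFF_RADIUS, defines one spherical region in which the parallel docking simulations run.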
In the fourth step of Algorithm 1 we generate the electrostatic (ES) and van der Waals (VDW) grids of the protein using the GPU program GEN_GRID, as described in our previous work. The main idea underlying GEN_GRID is to assume that the protein is rigid and then precompute the electrostatic potential and the neighbour lists on a regular spatial arrangement of points, called a grid, that encloses both the protein and the ligands. GEN_GRID reads its parameters from the configuration files, generates the grids belonging to the input protein and writes them to different output files that are subsequently processed during conformation generation and surface screening, resulting in fast computations of the scoring function.
Before the surface screening process can begin, we need to generate all necessary simulation parameters from the input ligand and protein with the GPU program GEN_INI. For a given ligand_conformation, GEN_INI performs random ligand translations and rotations in order to obtain valid starting conformations of the ligand on each protein surface spot. Once the system is set up, the program SURF_SCREEN performs the protein surface screening procedure. Finally, BINDSURF reports statistics of the results obtained, PyMOL files for convenient 3D visualization, and many other reports, as will be shown in the results section.
In this section we introduce the main core of the BINDSURF program, namely SURF_SCREEN. Algorithm 2 shows its host-side pseudocode. First, the previously obtained information about the protein, its precomputed grids and surface spots, the ligand conformation and the initial simulation states is transferred from the CPU to the GPU, where the whole simulation process takes place.
Algorithm 2 Host side of the SURF_SCREEN core of the BINDSURF application for a given ligand conformation
1: CopyDataCPUtoGPU (protein, es_grid, vdw_grid, surface_spots, ligand_conformation; shifts_set, quaternions_set, initial_configuration)
2: for i = 0 to number_of_spots/BLOCKSURF do
3: for n = 1 to numSteps do
4: if n is even then
5: GenerateShiftsKernel(randomStates, shifts_set)
6: else
7: GenerateRotationsKernel(randomStates, quaternions_set)
8: end if
9: Energy(es_grid, vdw_grid, protein, ligand_conformation, shifts_set, quaternions_set, newEnergy)
10: if n is even then
11: UpdateShiftsKernel(randomStates, oldEnergy, newEnergy)
12: else
13: UpdateRotationsKernel(randomStates, oldEnergy, newEnergy)
14: end if
15: end for
16: FindMinima(oldEnergy, minIndexes, minEnergy)
17: CopyDataGPUtoCPU(minIndexes, minEnergy, shifts_set, quaternions_set)
18: end for
On each spot, many simulations (in our case 128) are carried out in parallel on the GPU for each ligand conformation. The protein-ligand interaction energy is minimized using a parallel adaptation of the Monte Carlo method with the Metropolis algorithm. The random numbers required by Monte Carlo are generated using the NVIDIA CURAND library and are later employed to perform the required ligand rotations and displacements in parallel.
Since this is a minimization process, each iteration depends on the previous one. Thus, the loop between steps 3 and 15 of Algorithm 2 cannot be parallelized; only the internal computation is parallelized, i.e. the generation of shifts and rotations, the energy calculation and the update of the simulation state.
Moreover, we cannot simultaneously launch all the threads needed to execute all the simulations in parallel, because the number of threads required is greater than the maximum allowed. Hence, we only perform simulations for a maximum number of spots (the BLOCKSURF value) at a time (line 2 of Algorithm 2).
On the one hand, each simulation needs its own modifiable copy of the ligand. On the other hand, the number of simulations required in this process is huge, so it is not viable to keep a copy of all the information related to the ligand atoms (for instance, all the atom positions) in GPU memory for every simulation. An alternative representation of the ligand information, which is independent of the ligand size and thus fits in the scarce GPU memory, is to keep a single model of the ligand in GPU constant memory. The state of each simulation is then represented by one 3D point and one quaternion, which represent the displacement and the rotation about the origin accumulated along the simulation. This solution can be applied because we use a rigid representation of the molecules.
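The pose representation described above, one 3D shift plus one quaternion per simulation, can be sketched in Python as follows; the helper names are ours, but the operation is the standard quaternion rotation p' = q (0, p) q*.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def apply_pose(atom, quaternion, shift):
    """Current atom position: rotate the constant ligand-model atom about
    the origin by the accumulated quaternion, then add the accumulated shift."""
    w, x, y, z = quaternion
    px, py, pz = atom
    # rotated p = q * (0, p) * conjugate(q)
    _, rx, ry, rz = quat_mul(quat_mul(quaternion, (0.0, px, py, pz)),
                             (w, -x, -y, -z))
    return (rx + shift[0], ry + shift[1], rz + shift[2])

# A 90-degree rotation about the z axis, followed by a unit shift along x
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
pos = apply_pose((1.0, 0.0, 0.0), q, (1.0, 0.0, 0.0))
# pos is approximately (1.0, 1.0, 0.0)
```

Because only the shift and the quaternion change per step, each simulation state stays constant-size regardless of how many atoms the ligand has.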
The Monte Carlo process alternates rotation and displacement steps. Thus, we developed two different kernels: (1) GenerateShiftsKernel, which generates the displacements of the simulations, and (2) GenerateRotationsKernel, which generates the rotations. These kernels generate a random move using a local copy of the random state of each simulation and do not modify the random state in global memory. Later, if the move is accepted, the random state in global memory is updated with the random state that generated it; otherwise, the random state is left untouched and the move is undone in the simulation state.
The Energy function in Algorithm 2 launches highly optimized GPU non-bonded interaction kernels for the description of the electrostatic, Van der Waals and hydrogen bond interactions between the ligand and the protein. These kernels, named ESKernel and VDWKernel, are described in later sections. Once the energy is calculated, UpdateShiftsKernel and UpdateRotationsKernel check whether the new energy values are smaller than the previous ones, and if so the move is applied permanently to the simulation state.
The minimum energy value found within each surface sphere is obtained by the FindMinima function. This function launches two kernels: (1) a kernel that reduces the energy vector, which stores all energy values calculated in all simulations, and (2) a kernel that compacts this vector in order to reduce the data transferred to the CPU. Finally, we obtain a vector containing the minimum energy obtained in the simulations for each spot.
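What FindMinima computes can be sketched sequentially in Python (on the GPU it runs as reduction and compaction kernels); the names and the per-spot layout of the energy vector are illustrative assumptions.

```python
def find_minima(energies, sims_per_spot):
    """Per-spot reduction: `energies` holds sims_per_spot values for each
    spot; return, for every spot, the minimum energy and the flat index that
    produced it, i.e. the compacted result transferred back to the CPU."""
    min_indexes, min_energy = [], []
    for spot in range(len(energies) // sims_per_spot):
        chunk = energies[spot * sims_per_spot:(spot + 1) * sims_per_spot]
        best = min(range(sims_per_spot), key=chunk.__getitem__)
        min_indexes.append(spot * sims_per_spot + best)
        min_energy.append(chunk[best])
    return min_indexes, min_energy

# 2 spots x 4 simulations each
idx, e = find_minima([5.0, -1.0, 3.0, 2.0, 0.5, 4.0, -2.5, 1.0], 4)
# idx == [1, 6], e == [-1.0, -2.5]
```

Compacting to one (index, energy) pair per spot is what keeps the GPU-to-CPU transfer small compared with copying every simulation's energy.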
Once the simulations have been carried out on each spot of the protein surface, BINDSURF produces as final output, for each ligand, detailed information about the protein spots where the strongest interactions are found for the different ligand conformations. This information can be passed directly to PyMOL to obtain a graphical depiction of the results. The information about the hotspots obtained for the different ligands, together with the set of ligands with the lowest scoring function values, is intended to be used later by a more detailed VS methodology that screens only this resulting set of ligands in the hotspots found by BINDSURF. Another option is to pass the ligand binding pose obtained by a detailed VS method for a given binding site as input to BINDSURF, to check whether the ligand could interact with other parts of the protein surface.
In the next subsections we describe how the scoring function is calculated in both the CPU and GPU versions.
Algorithm 3 Sequential pseudocode for the calculation of the electrostatic potential
1: for i = 1 to N_simulations do
2: for j = 1 to nlig do
3: energy[i * nlig + j] = q_j * interpolate(lig[i * nlig + j], ESGrid)
4: end for
5: end for
The precomputed ES grid is generated by the program GEN_GRID as described in our previous work, and afterwards it is read from file by SURF_SCREEN and loaded into memory. The electrostatic potential for the protein-ligand system is calculated as follows: for each ligand atom i with charge q_i at point P_i we find the eight closest protein grid neighbours. Next, an interpolation procedure is applied to estimate the value of the electrostatic potential due to all protein atoms at P_i. The same procedure is applied to all ligand atoms, and the contributions are summed. The pseudocode is shown in Algorithm 3, where N_simulations is the number of simulations, nlig is the number of atoms of each ligand and the function interpolate performs the calculation of the electrostatic potential for each atom.
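The interpolation over the eight surrounding grid nodes can be sketched in Python as follows; this is a generic trilinear interpolation matching the description above, not BINDSURF's actual kernel, and the toy grid is made up for the example.

```python
def trilinear(grid, spacing, point):
    """Interpolate the potential at `point` from the eight grid nodes that
    surround it; grid[i][j][k] holds the precomputed potential at
    (i*spacing, j*spacing, k*spacing)."""
    fx, fy, fz = (c / spacing for c in point)
    i, j, k = int(fx), int(fy), int(fz)
    tx, ty, tz = fx - i, fy - j, fz - k
    v = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # weight of each corner: product of 1D interpolation factors
                w = ((tx if di else 1 - tx) *
                     (ty if dj else 1 - ty) *
                     (tz if dk else 1 - tz))
                v += w * grid[i + di][j + dj][k + dk]
    return v

def es_energy(charges, positions, grid, spacing):
    """Sum q_i * potential(P_i) over all ligand atoms, as in Algorithm 3."""
    return sum(q * trilinear(grid, spacing, p)
               for q, p in zip(charges, positions))

# 2x2x2 toy grid whose potential equals the x coordinate: phi(x, y, z) = x
grid = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
e = es_energy([2.0], [(0.25, 0.5, 0.5)], grid, spacing=1.0)
# e == 2.0 * 0.25 == 0.5
```

On the GPU this weighted average is what the texture hardware evaluates in a single fetch, which is why texture memory pays off for this kernel.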
Algorithm 4 GPU pseudocode for the calculation of the electrostatic potential
1: for all nBlocks do
2: dlig = rotate(clig[myAtom], myQuaternion)
3: ilig = shift(myShift, dlig)
4: index = positionToGridCoordinates (ESGridInfo, ilig)
5: energy_shared[myAtom] = charge[myAtom] * accessToTextureMemory(ESGrid, index)
6: totalEnergy = parallelReduction(energy_shared)
7: if threadId == numThreads%nlig then
8: energy[mySimulation] = totalEnergy
9: end if
10: end for
In a previous work we studied different strategies for the GPU implementation of the previous algorithm applied to many different ligands. In that study we reported that the use of texture memory considerably decreases the time needed for the interpolation. Therefore, in BINDSURF we decided to use texture memory to obtain the electrostatic potential by linear interpolation in a 3D grid. Algorithm 4 shows the pseudocode of the ES kernel, where clig, charge and nlig are the ligand model (atom positions, charges and number of atoms); myAtom, myQuaternion and myShift are the atom assigned to the thread and the rotation and shift belonging to the simulation mySimulation assigned to the thread; ESGridInfo and ESGrid are the grid description and the grid data (the latter stored in GPU texture memory); and energy_shared is an auxiliary vector in shared memory.
Each thread calculates the energy of a single atom. First, each thread obtains the current atom position from the ligand model and the simulation state using the functions rotate and shift. Then it calculates the grid position (function positionToGridCoordinates), interpolates the energy value by accessing texture memory (function accessToTextureMemory) and stores the result in shared memory (line 5). Finally, the threads of the same simulation sum their results through a parallel reduction (line 6) of complexity O(log(n)), and one of these threads writes the final result to global memory (lines 7-8).
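The O(log(n)) reduction of line 6 can be sketched sequentially in Python; on the GPU each iteration of the inner loop is a separate thread operating on shared memory, and the thread count is assumed to be padded to a power of two.

```python
def parallel_reduction(values):
    """Tree-shaped sum as performed in shared memory: at every step each
    'thread' adds the element half a stride away, halving the active range,
    so n values are reduced in O(log n) steps."""
    vals = list(values)
    stride = len(vals) // 2  # length assumed to be a power of two
    while stride > 0:
        for tid in range(stride):  # these iterations run as parallel threads
            vals[tid] += vals[tid + stride]
        stride //= 2
    return vals[0]

total = parallel_reduction([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
# total == 36.0
```

After the loop finishes, vals[0] holds the per-simulation energy that one designated thread writes back to global memory.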
Algorithm 5 Sequential pseudocode for the calculation of the VDW and HBOND energies
1: for i = 1 to N_simulations do
2: for j = 1 to nlig do
3: index = positionToGridCoordinates(VDWGridInfo, j)
4: for k = 0 to numNeighbours(VDWGrid[index]) do
5: vdwTerm += vdwEnergy(j, VDWGrid[index][k])
6: hbondTerm += hbondEnergy(j, VDWGrid[index][k])
7: end for
8: energy[i * nlig + j] = vdwTerm + hbondTerm
9: end for
10: end for
The precomputed VDW grid is generated by the program GEN_GRID as described in our previous work, and afterwards it is read from file by SURF_SCREEN and loaded into memory. Next, the Van der Waals (VDW) energy of each atom is calculated using the expression explained before, following the pseudocode shown in Algorithm 5, where N_simulations is the number of simulations, nlig is the number of atoms of the ligand, and the functions vdwEnergy and hbondEnergy perform the calculation of the Van der Waals and hydrogen bond potentials for each pair of atoms.
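As an illustration of the vdwEnergy pair term accumulated over each atom's neighbour list, here is a Python sketch using the Lennard-Jones 12-6 form, the standard van der Waals functional form of the OPLS force field; the sigma/epsilon values are generic placeholders, and BINDSURF's actual parameters and hydrogen-bond term come from its force-field files.

```python
def lj_vdw(r, sigma, epsilon):
    """Lennard-Jones 12-6 van der Waals term for one atom pair at
    distance r: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def vdw_over_neighbours(atom_pos, neighbours, sigma=3.5, epsilon=0.066):
    """Accumulate the pair term over the protein atoms stored in the
    atom's VDW-grid cell, i.e. its precomputed neighbour list."""
    total = 0.0
    for nx, ny, nz in neighbours:
        r = ((atom_pos[0] - nx) ** 2 +
             (atom_pos[1] - ny) ** 2 +
             (atom_pos[2] - nz) ** 2) ** 0.5
        total += lj_vdw(r, sigma, epsilon)
    return total

# Sanity checks on the 12-6 shape: the term crosses zero at r = sigma and
# reaches -epsilon at the minimum r = 2^(1/6) * sigma.
e_zero = lj_vdw(3.5, 3.5, 0.066)
e_min = lj_vdw(2 ** (1 / 6) * 3.5, 3.5, 0.066)
```

Restricting the sum to the grid cell's neighbour list is what turns the all-pairs interaction into a constant-cost lookup per atom.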
Algorithm 6 shows the pseudocode of the VDW kernel, where VDWGridInfo and VDWGrid are the grid description and the grid data, both stored in GPU global memory; the other variables have the same meaning as in Algorithm 4. Each thread calculates the energy of a single atom. As in the previous kernel, each thread applies the rotation and displacement corresponding to its simulation to the ligand model in order to obtain the current atom position (lines 2-3). Then it calculates the grid position, computes the VDW and HBOND potentials using the neighbours stored in the VDW grid, and stores the result in shared memory (lines 4-9). The parameters needed for the VDW and HBOND energies are previously stored in GPU constant memory. Finally, the threads of the same simulation sum their results through a parallel reduction (line 10), and one of these threads accumulates the final result in global memory (lines 11-12).
Algorithm 6 GPU pseudocode for the calculation of the VDW and HBOND energies
1: for all nBlocks do
2: rlig = rotate(clig[myAtom], myQuaternion)
3: ilig = shift(myShift, rlig)
4: index = positionToGridCoordinates(VDWGridInfo, ilig)
5: for k = 0 to numNeighbours(VDWGrid[index]) do
6: vdwTerm += vdwEnergy(myAtom, VDWGrid[index][k])
7: hbondTerm += hbondEnergy(myAtom, VDWGrid[index][k])
8: end for
9: energy_shared[myAtom] = vdwTerm + hbondTerm
10: totalEnergy = parallelReduction(energy_shared)
11: if threadId == numThreads%nlig then
12: energy[mySimulation] += totalEnergy
13: end if
14: end for
Processors for a $3000 high-end server
Intel Xeon CPU: 4 cores @ 2.4 GHz; gcc compiler, version 4.3.4, with the -O3 flag
NVIDIA Tesla GPU: 448 stream processors @ 1.15 GHz; 768 KB L2 cache; CUDA compilation tools, release 4.0
Running times for the different parts of BINDSURF
BINDSURF running time
BINDSURF makes no assumption about the location of the binding site of the protein. After a BINDSURF execution, the information obtained about how different ligands dock onto the protein surface lets us formulate hypotheses about different potential binding sites, which can guide subsequent, more detailed analysis using standard docking tools such as Autodock, FlexScreen, Glide or DOCK, or Molecular Dynamics or mixed Quantum-Mechanics/Molecular-Mechanics methods. We ran BINDSURF on receptor-ligand structures selected from the PDB database that are known to have one or several binding pockets depending on the ligand, and checked whether BINDSURF could correctly find a) the binding site area and b) the binding pose of the crystallographic ligand. Simulations were always carried out with a total of 500 Monte Carlo steps. We also selected application cases known to present difficulties for binding site prediction with other methods.
Nevertheless, there is clearly still room for improvement in the scoring function that BINDSURF uses and in its energy optimization method (Monte Carlo), since both directly affect the effectiveness of the direct prediction of binding poses.
In this work we have presented the BINDSURF program. We have shown the details of its modular design, so other users can modify it to suit their needs.
In view of the results obtained, we conclude that BINDSURF is an efficient and fast methodology for the unbiased, ligand-dependent determination of protein binding sites on GPUs. It can also be used for fast pre-screening of large ligand databases, and its results can guide the subsequent, detailed application of other VS methods. Its use can help improve drug discovery, drug design and repurposing, and therefore aid considerably in clinical research.
As next steps, we want to replace the Monte Carlo minimization algorithm with more efficient optimization alternatives, such as the Ant Colony optimization method, which we have already implemented efficiently on the GPU, and to add full ligand and receptor flexibility.
Lastly, we are also working on improved scoring functions to include efficiently metals, aromatic interactions, and implicit solvation models.
The program code is available from the authors upon request.
GPU: Graphics Processing Unit
VDW: Van der Waals
This research was supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grant 15290/PI/2010, by the Spanish MEC and European Commission FEDER under grants CSD2006-00046 and TIN2009-14475-C04, and a postdoctoral contract from the University of Murcia (30th December 2010 resolution).
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 14, 2012: Selected articles from the Eleventh International Workshop on Network Tools and Applications in Biology (NETTAB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/13/S14
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.