Several drugs have been recently repositioned for other diseases in the market, with up to two thirds of the costs being cut during the drug discovery course since only phase II clinical trials were the starting point. The promiscuity of compound binding, and the multi targeting strategy are being explored for different purposes and the old paradigm of one key one lock is being changed. This could also be due to the limited space of folds and binding pockets in proteins that are chosen by nature, and the similarity of binding/different binding preferences that one compound can make.
Pharmacophore-based virtual screening
In the first phase of our work, we consider only a ligand-based approach to find new candidates for HCV targeting the polymerase protein. The Structure Activity Relationship (SAR) obtained from "pf-008654" was used for building of the pharmacophore model for the thumbII site. The pharmacophore of the PF-868554 features are (Figure 3): the enol/ketone oxygen of the dihydropyrone which appear to form a direct hydrogen bond to the backbone amide NH of Ser-476 and a water-mediated hydrogen bond to the amide, also NH of Tyr-477 (the donor-donor motif). The lactone carbonyl of the dihydropyrone is involved in a water mediated hydrogen bond to Arg-501. The phenol functional group forms another hydrogen bond with Leu-497 through a water molecule, while the phenyl ring occupies an otherwise hydrophobic pocket. [Figure 3 shows the features of the pharmacophore on the PF008654 ligand].
The screening of the databases Drug bank, Maybridge and zinc databases yielded no significant structures while the NCI database yielded two significant hits. The hits were subjected to further analysis through docking with Surflex and Glide achieving significant docking scores. The thumb II best hits NSC 115863 and NSC 295688) which by docking scored (-18.803 and -16.119 respectively but with no hydrogen bonds for the NSC 115863 and one water bridge hydrogen bond for the NSC 295688). This motivated us to implement a more rigorous virtual screening method using structure-based design after understanding the essential binding requirements as shown below.
Polymerase NNI binding-sites essential interactions pharmacophores
Inspection of the interactions and binding modes of different classes of molecules that have exhibited strong inhibitory activities to different HCV polymerase binding sites was carried out, which provides a better insight into the essential residue interactions for each NNI site (Figure 1 shows a 3D structure diagram of the NS5B domains and their key residues).
Thumb I: Ligand interactions were calculated and it was noticed that the residues that form the pocket are MET36, VAL37, ALA396, LEU392, ALA393, ALA395, THR399, ILE424, LEU425, HIS428, PHE429, LEU492, GLY493, VAL494, PRO495, TRP500, ARG503. Figure 4-A shows how important is formation of hydrogen bond with ARG503 side chain and a hydrogen acceptor from the ligand as all structure share the same interaction. In addition, resistance towards benzimidazole thumb I inhibitors was reported upon mutations at PRO495 position .
Palm II: The phamacophoric features were observed from the interaction of HCV-796 inhibitor with both receptor's side-chains of SER365 -forms hydrogen bond with a hydrogen donor- and ARG200, in addition to arene-cation bond between the latter and the aromatic benzene ring of the benzofuran nucleus (Figure 4-B). In 3FQK the ligand forms a hydrogen bond (HB) with ASN316 which is not present in the structure 3FQL due to replacement of ASN with CYS. This substitution seemed to significantly affect the binding, suggesting that HB an essential interaction (bearing in mind that position 316 already affords natural sequence polymorphism). The S365T mutation seemed to be equally disruptive to the Kd (fold-shift > 200) as well. The pocket is formed of the following residues LEU204, LEU314, VAL321, LEU360, ILE363, SER365, ASN369 and the overlapped part with palm I site with residues MET193, PRO197, ARG200, ASN316, CYS366, SER368, LEU384, MET414, TYR415 and TYR448.
Palm I: From analysis of binding of the ligands of the aforementioned crystal structures, the ligands were categorized according to the three largest chemical classes of inhibitors co-crystallized with protein; class I (benzothiazoles), class II (benzothiadiazines), and class III (Benzodiazepines) which each bind to the palm I site in different modes. The site is at the junction of the thumb and palm domains and in proximity of the active site (similarly palm II and palm III), the pocket is formed of the following residues GLN184, MET193, PRO197, ARG200, THR287, SER288, ASN291, ASN316, GLY317, ASP318, CYS366, SER368, LEU384, GLY410, ASN411, MET414, TYR415, GLN446, ILE447, TYR448, GLY449 and SER556, these residues show partial overlap with palm II site in the following residues MET193, PRO197, ARG200, ASN316, CYS366, SER368, LEU384, MET414, TYR415 and TYR448.
From Figure 5 it seems obvious that all of the inhibitor classes fill the deep hydrophobic pocket, as also described as an important binding feature by Vandyck et al., . It is also thought to afford the most substantial Van der Waal interaction energy contribution and confers an important selectivity character. Point mutations at MET414 [48, 49] (which forms a major portion of the deep hydrophobic pocket) resulted in resistance against benzothiadiazines. On the other hand, there are diverse polar interactions among the classes in addition to few common ones. In particular, the amino acid TYR448 as a backbone HB acceptor has been proven to play a critical role regardless of the chemical class. This fact was established through molecular dynamics simulations showing the free energy decomposition for different residues performed by Li et al.,  and by point mutation studies at 448  which again resulted in benzothiadiazine-resistant mutants. ASP318 also had polar interactions at the backbone. SER556 and ASN291 had hydrogen bond (HB) interactions through the terminal hydroxyl and amide groups with all members of class I shown in Figure 5. For class II ASP318, GLY449 and ASN291 were also involved in the same manner in addition to the terminal polar groups of SER556 and CYS366. Lastly, for class III it seemed that replacing the ketone on the hexene ring by a sulphone group expands the hydrogen bonding from TYR448 to TYR448 and GLY449, additionally, SER367 HB seemed a common feature and to a lesser extent SER368.
Palm III: To date only a single PDB coordinate set was reported and deposited, showing the inhibitory potential of phenylpropyl-benzamides at the recently observed palm III. The key interaction residues were found to be TYR195, PRO197, ARG200, LEW384, MET414, TYR415, ASN316, ILE447, TYR452, LEU446, TRP550, and PHE551. The pocket seems to be narrow and only two polar interactions were computed (Figure 4-C) with residues: ASN316 and TYR195, the figure shows a tightly closed proximity contour with almost no solvent exposure by the ligand.
Thumb II: The thumb II site is clearly distinct from the thumb I site and each site has its separate residues, thumb II site is formed from the following residues (LEU419, ARG422, MET423, HIS475, SER476, TYR477, ILE482, VAL485, LEU497, LEU489, ARG501, TRP528, and LYS533), it lies at the base of the thumb domain around 35 Å from the active site (Figure 5-D). The main residues forming common polar interactions were SER476, TYR477 -as backbone HB acceptors- and ARG501 through the guanidinium group. A well defined π stacking was noticed between the histidine's imidazole ring and the filibuvir's (3FRZ ligand) triazolopyrimidine group. Other inhibitor-specific hydrogen bonding was noted by the residues LYS533, TRP528, ARG422 and MET423 (it is noteworthy that 419 and 423 mutations exhibited viral selection against thiophene-based carboxylic acid derivatives ). The inhibitor on which the ligand-based pharmacophore model was based (Figure 3)afforded direct polar bonding with SER476, ARG501 and Van der Waal interaction through the cyclopentyl moiety (with the shallow pocket formed of TRP528, MET423, LEU419 and TYR477) also π-π interaction with HIS475.
Based on this retrospective analysis, the analysis performed seemed to be consistent with results shown above from single point mutagenesis studies in confirming some predicted key interaction residues. Particularly important are ASN316, SER365, TYR448 and MET414 are at the palm region, LEU419 and MET423 at the thumb II site.
The screening of the drug bank on the three sites specified using Surflex-Dock yielded several potential binders; those compounds were subjected to our two-phase docking using SYBYL then Glide, as described earlier. In SYBYL we selected the ones exceeding the control threshold score (the experimentally proven ligands). Several candidates were found targeting the three sites on HCV polymerase, twelve compounds targeting NNI site I, ten compounds targeting NNI site II, five compounds targeting NNI site III, while for thumb I no significant hits were achieved from score prospective. On the second docking phase we used Glide extra-precision (XP) docking to further filter our candidates.
First stage filtration with SurFlex
Different ligands of proven activity/affinity against the polymerase which came co-crystallized with polymerase receptor and bound to its different binding sites were separated and re-docked into polymerase protein. The mean of the docking scores of these ligands was used as a threshold for first phase filtration of the screening results of the drug bank on the different sites of the polymerase receptor. This process resulted in 84 high scoring structure ligands that exceeded the threshold of 8 of Surflex-Dock total score. They were arranged as 19 structure for thumb II site, 12 structure for palm I site, 52 structure for palm II site (Additional file 1) while no structures succeeded to cross the filtration threshold in thumb I, interestingly we noticed that 9 structures of high scoring drugs showed potential activity against more than one binding site which are (DB01166, DB01036, DB04859, DB00918, DB04471, DB01087, DB06202 and DB00481) two of these drug, DB01036 and DB04859 achieved high scores for three binding site which are (palm I, palm II and thumb II) with scores of (8.62, 8.03, 8.03 for DB01036) and (8.58, 8.02, 8.24 for DB04859) in the three binding sites respectively, also DB01166 appears to bind at the palm I site with a score of 12.16 while DB04471 appears to bind to palm II and thumb II with the scores 9.65 and 9.36.
Upon screening the drug bank, DB01166 achieved highest score; 12.16, then the compounds DB00777, DB04471, DB05039, DB02166 and DB04205 with scores 9.98, 9.36, 9.23, 9.19 and 8.83 respectively (Table 2).
On the other hand the further filtration process based on consensus of scores of many docking programs which were incorporated in one machine learning model (described in the methods section) has helped in the selection and correlation with enzymologic IC. According to the model results, the correlation was improved slightly using the model for the thumb II site (from 0.86 to 0.87) and significantly for the Palm I site (from 0.5 to 0.67 (Table 2 and Additional files 2,3, and 4). The model was then used for ranking the filtered hits based on the predicted IC values. The domain of applicability of the neural-network model here was used only for ranking, as the idea of a machine-learning model based on docking scores as features was not implemented before. We will seek to expand this model in terms of a docking-based workflow and machine-learning filtration in the future to be applied to diverse chemical databases. Many of these drugs have proven good affinity against RDRP. The DB01940 which achieved the highest score for palm I site driven from the model the potential binding mode proposed by surflex shows that the azocane ring of the structure fixes the structure to the pocket B while formation of 4 hydrogen bonds two of them are the same formed by the original ligand which are ASN411, Gly449 and two other hydrogen bonds with each of Tyr415 and Tyr448, while on the molecular fields level fair similarity of the hydrophobic molecular field to the original ligand was noticed. We were encouraged by the identification of some known HCV polymerase inhibitors among the retrieved hits (e.g.: HCV-086 in Additional file 1), which gave us more confidence in the work flow and methodology that we have proposed. Now, these eighty six filtered compounds were advanced to a second round of filtration as detailed below.
Second stage filtration with XP Glide docking
While no potential final list candidates were found for the thumb I site; several candidates that fulfil the inhibitory pharmacophoric requirements for the thumb II site, and the palm region were found. The next phase of docking was performed using the Schrodinger Glide module version 5.5, in particular, implementing the CPU-expensive Glide XP docking function which has resulted in satisfactorily low RMSD differences upon re-docking the co-crystallized inhibitors (please refer to the validation title in the introduction point 2 for details). Consequently from the resultant hits only 7 compounds were selected according to their affinity and mode of binding for the final molecular dynamics simulation.
For the palm region, docking of the co-crystallized ligand resulted in a pose with a RMSD of 1.2 Å and 1.09 Å calculated from superposition of heavy atoms of the docked ligands over the crystal coordinates for PDB coordinates 2JC1 and 3D5M respectively. Through retrospective inspection, the top four ranking hits were selected for further analysis as they bare a relatively rich electrostatic interaction profile as well as sufficiently high Van der Waal interaction XP terms with the receptor. The rest of the hits showed very weakly interacting poses and Glide XP scores lower than that of the co-crystallized -reportedly potent- inhibitor [44, 45] (except for DB04118 and DB01888).
The original ligand of 2JC1 (Figure 6-A), as in the crystal structure, had a docked pose exhibiting a single hydrogen bond with the TYR448 residue, which suggests the sufficiency of that bonding - in addition to filling the deep hydrophobic pocket with the p-tert-butyl phenyl group - as a criterion for inhibition at this site. Worth noting is the improved field positioning for the carboxyl group of the docked pose right amid the two guanidinium groups of ARG394 and ARG386, that might have happened because we haven't minimized the PDB structures before superimposition. In spite of the relatively low RMSD value for superposition of the docked and original poses of 3D5M ligand (Figure 7-A), the docking result gives a better preposition of a possible proton exchange between the sulphonamide of the ligand and the ASP318 carboxyl, and yet again hydrophobic moiety (the di halo-substituted phenyl in this case) fell very well into the deep pocket (Figure 7-A).
Though realizing the highest docking score, hit DB05039 did not interact with any of the common polar interactions discussed, yet it shows a very good placement of the diethyl-indanyl group into the deep pocket (Table 1). A salt bridge strengthens the binding with ARG394. Similarly, hydrogen bonding with ARG386 and TYR415, and Pi stacking with TYR415. DB01940 followed in terms of score, interestingly enough with an azepane ring forming a HB inside the deep hydrophobic pocket (as with one of benzodiazepines) in addition to a HB with GLN446 (just four atoms away along the backbone's TYR448 amidic nitrogen) (Figure 6-C). Hence, both of the formerly mentioned ligands were found to be of interest, and may potentially define new pharmacophoric features.
The hit DB00560 ranked third (Table 1) with a glide score of -8.073859 and achieved the highest electrostatic energy contribution of all with three hydrogen bonds and two salt bridges with the receptor, superseding the original ligands in terms of coulombic interaction term as well as total score, although it seems to possess a different positioning than the other hits, the structure was protonated from predocking processing at the secondary amine, which was the reason why the tert-butyl placement failed inside the hydrophobic pocket (Figure 7-B). Albeit the compound already satisfies two of the common interactions discussed, namely with CYS366 and SER367.
In comparison between the pose of hit DB04142 and that of the original ligand (Figure 6-D), we find a similar placement, similar TYR448 interaction, a significantly enriched electrostatic field complimentarity with the site features, even less unfavourable ligand solvent exposure, and obviously less expected conformational entropy. The docking score lied well between those of the two original ligands.
While for Thumb II the main trend of ligand-receptor interaction observed from the docked set was mainly comprised of the Van der Waal terms -the contributed to most of the total binding energy- with many unfavourable solvent-surface exposure, yet in the contrary to the palm region the hydrophobic of thumb II is quite shallow, the fact that down weighs such terms. Most of the set exhibited minimal electrostatic interaction (in most cases a single hydrogen bond was observed), nevertheless, π interactions were common (especially with TRP528, due its annular residue proximity).
An RMSD of 0.78 Å was obtained upon re-docking the original ligand of 3FRZ. The overlaid poses are shown in Figure 8-A where the docked pose seemed to have shifted slightly to form HB acceptor and donor -to TRP528 and ARG501 respectively- out of the hydroxyl group instead of a double acceptor -from ARG501.
Only 3 ligands (DB04205, DB01087, DB00816) were observed to possess better binding modalities (Figure 8), including salt bridging with ARG501 & LYS503, along with hydrogen bonding with the protein's backbone at different residues albeit minimal. Also π-π, σ-π interactions were observed (Table 1), while they recorded the highest coloumbic energy terms. The three compounds were candidates for further investigation though molecular dynamics simulations for a sufficiently long production phase. In spite of the prospected good binding properties of DB04205, it seems to be highly hydrophilic and thus not expected to be effective in vivo as is. So a pro-drug strategy is required during the biological assays. Also the stability of DB00816 inside the binding site was questionable due to improper positioning of group and the apparently promiscuous nature of the hit as a non-selective ligand due to the relatively small size.
From these two stages of docking screens and ranking, some interesting compounds have emerged that will be taken into molecular dynamics simulations as the next and final step in this protocol. These compounds belong to diverse structural classes, highlighting one advantage of the structure-based approach. Some of them also satisfy some of the interaction pharmacophoric features that were identified, while others show new binding modalities.
Molecular dynamics simulations
For a stronger hypothesis and a better insight into the validity of the docking results achieved, we have conducted coarse-grained molecular dynamics simulations for the formerly mentioned selected group of ligand-protein complexes that were the result of the two-phase docking and selection phases using the GROMOS force field. It was performed in the form of 20 ns of unrestrained production phase dynamics, following a double equilibration period 300 ps long where constant temperature (298.15 K) and pressure (1 atm) were imposed. The Cα RMSD was calculated and plotted for all of the systems starting from the end of the equilibration phase. Since all of the systems have produced similar plots in terms of time to RMSD stability tendency beyond the 2500 ps interval, Figure 9 shows Cα RMSD for only the three original systems derived from the PDB coordinates taken as control. Resultant interaction patterns were inspected after a post-production minimization process. Also confirmation of positional and conformational tendency from minimized average structures sampled from the last 1 ns of the simulation (500,000 frames) was regarded. For brevity, only 5NS are shown in Figure 9. In Figures 10 and 11, the HB count along the trajectory are shown to illustrate the strength and stability of the different specific HB interactions of the hits in comparison to reference ligands.
In general the control ligands of the palm region showed a preserved placement (Figure 10 shows HB pattern along the trajectories of palm site complexes), 2JC1 had alternating interactions between ARG386 and TYR448, while for 3D5M a minor backbone rotation has led to hydrogen bonding with SER556, GLN446, and additional TYR448 contact, also two new π-interactions were noticed inside the hydrophobic pocket (π-π with TYR415 and cation-π with ARG386) in a manner baring more resemblance to the original PDB coordinates (rather than the docked one).
A conserved mode was noticed for hit DB05039 as its hydrophobic contact which evolved into more subtly relaxed rotomer was preserved, also the π-π interaction with TYR415, while the an extra HB was formed with the latter. Moreover, DB01940 has undergone significant conformational changes around the site releasing some strain to a more extended conformation, the conformational leap could be noticed in the form of a transient sharp increase in HB count around 2.7 ns, new HB with TYR448, SER556 and a cation-π with ARG158, and less frequently with hydrogen bonding with GLY557 and VAL284.
Calculating the average structure proved that although hit DB00560 possesses a seemingly favourable hydrogen bonding trend, yet the hydrophilic nature of the hit has led to a sufficient binding liability and more favourable ligand-solvent interactions. Notably hit DB04142 has undergone slight rotation to reorient forming an average of 3 hydrogen bonds 2 of which with the TYR448 (the most commonly redundant pharmacophoric feature amongst the analysed PDB's), SER368 and GLN446.
The results of the dynamics simulations for the thumb II site were inherently different due to its nature. The hydrogen bonding pattern seemed very labile owing to the extensive solvent exposure and the shallowness of the binding site, hence the obvious lower density of the resultant HB contacts along the ensemble (Figure 11).
The HB count of hit DB04205 (Figure 11) seems to have dropped suddenly by the end of the first 2.5 ns of the production phase, where a transient partial binding has taken place at the edge of the site hinged through the two phosphate groups till beyond the 4.5 ns, then the ring -by the end of the 5 ns interval- has flipped outwards to form new hydrogen bonds with the surface residues ARG380 and LYS379.
DB01087 has exhibited a very stable trend of hydrogen bonding (Figure 11). Albeit around an average of 1.6 HB, the compound formed two additional π interactions; nominally σ-π with ALA529, and π-π with ARG501 thus sandwiching the substituted electron-rich quinoline scaffold. On the contrary, and as expected hit DB00816 failed to prove any appreciable binding properties and was gradually expelled towards the edge of the binding site.
Thus, from the discussion above, hits DB05039, DB01940, and DB04142 (Figure 6B, C, and 6D show their binding poses and interactions, and Figure 10 shows their Hydrogen bond stability along the MD simulation) would be expected to bare an inhibitory potential against NS5B at the palm region, while for the thumb II site only DB001087 (Figure 8-C) would be prospected to bare an inhibitory potential from amongst the other hits. These virtual hits are currently experimentally tested on the HCV replicon system as well as enzymatic assays.