Potent bace-1 inhibitor design using pharmacophore modeling, in silico screening and molecular docking studies

John, Shalini; Thangapandian, Sundarapandian; Sakkiah, Sugunadevi; Lee, Keun Woo

doi:10.1186/1471-2105-12-S1-S28

Volume 12 Supplement 1

Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011)

Research
Open access
Published: 15 February 2011

Potent bace-1 inhibitor design using pharmacophore modeling, in silico screening and molecular docking studies

Shalini John¹,
Sundarapandian Thangapandian¹,
Sugunadevi Sakkiah¹ &
…
Keun Woo Lee¹

BMC Bioinformatics volume 12, Article number: S28 (2011) Cite this article

10k Accesses
61 Citations
Metrics details

Abstract

Background

Beta-site amyloid precursor protein cleaving enzyme (BACE-1) is a single-membrane protein belongs to the aspartyl protease class of catabolic enzymes. This enzyme involved in the processing of the amyloid precursor protein (APP). The cleavage of APP by BACE-1 is the rate-limiting step in the amyloid cascade leading to the production of two peptide fragments Aβ₄₀ and Aβ₄₂. Among two peptide fragments Aβ₄₂ is the primary species thought to be responsible for the neurotoxicity and amyloid plaque formation that lead to memory and cognitive defects in Alzheimer’s disease (AD). AD is a ravaging neurodegenerative disorder for which no disease-modifying treatment is currently available. Inhibition of BACE-1 is expected to stop amyloid plaque formation and emerged as an interesting and attractive therapeutic target for AD.

Methods

Ligand-based computational approach was used to identify the molecular chemical features required for the inhibition of BACE-1 enzyme. A training set of 20 compounds with known experimental activity was used to generate pharmacophore hypotheses using 3D QSAR Pharmacophore Generation module available in Discovery studio. The hypothesis was validated by four different methods and the best hypothesis was utilized in database screening of four chemical databases like Maybridge, Chembridge, NCI and Asinex. The retrieved hit compounds were subjected to molecular docking study using GOLD 4.1 program.

Results

Among ten generated pharmacophore hypotheses, Hypo 1 was chosen as best pharmacophore hypothesis. Hypo 1 consists of one hydrogen bond donor, one positive ionizable, one ring aromatic and two hydrophobic features with high correlation coefficient of 0.977, highest cost difference of 121.98 bits and lowest RMSD value of 0.804. Hypo 1 was validated using Fischer randomization method, test set with a correlation coefficient of 0.917, leave-one-out method and decoy set with a goodness of hit score of 0.76. The validated Hypo 1 was used as a 3D query in database screening and retrieved 773 compounds with the estimated activity value <100 nM. These hits were docked into the active site of BACE-1 and further refined based on molecular interactions with the essential amino acids and good GOLD fitness score.

Conclusion

The best pharmacophore hypothesis, Hypo 1, with high predictive ability contains chemical features required for the effective inhibition of BACE-1. Using Hypo 1, we have identified two compounds with diverse chemical scaffolds as potential virtual leads which, as such or upon further optimization, can be used in the designing of new BACE-1 inhibitors.

Background

Beta-site amyloid precursor protein cleaving enzyme (BACE-1), also known as β-secretase, memapsin-2, or Aspartyl protease-2, is a single-membrane protein belongs to the aspartyl protease class of catabolic enzyme. This is one of the enzymes responsible for the sequential proteolysis of amyloid precursor protein (APP) [1]. The cleavage of APP by BACE-1, which is the rate-limiting step in the amyloid cascade, results in the generation of two peptide fragments Aβ40 and Aβ42. Among two peptide fragments, Aβ42 is the primary species and thought to be causal for the neurotoxicity and amyloid plaque formation that lead to memory and cognitive defects in Alzheimer’s disease (AD) [2]. The AD is a debilitating neurodegenerative disease that results in the irreversible loss of neurons, particularly in the cortex and hippocampus [3]. It is characterized by progressive decline in cognitive function that inevitably leading to incapacitation and death. It also histopathologically characterized by the presence of amyloid plaques and neurofibrillar tangles in the brain. Regardless of the increasing demand for medication, no truly disease-modifying treatment is currently available [4, 5]. The BACE knockout study in mice shows a complete absence of Aβ production with no reported side effects [6–8]. Since gene knockout study showed a reduction in AD-like pathology, inhibition of BACE-1 the key enzyme in the production of Aβ peptide has emerged as an attractive therapeutic target for AD [9]. Therefore extensive efforts have been followed in the discovery of potential inhibitors of BACE-1. Most of the designing of BACE-1 inhibitors are based on the transition state mimetic approach, which depends mainly on replacing the scissile amide bond of an appropriate substrate with a stable mimetic of the putative transition-state structure [10].

The main aim of our approach, which is discussed in this study is different than the transition state mimetic approach, is to develop an accurate and efficient method for discovering potent BACE-1 inhibitors. A pharmacophore hypothesis was generated based on key structural features of compounds with BACE-1 inhibitory activity. It provides a rational hypothetical representation of the most important chemical features responsible for activity. Herein, a ligand-based 3D pharmacophore hypothesis for BACE-1 inhibitors was constructed based on the structure-activity relationship observed in a set of known BACE-1 inhibitors. The resulted pharmacophore hypotheses were validated by test set, Fischer randomization, leave-one-out, and decoy set methods. The validated pharmacophore hypothesis has been used in in silico screening to identify hits that are highly varied in chemical nature. The retrieved hits were subsequently subjected to a well-defined refining procedure based on estimated activity values, drug-likeness prediction and further by molecular docking study. The identified hits can further be utilized in designing novel and potent BACE-1 inhibitors.

Methods

Dataset collection

In a computerized pharmacophore generation process the accurate choice of the training set is a key issue. The built pharmacophore hypothesis can be as good as the input data information. The following criteria should be considered during the selection of data set in order to achieve a significant pharmacophore hypothesis. (1) All compounds used in the training set have to bind to the same receptor in roughly the same fashion. Compounds having more binding interaction with the receptor are more active than those with fewer; (2) the data set must be widely populated covering an activity range of at least 4 orders of magnitude; (3) the most active compounds should inevitably be included in the training set and (4) all biologically relevant data should be obtained by homogenous procedures [11]. Every individual feature in the resulting hypotheses will invade a certain weight that is proportional to its relative contribution to biological activity.

By taking these criteria into account, we have collected a total number of 320 BACE-1 inhibitors from various literature resources [12–25] and a database has been created. The 2D structure of the compounds were built using ChemSketch program version 12, and subsequently converted in to 3D structures using Discovery studio 2.5 (DS) [26]. In the next step, 60 compounds were selected as final dataset as the BACE-1 inhibitory activities of these 60 compounds were studied under same biological assay condition. Based on the principle of structural diversity and wide coverage of activity range, 20 compounds were carefully selected as training set compounds and the rest were used as test set in model validation. Here, the IC₅₀ value of the training set compounds was taken into account, the inhibitory activity values of the training set compounds span over a range of four orders of magnitude, from 4 nM to 37 000 nM. The chemical structure and experimental activity of the training set compounds are shown in Figure 1.

Diverse conformation generation

Prior to the generation of pharmacophore hypotheses, the training set compounds, which were converted to 3D structure, were used to generate diverse conformations. Diverse Conformation Generation protocol implemented in DS was used to generate conformations using the Best conformation model generation method with CHARMM force field and Poling algorithm to ensure the energy-minimized conformation for each compound. The parameters like maximum number of 250 conformers, the ‘best conformational analysis’ method, and an energy threshold of 20 kcal/mol above the global energy minimum were chosen during conformation generation.

Pharmacophore modeling

The training set comprises of 20 compounds was used in pharmacophore hypothesis generation. The HypoGen algorithm available in 3D QSAR Pharmacophore Generation protocol of DS tries to generate hypotheses with features common amongst active molecules and do not reflect the inactive molecules of the training set. The training set compounds were predicted for their inherent chemical features using Feature Mapping protocol implemented in DS. During pharmacophore hypothesis generation a minimum of 4 and a maximum of 5 pharmacophoric features like hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), positive ionizable (PI), ring aromatic (RA) and hydrophobic (HY) were included. These features were selected based on the feature mapping results. All parameters were set to their default values except uncertainty value, which has been changed to 2 instead of 3. An uncertainty value of 2 was more convenient for our dataset because the activity values of the training set spanned exactly the required 4 orders of magnitude; this choice has been further confirmed by preliminary calculations and by other literature evidence [27]. The uncertainty value represents the ratio of the uncertainty range of the actual activity against measured biology activity for each compound.

The HypoGen algorithm

With the full range of training set compounds from active to inactive the pharmacophore hypotheses were generated by HypoGen algorithm implemented in DS. This algorithm constructs and ranks the pharmacophore hypotheses that correlate best between 3D spatial arrangement of features in a given training set compounds and their respective experimental activities. This process is accomplished in three steps: the constructive phase, the subtractive phase and the optimization phase [28].

The constructive phase identifies the hypotheses that are common amongst the active compounds. HypoGen enumerates all possible pharmacophore configurations using all combinations of pharmacophore features for each of the conformations of the most active compound. In order to consider the left over most active compounds the hypotheses must fit a minimum subset of its features. Hence, a large database of pharmacophore configurations will be generated at the end of the constructive phase.

The subtractive phase will remove the pharmacophore configurations that are present in the least active compounds. All compounds whose activity is by default 3.5 orders of magnitude less than that of the most active compound are considered to represent the least active compounds. The value 3.5 is adjustable depending on the activity range of the training set. The optimization phase improves the hypothesis score. These scores of the generated hypotheses depend on the errors in activity estimation from regression and complexity. The optimization involves a variation of features and/or locations to optimize activity prediction via a simulated annealing approach. The total cost parameter will be calculated for every new hypothesis. The HypoGen will quit and reports the 10 top-scoring hypotheses when there is no improvement in the hypothesis score.

Data analysis

The quality of a pharmacophore hypothesis is best determined by two theoretical cost calculations, which are represented in bit units [29]. One is the ‘fixed cost’ also known as cost of an ideal hypothesis, which represents the simplest model that fits all the data perfectly. The second cost is the ‘null cost’, which represents the highest cost of a pharmacophore with no features that estimates every activity to be the average of the activity data of the training set compounds.

The total cost of any pharmacophore hypothesis should always be close to the fixed cost and away from the null cost to be the significant model. The cost difference between fixed and null cost values should be larger for a meaningful pharmacophore hypothesis. A value of 40-60 bits in a pharmacophore hypothesis indicates that it has 75-90% probability of representing a true correlation in the data.

The hypotheses are also evaluated based on other cost components. The cost value for every individual hypothesis is the summation of three cost components: the error cost (E), the weight cost (W) and the configuration cost (C). The error cost is the value represents the root-mean-squared difference (RMSD) between experimental and estimated activity value of the training set compounds. The weight cost is a value that increases in a Gaussian form as this function weights in a model deviate from the ideal value of two. The configuration cost or entropy cost measures the entropy of the hypothesis space. If the input training set compounds are too multiplex, e.g. because of too flexible training set compounds, this will result in an effusive number of hypotheses as an outcome of the subtractive phase. This configuration cost should always be less than a maximum value of 17 [30]. The correlation coefficient of the pharmacophore hypothesis should be close to 1.

Pharmacophore validation

The generated pharmacophore hypothesis was validated using test set, Fischer randomization, decoy set and leave-one-out methods.

Test set method

A total of 40 compounds with experimental activity data were selected as test set compounds. This method is used to elucidate whether the generated pharmacophore hypothesis is proficient to predict the activities of the compounds other than training set and classify them correctly in their activity scale. The conformation generation for test set compounds was carried out in a similar way like training set compounds using Diverse Conformation Generation protocol in DS. The compounds associated with their conformations were subsequently carried out for pharmacophore mapping using Ligand Pharmacophore Mapping protocol with Best/Flexible Search option available in DS.

Fischer randomization method

The main purpose of this validation is to verify whether there is a strong correlation existing between the chemical structure and biological activities of compounds. This generates pharmacophore hypotheses by randomizing the activity data of the training set compounds with the same features and parameters used to generate the original pharmacophore hypothesis. The statistical significance is calculated using the following formula: Significance = 100 (1-(1+x/y)), where x represents the total number of hypotheses having a total cost value lower than the original hypothesis, and y represents the total number of HypoGen runs i.e. initial and random runs. The confidence level was set to 99%, where 99 random spreadsheets (random hypotheses) were generated. During the pharmacophore generation, if the randomized data set results in similar or better cost values, RMSD and correlation, it means that the original hypothesis have been generated by chance.

Decoy set method

An external database containing BACE-1 active and inactive compounds was used to evaluate the discriminative ability of Hypo 1 in the separation of active compounds from the inactive compounds. The database was developed using a total of 453 compounds containing 206 actives and 247 inactives. All the compounds were collected from published literature including binding database [12–25, 31]. The database screening was carried out using Ligand Pharmacophore Mapping protocol available in DS. A set of statistical parameters [32] like Ht, % yield of actives, Enrichment factor (E), false positives, false negatives and Goodness of Hit (GH) score were calculated.

Leave-one-out method

The pharmacophore hypothesis is cross validated by leave-one-out method. In this method, one compound is left in the generation of a new pharmacophore model and its affinity is predicted using that new model. The model building and estimation cycle is repeated until each compound was left out once [33]. This test is performed to verify whether the correlation coefficient of the training set compounds is strongly depend on one particular compound or not [34].

Database screening

The validated pharmacophore hypothesis, Hypo 1, was used as a 3D query for screening four different chemical databases. The purpose of this screening is to retrieve novel and potential leads suitable for further development. The chemical databases used were Maybridge, Chembridge, NCI and Asinex. Conformers were generated for each molecule in the database using best conformer generation method that allows a maximum energy of 15 kcal/mol above that of the most stable conformation. The database screening was carried out using Ligand Pharmacophore Mapping protocol implemented in DS with Best/Flexible Search option. The retrieved compounds were filtered by restricting the estimated activity value less than 100 nM and the obtained compounds were further refined using molecular docking study.

Molecular docking

Pharmacophore modeling normally firmly associated with docking procedure, which in a first step flexibly aligns the ligand molecule into a rigid macromolecule environment and then estimates the tightness of the interaction by different scoring functions [35]. The Docking takes all the information from a rigid protein environment and scores several possible interaction modes for different alignments. There are many docking programs available for molecular docking studies. In this study, we used GOLD (Genetic Optimisation for Ligand Docking), a docking program [36] that uses genetic algorithm for docking and performs automated docking with full acyclic ligand flexibility, partial cyclic ligand flexibility and partial protein flexibility in the neighborhood of the protein active site. The crystal structure of BACE-1 complexed with an inhibitor SC7 (PDB ID: 2QP8) was used in molecular docking studies. The inhibitor SC7 was extracted from the active site and the retrieved database hits were docked based on the ligand SC7 coordinates, in to the active site of BACE-1. The water molecules were removed prior to docking because they were not found to play any important roles in BACE1-ligand interaction. The early termination option parameter in GOLD was changed from 3 to 5 and the maximum save conformations was set to 10. All the other parameters were set at their default values.

Results and discussion

Pharmacophore generation

We have used the HypoGen algorithm implemented in DS in order to quantitatively correlate the chemical structure of BACE-1 inhibitors to their biological activity. The training set of 20 compounds (Figure 1) with activity values ranging from 4 to 37000 nM was used in pharmacophore model generation. The Feature Mapping protocol resulted in HBA, HBD, RA, PI and HY features. Selecting these features, the pharmacophore generation run was performed along with diverse conformers of training set molecules generated as described in methods section. Ten top-scored pharmacophore hypotheses were generated and in order to choose the best one and also to give an idea about the statistical significance, the pharmacophore hypotheses were subjected to cost analysis. The results of top ten pharmacophore hypotheses and their statistical parameters are given in Table 1. In this study, the first pharmacophore hypothesis (Hypo 1) is the best hypothesis characterized by the large cost difference (121.98 bits), lowest RMSD value (0.804) and a high correlation coefficient of 0.977. All ten hypotheses consist of HBD, PI, RA and HY features. Nine of ten hypotheses were composed of five pharmacophoric features except only one hypothesis, which was of four features. The best pharmacophore hypothesis (Hypo 1), which scored the large cost difference, lowest RMSD, lowest error cost and high correlation coefficient, was made of one HBD, one PI, one RA and two HY features.

Table 1 Results of the top 10 pharmacophore hypotheses generated by the HypoGen algorithm

Full size table

Statistical data analysis

The generated hypotheses were subjected to cost analysis. The two main values used for cost analysis are the difference between fixed and null cost and another is the difference between the null cost and the total cost (Δcost). The fixed cost of the run was 74.77 bits, which was well separated from the null cost of 203.22 bits and close to the total cost of 81.24 bits. The large difference (128.45 bits) observed between the fixed cost and null cost value indicates that Hypo 1 has more than 90% statistical significance to be a significant model. All the 10 hypotheses were subjected to further evaluation for their capability to predict the activity of the training set compounds. Configuration cost or entropy value must be less than 17 for which a value of 15.59 was obtained in this study. All hypotheses have scored RMSD values lower than 1.5 and ranging from 0.804 to 1.111, this characterization further emphasizing the good predictive quality of these hypotheses. Based on the rule to select a hypothesis with a lowest total cost, high correlation coefficient, large cost difference and significantly low RMSD value, Hypo 1 gave the best statistical values among other hypotheses. Hence, Hypo 1 with one HBD, one PI, one RA and two HY was chosen as the best hypothesis for further analysis. The inter-feature distance constraints were observed for this five-featured pharmacophore hypothesis (Hypo 1) (Figure 2).

Activity prediction and mapping of training set compound on Hypo1

To verify the predictive ability of Hypo1 on training set compounds, the activity of each training set compound is estimated by regression analysis. The experimental activities of training set compounds were classified into four groups: most active (IC₅₀ ≤ 100 nM, ++++), active (100 nM < IC₅₀ ≤ 1000 nM, +++), moderately active (1000 nM < IC₅₀ ≤ 10 000 nM, ++) and inactive (IC₅₀ > 10 000 nM, +). The estimated activity values of training set compounds based on Hypo 1 and the corresponding error values are calculated (Table 2). The error value is the ratio between the estimated and experimental activities. The positive error value indicates that the estimated IC₅₀ value is higher than the experimental activity, whereas the negative error value indicates that the estimated IC₅₀ value is lower than the experimental activity. An error value of less than 10 signifies the prediction of activity lesser than one order of magnitude. Among 20 training set compounds, only one compound had an error value of greater than 3. From Table 2 it is clear that the estimated activity values of most of the training set compounds was predicted with the same activity scale as the experimental activity. Among 20 training set compounds, one most active compound (++++) was estimated as active (+++), one active compound (+++) was estimated as moderately active (++), one moderately active compound (++) was estimated as inactive (+) and two inactive compounds (+) were estimated as moderately active (++). The divergence between the estimated and experimental activity observed in four compounds was only about 1 order of magnitude, which might be an artifact of the program that uses different number of degrees of freedom for these compounds to mismatch the pharmacophore model. Interestingly, for feature fitting, the most active compounds in the training set mapped well on all the chemical features that are one HBD, one PI, one RA and two HY features of Hypo 1 with good fitting score. The active, moderately and inactive compounds have missed at least one of five features. In addition, the most active compounds mapped well on the RA and PI features whereas some active, moderately active and all inactive compounds could not map on the RA and PI features signifying the importance of these two features. The pharmacophore overlay of most active Compound 1 and Hypo 1 has shown a fit value of 9.52. The RA feature corresponds to phenyl ring present in between two amide and a sulfonamide groups, one HBD feature corresponds to nitrogen of amide group located at the branch, PI group corresponds to the only amino nitrogen, one HY feature corresponds to a phenyl group whereas the another HY feature corresponds to alkyl group (Figure 3A). The pharmacophore overlay of least active compound 20 has revealed that it missed two features when mapped on Hypo 1 with a fit value of 5.16. This compound has mapped only the HBD and two HY features in the same manner as most active compound with no mapping over RA and PI features (Figure 3B). Fit value indicates how well the features in the pharmacophore overlaps the chemical features in the molecule and thereby aid in understanding the chemical meaning of the hypothesis [37]. These results emphasized Hypo 1 as a reliable model to accurately estimate the experimental activity of the training set compounds.

Table 2 Experimental and estimated IC₅₀ values of the training set compounds based on the pharmacophore hypothesis ‘Hypo 1’.

Full size table