Drug mechanism enrichment analysis improves prioritization of therapeutics for repurposing

Background There is a pressing need for improved methods to identify effective therapeutics for diseases. Many computational approaches have been developed to repurpose existing drugs to meet this need. However, these tools often output long lists of candidate drugs that are difficult to interpret, and individual drug candidates may suffer from unknown off-target effects. We reasoned that an approach which aggregates information from multiple drugs that share a common mechanism of action (MOA) would increase on-target signal compared to evaluating drugs on an individual basis. In this study, we present drug mechanism enrichment analysis (DMEA), an adaptation of gene set enrichment analysis (GSEA), which groups drugs with shared MOAs to improve the prioritization of drug repurposing candidates. Results First, we tested DMEA on simulated data and showed that it can sensitively and robustly identify an enriched drug MOA. Next, we used DMEA on three types of rank-ordered drug lists: (1) perturbagen signatures based on gene expression data, (2) drug sensitivity scores based on high-throughput cancer cell line screening, and (3) molecular classification scores of intrinsic and acquired drug resistance. In each case, DMEA detected the expected MOA as well as other relevant MOAs. Furthermore, the rankings of MOAs generated by DMEA were better than the original single-drug rankings in all tested data sets. Finally, in a drug discovery experiment, we identified potential senescence-inducing and senolytic drug MOAs for primary human mammary epithelial cells and then experimentally validated the senolytic effects of EGFR inhibitors. Conclusions DMEA is a versatile bioinformatic tool that can improve the prioritization of candidates for drug repurposing. By grouping drugs with a shared MOA, DMEA increases on-target signal and reduces off-target effects compared to analysis of individual drugs. DMEA is publicly available as both a web application and an R package at https://belindabgarana.github.io/DMEA. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-023-05343-8.

One common drawback of many drug repurposing tools is that they output a long list of candidate drugs with limited information about how the top candidates are related. For example, the gene2drug algorithm [13] returns a ranked list of > 1300 drugs without any information about molecular targets or pathways of these drugs. This complicates efforts to prioritize drugs on the list for validation because researchers must consider many drugs targeting different molecular pathways with the caveat that some targeted therapies may not actually inhibit their intended target [33]. Therefore, given a list of candidate drugs, we reasoned that grouping drugs with similar mechanisms of action (MOAs) into a "set" and then statistically evaluating the enrichment of the drug set in the list would increase on-target signal and reduce off-target effects compared to evaluating drugs on an individual basis. Here, MOA refers to both the biological pathway targeted and the direction of action of each drug (e.g., "EGFR inhibitor"). Our approach, called drug mechanism enrichment analysis (DMEA), is an adaptation of the popular gene set enrichment analysis (GSEA) algorithm [34] in which drugs, rather than genes, are grouped into sets based on annotated MOAs. Each drug set is then statistically evaluated against a background of all other drug sets. If multiple drugs which share a common MOA are all highly ranked candidates, then this indicates that the identified MOA is more likely to be a true on-target sensitivity.
Furthermore, to address a lack of web-accessible tools to predict selectively toxic drugs based on an input gene signature, we included a feature to pair DMEA with a simple molecular classification method (i.e., weighted gene voting, WGV [40]; see Methods). Although the CMap L1000 Query also accepts input gene symbols to rank drug MOA [8], it is limited to use of the CMap L1000 expression database and cannot accept more than 150 input gene symbols or consider gene ranks. Similar to the CMap L1000 Query, the gene2drug web tool also ranks individual drugs based on gene inputs using a gene set-based analysis of the original CMap gene expression database (e.g., gene ontology terms) without considering input gene ranks [13]. In addition to sharing these limitations of the CMap L1000 Query, gene2drug does not evaluate drugs in terms of MOA. On the other hand, the CMap PRISM Query does rank drugs based on selective toxicity, but it only accepts cell line names as input features, restricting its applicability to cell lines present in the CMap database, and does not evaluate drug MOA. In contrast, DMEA can accept an input gene signature with any number of ranked genes and requires their directionality to evaluate drug MOA based on selective toxicity.  [8] and the DrugEnrichr [35][36][37] and Drugmonizome methods [38] In summary, DMEA can help researchers better prioritize potential drug treatments by aggregating results across many drugs which share MOAs to identify global trends. By quantifying the coordinated enrichment of drugs annotated with the same MOA and normalizing scores across a large background of drug MOAs, DMEA can improve ontarget prioritization of candidates for drug repurposing. DMEA is publicly available as a web application or an R package at https:// belin dabga rana. github. io/ DMEA.

Drug mechanism enrichment analysis (DMEA)
DMEA tests whether drugs known to share a MOA are enriched in a rank-ordered drug list. DMEA can be applied to any rank-ordered list of drugs with annotations for at least two MOAs. For a drug MOA to be evaluated, at least six drugs (or the minimum number of drugs per MOA set by the user) must be annotated with that MOA and each drug must be ranked by a nonzero numeric value. DMEA uses the same algorithm as GSEA [34] but applies it to sets of drugs, rather than genes, to identify drug MOAs which are overrepresented at either end of the input rank-ordered drug list (further detail below). If a drug MOA is positively enriched, then drugs annotated with that MOA are overrepresented at the top of the list. Conversely, if a drug MOA is negatively enriched, then drugs which share that MOA annotation are overrepresented at the bottom of the list.
Specifically, for each MOA, DMEA calculates an enrichment score (ES) as the maximum deviation from zero of a running-sum, weighted Kolmogorov-Smirnov-like statistic. The p value is estimated using an empirical permutation test wherein drugs are randomly assigned MOA labels in 1000 independent permutations to calculate a distribution of null enrichment scores (ES_null); the p value is then calculated as the percentage of same-signed ES_null equal to or greater than the ES divided by the percentage of same-signed ES_null. The normalized enrichment score (NES) is then calculated by dividing the ES by the mean of the same-signed portion of the ES_null distribution. Finally, the q-value or false discovery rate (FDR) is calculated as the percentage of samesigned NES in the null distribution (i.e., NES_null) with NES equal or greater to the observed NES divided by the percentage of same-signed NES equal or greater. We use a significance threshold of p < 0.05 and FDR < 0.25 by default per the recommendation for GSEA, but this FDR cutoff can be customized by the user. Given a rank-ordered drug list, DMEA generates (1) enrichment results for all tested drug MOAs; (2) a volcano plot summarizing the NES and − log 10 (p value) for all tested drug MOAs; and (3) mountain plot(s) for individual drug MOA(s) which pass the given FDR cutoff (Fig. 2).

Simulation study of DMEA
To evaluate the sensitivity of DMEA, we first simulated a rank-ordered drug list by randomly assigning values from a normal distribution (μ = 0, σ = 0.5) for 1351 drugs with MOA annotations in the PRISM drug screen. Next, a number of drugs, X, were randomly sampled as a synthetic drug set and their rank values were selected from a shifted normal distribution (μ = Y, σ = 0.5); the size of the synthetic drug set, X, was varied from 5 to 50 drugs, and the perturbation value Y was varied from − 1 to + 1. This rank-ordered drug list was then analyzed by DMEA to determine the enrichment of the synthetic drug set relative to the known drug MOA sets provided by the PRISM drug screen. For each pair of X and Y values, the simulation was repeated 50 times to assess reproducibility (i.e., the ability of DMEA to consistently detect a true difference between the synthetic drug set and the background drug sets determined by MOA annotations from PRISM).

CMap L1000 query
The Connectivity Map (CMap) web portal (https:// clue. io) [8] allows users to query the L1000 gene expression database using 10 to 150 input up-and down-regulated gene IDs. The output is a normalized connectivity score which indicates the similarity between the query and differentially expressed gene sets induced by drug treatments. A positive score indicates similarity between the query and the perturbagen signature, whereas a negative score indicates dissimilarity. Specifically, we used the "query.gct" file from the zip file output by the CMap L1000 Query (found within their "arfs/TAG" folder), including the MOA annotations provided in the file. Since this file includes results for all cell lines in the L1000 database as well as information for quality control, we removed any scores indicated to be low quality and averaged scores across cell lines for each drug with MOA annotations. Here, we used example data sets from the CMap web portal, including: (1) GSE32547, HUVEC cells treated with the HMGCR inhibitor pitavastatin (1 μM, 4 h) or DMSO [41]; (2) GSE35230, A375 melanoma clones treated with the MEK inhibitor GSK212 (30 nM, 24 h) or DMSO [42]; (3) GSE14003, JEKO1 cells treated with the proteasome inhibitor bortezomib (10 h) or untreated [43]; (4) GSE28896, human CD34 + cells treated with the glucocorticoid agonist dexamethasone (24 h) or untreated [44]; and (5) GSE33643, A2058 cells treated with the PI3K/MTOR inhibitor BEZ235 (3 doses at 24 h) or DMSO [45]. We also used the up-and down-regulated biomarkers from a proteomic signature of senescence in primary human mammary epithelial cells (HMECs) [46]. To compare DMEA's results to CMap's MOA enrichment results, we used the "gsea_result.gct" file found within the "gsea/TAG/arfs/NORM_CS" folder. We compared the results for chemical perturbagens combined across all cell lines, specifically "cell_iname = − 666" with pert_type = "TRT_CP", for either set_type = PCL" or "set_type = "MOA_CLASS".

CMap PRISM query
The Connectivity Map (CMap) web portal (https:// clue. io) [8] also allows users to query PRISM viability data [5] for 3 to 489 input cell line IDs classified as having "UP" or "DOWN" phenotypes. The query outputs a normalized connectivity score ranking the drugs based on their toxicity towards the "UP" versus "DOWN" cell lines. A positive score indicates toxicity towards "UP" cell lines, whereas a negative score indicates toxicity towards "DOWN" cell lines or lack of toxicity towards "UP" cell lines. In particular, we used the "ncs.gct" file from the zip file output by the CMap PRISM Query, including the MOA annotations provided in the file. Again, we only considered drugs with MOA annotations. Here

Weighted gene voting (WGV)
To calculate a molecular classification score for cell lines based on external molecular signatures, we used weighted gene voting (WGV) [40]. The WGV score is the dot product between an external gene signature of interest and normalized RNAseq expression values for 327 adherent cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE, version 19Q4) [3]. In other words, the WGV score for each cell line is the sum across all genes available in both the input gene signature and CCLE RNAseq data set, where each gene's value is the product between their gene signature ranking and CCLE RNAseq normalized expression value. This WGV score ranks each cell line based on the similarity of its gene expression to that of the input gene signature, such that cell lines with expression more similar to the positive phenotype of the gene signature are more positively ranked and cell lines with expression more similar to the negative phenotype of the gene signature are more negatively ranked. In this study, we analyzed four independent transcriptomic signatures, three derived from data sets for intrinsic resistance to EGFR inhibitors and one derived from a data set for acquired resistance to a RAF inhibitor. For each transcriptomic data set, we used the R package limma [47] to perform an eBayes statistical analysis for differential expression comparing sensitive and resistant samples. Then, the top 500 genes based on |log 2 (fold-change)| with q-value < 0.05 were used for WGV (with the log 2 (fold-change) being the gene "weight" or rank value).

DMEA using WGV molecular classification scores
To identify drug MOAs with selective toxicity towards cells represented by an input gene signature, DMEA can be used in combination with a molecular classification method such as WGV, correlations, and large public databases for gene expression and drug screens. To do this, we accessed the Cancer Cell Line Encyclopedia version 19Q4 for RNAseq data and calculated WGV scores for 327 adherent cancer cell lines using external molecular signatures. To avoid overfitting, we did not include WGV scores from any cell lines that had been used to generate the external molecular signature. Next, we calculated the Pearson correlation between the WGV scores and PRISM drug sensitivity scores (i.e., area under the curve (AUC) values for cell viability as a function of drug concentration) for each drug [5] using data from the most recent PRISM screen available (e.g., HTS002, MTS005, MTS006, and MTS010). Lastly, drugs were ranked by the Pearson correlation coefficient, and the rank-ordered drug list was analyzed by DMEA using the MOA annotations provided in the PRISM data set.

Simulation study of DMEA using WGV molecular classification scores
For 200 synthetic cell lines, we sampled drug sensitivity scores for 1351 drugs with MOA annotations in the PRISM drug screen from a bimodal mixture of two normal distributions (μ 1 = 0.83, σ 1 = 0.08 and μ 2 = 1.31, σ 2 = 0.08) with the lower distribution containing 72% of all drugs. This distribution was chosen to reflect the distribution of the PRISM drug sensitivity data (i.e., AUC) [5]. Additionally, we simulated gene expression for each cell line by sampling from a normal distribution with a mean (μ) of 0 and standard deviation (σ) of 0.5. This distribution was chosen to reflect the distribution of the normalized CCLE RNAseq data [3].
To introduce a synthetic association between gene expression and drug sensitivity, we randomly sampled a synthetic gene set of 25 genes and a synthetic drug set of 10 drugs. Next, expression values for the synthetic gene set and sensitivity scores for the synthetic drug set were each sampled from a shifted distribution, where the magnitude of the shift for each synthetic cell line is determined by a perturbation value ranging from 0 (no perturbation) to 0. . Afterwards, we calculated the Pearson correlation between the WGV and drug sensitivity scores for each of the 1351 drugs in the synthetic data set. Finally, drugs were ranked by their Pearson correlation coefficient and DMEA was performed to measure the enrichment of the synthetic drug set relative to the background drug sets which were determined by the MOA annotations in the PRISM drug screen. To assess reproducibility, this entire process was repeated 50 times.

DrugEnrichr and Drugmonizome
To benchmark DMEA against DrugEnrichr and Drugmonizome, we input the top 50 or 100 drugs from each analysis detailed above into DrugEnrichr and Drugmonizome. Though DMEA evaluates the full list of ranked drugs from each analysis, DrugEnrichr and Drugmonizome are designed to accept just one input drug set, so we chose to use the top 50 or 100 drugs as input drug sets to match the orders of magnitude of the example inputs provided on their websites. The top drugs were positively ranked in analyses of drug rank lists from the CMap Query tools or negatively ranked in analyses where drugs were ranked based on Pearson correlations between their PRISM drug sensitivity scores (i.e., AUC) and CCLE WGV scores. We recorded their rankings of MOA sets from the PRISM drug repurposing hub based on each tool's default ranking when viewing their results in tabulated form on the web.

Cell viability experiments
Proliferating HMECs (PD ~ 12) were seeded at a concentration of 2.1 × 10 3 /cm 2 or 9.5 × 10 3 /cm 2 for DMSO and triapine treatment, respectively. The following day, cells were treated with DMSO (vehicle control) or 2 µM triapine for 3 days. The cells were counted and then treated with either DMSO (vehicle control), dacomitinib, AZD8931, or navitoclax at 100 nM or 500 nM for 3 days. Cell viability and live cell number were measured with trypan blue assay using a TC20 automated cell counter (Bio-Rad).

DMEA: web application
Using our web application, researchers can either input a drug rank list or gene signature to identify enriched drug MOA without any programming knowledge required. With both input types, the outputs contain: (1) the processed input (after any averaging across duplicate input features if applicable), (2) MOA annotations used (either provided by the user or the default PRISM drug annotations provided on our GitHub repository (MOA_ gmt_ file_ n6_ no_ speci al_ chars. gmt), (3) the results of our DMEA analysis, (4) any drug sets which were removed because they were not represented by at least 6 drugs or the minimum set by the user, (5) any unannotated drugs which could not be matched into drug sets, (6) a volcano plot with an overview of the normalized enrichment scores and − log 10 (p values) for all evaluated drug MOA with significant enrichments highlighted in red, and (7) mountain plots for each significantly enriched drug MOA so that users can visually confirm that most of the drugs for these MOA were ranked as strong candidates. All the analyses of non-simulated data sets in this study can easily be replicated as examples from a drop-down menu on our web application. These example inputs are available on our GitHub repository (Examp les) and described below as well as on our website's "How to Use" page (https:// belin dabga rana. github. io/ DMEA/ howto use. html), which also includes a video tutorial.

Drug rank lists
The input drug rank list can either be a CSV file with headers or direct output from a CMap Query tool (GCT file). If inputting a drug rank list as a CSV file, the user can either provide MOA annotations in the third column separated by the "|" character in each row or rely on our default MOA annotations derived from the PRISM drug screen database; the first column must contain drug names and the second column must contain signed (i.e., nonzero) drug ranks. If there are multiple ranks provided for each drug, the user must select the checkbox option to average results across drugs. In the advanced settings, there is also an option to convert drug synonyms if no MOA annotations are provided, which is enabled by default. For transparency, the CSV file used to convert drug synonyms is uploaded on our GitHub repository (PRISM_ drug_ synon yms. csv) and contains PRISM drug names, PubChem CIDs, and synonyms found on PubChem (https:// pubch em. ncbi. nlm. nih. gov). The conversion of drug synonyms feature allows more input drugs to be matched into PRISM drug sets and even allows input of unique PubChem CIDs as drug names to avoid issues with different naming systems. Drugs which are not matched into drug sets (i.e., unannotated drugs) even after drug synonym conversion is performed are still considered as background drugs in our analysis and also output as a CSV file to allow users to repeat the analysis after either converting their drug names manually (e.g., with unique PubChem CIDs using the PubChem search tool at https:// pubch em. ncbi. nlm. nih. gov/) or adding any known MOA annotations for these drugs. If a CMap L1000 drug rank list is input, the positively enriched drug MOA may induce similar expression to the "UP" phenotype genes and vice-versa. If a CMap PRISM drug rank list is input, then the positively enriched drug MOA may be selectively toxic to the "UP" phenotype cell lines and vice-versa. If a custom drug rank list is input, then the positively enriched drug MOA are overrepresented at the top of the input drug rank list and vice-versa.

Gene signatures
The input gene signature should be formatted as a CSV file with headers where the gene symbols are in the first column and the gene ranks are in the second column. If there are multiple ranks provided for each gene, the user must select the checkbox option to average results across genes. In the advanced settings, there is an option to use either HUGO Gene Nomenclature Committee (HGNC)-approved gene symbols to analyze your input gene signature [52] or gene symbols used in the CCLE 19Q4 release. Our software also outputs a CSV file containing any genes which were not matched to the CCLE 19Q4 data set so that users can repeat their analysis after either using the HGNC multi-symbol checker web tool (https:// www. genen ames. org/ tools/ multi-symbol-check er/) to convert their gene symbols to approved symbols or searching for the CCLE 19Q4 versions of their gene symbols in the CSV file on our GitHub repository which includes approved symbols, aliases, previous symbols, and HGNC IDs for each gene symbol from the CCLE 19Q4 release (CCLE_ gene_ symbo ls_ 20230 404. csv). Using the input gene signature, DMEA runs weighted gene voting (WGV) to rank 372 adherent cancer cell lines in the CCLE 19Q4 RNAseq data set based on their expression of the gene signature and then correlates their WGV scores with their drug sensitivity scores (i.e., AUC) across 1351 drugs with MOA annotations in the PRISM drug screen. For transparency in this process, our software outputs the WGV scores, correlation results, and scatter plots with the correlations for each drug in addition to the other outputs mentioned above. With this analysis, negatively enriched drug MOA may be selectively toxic to samples with higher expression of positively ranked genes in the input gene signature and vice-versa.

DMEA: R package
Our R package is available on GitHub to allow many automated DMEA analyses to be run in sequence and querying of any database beyond just the CCLE RNAseq 19Q4 release and the PRISM drug screen. In brief, if analyzing an input drug rank list, users can run the "drugSEA" function; if analyzing an input gene signature, users can run the "DMEA" function. These functions accept the same inputs except formatted as data frames in the R software environment as well as custom MOA annotations and, in the case of the DMEA function, also expression and drug sensitivity data frames representing the same biological samples (e.g., cell lines). The same outputs are available for each input type as described above for the web application. Documentation is available on our GitHub repository (https:// github. com/ Belin daBGa rana/ DMEA), including a vignette and man pages with installation instructions and runnable examples, and also on our website's "How to Use" page (https:// belin dabga rana. github. io/ DMEA/ howto use. html).

DMEA identifies an enriched drug MOA in simulated data
To evaluate the ability of DMEA to identify the enrichment of drug sets, we tested it on a normally distributed, synthetic ranked list of 1351 drugs (see Methods). For a randomly sampled set of drugs ranging in size from 5 to 50 drugs, we shifted these drugs' rankings by a perturbation value ranging from − 1 to 1. Next, we ran DMEA using the full rank list of drugs to assess enrichment of the synthetic drug set. This process was repeated 50 times for each synthetic drug set size, after which the average normalized enrichment score (NES) and the percentage of replicates with significant enrichment of the synthetic drug set were visualized as heatmaps (Fig. 3).
As expected, we observed no false discovery except for very small drug sets (i.e., when evaluating a set of 5 drugs we observed 2% of replicates were falsely enriched). As the magnitude of the perturbation was increased or decreased, the average NES of the synthetic drug set increased or decreased, respectively. Likewise, the percentage of replicates passing the significance threshold of p < 0.05 and FDR < 0.25 increased as the magnitude of the perturbation increased. These results demonstrate that DMEA can successfully identify an enriched set of drugs in simulated data.

DMEA identifies similar MOAs based on gene expression connectivity scores
Next, we sought to test whether DMEA could identify enriched drug MOAs in rankordered drug lists generated by the Connectivity Map (CMap), [8] a popular tool for drug discovery that contains > 1 million gene expression signatures measured using L1000, a reduced representation transcriptomic profiling method. Specifically, we analyzed example data sets from the CMap L1000 Query tool to identify perturbagen signatures that are similar or dissimilar to an input gene set. First, we used a gene expression signature from HUVEC cells treated with pitavastatin, an inhibitor of 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) [41], to rank 3,868 drugs based on the similarity of their effects on gene expression. Because pitavastatin itself was found in the list of 3,868 drugs, one might have expected it to be the top-ranked, most similar drug produced by this analysis, but in fact it ranked 24th out of 3,868 drugs (Additional file 1: Fig. S1A). In contrast, analysis of the rank-ordered list of drugs using DMEA identified the HMGCR MOA as the only significant similar MOA (Fig. 4A). This demonstrates that analysis of MOAs by DMEA can generate clearer and more statistically significant results than analysis of individual drugs in results from the CMap L1000 Query.
Next, we tested a gene expression signature from A375 melanoma cells treated with the MEK inhibitor GSK212 [42]. Again, DMEA correctly identified that MEK inhibitors were the most similar MOA in the rank-ordered list of drugs ( Fig. 4B; Additional file 1: Fig. S1A). In this case, comparison to the single-drug rankings was not possible because the L1000 database does not contain the true drug  [41], B A375 melanoma clones treated with the MEK inhibitor GSK212 [42], and C JEKO1 cells treated with the proteasome inhibitor bortezomib [43]. Volcano plots summarizing the NES and − log 10 (p value) for all tested drug MOAs and mountain plots of the expected MOAs are shown. Red text indicates MOAs with p value < 0.05 and FDR < 0.25. For each mountain plot, the inhibitors with the most positive connectivity scores are highlighted treatment, GSK212 (Additional file 1: Fig. S1A). Subsequently, we analyzed a gene expression signature from JEKO1 mantle cell lymphoma cells treated with the proteasome inhibitor bortezomib [43]. DMEA again accurately found that proteasome inhibitors were the most similar MOA (Fig. 4C) and DMEA's MOA-level ranking (#1) was improved upon the single-drug ranking of the true drug treatment, bortezomib (#14/3,868) (Additional file 1: Fig. S1A). Finally, we used DMEA to test data sets from human CD34 + cells treated with the glucocorticoid agonist dexamethasone [44] and A2058 melanoma cells treated with the PI3K/mTOR inhibitor BEZ235 [45]. In both cases, DMEA correctly identified the expected MOA as significantly enriched (glucocorticoid receptor agonist and PI3K/mTOR inhibitor MOAs, respectively) (Additional file 1: Fig. S2) and DMEA's MOA-level rankings were improved upon CMap's individual drug rankings of the true drug treatments (Additional file 1: Fig. S1A). Taken together, these results show that DMEA can correctly identify enriched MOAs in rank-ordered lists of drugs generated by the CMap L1000 Query and that the MOA-level rankings of the true drug treatments are improved compared to the single-drug rankings.
Next, we compared DMEA's MOA-level results to those of the CMap L1000 Query (found in an output sub-folder generated by CMap L1000 Query called "gsea/TAG/ arfs/NORM_CS"). Like our DMEA results, CMap's MOA-level rankings revealed the expected MOA as the top-ranked MOA in all cases except for glucocorticoid receptor agonists and PI3K inhibitors which were not found in the L1000 output (Additional file 1: Fig. S1A). We also compared our DMEA results to the CMap L1000's perturbagen classes (PCLs), which are derived from MOA sets but exclude drugs which do not fit the overall trend of the MOA [8]. Again, CMap's PCL rankings were similar to that of DMEA (Additional file 1: Fig. S1A). Thus, DMEA and the CMap L1000 generate similar MOA-level rankings. However, in contrast to DMEA, the CMap L1000 MOA-based analysis has less statistical rigor (i.e., no p values provided by CMap) and does not generate any plots of the overall and MOA-specific results (e.g., volcano or mountain plots).
We also compared our results with those of DrugEnrichr and Drugmonizome (Additional file 1: Fig. S1). Since DrugEnrichr and Drugmonizome are only designed to evaluate one input drug set, whereas DMEA and the CMap L1000 Query evaluate the full list of drugs, we input the top 50 and 100 positively ranked drug names into DrugEnrichr and Drugmonizome. DrugEnrichr and Drugmonizome were only able to evaluate the expected drug MOA for three out of five CMap L1000 examples (HMGCR inhibitor pitavastatin, mTOR/PI3K inhibitor BEZ235, and glucocorticoid receptor agonist dexamethasone). Drugmonizome performed similarly in ranking the expected MOA to both DMEA and the CMap L1000 Query tool. DrugEnrichr also performed similarly with the mTOR/PI3K inhibitor example but had lower rankings for the glucocorticoid receptor agonist and HMGCR inhibitor examples (Additional file 1: Fig. S1A). However, it is important to consider that we cannot make a fair comparison since DrugEnrichr and Drugmonizome were only evaluating drug MOA based on a small fraction of the drug lists compared to DMEA and the CMap L1000 Query.

DMEA identifies selectively toxic MOAs based on cell viability connectivity scores
To evaluate if DMEA can identify enriched MOAs in a different type of rank-ordered drug list, we used the CMap PRISM Query tool to query data from the PRISM drug repurposing database [5]. Given an input list of cell line names, the CMap PRISM Query generates a list of ~1,200 drugs ranked by normalized connectivity scores which represent the predicted sensitivity of the input cell lines to each drug. The higher the normalized connectivity score, the more toxic the drug is predicted to be for the input cell lines. Again, we analyzed example data sets from the CMap PRISM Query tool to test DMEA, including: (1) cell lines with the activating EGFR mutation p.E746_A750del (n = 3), (2) cell lines with high expression of PDGFRA (n = 19), and (3) cell lines with sensitivity to the HMGCR inhibitor lovastatin (n = 43). As hypothesized, DMEA identified EGFR inhibitors (Fig. 5A), PDGFR inhibitors (Fig. 5B), and HMGCR inhibitors (Fig. 5C),   Fig. 5 DMEA identifies selectively toxic MOAs based on cell viability connectivity scores. Rank-ordered drug lists were generated by querying the PRISM database with input cell line sets characterized by A the activating EGFR mutation p.E746_A750del, B high expression of PDGFRA, and C sensitivity to the HMGCR inhibitor lovastatin. Volcano plots summarizing the NES and − log 10 (p value) for all tested drug MOAs and mountain plots of the expected MOAs are shown. Red text indicates MOAs with p value < 0.05 and FDR < 0.25. For each mountain plot, the inhibitors with the most positive connectivity scores are highlighted respectively, as significantly positively enriched in these rank-ordered drug lists. DMEA also improved upon the rank of the true drug sensitivity for the HMGCR inhibitor lovastatin (#1 with DMEA's MOA-level rankings versus #3 in the single-drug rankings); DrugEnrichr and Drugmonizome also improved on the rank of the true drug sensitivity, though they evaluated fewer drug MOA sets because their analyses only considered the top 50 or 100 positively ranked drugs instead of the full list of evaluated drugs (Additional file 1: Fig. S1B). Altogether, these examples demonstrate that DMEA can identify enriched MOA in rank-ordered lists of drugs generated by CMap Query of the PRISM drug screen and that the MOA-level ranking of the true drug sensitivity is higher than that of the single-drug ranking.

DMEA identifies selectively toxic MOAs based on molecular signatures
To offer a web-accessible method to identify selectively toxic drug MOAs based on an input molecular signature (i.e., up-or down-regulated genes that characterize a disease or cell type), we paired DMEA with a simple molecular classification method, namely weighted gene voting (WGV) [40]. Specifically, we: (1) used WGV to classify adherent cancer cell lines in the CCLE database based on similarity to the input gene signature; (2) correlated WGV scores with drug sensitivity scores (i.e., AUC) for each of 1351 drugs in the PRISM database; and (3) ranked drugs by the correlation coefficient of WGV scores and drug AUC values (Additional file 1: Fig. S3; see Methods).
To test this approach, we first performed a simulation study. Specifically, we simulated gene expression and drug sensitivity scores for 200 cell lines by randomly sampling values from distributions that reflected the CCLE RNAseq and PRISM drug sensitivity data. Next, to create a synthetic association between gene expression and drug sensitivity, we perturbed a subset of the gene expression data and drug sensitivity scores. We then ran 50 replicates to determine if DMEA could consistently identify enrichment of the synthetic drug set in this simulated data. To visualize the results, we plotted the average normalized enrichment score (NES) and the percent of replicates which pass the significance threshold of FDR < 0.25 as heatmaps (Additional file 1: Fig. S4). Importantly, when there was no perturbation in drug sensitivity (AUC), the tested drug set was never significantly enriched (0% of replicates) regardless of the size of the perturbation in RNA expression. This demonstrates that DMEA is not prone to false positive results using this WGV-based approach. In addition, increasing the perturbation in either RNA expression or drug sensitivity led to increased enrichment scores (i.e., average NES) and increased significance (i.e., higher percentage of significant replicates). These results illustrate that DMEA can successfully identify associations between gene expression and drug sensitivity with high reproducibility in simulated data.
Next, we tested whether DMEA could successfully identify drug MOAs with selective toxicity using published transcriptomic signatures of drug resistance. First, we tested three different signatures of intrinsic resistance to EGFR inhibitors: (1) non-small cell lung cancer (NSCLC) cell lines treated with erlotinib (GSE31625) [49]; (2) breast cancer cell lines treated with erlotinib (GSE12790) [48]; and (3) NSCLC cell lines treated with gefitinib (Coldren et al.) [50]. Notably, there was little overlap in the genes used for WGV (GSE12790 and Coldren et al. share zero genes, GSE12790 and GSE31625 share 15 genes, and GSE31625 and Coldren et al. share 19 genes; Fig. 6B). Despite the lack of overlap in input gene signatures, all three DMEA analyses correctly identified EGFR inhibitors as the top toxic drug MOA for the EGFR inhibitor-sensitive cancer cell lines (Fig. 6A,B, Additional file 1: Fig. S5). Again, DMEA's MOA-level rankings were improved Fig. 6 DMEA identifies selectively toxic MOAs based on external gene expression signatures of intrinsic EGFR inhibitor resistance and acquired RAF inhibitor resistance, respectively. Using gene expression signatures of intrinsic resistance to EGFR inhibition and acquired resistance to RAF inhibition, we calculated WGV molecular classification scores for 327 adherent cancer cell lines in the CCLE database. For each signature, the WGV scores were correlated with drug sensitivity scores (i.e., AUC) for 1351 drugs from the PRISM database. Drugs were then ranked by Pearson correlation coefficient, and DMEA was performed to identify selectively toxic MOAs. A DMEA analysis of GSE12790 [48] transcriptomic signature of intrinsic resistance to EGFR inhibitor erlotinib, including a volcano plot of NES versus − log 10 (p value) for MOA evaluated where red text indicates MOAs with p value < 0.05 and FDR < 0.25 and a mountain plot showing that DMEA identified the EGFR inhibitor MOA as negatively enriched. The most negatively correlated EGFR inhibitors are labeled along with their correlation coefficients. B Comparison of three transcriptomic signatures for intrinsic resistance to EGFR inhibition analyzed using DMEA, including a Venn diagram showing the number of shared genes among the signatures and a dot plot illustrating the consistency of MOA enrichment across DMEA's analyses. C DMEA analysis of GSE66539 [51] transcriptomic signature of acquired resistance to RAF inhibitor vemurafenib, including a volcano plot of NES versus − log 10 (p value) for MOA evaluated where red text indicates MOAs with p value < 0.05 and FDR < 0.25 and a mountain plot showing that DMEA identified the RAF inhibitor MOA as negatively enriched. The most negatively correlated RAF inhibitors are labeled along with their correlation coefficients compared to the single-drug rankings (#1 for EGFR inhibitors in all cases versus #16 for erlotinib based on GSE31625, #10 for erlotinib based on GSE12790, and #13 for gefitinib based on Coldren et al.); DrugEnrichr and Drugmonizome's MOA rankings were also improved over the individual drug rankings, though they evaluated fewer drug MOA sets since they are only designed to accept one input drug set instead of the complete drug rank list (Additional file 1: Fig. S1B). In addition, DMEA revealed consistent results across all three input gene signatures for drug MOAs identified as potentially toxic for EGFR inhibitor-resistant cancer cell lines, including HMGCR and MDM inhibitors (Fig. 6B, Additional file 1: Fig. S5). These results support that DMEA can identify selectively toxic drug MOAs given a molecular signature of intrinsic drug resistance and that DMEA's MOA-level rankings improve upon single-drug rankings of toxicity.
Next, we tested whether DMEA could identify selectively toxic drug MOAs given a transcriptomic signature of acquired drug resistance. Specifically, we analyzed RNAseq data from patient biopsies of BRAF-mutant melanoma before treatment with the BRAF inhibitor vemurafenib and after the development of acquired resistance [51]. Again, DMEA correctly identified the RAF inhibitor MOA as the top toxic drug MOA for the samples collected prior to BRAF inhibitor treatment (Fig. 6C), and the ranking of RAF inhibitors at the MOA-level (#1) was improved compared to the ranking of vemurafenib alone (#35); this ranking was also improved when inputting the top 50 negatively ranked drugs into DrugEnrichr or Drugmonizome (Additional file 1: Fig. S1B). Additionally, inhibitors of HDAC, EGFR, CHK, and SYK were identified as possibly beneficial for tumors with acquired resistance to RAF inhibition. Conversely, DMEA identified that drugs inhibiting MEK, MDM, nerve growth factor receptor, and HMGCR may be toxic towards tumors which are sensitive to RAF inhibitors (Fig. 6C). These results demonstrate that DMEA can amplify on-target signal to identify acquired resistance in tumors and other drug MOAs which may be beneficial based on patient biopsies.

DMEA identifies potential senescence-inducing and senolytic drug MOAs for primary human mammary epithelial cells
Lastly, we sought to demonstrate how DMEA can be used as a discovery tool. As an example, we analyzed our recently published proteomic signature of replicative senescence, a stress-induced irreversible growth arrest associated with aging, in primary human mammary epithelial cells (HMECs) [46]. To highlight the versatility of DMEA to either identify similar or selectively toxic drug MOAs, we analyzed the same molecular signature using either the CMap L1000 Query or our WGV-based approach to rank drugs, respectively (Fig. 7A). First, we performed a CMap L1000 Query using the gene names for the up-and down-regulated proteins to predict drug MOAs that could induce senescence in HMECs. Using the CMap results, DMEA revealed positive enrichment for MOAs including proteasome, HDAC, HMGCR, and MDM inhibitors (Fig. 7B), suggesting that treatment with drugs from these MOAs may induce senescence in primary HMECs. Among the MOAs with significant negative enrichments were Na/K-ATPase inhibitors and matrix metalloprotease inhibitors, suggesting that these drug MOAs might antagonize senescence in primary HMECs.
Second, we analyzed the same proteomic signature of senescence with our own WGVbased method to identify selectively toxic MOAs based on the input molecular signature (Additional file 1: Fig. S3). In contrast to analysis of the CMap L1000 Query which identified MOAs which induce similar gene expression, this DMEA pipeline instead predicted drug MOAs that may be toxic to cells with similar gene expression profiles as senescent HMEC (i.e., senolytic MOAs). Using this approach, we found that the EGFR and MEK inhibitor MOAs were significantly positively enriched in the rank-ordered drug list (Fig. 7C). This suggests that compounds from these MOAs may be senolytic in HMECs. Among the negatively enriched drug MOAs which may be more toxic to nonsenescent, proliferating HMECs, we found MDM inhibitors, bromodomain inhibitors, and other MOAs.
Next, we experimentally tested the hypothesis that EGFR inhibitors exhibit selective toxicity against senescent HMECs using the same cells with which we generated the proteomic signature of senescence [53]. We treated proliferating and senescent HMECs with DMSO (negative control), the EGFR inhibitor dacomitinib, the EGFR inhibitor AZD8931, or the known senolytic compound navitoclax (positive control) at 100 and 500 nM. After 3 days of treatment, dacomitinib, AZD8931, and navitoclax significantly reduced the viability of senescent but not proliferating HMECs at both concentrations (Fig. 7D). Additionally, all three drugs reduced the total number of viable senescent cells. Notably, low concentrations of dacomitinib and navitoclax (100 nM) were selectively toxic to senescent cells without affecting the growth rate of non-senescent, proliferating HMECs. In contrast, although 100 nM of AZD8931 was selectively toxic to senescent HMEC, AZD8931 also reduced the growth rate of proliferating, non-senescent HMECs. These experimental results support that the EGFR inhibitors dacomitinib and AZD8931 are novel senolytic compounds in HMECs, validating a hypothesis generated by DMEA.

Discussion
Here, we introduce drug mechanism enrichment analysis (DMEA), a user-friendly bioinformatic method to better prioritize drug candidates for repurposing by grouping drugs based on shared MOA. Similar to how GSEA enhances biological interpretation of transcriptomic data [34], DMEA improves drug repurposing by aggregating information across many drugs with a common MOA instead of considering each drug independently. We have demonstrated the power and sensitivity of DMEA Proliferating HMECs (PD ~ 12) were treated with DMSO or 2 μM triapine for 3 days to induce proliferating or senescent phenotypes, respectively, as in our previous work [53]. Proliferating and senescent HMECs were then treated with DMSO (negative control), 100 nM/500 nM dacomitinib, 100 nM / 500 nM AZD8931, or 100 / 500 nM navitoclax for 3 days, after which cell viability and live cell number were measured by trypan blue staining. The live cell number was normalized to the number of live cells present at the time of drug treatment. * and ** represent p < 0.05 and 0.01, respectively, compared to the senescent DMSO control calculated by Student's t-test first with simulated data (Fig. 3; Additional file 1: Fig. S4) and then with real examples including gene expression connectivity scores (Fig. 4), cell viability connectivity scores (Fig. 5), and weighted gene voting molecular classification scores (Figs. 6, 7). In all cases, DMEA ranked the true drug MOA sensitivity or similarity higher than the original ranking of the single-drug agent and performed similarly to or better than existing tools for evaluating drug MOA (Additional file 1: Fig. S1). In addition, DMEA improves upon existing tools for analyzing enriched MOA in drug lists in terms of flexibility, statistical rigor, and visual outputs (Fig. 1). This demonstrates that DMEA helps better prioritize drug treatments by improving the on-target identification of candidate drugs.
Importantly, our results demonstrate the ability of DMEA to analyze a variety of input rank-ordered drug lists from different drug repurposing algorithms to identify enrichment of diverse MOAs (e.g., kinase inhibitors, proteasome inhibitors, metabolic pathway inhibitors). In these validation cases, DMEA not only identified the expected drug MOAs (e.g., EGFR inhibitor MOA given a signature of EGFR inhibitor resistance) but also MOAs which may exhibit toxicity against tumor cells resistant to the input signature of interest. One interesting example is that DMEA identified HMGCR inhibitors as potentially toxic to cancer cells with intrinsic resistance to EGFR inhibitors (Fig. 6A,B; Additional file 1: Fig. S5). Indeed, this finding is supported by published work demonstrating that HMGCR inhibitors can overcome resistance to EGFR inhibitors in NSCLC cells by inhibiting AKT [54,55]. In addition, DMEA also identified that EGFR inhibitor-resistant cells may be sensitive to MDM inhibitors (Fig. 6A,B; Additional file 1: Fig. S5), a finding that is supported by published work showing that MDM2 mediates resistance to EGFR inhibitors in mouse models of NSCLC [56]. Furthermore, our analysis suggested that melanomas sensitive to BRAF inhibitors may also be sensitive to MEK inhibitors (Fig. 6C), an observation that is supported by clinical trials showing that combination treatment with BRAF and MEK inhibitors is more effective than inhibition of BRAF alone in BRAFmutant melanoma patients [57]. Finally, for melanomas with acquired resistance to RAF inhibitors, DMEA identified CHK inhibitors and SYK inhibitors as potentially beneficial. In fact, both CHK1 and SYK kinases have been identified as drug targets for melanomas resistant to RAF inhibitors [58,59]. Collectively, these results support that DMEA can even identify drug mechanisms beneficial for combination treatments and drug-resistant cancers.
To demonstrate the power of DMEA for biological discovery, we analyzed our recently published proteomic signature of replicative senescence in primary HMECs [46] (Fig. 7). To illustrate the difference between CMap and DMEA's interpretation of an input gene signature, we used: 1) the CMap L1000 Query followed by DMEA to identify similar (e.g., senescence-inducing) drug MOAs and 2) DMEA with WGV molecular classification scores to identify selectively toxic (e.g., senolytic) drug MOAs. Both senescence-inducing and senolytic compounds have great therapeutic promise in aging [60][61][62][63][64], cancer [65,66], and other diseases [67]. Among the potential senescence-inducing drug MOAs we identified were proteasome, HDAC, HMGCR, and MDM inhibitors (Fig. 7B). Indeed, experimental evidence has shown that proteasome inhibitors induce senescence in primary fibroblasts [68,69] and that HDAC inhibitors can induce senescence in cancer cells [70][71][72]. For potential senolytic MOAs, we identified EGFR and MEK inhibitors (Fig. 7C) and subsequently experimentally validated the senolytic activity of the EGFR inhibitors dacomitinib and AZD8931 in a drug-induced model of HMEC senescence (Fig. 7D). To our knowledge, this is the first demonstration that EGFR inhibitors can exhibit senolytic activity.