Research article · Open access

Predicting potential adverse events using safety data from marketed drugs


Abstract
While clinical trials are considered the gold standard for detecting adverse events, these trials are often not sufficiently powered to detect difficult-to-observe adverse events. We developed a preliminary approach to predict 135 adverse events using post-market safety data from marketed drugs. The features for the classifier algorithm were adverse event information available from FDA product labels and the scientific literature for drugs that have the same activity at one or more of the same targets, structural and target similarities, and the duration of post-market experience. The proposed method was studied using 54 drugs, with performance evaluated probabilistically using bootstrapping with 10,000 iterations.


Out of 135 adverse events, 53 had a high probability of having a high positive predictive value. Cross-validation showed that 32% of the model-predicted safety label changes occurred within four to nine years of approval (median: six years).


This approach predicted 53 serious adverse events with high positive predictive values. Adverse events with well-defined target-event associations were predicted better than events that may be idiosyncratic or related to poorly characterized secondary target effects. Further enhancement of this model with additional features, such as target prediction and drug binding data, may increase accuracy.

Background
The Food and Drug Administration’s (FDA) proposed process modernization to support new drug development involves establishing a unified post-market safety surveillance framework to monitor the benefits and risks of drugs across their lifecycles [1]. While clinical trials are considered the gold standard for detecting and labeling adverse events, these trials are not sufficiently powered to detect less common adverse events. Additionally, some adverse events emerge when a drug is used in clinical practice outside of the specified inclusion/exclusion criteria. Some adverse events may have high prevalence in specific subpopulations that were not enrolled in the clinical trials, or in subgroups that cannot be identified from information collected during the trials. For example, the substantially increased risk of Stevens-Johnson syndrome in carbamazepine-treated patients positive for the HLA-B*1502 allele was not identified until decades after approval [2]. In addition, concomitant medications (drug-drug interactions) and comorbidities may also contribute to adverse events, and these interactions are not always adequately represented or captured in clinical trials. Therefore, post-market safety surveillance is crucial.

FDA uses the FDA Adverse Event Reporting System (FAERS) [3] and the Sentinel Initiative [4] to obtain information about adverse events occurring after drug approval. In 2017, over 1.8 million adverse event cases were reported to the FDA, including nearly 907,000 serious reports and over 164,000 fatal cases [5]. While traditional pharmacovigilance relies on data mining systems, these methods have reporting biases and require manual review of cases to determine reporting accuracy. Recently, there has been a strong interest in developing prediction algorithms to assist in post-market surveillance to overcome such weaknesses and make post-market pharmacovigilance more efficient.

Adverse event information from a variety of sources, such as FAERS, literature, genomic data, and social media, has been used both to evaluate adverse events and to make predictions. For example, FAERS and similar post-market databases have demonstrated utility in adverse event prediction; Xu and Wang showed that FAERS, combined with literature, had great utility in detecting safety signals [6]. Others have used chemical structure as the basis for adverse event predictions: Vilar and colleagues used molecular fingerprint similarity to drugs with a known association with rhabdomyolysis to support and prioritize rhabdomyolysis signals found in FAERS [7]. Another option has been to use social media reports to identify new adverse events for drugs before they are reported to regulatory agencies or in the peer-reviewed literature; Yang and colleagues used a partially supervised classification method to identify reports of adverse events on the MedHelp discussion forum [3]. Other sources of information for adverse event prediction and detection include electronic health records, drug labels, and even bioassay data [8,9,10]. Additionally, a wide variety of algorithms have been used to make adverse event predictions, including logistic regression models, support vector machines, and ensemble methods [8, 11, 12]. These models have met with varying degrees of success but collectively demonstrate the potential of building an adverse event prediction model using a classifier.

However, many of these methodologies have focused on predicting a specific adverse event (e.g. cardiovascular events) or drug class (e.g. oncology drugs) [12,13,14]. Algorithms that can predict a wide variety of adverse events for multiple drug classes are important to enhance post-market safety surveillance. We have previously developed a genetic algorithm to predict approximately 900 adverse events using FDA product labels and FAERS data [15]. In this study, we build on this algorithm to predict 135 adverse events of high priority to regulatory review using safety data from marketed drugs with one or more shared molecular targets. We hypothesize that drugs that have similar modes of action at the same targets will have a similar adverse event profile because of shared structural features and likely target binding characteristics. We additionally expect adverse events that are more closely associated with drug targets (such as serotonin syndrome) to be well-predicted via this methodology. Some idiosyncratic reactions may also be captured well because the shared structural features likely play a role in these reactions where the targets and actions have not yet been fully characterized.

Results
Inclusion and exclusion criteria resulted in 54 test drugs and 213 unique comparator drugs, leading to 287 test-comparator drug combinations. The 54 test drugs used in this study had one to 37 comparator drugs, with one and two comparators being most frequent, as identified by DrugBank (Fig. 1a), and had been on the market for four to nine years (Fig. 1b). Tanimoto similarity scores between test drugs and comparator drugs ranged between 0.02 and 1, with a mean of 0.51 and a mode of 0.5. Eighteen test drug-comparator associations included a biologic, as defined by a Tanimoto score of −1 (Fig. 1c). Target cosine similarity scores between test drugs and comparator drugs ranged between 0 and 1, with a mean of 0.45 and a mode of 1 (Fig. 1d). Seventy-nine comparator drugs were approved before 1982, while the most recently approved comparator drug had five years of time in market (Fig. 1e). The 54 test drugs are known to bind to 126 targets based on DrugBank data (summarized in Supplemental Table 1).

Fig. 1

Characteristics of test drugs, comparator drugs and test-comparator drug combinations. a) Distribution of number of comparator drugs for test drug. b) Distribution of time on market for test drugs. c) Tanimoto score distribution for test-comparator drug combinations. d) Target similarity score distribution for test-comparator drug combinations. e) Distribution of time on market for comparator drugs

The prevalence of the 135 adverse events considered in this study is summarized in Fig. 2. The overall prevalence of adverse events was higher in the comparator drugs.

Fig. 2

Prevalence of adverse events within comparator drugs and test drugs

Prediction models were not made for 26 adverse events that were not observed or observed only in one test drug label (accident, anaphylactoid reaction, aplastic anaemia, apnoea, atrioventricular block, azotaemia, cardiomyopathy, cerebral infarction, coagulopathy, colitis, colitis ulcerative, Crohn’s disease, dermatitis bullous, dermatitis exfoliative, gastric ulcer, granulocytopenia, hepatic necrosis, hypokinesia, injury, myopathy, oliguria, respiratory depression, road traffic accident, skin ulcer, thrombosis, and ulcer).

Table 1 summarizes the results at varying thresholds (the minimum percentage of comparator drugs that must be predicted positive for an adverse event to yield a positive prediction) for the safety label change evaluation, along with the number of adverse events with a left-skewed positive predictive value distribution, which indicates a high probability of high positive predictive value. Based on these results, we selected 70% as the optimum threshold: it produced the highest number of adverse events with high positive predictive values along with a high percentage of predicted safety label changes that were also issued by FDA (32%). All performance histograms at the 70% threshold for each adverse event are provided in the supplementary materials. Positive predictive value histograms for two well-predicted (i.e. left-skewed) adverse events (febrile neutropenia and hypertension) and two poorly-predicted (i.e. right-skewed) adverse events (bacterial infection and haemorrhage) are shown in Fig. 3.

Table 1 Performance of the algorithm when the threshold to make a positive prediction was varied
Fig. 3

Left-skewed positive predictive value histograms demonstrated well-predicted adverse events, as shown in a) Febrile Neutropenia and b) Hypertension. Right-skewed positive predictive value histograms demonstrated poorly-predicted adverse events, as shown in c) Bacterial Infection and d) Haemorrhage

Fifty-three adverse events had a positive predictive value mode of 100%, with medians between 50 and 100%, 25% quantiles between 0 and 100%, and 75% quantiles at 100%, indicating left-skewed distributions. Because their positive predictive value distributions were left-skewed, these adverse events were considered well-predicted, i.e. they have a high probability of high positive predictive value (Table 2). Additionally, these adverse events had sensitivity modes between 0 and 100%, a specificity mode of 100%, and negative predictive value modes of 50–100%.

Table 2 Performance and prevalence of adverse events that were well-predicted by the algorithm

Fifty-six adverse events had positive predictive value modes between 0 and 33%, suggesting right-skewed distributions; these were considered poorly-predicted (Table 3). While positive predictive value was low, all of these adverse events had high specificity (mode: 76–100%) and negative predictive value (mode: 55–91%). Two adverse events, bacterial infection and fungal infection, additionally had high sensitivity (mode: 100%) (Table 3).

Table 3 Performance and prevalence of adverse events that were poorly-predicted by the algorithm

Discussion
In this study we developed a preliminary approach to predict 135 adverse events of high priority to regulatory review using post-market safety data from marketed drugs that have the same activity at one or more of the same targets. We identified 53 adverse events that were well-predicted with this approach and chose a threshold that optimizes positive predictive value. These adverse events had varying sensitivity, but high specificity and negative predictive value. A model with high positive predictive value but low sensitivity will miss some true adverse events, but this was deemed acceptable for this study: when weighing positive predictive value against sensitivity, we considered it more important to identify adverse events that are most likely to be true and to save the time and effort of sifting through false positives. In practice, a balance between sensitivity and positive predictive value, combined with manual review of predictions, would likely be optimal.

Adverse event predictions based on molecular targets have multiple applications. They may identify difficult-to-observe events that do not reach statistical significance in clinical trials. Predicted adverse events may augment post-marketing surveillance activities by providing a list of adverse events to monitor. If an adverse event is discovered during pre-market evaluation or post-market use, examination of other drugs with similar pharmacologic mechanism and activity may help evaluate causality of the event and determine whether further studies are necessary, based on information from all comparators, not necessarily limited to those with the same indication. In particular, examination of secondary targets may be useful, as this may explain the emergence of an adverse event or why a particular drug is at lower risk for an adverse event traditionally labeled as a class effect. While the preliminary approach presented here is a tool for hypothesis generation, further evaluation and refinement will determine whether it is useful in regulatory safety review.

The method reported in this study matches safety data based on drug activity at one or more of the same known targets. This may limit the predictive ability, as some adverse events may be idiosyncratic or be associated with unknown secondary targets, and thus the mechanisms responsible for the event have not yet been identified. Associations may still be identified, however, if overlapping structural features capture this unknown shared idiosyncratic activity. This method can be expanded to match a drug not only based on drug activity at one or more of the same targets, but also considering other features which characterize the drug activity, such as Anatomical Therapeutic Chemical (ATC) codes or binding strength (Ki). ATC codes, developed by the World Health Organization, may provide insight into drugs that are related by mechanism or therapeutic use [16]. Binding strength to targets of interest, which may be obtained from literature or databases such as the Psychoactive Drug Screening Program [17] or ChEMBL [18], may provide further classification of target similarity by identifying comparator drugs that bind to targets of interest at a similar order of magnitude. The model also does not capture drug dose that may be needed to produce the required target activity.

Fifty-six adverse events were predicted with low positive predictive value; a positive prediction for these adverse events should therefore be carefully reviewed by experts before reaching a conclusion. In practice, expert review would augment these predictions with assessment of FDA Adverse Event Reporting System (FAERS) reports, the literature, and, more recently, evaluations using insurance claims and electronic health data. Reviewers may examine predictions made by this algorithm by reviewing literature and other databases to identify plausible mechanisms for the drug eliciting the reaction, or by reviewing cases in FAERS and electronic health records. More detail about evaluation of safety signals at the FDA can be found in Szarfman et al. [19]. Analysis of the poorly-predicted adverse events in this study identified several clinical patterns: hemorrhage (including “haemorrhage”, “haematoma”, and “rectal haemorrhage”), infection (including “cellulitis”, “fungal infection”, and “bacterial infection”), and psychiatric (including “paranoia”, “delirium”, and “hallucination”) adverse events were among the worst-performing by positive predictive value. Many of these adverse events may be idiosyncratic or related to unknown secondary target effects, making them difficult to predict from known drug targets. This study may also have been limited by the targets available in DrugBank, which may not contain all known secondary targets for all drugs. To better capture adverse events related to secondary drug targets, target prediction for the test and comparator drugs could be incorporated to better match comparator drugs to test drugs; because DrugBank contains limited target predictions, another source would be needed.

This study had several limitations. First, the current version of Embase only allows users to extract manually curated adverse events by date for one drug at a time, which makes the process time-intensive for a large set of test drugs and their comparators and thus limited the number of drugs used in this study. We addressed this limitation by using a probabilistic approach to performance evaluation based on bootstrapping; a tool automating extraction of these adverse events would alleviate the manual burden. Additionally, text-mining FDA labels for adverse events is most accurate on structured documents, so we used only test drugs with labels available in SPL format. While an assessment of the text-mining for 20 labels showed positive predictive value, sensitivity, and F-score at approximately 90% (unpublished data, Racz et al., 2018), larger text-mining errors remain possible. This assessment identified patterns in the text-mining algorithm that may lead to errors, and the query is currently being updated to improve performance. Finally, several adverse events were not observed, or were observed with low prevalence, in the test drug set. Further analysis identified some events that may be associated with targets that were not substantially represented in this study, such as “respiratory depression”, which is particularly associated with drugs such as benzodiazepines and opioids and their related receptors [20], and “hypokinesia”, which may be associated with dopamine receptors [21]. Other adverse events, such as “anaphylactoid reaction” and “apnoea”, may be reported interchangeably with other MedDRA Preferred Terms, such as “anaphylactic reaction” and “sleep apnoea”, respectively, and may therefore be reported at lower frequency. To better capture these, we may consider alternative groupings or adding terms to complete a mechanistically related grouping.

Conclusions
This classifier algorithm predicts significant adverse events that are of high priority for regulatory monitoring, some of which may be difficult to observe in clinical trials. The prediction algorithm uses evidence of adverse events available through FDA product labels and the scientific literature for drugs that have the same activity at one or more of the same targets, along with structural and target similarities and the duration of post-market experience. For this study, we prioritized achieving high positive predictive value. The model achieved high positive predictive value on 53 out of 135 adverse events, including several with well-characterized target relationships. We found that 32% of the model-predicted safety label changes were FDA-issued within four to nine years after approval.

Methods
Selection of adverse events for evaluation

This methodology predicts 135 adverse events identified by FDA medical experts and reviewers to be of high priority to regulatory review and the pharmacovigilance efforts of the Office of Surveillance and Epidemiology. High priority was determined by FDA pharmacovigilance experts as events that are serious, may be life-threatening or debilitating, or represent frequent events that result in the need for safety label changes. These 135 adverse events were derived using 167 MedDRA Preferred Terms, grouped by mechanistic similarity according to FDA medical experts. For example, “pancreatitis” and “pancreatitis acute” are mechanistically similar and may be reported interchangeably, thus they were captured as one adverse event, “pancreatitis”. The 135 adverse events and the 167 MedDRA Preferred Terms used to define them are listed in Table 4. MedDRA is the Medical Dictionary for Regulatory Activities and is the international medical terminology developed under the auspices of the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use [22]. MedDRA Preferred Terms are medical concepts for symptoms, signs, diagnoses, indications, investigations, procedures, and medical, social, or family history. The FDA Adverse Event Reporting System (FAERS) currently codes reported adverse events as MedDRA Preferred Terms, and all terms from other sources were converted to MedDRA Preferred Terms as described below.
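The grouping of Preferred Terms into adverse events can be sketched as a simple lookup. The pancreatitis pairing is the example given above; the mapping shown here is an illustrative fragment, not the full Table 4, and the code is ours rather than the authors'.

```python
# Illustrative fragment of a MedDRA Preferred Term -> adverse event mapping.
# Only the pancreatitis grouping comes from the text; everything else is a placeholder.
PT_TO_EVENT = {
    "pancreatitis": "pancreatitis",
    "pancreatitis acute": "pancreatitis",       # reported interchangeably, grouped as one event
    "anaphylactic reaction": "anaphylactic reaction",
}

def to_adverse_event(preferred_term):
    """Map a Preferred Term to its grouped adverse event; None if unmapped."""
    return PT_TO_EVENT.get(preferred_term.lower())

print(to_adverse_event("Pancreatitis acute"))   # "pancreatitis"
```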

Table 4 Adverse events defined using MedDRA Preferred Terms. The bolded MedDRA Preferred Term is used to name the adverse event, while all MedDRA Preferred Terms grouped together were used to define that adverse event


Drug set selection

Selection of test drugs

Fifty-four drugs approved by FDA between 2008 and 2013 were chosen for this analysis. Analyses were based on available Structured Product Labeling for products and required both an original label and a subsequent version of the label for this assessment. As Structured Product Labeling began in 2006, 2008 was selected to allow time for the requirement to be adequately implemented. The year 2013 was selected as the upper bound to allow at least four years of post-market experience to 2017, which is the median time for a regulatory action on a safety event (e.g. updating a drug label) [23]. Of the drugs approved between 2008 and 2013, drugs were included as long as there was at least one other U.S. marketed drug with the same pharmacological activity at one or more of the same known targets. Additional inclusion criteria were systemic exposure (e.g. not ophthalmic only) and multiple doses (i.e. drugs with single dose administration were excluded) due to an increased likelihood of multiple and significant adverse events.

Selection of comparator drugs

Comparator drugs, defined as drugs that have the same activity (i.e. agonist or antagonist) at one or more of the same targets as the test drug, were chosen using DrugBank [24]. Test and comparator drug targets were identified if the drug had “pharmacological action” at the target (i.e. the column “pharmacological action” in DrugBank must read “yes” as opposed to “no” or “unknown”) and must have a defined action column in DrugBank (i.e. “antagonist” or “agonist”) at the target. Additionally, the comparator drugs must have been approved in the United States and thus have an FDA product label available.

Features for classifier algorithm

Adverse Events from FDA drug labels

Adverse events were obtained from two versions of the test drug label: the originally-approved FDA product label (between 2008 and 2013) and the drug label as of 2017. The adverse events from the 2017 FDA product label were text-mined using Linguamatics I2E (OnDemand Release, Linguamatics Limited, Cambridge, United Kingdom). Adverse events were extracted as MedDRA Preferred Terms from the Boxed Warnings, Warnings and Precautions, and Adverse Reactions sections. The adverse events from the original product label were manually extracted and translated to MedDRA Preferred Terms by a medical expert from the Boxed Warnings, Warnings and Precautions, and Adverse Reactions sections. Manual curation was employed as Linguamatics OnDemand text-mines the current product label only.

Comparator drug adverse events were text-mined using Linguamatics I2E (Enterprise Release, Linguamatics Limited, Cambridge, United Kingdom). Adverse events were extracted as MedDRA Preferred Terms from Boxed Warnings, Warnings and Precautions, and Adverse Reactions sections. For each comparator drug, the FDA product label in use at the time of the respective test drug approval was used as the source for text-mining (e.g.: if a test drug was approved on November 1, 2010, the comparator drug labels that were in use on November 1, 2010 were mined).

For each drug label and adverse event, the presence or absence of a MedDRA Preferred Term was indicated by “1” or “0”, respectively. The classifiers were trained on and performance was analyzed using test drug label data from 2017. To assess the algorithm’s ability to predict future safety label changes at the approval date (described in detail in “Classifier” below), the difference between drug label data from 2017 and the label at approval (2008–2013) was used.
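The 0/1 encoding and the approval-to-2017 label difference can be sketched as follows. This is not the authors' code; the function names and example terms are illustrative.

```python
# Sketch: encode label data as 0/1 vectors and find terms added between
# the approval label and the 2017 label (candidate safety label changes).

def label_vector(label_terms, all_terms):
    """1 if the MedDRA Preferred Term appears on the label, else 0."""
    return [1 if t in label_terms else 0 for t in all_terms]

def new_terms(approval_terms, current_terms):
    """Terms absent at approval but present on the 2017 label."""
    return sorted(set(current_terms) - set(approval_terms))

terms = ["hypertension", "pancreatitis", "serotonin syndrome"]
print(label_vector({"hypertension"}, terms))                        # [1, 0, 0]
print(label_vector({"hypertension", "pancreatitis"}, terms))        # [1, 1, 0]
print(new_terms({"hypertension"}, {"hypertension", "pancreatitis"}))
```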

Adverse events from scientific literature

Adverse events from scientific literature were mined using Embase Biomedical Database (Elsevier B. V, Amsterdam, The Netherlands), a biomedical database covering journals and conference abstracts [25]. A team of Embase indexers manually curate all adverse events from all full-text articles and associate each adverse event with the related drug. These drugs and adverse events are documented in Emtree terms, Elsevier’s controlled terminology. Therefore, each drug in Embase has hundreds to thousands of adverse events associated with it, and each adverse event-drug association has a curated reference. Adverse events reported for all comparator drugs before their respective test drug’s approval date were searched for in Embase. The list of adverse events documented by Elsevier as Emtree terms for each comparator drug was exported and manually matched to MedDRA Preferred Terms.

Comparator drug duration in market

Comparator time in market was included as a feature: the longer a drug has been marketed, the more adverse events, particularly difficult-to-observe adverse events, are identified and evaluated for labeling. The duration in market for comparator drugs was determined from the Orange Book [26]. Drugs approved before 1982 have an approval date listed as “Approved Prior to Jan 1, 1982”; the duration in market for these drugs was imputed to be 36 years (1982 to 2017).
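The time-in-market feature with the pre-1982 imputation can be sketched as below. The 36-year value and 2017 reference are from the text; the function itself is an illustration, not the study's code.

```python
# Sketch of the comparator time-in-market feature. Orange Book entries listed as
# "Approved Prior to Jan 1, 1982" (represented here as None) are imputed as 36 years.
from datetime import date

REFERENCE = date(2017, 1, 1)

def years_in_market(approval_date):
    """Years marketed as of the 2017 reference date."""
    if approval_date is None:          # pre-1982 approval with no exact date
        return 36
    return (REFERENCE - approval_date).days / 365.25

print(years_in_market(None))           # 36
print(round(years_in_market(date(2012, 6, 15)), 1))
```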

Structural similarity

Structural similarity was included as a feature because it was hypothesized that the more structurally similar a comparator drug is to a test drug, the more likely the two are to share pharmacology, including unknown secondary pharmacology not captured in this analysis that may contribute to similar idiosyncratic reactions. Structural similarities of each test drug to its respective comparator drugs were determined using Tanimoto scores. Simplified Molecular Input Line Entry System (SMILES) structures for all test and comparator drugs were imported into the Tanimoto Matrix workflow in the KNIME Analytics Platform (version 3.3.2) [27]. Structures were converted to MACCS 166-bit fingerprints, and the structural similarity between each test drug and its comparator drugs was determined. For biologics, where a similarity score was not available, a Tanimoto score of −1 was imputed.
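The Tanimoto coefficient itself is simple: the number of shared set bits divided by the total number of distinct set bits. The study computed it in KNIME on MACCS 166-bit fingerprints; this minimal sketch uses Python sets of bit positions as a stand-in fingerprint, with the −1 biologic imputation from the text.

```python
# Minimal illustration of the Tanimoto coefficient on fingerprint bits.
# Sets of "on" bit positions stand in for MACCS 166-bit fingerprints.

def tanimoto(bits_a, bits_b):
    """Intersection over union of set bits; -1 if a fingerprint is missing (biologic)."""
    if bits_a is None or bits_b is None:
        return -1
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

fp_test = {3, 17, 52, 90, 121}       # invented bit positions
fp_comp = {3, 17, 52, 77}
print(tanimoto(fp_test, fp_comp))    # 3 shared / 6 distinct bits -> 0.5
print(tanimoto(fp_test, None))       # biologic comparator -> -1
```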

Target similarity

Target similarity, or how closely the target profile of each comparator aligned with that of the test drug, was included as a feature because it was hypothesized that the more targets a comparator shares with a test drug, the more likely the two are to share adverse events. The set of known pharmacological targets for each test drug and its comparator drugs was extracted from DrugBank [24]. Target similarities were determined using target-based cosine similarity scores. A trivalent drug-by-target matrix was constructed in which each drug-target entry is “1” for activation, “−1” for inhibition, and “0” for no pharmacological activity. The cosine similarity between the test drug and each comparator drug was then computed as follows:

$$ \cos\left(\left[\text{Test Drug}\right],\left[\text{Comparator Drug}\right]\right)=\frac{\left[\text{Test Drug}\right]\cdot\left[\text{Comparator Drug}\right]}{\left\lVert\left[\text{Test Drug}\right]\right\rVert\,\left\lVert\left[\text{Comparator Drug}\right]\right\rVert} $$
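Applied to the trivalent target vectors, the formula reduces to a dot product over matched targets divided by the vector norms. A small sketch (our illustration, with invented target rows; 1 = activation, −1 = inhibition, 0 = no activity):

```python
# Cosine similarity between two trivalent drug-by-target rows.
import math

def cosine_similarity(test_vec, comp_vec):
    dot = sum(a * b for a, b in zip(test_vec, comp_vec))
    norm_t = math.sqrt(sum(a * a for a in test_vec))
    norm_c = math.sqrt(sum(b * b for b in comp_vec))
    return dot / (norm_t * norm_c)

# Test drug inhibits targets 1 and 2; comparator inhibits target 1 and activates target 3.
test_drug  = [-1, -1, 0]
comparator = [-1, 0, 1]
print(round(cosine_similarity(test_drug, comparator), 2))  # 0.5
```

Note that shared inhibition contributes positively while opposite actions at the same target (one drug activating, the other inhibiting) reduce the score, which is the point of the trivalent encoding.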

Classifier
Five features were defined for each comparator drug-test drug-adverse event association: 1) presence or absence of the adverse event in the FDA drug label for the comparator drug; 2) presence or absence of the adverse event in the scientific literature for the comparator drug; 3) structural similarity between comparator drug and test drug; 4) target similarity between comparator drug and test drug; and 5) duration the comparator drug was on the market (Fig. 4), all of which are independent of each other. These features were used to train a Naïve Bayes classifier, using presence or absence of the adverse event in the 2017 FDA drug label for the test drug as the training label (see section Adverse Events from FDA Drug Labels for details). Given the wide range of adverse event prevalence, we anticipated that the prevalence of an adverse event would contribute strongly to model prediction. A Naïve Bayes classifier was therefore chosen to take into account both the prior probability (i.e. the prevalence of the adverse event) and the likelihood of the features given presence of the adverse event. All statistical calculations were conducted in R version 3.2.2 (R Foundation for Statistical Computing, Vienna, Austria) using the Naïve Bayes classifier from the package e1071 [28] (see supplemental materials for code).
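The study used R's e1071 implementation; the toy Python sketch below only illustrates how Naïve Bayes combines prior prevalence with per-feature likelihoods. It handles just the two binary features (label and literature presence) with Laplace smoothing, and all data values are invented.

```python
# Toy Bernoulli Naive Bayes posterior for one adverse event.
# Not the study's code: illustrates prior (prevalence) x likelihood.

def naive_bayes_posterior(features, data, labels, alpha=1.0):
    """P(AE on test drug label | binary features), with Laplace smoothing alpha."""
    n = len(labels)
    post = {}
    for y in (0, 1):
        rows = [x for x, lab in zip(data, labels) if lab == y]
        prior = (len(rows) + alpha) / (n + 2 * alpha)        # smoothed class prevalence
        like = prior
        for j, f in enumerate(features):
            match = sum(1 for r in rows if r[j] == f)
            like *= (match + alpha) / (len(rows) + 2 * alpha)
        post[y] = like
    return post[1] / (post[0] + post[1])

# Rows: (AE on comparator label, AE in literature); labels: AE on test drug's 2017 label.
data = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
labels = [1, 1, 0, 0, 1]
p = naive_bayes_posterior((1, 1), data, labels)
print(p > 0.5)   # True: both features present pushes the posterior up
```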

Fig. 4

Flow diagram of experimental methods

Due to the limited number of drugs available for testing and the high dimensionality of prediction (135 adverse events), 10,000 bootstrapping iterations were conducted, each selecting a random set of 44 drugs to train the Naïve Bayes classifier and leaving 10 drugs for testing (i.e. 10,000 of the \( \binom{54}{44} \) possible splits). A prediction was made for each comparator drug-test drug association for an adverse event of interest. Since a single test drug can have multiple comparator drugs, there may be multiple predictions for one test drug for each adverse event of interest. To resolve this, if the percentage of comparator drug-test drug combinations that predicted the adverse event of interest was above a predefined threshold, the adverse event was considered a positive prediction for the test drug. Performance was calculated while varying this threshold (0, 10, 30, 50, 60, 70, and 90%) to identify the optimum value.
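The thresholding step amounts to majority-style voting across a test drug's comparators. A sketch (our illustration), using the 70% threshold that the Results identify as optimal:

```python
# Aggregate per-comparator predictions into one prediction per test drug.

def aggregate_prediction(comparator_predictions, threshold=0.70):
    """comparator_predictions: list of 0/1 predictions, one per comparator drug."""
    if not comparator_predictions:
        return 0
    frac = sum(comparator_predictions) / len(comparator_predictions)
    return 1 if frac >= threshold else 0

print(aggregate_prediction([1, 1, 1, 0]))   # 75% positive -> 1
print(aggregate_prediction([1, 1, 0, 0]))   # 50% positive -> 0
```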

As 10,000 bootstrapping steps were performed, the most frequent value (mode), median, 25th and 75th quantiles for each of the performance metrics (sensitivity, specificity, positive predictive value and negative predictive value) were calculated to assess the predictive ability for each adverse event. Performance metric histograms for each adverse event are provided in the supplemental materials. We chose to optimize positive predictive value, as false positives may be more costly in terms of additional studies and regulatory review compared to false negatives. Adverse events with a distribution for positive predictive value that was left-skewed (defined as a mode positive predictive value > 75%) were considered well-predicted.
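Summarizing a metric across bootstrap iterations, and reading the mode as a skew indicator, can be sketched with the standard library (the iteration values below are invented):

```python
# Mode, median, and 25th/75th quantiles of positive predictive value
# across bootstrap iterations (values in %, invented for illustration).
from statistics import mode, median, quantiles

ppv_iterations = [100, 100, 50, 100, 0, 100, 75, 100, 50, 100]
q25, _, q75 = quantiles(ppv_iterations, n=4)
print(mode(ppv_iterations))    # 100: mode > 75% -> left-skewed, i.e. well-predicted
print(median(ppv_iterations))
print((q25, q75))
```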

Leave-one-out cross validation was performed to evaluate safety label changes. Predictions were evaluated as follows:

$$ \%\ \text{of FDA-issued safety label changes that were predicted}=\frac{\#\ \text{of drug-AE combos that changed from negative to positive between approval and 2017 and were predicted positive}}{\#\ \text{of drug-AE combos that changed from negative to positive between approval and 2017}} $$
$$ \%\ \text{of predicted safety label changes that were also FDA-issued}=\frac{\#\ \text{of drug-AE combos that changed from negative to positive between approval and 2017 and were predicted positive}}{\#\ \text{of drug-AE combos that were negative at approval and were predicted positive}} $$
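These two cross-validation metrics can be computed from per-(drug, adverse event) flags as sketched below; the record layout and function name are our illustration, not the authors' code.

```python
# Compute the two safety-label-change metrics from flags per drug-AE combination.

def label_change_metrics(records):
    """records: iterable of (negative_at_approval, positive_in_2017, predicted_positive)."""
    changed = [r for r in records if r[0] and r[1]]        # FDA-issued label changes
    predicted_new = [r for r in records if r[0] and r[2]]  # predicted label changes
    hits = [r for r in changed if r[2]]
    pct_issued_predicted = 100 * len(hits) / len(changed) if changed else 0.0
    pct_predicted_issued = 100 * len(hits) / len(predicted_new) if predicted_new else 0.0
    return pct_issued_predicted, pct_predicted_issued

records = [
    (True, True, True),    # changed and predicted
    (True, True, False),   # changed, missed by the model
    (True, False, True),   # predicted, no change issued
    (False, True, True),   # already labeled at approval -> not a change
]
print(label_change_metrics(records))   # (50.0, 50.0)
```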

Evaluation of false positive predictions

Positive predictions made by the Naïve Bayes classifier for drug-adverse event pairs not on the respective 2017 drug label were classified as “false positives”. To evaluate whether these predictions might be early signals not yet reflected on the label, the case count and Proportional Reporting Ratio (PRR) were retrieved for each drug-adverse event pair from the FDA Adverse Event Reporting System using openFDA [29, 30]. Data from June 30, 1989 to January 1, 2018 were used in this analysis.
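The PRR itself is computed from a 2×2 table of spontaneous-report counts [30]. A minimal sketch (in practice the counts would be retrieved from FAERS through the openFDA API; the function name is ours):

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio from a 2x2 report count table:

    a = reports of the event of interest for the drug of interest
    b = reports of all other events for the drug of interest
    c = reports of the event of interest for all other drugs
    d = reports of all other events for all other drugs

    PRR = (a / (a + b)) / (c / (c + d))
    """
    return (a / (a + b)) / (c / (c + d))
```

For example, if 20 of 100 reports for a drug mention the event while only 100 of 10,000 reports for all other drugs do, the PRR is 20, a strong disproportionality signal.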

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article (and its additional files).



Abbreviations

ATC: Anatomical Therapeutic Chemical

MedDRA: Medical Dictionary for Regulatory Activities

FAERS: FDA Adverse Event Reporting System

PRR: Proportional Reporting Ratio

FDA: Food and Drug Administration

SMILES: Simplified Molecular Input Line Entry System


  1. Woodcock J. FDA Voice [Internet]. 2018. Available from: [cited 2018].

  2. Ferrell PB Jr, McLeod HL. Carbamazepine, HLA-B*1502 and risk of Stevens-Johnson syndrome and toxic epidermal necrolysis: US FDA recommendations. Pharmacogenomics. 2008;9(10):1543–6.


  3. Yang M, Kiang M, Shang W. Filtering big data from social media--building an early warning system for adverse drug reactions. J Biomed Inform. 2015;54:230–40.


  4. Ball R, Robb M, Anderson SA, Dal PG. The FDA's sentinel initiative--a comprehensive approach to medical product surveillance. Clin Pharmacol Ther. 2016;99(3):265–8.


  5. FDA. FAERS Public Dashboard [Available from:].

  6. Zhang L, Zhang J, Shea K, Xu L, Tobin G, Knapton A, et al. Autophagy in pancreatic acinar cells in caerulein-treated mice: immunolocalization of related proteins and their potential as markers of pancreatitis. Toxicol Pathol. 2014;42(2):435–57.


  7. Vilar S, Harpaz R, Chase HS, Costanzi S, Rabadan R, Friedman C. Facilitating adverse drug event detection in pharmacovigilance databases using molecular structure similarity: application to rhabdomyolysis. J Am Med Inform Assoc. 2011;18(Suppl 1):i73–80.


  8. Pouliot Y, Chiang AP, Butte AJ. Predicting adverse drug reactions using publicly available PubChem BioAssay data. Clin Pharmacol Ther. 2011;90(1):90–9.


  9. Gurulingappa H, Toldo L, Rajput AM, Kors JA, Taweel A, Tayrouz Y. Automatic detection of adverse events to predict drug label changes using text and data mining techniques. Pharmacoepidemiol Drug Saf. 2013;22(11):1189–94.


  10. Zhao J, Henriksson A, Asker L, Bostrom H. Predictive modeling of structured electronic health records for adverse drug event detection. BMC Med Inform Decision Making. 2015;15(Suppl 4):S1.


  11. Schuemie MJ, Coloma PM, Straatman H, Herings RM, Trifiro G, Matthews JN, et al. Using electronic health care records for drug safety signal detection: a comparative evaluation of statistical methods. Med Care. 2012;50(10):890–7.


  12. Strickland J, Zang Q, Paris M, Lehmann DM, Allen D, Choksi N, et al. Multivariate models for prediction of human skin sensitization hazard. J Appl Toxicol. 2017;37(3):347–60.


  13. Xu R, Wang Q. Automatic signal extraction, prioritizing and filtering approaches in detecting post-marketing cardiovascular events associated with targeted cancer drugs from the FDA adverse event reporting system (FAERS). J Biomed Inform. 2014;47:171–7.


  14. Frid AA, Matthews EJ. Prediction of drug-related cardiac adverse effects in humans--B: use of QSAR programs for early detection of drug-induced cardiac toxicities. Regul Toxicol Pharmacol. 2010;56(3):276–89.


  15. Schotland P, Racz R, Jackson D, Levin R, Strauss DG, Burkhart K. Target-adverse event profiles to augment Pharmacovigilance: a pilot study with six new molecular entities. CPT Pharmacometrics Syst Pharmacol. 2018;7(12):809–17.

  16. ATC/DDD Index 2018: World Health Organization.  [Available from:].

  17. Roth BL, Lopez E, Patel S, Kroeze WK. The Multiplicity of Serotonin Receptors: Uselessly Diverse Molecules or an Embarrassment of Riches? Neuroscientist. 2000;6(4):252–62.

  18. Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, et al. The ChEMBL database in 2017. Nucleic Acids Res. 2017;45(D1):D945–D54.


  19. Szarfman A, Tonning JM, Doraiswamy PM. Pharmacovigilance in the 21st century: new systematic tools for an old problem. Pharmacotherapy. 2004;24(9):1099–104.


  20. Horsfall JT, Sprague JE. The pharmacology and toxicology of the 'Holy Trinity'. Basic Clin Pharmacol Toxicol. 2017;120(2):115–9.


  21. Lemos JC, Friend DM, Kaplan AR, Shin JH, Rubinstein M, Kravitz AV, et al. Enhanced GABA transmission drives Bradykinesia following loss of dopamine D2 receptor signaling. Neuron. 2016;90(4):824–38.


  22. MedDRA: Medical Dictionary for Regulatory Activities [Available from:].

  23. Downing NS, Shah ND, Aminawung JA, Pease AM, Zeitoun JD, Krumholz HM, et al. Postmarket safety events among novel therapeutics approved by the US Food and Drug Administration between 2001 and 2010. JAMA. 2017;317(18):1854–63.


  24. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D82.


  25. Elsevier. [Available from:].

  26. FDA. Approved Drug Products with Therapeutic Equivalence Evaluations. 38th ed; 2018.


  27. Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Ohl P, Sieb C, Thiel K, Wiswedel B. KNIME: The Konstanz Information Miner. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R, editors. Data Analysis, Machine Learning and Applications. Berlin: Springer; 2008. p. 319–26.


  28. Meyer D, Hornik K, Weingessel A, Leisch F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (formerly E1071). R package version 1.6–8; 2017.

  29. Kass-Hout TA, Xu Z, Mohebbi M, Nelsen H, Baker A, Levine J, et al. OpenFDA: an innovative platform providing access to a wealth of FDA's publicly available data. J Am Med Inform Assoc. 2016;23(3):596–600.


  30. Evans SJ, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf. 2001;10(6):483–6.




Acknowledgements

The authors thank Jeffry Florian and Anuradha Ramamoorthy for their helpful feedback.


This article reflects the views of the authors and should not be construed to represent the FDA’s views or policies. The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products by the Department of Health and Human Services.


Funding

This project was supported in part by a research fellowship from the Oak Ridge Institute for Science and Education through an interagency agreement between the Department of Energy and the Food and Drug Administration (FDA). The funding body played no role in the design of the study, the collection, analysis, and interpretation of data, or the writing of the manuscript.

Author information

Authors and Affiliations



Contributions

CD led classifier design and implementation. CD, PS, and RR contributed data. CD and RR led data analysis and interpretation. CD, PS, DGS, KKB, and RR participated in study design and in the writing and editing of this manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Rebecca Racz.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

RR’s spouse is an employee of AstraZeneca. All other authors have no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

“Supplemental Materials” contains histograms of the performance for each adverse event; “Supplemental Table 1” contains all targets represented in this study.

Additional file 2.

Contains the Naïve Bayes classifier code, the files necessary to run it, and the output files used in the analyses described in the paper.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Daluwatte, C., Schotland, P., Strauss, D.G. et al. Predicting potential adverse events using safety data from marketed drugs. BMC Bioinformatics 21, 163 (2020).
