Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints
© The Author(s). 2017
Published: 31 May 2017
Drug-induced liver injury (DILI) is a critical issue in drug development because DILI causes failures in clinical trials and the withdrawal of approved drugs from the market. There have been many attempts to predict the risk of DILI based on in vivo and in silico identification of hepatotoxic compounds. In the current study, we propose an in silico model that predicts DILI using weighted molecular fingerprints.
In this study, we used 881-bit molecular fingerprints as features describing the presence or absence of each substructure in a compound. The Bayesian probability of each substructure was then calculated for DILI-positive and DILI-negative compounds, and a weighted fingerprint was determined from the ratio of the DILI-positive to the DILI-negative probability values. Using the weighted fingerprint features, prediction models were trained and evaluated with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. The constructed models yielded accuracies of 73.8% and 72.6% and AUCs of 0.791 and 0.768 in cross-validation. In independent tests, the models achieved accuracies of 60.1% and 61.1% for RF and SVM, respectively. The results indicate that weighted features helped increase the overall performance of the prediction models. The constructed models were further applied to natural compounds in herbs to identify DILI potential, and 13,996 unique herbal compounds were predicted as DILI-positive with the SVM model.
The prediction models with weighted features performed better than the non-weighted models. Moreover, we predicted the DILI potential of herbs with the best-performing model, and the prediction results suggest that many herbal compounds could have the potential to cause DILI. We can thus infer that taking natural products without detailed references about the relevant pathways may be dangerous. Considering how frequently compounds from natural herbs are used and their increasing application in drug development, DILI labeling would be very important.
As the leading cause of development failure in clinical trials and withdrawal of drugs from the market, drug-induced liver injury (DILI) is one of the most important factors in drug development. The severe adverse effects of DILI, which include acute liver failure and jaundice, must be considered in drug development. The toxicity of these drugs is attributable to their conversion in the liver to highly reactive metabolites that cause organ damage [2–4]. However, determining DILI potential is a very challenging task, primarily because animal studies do not reliably predict DILI potential in humans. For example, in a phase II clinical trial, acute liver toxicity induced by fialuridine led to the deaths of five subjects, in contrast to its safe use in animal studies. In a study of 221 pharmaceutical products, the rate of concordance of hepatotoxicity in humans and animals was low, approximately 55%, whereas the rate of concordance was much higher in other target organs, including the hematological (91%), gastrointestinal (85%), and cardiovascular (80%) systems. In addition, clinical features or laboratory tests for predicting DILI potential have not been identified [7, 8]. Moreover, the statistical power of clinical trials is insufficient: severe idiosyncratic hepatotoxicity occurs at very low frequency, and patient samples in clinical trials number only in the thousands. Due to this low statistical power, even well-controlled clinical trials can fail to predict DILI.
To overcome these problems, many researchers have sought to evaluate the toxicity of compounds in vitro and/or in vivo. However, considering the number of compounds, this approach is time-consuming and costly, and thus there has been much effort to develop prediction models to determine whether a compound could cause liver toxicity. Computational modeling approaches have been adopted by pharmaceutical companies to help evaluate the efficacy, toxicity, and metabolism of pharmaceutical ingredients. In the early stages of the development of prediction models, the predictive power of the constructed models was not satisfactory, and models often relied on experimental data for better performance. Some researchers used molecular signatures, such as those for alanine transaminase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP), all of which are commonly assessed in the diagnostic evaluation of hepatocellular damage. In more recent years, machine-learning algorithms for prediction models have also been developed to obtain better predictions [11, 12]. However, experimental data are of limited utility in constructing prediction models. Therefore, several researchers have focused on computational predictions using compound properties and structural characteristics. Greene et al. developed structure-activity relationships for potentially hepatotoxic compounds. Compounds were categorized into four classes associated with hepatotoxicity: no evidence, weak evidence, animal hepatotoxicity, and human hepatotoxicity. The resultant hepatotoxicity alerts yielded a concordance of 56%, a specificity of 73%, and a sensitivity of 46%. Ekins et al. built a classification model based on the Bayesian modeling method with molecular descriptors and fingerprint descriptors. The evaluation of the classifier demonstrated a concordance of 60% for internal validation and 64% for external validation. Rodgers et al. also developed a quantitative structure-activity relationship (QSAR) model using liver adverse effects of drugs (AEDs) as a dataset. They used information on enzyme markers of hepatotoxicity, but these markers can fluctuate due to other factors throughout the day. Moreover, Huang et al. developed a prediction model based on QSAR using a variety of descriptors, including fingerprints. Their model performed well, with an accuracy of 79.1% in internal validation. They further predicted the potential hepatotoxicity of Traditional Chinese Medicines. Zhang et al. also developed an in silico prediction model for DILI. They used three different fingerprints and five machine-learning algorithms and obtained a concordance of 66% using the Support Vector Machine algorithm and the FP4 fingerprint, in addition to identifying important substructure patterns related to liver toxicity. Despite these extensive efforts to predict DILI, there are no standard QSAR models for DILI, in contrast to the availability of QSAR models for mutagens. Moreover, less is known about the substructures that are significantly associated with DILI [18–20].
The Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD) and the DrugBank database were used as training datasets. LTKB-BD is a benchmark dataset provided by the National Center for Toxicological Research (NCTR), U.S. FDA [21, 22]. This dataset contains a list of drugs with DILI potential in humans in accordance with FDA-approved prescription drug labels. Drugs in the dataset are categorized into one of three groups based on their description and severity: most-DILI-concern, less-DILI-concern, and no-DILI-concern. Drugs with a black box warning of hepatotoxicity or that were withdrawn from the market were classified into the most-DILI-concern category. The drugs in that class were labeled due to their fatal hepatotoxicity, including liver necrosis, jaundice, and acute liver failure. The less-DILI-concern drugs included those with moderate DILI warnings, and drugs without any DILI indication were classified as no-DILI-concern drugs. In this study, we began by labeling 222 DILI-concern drugs and 65 no-DILI-concern drugs from the LTKB-BD as positive and negative, respectively. We then retrieved simplified molecular-input line-entry system (SMILES) information by name matching using the ChemSpider Python API [23, 24]. The SMILES information was further used to obtain molecular fingerprints for use as features in model training and construction. Because the ChemSpider API also returns partial matches, we kept only compounds with exactly one match for higher confidence. Finally, we obtained 180 positive and 53 negative compounds.
[Table: The number of compounds used in training and the independent test. Independent test sources: Greene & Xu]
Molecular fingerprints are a representation of the structure of a compound. Fingerprints are widely used in chemical informatics because they consist of bitstrings, which facilitate molecule comparisons. Each bit of a fingerprint represents a specific substructure of a molecule, and the annotation of the substructure depends on the type of fingerprint. In the current study, we used PubChem fingerprints (ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf), which have a length of 881 bits. Each bit encodes one of the following: the presence of an element, the count of a ring system, an atom pair, an atom’s nearest neighbors, or a SMARTS pattern. The PubChem fingerprint was chosen for substructure reporting in the present study because it describes the structure of a molecule in detail with a long bit-vector. To retrieve fingerprint information, we used PaDEL-Descriptor, software that calculates molecular descriptors, including 1D, 2D, and 3D descriptors, as well as 12 types of fingerprints, among them the PubChem fingerprint. The software can be downloaded online and supports a graphical interface.
Bayesian theory for feature weight calculation
The Random Forest (RF) and the Support Vector Machine (SVM) algorithms were used to construct the classification and prediction model. The RF algorithm is an ensemble learning algorithm that operates by constructing a large number of decision trees and aggregating them. To make a prediction, it passes a new input through every decision tree and takes a vote on how the input is to be classified. The main advantage of the RF algorithm is that it is less prone to overfitting, which occurs frequently when dealing with a small dataset. The implementation of the algorithm is found in the MATLAB Statistics and Machine Learning Toolbox (MATLAB and Statistics Toolbox Release 201#, The MathWorks, Inc., Natick, Massachusetts, United States). The TreeBagger function was used for the RF algorithm. SVMs are among the most popular supervised machine-learning algorithms for pattern recognition and are also used for classification. An SVM constructs a separating hyperplane from specified training examples, each of which includes a category label. The constructed model can then be used to predict the DILI potential of a new drug. The implementation of the SVM we used is A Library for Support Vector Machines (LIBSVM). When training a model, we used similarity matrices calculated using the Tanimoto coefficient, a similarity metric defined as the ratio of the intersecting set to the union set, because the constructed space would be very high-dimensional with 881 features. The use of similarity matrices reduces the number of dimensions to the number of compounds in the dataset.
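The Tanimoto similarity matrix described above can be sketched as follows. This is a minimal illustration for binary fingerprints, not the authors' implementation; the function names and the convention of returning 0.0 for two all-zero fingerprints are assumptions made here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumption: two all-zero fingerprints are given similarity 0.0.
    return both / either if either else 0.0


def similarity_matrix(fingerprints):
    """n x n matrix of pairwise Tanimoto similarities for n compounds."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Toy 4-bit fingerprints standing in for the 881-bit PubChem fingerprints.
fps = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
matrix = similarity_matrix(fps)
```

Whatever the fingerprint length, the matrix has one row and one column per compound, which is the dimensionality reduction the text refers to. LIBSVM accepts such a matrix as a precomputed kernel; whether the authors used that option is not stated in the text.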
When training the models, we performed 10-fold cross-validation, which divides the training dataset into ten subsamples. In each round, nine subsamples are used for training and the remaining one is used for testing. We constructed each model with different thresholds and multiplication numbers, and we compared the performances to select the best model for prediction.
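A 10-fold split of this kind can be sketched as below. The helper name, the random seed, and the use of plain (non-stratified) shuffling are assumptions for illustration; the text does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 233 = 180 positive + 53 negative training compounds reported in the text.
splits = list(k_fold_indices(233, k=10, seed=0))
```

Each compound appears in exactly one test fold, so every compound is predicted once by a model that did not see it during training.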
The data from previous studies were used for further evaluation. We collected the independent test set from two studies: Greene et al. and Xu et al. [13, 27]. Greene’s dataset was categorized into four groups: HH (evidence of human hepatotoxicity); NE (no evidence of hepatotoxicity in any species); WE (weak evidence of human hepatotoxicity); and AH (evidence for animal hepatotoxicity but not tested in humans). To use strict data, we used the compounds in the HH and NE categories as positive and negative, respectively. After combining the two datasets, we pre-processed the resultant dataset in the same manner as the training set. The SMILES information was retrieved from ChemSpider and was used to eliminate duplicates from the training set and eliminate label contradictions between the two sets. In total, we obtained 398 compounds, including 224 positive and 174 negative.
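The clean-up step above can be sketched as follows, under stated assumptions: compounds are compared by exact SMILES string, compounds already in the training set are dropped, and a compound that received conflicting labels from the combined sources is discarded. The function name and these matching rules are illustrative and are not taken from the paper.

```python
def clean_test_set(test_records, training_smiles):
    """Return {smiles: label} for an independent test set.

    test_records: iterable of (smiles, label) pairs from the combined sources.
    training_smiles: set of SMILES strings already used for training.
    """
    labels_by_smiles = {}
    for smiles, label in test_records:
        labels_by_smiles.setdefault(smiles, set()).add(label)

    cleaned = {}
    for smiles, labels in labels_by_smiles.items():
        if smiles in training_smiles:
            continue  # overlaps with the training set
        if len(labels) > 1:
            continue  # contradictory labels across sources (assumed to be dropped)
        cleaned[smiles] = next(iter(labels))
    return cleaned


records = [("CCO", 1), ("c1ccccc1", 1), ("c1ccccc1", 1),
           ("CC(=O)O", 1), ("CC(=O)O", 0), ("CN", 0)]
cleaned = clean_test_set(records, training_smiles={"CCO"})
```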
Prediction of natural products
The constructed classification model was then applied to predict the potential hepatotoxicity of natural products. We collected herbal compound information from the TCMID, TCM-ID, and KAMPO databases [28–30], all of which contain information about the efficacy of herbs and their constituent compounds. The natural product dataset was also standardized with ChemSpider, and fingerprints were obtained. Fingerprints could not be retrieved for a few compounds, primarily very complex, large molecules with a mass greater than 1000 Da. These compounds were excluded, resulting in a final total of 17,826 compounds.
Frequent substructures in hepatotoxic compounds
One of the main purposes of this research was to identify important substructures in DILI-positive compounds. The frequently appearing substructures can be inferred from the weighted substructures. We first calculated the probability of each substructure appearing in positive-labeled and in negative-labeled compounds. We then used the log odds ratio of positive to negative to select the substructures to be weighted. Because we focused on substructures that are frequent in DILI-positive compounds, we selected the substructures with high log odds ratio values. With a log odds ratio threshold of 2.5, we identified 24 substructures. The substructures identified with other threshold values are described in Additional file 1: Table S1–S3.
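The selection step can be sketched as follows. The exact probability estimate is not given in the text, so this example assumes the per-class frequency of each bit with an add-one pseudocount (to avoid division by zero) and a natural logarithm; the function names are illustrative.

```python
import math


def log_odds_per_bit(fingerprints, labels, pseudocount=1.0):
    """log(P(bit=1 | positive) / P(bit=1 | negative)) for every bit.

    fingerprints: list of equal-length 0/1 lists; labels: 1 = DILI-positive, 0 = negative.
    The pseudocount smoothing is an assumption, not taken from the paper.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    scores = []
    for j in range(len(fingerprints[0])):
        pos_hits = sum(fp[j] for fp, y in zip(fingerprints, labels) if y == 1)
        neg_hits = sum(fp[j] for fp, y in zip(fingerprints, labels) if y == 0)
        p_pos = (pos_hits + pseudocount) / (n_pos + 2 * pseudocount)
        p_neg = (neg_hits + pseudocount) / (n_neg + 2 * pseudocount)
        scores.append(math.log(p_pos / p_neg))
    return scores


def select_bits(scores, threshold):
    """Indices of substructure bits whose log odds ratio reaches the threshold."""
    return [j for j, s in enumerate(scores) if s >= threshold]


# Toy data: 3-bit fingerprints for two positive and two negative compounds.
fps = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
labels = [1, 1, 0, 0]
scores = log_odds_per_bit(fps, labels)
selected = select_bits(scores, threshold=0.5)
```

In this toy case only the third bit is selected, since it is the only one that occurs clearly more often in the positive compounds than in the negative ones.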
Prediction of hepatotoxic compounds in natural products
In the current study, we calculated the weighted feature using Bayesian theory and constructed DILI prediction models using the updated feature with two algorithms: RF and SVM. When calculating the weight vector, we focused on giving weight to those features that appeared more frequently in DILI-positive compounds than in DILI-negative compounds because it is more important to identify hepatotoxic compounds that might cause critical adverse reactions when developed into drugs. Therefore, we set a cutoff to select the substructures to be weighted by their log odds ratio values. The threshold ranged from 0.5 to 2.5 and resulted in different performances. With an excessively low threshold, the number of weighted substructures was too large, causing the overall values of the weight vector to increase without differentiating specific substructures and, consequently, poor model performance. By contrast, the use of an excessively high threshold would weight too few substructures, resulting in a decrease of performance. The parameter multiplied with the selected substructure also affected the performance, but the effect was not significant. This result indicates that amplification of values is important but that the degree of amplification does not significantly affect model performance.
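One way to read the weighting step is sketched below: bits selected by the log odds threshold are multiplied by a constant, and all other bits are left unchanged. How the authors applied the multiplier is not spelled out in the text, so the function, its arguments, and the example values are assumptions.

```python
def weight_fingerprint(fingerprint, selected_bits, multiplier):
    """Amplify the selected substructure bits of one fingerprint.

    fingerprint: list of 0/1 values; selected_bits: indices chosen by the
    log odds threshold; multiplier: amplification factor (hypothetical value).
    """
    selected = set(selected_bits)
    return [bit * multiplier if j in selected else bit
            for j, bit in enumerate(fingerprint)]


# Bits 2 and 3 are treated as DILI-associated in this toy example.
weighted = weight_fingerprint([1, 0, 1, 1], selected_bits=[2, 3], multiplier=3)
```

Under this reading, a lower threshold selects more bits and amplifies most of the vector uniformly, which matches the observation that a very low threshold fails to differentiate specific substructures.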
Both constructed models performed well in cross-validation in terms of AUC and accuracy; however, the accuracy in the independent test decreased compared to the results of cross-validation. The low accuracy was due to low specificity, indicating that the model tends to predict more compounds as positive than as negative. This problem occurred because we focused on predicting DILI-positive compounds by weighting the related substructures and used a sensitivity threshold of 0.8, which could be relatively high. Because it is safer to predict negative compounds as positive (classifying nontoxic compounds as toxic) than to classify toxic compounds as nontoxic, we did not lower the threshold but attempted to reduce the gap between sensitivity and specificity using a weighted feature. This approach helped increase the accuracy. Although the increase in accuracy was not dramatic, the model classified the independent test set more precisely, positive to positive and negative to negative. The results also demonstrated that the weighted substructures affected the prediction of DILI-positive compounds.
In this study, we also determined frequently occurring substructures in DILI-positive compounds. Although the substructures with the highest probability are general, more detailed SMARTS patterns can be observed as the threshold is lowered. We obtained general structures because of the nature of PubChem fingerprints, which divide a structure into lower levels.
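The relationship between sensitivity, specificity, and accuracy discussed above can be made concrete with a small sketch. The function name and the toy labels are illustrative; the code assumes both classes are present in the true labels.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = DILI-positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # fraction of toxic compounds caught
        "specificity": tn / (tn + fp),   # fraction of nontoxic compounds cleared
        "accuracy": (tp + tn) / len(y_true),
    }


# A classifier that leans toward predicting "positive".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case the classifier catches three of four positives but clears only two of four negatives, so accuracy is pulled down by the low specificity, which is the pattern the independent test showed.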
We introduced a DILI prediction model with weighted features. The weighted features were calculated using Bayesian probability, which captures the frequency of each substructure in DILI-positive and DILI-negative compounds. As a result, the weighted features increased the model performance in both cross-validation and the independent test with an unseen dataset. Moreover, we applied the constructed model to the prediction of DILI potential in herbs. The large number of predicted positive compounds indicates that even compounds found in nature can be toxic and harmful to the human body. This finding is important because some people in Eastern countries rely on herbal medicine and believe it is safer than taking general drugs. However, natural products are not always beneficial to health. In addition, natural products have come to the forefront in drug discovery and development. Therefore, herbs that are used as home remedies or that are under development must be carefully administered, considering their toxic effects on the human body. In addition, we listed frequent substructures in DILI-positive compounds to facilitate drug screening in less time and at lower cost.
As an additional approach, we can improve the prediction models using structural information other than two-dimensional structural information. The frequent substructures we reported here based on the fingerprint annotation can be further developed to aid the identification of toxicophores using neural networks.
This work was supported by the Bio-Synergy Research Project (NRF-2014M3A9C4066449) of the Ministry of Science, ICT and Future Planning through the National Research Foundation, by the National Research Foundation of Korea grant funded by the Korea government (MSIP) (NRF-2015R1C1A1A01051578), and by the GIST Research Institute (GRI) in 2017. Publication charge for this work was funded by the Bio-Synergy Research Project (NRF-2014M3A9C4066449).
Availability of data and materials
The Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD) was developed by NCTR scientists and is available from the U.S. Food and Drug Administration website (http://www.fda.gov/ScienceResearch/BioinformaticsTools/LiverToxicityKnowledgeBase/). The additional negative dataset from DrugBank is also available online (https://www.drugbank.ca/).
EK and HN conceived of the study. EK wrote the manuscript. HN helped draft the manuscript and participated in the editing of the manuscript. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 7, 2017: Proceedings of the Tenth International Workshop on Data and Text Mining in Biomedical Informatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-7.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Lee WM. Drug-induced hepatotoxicity. N Engl J Med. 2003;349(5):474–85.
- Kassahun K, Pearson PG, Tang W, McIntosh I, Leung K, Elmore C, Dean D, Wang R, Doss G, Baillie TA. Studies on the metabolism of troglitazone to reactive intermediates in vitro and in vivo. Evidence for novel biotransformation pathways involving quinone methide formation and thiazolidinedione ring scission. Chem Res Toxicol. 2001;14(1):62–70.
- Park BK, Kitteringham NR, Maggs JL, Pirmohamed M, Williams DP. The role of metabolic activation in drug-induced hepatotoxicity. Annu Rev Pharmacol Toxicol. 2005;45:177–202.
- Walgren JL, Mitchell MD, Thompson DC. Role of metabolism in drug-induced idiosyncratic hepatotoxicity. Crit Rev Toxicol. 2005;35(4):325–61.
- McKenzie R, Fried MW, Sallie R, Conjeevaram H, Di Bisceglie AM, Park Y, Savarese B, Kleiner D, Tsokos M, Luciano C, et al. Hepatic failure and lactic acidosis due to fialuridine (FIAU), an investigational nucleoside analogue for chronic hepatitis B. N Engl J Med. 1995;333(17):1099–105.
- Olson H, Betton G, Robinson D, Thomas K, Monro A, Kolaja G, Lilly P, Sanders J, Sipes G, Bracken W, et al. Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul Toxicol Pharmacol. 2000;32(1):56–67.
- Grant LM, Rockey DC. Drug-induced liver injury. Curr Opin Gastroenterol. 2012;28(3):198–202.
- Zhou Y, Qin S, Wang K. Biomarkers of drug-induced liver injury. Curr Biomark Find. 2013;3:1–9.
- Gibb S. Toxicity testing in the 21st century: a vision and a strategy. Reprod Toxicol. 2008;25(1):136–8.
- Jennen D, Polman J, Bessem M, Coonen M, van Delft J, Kleinjans J. Drug-induced liver injury classification model based on in vitro human transcriptomics and in vivo rat clinical chemistry data. Systems Biomed. 2014(ahead-of-print):e29400.
- Mishra M, Fei H, Huan J. Computational prediction of toxicity. Int J Data Min Bioinform. 2013;8(3):338–48.
- Meenakshi Mishra BP, Jun Huan. Bayesian Classifiers for Chemical Toxicity Prediction. In: Bioinformatics and Biomedicine (BIBM), IEEE International Conference: 12-15 Nov. 2011; Atlanta, GA, USA. IEEE 2011.
- Greene N, Fisk L, Naven RT, Note RR, Patel ML, Pelletier DJ. Developing structure-activity relationships for the prediction of hepatotoxicity. Chem Res Toxicol. 2010;23(7):1215–22.
- Ekins S, Williams AJ, Xu JJ. A predictive ligand-based Bayesian model for human drug-induced liver injury. Drug Metab Dispos. 2010;38(12):2302–8.
- Rodgers AD, Zhu H, Fourches D, Rusyn I, Tropsha A. Modeling liver-related adverse effects of drugs using k nearest neighbor quantitative structure-activity relationship method. Chem Res Toxicol. 2010;23(4):724–32.
- Huang SH, Tung CW, Fulop F, Li JH. Developing a QSAR model for hepatotoxicity screening of the active compounds in traditional Chinese medicines. Food Chem Toxicol. 2015;78:71–7.
- Zhang C, Cheng F, Li W, Liu G, Lee PW, Tang Y. In silico prediction of drug induced liver toxicity using substructure pattern recognition method. Mol Inf. 2016;35(3-4):136–44.
- Custer LL, Sweder KS. The role of genetic toxicology in drug discovery and optimization. Curr Drug Metab. 2008;9(9):978–85.
- Valerio Jr LG, Cross KP. Characterization and validation of an in silico toxicology model to predict the mutagenic potential of drug impurities. Toxicol Appl Pharmacol. 2012;260(3):209–21.
- Valencia A, Prous J, Mora O, Sadrieh N, Valerio Jr LG. A novel QSAR model of Salmonella mutagenicity and its application in the safety assessment of drug impurities. Toxicol Appl Pharmacol. 2013;273(3):427–34.
- Chen M, Vijay V, Shi Q, Liu Z, Fang H, Tong W. FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discov Today. 2011;16(15-16):697–703.
- Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, et al. DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2014;42(Database issue):D1091–1097.
- Pence HE, Williams A. ChemSpider: an online chemical information resource. J Chem Educ. 2010;87(11):1123–4.
- Williams AJ TV, Golotvin S, Kidd R, McCann G. ChemSpider - building a foundation for the semantic web by hosting a crowd sourced databasing platform for chemistry. J Cheminf. 2010;2 Suppl 1:O16.
- Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–74.
- Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2(3):27.
- Xu JJ, Henstock PV, Dunn MC, Smith AR, Chabot JR, de Graaf D. Cellular imaging predictions of clinical drug-induced liver injury. Toxicol Sci. 2008;105(1):97–105.
- Japanese Traditional Medicine and Therapeutics [https://kampo.ca/]
- Ji ZL, Zhou H, Wang JF, Han LY, Zheng CJ, Chen YZ. Traditional Chinese medicine information database. J Ethnopharmacol. 2006;103(3):501.
- Xue R, Fang Z, Zhang M, Yi Z, Wen C, Shi T. TCMID: Traditional Chinese Medicine integrative database for herb molecular mechanism analysis. Nucleic Acids Res. 2013;41(Database issue):D1089–1095.