
Computational determination of hERG-related cardiotoxicity of drug candidates



Drug candidates often cause an unwanted blockage of the potassium ion channel of the human ether-a-go-go-related gene (hERG). The blockage leads to long QT syndrome (LQTS), which is a severe life-threatening cardiac side effect. Therefore, a virtual screening method to predict drug-induced hERG-related cardiotoxicity could facilitate drug discovery by filtering out toxic drug candidates.


In this study, we generated a reliable hERG-related cardiotoxicity dataset composed of 2130 compounds, all of which were assayed under constant conditions. Based on this dataset, we developed a computational hERG-related cardiotoxicity prediction model. In ten-fold cross-validation, the neural network model achieved an area under the receiver operating characteristic curve (AUC) of 0.764, an accuracy of 90.1%, a Matthews correlation coefficient (MCC) of 0.368, a sensitivity of 0.321, and a specificity of 0.967. The model was further evaluated using ten drug compounds tested on guinea pigs and showed an accuracy of 80.0%, an MCC of 0.655, a sensitivity of 0.600, and a specificity of 1.000, which were better than the performances of existing hERG-toxicity prediction models.


The neural network model can predict hERG-related cardiotoxicity of chemical compounds with a high accuracy. Therefore, the model can be applied to virtual high-throughput screening for drug candidates that do not cause cardiotoxicity. The prediction tool is available as a web-tool at


Many drug candidates are withdrawn owing to unexpected side effects. Therefore, it is a major challenge to screen out potential toxic compounds in the drug discovery process. Cardiac toxicity is one of the side effects and a major cause of drug withdrawals in drug discovery. A representative mechanism of cardiotoxicity involves the binding of compounds to the cardiac potassium channel encoded by the human ether-a-go-go-related gene (hERG), which results in long QT syndrome (LQTS) and eventually leads to fatal ventricular arrhythmias and sudden death [1, 2]. Recently, many drugs, such as terfenadine, cisapride, astemizole, sertindole, thioridazine, and grepafloxacin, were withdrawn from the market owing to undesired cardiotoxicity effects [3]. The development of an accurate prediction model for hERG channel blockers is, therefore, essential in the early stage of drug development.

Experimental high-throughput screening methods have been developed [4], but experimental methods for drug-induced cardiotoxicity are time-consuming and costly. Thus, it is necessary to develop a computational approach to accelerate drug discovery. In recent years, several ligand-based in silico models have been developed to predict drug-hERG interactions based on the pharmacophore, quantitative structure-activity relationship (QSAR), and classification models [5,6,7,8].

The first pharmacophore model was developed by Ekins et al. using 15 compounds, based on steric and electronic features associated with hERG binding affinity [9]. Because conventional pharmacophore models were generally developed using small training datasets of fewer than 500 compounds [10, 11], their applicability was highly limited. Thus, ensemble models integrating diverse pharmacophore methods have also been developed for a better prediction of the hERG binding affinity [5, 12].

Three-dimensional (3D)-QSAR models based on 3D structure information, such as molecular interaction fields, have been developed to relate the 3D structure information to hERG binding affinity by regression analysis. Two representative methods used for 3D-QSAR modelling were the comparative molecular field analysis (CoMFA) [13] and grid-independent descriptors (GRIND) [14]. Both 3D-QSAR models exhibited a high performance in predicting the binding affinity for most non-lipophilic compounds [13, 15].

Classification models for toxicity prediction have been developed using a set of physicochemical descriptors. To improve prediction performance, various machine learning algorithms have been employed, including the support vector machine (SVM), naïve Bayes, decision tree, random forest, and k-nearest neighbors (kNN) [16,17,18,19]. These algorithms have advanced prediction model development, but inconsistent experimental data in the training datasets hamper the development of accurate prediction models [20]. Available hERG toxicity datasets were compiled from the literature, in which experiments were conducted under different conditions and toxicity was defined in different ways. To our knowledge, there are no large hERG toxicity datasets obtained from a single study. Recently, Czodrowski developed a hERG toxicity prediction model using a large dataset containing 4415 compounds extracted from the ChEMBL database [20]; however, the model showed a low AUC value because the database, being compiled from the literature, included many inconsistent experimental data.

For this study, we generated a large experimental dataset of hERG assay results for 2130 chemicals, all of which were assayed under the same conditions. Publicly available datasets, like the ChEMBL hERG toxicity database, were generally collected from the literature and may contain many inconsistent data. Such inconsistency may lead to inaccurate computational models. Our dataset was used to train machine learning models (linear regression, ridge regression, logistic regression, naïve Bayes, neural network, and random forest), and the neural network model showed a higher Matthews correlation coefficient (MCC), 0.368, than the other models. In addition, when the neural network model was further evaluated using a test dataset of ten drug compounds obtained from in vivo experiments in this study, the model showed a high accuracy of 80% (MCC of 0.655). Therefore, the developed hERG-toxicity prediction model can be utilized as a virtual screening tool for the identification of the cardiotoxicity of drug candidates in the early stage of drug discovery.

Materials and methods

Binding assay for hERG based on fluorescence polarization

The fluorescence polarization (FP)-based binding assay for hERG was performed according to the protocol of the Predictor™ hERG FP kit (Thermo Fisher Scientific, Inc., Rockford, IL, USA). The membrane fraction containing the hERG channel protein (Predictor™ hERG membrane) and the tracer (Predictor™ hERG tracer red) were diluted in the binding buffer provided by the manufacturer. The binding assay was conducted in a final volume of 20 μL, consisting of 10 μL of membrane, 5 μL of 4 nM tracer, and 5 μL of test compound. The assays were conducted in 384-well black flat-bottom microplates (Corning Life Sciences, Lowell, MA, USA). After incubation for 4 h at room temperature, the FP was determined using a multimode reader (Infinite M1000PRO; Tecan, Männedorf, Switzerland) in the FP detection mode, with excitation and emission filters of 535 and 590 nm, respectively.

In vivo experimental procedures and recordings of electrocardiography

In this study, guinea pigs were used and fasted for 18 h prior to the experimental procedures. The animals were anesthetized with sodium pentobarbital (60 mg/kg, i.p.), followed by artificial respiration using a rodent ventilator (60 strokes/min, 1 ml/100 g BW). The animals were placed on a heat pad with circulating water at a temperature of 37 °C. A catheter was inserted into the jugular vein for drug administration, and electrocardiography (ECG) pin electrodes were positioned for the standard limb lead and chest lead configurations. All the animals were allowed to stabilize for 20 min after being instrumented, prior to drug administration. Once the heart rate of each animal was constant, the lowest concentration of the drug was administered for 1 min through the jugular vein. After 10 min, the test drug at the next concentration was administered according to the cumulative method. The QRS complex and the PR, QT, PRC, and QRc intervals were measured from the ECG recordings, in addition to the heart rate, for the evaluation of the cardiac function. The values were expressed as the mean and standard deviation of each group. The data were analyzed using one-way analysis of variance (ANOVA) followed by Dunnett’s test to verify the significant differences between the groups.

Data preparation

The hERG toxicities of 2130 compounds were measured as IC50 values. Compounds with IC50 < 10 μM were classified as toxic and the other compounds were classified as nontoxic [19]. Consequently, 221 compounds (10.38%) were identified as hERG-toxic, and 1909 compounds (89.62%) were identified as nontoxic. The toxicities of ten drug compounds obtained from in vivo experiments, which were not included in the 2130 compounds, were used for testing our developed model.
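The labelling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function and variable names are hypothetical, and the IC50 values in the example list are made up. Only the 10 μM cutoff and the class counts (221 toxic, 1909 nontoxic) come from the text.

```python
# Cutoff taken from the text: IC50 < 10 uM is labelled toxic.
TOXIC_IC50_CUTOFF_UM = 10.0  # hypothetical constant name


def label_herg_toxicity(ic50_um):
    """Return 1 (toxic) if IC50 is below the cutoff, else 0 (nontoxic)."""
    return 1 if ic50_um < TOXIC_IC50_CUTOFF_UM else 0


def class_percentages(n_toxic, n_nontoxic):
    """Return the (toxic, nontoxic) shares of the dataset in percent."""
    total = n_toxic + n_nontoxic
    return (round(100.0 * n_toxic / total, 2),
            round(100.0 * n_nontoxic / total, 2))


# Made-up IC50 values (uM), for illustration only.
example_ic50 = [0.5, 9.99, 10.0, 35.0]
example_labels = [label_herg_toxicity(v) for v in example_ic50]

# Class balance reported in the text: 221 toxic vs 1909 nontoxic compounds.
toxic_pct, nontoxic_pct = class_percentages(221, 1909)
print(example_labels, toxic_pct, nontoxic_pct)
```

Note that a compound with an IC50 of exactly 10 μM falls in the nontoxic class under the strict inequality stated in the text.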

Descriptor calculation

The compounds from the hERG toxicity assays were expressed in the simplified molecular-input line-entry system (SMILES) format [21], and the SMILES strings were supplied to the DRAGON software (version 7.0.10) to calculate their physicochemical descriptors and fingerprints (2432 nonconstant molecular descriptors) [22]. In addition, extended connectivity fingerprints (ECFPs) were generated [23] with a maximum diameter parameter of 4 and a length parameter of 1024. Thus, in this study, 3456 molecular features (2432 DRAGON descriptors and 1024 ECFP bits) were used for the training of the learning models.

Feature correlation calculation and feature selection

To reduce the number of features in developing the prediction models, 3456 features were ranked in order of their correlation with toxicity. The phi coefficient was calculated for binary features [24], and the point-biserial correlation coefficient was calculated for continuous features [25].

To calculate the point-biserial correlation coefficient, the dataset was divided into toxic and nontoxic molecules. The point-biserial correlation coefficient (rpb) was calculated as follows:

$$ r_{pb} = \frac{M_{toxic} - M_{nontoxic}}{s_n} \sqrt{\frac{n_{toxic} \times n_{nontoxic}}{n^2}}, \quad \mathrm{where}\ s_n = \sqrt{\frac{1}{n} \sum \limits_{i=1}^{n} {\left( X_i - \overline{X} \right)}^2} $$

\( M_{toxic} \) and \( M_{nontoxic} \) denote the mean feature values of the toxic and nontoxic compounds, respectively. \( n_{toxic} \) and \( n_{nontoxic} \) denote the numbers of toxic and nontoxic compounds, respectively, and \( n \) is the total number of molecules. \( s_n \) denotes the standard deviation of the feature. \( X_i \) represents each feature value and \( \overline{X} \) denotes the mean value of all the feature values.

The phi coefficient (\( \phi \)) was calculated as follows:

$$ \phi = \frac{n_{toxic \bullet 1} \times n_{nontoxic \bullet 0} - n_{toxic \bullet 0} \times n_{nontoxic \bullet 1}}{\sqrt{\left( n_{toxic \bullet 1} + n_{toxic \bullet 0} \right) \left( n_{toxic \bullet 1} + n_{nontoxic \bullet 1} \right) \left( n_{nontoxic \bullet 1} + n_{nontoxic \bullet 0} \right) \left( n_{toxic \bullet 0} + n_{nontoxic \bullet 0} \right)}} $$

where \( n_{toxic \bullet 1} \) and \( n_{toxic \bullet 0} \) denote the numbers of toxic compounds for which the feature value is 1 and 0, respectively, and \( n_{nontoxic \bullet 1} \) and \( n_{nontoxic \bullet 0} \) denote the numbers of nontoxic compounds for which the feature value is 1 and 0, respectively.
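The two coefficients can be written directly from these definitions. The following is a minimal pure-Python sketch, not the authors' implementation: the function names are hypothetical, the toy data are made up, and returning 0 when a coefficient is undefined (for example, a constant feature) is an assumption made here for convenience.

```python
import math


def point_biserial(values, labels):
    """r_pb between a continuous feature and binary labels (1 = toxic, 0 = nontoxic)."""
    n = len(values)
    toxic = [x for x, y in zip(values, labels) if y == 1]
    nontoxic = [x for x, y in zip(values, labels) if y == 0]
    mean_all = sum(values) / n
    # s_n: standard deviation over all compounds, dividing by n as in the formula.
    s_n = math.sqrt(sum((x - mean_all) ** 2 for x in values) / n)
    if s_n == 0 or not toxic or not nontoxic:
        return 0.0  # assumption: report 0 when the coefficient is undefined
    m_toxic = sum(toxic) / len(toxic)
    m_nontoxic = sum(nontoxic) / len(nontoxic)
    return (m_toxic - m_nontoxic) / s_n * math.sqrt(len(toxic) * len(nontoxic) / n ** 2)


def phi_coefficient(bits, labels):
    """Phi between a binary feature (0/1) and binary labels (1 = toxic, 0 = nontoxic)."""
    n_t1 = sum(1 for b, y in zip(bits, labels) if y == 1 and b == 1)
    n_t0 = sum(1 for b, y in zip(bits, labels) if y == 1 and b == 0)
    n_n1 = sum(1 for b, y in zip(bits, labels) if y == 0 and b == 1)
    n_n0 = sum(1 for b, y in zip(bits, labels) if y == 0 and b == 0)
    denom = math.sqrt((n_t1 + n_t0) * (n_t1 + n_n1) * (n_n1 + n_n0) * (n_t0 + n_n0))
    if denom == 0:
        return 0.0  # assumption: report 0 when the coefficient is undefined
    return (n_t1 * n_n0 - n_t0 * n_n1) / denom


# Toy data (made up): four compounds, the last two toxic.
toy_labels = [0, 0, 1, 1]
r_pb = point_biserial([1.0, 2.0, 3.0, 4.0], toy_labels)
phi = phi_coefficient([0, 0, 1, 1], toy_labels)
print(round(r_pb, 4), phi)
```

Both quantities are special cases of the Pearson correlation coefficient, so a feature that perfectly separates the two classes yields a phi of 1, as in the toy fingerprint bit above.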


Six machine learning algorithms were used to construct the hERG toxicity prediction models. Linear regression is a simple regression algorithm that models the linear relationship between a dependent variable and multiple explanatory variables [26]. Ridge regression is a linear regression model that adds a ridge regularization term to the optimization of the model [27]. Logistic regression is a regression algorithm that models a logistic relationship and can be used for binary classification [28]. Naïve Bayes is a probabilistic classification model based on Bayes' theorem and the naïve assumption of independence between features [29]. A random forest is an ensemble model that constructs multiple decision trees and combines them to derive a merged result [30]. A neural network is a machine learning model composed of a network of artificial neurons (nodes), which is optimized to recognize patterns in the input data [31]. All six algorithms are implemented in the Orange 3 Python machine learning package, which was used in this study to develop the hERG toxicity prediction models [32].

Performance evaluation

The six models trained with our dataset were evaluated by ten-fold cross-validation. In this process, the optimal number of features was also determined using the area under the receiver operating characteristic curve (AUC). Because the dataset was biased toward nontoxic compounds, we also calculated the MCC, which is a performance measure suited to unbalanced datasets. After the cross-validation and feature number optimization, the best model was determined. This model was further evaluated with ten drug compounds that were not included in the training dataset and were tested in vivo on guinea pigs, to assess the applicability of our model, which was developed using in vitro data, to in vivo toxicity. The performance of our model was compared with that of other hERG prediction tools, Pred-hERG 4.1 [6] and OCHEM Predictor [33].
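The text does not write out the MCC. For reference, the standard definition in terms of confusion matrix counts is given below; the symbols TP, TN, FP, and FN (true positives, true negatives, false positives, and false negatives, with toxic as the positive class) are notation introduced here, not taken from the original.

$$ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{\left( TP + FP \right) \left( TP + FN \right) \left( TN + FP \right) \left( TN + FN \right)}} $$

This has the same form as the phi coefficient used for feature selection, applied to predicted versus observed toxicity labels. It ranges from −1 to 1, and a classifier that ignores its input scores close to 0 regardless of the class balance.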

Results and discussion

Model construction

Correlation coefficients between the features and toxicity were calculated, and the top-ranked features were used to train the models. The top 20 features are listed in Table 1. Computational hERG prediction models were trained using six machine learning algorithms (linear regression, ridge regression, logistic regression, artificial neural network, naïve Bayes, and random forest), each with varying numbers of top-ranked features. Their ten-fold cross-validation results and respective optimal feature numbers are shown in Fig. 1 and Table 2. Of the six models, those based on the neural network (AUC = 0.764, 1400 features), ridge regression (AUC = 0.774, 400 features), and logistic regression (AUC = 0.764, 350 features) performed better than the other models. Because the performances of the three models were comparable, they were further optimized to determine the best model.

Table 1 Top 20 features with a high correlation
Fig. 1 AUC with respect to feature number: AUC values of the six models were measured by ten-fold cross-validation with respect to feature number

Table 2 Performance (AUC) results of six machine learning methods

Model optimization

To select the best model, we optimized the threshold values of the three selected models, which discriminated toxic and nontoxic groups. The best threshold values that showed the highest MCC are listed in Table 3. MCC is an accuracy measure for highly unbalanced datasets. Of the three models, the neural network model showed the best performance, with an accuracy of 90.1%, an MCC of 0.368, and a positive predictive value (PPV) of 0.542 after threshold optimization. The low sensitivity and high specificity of the neural network model were due to its high threshold value, but the high threshold improved its performance expressed as MCC. Consequently, the toxicity prediction model based on the neural network was selected for further evaluation.
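Threshold optimization of this kind can be sketched as a simple scan over candidate cutoffs. The example below is an illustration under stated assumptions, not the authors' procedure: the function names, the candidate grid, the "score >= threshold means toxic" convention, and the scores and labels are all hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scores, labels, candidates):
    """Return (threshold, MCC) for the candidate cutoff with the highest MCC.

    A compound is predicted toxic when its score is >= the threshold.
    Ties are resolved in favour of the first (lowest) candidate.
    """
    best_t, best_m = None, -2.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        m = mcc(tp, tn, fp, fn)
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m


# Made-up model outputs and labels (1 = toxic), for illustration only.
toy_scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

threshold, threshold_mcc = best_threshold(toy_scores, toy_labels, grid)
print(threshold, round(threshold_mcc, 4))
```

Raising the threshold trades sensitivity for specificity, which is the behaviour the text describes for the neural network model. In practice the threshold should be chosen on held-out folds rather than on the data used to report the final performance.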

Table 3 Performance results of the top three models with optimized thresholds

Test of the constructed model on in vivo data

The optimized model was further tested on ten known drug molecules whose cardiotoxicities were measured in vivo using guinea pigs. In vitro experiments are simpler and less expensive than in vivo experiments and can therefore be carried out at a larger scale. However, owing to the complex physiology of in vivo systems, in vitro experimental results are often inconsistent with in vivo results. Thus, we further evaluated the applicability of our model, which was trained using in vitro data, to in vivo toxicity. The prediction results of the test compounds are shown in Tables 4 and 5. Our model showed an overall accuracy of 80.0%, an MCC of 0.655, a sensitivity of 0.600, a specificity of 1.000, and a PPV of 1.000. This high performance indicates that our model could also be utilized to predict in vivo cardiotoxicity.
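As a consistency check, the reported in vivo figures can be reproduced from a single confusion matrix. The counts used below (3 true positives, 2 false negatives, 5 true negatives, 0 false positives) are inferred here from the reported accuracy, sensitivity, and specificity on ten compounds; they are not stated explicitly in the text, and the function name is hypothetical.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / denom,
    }


# Inferred counts: five toxic and five nontoxic drugs, with two toxic drugs missed.
in_vivo = classification_metrics(tp=3, fn=2, tn=5, fp=0)
for name, value in in_vivo.items():
    print(f"{name}: {value:.3f}")
```

With these counts the MCC evaluates to 15/√525 ≈ 0.655, matching the reported value. With only ten compounds, a single changed prediction shifts the accuracy by ten percentage points, so these figures carry wide uncertainty.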

Table 4 Prediction results of ten drug compounds
Table 5 Performance comparison on the in vivo test dataset

Several computational methods have been reported for the prediction of hERG toxicity, including Pred-hERG and OCHEM Predictor, and we compared the performance of our model with these methods. Pred-hERG is a web-tool based on statistical QSAR models of hERG channel blockers. OCHEM is also a web-tool, based on eight associative neural network models. The prediction results of the ten test drug compounds using the previous methods and their overall performances are listed in Tables 4 and 5, respectively. Pred-hERG has two models: binary and multiclass. The binary model decides whether a query compound is a hERG blocker or nonblocker. The multiclass model assigns a query compound to one of three groups: nonblockers, weak/moderate blockers, or strong blockers. In this study, we considered weak/moderate and strong blockers as hERG-toxic. The Pred-hERG binary model predicted eight out of ten compounds as toxic, with an accuracy of 30%, whereas the Pred-hERG multiclass model predicted nine out of ten compounds as nontoxic, with an accuracy of 60%. Their MCC values were −0.500 and 0.333, respectively. Similar to the Pred-hERG multiclass model, the OCHEM Predictor predicted nine out of ten compounds as nontoxic; its accuracy and MCC were 60% and 0.333, respectively. The three previous models made biased predictions, resulting in a very low sensitivity or a very low specificity (Table 5). Our model correctly predicted eight out of ten compounds, with an accuracy of 80% and an MCC of 0.655, which indicates that our model outperforms the other methods on this test set and would be useful for the prediction of the in vivo cardiotoxicity of drug candidates. It can also be used for virtual screening in drug discovery.

Additional comparison with previous models

Because in vivo cardiotoxicity assays require animal experiments, it is difficult to obtain a large number of in vivo data. A performance comparison based on only ten compounds is not conclusive, so we also evaluated the performances of the previous methods using our dataset of 2130 compounds obtained from in vitro experiments. For a fair comparison, we divided the dataset into training (90%) and test (10%) datasets; the training dataset was used to build our model, and the remaining test dataset was used to evaluate the performances of our model, Pred-hERG, and OCHEM Predictor. The evaluation was iterated ten times, and the averages were calculated (Table 6). The MCC values of the previous models were lower than that of our model. Specifically, the Pred-hERG binary model showed an MCC of −0.034, a sensitivity of 0.912, and a specificity of 0.061, indicating that this model classified most query molecules as toxic and produced many false positives. This high number of false positives for the Pred-hERG binary model was also observed on the in vivo test dataset (Tables 4 and 5). In contrast, the Pred-hERG multiclass model and OCHEM Predictor showed a low sensitivity and a high specificity, indicating that they classified most query molecules as nontoxic. Because the dataset was highly unbalanced toward negative (nontoxic) data, the bias of the Pred-hERG multiclass model and OCHEM Predictor toward the nontoxic class kept their accuracies high (90.2% and 88.5%, respectively) while their MCCs remained low (0.218 and 0.133, respectively). Consequently, our model consistently showed a better performance on both the small in vivo test dataset and the in vitro dataset.
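The gap between accuracy and MCC on this dataset can be made concrete with a majority-class baseline. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each 10% test split has roughly the same class ratio as the full dataset (221 toxic, 1909 nontoxic), which the text does not state.

```python
def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


# Class counts from the text; the baseline labels every compound nontoxic.
baseline = majority_baseline_accuracy(221, 1909)

# Accuracies reported in the text for the two nontoxic-biased tools.
reported_accuracy = {"Pred-hERG multiclass": 0.902, "OCHEM Predictor": 0.885}

print(f"always-nontoxic baseline: {baseline:.4f}")
for tool, accuracy in reported_accuracy.items():
    print(f"{tool}: accuracy {accuracy:.3f}, margin over baseline {accuracy - baseline:+.3f}")
```

Under that assumption, a classifier that never flags a compound would already reach about 89.6% accuracy, so accuracies of 90.2% and 88.5% say little about the ability to detect hERG blockers. This is why the comparison in the text rests on the MCC.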

Table 6 Performance comparison on the in vitro dataset


In this study, we aimed to produce a reliable hERG toxicity dataset and then to develop a better performing cardiotoxicity prediction model. Computational models are highly dependent on the reliability of datasets; however, datasets collected from the literature may include inconsistent experimental results. We generated our own consistent dataset to build a model, and the prediction model developed using this dataset outperformed the other hERG prediction tools. Our model can be useful for the virtual screening of potential drug candidates that do not cause cardiotoxicity and would facilitate the advancement of in silico drug discovery. However, this study did not introduce new features or new machine learning methods. There is therefore scope to improve the model further by including features specialized for describing the cardiotoxicity of molecules or by using machine learning algorithms that classify molecules more efficiently and effectively with these features.



Abbreviations

AUC: Area under ROC curve
ECFP: Extended connectivity fingerprint
hERG: Human ether-a-go-go-related gene
LQTS: Long QT syndrome
MCC: Matthews correlation coefficient
PPV: Positive predictive value
SMILES: Simplified molecular-input line-entry system




  1. Tristani-Firouzi M, Chen J, Mitcheson JS, Sanguinetti MC. Molecular biology of K(+) channels and their role in cardiac arrhythmias. Am J Med. 2001;110(1):50–9.

  2. Sanguinetti MC, Tristani-Firouzi M. hERG potassium channels and cardiac arrhythmia. Nature. 2006;440(7083):463–9.

  3. Laverty H, Benson C, Cartwright E, Cross M, Garland C, Hammond T, et al. How can we improve our understanding of cardiovascular safety liabilities to develop safer medicines? Br J Pharmacol. 2011;163(4):675–93.

  4. Polak S, Wisniowska B, Brandys J. Collation, assessment and analysis of literature in vitro data on hERG receptor blocking potency for subsequent modeling of drugs' cardiotoxic properties. J Appl Toxicol. 2009;29(3):183–206.

  5. Kratz JM, Schuster D, Edtbauer M, Saxena P, Mair CE, Kirchebner J, et al. Experimentally validated hERG pharmacophore models as cardiotoxicity prediction tools. J Chem Inf Model. 2014;54(10):2887–901.

  6. Braga RC, Alves VM, Silva MF, Muratov E, Fourches D, Liao LM, et al. Pred-hERG: a novel web-accessible computational tool for predicting cardiac toxicity. Mol Inform. 2015;34(10):698–701.

  7. Chemi G, Gemma S, Campiani G, Brogi S, Butini S, Brindisi M. Computational tool for fast in silico evaluation of hERG K(+) channel affinity. Front Chem. 2017;5:7.

  8. Munawar S, Windley MJ, Tse EG, Todd MH, Hill AP, Vandenberg JI, et al. Experimentally validated pharmacoinformatics approach to predict hERG inhibition potential of new chemical entities. Front Pharmacol. 2018;9:1035.

  9. Ekins S, Crumb WJ, Sarazan RD, Wikel JH, Wrighton SA. Three-dimensional quantitative structure-activity relationship for inhibition of human ether-a-go-go-related gene potassium channel. J Pharmacol Exp Ther. 2002;301(2):427–34.

  10. Aronov AM. Common pharmacophores for uncharged human ether-a-go-go-related gene (hERG) blockers. J Med Chem. 2006;49(23):6917–21.

  11. Jing Y, Easter A, Peters D, Kim N, Enyedy IJ. In silico prediction of hERG inhibition. Future Med Chem. 2015;7(5):571–86.

  12. Tan Y, Chen Y, You Q, Sun H, Li M. Predicting the potency of hERG K(+) channel inhibition by combining 3D-QSAR pharmacophore and 2D-QSAR models. J Mol Model. 2012;18(3):1023–36.

  13. Cavalli A, Poluzzi E, De Ponti F, Recanatini M. Toward a pharmacophore for drugs inducing the long QT syndrome: insights from a CoMFA study of HERG K(+) channel blockers. J Med Chem. 2002;45(18):3844–53.

  14. Carosati E, Lemoine H, Spogli R, Grittner D, Mannhold R, Tabarrini O, et al. Binding studies and GRIND/ALMOND-based 3D QSAR analysis of benzothiazine type K(ATP)-channel openers. Bioorg Med Chem. 2005;13(19):5581–91.

  15. Ermondi G, Visentin S, Caron G. GRIND-based 3D-QSAR and CoMFA to investigate topics dominated by hydrophobic interactions: the case of hERG K+ channel blockers. Eur J Med Chem. 2009;44(5):1926–32.

  16. Jia L, Sun H. Support vector machines classification of hERG liabilities based on atom types. Bioorg Med Chem. 2008;16(11):6252–60.

  17. Wang S, Li Y, Wang J, Chen L, Zhang L, Yu H, et al. ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage. Mol Pharm. 2012;9(4):996–1010.

  18. Le Guennec JY, Thireau J, Ouille A, Roussel J, Roy J, Richard S, et al. Inter-individual variability and modeling of electrical activity: a possible new approach to explore cardiac safety? Sci Rep. 2016;6:37948.

  19. Thai KM, Ecker GF. A binary QSAR model for classification of hERG potassium channel blockers. Bioorg Med Chem. 2008;16(7):4107–19.

  20. Czodrowski P. hERG me out. J Chem Inf Model. 2013;53(9):2240–51.

  21. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comp Sci. 1988;28(1):31–6.

  22. Mauri A, Consonni V, Pavan M, Todeschini R. Dragon software: an easy approach to molecular descriptor calculations. Match-Commun Math Co. 2006;56(2):237–48.

  23. Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50(5):742–54.

  24. Cox DR, Wermuth N. A comment on the coefficient of determination for binary responses. Am Stat. 1992;46(1):1–4.

  25. Tate RF. Correlation between a discrete and a continuous variable. Point-biserial correlation. Ann Math Stat. 1954;25(3):603–7.

  26. Kutner MH. Applied linear statistical models. 5th ed. Boston: McGraw-Hill Irwin; 2005. xxviii, p. 1396.

  27. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.

  28. Carey V, Zeger SL, Diggle P. Modelling multivariate binary data with alternating logistic regressions. Biometrika. 1993;80(3):517–26.

  29. Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC, Showe MK. Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier. Bioinformatics. 2006;22(11):1325–34.

  30. Boulesteix AL, Janitza S, Kruppa J, Konig IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wires Data Min Knowl. 2012;2(6):493–507.

  31. Wang YH, Li Y, Yang SL, Yang L. An in silico approach for screening flavonoids as P-glycoprotein inhibitors based on a Bayesian-regularized neural network. J Comput Aided Mol Des. 2005;19(3):137–47.

  32. Demsar J, Curk T, Erjavec A, Gorup C, Hocevar T, Milutinovic M, et al. Orange: data mining toolbox in python. J Mach Learn Res. 2013;14:2349–53.

  33. Li X, Zhang Y, Li H, Zhao Y. Modeling of the hERG K+ channel blockage using online chemical database and modeling environment (OCHEM). Mol Inform. 2017;36(12).


Not applicable.


This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2018R1A5A1025077). This work was also supported by the Bio-Synergy Research Project (NRF-2018M3A9C4076474) of the Ministry of Science, ICT, and Future Planning through the National Research Foundation. Publication costs are funded by the grant (NRF-2018R1A5A1025077).

Availability of data and materials

The datasets supporting the conclusions of this article are available from the corresponding author upon request.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 10, 2019: Proceedings of the 12th International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO 2018). The full contents of the supplement are available online at

Author information




HL and MY developed the prediction models and conducted the evaluations. SK, SO, HC, and KR prepared the training and test datasets and their features. MB, BL, DS, and KO conducted the hERG-related toxicity assays. DL and DN supervised the study. All the authors have read and approved the manuscript.

Corresponding authors

Correspondence to Donghyun Lee or Dokyun Na.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there are no conflicts of interest.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Lee, HM., Yu, MS., Kazmi, S.R. et al. Computational determination of hERG-related cardiotoxicity of drug candidates. BMC Bioinformatics 20, 250 (2019).



  • In silico model
  • Machine learning
  • hERG-related cardiotoxicity
  • Drug discovery