
Prediction of vancomycin initial dosage using artificial intelligence models applying ensemble strategy



Antibiotic resistance has become a global concern. Vancomycin is regarded as a last-line antibiotic, but its therapeutic index is narrow. Therefore, clinical dosing decisions must be made with the utmost care; a decision is considered “suitable” only when both “efficacy” and “safety” are taken into account. This study presents a model, the “ensemble strategy model,” to predict the suitability of vancomycin regimens. The experimental data consisted of 2141 patient records, each describing a vancomycin regimen labeled “suitable” or “unsuitable” together with six diagnostic input attributes (sex, age, weight, serum creatinine, dosing interval, and total daily dose). The dataset was normalized and partitioned into a training dataset, a validation dataset, and a test dataset. AdaBoost.M1, Bagging, fastAdaboost, Neyman–Pearson, and Stacking were used for model training. The “ensemble strategy concept” was then applied: the final decision was reached by voting among these models, yielding a model for predicting the suitability of vancomycin treatment regimens.


The results of the tenfold cross-validation showed that the average accuracy of the proposed “ensemble strategy model” was 86.51% with a standard deviation of 0.006, indicating that the model was robust. In addition, the experimental results on the test dataset revealed that the accuracy, sensitivity, and specificity of the proposed method were 87.54%, 89.25%, and 85.19%, respectively. The accuracy of the five algorithms ranged from 81 to 86%, the sensitivity from 81 to 92%, and the specificity from 77 to 88%. Thus, the experimental results suggest that the model proposed in this study has high accuracy, high sensitivity, and high specificity.


The “ensemble strategy model” can be used as a reference for the determination of vancomycin doses in clinical treatment.


Antimicrobial resistance (AMR) has become a major global concern. The World Economic Forum has stated that “arguably the greatest risk… to human health comes in the form of antibiotic-resistant bacteria” [1]. In 2014, the World Health Organization conducted a global study on drug resistance with data from 114 countries, which confirmed that drug resistance is a global crisis [2]. According to recent statistics, drug-resistant infectious diseases are projected to kill more people than cancer by 2050 [3]. In particular, groups with a high risk of infection, such as elderly patients and those with cancer, require high doses of antibiotics and prolonged treatment, which can lead to antibiotic resistance.

Antibiotics are being developed at a much slower rate than drug-resistant bacteria are emerging. The number of antimicrobial agents approved for marketing by the U.S. Food and Drug Administration declined from 1983 to 2011. No new antimicrobial agents have been introduced in the last 20 years. Therefore, antimicrobial agents with new mechanisms are unlikely to become available for clinical use for a long period. Consequently, it is essential to adopt appropriate dose control of antimicrobial agents for medical and animal use.

Vancomycin is currently classified as a third-line antibiotic by the Ministry of Health and Welfare in Taiwan; the higher the level, the greater the risk. Vancomycin is often used to treat severe infections in which all other antibiotics are ineffective and is therefore known as the “last line of drugs.” Its use is strictly limited, so it is a serious problem if patients develop resistance to the drug. Moreover, owing to the narrow therapeutic index of vancomycin, there is a risk of toxicity, and patients may develop adverse effects such as nephrotoxicity and ototoxicity [4], allergic reactions, renal impairment, or even cardiac arrest [5].

Many factors that vary between individuals affect the administration of vancomycin, such as the patient's renal function, physical condition, and hypoproteinemia [6]. Most patients receiving antibiotics already have serious infections, and drug-induced renal damage or an incorrect dosing decision may promote drug resistance. The daily dose and dosing interval of vancomycin, together with therapeutic drug monitoring (TDM) combined with individualized clinical dosing, are therefore extremely important.

Trough concentrations are commonly used as the monitoring indicator for vancomycin. To alleviate the risk of nephrotoxicity, it is best to maintain the trough concentration of vancomycin between 10 and 20 mg/L [7]. In the past, clinicians relied on nomograms [8] or pharmacokinetics for dose adjustment in vancomycin therapy [9], but these methods may not be effective. Therefore, in 2007, Hu et al. employed the decision tree induction algorithm C4.5 and a back-propagation network to construct a decision support system for trough and peak concentrations [4].

However, it is more important for clinicians to predict the suitability of the dose instructions given in a treatment regimen than to predict the blood drug concentrations (the trough and peak concentrations). Therefore, in recent years, studies have leaned toward predicting the suitability of vancomycin treatment regimens, and several of these investigations have used population pharmacokinetic software. For example, Xu et al. [10] used dose calculation to predict the trough concentrations for obtaining stability, while Nunn et al. [11] verified the accuracy of the software. In addition, Leu et al. [12] used a conventional dosing nomogram to control the trough concentration of vancomycin. Finally, Xu et al. [10] studied past data using subgroups with and without pharmacist intervention to ascertain whether differences were present.

Research on ensemble learning has been gaining attention since 1990, when Schapire [13] showed that weak learners can be combined into a strong learner. The concept is to construct a model by combining multiple learning algorithms. Such a model usually has greater predictive power than the individual algorithms. The approach is called ensemble learning because it is mostly a combination of basic learning algorithms. However, few studies have used this strategy to predict the suitability of vancomycin decision regimens. Only Hu et al. [4] used decision trees and bagging to build a model; however, the accuracy rate was only about 60%. Ho et al. [14] proposed the use of genetic algorithms and improved the Taguchi algorithm to predict the suitability of the vancomycin decision, which had an accuracy rate of 87.5%. In this study, five ensemble learning algorithms were trained, and their predictions were combined by voting under the ensemble strategy.

The patient data of this study contained six input variables and one output variable. Models were trained on these data using R-STUDIO version 3.6.0, which was used to implement five algorithms: AdaBoost.M1 [15], Bagging [16], fastAdaboost, Neyman–Pearson [17], and Stacking [18, 19]. The results of these five algorithms were combined using the majority rule to establish an ensemble strategy for predicting the suitability of the initial dosing decision for vancomycin. The results indicated that this approach outperformed the previous studies in terms of the measurement indicators. In addition, by shortening the duration of treatment and reducing unnecessary risks, it can help clinicians evaluate the initial dose more carefully to enhance the safety and efficacy of drug administration.

Results and discussion

In this study, an ensemble strategy was proposed to predict the suitability of the vancomycin dosing regimen. The experimental results containing three measurement indicators—accuracy, sensitivity, and specificity—are shown in Table 1.

Table 1 Experimental results of the three measurement indicators

The experimental results demonstrated that the accuracy of the testing set models of the five algorithms ranged from 81.93 to 87.23%. Furthermore, the accuracy of the rebuilt model after filtering by voting increased to 87.54%. The sensitivity of the ensemble strategy model was second only to that of the AdaBoost.M1 algorithm, and its specificity was second only to that of the Stacking algorithm. The confusion matrix in Table 2 further suggested that the false-positive rate was only 12%. As this study was aimed at helping clinicians predict the suitability of vancomycin dosing decisions, a low false-positive rate is desirable.

Table 2 Confusion matrix of the ensemble strategy model

The results of the measurement indicators signified that the ensemble strategy proposed in this study had a good performance. The ROC curve (Fig. 1) and the AUC values (Table 3) are as follows:

Fig. 1 Receiver operating characteristic (ROC) curve: the ROC curve of our proposed ensemble strategy converges most smoothly

Table 3 Area under the ROC curve (AUC) values

The data in Table 3 show that the ensemble strategy has the highest AUC value, which is close to 1.0, indicating that the model discriminates well between suitable and unsuitable regimens.

Antibiotics are being developed slowly, while high doses of antibiotics and prolonged treatment often lead to antibiotic resistance, which in turn kills more and more people. Therefore, it is essential to adopt appropriate dose control of antimicrobial agents for medical use. Vancomycin is known as the "last line of drugs" in Taiwan, and its daily dose and dosing interval are extremely important. Some studies have tried to predict the suitability of the dose instructions given in a treatment regimen, but most of them, except that of Ho et al. [14], achieved low accuracy. In this study, the ensemble strategy was proposed to predict the suitability of the vancomycin decision by voting among five ensemble learning algorithms. The experimental results show that our proposed strategy has the highest accuracy and is robust, while also offering high sensitivity and high specificity, a low false-positive rate, and the highest AUC value.


The model developed in this study is expected to assist physicians in initial dosing decisions and serve as a reference tool in clinical decision-making. As vancomycin is often administered to patients after the initial dose and then adjusted based on the trough concentration of the drug in the subsequent blood test report, the ensemble strategy is expected to enhance the safety and effectiveness of the dosing decisions. Therefore, if the initial dosing decision is optimized, the treatment duration can be shortened and the risk of drug resistance and toxicity can be reduced.

The experimental results have shown that the model proposed in this study has the highest accuracy together with excellent sensitivity and specificity and a low false-positive rate, which suggests that the ensemble strategy approach recommended in this study is worth adopting.


Implementing five algorithms

In this study, the “adabag” package of R-STUDIO version 3.6.0 was used to implement the AdaBoost.M1 and Bagging algorithms, the “fastAdaboost” package to implement the fastAdaboost algorithm, the “nproc” package to implement the Neyman–Pearson algorithm, and the “SuperLearner” package to implement the Stacking algorithm. Bagging used a classification tree as a single classifier; the number of iterations (mfinal) of fastAdaboost was set to 50, which performed best; Neyman–Pearson used Random Forest as the basic classification method, and the acceptable statistical Type I error was set to 0.05; Stacking also used Random Forest as the basic classification method, and the family parameter was set to Binomial.

The “ensemble strategy concept” was used to predict the suitability of the vancomycin decision by (majority rule) voting among the five above-mentioned ensemble learning algorithms: AdaBoost.M1, Bagging, fastAdaboost, Neyman–Pearson, and Stacking. Figure 2 depicts the architecture of the “ensemble strategy concept.” The input features are gender, age, weight, serum creatinine (SCR), dosing interval, and total daily dose of medication; the single output is a regimen category label indicating whether the vancomycin dosing is suitable.

Fig. 2 The architecture of the “ensemble strategy concept”: 6 features were input to 5 machine learning algorithms, then the output regimen category label was predicted by the voting scheme
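The voting step can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function name `majority_vote`, the 0/1 label coding, and the example predictions are assumptions made for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most base classifiers for one regimen."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label

# Hypothetical outputs of the five base models for three regimens
# (1 = suitable, 0 = unsuitable); the values are illustrative, not study data.
base_predictions = {
    "AdaBoost.M1":    [1, 0, 1],
    "Bagging":        [1, 0, 0],
    "fastAdaboost":   [1, 1, 0],
    "Neyman-Pearson": [0, 0, 1],
    "Stacking":       [1, 0, 1],
}

# zip(*...) regroups the per-model lists into one tuple of five votes per regimen.
ensemble = [majority_vote(votes) for votes in zip(*base_predictions.values())]
print(ensemble)  # [1, 0, 1]
```

With five voters and two classes, a tie cannot occur, so the majority label is always well defined.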

Measurement indicators

To understand the generalization ability of the model, three performance measurement indicators were used, namely, accuracy, sensitivity (also known as the true positive rate or recall), and specificity, which were calculated using Eqs. (1)–(3). In addition, the receiver operating characteristic (ROC) curve was plotted, and the area under the ROC curve (AUC) values were calculated to assess the model performance.

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FP}} + {\text{FN}} + {\text{TN}}}},$$
$${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}},$$
$${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{FP}} + {\text{TN}}}},$$

where TP stands for true positive, which is predicted to be suitable and actually suitable; TN stands for true negative, which is predicted to be unsuitable and actually unsuitable; FP stands for false positive, which is predicted to be suitable but actually unsuitable, and is also a statistical Type I error; FN stands for false negative, which is predicted to be unsuitable but actually suitable, and is also a statistical Type II error. Thus, accuracy is the proportion of all decisions that were classified correctly, whether as suitable or as unsuitable; sensitivity is the percentage of decisions that were successfully predicted to be suitable in cases where they were actually suitable; and specificity is the percentage of decisions that were successfully predicted to be unsuitable in cases where they were actually unsuitable.
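As a worked illustration of the three indicators, the following Python sketch computes them from the four cells of a confusion matrix. The function name and the counts are hypothetical and are not taken from Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,      # correct decisions over all decisions
        "sensitivity": tp / (tp + fn),      # suitable regimens correctly identified
        "specificity": tn / (fp + tn),      # unsuitable regimens correctly identified
    }

# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, fp=15, fn=10, tn=85)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts, accuracy is 175/200, sensitivity is 90/100, and specificity is 85/100.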

The ROC curve is an analytical tool with coordinate plots and is the simplest and most intuitive observation method to analyze clinical accuracy. The plot can be used to make direct judgments from the curves [20]. The vertical coordinates denote the true positive rate (sensitivity) and the horizontal coordinates denote the false-positive rate (1-specificity), reflecting the relationship between the specificity and sensitivity of an analytical method. The diagonal line is the reference line. If the ROC curve, as the testing tool, is located exactly on the diagonal reference line, it means that the testing tool is not discriminative in terms of the prediction. If the ROC curve moves to the upper left, the tool is more sensitive to prediction and the false-positive rate is lower, i.e., the tool has better discriminative power. The point closest to the upper left corner (0, 1) is the cutoff point with the least misclassification, where the sensitivity is the largest, and the false-positive rate (1-specificity) is the smallest [21, 22].

The AUC is the area under the ROC curve, and the value usually ranges from 0.5 to 1. The higher the AUC value, the better: the closer it is to 1.0, the better the model discriminates between the two classes [23].
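The construction of the ROC curve and the AUC can be sketched in Python as follows. This is a minimal illustration that assumes each model outputs a score for the "suitable" class; the helper names `roc_points` and `auc`, the scores, and the labels are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they yield a single ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for six regimens (1 = suitable, 0 = unsuitable).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(round(auc(curve), 4))  # 0.8889
```

The result equals 8/9 because 8 of the 9 possible (suitable, unsuitable) pairs are ranked in the correct order, which is the usual probabilistic reading of the AUC.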


The data collection period was from 2011 to 2018; 2141 records were collected, including factors related to antibiotic dose decisions (variables), as shown in Table 4. The selection of these variables was the same as in the studies performed by Hu et al. [4] and Ho et al. [14]. The database was reviewed and approved by the Human Investigation Committee of the Kaohsiung Medical University Hospital (KMHIRB-E(I)-20190364). The inclusion criteria were that the patients were hospitalized at a hospital of the Kaohsiung Medical University system and that the hospitalization report matched one of the following vancomycin health insurance drug codes: B018156277, AC41443277, A041443277, AC37290277, AC575743277, AC37290277, AC37290277, AC37290277, AC57430277, AC5743277, AC37290277, AC57286277, BC17742277, and BB17742277.

Table 4 Definition of operating variables

The recommended doses of antimicrobial agents have been mostly studied in younger age groups. However, the recommended doses for the elderly, newborns, and children should be adjusted [7], especially since the vancomycin dose for children is far different from that for adults. The latest version of the revised consensus on methicillin-resistant Staphylococcus aureus infection also pointed out the differences in doses for children and adults [7]. Thus, the present study excluded infants from the study population.

Since obese patients have more body fat, the dose calculation should be evaluated carefully based on the actual body weight or the corrected body weight, according to the lipophilicity of the drug. In addition, clinicians usually determine the initial dose based on the patient's infection status, body weight, and renal function. For drugs with a narrow therapeutic range, the dose is calculated based on the patient's body weight. Therefore, body weight was one of the important factors examined in this study.

Serum creatinine (SCR) is produced by the breakdown of creatine during normal muscle activity. In people with normal kidney function, SCR is filtered from the blood and excreted in urine, with the kidneys responsible for more than 90% of creatinine metabolism. As a result, SCR can be used as an indicator to monitor kidney function. The normal value of SCR varies with gender. Thus, gender was also included as an input value.

Ultimately, the dependent categorical variable in Table 4, “suitability of vancomycin dosing regimen,” is classified according to the trough value from vancomycin TDM. The trough concentration of the drug in the blood test report should preferably lie between 10 and 20 mg/L [7].
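A labeling rule of this kind might look like the Python sketch below. The source gives only the 10 to 20 mg/L target range; treating both boundaries as inclusive, and the function name `label_regimen`, are assumptions made for illustration.

```python
def label_regimen(trough_mg_per_l, low=10.0, high=20.0):
    """Label a regimen from its measured trough concentration (mg/L).

    Assumption: both boundaries count as suitable; the study does not
    state how values exactly at 10 or 20 mg/L were handled.
    """
    return "suitable" if low <= trough_mg_per_l <= high else "unsuitable"

# Hypothetical trough values, not study data.
for trough in (8.5, 10.0, 15.2, 23.1):
    print(trough, label_regimen(trough))
```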

Descriptive statistics of the datasets

The descriptive statistics about the datasets are shown in Table 5.

Table 5 Descriptive statistics of the dataset

Pre-processing and partitioning of data

As differences in the units of the data may affect the results, this study scaled each variable to the interval 0–1 using the Min–Max scaling formula, which improves the rate of convergence and the model accuracy [24].

$$X_{nom} = \frac{{X - X_{min} }}{{X_{max} - X_{min} }},$$

where \(X_{nom}\) represents the result of data normalization, \(X_{min}\) represents the minimum value in the data, and \(X_{max}\) represents the maximum value.
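The scaling formula can be illustrated with a short Python sketch. The function name and the body-weight values are hypothetical, and the handling of a constant column (returning zeros) is an assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical body weights in kg, not study data.
weights_kg = [50.0, 60.0, 75.0, 100.0]
scaled = min_max_scale(weights_kg)
print(scaled)  # [0.0, 0.2, 0.5, 1.0]
```

In practice the minimum and maximum would be taken from the training data and reused for the validation and test data, so that no information leaks from the held-out records; the source does not state which convention was followed.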

After pre-processing, 85% of the data were randomly selected for model development, of which 80% were used as the training dataset and 20% as the validation dataset; the remaining 15% were used as the test dataset to validate the model. To evaluate the model fitting performance reliably, tenfold cross-validation was used in this study.
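The partitioning scheme can be sketched in Python as follows. The rounding rule, the random seed, and the choice to run the tenfold cross-validation on the training portion are assumptions; the source does not specify them, and the helper names are hypothetical.

```python
import random

def partition(n_records, seed=0):
    """Split record indices into training, validation, and test sets (85% / 15%, then 80% / 20%)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_dev = round(0.85 * n_records)   # records used for model development
    n_train = round(0.80 * n_dev)     # training share of the development records
    dev, test = indices[:n_dev], indices[n_dev:]
    return dev[:n_train], dev[n_train:], test

def k_folds(indices, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]

train, validation, test = partition(2141)
print(len(train), len(validation), len(test))  # 1456 364 321

fold_sizes = [len(held_out) for _fit, held_out in k_folds(train)]
print(fold_sizes)
```

Each record appears in exactly one held-out fold, so every training record is used once for evaluation and nine times for fitting.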

Availability of data and materials

The data that support the findings of this study are available from Kaohsiung Medical University Hospital but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Kaohsiung Medical University Hospital.



Abbreviations

AMR: Antimicrobial resistance

TDM: Therapeutic drug monitoring

ROC: Receiver operating characteristic

AUC: Area under the ROC curve

SCR: Serum creatinine


  1. Howell L, editor. World Economic Forum. Global risks 2013. 8th edition: an initiative of the Risk Response Network; 2013.

  2. WHO. 2021. Antimicrobial resistance global report on surveillance: 2014 summary. Accessed 25 Aug 2021.

  3. BBC. 2021. Superbugs to kill 'more than cancer' by 2050. Accessed 25 Aug 2021.

  4. Hu PJ, Wei CP, Cheng TH, Chen JX. Predicting adequacy of vancomycin regimens: a learning-based classification approach to improving clinical decision making. Decis Support Syst. 2007;43:1226–41.

  5. Arimura Y, Yano T, Hirano M, Sakamoto Y, Egashira N, Oishi R. Mitochondrial superoxide production contributes to vancomycin-induced renal tubular cell apoptosis. Free Radic Biol Med. 2012;5:1865–73.

  6. Revilla N, Martín-Suárez A, Pérez MP, González FM, de Gatta MDF. Vancomycin dosing assessment in intensive care unit patients based on a population pharmacokinetic/pharmacodynamic simulation. Br J Clin Pharmacol. 2010;70:201–12.

  7. Rybak MJ, Le J, Lodise TP, Levine DP, Bradley JS, Liu C, Mueller BA, Pai MP, Wong-Beringer A, Rotschafer JC, Rodvold KA. Therapeutic monitoring of vancomycin for serious methicillin-resistant Staphylococcus aureus infections: a revised consensus guideline and review by the American Society of Health-System Pharmacists, the Infectious Diseases Society of America, the Pediatric Infectious Diseases Society, and the Society of Infectious Diseases Pharmacists. Clin Infec Dis. 2020;71:1361–4.

  8. Moellering RC, Krogstad DJ, Greenblatt DJ. Vancomycin therapy in patients with impaired renal function: a nomogram for dosage. Ann Intern Med. 1981;94:343–6.

  9. Winter ME. Basic clinical pharmacokinetics. Philadelphia: Lippincott Williams and Wilkins; 2003.

  10. Xu G, Chen E, Mao E, Che Z, He J. Research of optimal dosing regimens and therapeutic drug monitoring for vancomycin by clinical pharmacists: analysis of 7-year data. Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2018;30:640–5.

  11. Nunn MO, Corallo CE, Aubron C, Poole S, Dooley MJ, Cheng AC. Vancomycin dosing: assessment of time to therapeutic concentration and predictive accuracy of pharmacokinetic modeling software. Ann Pharmacother. 2011;45:757–63.

  12. Leu WJ, Liu YC, Wang HW, Chien HY, Liu HP, Lin YM. Evaluation of a vancomycin dosing nomogram in achieving high target trough concentrations in Taiwanese patients. Int J Infect Dis. 2012;16:e804–10.

  13. Schapire RE. The strength of weak learnability. Mach Learn. 1990;5:197–227.

  14. Ho WH, Chen JX, Lee IN, Su HC. An ANFIS-based model for predicting adequacy of vancomycin regimen using improved genetic algorithm. Expert Syst Appl. 2011;38:13050–6.

  15. Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: 13th international conference; 1996. pp. 148–156.

  16. Alfaro E, Gamez M, Garcia N. adabag: an R package for classification with boosting and bagging. J Stat Softw. 2013;54:1–35.

  17. Van Trees HL. Detection estimation and modulation theory, part I: detection, estimation, and filtering theory. 2nd ed. Hoboken: Wiley; 2013.

  18. Džeroski S, Ženko B. Is combining classifiers with stacking better than selecting the best one? Mach Learn. 2004;5:255–73.

  19. Zenko B, Todorovski L, Dzeroski S. A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods. In: Proceedings 2001 IEEE international conference on data mining 2001; vol. 29, pp. 669–670. IEEE.

  20. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem. 1993;39:561–77.

  21. Spackman KA. Signal detection theory: valuable tools for evaluating inductive learning. In: Proceedings of the sixth international workshop on machine learning; 1989. pp. 160–163. Morgan Kaufmann.

  22. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.

  23. Brownlee J. 2018. Machine Learning Mastery. Accessed 25 Aug 2021.

  24. Clare L. 2020. KDnuggets. Accessed 25 Aug 2021.


Not applicable.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 22 Supplement 5 2021: Proceedings of the International Conference on Biomedical Engineering Innovation (ICBEI) 2019-2020. The full contents of the supplement are available at


Publication costs are funded by the Ministry of Science and Technology, Taiwan, under grants MOST 110-2221-E-037-005 and MOST 110-2410-H-037-001. The design and part writing costs of the study are funded by NKUST-KMU JOINT RESEARCH PROJECT (#NKUSTKMU-110-KK-002) and the “Intelligent Manufacturing Research Center” (iMRC) from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

Author information


W-HH, T-HH and YJC contributed equally to the algorithm design and theoretical analysis. L-YZ, FFL and Y-CL contributed equally to the quality control and document reviewing. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Fen-Fen Liao or Yeong-Cheng Liou.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Ho, WH., Huang, TH., Chen, Y.J. et al. Prediction of vancomycin initial dosage using artificial intelligence models applying ensemble strategy. BMC Bioinformatics 22 (Suppl 5), 637 (2021).
