TY  - JOUR
AU  - Cao, Xi Hang
AU  - Stojkovic, Ivan
AU  - Obradovic, Zoran
PY  - 2016
DA  - 2016/09/09
TI  - A robust data scaling algorithm to improve classification accuracies in biomedical data
JO  - BMC Bioinformatics
SP  - 359
VL  - 17
IS  - 1
AB  - Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy.
SN  - 1471-2105
UR  - https://doi.org/10.1186/s12859-016-1236-x
DO  - 10.1186/s12859-016-1236-x
ID  - Cao2016
ER  -