An interpretable rule-based diagnostic classification of diabetic nephropathy among type 2 diabetes patients
© Huang et al.; licensee BioMed Central Ltd. 2015
Published: 21 January 2015
The prevalence of type 2 diabetes is increasing at an alarming rate. Various complications are associated with type 2 diabetes, with diabetic nephropathy being the leading cause of renal failure among diabetics. Often, when patients are diagnosed with diabetic nephropathy, their renal functions have already been significantly damaged. Therefore, a risk prediction tool may be beneficial for the implementation of early treatment and prevention.
In the present study, we developed a decision tree-based model integrating genetic and clinical features in a gender-specific classification for the identification of diabetic nephropathy among type 2 diabetic patients. Clinical and genotyping data were obtained from a previous genetic association study involving 345 type 2 diabetic patients (185 with diabetic nephropathy and 160 without diabetic nephropathy). Using a five-fold cross-validation approach, the performance of using clinical or genetic features alone in various classifiers (decision tree, random forest, Naïve Bayes, and support vector machine) was compared with that of utilizing a combination of attributes. The inclusion of genetic features and the implementation of an additional gender-based rule yielded better classification results.
The current model supports the notion that genes and gender are contributing factors of diabetic nephropathy. Further refinement of the proposed approach has the potential to facilitate the early identification of diabetic nephropathy and the development of more efficient treatment in a clinical setting.
Diabetes is a metabolic disorder characterized by the inability of the pancreas to produce enough insulin or the body's inability to use insulin effectively. Multiple factors contribute to the disease, including diet, lifestyle, and genes. Despite advances in healthcare, the prevalence of diabetes is still on the rise. More than 150 million people worldwide are affected by this debilitating disease. Thus, the surveillance, prevention, and control of diabetes and its complications are becoming increasingly important.
Diabetes can result in various complications, damaging the heart, blood vessels, eyes, kidneys, and nerves. As one of the major microvascular complications of diabetes, diabetic nephropathy (DN) affects about 30% of people with type 1 diabetes (T1D) and 25-40% of people with type 2 diabetes (T2D). DN is characterized by persistently elevated levels of albumin in the urine, a progressive decline in the glomerular filtration rate, and increased arterial blood pressure. In fact, DN is among the leading causes of end stage renal disease (ESRD), with a serious impact on morbidity, mortality, and patients' quality of life. Compared to non-diabetics, the likelihood of dying from renal disease is 17 times greater for diabetics.
Generally, DN symptoms are not obvious. In T1D patients, DN develops over an initial 10 to 15 years of diabetes, whereas in T2D patients the onset is less clearly defined. By the time clinical indices of renal function (e.g. urinary protein level) become abnormal, the kidneys have already been significantly damaged, and it may be too late to prevent ESRD. Thus, it would be beneficial to develop a prediction model utilizing various clinical measures, such as gender, history of diabetes, and body mass index (BMI). In addition, since existing studies suggest that genes and gender play specific and significant roles in DN [3, 6, 7], integrating genetic and gender information in the prediction of DN susceptibility may lead to more effective prevention or treatment.
To date, two groups have attempted to construct prediction models for DN. Cho et al. (2007) utilized a support vector machine (SVM) approach to classify DN patients among type 2 diabetics, but the model was trained on an irregular and unbalanced dataset. On the other hand, Leung et al. compared various machine learning and statistical methods to develop a prediction strategy for the identification of genotype-phenotype risk patterns in DN. Combining genotype and clinical information as features for classification through SVM and random forest approaches was found to generate the best performance. However, as diabetes and DN are caused by multiple factors [1, 3, 7], a model designed for one community may not be applicable to another. Moreover, clinicians may find it difficult to use the output of such computational algorithms to explain to patients their risk of developing DN and to convince them to take the necessary preventative measures. Therefore, a straightforward and user-friendly tool is more practical in a clinical setting.
Previously, Wu et al. [10, 11] have conducted a candidate gene analysis on 345 T2D patients, analyzing the association of 20 candidate genes with T2D, as well as the related complications such as obesity and DN. Through the generalized multi-dimensional reduction approach, Wu et al. have identified various gene-gene interactions that may represent genetic susceptibilities to T2D, obesity, and DN [10, 11]. The findings suggest that T2D and DN may be attributable to multiple factors that include the interactions between specific genes and the environment.
In the present study, we employed a gender-based rule in a decision tree approach to identify the risk for DN among T2D patients using the data provided by Wu et al. [10, 11]. Similar to Leung et al.'s finding (2013), our results indicate that the integration of clinical and genetic features yielded better performance in distinguishing DN from non-DN diabetic patients. Furthermore, the nature of decision tree classification may help simplify interpretation of the results. This easy-to-understand prediction tool may be beneficial for the early detection of T2D patients susceptible to DN.
Performance comparison between individual and combinations of clinical features
The top five best performing clinical features for diabetic nephropathy classification.
Performance of utilizing variable numbers of clinical features in different classifiers to predict diabetic nephropathy.
Performance comparison between individual and combinations of genetic features
The top five best performing genetic features for diabetic nephropathy classification.
Performance of utilizing variable numbers of genetic features in different classifiers to predict diabetic nephropathy.
Performance evaluation on the integration of clinical and genetic features for DN classification
Performance of utilizing variable numbers of genetic and clinical features to predict diabetic nephropathy.
In particular, when 25 of the 32 features were utilized for classification, random forest appeared to generate the best results. Decision tree seemed to be the second best classifier when features including serum triglyceride, ADRB2, ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1), gender, fasting plasma glucose, TCF7L2 (transcription factor 7-like 2), and ADIPOQ (adiponectin, C1Q and collagen domain containing protein) were employed for classification. This approach appeared to outperform previous attempts using clinical or genetic features separately. Yet, the classification performance was still much lower than that of renal function markers such as urinary albumin, blood creatinine (BC), and blood urea nitrogen (BUN) (Table 1).
Optimization of DN classification performance by the implementation of a gender-based rule
Based on these results, we concluded that decision tree and random forest worked best with our data. However, our objective was to construct an interpretable classification model that can be easily applied in a clinical setting. In this respect, decision tree holds an advantage over random forest, as it classifies individuals over a range of numerical and categorical features by following explicit rules, in a way that is similar to human decision making. Random forest, on the other hand, is not as easy to interpret because its prediction aggregates many trees built from randomly selected features. Therefore, we focused on using decision tree for subsequent classification attempts.
Performance of gender-based decision tree classification of diabetic nephropathy in the training dataset. Patient subgroups defined by the gender-based rule:
1. Female, BMI < 24
2. Female, BMI > 24
3. Male, history < 11 years
4. Male, 11 < history < 17 years
5. Male, history > 17 years
Performance of gender-based decision tree classification of diabetic nephropathy in the testing dataset. Patient subgroups defined by the gender-based rule:
1. Female, BMI < 24
2. Female, BMI > 24
3. Male, history < 11 years
4. Male, 11 < history < 17 years
5. Male, history > 17 years
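The five subgroups listed for the training and testing datasets can be read as a routing step that runs before any subgroup-specific tree. The following is a minimal sketch of that step, not the authors' implementation; the function name `assign_subgroup` is hypothetical, and because the text does not say how boundary values (BMI exactly 24, history of exactly 11 or 17 years) are handled, the sketch assumes they fall into the higher subgroup.

```python
def assign_subgroup(gender: str, bmi: float, history_years: float) -> int:
    """Return the subgroup number (1-5) implied by the gender-based rule.

    Assumption: boundary values (BMI == 24, history == 11 or 17 years)
    are not specified in the text; here they go to the higher subgroup.
    """
    g = gender.lower()
    if g == "female":
        # Female patients are split by BMI only.
        return 1 if bmi < 24 else 2
    if g == "male":
        # Male patients are split by duration of diabetes only.
        if history_years < 11:
            return 3
        if history_years < 17:
            return 4
        return 5
    raise ValueError(f"unknown gender: {gender!r}")


# Toy patients (made-up values, for illustration only).
print(assign_subgroup("female", 22.5, 8))
print(assign_subgroup("male", 27.0, 12))
```

Each subgroup would then be classified by its own decision tree, which is what keeps the individual trees small enough to read.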
Although the performance of the proposed gender-based strategy may not exceed that of renal function markers, the model presents an easily interpretable classification tree for DN prediction. For example, in Figure 1B, a female patient with BMI less than 24 and a serum triglyceride level greater than 134 mg/dl may very likely be susceptible to DN if she exhibits a GG genotype at the IGF2BP2 (insulin-like growth factor 2 mRNA binding protein 2) gene. In contrast, if she possesses the TT genotype, her likelihood of developing DN is relatively low. However, if she is heterozygous at this locus (i.e. the GT genotype), the next factor determining whether or not she belongs to the high-risk group would be her fasting plasma glucose level. Thus, the constructed model may make it easier for clinicians to efficiently identify DN risk among T2D patients based on their clinical examination results as well as their genetic susceptibility.
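The worked example above maps directly onto nested if-then rules. The sketch below encodes only the branch described in the text (female, BMI below 24, triglyceride above 134 mg/dl). The function name is hypothetical, the fasting plasma glucose cutoff is not given in the text and is therefore a required parameter, and the assumption that a higher glucose level means higher risk is ours.

```python
from typing import Optional


def female_low_bmi_risk(
    triglyceride_mg_dl: float,
    igf2bp2_genotype: str,
    fasting_glucose_mg_dl: float,
    glucose_cutoff_mg_dl: float,  # hypothetical: cutoff is not stated in the text
) -> Optional[str]:
    """Risk label for a female T2D patient with BMI < 24.

    Returns None for branches the text does not describe.
    """
    if triglyceride_mg_dl <= 134:
        return None  # branch not described in the text
    if igf2bp2_genotype == "GG":
        return "high"
    if igf2bp2_genotype == "TT":
        return "low"
    if igf2bp2_genotype == "GT":
        # Assumption: glucose above the cutoff places the patient at high risk.
        return "high" if fasting_glucose_mg_dl > glucose_cutoff_mg_dl else "low"
    return None


# Illustrative call with made-up values and a made-up cutoff of 140 mg/dl.
print(female_low_bmi_risk(150, "GT", 155, glucose_cutoff_mg_dl=140))
```

Written this way, every prediction can be traced to a short chain of conditions that a clinician can read aloud to a patient.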
Multiple environmental and genetic factors affect the disease progression of diabetes and its associated kidney complications [3, 7, 12], leading to various chronic and complex health problems. The aim of the present study was to develop a systematic risk assessment model that aids clinicians in the identification of DN patients to facilitate effective monitoring and efficient use of medical resources. As the purpose was to assess the possibility of early prevention, parameters directly associated with renal functions, such as blood creatinine (BC), blood urea nitrogen (BUN), and urinary albumin, were removed. Under these conditions, our study focused on analyzing the performance of utilizing various clinical (excluding direct renal function markers) and genetic attributes, as well as the integration of a gender-based rule in a decision tree to classify DN patients in a T2D dataset.
Our present findings support the notion that mechanisms underlying DN may be attributable to multiple genetic and environmental factors. When individual genetic or clinical attributes were employed in the classification of DN patients from the T2D data, however, the performance was disappointing. A slight increase in performance was achieved after the clinical and genetic features were combined for classification. The results correspond with existing observations that each genetic and environmental determinant may contribute a moderate effect to DN, but when interacting together, these factors may significantly influence the progression of DN in T2D patients.
Therefore, when making predictions on the development of DN for T2D patients, it may be important to consider both genetic and clinical factors. In fact, many chronic disease-related studies also support integrating clinical and genetic parameters to make better predictions about disease risks [9, 14, 15]. For genetically heterogeneous complex diseases like T2D and its associated DN complication, the genetic features that help predict DN susceptibility for one patient may not apply to everyone. Classifying individuals based on both clinical and genetic differences may therefore be a logical solution.
Unlike existing models [8, 9], we implemented a gender-based rule in the decision tree classification of DN. The SVM prediction tool introduced by Cho et al. was built on unbalanced datasets. While Leung et al.'s model achieved almost 100% accuracy in the prediction of T2D kidney diseases as a result of training on a much bigger sample size, their tool is also derived from SVM. Although SVM handles numeric data very effectively, the result of an SVM analysis is not necessarily "clinician or patient friendly," which may prevent it from being widely applied in clinical settings.
In contrast, the simplicity of decision tree learning, in addition to its ability to handle both non-homogeneous and nonlinear data in common clinical situations where outliers, missing attribute values, or misclassified points exist, becomes an advantage. Moreover, the results generated by decision tree-based methods are easy to interpret, as the approach discovers if-then rules from a given dataset. Most of these merits may not be easily attainable by conventional classification methods such as linear discriminant analysis or the k-nearest neighbor method. Although our attempts at classifying DN based on the combination of clinical and genetic parameters failed to outperform common renal function indices, by implementing a gender-based rule, which follows existing observations that DN risk differs between males and females [7, 12, 17], we were able to increase the performance of DN identification significantly.
Our approach provides additional support for the emerging evidence that DN risk is gender-specific. For example, for female patients, BMI, serum triglyceride level, fasting plasma glucose level, and the IGF2BP2 gene were the features that best discriminated DN from non-DN individuals. It is known that the level of triglycerides is higher in DN patients with T1D and T2D. In addition, the IGF2BP2 gene has been associated with DN in males with T1D and appears to modulate the risk of T2D among obese individuals of the Chinese Han population. Although there is currently insufficient evidence correlating the IGF2BP2 gene with DN in T2D females, our model suggests that female patients with the TT genotype at the IGF2BP2 gene locus may be at a lower risk of developing DN compared to those with the GG genotype.
Likewise, according to our analysis, the primary determining factors for the development of DN in male T2D patients are the duration of T2D and the serum triglyceride level. Different genes appear to act on the risk of DN for males with different durations of T2D. For instance, males with 11 to 17 years of T2D history would very likely develop DN if they have moderately high levels of serum triglyceride and possess the CC genotype at the ADIPOQ gene locus. In contrast, those heterozygous at this gene locus must exhibit high fasting plasma glucose levels to have an increased risk of developing DN. Though the ADIPOQ gene has been associated with T2D [22, 23] and DN, the exact nature of this association has not been identified. Our results imply that the ADIPOQ gene may exert its effect on DN risk among individuals with specific clinical attribute values. In addition, our model suggests that males who have lived with T2D for less than 11 years and are heterozygous at the obesity and T2D candidate gene FTO (fat mass and obesity associated) are more likely to develop DN when their LDL levels exceed 100 mg/dl.
To our knowledge, because the disease mechanisms of DN are complex and the symptoms are not obvious, there are no accurate and easy-to-interpret diagnostic methods for the early screening of DN susceptibility. By performing a series of decision tree analyses with the integration of a gender-based rule, we have identified specific clinical and genetic features, as well as their hidden associations and interactions, that may be used for DN risk assessment. The proposed strategy generates results that are easy to interpret and easy to implement. With further refinement of parameter settings and testing on a bigger sample, our approach may have the potential to facilitate early identification of individuals with DN susceptibility and, therefore, efficient prevention or treatment of DN.
The T2D data were kindly provided by Dr. L.S.H. Wu. The data consisted of 345 Taiwanese patients recruited from the Tri-Service General Hospital in Taipei, Taiwan, in 2002. These data were obtained in previously published studies under the approval of the institutional review board at the Tri-Service General Hospital, Taipei, Taiwan. The criteria for recruitment have been described in Wu et al. [10, 11]. All recruited patients fulfilled the following criteria: (1) diagnosed with diabetes for >5 years; (2) age 30 to 75 years; (3) fasting plasma glucose >6.93 mmol/l (126 mg/dl); (4) glycated haemoglobin (HbA1C) >6%. Patients were classified into the DN group if they fulfilled any of the following three criteria: (1) average urinary albumin-to-creatinine ratio (ACR) >300 μg/mg; (2) BUN >20 mg/dl; or (3) serum creatinine >1.7 mg/dl. The rest of the patients were classified as the T2D without DN group. The case group comprised 185 T2D patients with DN, and the control group comprised 160 T2D patients without DN. Table S1 (Additional file 1) summarizes the demographic and clinical characteristics of the participants; differences between the two groups were assessed via Student's t-test, and a p-value < 0.05 was regarded as statistically significant. In Wu et al. [10, 11], genotypes on the 20 T2D candidate genes have already been determined, and the genotype distribution for each gene is shown in Table S2 (Additional file 1).
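The recruitment and case-definition criteria above are simple threshold rules, so they can be written down as two predicates. This is an illustrative sketch only: the function and parameter names are hypothetical, and units follow the text (mg/dl, %, μg/mg).

```python
def meets_inclusion_criteria(
    diabetes_years: float,
    age_years: float,
    fasting_glucose_mg_dl: float,
    hba1c_percent: float,
) -> bool:
    """All four recruitment criteria must hold."""
    return (
        diabetes_years > 5
        and 30 <= age_years <= 75
        and fasting_glucose_mg_dl > 126
        and hba1c_percent > 6
    )


def is_dn_case(
    acr_ug_per_mg: float,
    bun_mg_dl: float,
    serum_creatinine_mg_dl: float,
) -> bool:
    """A patient is labelled DN if ANY of the three renal criteria holds."""
    return acr_ug_per_mg > 300 or bun_mg_dl > 20 or serum_creatinine_mg_dl > 1.7


# Toy patient with made-up values: elevated BUN alone is enough for a DN label.
print(meets_inclusion_criteria(8, 55, 150, 7.2))
print(is_dn_case(acr_ug_per_mg=120, bun_mg_dl=25, serum_creatinine_mg_dl=1.1))
```

Note that the case definition uses exactly the renal markers (ACR, BUN, creatinine) that the study excludes from its classification features.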
Experiments were conducted in LIBSVM (version 3.12) and WEKA, or Waikato Environment for Knowledge Analysis (version 3.6.5), a Java-based platform for data mining and data analysis. Since diabetes and its associated complications can be caused by multiple factors [12, 17], the best features from our dataset for DN classification were identified by computing the F-scores in LIBSVM for SVM, and the information gain scores via the InfoGainAttributeEval evaluator in WEKA for decision tree and naïve Bayes. Features were subsequently ranked based on these scores.
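The F-score of the ith feature is defined as

$$F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\frac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \frac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2}$$

where $\bar{x}_i$, $\bar{x}_i^{(+)}$, and $\bar{x}_i^{(-)}$ are the averages of the ith feature over the whole, positive, and negative datasets, respectively; $n_+$ and $n_-$ are the numbers of positive and negative instances; $x_{k,i}^{(+)}$ is the ith feature of the kth positive instance; and $x_{k,i}^{(-)}$ is the ith feature of the kth negative instance. Features with larger F-scores are more discriminative.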
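To make the F-score concrete, the sketch below computes it for a single feature as the separation of the class means from the overall mean, divided by the sum of the within-class sample variances. It uses only the Python standard library; the function name `f_score` and the toy values are ours and are not taken from the study data or from LIBSVM's own tooling.

```python
from statistics import mean


def f_score(pos, neg):
    """F-score of one feature.

    pos, neg: values of the feature in the positive (DN) and negative
    (non-DN) instances; each needs at least two values.
    """
    pos, neg = list(pos), list(neg)
    mean_all = mean(pos + neg)
    mean_pos, mean_neg = mean(pos), mean(neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variance within each class.
    within = (
        sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
        + sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1)
    )
    return between / within


# Toy values (made up): class means 3 and 1, overall mean 2.
print(f_score([2.0, 4.0], [0.0, 2.0]))  # 0.5
```

Ranking features then amounts to computing this score per feature and sorting in descending order.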
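The information gain score is based on the entropy of the dataset,

$$H(S) = -\sum_{x \in X} p(x)\,\log_2 p(x)$$

where $S$ is the dataset for which entropy is being calculated, $X$ represents the set of classes in $S$, and $p(x)$ is the proportion of the number of elements in class $x$ to the number of elements in set $S$.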
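As a small illustration of the entropy definition, the sketch below computes the entropy of a list of class labels and the information gain of a categorical feature, taken here as the entropy of the whole set minus the size-weighted entropy of the subsets induced by each feature value. The function names and toy labels are ours; WEKA's evaluator additionally handles numeric attributes, which this sketch does not.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Size-weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (made up): the first feature separates the classes perfectly,
# the second carries no information about the class.
labels = ["DN", "DN", "non-DN", "non-DN"]
print(information_gain(["GG", "GG", "TT", "TT"], labels))
print(information_gain(["F", "M", "F", "M"], labels))
```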
Decision tree learning algorithm
Next, we used the C4.5 decision tree algorithm in WEKA to perform DN classification. The C4.5 algorithm incorporates the concept of information gain to generate the best tree for classification and has become a standard learning tool for the supervised classification problem domain. C4.5 is an extension of the ID3 decision tree induction algorithm with additional features, including the ability to manage continuous attribute values and noise, alternative measures for selecting attributes, and tree pruning. Tree induction in C4.5 involves the following three steps: 1) construction of a large tree from the training data according to attribute selection by information gain scores; 2) removal of branches to avoid overfitting; 3) processing of the pruned tree to enhance its interpretability. The performance of each generated tree was evaluated by five-fold cross-validation.
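Five-fold cross-validation means that every patient is used for testing exactly once and for training four times. The sketch below shows only the index bookkeeping, under our own assumptions: the generator name `k_fold_indices` is hypothetical, the split is a plain shuffled split (not stratified by class), and it is not WEKA's implementation.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# With the 345 patients of this study, each fold holds 69 test samples.
for train, test in k_fold_indices(345, k=5):
    print(len(train), len(test))
```

A classifier would be fitted on each `train` set and scored on the matching `test` set, and the five scores averaged.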
SVM is a machine learning algorithm which attempts to find a hyperplane that best separates the data into one category or the other. It is often used for classification, regression, and other learning tasks. We selected the features according to their corresponding F-scores and performed classification based on the higher ranking features via SVM in LIBSVM.
The naïve Bayes classifier assigns class probabilities according to Bayes' rule:

$$P(C = c_k \mid X = x) = \frac{P(C = c_k)\,P(X = x \mid C = c_k)}{P(X = x)}$$

where $P(C = c_k \mid X = x)$ is the probability that an item belongs to class $c_k$, given that the item has feature vector $x$. Having made clear that $c_k$ and $x$ are values taken on by the random variables $C$ and $X$, we simplify notation by omitting those random variables. For instance, the expected number of classification errors can be minimized by assigning an item with feature vector $x$ to the class $c_k$ for which $P(c_k \mid x)$ is highest.
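The posterior probability described above can be computed by multiplying the class prior by the per-feature likelihoods (the "naïve" conditional independence assumption) and normalizing over the classes. The sketch below does this for categorical features; the function name, the data layout, and all probabilities are made up for illustration and are not estimates from the study.

```python
def naive_bayes_posterior(priors, likelihoods, x):
    """Posterior P(c | x) for each class under the naive Bayes assumption.

    priors:      {class: P(c)}
    likelihoods: {class: [ {value: P(x_j = value | c)}, ... ]}, one dict per feature
    x:           list of observed feature values
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, value in enumerate(x):
            # Features are treated as conditionally independent given the class.
            p *= likelihoods[c][j][value]
        joint[c] = p
    evidence = sum(joint.values())  # P(x), the normalizing constant
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical one-feature example (genotype at a single locus).
priors = {"DN": 0.5, "non-DN": 0.5}
likelihoods = {
    "DN": [{"GG": 0.6, "TT": 0.4}],
    "non-DN": [{"GG": 0.2, "TT": 0.8}],
}
posterior = naive_bayes_posterior(priors, likelihoods, ["GG"])
print(max(posterior, key=posterior.get))  # class with the highest posterior
```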
Random forest was also included in our study as one of the classifiers for building the DN prediction model. The random forest algorithm outputs a collection of decision trees based on the random selection of features. Individual trees may exhibit specific prediction errors. However, because each tree can capture different patterns within the data, the aggregated trees are expected to generate better performance.
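The aggregation step can be pictured as a majority vote over the individual trees. The sketch below shows only that step, with three hand-written stand-in "trees"; the function name, the patient fields, and the thresholds inside the stand-ins are illustrative assumptions, and no trees are actually learned here.

```python
from collections import Counter


def forest_predict(trees, patient):
    """Return the class predicted by the majority of the trees."""
    votes = Counter(tree(patient) for tree in trees)
    return votes.most_common(1)[0][0]


# Three stand-in trees, each looking at a different feature (illustrative only).
trees = [
    lambda p: "DN" if p["triglyceride"] > 134 else "non-DN",
    lambda p: "DN" if p["history_years"] > 17 else "non-DN",
    lambda p: "DN" if p["fasting_glucose"] > 126 else "non-DN",
]

patient = {"triglyceride": 150, "history_years": 5, "fasting_glucose": 140}
print(forest_predict(trees, patient))  # two of three trees vote "DN"
```

This also illustrates why a forest is harder to explain than a single tree: the final label depends on a tally rather than on one readable path of conditions.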
We thank Dr. L.S.H. Wu for providing the clinical and genetic data for the present study. This study was supported by the Ministry of Science and Technology of the Republic of China, Taiwan, under contract numbers NSC 101-2628-E-155-002-MY2 and 103-2221-E-155-038.
The authors declare that funding for the publication of this manuscript was sponsored by the Ministry of Science and Technology of the Republic of China, Taiwan.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 1, 2015: Selected articles from the Thirteenth Asia Pacific Bioinformatics Conference (APBC 2015): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S1
- Danaei G, Finucane MM, Lu Y, Singh GM, Cowan MJ, Paciorek CJ, Lin JK, Farzadfar F, Khang YH, Stevens GA, et al: National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2.7 million participants. Lancet. 2011, 378 (9785): 31-40. 10.1016/S0140-6736(11)60679-X.
- World Health Organization: Global status report on noncommunicable diseases 2010. 2011.
- Hong CY, Chia KS: Markers of diabetic nephropathy. Journal of diabetes and its complications. 1998, 12 (1): 43-60. 10.1016/S1056-8727(97)00045-7.
- Ntemka A, Iliadis F, Papanikolaou N, Grekas D: Network-centric Analysis of Genetic Predisposition in Diabetic Nephropathy. Hippokratia. 2011, 15 (3): 232-237.
- Sharifiaghdas F, Kashi AH, Eshratkhah R: Evaluating percutaneous nephrolithotomy-induced kidney damage by measuring urinary concentrations of beta2-microglobulin. Urol J. 2011, 8 (4): 277-282.
- Thorsby PM, Midthjell K, Gjerlaugsen N, Holmen J, Hanssen KF, Birkeland KI, Berg JP: Comparison of genetic risk in three candidate genes (TCF7L2, PPARG, KCNJ11) with traditional risk factors for type 2 diabetes in a population-based study--the HUNT study. Scandinavian journal of clinical and laboratory investigation. 2009, 69 (2): 282-287. 10.1080/00365510802538188.
- Al-Rubeaan K, Youssef AM, Subhani SN, Ahmad NA, Al-Sharqawi AH, Al-Mutlaq HM, David SK, AlNaqeb D: Diabetic nephropathy and its risk factors in a society with a type 2 diabetes epidemic: a Saudi National Diabetes Registry-based study. PloS one. 2014, 9 (2): e88956. 10.1371/journal.pone.0088956.
- Cho BH, Yu H, Kim KW, Kim TH, Kim IY, Kim SI: Application of irregular and unbalanced data to predict diabetic nephropathy using visualization and feature selection methods. Artif Intell Med. 2008, 42 (1): 37-53. 10.1016/j.artmed.2007.09.005.
- Leung RKK, Wang Y, Ma RCW, Luk AOY, Lam V, Ng M, So WY, Tsui SKW, Chan JCN: Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case-control cohort analysis. BMC Nephrol. 2013, 14.
- Lin E, Pei D, Huang YJ, Hsieh CH, Wu LSH: Gene-Gene Interactions Among Genetic Variants from Obesity Candidate Genes for Nonobese and Obese Populations in Type 2 Diabetes. Genet Test Mol Bioma. 2009, 13 (4): 485-493. 10.1089/gtmb.2008.0145.
- Wu LS, Hsieh CH, Pei D, Hung YJ, Kuo SW, Lin E: Association and interaction analyses of genetic variants in ADIPOQ, ENPP1, GHSR, PPARgamma and TCF7L2 genes for diabetic nephropathy in a Taiwanese population with type 2 diabetes. Nephrol Dial Transplant. 2009, 24 (11): 3360-3366. 10.1093/ndt/gfp271.
- Villar E, Remontet L, Labeeuw M, Ecochard R: Effect of age, gender, and diabetes on excess death in end-stage renal failure. J Am Soc Nephrol. 2007, 18 (7): 2125-2134. 10.1681/ASN.2006091048.
- Murea M, Ma L, Freedman BI: Genetic and environmental factors associated with type 2 diabetes and diabetic vascular complications. The review of diabetic studies: RDS. 2012, 9 (1): 6-22. 10.1900/RDS.2012.9.6.
- Kao JH, Chen PJ, Lai MY, Chen DS: Hepatitis B genotypes correlate with clinical outcomes in patients with chronic hepatitis B. Gastroenterology. 2000, 118 (3): 554-559. 10.1016/S0016-5085(00)70261-7.
- Pezzolesi MG, Poznik GD, Mychaleckyj JC, Paterson AD, Barati MT, Klein JB, Ng DP, Placha G, Canani LH, Bochenski J, et al: Genome-wide association scan for diabetic nephropathy susceptibility genes in type 1 diabetes. Diabetes. 2009, 58 (6): 1403-1410. 10.2337/db08-1514.
- Guh RS, Wu TCJ, Weng SP: Integrating genetic algorithm and decision tree learning for assistance in predicting in vitro fertilization outcomes. Expert Syst Appl. 2011, 38 (4): 4437-4449. 10.1016/j.eswa.2010.09.112.
- Vlassoff C: Gender differences in determinants and consequences of health and illness. Journal of health, population, and nutrition. 2007, 25 (1): 47-61.
- Hadjadj S, Duly-Bouhanick B, Bekherraz A, Bridoux F, Gallois Y, Mauco G, Ebran J, Marre M: Serum triglycerides are a predictive factor for the development and the progression of renal and retinal complications in patients with type 1 diabetes. Diabetes & metabolism. 2004, 30 (1): 43-51. 10.1016/S1262-3636(07)70088-5.
- Lee IT, Wang CY, Huang CN, Fu CC, Sheu WHH: High triglyceride-to-HDL cholesterol ratio associated with albuminuria in type 2 diabetic subjects. Journal of diabetes and its complications. 2013, 27 (3): 243-247. 10.1016/j.jdiacomp.2012.11.004.
- Gu T, Horova E, Mollsten A, Seman NA, Falhammar H, Prazny M, Brismar K, Gu HF: IGF2BP2 and IGF2 genetic effects in diabetes and diabetic nephropathy. Journal of diabetes and its complications. 2012, 26 (5): 393-398. 10.1016/j.jdiacomp.2012.05.012.
- Wu HH, Liu NJ, Yang Z, Tao XM, Du YP, Wang XC, Lu B, Zhang ZY, Hu RM, Wen J: IGF2BP2 and obesity interaction analysis for type 2 diabetes mellitus in Chinese Han population. European journal of medical research. 2014, 19: 40. 10.1186/2047-783X-19-40.
- Ramya K, Ayyappa KA, Ghosh S, Mohan V, Radha V: Genetic association of ADIPOQ gene variants with type 2 diabetes, obesity and serum adiponectin levels in south Indian population. Gene. 2013, 532 (2): 253-262. 10.1016/j.gene.2013.09.012.
- Siitonen N, Pulkkinen L, Lindstrom J, Kolehmainen M, Eriksson JG, Venojarvi M, Ilanne-Parikka P, Keinanen-Kiukaanniemi S, Tuomilehto J, Uusitupa M: Association of ADIPOQ gene variants with body weight, type 2 diabetes and serum adiponectin concentrations: the Finnish Diabetes Prevention Study. BMC Med Genet. 2011, 12.
- Ng MC, Park KS, Oh B, Tam CH, Cho YM, Shin HD, Lam VK, Ma RC, So WY, Cho YS, et al: Implication of genetic variants near TCF7L2, SLC30A8, HHEX, CDKAL1, CDKN2A/B, IGF2BP2, and FTO in type 2 diabetes and obesity in 6,719 Asians. Diabetes. 2008, 57 (8): 2226-2233. 10.2337/db07-1583.
- Chang C-C, Lin C-J: LIBSVM: A Library for Support Vector Machines. 2001.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA Data Mining Software: An Update. SIGKDD Explorations. 2009.
- Firouzi F, Rashidi M, Hashemi S, Kangavari M, Bahari A, Daryani NE, Emam MM, Naderi N, Shalmani HM, Farnood A, et al: A decision tree-based approach for determining low bone mineral density in inflammatory bowel disease using WEKA software. Eur J Gastroen Hepat. 2007, 19 (12): 1075-1081. 10.1097/MEG.0b013e3282202bb8.
- Akay MF: Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems with Applications. 2009, 36 (2): 3240-3247. 10.1016/j.eswa.2008.01.009.
- Quinlan JR: C4.5: programs for machine learning. 1993, Morgan Kaufmann Publishers Inc.
- Quinlan JR: Induction of decision trees. Machine Learning. 1986, 1 (1): 16-
- Quinlan JR: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research. 1996, 4 (1): 14-
- Cortes C, Vapnik V: Support-vector networks. Machine Learning. 1995, 20 (3): 273-297.
- Chen J, Huang H: Feature selection for text classification with Naïve Bayes. Expert Systems with Applications. 2009, 36 (3): 5432-5435. 10.1016/j.eswa.2008.06.054.
- Guh R-S, Wu T-CJ, Weng S-P: Integrating genetic algorithm and decision tree learning for assistance in predicting in vitro fertilization outcomes. Expert Systems with Applications. 2011, 38 (4): 4437-4449. 10.1016/j.eswa.2010.09.112.
- Breiman L: Random forests. 2001.
- Oshiro TM, Perez PS, Baranauskas JA: How Many Trees in a Random Forest?. Lecture Notes in Computer Science. 2012, 7376: 154-168. 10.1007/978-3-642-31537-4_13.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.