Fig. 2 | BMC Bioinformatics

Fig. 2

From: The parameter sensitivity of random forests


Prediction accuracy is a strong function of parameterization in low p/n studies. Summary of low p/n predicted votes for each fitted random forest model (n = 1500). An AUC plot at the top indicates the relative performance of each model; each model is represented by one column. Each model was fitted from a unique combination of the n_tree (n = 10), m_try (n = 15) and sampsize (n = 10) parameters, and the heatmaps show its outcomes (votes) for each sample, or row (n = 576). Votes take values from 0 to 1, with 0 representing a “bad library” and 1 representing a “good library”. Columns are ordered by descending AUC score and rows by descending fraction of correct votes for a given sample (total votes for the true sample class/all votes). Samples are grouped by their true class labels, “good library” and “bad library”, though the votes may not reflect these labels. Barplots of vote fractions are provided to the right of the main heatmaps, and the values of each parameter are provided at the bottom of the figure. The n_tree parameter is shown in blue, m_try in magenta and sampsize in orange; lighter hues represent lower values and darker hues higher values. A scatterplot in the bottom right corner illustrates a strong negative correlation between the m_try parameter and AUC scores (ρ = -0.89, p = 0).
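The ordering described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy `votes` matrix, the `labels` vector and all function names are hypothetical, each vote is assumed to be the fraction of trees voting "good library", and the grid uses placeholder indices because the caption gives only the number of values per parameter.

```python
from itertools import product

def auc(scores, labels):
    """Rank-based AUC: chance that a random 'good' sample outscores a
    random 'bad' one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def correct_fraction(row, label):
    """Mean vote for the sample's true class across all models."""
    per_model = [v if label == 1 else 1.0 - v for v in row]
    return sum(per_model) / len(per_model)

def order_heatmap(votes, labels):
    """Columns by descending AUC; rows by descending fraction of correct
    votes, kept separate for each true class as in the caption."""
    n_models = len(votes[0])
    model_auc = [auc([row[j] for row in votes], labels) for j in range(n_models)]
    col_order = sorted(range(n_models), key=lambda j: model_auc[j], reverse=True)
    frac = [correct_fraction(row, y) for row, y in zip(votes, labels)]
    row_order = {
        cls: sorted((i for i, y in enumerate(labels) if y == cls),
                    key=lambda i: frac[i], reverse=True)
        for cls in (1, 0)
    }
    return col_order, row_order, model_auc

# Placeholder grid: 10 n_tree x 15 m_try x 10 sampsize settings (indices only,
# the actual parameter values are not given in the caption).
grid = list(product(range(10), range(15), range(10)))

# Hypothetical toy data: 4 samples (rows) x 3 models (columns).
labels = [1, 1, 0, 0]  # 1 = "good library", 0 = "bad library"
votes = [
    [0.9, 0.6, 0.4],
    [0.8, 0.4, 0.6],
    [0.2, 0.5, 0.7],
    [0.1, 0.3, 0.5],
]
col_order, row_order, model_auc = order_heatmap(votes, labels)
```

Computing AUC from pairwise comparisons keeps the sketch dependency-free; for 576 samples and 1500 models a library routine would be the more practical choice.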
