Fig. 4 | BMC Bioinformatics

Fig. 4

From: Multivariate pattern analysis: a method and software to reveal, quantify, and visualize predictive association patterns in multicollinear data

Validation plot used to decide the number of predictive PLS components for the association pattern of the mediating lipoproteins to the outcome HOMA-IR. The plot displays the median root mean squared error of prediction (RMSEP, black line) from Monte Carlo resampling with 1000 repetitions for an increasing number of components (from 0 to 6, the latter chosen as the maximum number of components). The minimum median RMSEP is found for the 4-component PLS model (indicated by *), implying that the number of predictive PLS components is lower than or equal to 4. The plot further compares the distribution of RMSEP values for each A-component model with the median of the (A-1)-component model. Red dots mark RMSEPs for the A-component model that exceed the median for the (A-1)-component model, while green dots mark RMSEPs for the A-component model that are lower than the median of the (A-1)-component model. The numbers in parentheses show the fraction of repetitions in which the RMSEP of the A-component model exceeds the median for the (A-1)-component model. We start our assessment by comparing the 4-component model (minimum median RMSEP) with the 3-component model. The number 0.47 is the fraction of repetitions in which the RMSEP of the 4-component PLS model exceeds the median for the 3-component model. Since the ratio of objects to variables is high in our data, we used 0.5 as the acceptance threshold. If this ratio is low, the chance of overfitting increases and a lower threshold is recommended [8]. Since 0.47 is below the threshold of 0.5, the 4-component model was chosen.
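The selection rule described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the function name `select_n_components`, the layout of the `rmsep` array (one row per Monte Carlo repetition, one column per number of components, starting at 0), the strict "below threshold" comparison, and the step down to the next smaller model when the criterion fails are all assumptions made for the example. The demo values are invented and are not the data behind the figure.

```python
import numpy as np


def select_n_components(rmsep, threshold=0.5):
    """Pick the number of PLS components from resampled RMSEP values.

    rmsep: array of shape (n_repetitions, max_components + 1); column a
           holds the RMSEP of the a-component model (column 0 = no components).
           This layout is an assumption of the sketch.
    threshold: acceptance threshold for the fraction of repetitions in which
           the A-component RMSEP exceeds the (A-1)-component median.
    """
    rmsep = np.asarray(rmsep, dtype=float)
    medians = np.median(rmsep, axis=0)

    # Start at the model with the minimum median RMSEP.
    a = int(np.argmin(medians))
    fractions = {}

    while a > 0:
        # Fraction of repetitions where the A-component model is worse
        # than the median of the (A-1)-component model.
        frac = float(np.mean(rmsep[:, a] > medians[a - 1]))
        fractions[a] = frac
        if frac < threshold:
            break  # A-component model accepted
        a -= 1     # assumed fallback: try the next smaller model

    return a, fractions


# Invented toy values: 4 repetitions, models with 0, 1 and 2 components.
demo = np.array([
    [1.0, 0.5, 0.4],
    [1.0, 0.6, 0.5],
    [1.0, 0.7, 0.7],
    [1.0, 0.8, 0.9],
])
n_components, fractions = select_n_components(demo, threshold=0.5)
print(n_components, fractions)
```

Applied to the figure, the same rule would start at 4 components, find a fraction of 0.47, and accept the 4-component model because 0.47 is below 0.5.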
