Figure 2 | BMC Bioinformatics

Figure 2

From: NClassG+: A classifier for non-classically secreted Gram-positive bacterial proteins

Methodology of NClassG+. The NClassG+ classifier was selected from among a large number of candidate classifiers, one for each combination of the protein vector representations and kernel functions considered in this study. In step A, the candidate classifiers were built and compared in a nested k-fold cross-validation (CV) environment. Briefly, in each iteration of the outer loop, a classifier is optimized for CV accuracy on the inner-loop training and test data sets for every kernel function/feature combination pair, and the pair with the best CV accuracy is selected. The inner-loop training and test data sets are drawn from the outer-loop training data set; the outer-loop test data set is used to estimate the accuracy of the whole process. Using the hyperparameters of the best classifier found in the inner-loop CV, a classifier is then trained and tested with the outer-loop data sets. NClassG+ is the classifier with the best CV accuracy, as calculated in the inner loop. In step B, before the nested k-fold CV procedure was performed, the learning data set was partitioned so that the performance of the selected classifier could be assessed and compared against SecretomeP 2.0 and SecretP 2.0. The a1, a2, and a3 data sets are entirely distinct partitions derived from the learning set used in the construction of NClassG+. An asterisk (*) marks hyperparameter optimization.
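
The nested k-fold CV procedure of step A can be illustrated with a minimal, self-contained sketch. It is not the authors' pipeline: a toy nearest-centroid classifier on a single feature stands in for the kernel function/feature combination pairs, the data are synthetic, and every name (`cv_splits`, `nested_cv`, `make_centroid_fit`, the candidate labels, the fold counts) is a hypothetical chosen for illustration. The point it shows is structural: model selection sees only the outer-loop training set, and the outer-loop test set is used solely to estimate the accuracy of the whole process.

```python
import random
from statistics import mean


def cv_splits(indices, k, seed):
    """Yield (train, test) index lists for a shuffled k-fold split of `indices`."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(predict, X, y, idx):
    """Fraction of the samples in `idx` that `predict` labels correctly."""
    return mean(1.0 if predict(X[i]) == y[i] else 0.0 for i in idx)


def make_centroid_fit(feature):
    """Toy stand-in for one candidate: nearest class centroid on one feature."""
    def fit(X, y, train_idx):
        by_label = {}
        for i in train_idx:
            by_label.setdefault(y[i], []).append(X[i][feature])
        centroids = {label: mean(vals) for label, vals in by_label.items()}

        def predict(x):
            return min(centroids, key=lambda label: abs(x[feature] - centroids[label]))
        return predict
    return fit


def nested_cv(X, y, candidates, outer_k=5, inner_k=3):
    """Return (estimated accuracy of the whole process, candidate chosen per outer fold)."""
    outer_scores, chosen = [], []
    for outer_train, outer_test in cv_splits(range(len(X)), outer_k, seed=1):
        # Inner loop: compare candidates using the outer training set only.
        inner_acc = {}
        for name, fit in candidates.items():
            scores = [accuracy(fit(X, y, tr), X, y, te)
                      for tr, te in cv_splits(outer_train, inner_k, seed=2)]
            inner_acc[name] = mean(scores)
        best = max(inner_acc, key=inner_acc.get)
        chosen.append(best)
        # Retrain the selected candidate on the full outer training set,
        # then score it on the held-out outer test set.
        model = candidates[best](X, y, outer_train)
        outer_scores.append(accuracy(model, X, y, outer_test))
    return mean(outer_scores), chosen


# Synthetic data (an assumption): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [(10.0 * label + rng.uniform(-1, 1), rng.uniform(-1, 1)) for label in y]

candidates = {"informative": make_centroid_fit(0), "noise": make_centroid_fit(1)}
estimate, chosen = nested_cv(X, y, candidates)
print(estimate, chosen)
```

Because the outer test fold never influences which candidate is selected, the averaged outer-loop score is an estimate for the selection procedure as a whole rather than for any single tuned classifier.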
