Fig. 2 | BMC Bioinformatics

Fig. 2

From: Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting

Cycle of the semi-supervised learning model. (a) Starting from a training set, each species in the training set undergoes steps (b) to (e). (b) A random forest procedure draws 2000 subsets of 10 m/z peaks each from the training set; for each subset, the ID3 algorithm finds the optimal decision tree separating the taxon from the rest, and only trees with accuracy > 0.95 are retained. (c) Majority voting: samples satisfying > 60% of the trees are added to the taxon. (d) Samples significantly different from the training set are removed (newly added samples with likelihood < 0.2). (e) The updated set of samples is passed on to the next cycle as the new training set.
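
As a rough illustration of one pass through steps (b) to (e), the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the caption does not confirm: spectra are encoded as binary presence/absence vectors over a fixed list of m/z peaks, tree accuracy is measured on the training set itself, the 60% vote is taken over the retained trees, and `likelihood` is a hypothetical placeholder (mean fraction of matching peaks) standing in for the likelihood measure used in the article. All function and parameter names are illustrative.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, features):
    """Build an ID3 tree over binary features (peak absent = 0, present = 1)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node
    def gain(f):
        split = {0: [], 1: []}
        for r, y in zip(rows, labels):
            split[r[f]].append(y)
        rem = sum(len(s) / len(labels) * entropy(s) for s in split.values() if s)
        return entropy(labels) - rem
    best = max(features, key=gain)  # feature with the highest information gain
    rest = [f for f in features if f != best]
    node = {"feature": best, "default": majority}
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        if idx:
            node[v] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node

def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree.get(row[tree["feature"]], tree["default"])
    return tree

def likelihood(sample, taxon):
    """ASSUMPTION: placeholder score, not the article's likelihood measure."""
    match = lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a)
    return sum(match(sample, t) for t in taxon) / len(taxon)

def one_cycle(taxon, others, unlabeled, n_peaks, n_subsets=2000, subset_size=10,
              min_acc=0.95, min_vote=0.60, min_likelihood=0.2, seed=0):
    rng = random.Random(seed)
    rows = taxon + others
    labels = [True] * len(taxon) + [False] * len(others)
    # (b) random peak subsets, one ID3 tree per subset, keep accurate trees
    trees = []
    for _ in range(n_subsets):
        feats = rng.sample(range(n_peaks), subset_size)
        tree = id3(rows, labels, feats)
        acc = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
        if acc > min_acc:
            trees.append(tree)
    # (c) add samples that more than 60% of the trees assign to the taxon
    added = [s for s in unlabeled
             if trees and sum(predict(t, s) for t in trees) / len(trees) > min_vote]
    # (d) drop newly added samples whose likelihood falls below the threshold
    added = [s for s in added if likelihood(s, taxon) >= min_likelihood]
    # (e) the updated set becomes the training set for the next cycle
    return taxon + added

# Toy data (hypothetical): 12 peaks, the first three are diagnostic of the taxon.
taxon = [[1, 1, 1] + [0] * 9 for _ in range(3)]
others = [[0] * 12 for _ in range(3)]
unlabeled = [[1, 1, 1, 1] + [0] * 8, [0] * 12]
updated = one_cycle(taxon, others, unlabeled, n_peaks=12, n_subsets=50)
```

In the full procedure described by the caption, this cycle would be repeated for each species, with `updated` fed back in as the next training set.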
