Fig. 1 | BMC Bioinformatics

Fig. 1

From: Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting

Flow chart of the data pre-processing pipeline: (a) m/z peaks from MALDIquant were background-modelled, calibrated, quality-checked and combined into a binary data matrix; (b) the distribution of the top 50% of peaks within −100 to +100 m/z of the target peaks was modelled by a normal distribution, and peaks with background probability (Pr) > 1 × 10⁻¹⁵ were discarded (green and red areas give examples of background and signal, respectively); (c) monoisotopic m/z values were matched to a reference set and linear models were fitted between the errors and m/z, after which all peaks in the spectrum were corrected according to the fitted linear model; and (d) an illustration of the extent to which the ‘union’ set of peaks was greatly reduced by monoisotopic selection (M) and background subtraction (BG), and further reduced by calibration (C) and quality check (Q).
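
Steps (b) and (c) of the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation (the pipeline in the paper builds on MALDIquant, an R package), and it rests on several assumptions the caption does not confirm: that "top 50%" means the highest-intensity half of the neighbouring peaks, that the background probability is the upper-tail probability of the target intensity under the fitted normal distribution, and that the calibration error is defined as observed minus reference m/z. All function and parameter names are hypothetical.

```python
import math
from statistics import mean, stdev


def background_probability(target_intensity, neighbour_intensities, top_fraction=0.5):
    """Upper-tail probability of the target intensity under a normal
    distribution fitted to the strongest neighbours (assumed reading of (b))."""
    ranked = sorted(neighbour_intensities, reverse=True)
    sample = ranked[:max(2, int(len(ranked) * top_fraction))]
    if len(sample) < 2:
        return 1.0  # too few neighbours to model: treat as background
    mu, sigma = mean(sample), stdev(sample)
    if sigma == 0:
        return 0.0 if target_intensity > mu else 1.0
    z = (target_intensity - mu) / sigma
    # Normal survival function written with erfc for accuracy in the far tail.
    return 0.5 * math.erfc(z / math.sqrt(2))


def filter_peaks(peaks, window=100.0, threshold=1e-15):
    """Keep (mz, intensity) peaks whose background probability is at most
    the threshold; neighbours are the other peaks within +/- window m/z."""
    kept = []
    for i, (mz, intensity) in enumerate(peaks):
        neighbours = [inten for j, (other_mz, inten) in enumerate(peaks)
                      if j != i and abs(other_mz - mz) <= window]
        if background_probability(intensity, neighbours) <= threshold:
            kept.append((mz, intensity))
    return kept


def fit_linear_error(observed, reference):
    """Least-squares fit of error = intercept + slope * observed_mz,
    where error = observed - reference (assumed definition for (c))."""
    errors = [o - r for o, r in zip(observed, reference)]
    mean_x, mean_e = mean(observed), mean(errors)
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(observed, errors))
    slope = sxe / sxx
    return mean_e - slope * mean_x, slope


def calibrate(mz_values, intercept, slope):
    """Subtract the predicted error from every m/z value in the spectrum."""
    return [mz - (intercept + slope * mz) for mz in mz_values]


# Toy spectrum: ten low-intensity peaks and one strong peak at m/z 1050.5.
low = [1.0, 1.1, 0.9, 1.05, 0.95] * 2
peaks = [(1000.0 + 10 * i, low[i]) for i in range(10)] + [(1050.5, 100.0)]
signal = filter_peaks(peaks)

# Toy calibration: observed values carry a linear error relative to reference.
reference = [1000.0, 1500.0, 2000.0, 2500.0]
observed = [r * 1.0001 + 0.05 for r in reference]
intercept, slope = fit_linear_error(observed, reference)
corrected = calibrate(observed, intercept, slope)
```

In the toy spectrum only the strong peak survives the filter, and the corrected m/z values agree with the reference values to within floating-point error. The toy data are invented for illustration and do not come from the paper.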
