Figure 1

Diagrams of the original and new processing pipelines. Shown are the A) original and B) new processing pipelines for training sequences. The four main differences (shaded boxes) in the new pipeline are that 1) hypervariable regions are removed from the sequences, 2) distance matrices are generated, which allow 3) grouping of sequences (≤ 1% sequence difference) into Operational Taxonomic Units (OTUs), and 4) sequences are labelled with their taxonomic designations, as supplied by the Ribosomal Database Project (RDP).