Fig. 1 | BMC Bioinformatics

Fig. 1

From: Variant site strain typer (VaST): efficient strain typing using a minimal number of variant genomic sites

VaST Pipeline Schematic. a Overview of the VaST pipeline. b The window (gray box) starts at the first site (115) and captures two additional sites (120 and 121). The amplicon (black box) extends from the first to the last variant site in the window, and the primer zones (arrows) extend in opposite directions. c The primer zone region is extracted from the full genome matrix, and the number of strains that are missing data (X) or have a base call that differs from the reference is counted for each position. d A position in the primer zone is flagged (!) when the number of poorly conserved strains is greater than or equal to the strain cutoff value. e To pass the filter in this example, at least 20% of the primer zone positions must belong to a conserved segment that is longer than three positions. f The table shows the variant sequence features of the amplicons. g The resolution pattern of each amplicon is determined, and amplicons that contain redundant information are combined (e.g. Amplicons 3 and 4 into Pattern 3). For ambiguous (N) or missing calls (X), all of the possibilities are enumerated, and the strain simultaneously belongs to all of the feature categories that overlap with those of the other strains. The bottom row is the resolution score, r, for each pattern. The minimum spanning set algorithm favors patterns that evenly split up groups of strains. Using SNPs as an example, h is the best-case scenario, in which N strains can be resolved with log₄(N) SNPs; however, i log₂(N) SNPs is more likely with bi-allelic SNPs. j In the worst case, highly unbalanced splitting can occur, requiring up to N−1 SNPs to resolve N strains. k The associated haplotypes for each of the minimum spanning sets in (h-j).
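
The primer zone filter described in panels c-e can be sketched in a few lines of Python. This is an illustrative reading of the caption only, not the VaST source code: the function names, the parameter names (`strain_cutoff`, `min_segment_length`, `min_fraction`) and the toy sequences are all assumptions made for the example.

```python
def count_poorly_conserved(strain_calls, reference):
    """Panel c: per position, count strains that are missing data ('X')
    or whose base call differs from the reference."""
    columns = zip(*strain_calls)  # one tuple of calls per position
    return [
        sum(1 for call in column if call == "X" or call != ref_base)
        for column, ref_base in zip(columns, reference)
    ]


def flag_positions(counts, strain_cutoff):
    """Panel d: flag a position when its count reaches the strain cutoff."""
    return [count >= strain_cutoff for count in counts]


def conserved_fraction(flags, min_segment_length):
    """Panel e: fraction of positions lying in unflagged runs that are
    strictly longer than min_segment_length."""
    conserved = 0
    run = 0
    for flagged in list(flags) + [True]:  # sentinel closes the final run
        if flagged:
            if run > min_segment_length:
                conserved += run
            run = 0
        else:
            run += 1
    return conserved / len(flags)


def passes_filter(strain_calls, reference, strain_cutoff,
                  min_segment_length=3, min_fraction=0.20):
    """Combine panels c-e; the default values mirror the caption's example."""
    counts = count_poorly_conserved(strain_calls, reference)
    flags = flag_positions(counts, strain_cutoff)
    return conserved_fraction(flags, min_segment_length) >= min_fraction


# Toy primer zone (hypothetical data): 10 positions, 4 strains.
reference = "ACGTACGTAC"
strains = [
    "ACGTACGTAC",
    "ACGTXCGTAC",  # missing call at position 5
    "ACGTTCGTAC",  # mismatch at position 5
    "ACGTACGTAX",  # missing call at position 10
]
counts = count_poorly_conserved(strains, reference)
flags = flag_positions(counts, strain_cutoff=2)
print(counts)
print(conserved_fraction(flags, min_segment_length=3))
print(passes_filter(strains, reference, strain_cutoff=2))
```

With a strain cutoff of 2, only the fifth position is flagged, leaving conserved runs of four and five positions, so the toy primer zone passes. Lowering the cutoff to 1 also flags the last position and shortens the second run to four positions.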