Fig. 1 | BMC Bioinformatics

Fig. 1

From: STRAIN: an R package for multi-locus sequence typing from whole genome sequencing data

Summary of the STRAIN pipeline. The program starts with one or more input sequence files in FASTQ format (compressed or uncompressed) and a dataset of reference allele sequences in FASTA format. Using bowtie2, reads are mapped against the reference allele dataset. Four separate sets of mapped reads are formed according to an increasingly strict match criterion: all mapped reads, and reads perfectly matching over at least 30%, 60%, and 90% of their length. As highlighted in the red box, each set of reads is assembled separately by SPAdes or Velvet to produce the locus contig(s). Each contig is finally aligned by BLASTn against the reference allele dataset to identify the perfect match. The program stops i) when there are no reads to be assembled, ii) when no contigs are produced by the assemblers, iii) when a new allele variant sequence is identified, or iv) when a perfect match is found.
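
The read-set construction and the four stopping conditions in the caption can be illustrated with a minimal Python sketch. This is not STRAIN's code (STRAIN is an R package). The `MappedRead` record, its `matched` field, and the `assemble` and `is_perfect_match` callables are hypothetical stand-ins for the bowtie2, SPAdes/Velvet, and BLASTn steps. Treating every assembled contig without a perfect match as a new allele variant is also an assumption made only for illustration.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class MappedRead(NamedTuple):
    """Hypothetical record for one read mapped to a reference allele."""
    name: str
    length: int   # read length in bases
    matched: int  # bases that match the reference perfectly (assumed field)


# Match criteria from the caption, as integer percentages of read length:
# 0 = all mapped reads, then at least 30, 60 and 90 %.
THRESHOLDS = (0, 30, 60, 90)


def build_read_sets(reads: Iterable[MappedRead]) -> Dict[int, List[MappedRead]]:
    """Form one read set per threshold; stricter sets are subsets of looser ones."""
    reads = list(reads)
    return {
        pct: [r for r in reads if r.matched * 100 >= pct * r.length]
        for pct in THRESHOLDS
    }


def outcome_for_set(
    reads: List[MappedRead],
    assemble: Callable[[List[MappedRead]], List[str]],
    is_perfect_match: Callable[[str], bool],
) -> str:
    """Classify one read set using the four stopping conditions in the caption."""
    if not reads:
        return "no_reads"        # i) nothing to assemble
    contigs = assemble(reads)    # stand-in for SPAdes or Velvet
    if not contigs:
        return "no_contigs"      # ii) assembler produced nothing
    if any(is_perfect_match(c) for c in contigs):  # stand-in for BLASTn
        return "perfect_match"   # iv) known allele identified
    return "new_allele"          # iii) assumed: no perfect match means a new variant
```

For example, four 100-base reads with 100, 65, 30, and 10 perfectly matching bases would yield sets of size 4, 3, 2, and 1 for the 0, 30, 60, and 90% criteria, respectively.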
