Fig. 1 | BMC Bioinformatics

Fig. 1

From: pyCancerSig: subclassifying human cancer with comprehensive single nucleotide, structural and microsatellite mutational signature deconstruction from whole genome sequencing

pyCancerSig workflow diagram. The workflow consists of 4 steps.

1. Data preprocessing. The purpose of this step is to generate a list of variants. This step has to be performed by third-party software.
   - Single nucleotide variant (SNV): MuTect2 is recommended; otherwise MuSE, VarScan2, or SomaticSniper.
   - Structural variant (SV): depends on FindSV.
   - Microsatellite instability (MSI): depends on MSIsensor.
2. Profiling (feature extraction), `cancersig profile`. The purpose of this step is to turn the information generated in the first step into matrix features usable by the model in the next step. The output of this stage has a format similar to https://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt, which consists of at least 3 columns.
   - Column 1: Variant type (Substitution Type in COSMIC).
   - Column 2: Variant subgroup (Trinucleotide in COSMIC).
   - Column 3: Feature ID (Somatic Mutation Type in COSMIC).
   - From column 4 onward, each column represents one sample.
   There is one subcommand for each type of genetic variation.
   - `cancersig feature snv` extracts single nucleotide variant features.
   - `cancersig feature sv` extracts structural variant features.
   - `cancersig feature msi` extracts microsatellite instability features.
   - `cancersig feature merge` merges all feature profiles into a single profile ready to be used by the next step.
3. Deciphering mutational signatures, `cancersig signature decipher`. The purpose of this step is to use an unsupervised learning model to find mutational signature components in the tumors.
4. Visualizing profiles, `cancersig signature visualize`. The purpose of this step is to visualize the mutational signature components for each tumor.
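The profile layout described in step 2 (three key columns followed by one column per sample) can be illustrated with a short Python sketch. This is not pyCancerSig's implementation. It assumes tab-separated input, uses hypothetical helper names (`read_profile`, `merge_profiles`), hypothetical sample names and feature values, and assumes that a sample missing from one profile is filled with 0.0 when profiles are merged.

```python
import csv
import io

# Key columns as named in the caption; the exact header strings are an assumption.
KEY_COLUMNS = ["Variant type", "Variant subgroup", "Feature ID"]


def read_profile(text):
    """Parse a tab-separated profile into (sample_names, rows).

    Each row is (key, counts): key is the 3-tuple of key columns and
    counts maps sample name -> feature value.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[3:]  # from column 4 onward, one column per sample
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        key = tuple(record[:3])
        counts = dict(zip(samples, (float(v) for v in record[3:])))
        rows.append((key, counts))
    return samples, rows


def merge_profiles(profiles):
    """Stack the feature rows of several profiles into one matrix.

    Sample columns are aligned by name; a sample absent from a profile
    gets 0.0 for that profile's features (an assumption of this sketch).
    """
    all_samples = []
    for samples, _ in profiles:
        for name in samples:
            if name not in all_samples:
                all_samples.append(name)
    merged = []
    for _, rows in profiles:
        for key, counts in rows:
            merged.append(list(key) + [counts.get(s, 0.0) for s in all_samples])
    return KEY_COLUMNS + all_samples, merged


# Hypothetical toy profiles: one SNV feature and one MSI feature.
snv_text = (
    "Variant type\tVariant subgroup\tFeature ID\ttumor_A\ttumor_B\n"
    "C>A\tACA\tA[C>A]A\t0.2\t0.1\n"
)
msi_text = (
    "Variant type\tVariant subgroup\tFeature ID\ttumor_B\ttumor_C\n"
    "MSI\trepeat_unit_1\tMSI_1\t0.5\t0.3\n"
)

header, rows = merge_profiles([read_profile(snv_text), read_profile(msi_text)])
for line in [header] + rows:
    print("\t".join(str(v) for v in line))
```

Aligning samples by name rather than by position keeps the merged matrix consistent when the per-type profiles were generated for different subsets of tumors.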
