From: Tissue-associated microbial detection in cancer using human sequencing data
Framework | Approach | Dependencies | Input/output | Advantages/disadvantages | Cancer validation | Refs. |
---|---|---|---|---|---|---|
PathSeq | Alignment and de novo assembly | BLAST BLASTN BLASTX MAQ MegaBLAST RepeatMasker Velvet | Input: RNA-seq or DNA-seq; Output: Pathogen presence/absence | Scalable cloud computing; Feasible for known and novel pathogen identification; Two-pass subtraction with increased filtering costs | Cervical cancer (cell line and simulated data); TCGA ovarian | |
SRSA | Alignment and de novo assembly | Velvet MegaBLAST BLAST BWA TopHat | Input: RNA-seq; Output: Species-level taxonomy characterization (prevalence) | Incorporates sample pre-processing, quality filtering, sequence mapping, and assembly; Not freely available; No known updates; Original validation was limited to a cell line | HIV-1 cell line | [60] |
CaPSID | Mixed-method: simultaneous alignment, filtration and de novo assembly | BioPython Bowtie2 Trinity | Input: RNA-seq or DNA-seq; Output: Top-hit pathogen genome identification ranked by maximum gene coverage | Web-based, open-source and scalable application; Modular analyses; Single-pass filtering, which may fail to subtract host reads | Ovarian cancer; TCGA stomach | [67] |
SURPI | Dual scanning mode: known-pathogen identification or de novo assembly | SNAP RAPSearch BWA BLASTN Bowtie2 DUST in PRINSEQ | Input: Paired-end metagenomic; Output: Species-level taxonomic classification and coverage map | Scalable to cloud or standalone servers; Capacity to incorporate reference database; Dual-mode: quantitative and semi-quantitative pathogen identification | Prostate cancer (cell line, tissue biopsies); Colorectal cancer (tissue biopsies) | [71] |
PathoScope 2.0 | Penalized probabilistic identification; Modular filtration, alignment and assignment | SAMtools BLASTX Bowtie2 thetaPrior | Input: Metagenomic or genomic (RNA-seq or DNA-seq); Output: Strain-level pathogen relative abundance | Modular, detailed result reporting; Designed for low-abundance strain-level identification; MySQL server required; No connection to the population structure of relevant species | TCGA stomach | |
VirusScan | Identification of known viruses and integration sites | BWA BLAST MegaBLAST Pindel RepeatMasker PHYLIP | Input: RNA-seq; Output: Viral read abundance and integration sites | Designed for viral identification; Abundance and integration-site analyses | TCGA cancer cohorts | [72] |
MetaShot | Two-step similarity filtering and taxonomic assessment | Bowtie2 TANGO STAR Bash | Input: RNA-seq or DNA-seq; Output: Assigned read report and Krona plot with relative abundance | Extracts unassigned reads; Allows for functional annotations; Slower than other applications | None | [73] |
ConStrains | Marker-based (SNP patterns) strain-level prediction | MetaPhlAn PhyloPhlAn Bowtie2 SAMtools Metropolis-Hastings Monte Carlo | Input: Metagenomics (RNA-seq); Output: Strain-level prediction and relative abundance | Single reference strain collection; Facilitates functional analyses when combined with reference genome-based gene coverage metadata | None | [74] |
RINS | Intersection-based identification and removal | Bowtie BLAST BLAT Trinity | Input: Mate-paired RNA-seq unmapped reads; Output: Pathogen contigs | Requires prior knowledge of reference; Detection limited to user-defined parameters | Prostate cancer (cell line) | [66] |
GRAMMy | Mixture-model Bayesian, Expectation–Maximization and maximum likelihood estimation | BLAST BLAT MAQ Bowtie PerM BLASY | Input: Metagenomics reads; Output: Genomic relative abundance as numerical vectors | User flexibility; Probabilistic handling of ambiguous hits; Computational efficiency | None | [76] |
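
Several frameworks in the table (for example GRAMMy and PathoScope 2.0) handle reads that match more than one reference genome probabilistically rather than discarding them. The following is a minimal toy sketch of that general idea, using Expectation–Maximization over a mixture of genomes. It is not the implementation of any tool listed above: the function name `em_relative_abundance`, the input layout, and the assumption that every candidate hit is equally good are all hypothetical simplifications, and genome-length normalization is omitted.

```python
def em_relative_abundance(hits, n_genomes, n_iter=200, tol=1e-9):
    """Estimate mixing proportions of genomes from ambiguous read hits.

    hits[i] is the list of genome indices that read i aligns to
    (assumed here to be equally good matches; a hypothetical simplification).
    Returns a list of proportions that sums to 1.
    """
    reads = [h for h in hits if h]  # ignore reads with no candidate genome
    if not reads:
        raise ValueError("no mapped reads")
    pi = [1.0 / n_genomes] * n_genomes  # uniform starting proportions
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        # E-step: split each read across its candidates in proportion to pi
        for candidates in reads:
            total = sum(pi[g] for g in candidates)
            for g in candidates:
                counts[g] += pi[g] / total
        # M-step: new proportions are the expected read fractions
        new_pi = [c / len(reads) for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Three reads unique to genome 0, one unique to genome 1, one ambiguous.
example_hits = [[0], [0], [0], [0, 1], [1]]
proportions = em_relative_abundance(example_hits, n_genomes=2)
```

In this toy input the ambiguous read is shared between the two genomes according to the evidence from the uniquely mapped reads, so the estimate settles near a 3:1 split rather than assigning the read arbitrarily.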