Fig. 1 | BMC Bioinformatics

Fig. 1

From: ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data

Overview of ViR. ViR comprises four scripts, organized into two modules. Module 1 starts with the ViR_RefineCandidates script, which uses the SAM file of the WGS data alignment to the host genome and a file listing the chimeric reads. a ViR_RefineCandidates filters the viral mates of the chimeric read pairs to identify the best candidate viral reads. In chimeric reads, the host read is shown in gray and the viral read in red. Alignment flags and sequence quality information are extracted for the identified best candidates. The output directory of ViR_RefineCandidates is the input of ViR_SolveDispersion. b ViR_SolveDispersion is designed to identify groups of host reads supporting a potential integration site in equivalent genomic regions (step 1). Read groups are compared pairwise to merge groups sharing a certain percentage of reads (step 2). The remaining read groups support potential viral integrations (step 3). The ViR_AlignToGroup script is called from within ViR_SolveDispersion. c For each identified read group, ViR_AlignToGroup extracts the reads from the SAM file of the reads selected by ViR_RefineCandidates (step 1) and the sequence of the equivalent region of the group in FASTA format (step 2). Then, reads are re-aligned to the sequence of the equivalent region and alignment flags are used to identify the left and right sides of the integration site (step 3). Examples of flags supporting the right (e.g., 73, 133) and left (e.g., 117, 185) ends of the integration site are shown (step 4). All reads are represented by arrows with a standard or non-standard terminal part. The direction of the arrow is 5′–3′. In Module 2, the FASTQ file of the sample WGS and the FASTA file of the viral genome(s) are the input for the ViR_LTFinder script. d ViR_LTFinder aligns WGS raw reads to the viral genome(s) (step 1). Aligned reads are extracted, converted into FASTQ (step 2) and used for local de novo assembly (step 3). Aligned reads include mates in which both reads map to the viral genome (in red), chimeric reads (viral read in red and host read in dashed gray) and mates in which one is a soft-clipped read (viral portion in red, the rest in dashed gray). Mates are indicated by a continuous thin line.
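
The alignment flags cited in panel c (73, 133, 117, 185) are bitwise SAM FLAG values. The following minimal Python sketch decodes them using the bit definitions from the SAM specification. It is an illustration only and is not taken from the ViR source code; the names `SAM_FLAG_BITS` and `decode_flag` are hypothetical.

```python
# Bit values defined by the SAM specification for the FLAG field.
# The short labels on the right are illustrative names, not ViR identifiers.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "read_unmapped",
    0x8: "mate_unmapped",
    0x10: "read_reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # Flags quoted in the caption as examples for the right and left ends.
    for flag in (73, 133, 117, 185):
        print(flag, decode_flag(flag))
```

Under this decoding, 73 and 185 describe a mapped read whose mate is unmapped (forward and reverse strand, respectively), while 133 and 117 describe the corresponding unmapped mate. How ViR_AlignToGroup combines these flags to assign a read to the left or right side is not detailed in the caption.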
