Table 2 An example of a hierarchical alignment and assembly protocol specification

From: Applications of the pipeline environment for visual informatics and genomics computations

Alignment and Assembly
   Preprocessing (optional): Extracting a sub-sequence of the genomic sequence. This step is not required, but it may be useful for preliminary tests and protocol validation, because it restricts the size of the sequences and expedites the computation.
Input: read files output by the Illumina sequencing pipeline (sequence.txt files)
Tool: LONI Sub-Sequence extractor
Server Location:/projects1/idinov/projects/scripts/extract_lines_from_Textfile.sh
Output: Shorter sequence.fastq file
   Data conversion: Conversion of Solexa fastq files to Sanger fastq format
Input: read files output by the Illumina sequencing pipeline (sequence.txt files)
Tool: MAQ (sol2sanger option): Mapping and Assembly with Quality
Server Location:/applications/maq
Output: sequence.fastq file
   Binary conversion: Conversion of the fastq file to a binary fastq file (bfq)
Input: sequence.fastq file
Tool: MAQ (fastq2bfq option)
Server Location:/applications/maq
Output: sequence.bfq file
   Reference conversion: Conversion of the reference genome (fasta format) to binary fasta (bfa)
Input: reference.fasta file (to perform the alignment)
Tool: MAQ (fasta2bfa option)
Server Location:/applications/maq
Output: reference.bfa file
   Sequence alignment: Alignment of data sequence to the reference genome
Using MAQ:
Input: sequence.bfq, reference.bfa
Tool: MAQ (map option)
Server Location:/applications/maq
Output: alignment.map file
Using Bowtie:
Input: reference.fai, sequence.bfq
Tool: Bowtie (map option)
Server Location:/applications/bowtie
Output: alignment.sam file
   Indexing: Indexing the reference genome
Input: reference.fa
Tool: samtools (faidx option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: reference.fai
   Mapping conversion:
MAQ2SAM:
Input: alignment.map file
Tool: samtools (maq2sam-long option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.sam file
SAM to full BAM:
Input: alignment.sam, reference.fai file
Tool: samtools (view -bt option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.bam file
   Removal of duplicated reads:
Input: alignment.bam file
Tool: samtools (rmdup)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.rmdup.bam file
   Sorting:
Input: alignment.rmdup.bam file
Tool: samtools (sort option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.rmdup.sorted.bam file
   MD tagging:
Input: alignment.rmdup.sorted.bam file and reference REF.fasta file
Tool: samtools (calmd option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.rmdup.sorted.calmd.bam file
   Indexing:
Input: alignment.rmdup.sorted.calmd.bam file
Tool: samtools (index option)
Server Location:/applications/samtools-0.1.7_x86_64-linux
Output: alignment.rmdup.sorted.calmd.bam.bai file
  1. This protocol is implemented as a Pipeline graphical workflow, which is shown in Figure 3 and demonstrated in the Results section.
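
As a rough illustration of how the MAQ branch of the table chains its files together, the following Python sketch lists each step as a command line and checks that every input is produced by an earlier step. It is not the Pipeline workflow itself. The names Step, maq_branch, check_order and render are hypothetical, the optional preprocessing step is omitted, and the argument orders follow MAQ and legacy samtools 0.1.x usage as best understood here; maq2sam-long is assumed to be invoked as a standalone helper program. Verify every command against the installed tool versions before running anything.

```python
import shlex
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    name: str
    argv: List[str]               # command line, one element per argument
    inputs: List[str]             # files the step reads
    outputs: List[str]            # files the step writes
    stdout: Optional[str] = None  # file receiving standard output, if any


def maq_branch(reads_txt: str, ref_fasta: str, prefix: str = "alignment") -> List[Step]:
    """Return the MAQ branch of the protocol, in the order given in the table."""
    fastq, bfq, bfa = "sequence.fastq", "sequence.bfq", "reference.bfa"
    # samtools faidx writes the index next to the FASTA file as <name>.fai;
    # the table abbreviates this as reference.fai.
    fai = ref_fasta + ".fai"
    amap, sam, bam = f"{prefix}.map", f"{prefix}.sam", f"{prefix}.bam"
    rmdup = f"{prefix}.rmdup.bam"
    srt = f"{prefix}.rmdup.sorted"  # legacy "samtools sort" takes an output prefix
    calmd = f"{srt}.calmd.bam"
    return [
        Step("Data conversion", ["maq", "sol2sanger", reads_txt, fastq], [reads_txt], [fastq]),
        Step("Binary conversion", ["maq", "fastq2bfq", fastq, bfq], [fastq], [bfq]),
        Step("Reference conversion", ["maq", "fasta2bfa", ref_fasta, bfa], [ref_fasta], [bfa]),
        Step("Sequence alignment", ["maq", "map", amap, bfa, bfq], [bfa, bfq], [amap]),
        Step("Reference indexing", ["samtools", "faidx", ref_fasta], [ref_fasta], [fai]),
        Step("MAQ2SAM", ["maq2sam-long", amap], [amap], [sam], stdout=sam),
        Step("SAM to BAM", ["samtools", "view", "-bt", fai, sam], [sam, fai], [bam], stdout=bam),
        Step("Duplicate removal", ["samtools", "rmdup", bam, rmdup], [bam], [rmdup]),
        Step("Sorting", ["samtools", "sort", rmdup, srt], [rmdup], [srt + ".bam"]),
        Step("MD tagging", ["samtools", "calmd", "-b", srt + ".bam", ref_fasta],
             [srt + ".bam", ref_fasta], [calmd], stdout=calmd),
        Step("Alignment indexing", ["samtools", "index", calmd], [calmd], [calmd + ".bai"]),
    ]


def check_order(steps: List[Step], initial: List[str]) -> None:
    """Raise ValueError if a step reads a file that no earlier step has produced."""
    available = set(initial)
    for step in steps:
        missing = [f for f in step.inputs if f not in available]
        if missing:
            raise ValueError(f"{step.name}: inputs not yet produced: {missing}")
        available.update(step.outputs)


def render(step: Step) -> str:
    """Format a step as a shell-style command line (dry run only, nothing is executed)."""
    cmd = shlex.join(step.argv)
    return f"{cmd} > {shlex.quote(step.stdout)}" if step.stdout else cmd


if __name__ == "__main__":
    plan = maq_branch("sequence.txt", "reference.fasta")
    check_order(plan, ["sequence.txt", "reference.fasta"])
    for s in plan:
        print(f"{s.name}: {render(s)}")
```

The sketch only prints the planned commands. Keeping inputs and outputs explicit for each step mirrors the Input/Output rows of the table and makes it easy to detect a step that is placed before the file it depends on.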