
Table 2 An example of a hierarchical alignment and assembly protocol specification

From: Applications of the pipeline environment for visual informatics and genomics computations

Alignment and Assembly

   A preprocessing step: Extracting a sub-sequence of the genomic sequence. This step is not required, but it is useful for preliminary tests and protocol validation: restricting the size of the sequences speeds up the computation.

Input: read files output by the Illumina sequencing pipeline (sequence.txt files)

Tool: LONI Sub-Sequence extractor

Server Location: /projects1/idinov/projects/scripts/extract_lines_from_Textfile.sh

Output: Shorter sequence.fastq file
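The LONI extractor is a shell script whose internals are not shown in the table. A minimal Python sketch of the same idea, assuming standard four-line FASTQ records, keeps only the first `n_reads` records so that test runs stay small. The helper name `head_fastq` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator

LINES_PER_RECORD = 4  # FASTQ record: @header, sequence, '+' separator, qualities


def head_fastq(lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Return the lines of the first n_reads FASTQ records (hypothetical helper).

    Cutting on a multiple of four lines keeps every record intact, so the
    truncated file remains valid input for the downstream conversion steps.
    """
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return islice(lines, n_reads * LINES_PER_RECORD)
```

Cutting by whole records rather than by byte count is the design choice that matters here: a truncated quality line would break the later format conversions.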

   Data conversion: Conversion of Solexa FASTQ files to Sanger FASTQ format

Input: read files output by the Illumina sequencing pipeline (sequence.txt files)

Tool: MAQ (sol2sanger option): Mapping and Assembly with Quality

Server Location: /applications/maq

Output: sequence.fastq file
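The arithmetic usually applied in this conversion is the standard relation between Solexa and Phred scores, Q_phred = 10 log10(10^(Q_solexa/10) + 1), combined with a change of ASCII offset from 64 (Solexa) to 33 (Sanger). The Python sketch below applies that mapping per character. It assumes the input uses the original Solexa scale rather than the later Phred+64 Illumina encoding; the function names are hypothetical and this is not MAQ's source code.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset of Solexa-encoded quality strings
SANGER_OFFSET = 33  # ASCII offset of Sanger (Phred+33) quality strings


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa-scaled score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def solexa_to_sanger(qual: str) -> str:
    """Re-encode a Solexa quality string as a Sanger (Phred+33) string."""
    return "".join(
        chr(solexa_to_phred(ord(c) - SOLEXA_OFFSET) + SANGER_OFFSET) for c in qual
    )
```

For high scores the two scales agree (Solexa 40 maps to Phred 40); they diverge only at low qualities, where Solexa scores can be negative.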

   Binary conversion: Conversion of the FASTQ file to binary FASTQ (BFQ) format

Input: sequence.fastq file

Tool: MAQ (fastq2bfq option)

Server Location: /applications/maq

Output: sequence.bfq file
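The table does not describe the layout of MAQ's .bfq files. To show why a binary form is compact, the sketch below packs each base and its quality into a single byte: two bits for the nucleotide and six bits for the Phred score, with a zero byte reserved for N. This layout is an illustrative assumption, and the code should not be used to read real .bfq files; `pack_read` and `unpack_read` are hypothetical names.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def pack_read(seq: str, phred: list[int]) -> bytes:
    """Pack each base and its Phred score into one byte (hypothetical layout).

    Bits 7-6 hold the base (A=0, C=1, G=2, T=3) and bits 5-0 hold the quality.
    A zero byte is reserved for N, so qualities of called bases are clamped to 1..63.
    """
    if len(seq) != len(phred):
        raise ValueError("sequence and quality lengths differ")
    out = bytearray()
    for base, q in zip(seq.upper(), phred):
        code = BASE_CODE.get(base)
        if code is None:
            out.append(0)  # ambiguous base (N)
        else:
            out.append(code << 6 | max(1, min(q, 63)))
    return bytes(out)


def unpack_read(data: bytes) -> tuple[str, list[int]]:
    """Inverse of pack_read: recover the bases and the (clamped) quality scores."""
    seq, qual = [], []
    for byte in data:
        if byte == 0:
            seq.append("N")
            qual.append(0)
        else:
            seq.append(CODE_BASE[byte >> 6])
            qual.append(byte & 0x3F)
    return "".join(seq), qual
```

Storing one byte per base instead of two text characters (base plus quality) halves the read data before any header savings.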

   Reference conversion: Conversion of the reference genome from FASTA format to binary FASTA (BFA) format

Input: reference.fasta file (the reference used for the alignment)

Tool: MAQ (fasta2bfa option)

Server Location: /applications/maq

Output: reference.bfa file
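The internal .bfa layout is likewise not given here. A common way to compact a reference is 2-bit packing, four bases per byte, with ambiguous positions recorded separately. The sketch below assumes that scheme purely for illustration; it is not MAQ's documented format, and `pack_reference` and `unpack_reference` are hypothetical names.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_reference(seq: str) -> tuple[bytes, list[int]]:
    """Pack a reference at 2 bits per base; return the packed bytes and N positions.

    Ambiguous bases are stored as A (code 0) in the packed data, and their
    positions are listed separately so they can be restored on unpacking.
    """
    packed = bytearray((len(seq) + 3) // 4)
    n_positions = []
    for i, base in enumerate(seq.upper()):
        code = CODE.get(base)
        if code is None:
            n_positions.append(i)
            code = 0
        packed[i // 4] |= code << (2 * (3 - i % 4))  # first base in the high bits
    return bytes(packed), n_positions


def unpack_reference(packed: bytes, length: int, n_positions: list[int]) -> str:
    """Rebuild the reference string from its 2-bit packing and N positions."""
    bases = ["ACGT"[(packed[i // 4] >> (2 * (3 - i % 4))) & 3] for i in range(length)]
    for i in n_positions:
        bases[i] = "N"
    return "".join(bases)
```

At four bases per byte, a packed reference takes roughly a quarter of the space of its FASTA text.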

   Sequence alignment: Alignment of data sequence to the reference genome

Using MAQ:

Input: sequence.bfq, reference.bfa

Tool: MAQ (map option)

Server Location: /applications/maq

Output: alignment.map file

Using Bowtie:

Input: reference.fai, sequence.bfq

Tool: Bowtie (map option)

Server Location: /applications/bowtie

Output: alignment.sam file
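Both aligners place each read at the reference position where it matches best. MAQ (seed hashing) and Bowtie (a Burrows-Wheeler index) use indexes to avoid scanning the whole genome. The toy Python sketch below makes the underlying objective concrete with a brute-force, ungapped search on both strands that reports the hit with the fewest mismatches. The function names and the `max_mm` threshold are illustrative assumptions, not either tool's interface.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_hit(read: str, ref: str, max_mm: int = 2) -> Optional[tuple[int, str, int]]:
    """Return (0-based position, strand, mismatches) of the best hit, or None.

    Scans every offset on both strands without gaps; ties keep the first hit found.
    """
    best = None
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for pos in range(len(ref) - len(query) + 1):
            mm = mismatches(query, ref[pos:pos + len(query)])
            if mm <= max_mm and (best is None or mm < best[2]):
                best = (pos, strand, mm)
    return best
```

The cost is proportional to genome length times read length for every read. That cost is why production aligners index the reference or the reads first.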

   Indexing: Indexing the reference genome

Input: reference.fa

Tool: samtools (faidx option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: reference.fai
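A .fai index produced by `samtools faidx` has one tab-separated line per sequence: name, length in bases, byte offset of the first base, bases per line, and bytes per line including the newline. The Python sketch below computes those five columns, assuming a well-formed FASTA with Unix newlines and a constant line length within each sequence (which faidx requires). `build_fai` and `to_fai_text` are hypothetical helpers.

```python
from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str
    length: int      # total bases in the sequence
    offset: int      # byte offset of the first base in the file
    line_bases: int  # bases per sequence line
    line_width: int  # bytes per sequence line, including the newline


def build_fai(fasta: bytes) -> list[FaiEntry]:
    """Compute .fai entries for a well-formed FASTA file held in memory."""
    entries = []
    pos = 0  # running byte offset into the file
    name = None
    length = offset = line_bases = line_width = 0
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if name is not None:
                entries.append(FaiEntry(name, length, offset, line_bases, line_width))
            name = line[1:].split()[0].decode()  # name stops at first whitespace
            length = line_bases = line_width = 0
            offset = pos + len(line)  # sequence starts right after the header
        else:
            bases = len(line.rstrip(b"\r\n"))
            if line_bases == 0:  # first sequence line fixes the line geometry
                line_bases, line_width = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        entries.append(FaiEntry(name, length, offset, line_bases, line_width))
    return entries


def to_fai_text(entries: list[FaiEntry]) -> str:
    return "".join("\t".join(map(str, e)) + "\n" for e in entries)
```

The offset and line-geometry columns let a reader seek directly to any base without scanning the file, which is what makes region extraction from large references fast.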

   Mapping conversion:

MAQ2SAM:

Input: alignment.map file

Tool: samtools (maq2sam-long option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.sam file

SAM to full BAM:

Input: alignment.sam, reference.fai file

Tool: samtools (view -bt option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.bam file
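The `-t` argument to `samtools view` supplies reference names and lengths when the SAM input has no header. The samtools documentation notes that the .fai file from `samtools faidx` can serve this purpose, because only its first two columns are read. The sketch below expresses the same information as SAM `@SQ` header lines. `sq_header_from_fai` is a hypothetical helper for illustration, not part of samtools.

```python
def sq_header_from_fai(fai_text: str) -> str:
    """Build SAM @SQ header lines from .fai text; only columns 1-2 are used."""
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        name, length = row.split("\t")[:2]
        lines.append(f"@SQ\tSN:{name}\tLN:{int(length)}")
    return "\n".join(lines) + "\n" if lines else ""
```

The order of these lines also fixes the order of reference sequences, which the later sorting step follows.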

   Removal of duplicate reads:

Input: alignment.bam file

Tool: samtools (rmdup option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.bam file
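The samtools documentation describes `rmdup` as keeping, among reads with identical external coordinates, only the one with the highest mapping quality. The Python sketch below is a simplified single-end model of that rule. It treats reads with the same reference, leftmost position, and strand as duplicates and keeps the highest-MAPQ read. The `Read` record and `remove_duplicates` function are hypothetical, and samtools' actual handling (for example of paired-end reads) differs in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    ref: str
    pos: int       # leftmost mapping position (SAM POS)
    reverse: bool  # True if aligned to the reverse strand
    mapq: int      # mapping quality


def remove_duplicates(reads: list[Read]) -> list[Read]:
    """Keep one read per (ref, pos, strand), preferring the highest MAPQ.

    Ties keep the earlier read; the surviving reads stay in input order.
    """
    best: dict[tuple[str, int, bool], Read] = {}
    for r in reads:
        key = (r.ref, r.pos, r.reverse)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return [r for r in reads if best[(r.ref, r.pos, r.reverse)] is r]
```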

   Sorting:

Input: alignment.rmdup.bam file

Tool: samtools (sort option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.bam file
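`samtools sort` orders alignments by reference sequence, following the header order, and then by leftmost position; the later indexing step requires this ordering. The Python sketch below reproduces that ordering on simple (reference, position, name) tuples. The reference order is taken from a list such as the first column of the .fai file, and unmapped records ('*') or unknown references go last. The tuple layout and the `coordinate_sort` name are illustrative assumptions.

```python
from typing import Sequence


def coordinate_sort(
    records: list[tuple[str, int, str]], ref_order: Sequence[str]
) -> list[tuple[str, int, str]]:
    """Sort (ref, pos, name) records by reference rank, then position.

    References missing from ref_order (including unmapped '*') sort last;
    Python's sort is stable, so equal keys keep their input order.
    """
    rank = {name: i for i, name in enumerate(ref_order)}

    def key(rec: tuple[str, int, str]) -> tuple[int, int]:
        ref, pos, _ = rec
        return (rank.get(ref, len(rank)), pos)

    return sorted(records, key=key)
```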

   MD tagging:

Input: alignment.rmdup.sorted.bam file and reference REF.fasta file

Tool: samtools (calmd option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.calmd.bam file
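`samtools calmd` recomputes the MD tag from the reference. The SAM specification defines MD as runs of matching bases written as numbers, interleaved with the reference base at each mismatch, always starting and ending with a number. The Python sketch below computes MD for the simplest case, a read aligned without insertions, deletions, or clipping. That restriction is an assumption of the sketch, since calmd also handles gapped CIGARs, and `md_tag_ungapped` is a hypothetical name.

```python
def md_tag_ungapped(read: str, ref: str, pos: int) -> str:
    """Compute the MD string for a read aligned without indels at 0-based pos.

    Assumes the CIGAR is a single match operation (e.g. 8M).
    """
    window = ref[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("alignment runs past the end of the reference")
    parts = []
    run = 0  # length of the current run of matching bases
    for r, g in zip(read.upper(), window.upper()):
        if r == g:
            run += 1
        else:
            parts.append(f"{run}{g}")  # record the reference base at a mismatch
            run = 0
    parts.append(str(run))  # MD always ends with a number, possibly 0
    return "".join(parts)
```

For example, the read ACCTACGA against the reference ACGTACGT gives 2G4T0: two matches, a G in the reference, four matches, a T, and no trailing matches.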

   Indexing:

Input: alignment.rmdup.sorted.calmd.bam file

Tool: samtools (index option)

Server Location: /applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.calmd.bam.bai file
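The .bai file written by `samtools index` groups alignments into hierarchical bins, so that a region query reads only the bins that can overlap the region. The SAM/BAM format specification gives a short C function, `reg2bin`, that assigns an alignment spanning the 0-based half-open interval [beg, end) to the smallest bin containing it. A Python transcription is below. It illustrates the binning scheme only and is not a complete index writer.

```python
def reg2bin(beg: int, end: int) -> int:
    """Smallest BAI bin containing the 0-based half-open interval [beg, end)."""
    end -= 1  # work with the last covered position
    if beg >> 14 == end >> 14:  # fits in one 16 kb bin
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:  # fits in one 128 kb bin
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:  # fits in one 1 Mb bin
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:  # fits in one 8 Mb bin
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:  # fits in one 64 Mb bin
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0  # bin 0 spans the whole 512 Mb range
```

Short reads almost always land in the finest 16 kb bins (numbers 4681 and up), so a query touches only a handful of small bins plus the few coarse bins above them.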

  1. This protocol is implemented as a Pipeline graphical workflow and demonstrated in the Results section; Figure 3 shows the corresponding workflow.
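One way to see the hierarchy of the protocol is as a dependency graph in which each step consumes files produced by earlier steps. The Python sketch below encodes the MAQ branch of the table this way and derives a valid execution order with the standard-library `graphlib` module. It is an illustration only, not how the LONI Pipeline schedules modules. The Bowtie branch is omitted, and the reference file names (reference.fa, REF.fasta) are harmonized to reference.fasta; both are simplifying assumptions.

```python
from graphlib import TopologicalSorter

# step name -> (input files, output files), following the MAQ branch of the table
STEPS = {
    "sol2sanger": (["sequence.txt"], ["sequence.fastq"]),
    "fastq2bfq": (["sequence.fastq"], ["sequence.bfq"]),
    "fasta2bfa": (["reference.fasta"], ["reference.bfa"]),
    "map": (["sequence.bfq", "reference.bfa"], ["alignment.map"]),
    "faidx": (["reference.fasta"], ["reference.fai"]),
    "maq2sam-long": (["alignment.map"], ["alignment.sam"]),
    "view -bt": (["alignment.sam", "reference.fai"], ["alignment.bam"]),
    "rmdup": (["alignment.bam"], ["alignment.rmdup.bam"]),
    "sort": (["alignment.rmdup.bam"], ["alignment.rmdup.sorted.bam"]),
    "calmd": (["alignment.rmdup.sorted.bam", "reference.fasta"],
              ["alignment.rmdup.sorted.calmd.bam"]),
    "index": (["alignment.rmdup.sorted.calmd.bam"],
              ["alignment.rmdup.sorted.calmd.bam.bai"]),
}


def execution_order(steps: dict[str, tuple[list[str], list[str]]]) -> list[str]:
    """Order steps so every step runs after the steps producing its inputs."""
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # Files with no producer (raw reads, the reference) are external inputs.
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Independent branches, such as the read conversions and the reference conversions, have no ordering constraint between them. A workflow engine is free to run them in parallel.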