Figure 1 | BMC Bioinformatics

Figure 1

From: RISCI - Repeat Induced Sequence Changes Identifier: a comprehensive, comparative genomics-based, in silico subtractive hybridization pipeline to identify repeat induced sequence changes in closely related genomes

RISCI flow chart depicting the basic steps of the pipeline. Major inputs include the repeat name (as identified by RepeatMasker), the repeat mining option, filter inputs, flank length (default 5000 bases), merger threshold (default 50 bases), size of the non-repeat tag (default 500 bases), maximum target site duplication (TSD) size (default 50 bases) and speed options. The genomic annotation of the repeat locus is parsed from the GenBank file, if one is made available. The upstream sequence is tagged with a user-defined length of non-repeat sequence wherever possible (default 500 bases). The upstream and downstream flanks, each carrying a 50-base overhang into the repeat, are BLASTed separately against the comparative genome(s), and the BLAST alignment files are summarized. For each repeat locus in the main genome, all upstream BLAST hits are compared against downstream BLAST hits in the same orientation and checked sequentially for shared ancestry (occupied), post-insertion changes (recombination-mediated deletion [RMD], disruption etc.) and target site duplication (orthologous locus empty), and the locus is classified as CAN, PAC or PTS. If no match for the TSD is found either on the corresponding chromosomal homologue or on other chromosomes, the locus is checked for miscellaneous events (INDELs) such as insertion-mediated deletion, parallel insertion or insertion deletion on the corresponding chromosomal homologue; a locus that fits none of these is reported as no match found (NMF). The final results file reports, for each locus, the RISCI annotation, the genomic annotation of the repeat locus in the main (reference) genome and of the orthologous locus in the comparative genome, the percentage repeat content of the flanks, the BLASTN coordinates (query and subject) for the flanks, and the size of the TSD, INDEL or RMD. For a TSD, the sequences of the 5' and 3' TSDs in the reference genome and of the lone copy of the TSD in the comparative genome are also reported.
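
The flank-extraction step in the caption can be illustrated with a minimal Python sketch. It is not RISCI's own code: the function name extract_flanks, the Flanks container and the 0-based, half-open coordinate convention are hypothetical choices made for illustration. It also assumes that the 50-base overhang is added on top of the flank length, which the caption does not state explicitly. Only the default values (5000-base flanks, 50-base overhang) come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Flanks:
    """Upstream and downstream query sequences for one repeat locus."""
    upstream: str
    downstream: str


def extract_flanks(genome: str, repeat_start: int, repeat_end: int,
                   flank_len: int = 5000, overhang: int = 50) -> Flanks:
    """Cut the two flanks of a repeat, each reaching `overhang` bases into it.

    Coordinates are 0-based and half-open: the repeat occupies
    genome[repeat_start:repeat_end]. This convention is an assumption
    of the sketch, not something taken from RISCI.
    """
    if not 0 <= repeat_start < repeat_end <= len(genome):
        raise ValueError("repeat coordinates fall outside the sequence")

    # The overhang cannot be longer than the repeat itself.
    overhang = min(overhang, repeat_end - repeat_start)

    # Clamp the flanks at the sequence ends ("wherever possible").
    up_start = max(0, repeat_start - flank_len)
    down_end = min(len(genome), repeat_end + flank_len)

    upstream = genome[up_start:repeat_start + overhang]
    downstream = genome[repeat_end - overhang:down_end]
    return Flanks(upstream, downstream)


if __name__ == "__main__":
    # Toy sequence: 10 bases of flank, a 6-base "repeat", 10 bases of flank.
    toy = "A" * 10 + "C" * 6 + "G" * 10
    flanks = extract_flanks(toy, 10, 16, flank_len=4, overhang=2)
    print(flanks.upstream)    # 4 flank bases followed by 2 repeat bases
    print(flanks.downstream)  # 2 repeat bases followed by 4 flank bases
```

Each flank would then be used as a separate BLAST query against the comparative genome. The overhang lets a hit on an occupied orthologous locus extend into the repeat, whereas a hit on an empty locus stops at the insertion site.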
