Fig. 1 | BMC Bioinformatics

From: SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data

Schematic representation of the SLIDR pipeline (Spliced leader identification from RNA-Seq). Local alignments of reads (grey) to a genomic reference (illustrated by four genes A–D) allow 5′ spliced leader (SL) tails to be soft-clipped and extracted (coloured read portions). Clustering the 3′-aligned read tails from all genes at 100% sequence similarity produces unique consensus SL candidates (cluster centroids). These candidates must then align to the genomic reference to identify candidate SL RNA genes (illustrated by SL1 and SL2 genes). In SL RNA genes, a splice donor site (SD; for example, GT) is expected immediately downstream of the genomic alignment, followed by an Sm binding site (for example, 5′-ATTTTTG-3′) bookended by inverted repeats capable of forming stem loops in the RNA transcript. Conversely, the spliced gene requires a splice acceptor site (SA; for example, AG) immediately upstream of the 5′ read alignment location in the genomic reference. In this illustration, the example SL1 is fully reconstructed from a single read-tail cluster (cluster 1) with GT and AG splice sites in the expected locations (genes A and B). In contrast, the example SL2 shows how read tails may be 3′-truncated because they overlap the splice acceptor site (genes C and D) and, at some genes, the sequence upstream of the trans-splice acceptor site (gene D). These missing nucleotides can be filled in from the trans-splice acceptor site region, guided by the distance between the 3′ tail alignment location and the splice donor site (GT). Note that although cluster 2 is also 5′-truncated owing to insufficient coverage at gene C, consensus calling with cluster 3 allows the full SL2 RNA gene to be reconstructed.
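
The first two steps in the caption, extracting soft-clipped 5′ tails and clustering them with their 3′ ends aligned, can be sketched in Python. This is a minimal illustration under stated assumptions, not SLIDR's own implementation. It assumes each alignment is supplied as SAM-style fields (CIGAR string, SEQ, reverse-strand flag). The minimum tail length of 8 nt is a hypothetical parameter. Clustering at 100% identity with 3′ ends aligned is approximated by grouping each tail with a longer centroid of which it is an exact suffix, that is, the same sequence truncated at its 5′ end. The names `extract_5p_tail` and `cluster_3p_aligned` are hypothetical.

```python
from __future__ import annotations

import re

# One CIGAR operation is a run length followed by an op code, e.g. "22S20M".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Complement table for reverse-complementing DNA (N maps to itself).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_5p_tail(cigar: str, seq: str, is_reverse: bool, min_len: int = 8) -> str | None:
    """Return the soft-clipped 5' tail of one aligned read, or None.

    SAM stores SEQ on the forward genomic strand, so for a reverse-strand
    alignment the read's 5' end is the last CIGAR operation and the clipped
    bases must be reverse-complemented back into read orientation.
    Hard clips (H) are dropped because their bases are absent from SEQ.
    min_len is a hypothetical threshold, not a SLIDR default.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if not ops:
        return None
    n, op = ops[-1] if is_reverse else ops[0]
    if op != "S":
        return None  # no soft clip at the read's 5' end
    tail = revcomp(seq[len(seq) - n:]) if is_reverse else seq[:n]
    return tail if len(tail) >= min_len else None


def cluster_3p_aligned(tails: list[str]) -> dict[str, list[str]]:
    """Group tails at 100% identity with their 3' ends aligned.

    A tail joins a cluster when it is an exact suffix of the centroid,
    i.e. the same sequence truncated at its 5' end. Processing tails
    longest-first means every centroid is seen before its suffixes, so
    each centroid is the longest member of its cluster.
    """
    clusters: dict[str, list[str]] = {}
    for tail in sorted(tails, key=len, reverse=True):
        for centroid, members in clusters.items():
            if centroid.endswith(tail):
                members.append(tail)
                break
        else:
            clusters[tail] = [tail]
    return clusters
```

With alignments read from a BAM file (for example through pysam's `cigarstring`, `query_sequence` and `is_reverse` attributes), one would call `extract_5p_tail` on each record, discard `None` results, and pass the remaining tails to `cluster_3p_aligned`. The pairwise suffix test is quadratic in the number of unique tails, so for real datasets a dedicated clustering tool such as VSEARCH would be the practical choice.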
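
The splice-site checks and the 3′ fill-in described in the caption can also be sketched in Python. This is a simplified illustration, not SLIDR's algorithm. It makes several assumptions. Coordinates are 0-based on the forward strand of pre-extracted sequence windows (`sl_locus` around the candidate SL RNA gene, `gene_locus` around the spliced gene). The Sm binding site is searched only as the literal example motif ATTTTTG from the caption. The inverted-repeat (stem-loop) check is omitted. All function names and the parameters `max_gap` and `max_dist` are hypothetical. The key idea follows the caption: the distance from the tail's 3′ alignment end to the GT splice donor gives the number of missing SL nucleotides. Because those nucleotides also matched the genome at the spliced gene, they sit at the start of the read alignment there, and the true junction must be preceded by AG.

```python
from __future__ import annotations


def find_splice_donor(sl_locus: str, tail_end: int, max_gap: int = 10) -> int | None:
    """Index of the first 'GT' starting within max_gap nt of the tail's 3' end.

    tail_end is the 0-based, exclusive end of the tail's alignment in the
    SL RNA gene locus; the gap to the donor equals the number of SL
    nucleotides missing from a 3'-truncated tail.
    """
    idx = sl_locus.find("GT", tail_end, tail_end + max_gap + 2)
    return None if idx < 0 else idx


def has_sm_site(sl_locus: str, donor: int, max_dist: int = 80,
                motif: str = "ATTTTTG") -> bool:
    """True if the example Sm motif occurs within max_dist nt downstream of the GT."""
    return sl_locus.find(motif, donor + 2, donor + 2 + max_dist) >= 0


def fill_and_check_acceptor(tail: str, gene_locus: str, read_start: int,
                            n_missing: int) -> str | None:
    """Restore a 3'-truncated tail from the acceptor region and require an AG.

    The n_missing SL nucleotides also matched the genome, so the aligner
    extended the read into gene_locus[read_start : read_start + n_missing];
    the true trans-splice junction lies at read_start + n_missing and must
    be immediately preceded by an 'AG' splice acceptor.
    """
    junction = read_start + n_missing
    if junction < 2 or gene_locus[junction - 2:junction] != "AG":
        return None
    return tail + gene_locus[read_start:junction]


def reconstruct_sl(tail: str, sl_locus: str, tail_end: int,
                   gene_locus: str, read_start: int) -> str | None:
    """Accept a candidate only if the SD, Sm site and SA checks all pass."""
    donor = find_splice_donor(sl_locus, tail_end)
    if donor is None or not has_sm_site(sl_locus, donor):
        return None
    return fill_and_check_acceptor(tail, gene_locus, read_start, donor - tail_end)
```

Taking the first GT after the alignment end is a deliberate simplification. A GT occurring inside the missing nucleotides would be misread as the donor, which is one reason a real pipeline would combine evidence across many reads and genes rather than trust a single locus.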
