Skip to main content
Fig. 1 | BMC Bioinformatics

Fig. 1

From: SLFinder, a pipeline for the novel identification of splice-leader sequences: a good enough solution for a complex problem

Fig. 1

SLFinder workflow. The pipeline is divided into four scripts (Step0 to 3) to facilitate user intervention. Step0 reduces a Trinity assembly redundancy based on the contigs IDs and extract transcripts ends. Step1 generates Hook sequences for the putative Splice Leaders based on the Kmers present on the filtered sequences generated in Step0, identifies matching sequences in the original assembly and selects the Best Hooks based on their overall sequence orientation and frequency. The Best Hook Hits sequences are then aligned and trimmed in Step2 to generate Hook Variants, and Step 3 localizes them on a Reference genome and classifies hits according to the presence of a potential Donor Site. In detail, longer isoforms are retrieved from the transcriptome (A), both terminals regions are keep (B) and kmers are counted with Jellyfish and selected according to their abundance (C). Selected kmers are assembled into Hooks using Inchworm (D) and then Hooks are blasted against the original unfiltered transcriptome (E) and then filtered according to coi and abundance. The Best Hook matching sequences are retrieved (H). These sequences are then aligned with MAFFT (I) and trimmed automatically with trimAL (J), results are Hook Variants that represent possible versions of the repeated sequence that generated the Hooks (including both real polymorphisms and sequencing errors) (K). Step3 locates locus codding for the Hook Variants in the species genome using BLAST and identifies potential donor sites (M). Finally, putative coding sequences are discarded using a third BLAST search against a protein-coding or protein sequences reference databases (N)

Back to article page