The features of SEQing are presented by running the tool with sample data from [4] on a local Linux machine. We show common application cases of SEQing: browsing genes and their corresponding genomic windows. The source code and sample data can be downloaded from GitHub or cloned directly into a local directory with git clone https://github.com/malewins/SEQing.git. All dependencies required to run SEQing can be installed by executing pip3 install -r requirements.txt in the SEQing directory. After startup, the application is accessible via a web browser by entering the IP address of the host together with the specified port number (the default is 8060), e.g. https://192.168.0.1:8060. The dashboard can be reached from the same machine or from remote machines inside the network.
Visualization of protein-RNA interactions
To showcase SEQing, we use datasets with binding targets and crosslink sites of the RBP Arabidopsis thaliana glycine-rich RNA-binding protein 7 (AtGRP7) [4]. Because AtGRP7 is controlled by the circadian clock [14, 15], iCLIP was performed on plants harvested at the circadian maximum of AtGRP7 expression, 36 h after transfer of the plants to continuous light (LL36), and at the circadian minimum of AtGRP7 expression, 24 h after transfer of the plants to continuous light (LL24) [4]. The uniquely mapped reads were reduced to the position 1 nucleotide upstream of the read start (the crosslink site) and piled up at each nucleotide position. The resulting value at each position represents the number of crosslinks captured by iCLIP. A screenshot of SEQing displaying crosslink sites and significant crosslink sites from this dataset (GSE99427) is shown in Fig. 2. It shows KIN1 (AT5G15960), one of the target transcripts of AtGRP7. We refer to transcripts with significant crosslink sites as targets of the RBP. Significant crosslink sites in this dataset were determined as described in König et al. [1], with two modifications: the FDR threshold was set to <0.01 instead of <0.05, and crosslink sites had to be present at the same nucleotide in all but one biological replicate [4].
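The reduction of reads to crosslink sites and the replicate requirement described above can be sketched as follows. This is a minimal illustration, not the pipeline used in [4]: it assumes BED-style read coordinates (0-based start, exclusive end), the function names are hypothetical, "all but one biological replicate" is interpreted as "at least n-1 of n replicates", and the FDR-based significance test of König et al. [1] is not implemented.

```python
from collections import Counter


def crosslink_site(start, end, strand):
    """Return the 0-based position one nucleotide upstream of the read start.

    Assumes BED-style coordinates (0-based start, exclusive end).
    """
    if strand == "+":
        # The read start is at `start`; upstream means one position to the left.
        return start - 1
    if strand == "-":
        # The read start is at `end - 1`; upstream on the minus strand is `end`.
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def pile_up(reads):
    """Count crosslinks per (chrom, position, strand) for one replicate."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, crosslink_site(start, end, strand), strand)] += 1
    return counts


def reproducible_sites(replicates):
    """Return sites present in all but (at most) one replicate.

    `replicates` is a list of Counters as returned by pile_up().
    """
    needed = max(1, len(replicates) - 1)
    support = Counter()
    for counts in replicates:
        # Each replicate contributes at most one vote per site.
        support.update(counts.keys())
    return {site for site, n in support.items() if n >= needed}
```

Calling pile_up() once per replicate and passing the resulting counters to reproducible_sites() yields the set of positions that pass the replicate filter; the per-position counts correspond to the crosslink values plotted in the iCLIP tracks.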
The accession ID and strand information of the currently selected gene are shown at the top of the dashboard. Genes of interest can be selected via a dropdown menu, also positioned at the top. The dropdown supports a free-text search over the gene identifiers and descriptions that were supplied at startup (-desc parameter). After a gene is selected, the corresponding gene description is displayed in the Gene Description field, and the matching data tracks are displayed in the plot area below the control panel. The imported datasets can be selected in the adjacent Datasets area, and the display mode of the genomic sequence can be set in the DNA sequence options field. By default, the sequences are displayed in heatmap mode, which can be changed to text mode or disabled (hidden). The legend on the right side of the plots shows the names and colors of each data track and gene isoform. Gene models on the same strand as the selected gene are colored black, whereas gene models on the opposite strand are colored grey. The gene models are displayed in the 5′→3′ direction, i.e. if the selected gene resides on the forward strand (+ in the description at the top), a gene in the opposite direction that overlaps the selected gene will be shown in grey, and vice versa. The gene annotation track displays the annotated gene models, where thin lines represent introns and thick lines represent exons. If the selected gene is a protein-coding gene with annotated untranslated regions, thinner bars represent untranslated regions and thicker bars represent coding regions. The dashboard also provides additional information about the selected gene in the corresponding Details tab (Supplemental Figure 1). This tab is reserved for tables provided with the -adv_descr parameter, containing e.g. complete or extended gene descriptions, synonyms, gene ontologies or known interaction partners of the selected gene. The appearance of the plots can be customized in the Settings tab (Supplemental Figure 2).
As an additional example, we visualize crosslink sites from a public human dataset (GSE99700) using SEQing. This dataset contains, among others, in vitro iCLIP data with crosslink sites of the human splicing factor U2 Auxiliary Factor 2 (U2AF2) in the presence (GSM2650339) or absence (GSM2650359) of Far Upstream Element Binding Protein 1 (FUBP1) [16]. BED files from this dataset were imported into SEQing together with the corresponding gene annotation (downloaded from https://www.ensembl.org/Homo_sapiens/Info/Index). Supplemental Figure 3 depicts a genomic window from one of the U2AF2 targets, Polypyrimidine Tract Binding Protein 2 (PTBP2), similar to Figure 6 from Sutandy et al. [16]. The arrow marks a binding site that is only detected when FUBP1 is present.
Display of coverage tracks and splice events from RNA-seq
Additional information about iCLIP targets can be inferred from functional data, e.g. RNA-seq data from loss-of-function or gain-of-function mutants of the corresponding RBP (Fig. 3). Here, we use RNA-seq data from atgrp7 mutants and plants constitutively overexpressing AtGRP7 (AtGRP7-ox plants) compared to wild-type plants. The samples were again harvested either at LL36 or LL24. The read coverage of each sample is presented as an area graph, combined with annotations below, in this case significant splice events. The splice events shown were determined with SUPPA2 [17] and transformed into BED6 format. The colors represent different SUPPA2 event classes: alternative 3′ splice site (A3), alternative 5′ splice site (A5), and intron retention (RI). We consider a splice event significant if the change in percent spliced-in (PSI) was greater than 0.1 and the p-value of the event was <0.01 when comparing the mutants to the wild type. Comparisons of the sample graphs yield information on genes that are differentially expressed or alternatively spliced in response to reduced or elevated AtGRP7 levels. Crosslink sites in the vicinity of alternative splicing events may hint at a regulation of the splicing event by the RBP. The annotation bars below the coverage have three display modes. In the default setting, bars are plotted in blue below the corresponding coverage track. The second option colors the bars according to the text supplied in the name field of the BED6 file, and the last option offers a score-dependent color gradient to display e.g. a score given as a floating-point number. The gene annotation track at the bottom is consistent with the one in the iCLIP tab.
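The significance filter and the conversion to BED6 described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for Fig. 3: the SpliceEvent record and function names are hypothetical, the events are assumed to be already parsed from the SUPPA2 output, the PSI change is compared by absolute value, and the event class is written to the name field and the PSI change to the score field so that either the name-based or the score-based coloring could be applied.

```python
from dataclasses import dataclass


@dataclass
class SpliceEvent:
    """Hypothetical record for one differential splicing event."""
    chrom: str
    start: int        # 0-based start of the event region
    end: int          # exclusive end of the event region
    event_type: str   # SUPPA2 event class, e.g. "A3", "A5" or "RI"
    strand: str
    dpsi: float       # change in PSI, mutant versus wild type
    pvalue: float


def is_significant(event, min_dpsi=0.1, max_pvalue=0.01):
    """Apply the thresholds from the text: |dPSI| > 0.1 and p-value < 0.01."""
    return abs(event.dpsi) > min_dpsi and event.pvalue < max_pvalue


def to_bed6(event):
    """Format an event as one tab-separated BED6 line.

    Columns: chrom, start, end, name, score, strand.
    """
    fields = [
        event.chrom,
        str(event.start),
        str(event.end),
        event.event_type,
        f"{event.dpsi:.3f}",
        event.strand,
    ]
    return "\t".join(fields)


def significant_bed6(events):
    """Return BED6 lines for all events that pass the significance filter."""
    return [to_bed6(e) for e in events if is_significant(e)]
```

Writing the returned lines to a file, one per line, gives an annotation track in which the name column carries the event class and the score column carries the PSI change.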