Fig. 3 | BMC Bioinformatics

Fig. 3

From: Unsupervised detection of regulatory gene expression information in different genomic regions enables gene expression ranking

Information concentration (mean ARSI; top) and selection (Z-score; bottom) profiles in various transcript regions for E. coli and S. cerevisiae. The sequences are aligned to the ORF start, 5'SS, 3'SS, and ORF end; because E. coli is a prokaryote, only the ORF alignments were generated for it. a The profiles compare the ARSI score of the actual genomic regions with that of randomized ones, using several sequence randomization models of the actual transcriptome that maintain consensus sequences and control for codon-usage bias (CUB) and GC content in various regions (coding region, intron, and UTR randomizations; see details in Methods and Additional file 1: Figures S3, S4): the randomized codon model includes scrambled exonic sequences; the randomized intron model includes scrambled intronic sequences; the randomized UTR models include scrambled untranslated sequences. b E. coli profiles for the mature mRNA. c S. cerevisiae profiles for the mature mRNA. d S. cerevisiae profiles for the pre-mRNA. The profiles show that more information is found at the ORF start than further downstream in the ORF; the signal is also stronger around the intronic splice sites and downstream of the ORF end. In addition, the selective pressure on the transcript sequence is stronger in these locations, which suggests possible enrichment of regulatory sequence motifs in these regions. The distance from the ORF/5'SS/3'SS is measured relative to the center of the sliding window. The sliding window size is 41 nt; other window sizes showed similar results.
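The sliding-window averaging and the Z-score comparison against randomized sequences described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-position ARSI scores are taken as given input (the ARSI computation itself is not reproduced here), the function names `sliding_window_mean` and `z_score_profile` are hypothetical, and the use of the population standard deviation of the randomized profiles is an assumption.

```python
from statistics import mean, pstdev


def sliding_window_mean(scores, window=41):
    """Mean of `scores` over every full window of length `window`.

    Entry i of the result covers positions i .. i + window - 1, so it is
    centered at position i + window // 2 (for an odd window such as 41).
    """
    if window < 1 or window > len(scores):
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        # Slide by one position: add the entering value, drop the leaving one.
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


def z_score_profile(actual, randomized):
    """Per-position Z-score of the actual profile against randomized profiles.

    `randomized` is a list of profiles, each the same length as `actual`.
    Assumption: the null spread is the population standard deviation.
    """
    z = []
    for i, value in enumerate(actual):
        null = [profile[i] for profile in randomized]
        mu, sigma = mean(null), pstdev(null)
        # An undefined Z-score (no spread in the null) is reported as NaN.
        z.append((value - mu) / sigma if sigma > 0 else float("nan"))
    return z
```

In this sketch, one would smooth the actual and each randomized score track with `sliding_window_mean` first and then pass the smoothed profiles to `z_score_profile`, so that each Z-score refers to a window centered at the reported distance from the alignment point.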
