Skip to main content

REW-ISA: unveiling local functional blocks in epi-transcriptome profiling data via an RNA expression-weighted iterative signature algorithm

Abstract

Background

Recent studies have shown that N6-methyladenosine (m6A) plays a critical role in numbers of biological processes and complex human diseases. However, the regulatory mechanisms of most methylation sites remain uncharted. Thus, in-depth study of the epi-transcriptomic patterns of m6A may provide insights into its complex functional and regulatory mechanisms.

Results

Due to the high economic and time cost of wet experimental methods, revealing methylation patterns through computational models has become a more preferable way, and drawn more and more attention. Considering the theoretical basics and applications of conventional clustering methods, an RNA Expression Weighted Iterative Signature Algorithm (REW-ISA) is proposed to find potential local functional blocks (LFBs) based on MeRIP-Seq data, where sites are hyper-methylated or hypo-methylated simultaneously across the specific conditions. REW-ISA adopts RNA expression levels of each site as weights to make sites of lower expression level less significant. It starts from random sets of sites, then follows iterative search strategies by thresholds of rows and columns to find the LFBs in m6A methylation profile. Its application on MeRIP-Seq data of 69,446 methylation sites under 32 experimental conditions unveiled 6 LFBs, which achieve higher enrichment scores than ISA. Pathway analysis and enzyme specificity test showed that sites remained in LFBs are highly relevant to the m6A methyltransferase, such as METTL3, METTL14, WTAP and KIAA1429. Further detailed analyses for each LFB even showed that some LFBs are condition-specific, indicating that methylation profiles of some specific sites may be condition relevant.

Conclusions

REW-ISA finds potential local functional patterns presented in m6A profiles, where sites are co-methylated under specific conditions.

Background

N6-methyladenosine (m6A), which refers to the methylation of the adenosine bases at the nitrogen-6 position, is the most abundant post-transcriptional modification present in mRNAs and long non-coding RNAs [1]. It has been found to function in various pathways related to mRNA stability [2], DNA damage [3], differentiation [4], circadian clock [5], neurogenesis [6], immunity [7], anti-tumor activity [7], learning and memory [8], sex determination [9], heat shock response [10], etc. With the emergence of high-throughput sequencing technologies, Methylated Immunoprecipitation sequencing (MeRIP-Seq, or m6A-seq) [11, 12], researchers have been able to examine the dynamics and various functions of m6A in human, mouse, yeast, rice and other species [2, 5, 13,14,15].

The m6A methylation has been found to be governed or mediated by relevant enzymes, i.e., writers (METTL3/METTL14/WTAP complex [15, 16], KIAA1429 [17], VIRMA [18], RBM15 [19], ZC3H13 [20], etc.,), erasers (FTO [21], ALKBH5 [22], etc.,) as well as readers (YTH family [2, 10, 23,24,25], IGF2BP1-3 [26], eIF3 [27], etc.,). However, due to the complexity of life, the detailed regulatory circuit of RNA methylate remains uncharted, and it is believed to be more complex than enzyme induced mechanisms.

Till this day, several clustering methods have been proposed to identify co-methylation patterns in MeRIP-Seq data, trying to elucidate the functional mechanisms of m6A methylation. Liu et al. used four different clustering approaches, such as K-means, hierarchical clustering, Bayesian factor regression model as well as nonnegative matrix factorization to unveil the co-methylation patterns [15, 28]. To our knowledge, they revealed the linkage between the global co-methylation patterns embedded in epi-transcriptomic data for the first time. Cui et al. proposed MeTCluster to uncover the potential patterns of m6A methylation. It utilized a hierarchical graphical model to depict the reads counts, suggesting m6A functions could be location specific [29]. We have previously proposed an infinite beta binomial mixture model based on Dirichlet Process (DPBBM) to unveil the co-methylation patterns embedded in MeRIP-Seq data [30]. All the above-mentioned methods focused on clustering methylation sites under all conditions. Current studies have shown that on average 3–5 m6A RNA methylation sites position on each mRNA in human genome [31, 32]. It is conceivable that some specific sites may co-methylate under a subset of experimental conditions. Thus, clustering of sites over all conditions may miss biological meaningful information. On the one hand, sites sharing the same regulatory factor are more likely to co-methylate together; on the other hand, sites residing on the genes that belong to the same pathway may exhibit co-methylation patterns over subsets of experiments. Therefore, we aim to find some local functional blocks (LFBs), where sites are hyper-methylated or hypo-methylated simultaneously across the specific conditions in the same LFB, to unveil the local function patterns in m6A methylation profile.

Biclustering methods have been widely used to identify co-expressed genes under subsets of conditions in large scale microarray data [33,34,35,36,37]. Ihmels proposed an iterative signature algorithm (ISA) [34] to seek for biclusters, where subsets of co-regulated genes and conditions were selected by iterative searching procedure [35]. Murali et al. proposed Xmotifs algorithm, which takes a discretized gene expression matrix as input, to find co-expression patterns, where genes share the same expression level [36]. Prelić et al. proposed a Bimax method, which takes a binarized gene expression matrix as input, to find potential co-expression patterns [37]. The preprocess of discretization of input data results in serious information loss. When profiled by MeRIP-Seq technology, the quantification of RNA methylation level needs to be estimated from two complementary integer measurements indicating site reads count from input and IP samples. Conventionally, m6A methylation level is achieved by simple division operation, which calculates the percentage of site reads in IP sample over the total site reads of input and IP samples. However, it is not always accurate. Even if sites show the same percentage in value, their methylation levels maybe quite different due to their different RNA expression level. To be more specific, if the RNA expression level is very low, there may exist noise, which makes the percentage less confident. Therefore, we proposed herein an RNA Expression Weighted Iterative Signature Algorithm (REW-ISA), which adopted RNA expression level as weight to weaken the confidence sites, then followed an iterative search strategy through rows and columns to seek for LFBs. During the LFB searching strategy, each potential LFB is identified by column threshold (defined as TC) and row threshold (defined as TR). TC and TR are updated automatically according to Standard Deviation within Clusters (SDwC) and Average Similarity within Clusters (ASwC) metrics iteratively. SDwC indicates the closeness of each element in each LFB while ASwC indicates the correlation of each condition pair in each LFB.

REW-ISA was implemented on simulated data as well as real MeRIP-Seq induced m6A methylation level matrix to find potential LFBs. On simulated data, Score of Bi-Clustering (SoBC) metric was followed to evaluate the identification performance of LFBs. On real data, Gene Ontology (GO) analysis and enzyme specificity test were in the next conducted to validate the identified LFBs. As a result, REW-ISA can find LFBs that cover collaboratively hyper-methylated sites under specific conditions.

Results

Performance evaluation

In this study, we applied ISA as well as REW-ISA for simulated data biclustering for performance comparison. As is known, intersection over union (IoU) is a widely used evaluation metric in object detection, which is define as

$${\text{IoU}}\;{ = }\;\frac{{A_{o} }}{{A_{U} }}$$
(1)

where AO represents the intersection between the obtained LFBs and ground truth, while AU indicates the union of the obtained LFBs and ground truth. For example, suppose there are s LFBs embedded in simulated data, and n LFBs are obtained by clustering algorithm. In addition, let \({\varvec{G}} = \{ g_{1} , \ldots , g_{s} \}\) indicate whether there is uncovered LFB matching the s real LFBs respectively. At initialization, all elements in G are 0. So, we can calculate the IoU between each obtained LFB and the real LFBs. To be more specific, the IoU metric for the i-th obtained LFB and real one is achieved, and its maximum value is regarded as the final score of the i-th LFB, which is indicated by IoUid, representing the i-th uncovered LFB matches the d-th real LFB best. Thus, gd = 1. For all the n identified LFBs, the average of IoUid with i = 1, …, n, indicated as IoUmean hereafter can be achieved. Since the number of obtained LFBs may differ from real, IoUmean metric may not be sufficient for performance evaluation. Therefore, SoBC is defined to evaluate the agreement between REW-ISA obtained LFBs and ground truth.

$${\text{SoBC}} = \frac{r}{\max (s,n)}{\text{IoU}}_{mean}$$
(2)

where r indicates the number of ones in G. Thus, r ≤ min(s, n). As SoBC approaches 1, the performance of biclustering is better.

In REW-ISA clustering procedure, TC and TR are key parameters for clustering stringency, and SDwC, ASwC scores are introduced to determine suitable TC and TR. The mean and standard deviation of each LFB are combined in SDwC by (3).

$${\text{SDwC}} = \frac{\sqrt{\sum\limits_{k = 1}^{N}{\frac{1}{{m_{k} \cdot n_{k} }}\;\sum\limits_{i = 1}^{{m_{k} }} {\sum\limits_{j = 1}^{{n_{k} }} {\left( {w_{kij} p_{kij} - \overline{{\varvec{W}_{k} \varvec{P}_{k} }}} \right)^{2}}}}}}{N}$$
(3)

where N indicates the number of algorithm obtained LFBs, Pk is the m6A methylation level of the k-th LFB, Wk is the RNA expression weight for k-th LFB, mk and nk are the number of sites and conditions in the k-th LFB, wkij is the RNA expression level of the i-th site under condition j in the k-th LFB, pkij is the methylation level of the i-th site under condition j in the k-th LFB, \(\overline{{{\varvec{W}}_{{\varvec{k}}} {\varvec{P}}_{{\varvec{k}}} }} = (1/m_{k} n_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {\sum\nolimits_{{{\varvec{j}} = 1}}^{{{\varvec{n}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kij}}}} {\varvec{p}}_{{{\varvec{kij}}}} } }\). Thus, SDwC represents the standard deviation of methylation levels in each LFB.

ASwC is regarded as another concern for TC and TR selection. The pearson correlation between condition a and b in the k-th LFB is first calculated as rkab,

$$r_{kab} \; = \;\;\frac{{\sum\nolimits_{t = 1}^{{m_{k} }} {[(w_{kta} p_{kta} \; - \;\overline{{{\varvec{W}}_{{{\varvec{ka}}}} {\varvec{P}}_{{{\varvec{ka}}}} }} ){\kern 1pt} (w_{ktb} p_{ktb} \; - \;\overline{{{\varvec{W}}_{{{\varvec{kb}}}} {\varvec{P}}_{{{\varvec{kb}}}} }} )]} }}{{\sqrt {\sum\nolimits_{t = 1}^{{m_{k} }} {(w_{kta} p_{kta} \; - \;\overline{{{\varvec{W}}_{{{\varvec{ka}}}} {\varvec{P}}_{{{\varvec{ka}}}} }} )^{2} } } {\kern 1pt} \sqrt {\sum\nolimits_{t = 1}^{{m_{k} }} {(w_{ktb} p_{ktb} \; - \;\overline{{{\varvec{W}}_{{{\varvec{kb}}}} {\varvec{P}}_{{{\varvec{kb}}}} }} )^{2} } } }}$$
(4)

where Wka and Wkb represent the RNA expression level under condition a and b in the k-th LFB, Pka and Pkb represent the RNA methylation level under condition a and b, \(\overline{{{\varvec{W}}_{{{\varvec{ka}}}} {\varvec{P}}_{{{\varvec{ka}}}} }} = (1/m_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kia}}}} {\varvec{p}}_{{{\varvec{kia}}}} }\), \(\overline{{{\varvec{W}}_{{{\varvec{kb}}}} {\varvec{P}}_{{{\varvec{kb}}}} }} = (1/m_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kib}}}} {\varvec{p}}_{{{\varvec{kib}}}} }\). Then, ASwC is defined as

$${\text{ASwC}}\; = \;\frac{{\sqrt {\sum\nolimits_{k = 1}^{N} {\frac{2}{{n_{k} (n_{k} - 1)}}\;\sum\nolimits_{a = 1}^{{n_{k} }} {\sum\nolimits_{b = 1,b \ne a}^{{n_{k} }} {r_{kab} } } } } \;}}{N}$$
(5)

where nk is the number of conditions in the k-th LFB. Thus, ASwC indicates how the involved sites co-methylate between conditions in each LFB.

Our original intention is to better reveal the biological functional mechanisms of the co-methylation modules based on transcriptome data, larger SDwC and smaller ASwC metrics are preferred to get larger LFBs with more implicit information.

Simulated data

For performance evaluation, a simulated RNA methylation dataset of size 1000 × 15 was generated from a mixture of 4 beta-binomial distributions, corresponding to three biclustering blocks and the background (Fig. 1a). The overall distribution characteristics of the simulated data were set similar to that of the real MeRIP-Seq data (Fig. 1b, c) to mimic real scenarios. The matched methylation expression data, which is used as the “weight” in the proposed algorithm, was directly calculated from the simulated RNA methylation dataset as previously described.

Fig. 1
figure1

Comparison of statistical characteristics between simulated data and real data. a Heatmap of simulated data. Rows are corresponding to m6A sites while columns represent conditions. b Histogram of simulated data. c Histogram of real data

As is known, TC and TR are defined for subset selection along columns and rows in conventional ISA. They are also defined in REW-ISA, and play a decisive role in LFB stringency. Since the methylation level matrix is pn, it is intuitive that the conditions should be selected more carefully, thus TC asks for more cautious exercise, and detailed explanations are given in method section.

In REW-ISA, a grid search method was followed for parameter optimization. The range of TR is 0.1–5 with step size 0.1, and the range of TC is 0.05–3 with step size 0.05. Their upper bounds are recommended a large value, then adapted automatically in following procedures. For each {TR, TC} setting, 40 experiments were conducted to ensure the robustness. After repeated experiments, the mode of LFB number was adopted as final number of LFBs for each threshold pair, as shown in Fig. 2.

Fig. 2
figure2

The number of LFBs identified by REW-ISA with TR set from 0.1 to 5 at step size of 0.1, TC set from 0.05 to 3 at step size of 0.05. The color scale indicates the number of LFBs obtained

According to Fig. 2, the range of TR and TC can be shrunk. To be more specific, the thresholds that obtain the largest number of LFBs were first reached, then filter out all the combinations with larger TR or TC. This is because, in the abandoned combinations, REW-ISA can also get the same number of LFBs with smaller TR and TC. However, smaller TR and TC remain more rows and columns in each LFB, which may unveil more useful biological information. Thus, the range of TR becomes 0.1–2.5, while the range of TC becomes 0.05–1.15.

We can see from Fig. 2 that when TR and TC are close to their lower limits, only a few LFBs can be achieved with very large scale, which cannot uncover implicit information for functions. Thus, REW-ISA raises the lower bound of TR and TC appropriately. Suppose the matrix of LFB number obtained under different threshold setting is \({\varvec{L}} \in {\mathbb{R}}^{rn \times cn}\) with TR and TC adjusted, where rn represents the number of TRs and cn represents the number of TCs considered. Min–max normalization of L is performed to obtain \({\varvec{L}}^{{{\varvec{norm}}}} \in {\mathbb{R}}^{rn \times cn}\), and then the variance of each row and column in Lnorm is calculated. The variance of the i-th row in Lnorm is vri (1 ≤ i ≤ rn), and the variance of the j-th column is vcj (1 ≤ j ≤ cn).

$$vr_{i} = \frac{{\sum\nolimits_{j = 1}^{cn} {\left(l_{ij}^{norm} -\frac{1}{cn}\sum\nolimits_{j = 1}^{cn} l_{ij}^{norm} \right)^{2}} }}{cn}$$
(6)
$$vc_{j} = \frac{{\sum\nolimits_{i = 1}^{rn} {\left(l_{ij}^{norm} - \frac{1}{rn}\sum\nolimits_{i = 1}^{rn} l_{ij}^{norm} \right)^{2} } }}{rn}$$
(7)

Furthermore, the mean values of elements in vr and vc are calculated as vrmean and vcmean, respectively. We set \(i^{\prime} = \min \{ i:vr_{i} \ge vr_{mean} \}\) and \(j^{\prime} = \min \{ j:vc_{j} \ge vc_{mean} \}\), and then drop out the first i′ − 1 values of the TR and the first j′ − 1 values of the TC from consideration. Thus, the range of TR further becomes 0.3–2.5, while the range of TC becomes 0.1–1.15.

After shrinking the range of TR and TC, the matrix of LFB numbers under each threshold pair setting is updated to be \(\user2{L^{\prime}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\), where \(rn^{\prime} = rn - i^{\prime} + 1\) and \(cn^{\prime} = cn - j^{\prime} + 1\). Then, within the selected threshold range of TR and TC, a sliding window of size ηr × ηc is used to help find more stable selections of TR and TC. The value of ηr and ηc are selected by Eqs. (8) and (9),

$$\eta_{r} = 2 \times \left\lceil {\frac{0.1}{{step_{r} }}} \right\rceil + 1$$
(8)
$$\eta_{c} = 2 \times \left\lceil {\frac{0.1}{{step_{c} }}} \right\rceil + 1$$
(9)

where stepr represents the variable step size of TR, stepc represents that of TC, and \(\left\lceil \cdot \right\rceil\) is round up to integer operation. The sliding window is obvious to cover odd number of rows and columns, which makes the thresholds value of interest locate in the center of the sliding window. Specifically, for the element l′ij locating in the i-th (1 ≤ i ≤ rn′) row and the j-th (1 ≤ j ≤ cn′) column of L′, the mode of the values covered by sliding window is calculated, then compared to the center value l′ij. If they are equal, the threshold setting is maintained for further consideration, and ls′ij = 1 is recorded in the matrix \({\varvec{LS}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\). Otherwise, ls′ij = 0. It is worth noting that when sliding the elements on the boundary of L′, we only select the effective elements in the sliding window to test the stability.

Through L′ and the stable score matrix LS, the threshold pairs with stable LFB number can be screened out. Let S = L′ × LS, \({\varvec{S}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\), the threshold pairs corresponding to non-zero elements in S are stable threshold combinations. After filtering out 0 in S, the number of obtained LFBs in S are counted to provide frequency fln, where ln represents the number of LFBs, and the result is shown in Table 1.

Table 1 Statistics of the number of LFBs obtained in the simulated data

According to the statistics of the number of stable LFBs, that is, Table 1, we find that the number of LFBs with the highest frequency is 3. Therefore, we can reasonably conclude that there is a total of 3 LFBs in the simulated data.

Furthermore, the threshold pairs in LS are filtered according to the number of obtained LFBs is 3. At the same time, we update the ASwC calculated under effective settings of TR and TC. The ASwC value obtained by effective threshold pair is shown in Fig. 3.

Fig. 3
figure3

ASwC metrics of LFBs identified by REW-ISA, the number of LFBs obtained by each threshold pair is 3. The color scale indicates the value of ASwC

Because ASwC is used to measure the similarity between columns in LFBs, the smaller ASwC is, the more information is contained in each LFB. Since smaller ASwC can help retrieve more biological meaningful information, we calculate the mean of ASwC achieved under effective threshold pairs, and remain the threshold pairs with which ASwC scores are less than the mean score to further shrink the range of TR and TC.

Within the narrowed TR and TC range, the SDwC values of each threshold pair are further compared, and the result is shown in Fig. 4.

Fig. 4
figure4

SDwC metrics of LFBs identified by REW-ISA, the values of TR and TC are their respective shrinking threshold ranges. The color scale indicates the value of SDwC

As shown in Fig. 4, the optimal value of TR and TC, where SDwC gets its maximum indicating the loose and information abundance of each LFB, are 1.2 and 0.35 respectively. Since smaller threshold may find the larger LFB, REW-ISA chooses smaller TR and TC when the maximum value of SDwC is achieved under multiple pairs of TR and TC.

In a word, it is suggested that the upper bound initialization of TR and TC be set to larger values, and REW-ISA can automatically shrink the range. Besides, the step size of TR and TC during grid search has no effect on final result. The REW-ISA thresholds optimization algorithm is in the following.

Algorithm 1: Thresholds optimization process of REW-ISA
Input: Methylation level matrix P, weight matrix W and initialize the range of TR and TC
Output: The optimal TR and TC, and the number of LFBs determined by the above TR and TC
Step1: Run REW-ISA within the initial threshold range, obtain the threshold pair that generates the most LFBs, and then shrink the range of TR and TC
Step2: Within the threshold range after contraction, the LFB number matrix L′, stable score matrix LS, stable LFB number matrix S, compactness SDwC and ASwC are calculated
Step3: Count the frequency of the number of LFBs in the matrix S
Step4: According to the maximum frequency, select the corresponding optimal number of LFBs
Step5: The ASwC value corresponding to each threshold pair is calculated, and the thresholds ranges are further reduced
Step6: Select the optimal TR and TC according to the maximum SDwC within the selected thresholds range
Return The optimal TR and TC, and the number of LFBs determined by the optimal TR and TC

To validate the automatic parameter selection procedure of TR and TC, we investigated the SoBC of ISA and REW-ISA identified LFBs with varying TCs (0.1 to 1.15) and TRs (0.3 to 2.5), as shown in Fig. 5.

Fig. 5
figure5

Heatmap of SoBC with TC varying from 0.05 to 1.5, and TR varying from 0.1 to 2.5. a Heatmap of SoBC obtained by ISA. b Heatmap of SoBC obtained by REW-ISA. The color scale indicates SoBC score

As shown Fig. 5b, in REW-ISA, the optimal value of TR and TC locate between 1.2–1.4 and 0.1–0.65 respectively on simulated data. This is consistent with the beforementioned parameter selection procedure given in Algorithm 1. Besides, we also found that the SoBC metric of REW-ISA can reach around 0.9, while that of ISA is around 0.2, implying that the LFBs uncovered by REW-ISA is more effective than ISA.

Real data

A total of 69,446 human m6A sites identified by six base-resolution mi-CLIP and m6A-CLIP experiments were obtained by WHISTLE project [38,39,40,41,42]. However, mi-CLIP and m6A-CLIP only report the positioning of m6A sites, but do not provide the methylation level of each site. The information of methylation level still comes from MeRIP-Seq data. To be more specific, 32 samples in 10 publicly human m6A MeRIP-Seq data sets were collected [2, 5, 11, 17, 43,44,45,46,47], and most of them can be retrieved from MeTDBV2.0 database [48]. The detailed description of data was given in additional file (see Additional file 1: Table S1).

As is known, MeRIP-Seq data profiles the m6A epi-transcriptome by input and IP data. Thus, we first followed [42, 49] to quantify methylation level of each site. The biological replicates of the same cell line from the same experiment were merged, and the methylation level of the combined samples is essentially the average of all the biological replicates. All the sequencing data were downloaded in SRA format from Gene Expression Omnibus, and the reads were aligned to human reference genome hg19 with Tophat2 (with default settings as read-mismatches = 2, read-gap-length = 2, read-edit-dist = 2, min-anchor = 8, min-intron-length = 50 and max-intron-length = 500,000) for Fragments Per Kilobase of transcript per Million (FPKM) statistics [50].

The methylation level was then quantified by calculating the ratio of fold enrichment of reads in IP sample over the total of IP and input samples. To be more specific, let tij representing FPKM of the i-th site in IP sample under the j-th condition, and hij representing FPKM of the i-th site in input sample under the j-th condition. Let P indicate the methylation level matrix, the methylation level of the i-th site under the j-th condition pij can be calculated following (10).

$$p_{ij} = \frac{{t_{ij} { + }\alpha }}{{t_{ij} + h_{ij} { + }2\alpha }}$$
(10)

where α is a very small value, aiming to avoid NaN where FPKM of both IP and input samples are zeros, and pij resides in (0,1).

We also constructed the weight matrix W corresponding to P,

$$w_{ij} = \log (t_{ij} + h_{ij} + 1)$$
(11)

where 1 was added to ensure wij ≥ 0. With the employment of W, the less confident sites with lower expression level are weakened for further biclustering analysis.

Then, REW-ISA is conducted based on P and W. Within the range of TR being 0.1–5 with step size 0.1, and TC being 0.05–3 with step size 0.05, TR and TC are optimized through grid search method. The experiments were repeated 10 times for each parameter setting.

As shown in Fig. 6, maximum number of LFBs is 14. The upper bounds of TR and TC are 3.5 and 1.35 respectively. Furthermore, the variances of each row and column in the LFB number matrix are calculated according to (6) and (7), and then the mean values of row variances and column variances are calculated respectively. The first elements which is larger than the above mean values are selected from the obtained row and column variance vectors, and the corresponding TR and TC are the new lower bounds of TR and TC. Based on the above process, the lower bounds of TR and TC are set to 1.3 and 0.6. The statistics of the number of LFBs obtained under different TR and TC are shown in Table 2. Thus, the number of LFBs is preferred to be 6.

Fig. 6
figure6

The number of LFBs identified by REW-ISA with TR ranges in 0.1–5 with step size 0.1, TC ranges in 0.05–3 with step size 0.05. The color scale indicates the number of LFBs obtained

Table 2 Statistics of the number of LFBs obtained in the real data

Based on the threshold pairs that achieve 6 LFBs, ASwC scores are presented in Fig. 7.

Fig. 7
figure7

ASwC metrics of LFBs identified by REW-ISA, the number of LFBs obtained by the threshold pairs is 6. The color scale indicates the value of ASwC

Then, the mean of ASwC scores is calculated, and the threshold pairs that get greater ASwC than the mean is further filtered. For the remained threshold pairs, their corresponding SDwC values are calculated, as shown in Fig. 8.

Fig. 8
figure8

SDwC metrics of LFBs identified by REW-ISA, the values of TR and TC are their respective shrinking threshold ranges. The color scale indicates the value of SDwC

Based on Fig. 8, TR and TC are selected to be 1.6 and 1 as optimal, where the largest SDwC appears.

To further explore the biological relevance of the reported six LFBs, we first annotated the Entrez Gene ID and Gene Symbol of genes corresponding to each site in each LFB, then conducted pathway and GO enrichment analysis. Six KEGG pathways known to be regulated by RNA methylation [3, 11, 21] were selected to validate whether a pathway is significantly enriched in a specific LFB using Fisher’s exact test. The output p-value shows the significance of association between the obtained LFBs and the biological pathways with multiple hypothesis corrections.

We could see from Table 3 that obtained LFB1, LFB2 and LFB3 are significantly enriched in fatty acid metabolism. Fatty acids are a substance of the aliphatic group, and the efficacy and function of fatty acids are mainly supplemented for human absorption. Also, studies have found that fatty acids play important roles in regulating metabolism, growth and development and cell differentiation. LFB2 and LFB3 are also enriched in p53 pathway, which consists of a network responding to a variety of intrinsic and extrinsic stress signals that impact upon cellular homeostatic mechanisms, disrupting DNA replication, chromosome segregation and cell division, etc. [51, 52]. As is known, Gamma and UV irradiation could also result in DNA damage [53]. LFB6 is shown to be significantly enriched in both Fatty Acid Metabolism and UV response Down pathways, implying that LFB6 is composed of methylation sites that are relevant to life development process from fatty acid metabolism affected by UV stimulation. For LFB5, it is not enriched to any of the six KEGG pathways, indicating that LFB5 may have other implicit biological significance, and further analysis is carried out in the next.

Table 3 Pathway analysis of REW-ISA obtained LFBs

The GO enrichment analysis was then conducted by clusterProfiler Bioconductor package [54] for each obtained LFB, with p-value cutoff set as 0.05 and q-value cutoff 0.2. For all the GO terms enriched by genes in each LFB, the negative log transform of p-value was employed as their enrichment scores.

$$s_{i} = - \log (p_{i} )$$
(12)

where pi is the p-value of the i-th GO term.

In fact, GO terms with more enriched genes may not show higher enrichment scores. As to LFBs, the proportion of genes that involved in LFB is also an important factor. It is conceivable that we adhere a weight to better describe the contribution of each GO term. The weight of the i-th GO term is defined as mi/M, where mi is the number of genes of the i-th GO term enriched in this LFB and M is the total number of genes in this LFB. Therefore, WE_score is defined as (13).

$${\text{WE}}\_{\text{score}} = \frac{{s_{1} m_{1} /M + s_{2} m_{2} /M + \cdots + {\kern 1pt} s_{l} m_{l} /M}}{{m_{1} /M + m_{2} /M + \cdots + {\kern 1pt} m_{l} /M + m_{non} \;{/}\;M}}$$
(13)

where l is the number of GO terms that enriched in each LFB, mnon is the number of genes covered by LFB but not enriched by any GO term. The higher the WE_score is, the more biologically significant the LFBs are [55]. Three state of the art algorithms, Xmotifs [36], Bimax [37], ISA [35] were conducted for biclustering on real data in comparison to REW-ISA. Besides, subsets with different number of sites and conditions were selected randomly as LFBs from real data. WE_score for all the obtained LFBs by all the algorithms are given in Fig. 9.

Fig. 9
figure9

WE_score of LFBs obtained by random gene groups and four biclustering algorithms

It can be seen from Fig. 9 that the REW-ISA algorithm is effective, and the result is consistent with many related research results [37, 56]. The average WE_score of LFBs inferred by REW-ISA is 13.2% higher than that of ISA, which implied more biological significance of REW-ISA. Besides, LFBs identified by the four algorithms achieve significantly higher WE_score than random one, which also indicates the biological significance.

We further examined whether the identified LFBs show enzyme’s substrate specificity. Since LFB covers hyper-methylated sites and conditions, the sites and conditions involved in each LFB are more likely to be the target sites of m6A methyltransferases. Therefore, we investigated the association between each LFB and four m6A methyltransferases, including METTL3, METTL14, WTAP as well as KIAA1429. For this purpose, 12,643 METTL3 targeted sites, 7689 METTL14 targeted sites, 13,124 WTAP-targeted sites and 399 KIAA1429 targeted RNA methylation sites were first identified by TREW [48], as shown in Table 4.

Table 4 Number of m6A methyltransferase target sites in each LFB

Then, the association between the sites in each LFB and m6A methyltransferases target sites was further evaluated by Fisher’s exact test. The reported p-value indicates the significance of association between sites and methyltransferase target sites. As shown in Table 5, all the four m6A methyltransferases targeted sites in the six obtained LFBs are significantly enriched (FDR < 0.05), which means the LFBs obtained by REW-ISA were indeed the collaboratively hyper-methylated sites under specific conditions.

Table 5 Enzyme specificity analysis of REW-ISA obtained LFBs

In LFB1, LFB2, LFB3 and LFB6, Venn diagrams of the sites, conditions and functional annotations of genes that selected sites involved in each LFB reside on were shown in Fig. 10. As shown in Fig. 10a, it was obvious that this four LFBs contain 12,971 identical methylation sites. From the perspective of conditions, the conditions involved in LFB1 were all covered by LFB3 and LFB6, while LFB3 and LFB6 contain two conditions that were not contained by LFB1, respectively, as shown in Fig. 10b. It is also worth mentioning that for LFB2, all the conditions included in it are from human liver hepatocellular cells (HepG2) cell lines, indicating some LFBs may be condition specific. Although the conditions contained in LFB1 and LFB3 are very similar, they still contain over one hundred unshared functional annotations, which may be due to site differences between them, as shown in Fig. 10c.

Fig. 10
figure10

Venn diagrams for obtained LFBs. a Venn diagrams of sites in LFB1, LFB2, LFB3 and LFB6. b Venn diagrams of conditions in LFB1, LFB2, LFB3 and LFB6. c Venn diagrams of functional annotations of genes that selected sites in LFB1, LFB2, LFB3 and LFB6 reside on

Since LFB2 was found enriched in three KEGG pathways previously, and might be condition specific, LFB2 was compared with LFB4 and LFB5 for further study, and the Venn diagram is shown in Fig. 11.

Fig. 11
figure11

Venn diagrams for obtained LFBs. a Venn diagrams of sites in LFB2, LFB4 and LFB5. b Venn diagrams of conditions in LFB2, LFB4 and LFB5. c Venn diagrams of functional annotations of genes that selected sites in LFB2, LFB4 and LFB5 reside on

It can be seen from Fig. 11 that although LFB2, LFB4 and LFB5 share 2544 sites, they share only one condition, which leads to 203 functional annotations that are not shared at all, indicating that the three LFBs may play different roles in m6A methylation.

We further investigated the functions of LFB2 in detail, as shown in Fig. 12.

Fig. 12
figure12

The functional relationship diagram obtained from the analysis of the genes related to the LFB2 using KEGG pathway. Degree represents the number of genes enriched by KEGG pathway

Some genes that sites in LFB2 reside on are found to be involved in m6A-related pathways, such as Ras protein signal transduction [57], macromolecule methylation [58], peptidyl-lysine modification [59], histone modification [60] and covalent chromatin modification [61], implying LFB2 may further help elaborate the functional mechanisms of m6A methylation. Besides, some pathways, such as response to heat [10, 62], are found to be significantly enriched in LFB2, which is also consistent with previous analysis that LFB2 covers conditions with HepG2 cells that exposed to ultraviolet radiation, heat shock, hepatocyte growth factor (HGF; also known as scatter factor (SF)), and interferon-γ.

Since LFB5 is not enriched in any of the six KEGG pathways in the previous analysis, the functionality of LFB5 is similarly examined in further detail, and the result is shown in Fig. 13.

Fig. 13
figure13

The functional relationship diagram obtained from the analysis of the genes related to the LFB5 using KEGG pathway. Degree represents the number of genes enriched by KEGG pathway

We can see that LFB5 is mainly enriched in functional annotations related to histone and lysine modification, in which the modification form is mainly acetylation modification. M6A modification in RNA has been found to be determined by histone modification [63]. Therefore, the genes contained in LFB5 may help uncover the relationship between histone and lysine modification and m6A methylation.

Discussion

More and more studies have shown that m6A RNA methylation plays an extremely important role in a variety of biological processes. Moreover, the functions of m6A methylation have been revealed by more and more researchers. Through the study of m6A methylation, we could understand the pathogenesis of the disease at post-transcriptional level, which would help us build a more comprehensive understanding of life process such as disease mechanisms. However, unveiling the functional m6A methylation sites through biological experiments is time-consuming and expensive, so it is very necessary to develop some effective computational algorithms to predict potential functional m6A sites. In this paper, we developed an RNA expression weighted ISA method, REW-ISA, to uncover the potential local methylation patterns across subsets of condition. REW-ISA approached 6 LFBs based on MeRIP-Seq data from 10 cell lines under 32 different conditions. Further GO analysis and some specificity tests show that REW-ISA obtained LFBs can find hyper-methylated local functional patterns, which are highly relevant with conditions.

REW-ISA could achieve reliable biclustering patterns because of its adoption of RNA expression level. For the m6A methylation level matrix, the level was drawn based on the ratio between IP and input samples, and there are no additional supplements for RNA expression level. To be more specific, the methylation levels of sites of high expression level should be more confident than those of low expression level since the reads count statistics in low expression sites may come from noise, which makes them unconfident. By incorporation of RNA expression level, sites with very low abundance of reads count will be assigned very low weight, thus, excluded for consideration of biclustering. Of course, REW-ISA still has some deficiencies that needs to be improved in the future. REW-ISA seek for LFBs of hyper-methylated sites under subsets of conditions based on methylation level, which is achieved by simple division operation. However, it may lead to information loss during the division operation. The information carried by both input and IP samples should be more than the methylation level and RNA expression level. In the future, we will develop new computational model to overcome these limitations.

Conclusions

With comparison with conventional ISA method, we believe that our test suggests REW-ISA as a simple but effective tool for local functional pattern recognition tasks. Through the experiments, we also showed that REW-ISA is also feasible for real-world applications with similar issues as local pattern analysis problem in m6A methylation profiles.

Methods

In conventional ISA method, rows and columns of data are standardized first, and subsets of rows and columns are updated iteratively according to their own thresholds. However, in REW-ISA, we propose to import weights to enhance the confidence of methylation level estimation, so the min–max normalization was employed instead of z-score normalization. The methylation level matrix \({\varvec{P}} \in {\mathbb{R}}^{p \times n}\) turns into PR after row min–max normalization, and turns into PC after column min–max normalization. The flowchart of REW-ISA is shown in Fig. 14. In general, REW-ISA consists of two steps. The first step aims to form the methylation level and weight matrix for all sites under all conditions. The second part conducts iteratively selection of subsets of rows and columns for LFBs. With the employment of \({\varvec{W}} \in {\mathbb{R}}^{p \times n}\), the contribution of sites showing similar methylation level may be distinguishable due to their different expression level.

Fig. 14
figure14

The flowchart of REW-ISA consists of two steps: The first step prepares the methylation level matrix P and weight matrix W; the second step iteratively updates subsets for LFBs. The iterative update refers to the iterative selection along columns and rows

For subset selection of columns, the updated subsets are achieved following (14).

$$\left\{ \begin{gathered} e_{{\user2{U^{\prime}}{\kern 1pt} v}}^{C} \; = \;\frac{1}{{|\user2{U^{\prime}}{|}}}\sum\nolimits_{{u\; \in \;\user2{U^{\prime}}}} {\left(w_{{u{\kern 1pt} v}} \; \cdot \;p_{{u{\kern 1pt} v}}^{R} \right)} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} v\; \in \;{\varvec{V}} \hfill \\ \user2{V^{\prime}}\;\; = \;\;\left\{ v\; \in {\varvec{V}}\;\;:\;\;|e_{{\user2{U^{\prime}}{\kern 1pt} v}}^{C} - \frac{1}{{|{\varvec{V}}|}}\sum\nolimits_{{v\; \in \;{\varvec{V}}}} {e_{{\user2{U^{\prime}}{\kern 1pt} v}}^{C} } |\;\; > \;\;\frac{{T_{C} }}{{\sqrt {|\user2{U^{\prime}}} {|}}}\right\} \hfill \\ \end{gathered} \right.$$
(14)

where V is the column set of P, \(p_{{u{\kern 1pt} v}}^{R}\) refers to the u-th site under v-th condition in PR, wuv is the RNA expression level of the u-th site under v-th condition. In Eq. (14), \(e_{{U^{\prime}{\kern 1pt} v}}^{C}\) is calculated based on PC and W for column subset selection, but only the conditions involved in U′ are considered. Higher TC setting will result in less conditions in LFB.

Then, the subsets of rows are updated following (15).

$$\left\{ \begin{gathered} e_{{u{\kern 1pt} \user2{V^{\prime}}}}^{R} \; = \;\frac{1}{{|\user2{V^{\prime}}{|}}}\sum\nolimits_{{v\; \in \;\user2{V^{\prime}}}} {(w_{{u{\kern 1pt} v}} \; \cdot \;p_{{u{\kern 1pt} v}}^{C} )} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} u \in {\varvec{U}} \hfill \\ \user2{U^{\prime}}\;\; = \;\;\left\{ {u\; \in \;{\varvec{U}}\;\;:\;\;|e_{{u{\kern 1pt} \user2{V^{\prime}}}}^{R} - \frac{1}{{|{\varvec{U}}|}}\sum\nolimits_{{u\; \in \;{\varvec{U}}}} {e_{{u{\kern 1pt} \user2{V^{\prime}}}}^{R} } |\;\; > \;\;\frac{{T_{R} }}{{\sqrt {|\user2{V^{\prime}}} {|}}}} \right\} \hfill \\ \end{gathered} \right.$$
(15)

where U is the row set of P, \(p_{{u{\kern 1pt} v}}^{C}\) refers to the u-th site under v-th condition in PC, \(e_{{u{\kern 1pt} V^{\prime}}}^{R}\) is calculated based on PR and W for row subset selection, but only the sites involved in V′ are considered. Higher TR results in less sites in LFB.

Since P is \(p \gg n\), it is intuitive that LFBs will cover more sites than conditions. Thus, |U′| |V′|. The selection of sites needs to be more strict, thus TR is recommended to be larger for less sites inclusion for each LFB. On the contrary, there are not that much conditions in P, thus, TC is recommended to be smaller for loose constrain of conditions in each LFB.

In the parameter selection procedure, it is recommended that the upper bound of TR and TC be set larger values, and the algorithm can automatically shrink the thresholds range. Although the larger upper bound may introduce computation load, it is still acceptable since no LFBs can be achieved under large thresholds setting. The optimization of TR and TC is reached by grid search. With optimized TR and TC, a subset of sites is randomly selected as U′, and then the subset of conditions V′ is selected according to (14). U′ and V′ are updated iteratively by (14) and (15) until convergenece is satisfied.

$$\frac{{|\user2{U^{\prime}} \cap \user2{U^{\prime\prime}}|}}{{|\user2{U^{\prime}} \cup \user2{U^{\prime\prime}}|}} \le \varepsilon$$
(16)

where ε is the default convergence criteria, U′′ represents the subset of sites in previous iteration, and U′ represents the subset of site in current iteration.

The implementation of REW-ISA following the above definition is summarized in the following.

Algorithm 2: REW-ISA biclustering algorithm
Input: Methylation site V, conditions U, methylation level matrix P, weight matrix W and converge threshold ε
Output: A series of LFBs (U′, V′)
Step1: Construct row normalized matrix PR, construct column normalized matrix PC
Step2: Given the pre-defined range for TR and TC, get the automatically optimized parameter settings
Step3: Under the optimized parameters TR and TC, initialize the sites subset U′ and update U′ and V′ iteratively until the convergence condition is met
Step4: Report U′ and V′
Return A series of LFBs (U′, V′)

Availability of data and materials

The code implemented to perform the analysis is deposited at https://github.com/cst-cumt/REW-ISA. The detailed information of real data consists of MeRIP-Seq data from 10 cell lines under 32 kinds of treatments, as listed in Additional file 1: Table S1. For each type of treatment, MeRIP-Seq gets input and IP sequences respectively. Among the 32 conditions, 30 can be retrieved from MeTDBV2.0 at https://180.208.58.19/metdb_v2, and the other two can be retrieved by GEO accession numbers (SRR5080301-SRR50312 and SRR5239086-SRR5239109).

Abbreviations

m6A:

N6-mthyladenosine

REW-ISA:

RNA Expression Weighted Iterative Signature Algorithm

LFB:

Local functional block

MeRIP-Seq:

Methylated RNA immunoprecipitation sequencing

DPBBM:

Dirichlet process based beta binomial mixture model

ISA:

Iterative signature algorithm

SDwC:

Standard deviation within clusters

ASwC:

Average similarity within clusters

SoBC:

Score of bi-clustering

GO:

Gene ontology

IoU:

Intersection over union

FPKM:

Fragments Per Kilobase of transcript per Million

HGF:

Hepatocyte growth factor

HepG2:

Human liver hepatocellular cells

References

  1. 1.

    Long W-L, Guo H, Sheng J, Song R-H, Xu Y. Role of m6A RNA methylation in tumorigenesis and development. Biotechnol Bull. 2019;6:25.

    Google Scholar 

  2. 2.

    Wang X, Lu Z, Gomez A, Hon GC, Yue Y, Han D, Fu Y, Parisien M, Dai Q, Jia G, et al. N6-methyladenosine-dependent regulation of messenger RNA stability. Nature. 2014;505(7481):117–20.

    PubMed  Article  CAS  Google Scholar 

  3. 3.

    Xiang Y, Laurent B, Hsu C-H, Nachtergaele S, Lu Z, Sheng W, Xu C, Chen H, Ouyang J, Wang S, et al. RNA m6A methylation regulates the ultraviolet-induced DNA damage response. Nature. 2017;543(7646):573–6.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  4. 4.

    Shay G, Sharon M-M, Dan D, Abed AlFatah M, Nitzan K, Mali S-D, Vera H. m6A mRNA methylation facilitates resolution of naive pluripotency toward differentiation. Science (New York). 2015;6225(347):1.

    Google Scholar 

  5. 5.

    Fustin J-M, Doi M, Yamaguchi Y, Hida H, Nishimura S, Yoshida M, Isagawa T, Morioka Masaki S, Kakeya H, Manabe I, et al. RNA-methylation-dependent RNA processing controls the speed of the circadian clock. Cell. 2013;155(4):793–806.

    CAS  PubMed  Article  Google Scholar 

  6. 6.

    Yoon K-J, Ringeling FR, Vissers C, Jacob F, Pokrass M, Jimenez-Cyrus D, Su Y, Kim N-S, Zhu Y, Zheng L, et al. Temporal control of mammalian cortical neurogenesis by m6A methylation. Cell. 2017;171(4):877-889.e817.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  7. 7.

    Han D, Liu J, Chen C, Dong L, Liu Y, Chang R, Huang X, Liu Y, Wang J, Dougherty U, et al. Anti-tumour immunity controlled through mRNA m6A methylation and YTHDF1 in dendritic cells. Nature. 2019;566(7743):270–4.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  8. 8.

    Shi H, Zhang X, Weng Y-L, Lu Z, Liu Y, Lu Z, Li J, Hao P, Zhang Y, Zhang F, et al. m6A facilitates hippocampus-dependent learning and memory through YTHDF1. Nature. 2018;563(7730):249–53.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  9. 9.

    Haussmann IU, Bodi Z, Sanchez-Moran E, Mongan NP, Archer N, Fray RG, Soller M. m6A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature. 2016;540(7632):301–4.

    CAS  PubMed  Article  Google Scholar 

  10. 10.

    Zhou J, Wan J, Gao X, Zhang X, Jaffrey SR, Qian S-B. Dynamic m6A mRNA methylation directs translational control of heat shock response. Nature. 2015;526(7574):591–4.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  11. 11.

    Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012;485(7397):201–6.

    CAS  PubMed  Article  Google Scholar 

  12. 12.

    Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. 13.

    Hess ME, Hess S, Meyer KD, Verhagen LA, Koch L, Brönneke HS, Dietrich MO, Jordan SD, Saletore Y, Elemento O. The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry. Nat Neurosci. 2013;16(8):1042–8.

    CAS  PubMed  Article  Google Scholar 

  14. 14.

    Schwartz S, Agarwala SD, Mumbach MR, Jovanovic M, Mertins P, Shishkin A, Tabach Y, Mikkelsen TS, Satija R, Ruvkun G. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell. 2013;155(6):1409–21.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  15. 15.

    Liu J, Yue Y, Han D, Wang X, Fu Y, Zhang L, Jia G, Yu M, Lu Z, Deng X. A METTL3–METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation. Nat Chem Biol. 2014;10(2):93–5.

    CAS  PubMed  Article  Google Scholar 

  16. 16.

    Ping X-L, Sun B-F, Wang L, Xiao W, Yang X, Wang W-J, Adhikari S, Shi Y, Lv Y, Chen Y-S. Mammalian WTAP is a regulatory subunit of the RNA N6-methyladenosine methyltransferase. Cell Res. 2014;24(2):177–89.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  17. 17.

    Schwartz S, Mumbach MR, Jovanovic M, Wang T, Maciag K, Bushkin GG, Mertins P, Ter-Ovanesyan D, Habib N, Cacchiarelli D. Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5′ sites. Cell Rep. 2014;8(1):284–96.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  18. 18.

    Yue Y, Liu J, Cui X, Cao J, Luo G, Zhang Z, Cheng T, Gao M, Shu X, Ma H. VIRMA mediates preferential m6A mRNA methylation in 3′ UTR and near stop codon and associates with alternative polyadenylation. Cell discovery. 2018;4(1):1–17.

    CAS  Article  Google Scholar 

  19. 19.

    Patil DP, Chen C-K, Pickering BF, Chow A, Jackson C, Guttman M, Jaffrey SR. m6A RNA methylation promotes XIST-mediated transcriptional repression. Nature. 2016;537(7620):369–73.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. 20.

    Wen J, Lv R, Ma H, Shen H, He C, Wang J, Jiao F, Liu H, Yang P, Tan L. Zc3h13 regulates nuclear RNA m6A methylation and mouse embryonic stem cell self-renewal. Mol Cell. 2018;69(6):1028–38.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  21. 21.

    Jia G, Fu Y, Zhao X, Dai Q, Zheng G, Yang Y, Yi C, Lindahl T, Pan T, Yang Y-G. N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO. Nat Chem Biol. 2011;7(12):885.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  22. 22.

    Zheng G, Dahl JA, Niu Y, Fedorcsak P, Huang C-M, Li CJ, Vågbø CB, Shi Y, Wang W-L, Song S-H. ALKBH5 is a mammalian RNA demethylase that impacts RNA metabolism and mouse fertility. Mol Cell. 2013;49(1):18–29.

    CAS  PubMed  Article  Google Scholar 

  23. 23.

    Wang X, Zhao BS, Roundtree IA, Lu Z, Han D, Ma H, Weng X, Chen K, Shi H, He C. N6-methyladenosine modulates messenger RNA translation efficiency. Cell. 2015;161(6):1388–99.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  24. 24.

    Xiao W, Adhikari S, Dahal U, Chen Y-S, Hao Y-J, Sun B-F, Sun H-Y, Li A, Ping X-L, Lai W-Y. Nuclear m6A reader YTHDC1 regulates mRNA splicing. Mol Cell. 2016;61(4):507–19.

    CAS  PubMed  Article  Google Scholar 

  25. 25.

    Xu C, Wang X, Liu K, Roundtree IA, Tempel W, Li Y, Lu Z, He C, Min J. Structural basis for selective binding of m6A RNA by the YTHDC1 YTH domain. Nat Chem Biol. 2014;10(11):927–9.

    CAS  PubMed  Article  Google Scholar 

  26. 26.

    Huang H, Weng H, Sun W, Qin X, Shi H, Wu H, Zhao BS, Mesquita A, Liu C, Yuan CL. Recognition of RNA N6-methyladenosine by IGF2BP proteins enhances mRNA stability and translation. Nat Cell Biol. 2018;20(3):285–95.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  27. 27.

    Meyer KD, Patil DP, Zhou J, Zinoviev A, Skabkin MA, Elemento O, Pestova TV, Qian S-B, Jaffrey SR. 5′ UTR m6A promotes cap-independent translation. Cell. 2015;163(4):999–1010.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  28. 28.

    Liu L, Zhang SW, Zhang YC, Liu H, Zhang L, Chen R, Huang Y, Meng J. Decomposition of RNA methylome reveals co-methylation patterns induced by latent enzymatic regulators of the epitranscriptome. Mol Biosyst. 2015;11(1):262–74.

    CAS  PubMed  Article  Google Scholar 

  29. 29.

    Cui X, Meng J, Zhang S, Rao MK, Chen Y, Huang Y. A hierarchical model for clustering m6A methylation peaks in MeRIP-seq data. BMC Genom. 2016;17(7):520.

    Article  CAS  Google Scholar 

  30. 30.

    Zhang L, He Y, Wang H, Liu H, Huang Y, Wang X, Meng J. Clustering count-based RNA methylation data using a nonparametric generative model. Curr Bioinform. 2019;14(1):11–23.

    CAS  Article  Google Scholar 

  31. 31.

    Liu H, Flores MA, Meng J, Zhang L, Zhao X, Rao MK, Chen Y, Huang Y. MeT-DB: a database of transcriptome methylation in mammalian cells. Nucl Acids Res. 2014;43:D197.

    PubMed  Article  CAS  Google Scholar 

  32. 32.

    Sun W, Li J, Liu S, Wu J, Zhou H, Qu L, Yang J. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data. Nucl Acids Res. 2015;44:D259–65.

    PubMed  Article  CAS  Google Scholar 

  33. 33.

    van Dam S, Vosa U, van der Graaf A, Franke L, de Magalhaes JP. Gene co-expression analysis for functional classification and gene–disease predictions. Brief Bioinform. 2018;19(4):575–92.

    PubMed  Google Scholar 

  34. 34.

    Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002;2002:1089–1089.

    Google Scholar 

  35. 35.

    Bergmann S, Ihmels J, Barkai N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E. 2003;67:031902.

    Article  CAS  Google Scholar 

  36. 36.

    Murali T, Kasif S. Extracting conserved gene expression motifs from gene expression data. In: Biocomputing 2003. Singapore: World Scientific; 2002, p. 77–88.

  37. 37.

    Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics. 2006;22(9):1122–9.

    PubMed  Article  Google Scholar 

  38. 38.

    Linder B, Grozhik AV, Olarerin-George AO, Meydan C, Mason CE, Jaffrey SR. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods. 2015;12(8):767–72.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  39. 39.

    Ke S, Alemu EA, Mertens C, Gantman EC, Fak JJ, Mele A, Haripal B, Zucker-Scharff I, Moore MJ, Park CY. A majority of m6A residues are in the last exons, allowing the potential for 3′ UTR regulation. Genes Dev. 2015;29(19):2037–53.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  40. 40.

    Vu LP, Pickering BF, Cheng Y, Zaccara S, Nguyen D, Minuesa G, Chou T, Chow A, Saletore Y, MacKay M. The N6-methyladenosine (m6A)-forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells. Nat Med. 2017;23(11):1369.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  41. 41.

    Ke S, Pandya-Jones A, Saito Y, Fak JJ, Vågbø CB, Geula S, Hanna JH, Black DL, Darnell JE, Darnell RB. m6A mRNA modifications are deposited in nascent pre-mRNA and are not required for splicing but do specify cytoplasmic turnover. Genes Dev. 2017;31(10):990–1006.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  42. 42.

    Chen K, Wei Z, Zhang Q, Wu X, Rong R, Lu Z, Su J, de Magalhaes JP, Rigden DJ, Meng J. WHISTLE: a high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucl Acids Res. 2019;47(7):e41–e41.

    PubMed  Article  CAS  Google Scholar 

  43. 43.

    Pendleton KE, Chen B, Liu K, Hunter OV, Xie Y, Tu BP, Conrad NK. The U6 snRNA m6A methyltransferase METTL16 regulates SAM synthetase intron retention. Cell. 2017;169(5):824–35.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  44. 44.

    Barbieri I, Tzelepis K, Pandolfini L, Shi J, Millán-Zambrano G, Robson SC, Aspris D, Migliori V, Bannister AJ, Han N. Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control. Nature. 2017;552(7683):126–31.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  45. 45.

    Li Z, Weng H, Su R, Weng X, Zuo Z, Li C, Huang H, Nachtergaele S, Dong L, Hu C. FTO plays an oncogenic role in acute myeloid leukemia as a N6-methyladenosine RNA demethylase. Cancer Cell. 2017;31(1):127–41.

    PubMed  Article  CAS  Google Scholar 

  46. 46.

    Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell. 2012;149(7):1635–46.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  47. 47.

    Batista PJ, Molinie B, Wang J, Qu K, Zhang J, Li L, Bouley DM, Lujan E, Haddad B, Daneshvar K. m6A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell. 2014;15(6):707–19.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  48. 48.

    Liu H, Wang H, Wei Z, Zhang S, Hua G, Zhang S-W, Zhang L, Gao S-J, Meng J, Chen X. MeT-DB V20: elucidating context-specific functions of N6-methyl-adenosine methyltranscriptome. Nucl Acids Res. 2018;46(D1):D281–7.

    CAS  PubMed  Article  Google Scholar 

  49. 49.

    Wu X, Wei Z, Chen K, Zhang Q, Su J, Liu H, Zhang L, Meng J. m6Acomet: large-scale functional prediction of individual m6A RNA methylation sites from an RNA co-methylation network. BMC Bioinform. 2019;20(1):223.

    Article  Google Scholar 

  50. 50.

    Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  51. 51.

    Harris SL, Levine AJ. The p53 pathway: positive and negative feedback loops. Oncogene. 2005;24(17):2899–908.

    CAS  PubMed  Article  Google Scholar 

  52. 52.

    Vogelstein B, Fau-Lane D, Lane D, Fau-Levine AJ, Levine AJ. Surfing the p53 network. Nature. 2000;408(6810):307–10.

    CAS  PubMed  Article  Google Scholar 

  53. 53.

    Levine AJ, Hu W, Feng Z. The P53 pathway: what questions remain to be explored? Cell Death Differ. 2006;13(6):1027–36.

    CAS  PubMed  Article  Google Scholar 

  54. 54.

    Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics J Integr Biol. 2012;16(5):284–7.

    CAS  Article  Google Scholar 

  55. 55.

    Li L, Guo Y, Wu W, Shi Y, Cheng J, Tao S. A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data. BioData Min. 2012;5(1):8.

    PubMed  PubMed Central  Article  Google Scholar 

  56. 56.

    Richards AL, Holmans P, O’Donovan MC, Owen MJ, Jones L. A comparison of four clustering methods for brain expression microarray data. BMC Bioinform. 2008;9(1):490.

    Article  CAS  Google Scholar 

  57. 57.

    Yang F, Jin H, Que B, Chao Y, Zhang H, Ying X, Zhou Z, Yuan Z, Su J, Wu B. Dynamic m6A mRNA methylation reveals the role of METTL3-m6A-CDCP1 signaling axis in chemical carcinogenesis. Oncogene. 2019;38(24):4755–72.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  58. 58.

    Steitz TA, Moore PB. RNA, the first macromolecular catalyst: the ribosome is a ribozyme. Trends Biochem Sci. 2003;28(8):411–8.

    CAS  PubMed  Article  Google Scholar 

  59. 59.

    Zhang C, Chen Y, Sun B, Wang L, Yang Y, Ma D, Lv J, Heng J, Ding Y, Xue Y. m6A modulates haematopoietic stem and progenitor cell specification. Nature. 2017;549(7671):273–6.

    CAS  PubMed  Article  Google Scholar 

  60. 60.

    Lin X, Chai G, Wu Y, Li J, Chen F, Liu J, Luo G, Tauler J, Du J, Lin S. RNA m6A methylation regulates the epithelial mesenchymal transition of cancer cells and translation of Snail. Nat Commun. 2019;10(1):1–13.

    Article  CAS  Google Scholar 

  61. 61.

    Liu J, Dou X, Chen C, Chen C, Liu C, Xu MM, Zhao S, Shen B, Gao Y, Han D. N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science. 2020;367(6477):580–6.

    CAS  PubMed  Article  Google Scholar 

  62. 62.

    Yu J, Li Y, Wang T, Zhong X. Modification of N6-methyladenosine RNA methylation on heat shock protein expression. PLoS ONE. 2018;13(6):e0198604.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  63. 63.

    Huang H, Weng H, Zhou K, Wu T, Zhao BS, Sun M, Chen Z, Deng X, Xiao G, Auer F. Histone H3 trimethylation at lysine 36 guides m6A RNA modification co-transcriptionally. Nature. 2019;567(7748):414–9.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

This work has been supported by Fundamental Research Funds for the Central Universities (Research Projects No. 2015XKQY21 to HL), the National Natural Science Foundation of China (Research Projects Nos. 61971422 to LZ, 31871337 to HL). The funding body did not play any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Affiliations

Authors

Contributions

LZ and SC built the architecture for REW-ISA, designed and implemented the experiments, analyzed the result, and wrote the paper. JZ conducted the experiments, analyzed the result, and revised the paper. HL and JM supervised the project, analyzed the result, and revised the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hui Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1. Table S1:

The detailed information of real data. The information of real data. Experiments with light green background were not included in MeTDBV2.0 yet. Experiment ID: The index of experiments; Cell line: The cell line that MeRIP-Seq profiled. Expr_name in MeTDB V2.0: The retrieval information in MeTDBV2.0. If the data was not included in MeTDBV2.0, the GEO accession numbers were provided instead (indicated as light green background). Treatment: The treatment applied for the experiment. Reference: The title of the source reference. Reference ID: Reference number in the article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhang, L., Chen, S., Zhu, J. et al. REW-ISA: unveiling local functional blocks in epi-transcriptome profiling data via an RNA expression-weighted iterative signature algorithm. BMC Bioinformatics 21, 447 (2020). https://doi.org/10.1186/s12859-020-03787-w

Download citation

Keywords

  • m6A methylation
  • Iterative signature algorithm
  • Biclustering