 Research article
 Open access
 Published:
REWISA: unveiling local functional blocks in epitranscriptome profiling data via an RNA expressionweighted iterative signature algorithm
BMC Bioinformatics volume 21, Article number: 447 (2020)
Abstract
Background
Recent studies have shown that N^{6}methyladenosine (m^{6}A) plays a critical role in numbers of biological processes and complex human diseases. However, the regulatory mechanisms of most methylation sites remain uncharted. Thus, indepth study of the epitranscriptomic patterns of m^{6}A may provide insights into its complex functional and regulatory mechanisms.
Results
Due to the high economic and time cost of wet experimental methods, revealing methylation patterns through computational models has become a more preferable way, and drawn more and more attention. Considering the theoretical basics and applications of conventional clustering methods, an RNA Expression Weighted Iterative Signature Algorithm (REWISA) is proposed to find potential local functional blocks (LFBs) based on MeRIPSeq data, where sites are hypermethylated or hypomethylated simultaneously across the specific conditions. REWISA adopts RNA expression levels of each site as weights to make sites of lower expression level less significant. It starts from random sets of sites, then follows iterative search strategies by thresholds of rows and columns to find the LFBs in m^{6}A methylation profile. Its application on MeRIPSeq data of 69,446 methylation sites under 32 experimental conditions unveiled 6 LFBs, which achieve higher enrichment scores than ISA. Pathway analysis and enzyme specificity test showed that sites remained in LFBs are highly relevant to the m^{6}A methyltransferase, such as METTL3, METTL14, WTAP and KIAA1429. Further detailed analyses for each LFB even showed that some LFBs are conditionspecific, indicating that methylation profiles of some specific sites may be condition relevant.
Conclusions
REWISA finds potential local functional patterns presented in m^{6}A profiles, where sites are comethylated under specific conditions.
Background
N^{6}methyladenosine (m^{6}A), which refers to the methylation of the adenosine bases at the nitrogen6 position, is the most abundant posttranscriptional modification present in mRNAs and long noncoding RNAs [1]. It has been found to function in various pathways related to mRNA stability [2], DNA damage [3], differentiation [4], circadian clock [5], neurogenesis [6], immunity [7], antitumor activity [7], learning and memory [8], sex determination [9], heat shock response [10], etc. With the emergence of highthroughput sequencing technologies, Methylated Immunoprecipitation sequencing (MeRIPSeq, or m^{6}Aseq) [11, 12], researchers have been able to examine the dynamics and various functions of m^{6}A in human, mouse, yeast, rice and other species [2, 5, 13,14,15].
The m^{6}A methylation has been found to be governed or mediated by relevant enzymes, i.e., writers (METTL3/METTL14/WTAP complex [15, 16], KIAA1429 [17], VIRMA [18], RBM15 [19], ZC3H13 [20], etc.,), erasers (FTO [21], ALKBH5 [22], etc.,) as well as readers (YTH family [2, 10, 23,24,25], IGF2BP13 [26], eIF3 [27], etc.,). However, due to the complexity of life, the detailed regulatory circuit of RNA methylate remains uncharted, and it is believed to be more complex than enzyme induced mechanisms.
Till this day, several clustering methods have been proposed to identify comethylation patterns in MeRIPSeq data, trying to elucidate the functional mechanisms of m^{6}A methylation. Liu et al. used four different clustering approaches, such as Kmeans, hierarchical clustering, Bayesian factor regression model as well as nonnegative matrix factorization to unveil the comethylation patterns [15, 28]. To our knowledge, they revealed the linkage between the global comethylation patterns embedded in epitranscriptomic data for the first time. Cui et al. proposed MeTCluster to uncover the potential patterns of m^{6}A methylation. It utilized a hierarchical graphical model to depict the reads counts, suggesting m^{6}A functions could be location specific [29]. We have previously proposed an infinite beta binomial mixture model based on Dirichlet Process (DPBBM) to unveil the comethylation patterns embedded in MeRIPSeq data [30]. All the abovementioned methods focused on clustering methylation sites under all conditions. Current studies have shown that on average 3–5 m^{6}A RNA methylation sites position on each mRNA in human genome [31, 32]. It is conceivable that some specific sites may comethylate under a subset of experimental conditions. Thus, clustering of sites over all conditions may miss biological meaningful information. On the one hand, sites sharing the same regulatory factor are more likely to comethylate together; on the other hand, sites residing on the genes that belong to the same pathway may exhibit comethylation patterns over subsets of experiments. Therefore, we aim to find some local functional blocks (LFBs), where sites are hypermethylated or hypomethylated simultaneously across the specific conditions in the same LFB, to unveil the local function patterns in m^{6}A methylation profile.
Biclustering methods have been widely used to identify coexpressed genes under subsets of conditions in large scale microarray data [33,34,35,36,37]. Ihmels proposed an iterative signature algorithm (ISA) [34] to seek for biclusters, where subsets of coregulated genes and conditions were selected by iterative searching procedure [35]. Murali et al. proposed Xmotifs algorithm, which takes a discretized gene expression matrix as input, to find coexpression patterns, where genes share the same expression level [36]. Prelić et al. proposed a Bimax method, which takes a binarized gene expression matrix as input, to find potential coexpression patterns [37]. The preprocess of discretization of input data results in serious information loss. When profiled by MeRIPSeq technology, the quantification of RNA methylation level needs to be estimated from two complementary integer measurements indicating site reads count from input and IP samples. Conventionally, m^{6}A methylation level is achieved by simple division operation, which calculates the percentage of site reads in IP sample over the total site reads of input and IP samples. However, it is not always accurate. Even if sites show the same percentage in value, their methylation levels maybe quite different due to their different RNA expression level. To be more specific, if the RNA expression level is very low, there may exist noise, which makes the percentage less confident. Therefore, we proposed herein an RNA Expression Weighted Iterative Signature Algorithm (REWISA), which adopted RNA expression level as weight to weaken the confidence sites, then followed an iterative search strategy through rows and columns to seek for LFBs. During the LFB searching strategy, each potential LFB is identified by column threshold (defined as T_{C}) and row threshold (defined as T_{R}). T_{C} and T_{R} are updated automatically according to Standard Deviation within Clusters (SDwC) and Average Similarity within Clusters (ASwC) metrics iteratively. SDwC indicates the closeness of each element in each LFB while ASwC indicates the correlation of each condition pair in each LFB.
REWISA was implemented on simulated data as well as real MeRIPSeq induced m^{6}A methylation level matrix to find potential LFBs. On simulated data, Score of BiClustering (SoBC) metric was followed to evaluate the identification performance of LFBs. On real data, Gene Ontology (GO) analysis and enzyme specificity test were in the next conducted to validate the identified LFBs. As a result, REWISA can find LFBs that cover collaboratively hypermethylated sites under specific conditions.
Results
Performance evaluation
In this study, we applied ISA as well as REWISA for simulated data biclustering for performance comparison. As is known, intersection over union (IoU) is a widely used evaluation metric in object detection, which is define as
where A_{O} represents the intersection between the obtained LFBs and ground truth, while A_{U} indicates the union of the obtained LFBs and ground truth. For example, suppose there are s LFBs embedded in simulated data, and n LFBs are obtained by clustering algorithm. In addition, let \({\varvec{G}} = \{ g_{1} , \ldots , g_{s} \}\) indicate whether there is uncovered LFB matching the s real LFBs respectively. At initialization, all elements in G are 0. So, we can calculate the IoU between each obtained LFB and the real LFBs. To be more specific, the IoU metric for the ith obtained LFB and real one is achieved, and its maximum value is regarded as the final score of the ith LFB, which is indicated by IoU_{id}, representing the ith uncovered LFB matches the dth real LFB best. Thus, g_{d} = 1. For all the n identified LFBs, the average of IoU_{id} with i = 1, …, n, indicated as IoU_{mean} hereafter can be achieved. Since the number of obtained LFBs may differ from real, IoU_{mean} metric may not be sufficient for performance evaluation. Therefore, SoBC is defined to evaluate the agreement between REWISA obtained LFBs and ground truth.
where r indicates the number of ones in G. Thus, r ≤ min(s, n). As SoBC approaches 1, the performance of biclustering is better.
In REWISA clustering procedure, T_{C} and T_{R} are key parameters for clustering stringency, and SDwC, ASwC scores are introduced to determine suitable T_{C} and T_{R}. The mean and standard deviation of each LFB are combined in SDwC by (3).
where N indicates the number of algorithm obtained LFBs, P_{k} is the m^{6}A methylation level of the kth LFB, W_{k} is the RNA expression weight for kth LFB, m_{k} and n_{k} are the number of sites and conditions in the kth LFB, w_{kij} is the RNA expression level of the ith site under condition j in the kth LFB, p_{kij} is the methylation level of the ith site under condition j in the kth LFB, \(\overline{{{\varvec{W}}_{{\varvec{k}}} {\varvec{P}}_{{\varvec{k}}} }} = (1/m_{k} n_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {\sum\nolimits_{{{\varvec{j}} = 1}}^{{{\varvec{n}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kij}}}} {\varvec{p}}_{{{\varvec{kij}}}} } }\). Thus, SDwC represents the standard deviation of methylation levels in each LFB.
ASwC is regarded as another concern for T_{C} and T_{R} selection. The pearson correlation between condition a and b in the kth LFB is first calculated as r_{kab},
where W_{ka} and W_{kb} represent the RNA expression level under condition a and b in the kth LFB, P_{ka} and P_{kb} represent the RNA methylation level under condition a and b, \(\overline{{{\varvec{W}}_{{{\varvec{ka}}}} {\varvec{P}}_{{{\varvec{ka}}}} }} = (1/m_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kia}}}} {\varvec{p}}_{{{\varvec{kia}}}} }\), \(\overline{{{\varvec{W}}_{{{\varvec{kb}}}} {\varvec{P}}_{{{\varvec{kb}}}} }} = (1/m_{k} )\sum\nolimits_{i = 1}^{{{\varvec{m}}_{{\varvec{k}}} }} {{\varvec{w}}_{{{\varvec{kib}}}} {\varvec{p}}_{{{\varvec{kib}}}} }\). Then, ASwC is defined as
where n_{k} is the number of conditions in the kth LFB. Thus, ASwC indicates how the involved sites comethylate between conditions in each LFB.
Our original intention is to better reveal the biological functional mechanisms of the comethylation modules based on transcriptome data, larger SDwC and smaller ASwC metrics are preferred to get larger LFBs with more implicit information.
Simulated data
For performance evaluation, a simulated RNA methylation dataset of size 1000 × 15 was generated from a mixture of 4 betabinomial distributions, corresponding to three biclustering blocks and the background (Fig. 1a). The overall distribution characteristics of the simulated data were set similar to that of the real MeRIPSeq data (Fig. 1b, c) to mimic real scenarios. The matched methylation expression data, which is used as the “weight” in the proposed algorithm, was directly calculated from the simulated RNA methylation dataset as previously described.
As is known, T_{C} and T_{R} are defined for subset selection along columns and rows in conventional ISA. They are also defined in REWISA, and play a decisive role in LFB stringency. Since the methylation level matrix is p ≫ n, it is intuitive that the conditions should be selected more carefully, thus T_{C} asks for more cautious exercise, and detailed explanations are given in method section.
In REWISA, a grid search method was followed for parameter optimization. The range of T_{R} is 0.1–5 with step size 0.1, and the range of T_{C} is 0.05–3 with step size 0.05. Their upper bounds are recommended a large value, then adapted automatically in following procedures. For each {T_{R}, T_{C}} setting, 40 experiments were conducted to ensure the robustness. After repeated experiments, the mode of LFB number was adopted as final number of LFBs for each threshold pair, as shown in Fig. 2.
According to Fig. 2, the range of T_{R} and T_{C} can be shrunk. To be more specific, the thresholds that obtain the largest number of LFBs were first reached, then filter out all the combinations with larger T_{R} or T_{C}. This is because, in the abandoned combinations, REWISA can also get the same number of LFBs with smaller T_{R} and T_{C}. However, smaller T_{R} and T_{C} remain more rows and columns in each LFB, which may unveil more useful biological information. Thus, the range of T_{R} becomes 0.1–2.5, while the range of T_{C} becomes 0.05–1.15.
We can see from Fig. 2 that when T_{R} and T_{C} are close to their lower limits, only a few LFBs can be achieved with very large scale, which cannot uncover implicit information for functions. Thus, REWISA raises the lower bound of T_{R} and T_{C} appropriately. Suppose the matrix of LFB number obtained under different threshold setting is \({\varvec{L}} \in {\mathbb{R}}^{rn \times cn}\) with T_{R} and T_{C} adjusted, where rn represents the number of T_{R}s and cn represents the number of T_{C}s considered. Min–max normalization of L is performed to obtain \({\varvec{L}}^{{{\varvec{norm}}}} \in {\mathbb{R}}^{rn \times cn}\), and then the variance of each row and column in L^{norm} is calculated. The variance of the ith row in L^{norm} is vr_{i} (1 ≤ i ≤ rn), and the variance of the jth column is vc_{j} (1 ≤ j ≤ cn).
Furthermore, the mean values of elements in vr and vc are calculated as vr_{mean} and vc_{mean}, respectively. We set \(i^{\prime} = \min \{ i:vr_{i} \ge vr_{mean} \}\) and \(j^{\prime} = \min \{ j:vc_{j} \ge vc_{mean} \}\), and then drop out the first i′ − 1 values of the T_{R} and the first j′ − 1 values of the T_{C} from consideration. Thus, the range of T_{R} further becomes 0.3–2.5, while the range of T_{C} becomes 0.1–1.15.
After shrinking the range of T_{R} and T_{C}, the matrix of LFB numbers under each threshold pair setting is updated to be \(\user2{L^{\prime}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\), where \(rn^{\prime} = rn  i^{\prime} + 1\) and \(cn^{\prime} = cn  j^{\prime} + 1\). Then, within the selected threshold range of T_{R} and T_{C}, a sliding window of size η_{r} × η_{c} is used to help find more stable selections of T_{R} and T_{C}. The value of η_{r} and η_{c} are selected by Eqs. (8) and (9),
where step_{r} represents the variable step size of T_{R}, step_{c} represents that of T_{C}, and \(\left\lceil \cdot \right\rceil\) is round up to integer operation. The sliding window is obvious to cover odd number of rows and columns, which makes the thresholds value of interest locate in the center of the sliding window. Specifically, for the element l′_{ij} locating in the ith (1 ≤ i ≤ rn′) row and the jth (1 ≤ j ≤ cn′) column of L′, the mode of the values covered by sliding window is calculated, then compared to the center value l′_{ij}. If they are equal, the threshold setting is maintained for further consideration, and ls′_{ij} = 1 is recorded in the matrix \({\varvec{LS}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\). Otherwise, ls′_{ij} = 0. It is worth noting that when sliding the elements on the boundary of L′, we only select the effective elements in the sliding window to test the stability.
Through L′ and the stable score matrix LS, the threshold pairs with stable LFB number can be screened out. Let S = L′ × LS, \({\varvec{S}} \in {\mathbb{R}}^{rn^{\prime} \times cn^{\prime}}\), the threshold pairs corresponding to nonzero elements in S are stable threshold combinations. After filtering out 0 in S, the number of obtained LFBs in S are counted to provide frequency f_{ln}, where ln represents the number of LFBs, and the result is shown in Table 1.
According to the statistics of the number of stable LFBs, that is, Table 1, we find that the number of LFBs with the highest frequency is 3. Therefore, we can reasonably conclude that there is a total of 3 LFBs in the simulated data.
Furthermore, the threshold pairs in LS are filtered according to the number of obtained LFBs is 3. At the same time, we update the ASwC calculated under effective settings of T_{R} and T_{C}. The ASwC value obtained by effective threshold pair is shown in Fig. 3.
Because ASwC is used to measure the similarity between columns in LFBs, the smaller ASwC is, the more information is contained in each LFB. Since smaller ASwC can help retrieve more biological meaningful information, we calculate the mean of ASwC achieved under effective threshold pairs, and remain the threshold pairs with which ASwC scores are less than the mean score to further shrink the range of T_{R} and T_{C}.
Within the narrowed T_{R} and T_{C} range, the SDwC values of each threshold pair are further compared, and the result is shown in Fig. 4.
As shown in Fig. 4, the optimal value of T_{R} and T_{C}, where SDwC gets its maximum indicating the loose and information abundance of each LFB, are 1.2 and 0.35 respectively. Since smaller threshold may find the larger LFB, REWISA chooses smaller T_{R} and T_{C} when the maximum value of SDwC is achieved under multiple pairs of T_{R} and T_{C}.
In a word, it is suggested that the upper bound initialization of T_{R} and T_{C} be set to larger values, and REWISA can automatically shrink the range. Besides, the step size of T_{R} and T_{C} during grid search has no effect on final result. The REWISA thresholds optimization algorithm is in the following.
Algorithm 1: Thresholds optimization process of REWISA 

Input: Methylation level matrix P, weight matrix W and initialize the range of T_{R} and T_{C} 
Output: The optimal T_{R} and T_{C}, and the number of LFBs determined by the above T_{R} and T_{C} 
Step1: Run REWISA within the initial threshold range, obtain the threshold pair that generates the most LFBs, and then shrink the range of T_{R} and T_{C} 
Step2: Within the threshold range after contraction, the LFB number matrix L′, stable score matrix LS, stable LFB number matrix S, compactness SDwC and ASwC are calculated 
Step3: Count the frequency of the number of LFBs in the matrix S 
Step4: According to the maximum frequency, select the corresponding optimal number of LFBs 
Step5: The ASwC value corresponding to each threshold pair is calculated, and the thresholds ranges are further reduced 
Step6: Select the optimal T_{R} and T_{C} according to the maximum SDwC within the selected thresholds range 
Return The optimal T_{R} and T_{C}, and the number of LFBs determined by the optimal T_{R} and T_{C} 
To validate the automatic parameter selection procedure of T_{R} and T_{C}, we investigated the SoBC of ISA and REWISA identified LFBs with varying T_{C}s (0.1 to 1.15) and T_{R}s (0.3 to 2.5), as shown in Fig. 5.
As shown Fig. 5b, in REWISA, the optimal value of T_{R} and T_{C} locate between 1.2–1.4 and 0.1–0.65 respectively on simulated data. This is consistent with the beforementioned parameter selection procedure given in Algorithm 1. Besides, we also found that the SoBC metric of REWISA can reach around 0.9, while that of ISA is around 0.2, implying that the LFBs uncovered by REWISA is more effective than ISA.
Real data
A total of 69,446 human m^{6}A sites identified by six baseresolution miCLIP and m^{6}ACLIP experiments were obtained by WHISTLE project [38,39,40,41,42]. However, miCLIP and m^{6}ACLIP only report the positioning of m^{6}A sites, but do not provide the methylation level of each site. The information of methylation level still comes from MeRIPSeq data. To be more specific, 32 samples in 10 publicly human m^{6}A MeRIPSeq data sets were collected [2, 5, 11, 17, 43,44,45,46,47], and most of them can be retrieved from MeTDBV2.0 database [48]. The detailed description of data was given in additional file (see Additional file 1: Table S1).
As is known, MeRIPSeq data profiles the m^{6}A epitranscriptome by input and IP data. Thus, we first followed [42, 49] to quantify methylation level of each site. The biological replicates of the same cell line from the same experiment were merged, and the methylation level of the combined samples is essentially the average of all the biological replicates. All the sequencing data were downloaded in SRA format from Gene Expression Omnibus, and the reads were aligned to human reference genome hg19 with Tophat2 (with default settings as readmismatches = 2, readgaplength = 2, readeditdist = 2, minanchor = 8, minintronlength = 50 and maxintronlength = 500,000) for Fragments Per Kilobase of transcript per Million (FPKM) statistics [50].
The methylation level was then quantified by calculating the ratio of fold enrichment of reads in IP sample over the total of IP and input samples. To be more specific, let t_{ij} representing FPKM of the ith site in IP sample under the jth condition, and h_{ij} representing FPKM of the ith site in input sample under the jth condition. Let P indicate the methylation level matrix, the methylation level of the ith site under the jth condition p_{ij} can be calculated following (10).
where α is a very small value, aiming to avoid NaN where FPKM of both IP and input samples are zeros, and p_{ij} resides in (0,1).
We also constructed the weight matrix W corresponding to P,
where 1 was added to ensure w_{ij} ≥ 0. With the employment of W, the less confident sites with lower expression level are weakened for further biclustering analysis.
Then, REWISA is conducted based on P and W. Within the range of T_{R} being 0.1–5 with step size 0.1, and T_{C} being 0.05–3 with step size 0.05, T_{R} and T_{C} are optimized through grid search method. The experiments were repeated 10 times for each parameter setting.
As shown in Fig. 6, maximum number of LFBs is 14. The upper bounds of T_{R} and T_{C} are 3.5 and 1.35 respectively. Furthermore, the variances of each row and column in the LFB number matrix are calculated according to (6) and (7), and then the mean values of row variances and column variances are calculated respectively. The first elements which is larger than the above mean values are selected from the obtained row and column variance vectors, and the corresponding T_{R} and T_{C} are the new lower bounds of T_{R} and T_{C}. Based on the above process, the lower bounds of T_{R} and T_{C} are set to 1.3 and 0.6. The statistics of the number of LFBs obtained under different T_{R} and T_{C} are shown in Table 2. Thus, the number of LFBs is preferred to be 6.
Based on the threshold pairs that achieve 6 LFBs, ASwC scores are presented in Fig. 7.
Then, the mean of ASwC scores is calculated, and the threshold pairs that get greater ASwC than the mean is further filtered. For the remained threshold pairs, their corresponding SDwC values are calculated, as shown in Fig. 8.
Based on Fig. 8, T_{R} and T_{C} are selected to be 1.6 and 1 as optimal, where the largest SDwC appears.
To further explore the biological relevance of the reported six LFBs, we first annotated the Entrez Gene ID and Gene Symbol of genes corresponding to each site in each LFB, then conducted pathway and GO enrichment analysis. Six KEGG pathways known to be regulated by RNA methylation [3, 11, 21] were selected to validate whether a pathway is significantly enriched in a specific LFB using Fisher’s exact test. The output pvalue shows the significance of association between the obtained LFBs and the biological pathways with multiple hypothesis corrections.
We could see from Table 3 that obtained LFB1, LFB2 and LFB3 are significantly enriched in fatty acid metabolism. Fatty acids are a substance of the aliphatic group, and the efficacy and function of fatty acids are mainly supplemented for human absorption. Also, studies have found that fatty acids play important roles in regulating metabolism, growth and development and cell differentiation. LFB2 and LFB3 are also enriched in p53 pathway, which consists of a network responding to a variety of intrinsic and extrinsic stress signals that impact upon cellular homeostatic mechanisms, disrupting DNA replication, chromosome segregation and cell division, etc. [51, 52]. As is known, Gamma and UV irradiation could also result in DNA damage [53]. LFB6 is shown to be significantly enriched in both Fatty Acid Metabolism and UV response Down pathways, implying that LFB6 is composed of methylation sites that are relevant to life development process from fatty acid metabolism affected by UV stimulation. For LFB5, it is not enriched to any of the six KEGG pathways, indicating that LFB5 may have other implicit biological significance, and further analysis is carried out in the next.
The GO enrichment analysis was then conducted by clusterProfiler Bioconductor package [54] for each obtained LFB, with pvalue cutoff set as 0.05 and qvalue cutoff 0.2. For all the GO terms enriched by genes in each LFB, the negative log transform of pvalue was employed as their enrichment scores.
where p_{i} is the pvalue of the ith GO term.
In fact, GO terms with more enriched genes may not show higher enrichment scores. As to LFBs, the proportion of genes that involved in LFB is also an important factor. It is conceivable that we adhere a weight to better describe the contribution of each GO term. The weight of the ith GO term is defined as m_{i}/M, where m_{i} is the number of genes of the ith GO term enriched in this LFB and M is the total number of genes in this LFB. Therefore, WE_score is defined as (13).
where l is the number of GO terms that enriched in each LFB, m_{non} is the number of genes covered by LFB but not enriched by any GO term. The higher the WE_score is, the more biologically significant the LFBs are [55]. Three state of the art algorithms, Xmotifs [36], Bimax [37], ISA [35] were conducted for biclustering on real data in comparison to REWISA. Besides, subsets with different number of sites and conditions were selected randomly as LFBs from real data. WE_score for all the obtained LFBs by all the algorithms are given in Fig. 9.
It can be seen from Fig. 9 that the REWISA algorithm is effective, and the result is consistent with many related research results [37, 56]. The average WE_score of LFBs inferred by REWISA is 13.2% higher than that of ISA, which implied more biological significance of REWISA. Besides, LFBs identified by the four algorithms achieve significantly higher WE_score than random one, which also indicates the biological significance.
We further examined whether the identified LFBs show enzyme’s substrate specificity. Since LFB covers hypermethylated sites and conditions, the sites and conditions involved in each LFB are more likely to be the target sites of m^{6}A methyltransferases. Therefore, we investigated the association between each LFB and four m^{6}A methyltransferases, including METTL3, METTL14, WTAP as well as KIAA1429. For this purpose, 12,643 METTL3 targeted sites, 7689 METTL14 targeted sites, 13,124 WTAPtargeted sites and 399 KIAA1429 targeted RNA methylation sites were first identified by TREW [48], as shown in Table 4.
Then, the association between the sites in each LFB and m^{6}A methyltransferases target sites was further evaluated by Fisher’s exact test. The reported pvalue indicates the significance of association between sites and methyltransferase target sites. As shown in Table 5, all the four m^{6}A methyltransferases targeted sites in the six obtained LFBs are significantly enriched (FDR < 0.05), which means the LFBs obtained by REWISA were indeed the collaboratively hypermethylated sites under specific conditions.
In LFB1, LFB2, LFB3 and LFB6, Venn diagrams of the sites, conditions and functional annotations of genes that selected sites involved in each LFB reside on were shown in Fig. 10. As shown in Fig. 10a, it was obvious that this four LFBs contain 12,971 identical methylation sites. From the perspective of conditions, the conditions involved in LFB1 were all covered by LFB3 and LFB6, while LFB3 and LFB6 contain two conditions that were not contained by LFB1, respectively, as shown in Fig. 10b. It is also worth mentioning that for LFB2, all the conditions included in it are from human liver hepatocellular cells (HepG2) cell lines, indicating some LFBs may be condition specific. Although the conditions contained in LFB1 and LFB3 are very similar, they still contain over one hundred unshared functional annotations, which may be due to site differences between them, as shown in Fig. 10c.
Since LFB2 was found enriched in three KEGG pathways previously, and might be condition specific, LFB2 was compared with LFB4 and LFB5 for further study, and the Venn diagram is shown in Fig. 11.
It can be seen from Fig. 11 that although LFB2, LFB4 and LFB5 share 2544 sites, they share only one condition, which leads to 203 functional annotations that are not shared at all, indicating that the three LFBs may play different roles in m^{6}A methylation.
We further investigated the functions of LFB2 in detail, as shown in Fig. 12.
Some genes that sites in LFB2 reside on are found to be involved in m^{6}Arelated pathways, such as Ras protein signal transduction [57], macromolecule methylation [58], peptidyllysine modification [59], histone modification [60] and covalent chromatin modification [61], implying LFB2 may further help elaborate the functional mechanisms of m^{6}A methylation. Besides, some pathways, such as response to heat [10, 62], are found to be significantly enriched in LFB2, which is also consistent with previous analysis that LFB2 covers conditions with HepG2 cells that exposed to ultraviolet radiation, heat shock, hepatocyte growth factor (HGF; also known as scatter factor (SF)), and interferonγ.
Since LFB5 is not enriched in any of the six KEGG pathways in the previous analysis, the functionality of LFB5 is similarly examined in further detail, and the result is shown in Fig. 13.
We can see that LFB5 is mainly enriched in functional annotations related to histone and lysine modification, in which the modification form is mainly acetylation modification. M^{6}A modification in RNA has been found to be determined by histone modification [63]. Therefore, the genes contained in LFB5 may help uncover the relationship between histone and lysine modification and m^{6}A methylation.
Discussion
More and more studies have shown that m^{6}A RNA methylation plays an extremely important role in a variety of biological processes. Moreover, the functions of m^{6}A methylation have been revealed by more and more researchers. Through the study of m^{6}A methylation, we could understand the pathogenesis of the disease at posttranscriptional level, which would help us build a more comprehensive understanding of life process such as disease mechanisms. However, unveiling the functional m^{6}A methylation sites through biological experiments is timeconsuming and expensive, so it is very necessary to develop some effective computational algorithms to predict potential functional m^{6}A sites. In this paper, we developed an RNA expression weighted ISA method, REWISA, to uncover the potential local methylation patterns across subsets of condition. REWISA approached 6 LFBs based on MeRIPSeq data from 10 cell lines under 32 different conditions. Further GO analysis and some specificity tests show that REWISA obtained LFBs can find hypermethylated local functional patterns, which are highly relevant with conditions.
REWISA could achieve reliable biclustering patterns because of its adoption of RNA expression level. For the m^{6}A methylation level matrix, the level was drawn based on the ratio between IP and input samples, and there are no additional supplements for RNA expression level. To be more specific, the methylation levels of sites of high expression level should be more confident than those of low expression level since the reads count statistics in low expression sites may come from noise, which makes them unconfident. By incorporation of RNA expression level, sites with very low abundance of reads count will be assigned very low weight, thus, excluded for consideration of biclustering. Of course, REWISA still has some deficiencies that needs to be improved in the future. REWISA seek for LFBs of hypermethylated sites under subsets of conditions based on methylation level, which is achieved by simple division operation. However, it may lead to information loss during the division operation. The information carried by both input and IP samples should be more than the methylation level and RNA expression level. In the future, we will develop new computational model to overcome these limitations.
Conclusions
With comparison with conventional ISA method, we believe that our test suggests REWISA as a simple but effective tool for local functional pattern recognition tasks. Through the experiments, we also showed that REWISA is also feasible for realworld applications with similar issues as local pattern analysis problem in m^{6}A methylation profiles.
Methods
In conventional ISA method, rows and columns of data are standardized first, and subsets of rows and columns are updated iteratively according to their own thresholds. However, in REWISA, we propose to import weights to enhance the confidence of methylation level estimation, so the min–max normalization was employed instead of zscore normalization. The methylation level matrix \({\varvec{P}} \in {\mathbb{R}}^{p \times n}\) turns into P^{R} after row min–max normalization, and turns into P^{C} after column min–max normalization. The flowchart of REWISA is shown in Fig. 14. In general, REWISA consists of two steps. The first step aims to form the methylation level and weight matrix for all sites under all conditions. The second part conducts iteratively selection of subsets of rows and columns for LFBs. With the employment of \({\varvec{W}} \in {\mathbb{R}}^{p \times n}\), the contribution of sites showing similar methylation level may be distinguishable due to their different expression level.
For subset selection of columns, the updated subsets are achieved following (14).
where V is the column set of P, \(p_{{u{\kern 1pt} v}}^{R}\) refers to the uth site under vth condition in P^{R}, w_{uv} is the RNA expression level of the uth site under vth condition. In Eq. (14), \(e_{{U^{\prime}{\kern 1pt} v}}^{C}\) is calculated based on P^{C} and W for column subset selection, but only the conditions involved in U′ are considered. Higher T_{C} setting will result in less conditions in LFB.
Then, the subsets of rows are updated following (15).
where U is the row set of P, \(p_{{u{\kern 1pt} v}}^{C}\) refers to the uth site under vth condition in P^{C}, \(e_{{u{\kern 1pt} V^{\prime}}}^{R}\) is calculated based on P^{R} and W for row subset selection, but only the sites involved in V′ are considered. Higher T_{R} results in less sites in LFB.
Since P is \(p \gg n\), it is intuitive that LFBs will cover more sites than conditions. Thus, U′ ≫ V′. The selection of sites needs to be more strict, thus T_{R} is recommended to be larger for less sites inclusion for each LFB. On the contrary, there are not that much conditions in P, thus, T_{C} is recommended to be smaller for loose constrain of conditions in each LFB.
In the parameter selection procedure, it is recommended that the upper bound of T_{R} and T_{C} be set larger values, and the algorithm can automatically shrink the thresholds range. Although the larger upper bound may introduce computation load, it is still acceptable since no LFBs can be achieved under large thresholds setting. The optimization of T_{R} and T_{C} is reached by grid search. With optimized T_{R} and T_{C}, a subset of sites is randomly selected as U′, and then the subset of conditions V′ is selected according to (14). U′ and V′ are updated iteratively by (14) and (15) until convergenece is satisfied.
where ε is the default convergence criteria, U′′ represents the subset of sites in previous iteration, and U′ represents the subset of site in current iteration.
The implementation of REWISA following the above definition is summarized in the following.
Algorithm 2: REWISA biclustering algorithm 

Input: Methylation site V, conditions U, methylation level matrix P, weight matrix W and converge threshold ε 
Output: A series of LFBs (U′, V′) 
Step1: Construct row normalized matrix P^{R}, construct column normalized matrix P^{C} 
Step2: Given the predefined range for T_{R} and T_{C}, get the automatically optimized parameter settings 
Step3: Under the optimized parameters T_{R} and T_{C}, initialize the sites subset U′ and update U′ and V′ iteratively until the convergence condition is met 
Step4: Report U′ and V′ 
Return A series of LFBs (U′, V′) 
Availability of data and materials
The code implemented to perform the analysis is deposited at https://github.com/cstcumt/REWISA. The detailed information of real data consists of MeRIPSeq data from 10 cell lines under 32 kinds of treatments, as listed in Additional file 1: Table S1. For each type of treatment, MeRIPSeq gets input and IP sequences respectively. Among the 32 conditions, 30 can be retrieved from MeTDBV2.0 at https://180.208.58.19/metdb_v2, and the other two can be retrieved by GEO accession numbers (SRR5080301SRR50312 and SRR5239086SRR5239109).
Abbreviations
 m^{6}A:

N^{6}mthyladenosine
 REWISA:

RNA Expression Weighted Iterative Signature Algorithm
 LFB:

Local functional block
 MeRIPSeq:

Methylated RNA immunoprecipitation sequencing
 DPBBM:

Dirichlet process based beta binomial mixture model
 ISA:

Iterative signature algorithm
 SDwC:

Standard deviation within clusters
 ASwC:

Average similarity within clusters
 SoBC:

Score of biclustering
 GO:

Gene ontology
 IoU:

Intersection over union
 FPKM:

Fragments Per Kilobase of transcript per Million
 HGF:

Hepatocyte growth factor
 HepG2:

Human liver hepatocellular cells
References
Long WL, Guo H, Sheng J, Song RH, Xu Y. Role of m6A RNA methylation in tumorigenesis and development. Biotechnol Bull. 2019;6:25.
Wang X, Lu Z, Gomez A, Hon GC, Yue Y, Han D, Fu Y, Parisien M, Dai Q, Jia G, et al. N6methyladenosinedependent regulation of messenger RNA stability. Nature. 2014;505(7481):117–20.
Xiang Y, Laurent B, Hsu CH, Nachtergaele S, Lu Z, Sheng W, Xu C, Chen H, Ouyang J, Wang S, et al. RNA m6A methylation regulates the ultravioletinduced DNA damage response. Nature. 2017;543(7646):573–6.
Shay G, Sharon MM, Dan D, Abed AlFatah M, Nitzan K, Mali SD, Vera H. m6A mRNA methylation facilitates resolution of naive pluripotency toward differentiation. Science (New York). 2015;6225(347):1.
Fustin JM, Doi M, Yamaguchi Y, Hida H, Nishimura S, Yoshida M, Isagawa T, Morioka Masaki S, Kakeya H, Manabe I, et al. RNAmethylationdependent RNA processing controls the speed of the circadian clock. Cell. 2013;155(4):793–806.
Yoon KJ, Ringeling FR, Vissers C, Jacob F, Pokrass M, JimenezCyrus D, Su Y, Kim NS, Zhu Y, Zheng L, et al. Temporal control of mammalian cortical neurogenesis by m6A methylation. Cell. 2017;171(4):877889.e817.
Han D, Liu J, Chen C, Dong L, Liu Y, Chang R, Huang X, Liu Y, Wang J, Dougherty U, et al. Antitumour immunity controlled through mRNA m6A methylation and YTHDF1 in dendritic cells. Nature. 2019;566(7743):270–4.
Shi H, Zhang X, Weng YL, Lu Z, Liu Y, Lu Z, Li J, Hao P, Zhang Y, Zhang F, et al. m6A facilitates hippocampusdependent learning and memory through YTHDF1. Nature. 2018;563(7730):249–53.
Haussmann IU, Bodi Z, SanchezMoran E, Mongan NP, Archer N, Fray RG, Soller M. m6A potentiates Sxl alternative premRNA splicing for robust Drosophila sex determination. Nature. 2016;540(7632):301–4.
Zhou J, Wan J, Gao X, Zhang X, Jaffrey SR, Qian SB. Dynamic m6A mRNA methylation directs translational control of heat shock response. Nature. 2015;526(7574):591–4.
Dominissini D, MoshitchMoshkovitz S, Schwartz S, SalmonDivon M, Ungar L, Osenberg S, Cesarkas K, JacobHirsch J, Amariglio N, Kupiec M. Topology of the human and mouse m6A RNA methylomes revealed by m6Aseq. Nature. 2012;485(7397):201–6.
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.
Hess ME, Hess S, Meyer KD, Verhagen LA, Koch L, Brönneke HS, Dietrich MO, Jordan SD, Saletore Y, Elemento O. The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry. Nat Neurosci. 2013;16(8):1042–8.
Schwartz S, Agarwala SD, Mumbach MR, Jovanovic M, Mertins P, Shishkin A, Tabach Y, Mikkelsen TS, Satija R, Ruvkun G. Highresolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell. 2013;155(6):1409–21.
Liu J, Yue Y, Han D, Wang X, Fu Y, Zhang L, Jia G, Yu M, Lu Z, Deng X. A METTL3–METTL14 complex mediates mammalian nuclear RNA N6adenosine methylation. Nat Chem Biol. 2014;10(2):93–5.
Ping XL, Sun BF, Wang L, Xiao W, Yang X, Wang WJ, Adhikari S, Shi Y, Lv Y, Chen YS. Mammalian WTAP is a regulatory subunit of the RNA N6methyladenosine methyltransferase. Cell Res. 2014;24(2):177–89.
Schwartz S, Mumbach MR, Jovanovic M, Wang T, Maciag K, Bushkin GG, Mertins P, TerOvanesyan D, Habib N, Cacchiarelli D. Perturbation of m6A writers reveals two distinct classes of mRNA methylation at internal and 5′ sites. Cell Rep. 2014;8(1):284–96.
Yue Y, Liu J, Cui X, Cao J, Luo G, Zhang Z, Cheng T, Gao M, Shu X, Ma H. VIRMA mediates preferential m6A mRNA methylation in 3′ UTR and near stop codon and associates with alternative polyadenylation. Cell discovery. 2018;4(1):1–17.
Patil DP, Chen CK, Pickering BF, Chow A, Jackson C, Guttman M, Jaffrey SR. m6A RNA methylation promotes XISTmediated transcriptional repression. Nature. 2016;537(7620):369–73.
Wen J, Lv R, Ma H, Shen H, He C, Wang J, Jiao F, Liu H, Yang P, Tan L. Zc3h13 regulates nuclear RNA m6A methylation and mouse embryonic stem cell selfrenewal. Mol Cell. 2018;69(6):1028–38.
Jia G, Fu Y, Zhao X, Dai Q, Zheng G, Yang Y, Yi C, Lindahl T, Pan T, Yang YG. N6methyladenosine in nuclear RNA is a major substrate of the obesityassociated FTO. Nat Chem Biol. 2011;7(12):885.
Zheng G, Dahl JA, Niu Y, Fedorcsak P, Huang CM, Li CJ, Vågbø CB, Shi Y, Wang WL, Song SH. ALKBH5 is a mammalian RNA demethylase that impacts RNA metabolism and mouse fertility. Mol Cell. 2013;49(1):18–29.
Wang X, Zhao BS, Roundtree IA, Lu Z, Han D, Ma H, Weng X, Chen K, Shi H, He C. N6methyladenosine modulates messenger RNA translation efficiency. Cell. 2015;161(6):1388–99.
Xiao W, Adhikari S, Dahal U, Chen YS, Hao YJ, Sun BF, Sun HY, Li A, Ping XL, Lai WY. Nuclear m6A reader YTHDC1 regulates mRNA splicing. Mol Cell. 2016;61(4):507–19.
Xu C, Wang X, Liu K, Roundtree IA, Tempel W, Li Y, Lu Z, He C, Min J. Structural basis for selective binding of m6A RNA by the YTHDC1 YTH domain. Nat Chem Biol. 2014;10(11):927–9.
Huang H, Weng H, Sun W, Qin X, Shi H, Wu H, Zhao BS, Mesquita A, Liu C, Yuan CL. Recognition of RNA N6methyladenosine by IGF2BP proteins enhances mRNA stability and translation. Nat Cell Biol. 2018;20(3):285–95.
Meyer KD, Patil DP, Zhou J, Zinoviev A, Skabkin MA, Elemento O, Pestova TV, Qian SB, Jaffrey SR. 5′ UTR m6A promotes capindependent translation. Cell. 2015;163(4):999–1010.
Liu L, Zhang SW, Zhang YC, Liu H, Zhang L, Chen R, Huang Y, Meng J. Decomposition of RNA methylome reveals comethylation patterns induced by latent enzymatic regulators of the epitranscriptome. Mol Biosyst. 2015;11(1):262–74.
Cui X, Meng J, Zhang S, Rao MK, Chen Y, Huang Y. A hierarchical model for clustering m6A methylation peaks in MeRIPseq data. BMC Genom. 2016;17(7):520.
Zhang L, He Y, Wang H, Liu H, Huang Y, Wang X, Meng J. Clustering countbased RNA methylation data using a nonparametric generative model. Curr Bioinform. 2019;14(1):11–23.
Liu H, Flores MA, Meng J, Zhang L, Zhao X, Rao MK, Chen Y, Huang Y. MeTDB: a database of transcriptome methylation in mammalian cells. Nucl Acids Res. 2014;43:D197.
Sun W, Li J, Liu S, Wu J, Zhou H, Qu L, Yang J. RMBase: a resource for decoding the landscape of RNA modifications from highthroughput sequencing data. Nucl Acids Res. 2015;44:D259–65.
van Dam S, Vosa U, van der Graaf A, Franke L, de Magalhaes JP. Gene coexpression analysis for functional classification and gene–disease predictions. Brief Bioinform. 2018;19(4):575–92.
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002;2002:1089–1089.
Bergmann S, Ihmels J, Barkai N. Iterative signature algorithm for the analysis of largescale gene expression data. Phys Rev E. 2003;67:031902.
Murali T, Kasif S. Extracting conserved gene expression motifs from gene expression data. In: Biocomputing 2003. Singapore: World Scientific; 2002, p. 77–88.
Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics. 2006;22(9):1122–9.
Linder B, Grozhik AV, OlarerinGeorge AO, Meydan C, Mason CE, Jaffrey SR. Singlenucleotideresolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods. 2015;12(8):767–72.
Ke S, Alemu EA, Mertens C, Gantman EC, Fak JJ, Mele A, Haripal B, ZuckerScharff I, Moore MJ, Park CY. A majority of m6A residues are in the last exons, allowing the potential for 3′ UTR regulation. Genes Dev. 2015;29(19):2037–53.
Vu LP, Pickering BF, Cheng Y, Zaccara S, Nguyen D, Minuesa G, Chou T, Chow A, Saletore Y, MacKay M. The N6methyladenosine (m6A)forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells. Nat Med. 2017;23(11):1369.
Ke S, PandyaJones A, Saito Y, Fak JJ, Vågbø CB, Geula S, Hanna JH, Black DL, Darnell JE, Darnell RB. m6A mRNA modifications are deposited in nascent premRNA and are not required for splicing but do specify cytoplasmic turnover. Genes Dev. 2017;31(10):990–1006.
Chen K, Wei Z, Zhang Q, Wu X, Rong R, Lu Z, Su J, de Magalhaes JP, Rigden DJ, Meng J. WHISTLE: a highaccuracy map of the human N6methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucl Acids Res. 2019;47(7):e41–e41.
Pendleton KE, Chen B, Liu K, Hunter OV, Xie Y, Tu BP, Conrad NK. The U6 snRNA m6A methyltransferase METTL16 regulates SAM synthetase intron retention. Cell. 2017;169(5):824–35.
Barbieri I, Tzelepis K, Pandolfini L, Shi J, MillánZambrano G, Robson SC, Aspris D, Migliori V, Bannister AJ, Han N. Promoterbound METTL3 maintains myeloid leukaemia by m6Adependent translation control. Nature. 2017;552(7683):126–31.
Li Z, Weng H, Su R, Weng X, Zuo Z, Li C, Huang H, Nachtergaele S, Dong L, Hu C. FTO plays an oncogenic role in acute myeloid leukemia as a N6methyladenosine RNA demethylase. Cancer Cell. 2017;31(1):127–41.
Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell. 2012;149(7):1635–46.
Batista PJ, Molinie B, Wang J, Qu K, Zhang J, Li L, Bouley DM, Lujan E, Haddad B, Daneshvar K. m6A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell. 2014;15(6):707–19.
Liu H, Wang H, Wei Z, Zhang S, Hua G, Zhang SW, Zhang L, Gao SJ, Meng J, Chen X. MeTDB V20: elucidating contextspecific functions of N6methyladenosine methyltranscriptome. Nucl Acids Res. 2018;46(D1):D281–7.
Wu X, Wei Z, Chen K, Zhang Q, Su J, Liu H, Zhang L, Meng J. m6Acomet: largescale functional prediction of individual m6A RNA methylation sites from an RNA comethylation network. BMC Bioinform. 2019;20(1):223.
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.
Harris SL, Levine AJ. The p53 pathway: positive and negative feedback loops. Oncogene. 2005;24(17):2899–908.
Vogelstein B, FauLane D, Lane D, FauLevine AJ, Levine AJ. Surfing the p53 network. Nature. 2000;408(6810):307–10.
Levine AJ, Hu W, Feng Z. The P53 pathway: what questions remain to be explored? Cell Death Differ. 2006;13(6):1027–36.
Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics J Integr Biol. 2012;16(5):284–7.
Li L, Guo Y, Wu W, Shi Y, Cheng J, Tao S. A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data. BioData Min. 2012;5(1):8.
Richards AL, Holmans P, O’Donovan MC, Owen MJ, Jones L. A comparison of four clustering methods for brain expression microarray data. BMC Bioinform. 2008;9(1):490.
Yang F, Jin H, Que B, Chao Y, Zhang H, Ying X, Zhou Z, Yuan Z, Su J, Wu B. Dynamic m6A mRNA methylation reveals the role of METTL3m6ACDCP1 signaling axis in chemical carcinogenesis. Oncogene. 2019;38(24):4755–72.
Steitz TA, Moore PB. RNA, the first macromolecular catalyst: the ribosome is a ribozyme. Trends Biochem Sci. 2003;28(8):411–8.
Zhang C, Chen Y, Sun B, Wang L, Yang Y, Ma D, Lv J, Heng J, Ding Y, Xue Y. m6A modulates haematopoietic stem and progenitor cell specification. Nature. 2017;549(7671):273–6.
Lin X, Chai G, Wu Y, Li J, Chen F, Liu J, Luo G, Tauler J, Du J, Lin S. RNA m6A methylation regulates the epithelial mesenchymal transition of cancer cells and translation of Snail. Nat Commun. 2019;10(1):1–13.
Liu J, Dou X, Chen C, Chen C, Liu C, Xu MM, Zhao S, Shen B, Gao Y, Han D. N6methyladenosine of chromosomeassociated regulatory RNA regulates chromatin state and transcription. Science. 2020;367(6477):580–6.
Yu J, Li Y, Wang T, Zhong X. Modification of N6methyladenosine RNA methylation on heat shock protein expression. PLoS ONE. 2018;13(6):e0198604.
Huang H, Weng H, Zhou K, Wu T, Zhao BS, Sun M, Chen Z, Deng X, Xiao G, Auer F. Histone H3 trimethylation at lysine 36 guides m6A RNA modification cotranscriptionally. Nature. 2019;567(7748):414–9.
Acknowledgements
Not applicable.
Funding
This work has been supported by Fundamental Research Funds for the Central Universities (Research Projects No. 2015XKQY21 to HL), the National Natural Science Foundation of China (Research Projects Nos. 61971422 to LZ, 31871337 to HL). The funding body did not play any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Authors and Affiliations
Contributions
LZ and SC built the architecture for REWISA, designed and implemented the experiments, analyzed the result, and wrote the paper. JZ conducted the experiments, analyzed the result, and revised the paper. HL and JM supervised the project, analyzed the result, and revised the paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1. Table S1:
The detailed information of real data. The information of real data. Experiments with light green background were not included in MeTDBV2.0 yet. Experiment ID: The index of experiments; Cell line: The cell line that MeRIPSeq profiled. Expr_name in MeTDB V2.0: The retrieval information in MeTDBV2.0. If the data was not included in MeTDBV2.0, the GEO accession numbers were provided instead (indicated as light green background). Treatment: The treatment applied for the experiment. Reference: The title of the source reference. Reference ID: Reference number in the article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Zhang, L., Chen, S., Zhu, J. et al. REWISA: unveiling local functional blocks in epitranscriptome profiling data via an RNA expressionweighted iterative signature algorithm. BMC Bioinformatics 21, 447 (2020). https://doi.org/10.1186/s1285902003787w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902003787w