- Open Access
DeepM6ASeq: prediction and characterization of m6A-containing sequences using deep learning
© The Author(s) 2018
- Published: 31 December 2018
N6-methyladensine (m6A) is a common and abundant RNA methylation modification found in various species. As a type of post-transcriptional methylation, m6A plays an important role in diverse RNA activities such as alternative splicing, an interplay with microRNAs and translation efficiency. Although existing tools can predict m6A at single-base resolution, it is still challenging to extract the biological information surrounding m6A sites.
We implemented a deep learning framework, named DeepM6ASeq, to predict m6A-containing sequences and characterize surrounding biological features based on miCLIP-Seq data, which detects m6A sites at single-base resolution. DeepM6ASeq showed better performance as compared to other machine learning classifiers. Moreover, an independent test on m6A-Seq data, which identifies m6A-containing genomic regions, revealed that our model is competitive in predicting m6A-containing sequences. The learned motifs from DeepM6ASeq correspond to known m6A readers. Notably, DeepM6ASeq also identifies a newly recognized m6A reader: FMR1. Besides, we found that a saliency map in the deep learning model could be utilized to visualize locations of m6A sites.
We developed a deep-learning-based framework to predict and characterize m6A-containing sequences and hope to help investigators to gain more insights for m6A research. The source code is available at https://github.com/rreybeyb/DeepM6ASeq.
- RNA modification
- Deep learning
More than 100 types of RNA modification have been discovered in eukaryotic RNAs ; among them, N6-methyladenosine (m6A) is a common and abundant RNA modification type found in various species, such as human, mouse and yeast [2–4]. m6A is preferentially located near 3’ untranslated regions (3’ UTR) and its nearby sequences mostly conform to certain motifs, i.e., DRACH (where D = A, G or U; R = A or G; H = A, C or U) in the mammalian genome  and RAC in the yeast genome . m6A is involved in diverse RNA activities including alternative splicing , an interplay with microRNAs  and translation efficiency . In addition, m6A has been linked with caner progression. It is reported that METTL3 and METTL4, which are both m6A-forming enzymes, have an impact on differentiation and apoptosis of human myeloid leukemia cell lines [10, 11].
m6A can be detected in a high-throughput manner owing to the rapid development of high-throughput sequencing technologies. m6A-Seq and Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) [2, 3] are the main sequencing methods for detection of genomic regions with m6A sites via antibody capturing. Recently, m6A individual-nucleotide-resolution cross-linking and immunoprecipitation (miCLIP-Seq) enables detection of m6A at single-base resolution [5, 12]. Several bioinformatics tools have been developed to predict m6A sites in different species, e.g., m6Apred  and iRNA-Methyl  for the yeast genome, SRAMP  for the mammalian genome. These tools mainly apply existing knowledge as feature input such as a combination of k-mers and chemical properties to build models using random forest (RF) or support vector machine (SVM) algorithm. Although these tools can predict single-base m6A, the biological information surrounding m6As is still limited; this situation poses a challenge for researchers. Therefore, here we implemented a deep-learning-based framework, named DeepM6ASeq, to predict m6A-containing sequences and characterize biological features surrounding m6A. In recent years, deep learning became an state-of-the-art technology and is now employed more and more in the field of biology [16–18]. The strength of deep learning is not only in its better prediction power (in comparison with traditional machine learning classifiers), but also its ability to recognize motifs in genomic sequences. Because miCLIP-Seq data revealed precise locations of m6A sites, we explored on such data by utilizing convolutional neural network (CNN) layer as a motif detector to characterize biological features surrounding m6A, then capturing m6A’s positional preference out of the deep learning model we built. In addition, we made use of a saliency map to visualize locations of m6A sites in the sequences. The development of DeepM6ASeq, model performance and analysis of biological information will be discussed in details in the following sections.
The miCLIP-Seq dataset
Given that miCLIP-Seq data can pinpoint m6A sites at single-base resolution, these data provide us with ideal conditions to study sequences surrounding m6A sites. We collected miCLIP-Seq data from human, mouse and zebrafish [5, 12, 19]. Human and mouse data are from the same source as SRAMP, which included five cell line and tissue types, that is A549, CD8T, HEK293, brain and liver. For zebrafish, the data consisted of two biological replicates from embryonic stem cells.
For positive samples, we defined sequences with the window size of 101 bp containing m6A sites. First, all m6A sites were mapped to the longest transcripts of genes using the ENSEMBL database (release 91, http://www.ensembl.org/). Then, we randomly located m6A sites in the fixed-size windows and extracted the surrounding sequences with length up to 101 bp (if m6A sites are near a terminus of a transcript, we sliced 101-bp-size windows from the terminus). To avoid sample redundancy (because m6A sites are reported to cluster together ), before randomly locating we merged m6A sites within 50 bp and chose the centered one among the merged sites. Because zebrafish data consisted of two replicates, we chose common sites as positive samples.
A summary of dataset size
The m6A-Seq dataset
To test our model on real peaks data, we used m6A-Seq data from the HepG2 cell line and human brain (two different cell types from those used in the model) from Dominissini’s study  and processed this dataset according to their protocol . For positive samples, we retrieved the top 1000 positive peaks detected by MACS  with the highest fold enrichment and the false discovery rate (FDR) ≤ 0.05. We extracted sequences of 101 bp around the peak summits and overlapped these regions with peaks from MeT-DB database  (The MeT-DB peak score greater than 6 was required, which is the median score for human data.) to obtain reliable m6A-containing sequences. As negative samples, we used negative peaks detected by MACS (MACS identifies negative peaks by swapping immunoprecipitation samples and control samples) and split each peak into bins with a size of 101 bp(because HepG2 has limited negative peaks, we used a sliding window with a step of 20 bp when spliting peaks for data augmentation). We chose bins overlapping with exon regions and not overlapping with peaks from MeT-DB database. To evaluate the generalization of our model and to conduct a fair comparison with SRAMP, we used CD-HIT  to remove test sequence redundancy with the training data of both our model and SRAMP at an 80% similarity threshold, which is the lowest threshold provided by CD-HIT. Besides, we kept only sequences with DRACH motifs because SRAMP scans only A sites with DRACH motifs in given sequences. Finally, we got 663 positive samples and 413 negative samples in total.
The development of deep learning models
During the process of model construction, we chose the filter sizes of 10 and 5, the filter numbers of 256 and 128 for each convolution layer. The activation function for CNN layers is rectified linear unit (ReLU), tanh for the BLSTM layer and sigmoid activation after the FC layer to obtain prediction output. Additionally, we applied batch normalization and dropout  after each convolutional procedure to accelerate training and avoid overfitting separately. We used binary cross entropy as a loss function to measure the difference between the target and the predicted output and Adam as an optimization algorithm. The deep learning framework is implemented using Pytorch (https://pytorch.org).
There are three phases during the process of model building. First, we performed five-fold cross-validation on training data for optimization of hyperparameters. In this phase, we used the grid-search strategy for optimization of hyperparameters. The details of tuning parameters are given in Additional file 1: Table S1. Then, we used 1/8 of training data, which equals to 10% of the whole dataset, as validation data and fed the best parameters from the previous phase to the training phase. In the last phase, we applied our model to the independent dataset. We selected a batch size of 256, 50 maximum epochs and an early stopping strategy of patience to 5 in the first two phases.
Conversion of filters to motifs
where m stands for each element in M, and p and q are the row number and column number respectively.
The saliency map
Derivation of other classifiers
We built models of RF, Logistic Regression (LR) and SVM on mammalian dataset using sklearn (http://scikit-learn.org). For RF and LR, the feature inputs were normalized counts of kmers of 1-5. For SVM, the feature inputs were commonly used 4-kmer for saving training time. We applied the grid-search strategy on hyperparameter optimization for each classifier and chose the parameters with the best performance. The parameters used in the grid-search were listed in Additional file 1: Table S2.
where TP is true positive, TN is true negative, FP is false positive and FN is false negative. Additionally, we plotted Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves and calculated the areas under the curves, which are denoted by AUROC and AUPR, respectively.
Prediction of m6A-containing sequences
Model training and hyperparameter optimization
We used the mammalian dataset that consists of both human and mouse miCLIP-seq data, for optimizing the hyperparameters during the development of the model. The details of the model development are described in the Materials and Methods section. In brief, we built a deep-learning-based model that mainly consists of two CNN layers, one BLSTM layer and one FC layer, to predict whether a sequence contains m6A sites. During hyperparameter optimization, the grid-search strategy was applied to find the best parameter combination of maxpooling size, dropout rate, learning rate, units of the BLSTM layer and the FC layer. The metrics of mean performance for different parameters settings are shown in Additional file 1: Table S3. We found that no maxpooling, a higher dropout rate and a more complicated model structure contribute to the improvement of performance. Then, we chose the best parameter setting to train the model on the mammalian validation dataset and got AUROC = 0.843 and AUPR = 0.832 for validation as illustrated in Additional file 2: Figure S1.
The comparison of DeepM6ASeq with other classifiers
Performance metrics for comparison of DeepM6ASeq with other classifiers on the mammalian independent dataset
Support vector machine
DeepM6ASeq performance on m6A-Seq data
Performance metrics for comparison of DeepM6ASeq with SRAMP on the m6A-Seq dataset
We built models for human, mouse and zebrafish separately. The cross-species performance is illustrated in Additional file 2: Figure S2. As expected, the cross-species prediction was stable between human and mouse; howerver, there was a gap in the prediction of the mouse and human dataset by the zebrafish model and vice versa. Because the zebrafish dataset is from one cell line, it is possible that models from other species have limitations in terms of generalization due to the cell-line specificity.
Biological information on sequences surrounding m6A sites
Learned motifs for each species
Location preference for m6A-containing sequences
m6A is characterized by enrichment near 3’ UTR of transcripts, thus we wanted to know if our predictor could capture such location information. We performed the position analysis in a way without prior knowledge in which we split the transcripts of the independent test dataset into bins of 101-bp size, get bins with confident prediction scores and check if these bins have location preference with regard to the transcript structure. We established three confidence categories (moderate, high and very high) for prediction scores, which corresponds to 90%, 95% and 99% specificity respectively in the validation datasets (see Additional file 1: Table S5).
First, we computed the percentage of potential m6A-containing bins with scores above moderate confidence in the bins of the the whole transcripts, all exons and last exons. The result indicated that these potential m6A-containing bins are not enriched in the last exons. This finding suggests that sequences with a potential to contain m6A sites are widely distributed along the exons of transcripts (Additional file 2: Figure S4).
In summary, our location analysis indicates that sequences with a potential to contain m6A sites are widely distributed along the exons of transcripts, in particular, the potential m6A-containing sequences in the last exons are preferentially located near the start of 3’ UTR.
The saliency map for visualizing m6A sites
A saliency map is commonly used in computation version for showing each pixels’ unique quality. In the context of a genome sequence, a saliency map can measure the nucleotide importance which can have an impact on the prediction scores. Given that we had precise m6A locations from miCLIP-Seq data, we were curious whether locations of m6A sites could be uncovered by way of a saliency map. We obtained saliency maps for potential m6A-containing sequences in the independent datatsets with prediction scores with higher-than-moderate confidence via the method described by Lanchantin et al. , which, in briefly, performs point-wise multiplication of the absolute derivative of the input sequences from back-propagation and their one-hot encoding.
First, we checked the distribution of the types of the most salient nucleotides in the sequences. We extracted the nucleotides with the highest saliency score for each sequence and plotted the distribution. As shown in Additional file 2: Figure S5, nucleotide type A accounted for the majority among all the most salient nucleotides. For those most salient nucleotides rather than A, we plotted the distribution of the distance from these non-A nucleotides to the closest mapped miCLIP m6A sites as depicted in Additional file 2: Figure S6, in which the majority of these most salient non-A nucleotides are located near mapped miCLIP m6A sites.
After that, we wondered how many of these most salient As are overlapped with known m6A sites. Our result revealed that nearly 40–50% of these As belong to known m6A sites from miCLIP-data (Additional file 2: Figure S7). Besides, some of non-miCLIP m6A could be mapped to the predicted m6A sites in the Met-DB single-base m6A database. Although in zebrafish, the most salient As overlapping neither with miCLIP-Seq data nor Met-DB are more than those in human and mouse, actually, over 30% of these As belongs to the miCLIP m6A sites of one of the replicate zebrafish samples.
We propose DeepM6ASeq as a framework useful for identifying m6A-containing sequences. Nonetheless, we have some thoughts about the future research. First, although the zebrafish model has higher predictive power, biological information extracted from this model is limited probably due to the single source of the cell type. We expect additional miCLIP-Seq data to become available for zebrafish in the future to improve the current model and provide more biological information. Second, because the second CNN layer detects the combination of motifs at a higher level, it would be interesting to explore what the deep learning model could detect in this layer. An alternative approach is to apply word-embedding, a strategy widely used in the natural language processing. In this way, input sequences can be converted to words and then a deep learning model can be built to discern some patterns among the sequence words. The word-embedding strategy has been utilized for identifying chromatin accessibility . Finally, to characterize biological features surrounding m6A sites in some way without prior knowledge, we employed all the m6A sites rather than limiting ourselves to m6A sites with DRACH motifs. We believe that deep leaning method may also exert its power for predicting single-base m6A sites with DRACH motifs, in particular combined with other features such as secondary structure and conservation score.
In conclusion, we developed DeepM6ASeq, a model based on deep learning framework, to predict m6A-containing sequences and characterize biological features surrounding m6A sites. DeepM6ASeq showed better performance as compared to other machine learning classifiers and is competitive at predicting m6A-containing sequences. In addition, DeepM6ASeq can recognize the position preference of sequences harboring m6A sites. All these data corroborate the effectiveness of our models. Furthermore, taking advantage of function of motif detectors and saliency maps in the deep learning model, DeepM6ASeq learned a newly recognized m6A reader, FMR1 and helped to visualize mapped and potential m6A sites. We hope that DeepM6ASeq will provide more insights for m6A research.
YZ and MH are grateful to Tsukasa Fukunaga and Chao Zeng for valuable discussions.
Publication costs are funded by Waseda University [basic research budget]. This study was supported by the Ministry of Education, Culture, Sports, Science and Technology (KAKENHI) [grant numbers JP17K20032, JP16H05879, JP16H01318, and JP16H02484 to MH].
Availability of data and materials
The datasets and materials can be downloaded from https://github.com/rreybeyb/DeepM6ASeq.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 19, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-19.
MH conceived and supervised this study. YZ developed the pipeline and conducted all the computational experiments. YZ and MH contributed to the analysis and interpretation of the data. YZ drafted the manuscript, and MH revised it critically. All the authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Roundtree IA, Evans ME, Pan T, He C. Dynamic RNA Modifications in Gene Expression Regulation. Cell. 2017; 169(7):1187–200.View ArticleGoogle Scholar
- Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell. 2012; 149(7):1635–46.View ArticleGoogle Scholar
- Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, Sorek R, Rechavi G. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012; 485(7397):201–6.View ArticleGoogle Scholar
- Clancy MJ, Shambaugh ME, Timpte CS, Bokar JA. Induction of sporulation in Saccharomyces cerevisiae leads to the formation of N6-methyladenosine in mRNA: a potential mechanism for the activity of the IME4 gene. Nucleic Acids Res. 2002; 30(20):4509–18.View ArticleGoogle Scholar
- Linder B, Grozhik AV, Olarerin-George AO, Meydan C, Mason CE, Jaffrey SR. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods. 2015; 12(8):767–72.View ArticleGoogle Scholar
- Schwartz S, Agarwala SD, Mumbach MR, Jovanovic M, Mertins P, Shishkin A, Tabach Y, Mikkelsen TS, Satija R, Ruvkun G, Carr SA, Lander ES, Fink GR, Regev A. High-resolution mapping reveals a conserved, widespread, dynamic mRNA methylation program in yeast meiosis. Cell. 2013; 155(6):1409–21.View ArticleGoogle Scholar
- Roundtree IA, He C. Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing. Trends Genet. 2016; 32(6):320–1.View ArticleGoogle Scholar
- Chen T, Hao YJ, Zhang Y, Li MM, Wang M, Han W, Wu Y, Lv Y, Hao J, Wang L, Li A, Yang Y, Jin KX, Zhao X, Li Y, Ping XL, Lai WY, Wu LG, Jiang G, Wang HL, Sang L, Wang XJ, Yang YG, Zhou Q. m(6)A RNA methylation is regulated by microRNAs and promotes reprogramming to pluripotency. Cell Stem Cell. 2015; 16(3):289–301.View ArticleGoogle Scholar
- Wang X, Zhao BS, Roundtree IA, Lu Z, Han D, Ma H, Weng X, Chen K, Shi H, He C. N(6)-methyladenosine Modulates Messenger RNA Translation Efficiency. Cell. 2015; 161(6):1388–99.View ArticleGoogle Scholar
- Vu LP, Pickering BF, Cheng Y, Zaccara S, Nguyen D, Minuesa G, Chou T, Chow A, Saletore Y, MacKay M, Schulman J, Famulare C, Patel M, Klimek VM, Garrett-Bakelman FE, Melnick A, Carroll M, Mason CE, Jaffrey SR, Kharas MG. The N6-methyladenosine (m6A)-forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells. Nat Med. 2017; 23(11):1369–76.PubMedPubMed CentralGoogle Scholar
- Weng H, Huang H, Wu H, Qin X, Zhao BS, Dong L, Shi H, Skibbe J, Shen C, Hu C, Sheng Y, Wang Y, Wunderlich M, Zhang B, Dore LC, Su R, Deng X, Ferchen K, Li C, Sun M, Lu Z, Jiang X, Marcucci G, Mulloy JC, Yang J, Qian Z, Wei M, He C, Chen J. METTL14 Inhibits Hematopoietic Stem/Progenitor Differentiation and Promotes Leukemogenesis via mRNA m6A Modification. Cell Stem Cell. 2018; 22(2):191–205.View ArticleGoogle Scholar
- Ke S, Alemu EA, Mertens C, Gantman EC, Fak JJ, Mele A, Haripal B, Zucker-Scharff I, Moore MJ, Park CY, Vagb? CB, Kus?nierczyk A, Klungland A, Darnell JE, Darnell RB. A majority of m6A residues are in the last exons, allowing the potential for 3’ UTR regulation. Genes Dev. 2015; 29(19):2037–53.Google Scholar
- Chen W, Tran H, Liang Z, Lin H, Zhang L. Identification and analysis of the N(6)-methyladenosine in the Saccharomyces cerevisiae transcriptome. Sci Rep. 2015; 5:13859.View ArticleGoogle Scholar
- Chen W, Feng P, Ding H, Lin H, Chou KC. iRNA-Methyl: Identifying N(6)-methyladenosine sites using pseudo nucleotide composition. Anal Biochem. 2015; 490:26–33.View ArticleGoogle Scholar
- Zhou Y, Zeng P, Li YH, Zhang Z, Cui Q. SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res. 2016; 44(10):91.View ArticleGoogle Scholar
- Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015; 33(8):831–8.View ArticleGoogle Scholar
- Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015; 12(10):931–4.View ArticleGoogle Scholar
- Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016; 44(11):107.View ArticleGoogle Scholar
- Zhang C, Chen Y, Sun B, Wang L, Yang Y, Ma D, Lv J, Heng J, Ding Y, Xue Y, Lu X, Xiao W, Yang YG, Liu F. m6A modulates haematopoietic stem and progenitor cell specification. Nature. 2017; 549(7671):273–6.View ArticleGoogle Scholar
- Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G. Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc. 2013; 8(1):176–89.View ArticleGoogle Scholar
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008; 9(9):137.View ArticleGoogle Scholar
- Liu H, Wang H, Wei Z, Zhang S, Hua G, Zhang SW, Zhang L, Gao SJ, Meng J, Chen X, Huang Y. MeT-DB V2.0: elucidating context-specific functions of N6-methyl-adenosine methyltranscriptome. Nucleic Acids Res. 2018; 46(D1):281–7.View ArticleGoogle Scholar
- Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012; 28(23):3150–2.View ArticleGoogle Scholar
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553):436–44.View ArticleGoogle Scholar
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735–80. https://doi.org/10.1162/neco.1922.214.171.1245.
- Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014; 15(1):1929–58.Google Scholar
- Lanchantin J, Singh R, Wang B, Qi Y. Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks. Pac Symp Biocomput. 2017; 22:254–65.PubMedPubMed CentralGoogle Scholar
- Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007; 8(2):24.View ArticleGoogle Scholar
- Liu N, Zhou KI, Parisien M, Dai Q, Diatchenko L, Pan T. N6-methyladenosine alters RNA structure to regulate binding of a low-complexity protein. Nucleic Acids Res. 2017; 45(10):6051–63.View ArticleGoogle Scholar
- Edupuganti RR, Geiger S, Lindeboom RGH, Shi H, Hsu PJ, Lu Z, Wang SY, Baltissen MPA, Jansen PWTC, Rossa M, Muller M, Stunnenberg HG, He C, Carell T, Vermeulen M. N6-methyladenosine (m6A) recruits and repels proteins to regulate mRNA homeostasis. Nat Struct Mol Biol. 2017; 24(10):870–8.View ArticleGoogle Scholar
- Batista PJ, Molinie B, Wang J, Qu K, Zhang J, Li L, Bouley DM, Lujan E, Haddad B, Daneshvar K, Carter AC, Flynn RA, Zhou C, Lim KS, Dedon P, Wernig M, Mullen AC, Xing Y, Giallourakis CC, Chang HY. m(6)A RNA modification controls cell fate transition in mammalian embryonic stem cells. Cell Stem Cell. 2014; 15(6):707–19.View ArticleGoogle Scholar
- Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections. Nucleic Acids Res. 2017; 45(13):119.View ArticleGoogle Scholar
- Min X, Zeng W, Chen N, Chen T, Jiang R. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding. Bioinformatics. 2017; 33(14):92–101.View ArticleGoogle Scholar