Identification and characterization of conserved lncRNAs in human and rat brain
© The Author(s). 2017
Published: 28 December 2017
Long noncoding RNAs (lncRNAs) are involved in diverse biological processes and play an essential role in various human diseases. The number of lncRNAs identified has increased rapidly in recent years owing to RNA sequencing (RNA-Seq) technology. However, presently, most lncRNAs are not well characterized, and their regulatory mechanisms remain elusive. Many lncRNAs show poor evolutionary conservation. Thus, the lncRNAs that are conserved across species can provide insight into their critical functional roles.
Here, we performed an orthologous analysis of lncRNAs in human and rat brain tissues. Over two billion RNA-Seq reads generated from 80 human and 66 rat brain tissue samples were analyzed. Our analysis revealed a total of 351 conserved human lncRNAs corresponding to 646 rat lncRNAs.
Among these human lncRNAs, 140 were newly identified by our study, and 246 were present in known lncRNA databases; however, the majority of the lncRNAs that have been identified are not yet functionally annotated. We constructed co-expression networks based on the expression profiles of conserved human lncRNAs and protein-coding genes, and produced 79 co-expression modules. Gene ontology (GO) analysis of the co-expression modules suggested that the conserved lncRNAs were involved in various functions such as brain development (P-value = 1.12E-2), nervous system development (P-value = 1.26E-3), and cerebral cortex development (P-value = 1.31E-2). We further predicted the interactions between lncRNAs and protein-coding genes to better understand the regulatory mechanisms of lncRNAs. Moreover, we investigated the expression patterns of the conserved lncRNAs at different time points during rat brain growth. We found that the expression levels of three out of four such lncRNA genes continuously increased from week 2 to week 104, which is consistent with our functional annotation.
Our orthologous analysis of lncRNAs in human and rat brain tissues revealed a set of conserved lncRNAs. Further expression analysis provided the functional annotation of these lncRNAs in humans and rats. Our results offer new targets for developing better experimental designs to investigate regulatory molecular mechanisms of lncRNAs and the roles lncRNAs play in brain development. Additionally, our method could be generalized to study and characterize lncRNAs conserved in other species and tissue types.
Long non-coding RNAs (lncRNAs) act as regulators in diverse biological processes and are involved in many human diseases, including cancer. Altered expression of some lncRNAs is associated with cancer patient survival. The number of identified lncRNAs has grown rapidly in recent years. Despite many efforts to predict how they function, only a small fraction of lncRNAs are presently well characterized.
Evolutionarily conserved lncRNAs show stable and critical functions across species, despite their low number. Chodroff et al. discovered four highly conserved lncRNAs in the mouse brain. The expression patterns of these lncRNAs further indicated their putative functions in vertebrate brain development. Rats are among the most widely used animal models for elucidating drug mechanisms and studying chemical toxicity. Importantly, the genome and a transcriptomic BodyMap of the rat have been generated recently. Detailed investigation of the lncRNAs conserved between humans and rats can more accurately indicate the functions of lncRNAs and further guide experimental studies of lncRNAs in rats.
Here, we develop a computational framework for the identification and annotation of conserved lncRNAs based on gene co-expression networks, lncRNA-protein interactions, and temporal expression patterns. More than 2 billion human and rat brain RNA sequencing reads from the Sequencing Quality Control (SEQC) consortium were processed. The lncRNAs identified by our integrative pipeline and those annotated by Ensembl were combined to discover lncRNAs conserved between humans and rats. Gene ontology (GO) analysis and lncRNA-protein interaction prediction for the enriched co-expressed gene modules then indicated the potential functions of the lncRNAs. Our study represents a new method for investigating lncRNAs and provides insight into their regulation. The results can be used to design and guide experiments that aim to validate lncRNA functions in rats. This method can be applied to study conserved lncRNAs across other species and tissue types.
Conserved lncRNAs in human and rat
Co-expression network of conserved lncRNAs and protein-coding genes
The coordinated expression of lncRNAs and protein-coding genes indicates their functional relevance. We performed a gene ontology (GO) analysis on the protein-coding genes of each module to discover their enriched GO terms and to infer the potential functions of the lncRNAs in the same module. We found that 56 of 79 co-expression modules were significantly associated with at least one biological process term (P < 0.05). Additionally, we calculated the interaction scores of the lncRNA and protein-coding gene pairs in each module using lncPro, a software tool used to predict interactions between lncRNAs and protein-coding genes. Based on the collective evidence from the co-expression analysis and the interaction evaluations, we can infer the putative functions of these lncRNAs.
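As an illustration of how a module can be tested for enrichment of a GO term, the sketch below computes an upper-tail hypergeometric P-value. The text does not state which statistical test or software was used for the GO analysis, so the choice of the hypergeometric test, the function name `hypergeom_enrichment_p`, and all numbers in the usage example are assumptions made for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(module_hits, module_size, background_hits, background_size):
    """Upper-tail hypergeometric P-value (illustrative, not the authors' code).

    Probability of observing at least `module_hits` genes annotated with a
    GO term in a module of `module_size` genes drawn at random from a
    background of `background_size` genes, of which `background_hits`
    carry the term.
    """
    total = comb(background_size, module_size)
    max_hits = min(module_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, module_size - k)
        for k in range(module_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical numbers: 8 of 40 module genes carry a term that
# annotates 300 of 18000 background genes.
p_value = hypergeom_enrichment_p(8, 40, 300, 18000)
print(f"P = {p_value:.3g}, significant at 0.05: {p_value < 0.05}")
```

In practice a multiple-testing correction across terms and modules would usually be applied on top of such raw P-values; whether one was applied here is not stated in the text.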
Bidirectional lncRNA and protein-coding gene pairs
Bidirectional lncRNA and protein-coding gene (PCG) pairs share the same promoter region, which can indicate a functional relationship. Many bidirectional promoters associated with lncRNAs and PCGs have been reported to be associated with neuronal functions. Of the 233 human lncRNAs in the family, 41 lncRNAs were divergently transcribed from adjacent protein-coding genes located no more than 2000 base pairs away. Furthermore, 16 of these 41 lncRNAs had the same neighboring protein-coding genes in rats. A subsequent GO analysis of the 16 common protein-coding genes revealed 11 significant biological process terms. Notably, 10 of the 11 enriched biological process terms were associated with brain or neural functions in both humans and rats. Interestingly, none of the bidirectional lncRNA and protein-coding gene pairs appeared together in the co-expression modules identified in the previous steps, suggesting that lncRNAs exert a variety of regulatory mechanisms.
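The criterion for a bidirectional pair can be sketched as follows. This is a minimal illustration that assumes a pair is bidirectional when the two genes lie on opposite strands of the same chromosome, are transcribed away from each other (head to head), and have transcription start sites at most 2000 bp apart. The `Gene` class, the `is_bidirectional` function, and the example coordinates are hypothetical; the text gives only the 2000 bp distance and the divergent orientation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self):
        # Transcription start site: left end on "+", right end on "-".
        return self.start if self.strand == "+" else self.end


def is_bidirectional(lnc, pcg, max_distance=2000):
    """True if the two genes are divergently transcribed within max_distance bp."""
    if lnc.chrom != pcg.chrom or lnc.strand == pcg.strand:
        return False
    plus, minus = (lnc, pcg) if lnc.strand == "+" else (pcg, lnc)
    # Divergent (head-to-head): the minus-strand TSS lies upstream of,
    # or at, the plus-strand TSS.
    gap = plus.tss - minus.tss
    return 0 <= gap <= max_distance


# Hypothetical coordinates for illustration.
lnc = Gene("lncRNA_A", "chr1", 5000, 8000, "+")
pcg = Gene("PCG_A", "chr1", 1000, 4500, "-")
print(is_bidirectional(lnc, pcg))
```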
Temporal expression of lncRNAs in rat brain over the lifespan
The lifespan of rats is approximately 2.6 years. The RNA-Seq data used in this study were generated from rat brain tissues at week 2, week 6, week 21, and week 104. A temporal expression analysis of the 646 conserved rat lncRNAs showed that the expression levels of 48 lncRNAs consistently increased, whereas those of 57 consistently decreased, over the rat lifespan. Moreover, we found that 63 conserved human lncRNA isoforms corresponded to the 48 continuously up-regulated rat lncRNAs, and 126 human lncRNA isoforms corresponded to the 57 continuously down-regulated rat lncRNAs. Most of these lncRNAs do not yet have a functional annotation. When searching lncRNAdb, a database that offers functional annotations of eukaryotic lncRNAs, we found functional annotations for eight lncRNAs (Additional file 1: Table S1). Five of these lncRNAs have functions related to the brain [6, 17–21], and two lncRNAs [22–24] have roles in tumor development.
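The classification into consistently increasing and consistently decreasing lncRNAs can be sketched as below. The sketch assumes one summary expression value (for example, mean TPM) per time point and requires a strict change at every step; the exact criterion used in the study is not given in the text, and the function name `classify_trend` and the example values are hypothetical.

```python
def classify_trend(expression_by_week):
    """Label a lncRNA by its expression trend across ordered time points.

    `expression_by_week` maps a week number to a summary expression value.
    Returns "increasing", "decreasing", or "mixed".
    """
    weeks = sorted(expression_by_week)
    values = [expression_by_week[w] for w in weeks]
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if steps and all(s > 0 for s in steps):
        return "increasing"
    if steps and all(s < 0 for s in steps):
        return "decreasing"
    return "mixed"


# Hypothetical TPM values at the four sampled time points (weeks 2, 6, 21, 104).
profiles = {
    "lncRNA_up": {2: 1.0, 6: 2.0, 21: 3.5, 104: 4.0},
    "lncRNA_down": {2: 9.0, 6: 4.0, 21: 2.5, 104: 0.5},
    "lncRNA_mixed": {2: 1.0, 6: 3.0, 21: 2.0, 104: 5.0},
}
trends = {name: classify_trend(profile) for name, profile in profiles.items()}
print(trends)
```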
In this study, we used the lncRNAs identified by our method and those annotated by Ensembl to detect lncRNAs conserved between humans and rats. Based on the RNA-Seq data from human and rat brain tissues, we found that many Ensembl lncRNAs were not expressed in the brain, reflecting the tissue-specific expression patterns of lncRNAs. Only 40% of the annotated conserved human lncRNAs were expressed (median transcripts per million, TPM, > 0) in the brain tissue samples, compared with 79% of the newly identified lncRNAs (Additional file 2: Figure S1). These results suggest that our method identified more brain-specific lncRNAs. The lncRNAs conserved between humans and rats can benefit and further guide future studies.
The genomes of most eukaryotes are complex. One gene often contains multiple isoforms with varying structures resulting from alternative splicing. These complexities challenge the computational approaches for assembling full-length transcripts. Assemblers such as Cufflinks and Trinity tend to generate new isoforms belonging to the same gene family. Rat gene annotation, especially that of lncRNAs, is largely incomplete. At present, only 3267 lncRNAs are annotated in Ensembl. Multiple lncRNAs may be located within the same conserved genomic region. For instance, RP11-472I20.3–001 is a human lncRNA located on chromosome 11. We found 3 annotated lncRNAs (red) and 5 assembled lncRNAs (black) located in the corresponding orthologous rat genome region (Additional file 3: Figure S2). This finding explains why we obtained 351 conserved human lncRNAs corresponding to 646 rat lncRNAs in our study.
Although various assembly methods have been developed, detecting full-length transcripts from RNA-Seq data remains a challenge. The best-performing assembly method can detect only approximately 21% of full-length human protein-coding genes from RNA-Seq data. Partially detected transcripts can produce false positive lncRNA identifications because their coding sequences are incomplete. Our integrative method enables the identification of more full-length lncRNAs. Additionally, the lncScore tool that we employed in our analysis showed higher accuracy than other methods, including CPAT, CNCI and PLEK, for protein-coding potential assessment. To ensure the reliability of the downstream analysis, we applied stringent filters to further reduce false positives; however, this may have filtered out some true lncRNAs. Nevertheless, this improved assembly method will lead to more comprehensive and accurate lncRNA identification.
The method we proposed here focused on the characterization of conserved lncRNAs. Though the number of conserved lncRNAs represents only a small fraction of all lncRNAs, several studies have reported their functional importance. Thus, functional annotation of these lncRNAs could provide a critical understanding of conserved lncRNAs, which comprise an essential group of lncRNAs.
In this study, we identified lncRNAs conserved in human and rat brain. We found that these conserved lncRNAs have important functional roles and tend to be more active than most protein-coding genes. The gene co-expression network analysis suggested the potential functions of the lncRNAs. Moreover, identification of the protein-coding genes that are highly likely to interact with lncRNAs yielded novel insights into the regulatory mechanisms of lncRNAs. Our results provide targets to investigate lncRNA functions and regulatory mechanisms using the rat model.
We processed and assembled raw RNA-Seq data using an integrative method (Fig. 1, left panel). Our integrative method combined reference-guided and de novo assembly strategies, enabling a more comprehensive assembly of transcripts from RNA-Seq data. Low-quality reads were removed after quality assessment and quality control (QA/QC) using FASTQC (v0.10.1) and Trimmomatic (v0.36). The remaining reads were assembled separately by STAR (v2.4.0)-Cufflinks (v2.2.1) and Trinity (v2.1.1)-GMAP (version 2015-12-31). Then, Cuffmerge was applied to integrate the expressed transcripts (TPM > 0) from STAR-Cufflinks and Trinity-GMAP.
A series of stringent filters was adopted to distinguish lncRNAs from all assembled transcripts (Fig. 1, top middle panel). (i) LncScore (v1.0.2) was used to remove transcripts shorter than 200 bp and those with high (> 0.5) coding potential values. (ii) Cuffcompare was used to compare the assembled transcripts with existing gene annotations and to catalog them into specific types. We removed transcripts that overlapped a known gene annotation on the opposite DNA strand, single-exonic transcripts without annotation, and transcripts that overlapped protein-coding genes.
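The filtering logic described above can be summarized in a short sketch. It is an illustration under stated assumptions, not the pipeline's actual code: the `Transcript` record and its field names are hypothetical, the coding potential is assumed to be a score already computed by a tool such as lncScore, the Cuffcompare class code `x` is taken to mark exonic overlap with a reference on the opposite strand and `u` to mark an unannotated intergenic transcript, and overlap with protein-coding genes is assumed to be precomputed as a boolean.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    length: int                    # transcript length in bp
    coding_potential: float        # score in [0, 1], e.g. from lncScore
    exon_count: int
    class_code: str                # Cuffcompare class code (assumed mapping)
    overlaps_protein_coding: bool  # hypothetical precomputed flag


def is_candidate_lncrna(t, min_length=200, max_coding_potential=0.5):
    """Apply the filters in order; return True if the transcript is kept."""
    if t.length < min_length:
        return False  # shorter than 200 bp
    if t.coding_potential > max_coding_potential:
        return False  # high coding potential
    if t.class_code == "x":
        return False  # overlaps a known annotation on the opposite strand
    if t.exon_count == 1 and t.class_code == "u":
        return False  # single-exonic transcript without annotation
    if t.overlaps_protein_coding:
        return False  # overlaps a protein-coding gene
    return True


# Hypothetical transcripts for illustration.
transcripts = [
    Transcript("T1", 1500, 0.10, 3, "u", False),  # kept
    Transcript("T2", 150, 0.05, 2, "u", False),   # too short
    Transcript("T3", 2200, 0.90, 4, "u", False),  # likely coding
    Transcript("T4", 900, 0.20, 1, "u", False),   # unannotated single exon
    Transcript("T5", 1800, 0.15, 2, "x", False),  # opposite-strand overlap
]
kept = [t.transcript_id for t in transcripts if is_candidate_lncrna(t)]
print(kept)
```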
We used liftOver with default parameters to map the genome coordinates of human lncRNAs (hg38) to the rat genome (rn6) according to hg38ToRn6.over.chain (Fig. 1, bottom middle panel). Rat lncRNAs located within or overlapping the converted human regions were considered to be conserved pair-wise with the corresponding human lncRNAs.
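Once human lncRNA coordinates have been converted to rn6, the pairing step reduces to an interval-overlap test. The sketch below assumes the liftOver output has already been parsed into `(chromosome, start, end)` tuples with half-open coordinates, as in BED files; the function names, identifiers, and coordinates are hypothetical. It also shows how one human lncRNA can pair with several rat lncRNAs, which is why the two counts differ.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pair_conserved(lifted_human, rat_lncrnas):
    """Map each lifted-over human lncRNA to the rat lncRNAs it overlaps.

    Both arguments are dicts of identifier -> (chrom, start, end) in rat
    (rn6) coordinates. Human lncRNAs without any overlapping rat lncRNA
    are omitted from the result.
    """
    pairs = {}
    for human_id, region in lifted_human.items():
        hits = sorted(
            rat_id for rat_id, interval in rat_lncrnas.items()
            if overlaps(region, interval)
        )
        if hits:
            pairs[human_id] = hits
    return pairs


# Hypothetical rn6 coordinates for illustration.
lifted_human = {
    "human_lnc_1": ("chr1", 10_000, 20_000),
    "human_lnc_2": ("chr2", 5_000, 6_000),
}
rat_lncrnas = {
    "rat_lnc_a": ("chr1", 12_000, 13_000),  # within human_lnc_1
    "rat_lnc_b": ("chr1", 19_500, 25_000),  # overlaps human_lnc_1
    "rat_lnc_c": ("chr2", 6_000, 7_000),    # adjacent to human_lnc_2, no overlap
}
conserved = pair_conserved(lifted_human, rat_lncrnas)
print(conserved)
```

The nested scan is quadratic in the number of intervals, which is acceptable for a few thousand lncRNAs; an interval tree or a sorted sweep would scale better for larger sets.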
Signed weighted co-expression network construction
The expression of the protein-coding transcripts and lncRNAs in all human samples was measured in TPM (kallisto, v0.43.0). The expression matrix was entered into WGCNA (v1.51) to build the co-expression network. To account for both up- and down-regulation, we built a signed network with a minimum module size of 30 nodes (genes). After gene module detection, the cutoff for the topological overlap of two nodes was set to 0.2 for further analysis, including degree assessment.
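For readers unfamiliar with signed networks, the sketch below shows the signed adjacency transformation that WGCNA documents for this network type, a_ij = ((1 + cor(x_i, x_j)) / 2)^β, written in plain Python rather than R. The soft-threshold power β used in the study is not stated in the text, so `beta=12` is an illustrative assumption, as are the function names and the expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    r = cov / (sd_x * sd_y)
    return max(-1.0, min(1.0, r))  # guard against floating-point drift


def signed_adjacency(x, y, beta=12):
    """Signed adjacency: ((1 + r) / 2) ** beta.

    Positively correlated pairs approach 1, while uncorrelated and
    negatively correlated pairs are pushed towards 0, so modules group
    genes that change in the same direction.
    """
    return ((1.0 + pearson(x, y)) / 2.0) ** beta


# Hypothetical TPM profiles across five samples.
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_same_direction = [2.1, 3.9, 6.2, 8.0, 9.8]
gene_opposite_direction = [9.0, 7.5, 5.1, 3.2, 1.0]

print(signed_adjacency(lncrna, gene_same_direction))
print(signed_adjacency(lncrna, gene_opposite_direction))
```

In an unsigned network the adjacency would instead be |r|^β, which would give the second pair a high adjacency as well; the signed form keeps oppositely regulated genes apart.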
This project is supported by NIH R15GM114739, FDA BAA-15-00121, AEDC grant #77138 and NIGMS P20GM103429.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 14, 2017: Proceedings of the 14th Annual MCBIOS conference. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-14.
MY conceived the project, MY and DL designed the experiments, DL carried out the experiments, DL and MY performed the analysis. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Li T, Xie J, Shen C, Cheng D, Shi Y, Wu Z, et al. Upregulation of long noncoding RNA ZEB1-AS1 promotes tumor metastasis and predicts poor prognosis in hepatocellular carcinoma. Oncogene. 2016;35:1575–84.
- Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet. 2015;47:199–208.
- Liao Q, Liu C, Yuan X, Kang S, Miao R, Xiao H, et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Res. 2011;39:3864–78.
- Zhao Y, Li H, Fang S, Kang Y, Wu W, Hao Y, et al. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res. 2016;44:D203–8.
- Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, Zeller U, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature. 2014;505:635–40.
- Chodroff RA, Goodstadt L, Sirey TM, Oliver PL, Davies KE, Green ED, et al. Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes. Genome Biol. 2010;11:R72.
- Yu Y, Fuscoe JC, Zhao C, et al. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun. 2014;5:3230. https://doi.org/10.1038/ncomms4230.
- Lu Q, Ren S, Lu M, Zhang Y, Zhu D, Zhang X, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics. 2013;14:651.
- SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium. Nat Biotechnol. 2014;32:903–14.
- Zhao J, Song X, Wang K. lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts. Sci Rep. 2016;6:34838. https://doi.org/10.1038/srep34838.
- Steijger T, Abril JF, Engström PG, Kokocinski F, The RGASP Consortium, Hubbard TJ, et al. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods. 2013;10:1177–84.
- Sun L, Liu H, Zhang L, Meng J. lncRScan-SVM: a tool for predicting long non-coding RNAs using support vector machine. PLoS One. 2015;10:e0139654.
- Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–50.
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
- Semple BD, Blomgren K, Gimlin K, Ferriero DM, Noble-Haeusslein LJ. Brain development in rodents and humans: identifying benchmarks of maturation and vulnerability to injury across species. Prog Neurobiol. 2013;106–107:1–16.
- Quek XC, Thomson DW, Maag JLV, Bartonicek N, Signal B, Clark MB, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 2015;43:D168–73.
- Schratt GM, Tuebing F, Nigh EA, Kane CG, Sabatini ME, Kiebler M, et al. A brain-specific microRNA regulates dendritic spine development. Nature. 2006;439:283–9.
- Michelhaugh SK, Lipovich L, Blythe J, Jia H, Kapatos G, Bannon MJ. Mining Affymetrix microarray data for long noncoding RNAs: altered expression in the nucleus accumbens of heroin abusers. J Neurochem. 2011;116(3):459–66. https://doi.org/10.1111/j.1471-4159.2010.07126.x.
- Gordon FE, Nutt CL, Cheunsuchon P, Nakayama Y, Provencher KA, Rice KA, et al. Increased expression of angiogenic genes in the brains of mouse Meg3-null embryos. Endocrinology. 2010;151:2443–52.
- Clemson CM, Hutchinson JN, Sara SA, Ensminger AW, Fox AH, Chess A, et al. An architectural role for a nuclear non-coding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol Cell. 2009;33:717–26.
- Uhde CW, Vives J, Jaeger I, Li M. Rmst is a novel marker for the mouse ventral mesencephalic floor plate and the anterior dorsal midline cells. PLoS One. 2010;5:e8641.
- Tseng Y-Y, Moriarity BS, Gong W, Akiyama R, Tiwari A, Kawakami H, et al. PVT1 dependence in cancer with MYC copy-number increase. Nature. 2014;512:82–6.
- Takahashi Y, Sawada G, Kurashige J, Uchi R, Matsumura T, Ueo H, et al. Amplification of PVT-1 is involved in poor prognosis via apoptosis inhibition in colorectal cancers. Br J Cancer. 2014;110:164–71.
- Mourtada-Maarabouni M, Hedge VL, Kirkham L, Farzaneh F, Williams GT. Growth arrest in human T-cells is controlled by the non-coding RNA growth-arrest-specific transcript 5 (GAS5). J Cell Sci. 2008;121:939–46.
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
- Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525–7.