- Open Access
Dissecting the biological relationship between TCGA miRNA and mRNA sequencing data using MMiRNA-Viewer
BMC Bioinformatics volume 17, Article number: 336 (2016)
MicroRNAs (miRNA) are short nucleotides that interact with their target genes through 3′ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) harbors an increasing amount of cancer genome data for both tumor and normal samples. However, there are few visualization tools focusing on concurrently displaying important relationships and attributes between miRNAs and mRNAs of both cancer tumor and normal samples. Moreover, a deep investigation of miRNA-mRNA target and biological relationships across multiple cancer types by integrating web-based analysis has not been thoroughly conducted.
We developed an interactive visualization tool called MMiRNA-Viewer that can concurrently present the co-relationships of expression between miRNA-mRNA pairs of both tumor and normal samples into a single graph. The input file of MMiRNA-Viewer contains the expression information including fold changes between normal and tumor samples for mRNAs and miRNAs, the correlation between mRNA and miRNA, and the predicted target relationship by a number of databases. Users can also load their own input data into MMiRNA-Viewer and visualize and compare detailed information about cancer-related gene expression changes, and also changes in the expression of transcription-regulating miRNAs.
To validate the MMiRNA-Viewer, eight types of TCGA cancer datasets with both normal and control samples were selected in this study and three filter steps were applied subsequently. We performed Gene Ontology (GO) analysis for genes available in final selected 238 pairs and also for genes in the top 5 % (95 percentile) for each of eight cancer types to report a significant number of genes involved in various biological functions and pathways. We also calculated various centrality measurement matrices for the largest connected component(s) in each of eight cancers and reported top genes and miRNAs with high centrality measurements.
With its user-friendly interface, dynamic visualization and advanced queries, we also believe MMiRNA-Viewer offers an intuitive approach for visualizing and elucidating co-relationships between miRNAs and mRNAs of both tumor and normal samples. We suggest that miRNA and mRNA pairs with opposite fold changes of their expression and with inverted correlation values between tumor and normal samples might be most relevant for explaining the decoupling of mRNAs and their targeting miRNAs in tumor samples for certain cancer types.
Cancer is a disease caused by an uncontrolled division of abnormal cells, and it can start from anywhere in the human body and spread into surrounding tissues/organs. There are more than 100 types of cancer reported (http://www.cancer.gov/about-cancer/what-is-cancer). Cancer is affecting millions of lives across the world. According to the World Cancer Report, cancer rates could further increase by 50 % to 15 million new cases in the year 2020 (http://www.who.int/mediacentre/news/releases/2003/pr27/en/). The leading causes of cancer death among men are different from the leading causes of cancer death among women. Cancer occurrence also has racial/ethnic and geographic variations (http://www.cdc.gov/cancer/dcpc/data/).
The advent of next-generation sequencing (NGS) has opened a new avenue by which, in theory, all of the limitations of traditional technology can be overcome at a reasonable cost and the underlying chromosomal structure of an individual’s DNA can be fully characterized down to the nucleotide level. Next-generation sequencing technology in cancer studies has become an effective way to provide high sensitivity and resolution in the post-genomic era.
In 2006, the Cancer Genome Atlas (TCGA), a project initiated by the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), aimed to catalogue mutations responsible for cancer. So far, more than 30 cancer types and 10 cancer tissues (Breast, Central Nervous System, Endocrine, Gastrointestinal, Gynecologic, Head and Neck, Hematologic, Skin, Soft Tissue, Thoracic, Urologic) have been presented for potential characterization and their DNA and RNA sequencing data are publicly accessible to the community and researchers (http://cancergenome.nih.gov). The Cancer Genome Atlas project provides various types of NGS sequencing data including Exome, SNP, Methylation, mRNA, miRNA, and Clinical. Moreover, sequencing data for multiple individuals of both tumor and normal samples are available for each of these cancer types .
MicroRNAs (miRNA) are short nucleotides that interact with their target genes through 3′ untranslated regions (UTRs). MicroRNAs can at the same time target many mRNAs and fine-tune gene expression by means of cooperative or combinatorial targeting. In other words, one miRNA can at the same time target many RNAs, and many miRNAs can cooperatively target a single mRNA . In mammals, the main effects of the miRNA-mRNA interaction are the destabilization of the target mRNAs by the pairing miRNAs .
It is common to attempt to describe and/or visualize miRNA-mRNA interaction networks in order to better understand their contributions to various diseases states, including cancer. Generic studies that involve normal negative correlation have been noticed. In fact, there exist more complex issues ““decoupling” like genes being either tumor promoters or suppressors depending on the tumor-type. The miRNA and mRNA pairs can fail to show negative correlation when expected, and conversion from negative to positive correlation. Specifically, a study on miRNA-mRNA network in brain regions of human alcoholics suggested that miRNAs would finally overcome the adaptive upregulation of the targeted genes and resultantly turn those initially upregulated alcohol-responsive genes to be downregulated . The hsa-miR-183 was found to be expressed higher in most of the breast cancers, but lower expressed in estrogen receptor-positive breast tumors, which suggests the hsa-miR-183 plays different roles in different cancer cells . In a study of gastric cancer, hsa-miR-183 acts as both a tumor promoter (onco-miRNA) and tumor suppressor miRNA, which depends on the type and/or subtype of cancer .
Furthermore, some pairs of predicted/verified miRNA and their target mRNA were found to fail to show the anti-correlation in vivo . The potential mechanisms by which a target mRNA might “avoid” or become “uncoupled” from its targeting miRNA was also explained . The study across multiple TCGA cancer types by combining all cancers into a global analysis was performed .
We have previously developed a web interface tool MMiRNA-Tar  (http://bioinf1.indstate.edu/MMiRNA-Tar) that can calculate and plot the correlation of expression for mRNA-miRNA pairs across samples or over a time course using a pre-defined correlation cutoff and prediction confidence. MMiRNA-Tar provides researchers a convenient tool to calculate the co-relationship between mRNAs and miRNAs to predict their targeting relationship. In order to facilitate effective interpretation of the important attributes and values identified for each miRNA and mRNA pair, in this study we developed a prototype of the web interface tool MMiRNA-Viewer that can concurrently render the co-relationships of mRNA − miRNA pairs in both tumor and normal samples. We also investigated the target relationship between mRNA and miRNA pairs for TCGA cancer datasets and studied their biological functions in a systematic way.
The workflow of identifying and selecting both tumor and normal pairs for eight cancer types is illustrated in Fig. 1. The details of each step are described below:
Download the matched miRNA and mRNA sequencing datasets of both tumor and normal samples for available cancer types from TCGA website
We first downloaded the miRNA and mRNA expression files for all 34 cancer types from TCGA website. The expression results were taken from the TCGA Data Level 3. Specifically, miRNA-Seq data were generated by Baylor College Human Genome Sequencing Center (BCGSC), and RNA-Seq data were generated by University of North Carolina at Chapel Hill (UNC). To make measurement units between two sequencing data sets consistent, we adopted transcripts per million (TPM) expression values for both miRNA and mRNA analyses.
Combine all individual samples for both miRNA and mRNA data for each cancer type
Every sample downloaded from TCGA contains mRNA and miRNA expression values for individual samples. We used in-house developed C programs to match patients’ tumor and normal samples in the same sample order to generate four tabular data files (tumor and normal each for mRNA and miRNA expression profiles) for each cancer type.
Calculate correlation values and database prediction outcomes between miRNA and mRNA pairs
A customized C program was written to calculate Pearson correlation coefficient (PCC) and check three target prediction databases (TargetProfiler , TargetScan , and miRanda ) for prediction results of both tumor and norm samples. When we searched for the match between pre-miRNA from TCGA and mature miRNA from target prediction database, we ignored the case and omitted the last digit (tail). Although miRNA IDs with different last digits represent the distinct precursor sequences, they express identical mature sequence. A match was also called for compared cases with different lettered suffixes since they denote closely related mature sequence. The existence of the targeting relationship was claimed if a target prediction outcome was supported by at least one of the three databases mentioned above.
Calculate statistical significance of miRNA and mRNA correlation pairs
A customized R script was written to perform normalization and calculate Transcript Per Million (TPM), Average, Median, P-values and False Discovery Rate (FDR) values, including multiple testing for miRNA and mRNA expression files in both tumor and norm samples. The TPM was calculated for normalized expression. The average expression was used for calculating the Fold Change (FC) between tumor and normal samples. The calculated P-values and FDR values were used to select statistically significant pairs.
Select cancer types and obtain input miRNA and mRNA pairs
Among 34 cancer types, 11 cancer types do not have their normal samples; One cancer type (GBM) has no tumor sample available; The mRNA and miRNA files in tumor samples for 7 cancer types do not match their counts. Two cancer types (STAD and COAD) do not match their mRNA and miRNA counts in both tumor and normal samples. There are two cancer types (PAAD and CHOL) having no common target pairs while selecting opposite expression pairs. Two cancer types (PCPG and CESC) were excluded due to the FDR cutoff values (<0.1) that we employed. The cancer type (THYM) having only two matched tumor or normal samples was also excluded from the subsequent analysis. Therefore, we obtained eight total cancer types (UCEC, KICH, HNSC, THCA, KIRP, LIHC, LUSC, BLCA) with complete transcriptome data available for both tumor and normal samples. 29 ambiguous (“?”) genes were excluded because of a similar reason mentioned in the study .
Combine both tumor and normal samples and analyze differentially expressed mRNAs and miRNAs pairs between tumor and normal samples
We combined both tumor and normal results by selecting miRNA and mRNA pairs with opposite correlation values and FDR < 0.1 and predicted by at least one of three prediction algorithms. The FC in terms of expression ratio of tumor over normal was calculated for miRNAs and mRNAs.
Distinguish the proportion of upregulated, downregulated, and unchanged miRNAs and mRNAs between tumor and normal samples
In order to compare the regulation directional change for miRNA and mRNA between tumor and normal samples within and between cancers, The fraction of upregulated, downregulated, and unchanged for miRNAs and mRNAs were calculated. The scatter plots of the miRNAs in tumor vs in normal and of the mRNAs in tumor vs in normal for eight cancer types were plotted using a customized R script.
Determine the importance of mRNAs and miRNAs in cancer networks
We used various centrality measures (spanning tree, degree, closeness, betweenness) to identify important mRNAs and miRNAs in tumor and normal samples for each of the eight cancers. For degree, closeness, and betweenness calculation, we used methods described in the paper  and implemented with tnet package (http://toreopsahl.com/tnet). The calculation of spanning tree was done using a modified method described in the paper  and the tools used in implementation were the octave package for matrix calculation (https://www.gnu.org/software/octave/).
We first processed all pairs selected for each cancer type. The connected components (groups) were created according to the available connection after removing duplicated nodes (mRNAs/miRNAs). We only studied the largest connected component in the analysis. Specifically, for each studied large connected component, edge lists were generated and tnet package were executed to calculate degree, closeness, and betweenness centrality. A customized script was designed to generate adjacency matrices and calculate Laplacian matrices and determinant of reduced Laplacian matrices. The centrality of spanning tree was calculated using the overall determinant and the values from Laplacian matrices. The implementations for the measurement of all centralities were done using R and customized C programs. All measurement results were normalized using the max values in its group. “Nodes” with positive values for most of centrality measurement metrics for each of eight cancers were selected for cancer annotation.
Annotate cancer association annotation for genes and miRNAs
We annotated cancer association for gene and miRNA pairs via querying different databases. Specifically, all the genes in the table were queried to get annotations of pathways using KEGG Mapper – Search Pathway , of gene ontology using Ensembl BioMart , of cancer-related diseases using Catalogue of Somatic Mutations In Cancer (COSMIC) , and of human-related diseases using Online Mendelian Inheritance In Man (OMIM) . All the miRNAs were queried to get the experimentally validated miRNAs and annotations of their related diseases using DIANA-TarBase , Human MicroRNA Disease Database (HMDD) , and miR2Disease .
Specifically, in the KEGG Mapper – Search Pathway , the genes in the pairs were entered as objects to search against ko with all the default parameters. In the Esembl Biomart , the genes in the pairs were put into the input external references with an ID list of HGNC symbol(s) to search against the Ensembl Genes 84 Homo sapiens genes (GRCh38.p5) with all the default parameters. From the COSMIC website , the Cancer Gene Census file was downloaded and parsed. From the OMIM website , the genemap2.txt file was downloaded and parsed. On the Diana tools TarBase website , each miRNA was searched to get its related diseases. From the HMDD website , the whole dataset of miRNA-disease association data file was downloaded and parsed. On the miR2Disease website , the “All Entries” file was downloaded and parsed.
The annotations of genes and experimentally validated miRNAs from the above-mentioned databases were manually scrutinized, categorized, and tagged to tell whether they were related to the eight cancer types in this study. The annotations that were not related to the eight cancer types were not included. After the annotation filtering step, each cancer type that the gene and miRNA pair belongs to were classified into three categories: the cancer type was found in neither gene nor miRNA annotations(neither_exist), in either gene or miRNA annotations (either_exist), and in both gene and miRNA annotations (both_exist).
Analyze gene functional enrichment for eight cancers
We searched the Database for Annotation, Visualization and Integrated Discovery (DAVID)  for functional information about smaller sets of genes with their predicted targeting miRNAs having high correlation values and reported clusters with enrichment scores greater than 0.99. We also did ClueGo of Cytoscape plug-in  functional annotation for larger sets of genes which were selected according to the criteria that pairs’ correlation values to the top 5 % (95 percentile) in either tumor or normal samples in each cancer type due to the fact that the range of correlation values varies substantially among different cancer datasets.
We also ran ClueGo for a bigger size of gene list. This set of genes was selected according to the criteria that pairs’ correlation values to the top 5 % (95 percentile) in either tumor or normal samples in each cancer type due to the fact that the range of correlation values varies substantially among different cancer datasets.
Visualize relationships between miRNA and mRNA pairs using MMiRNA-Viewer
Only pairs with high confidence (FDR values are less than 0.1) were uploaded into MMiRNA-Viewer for visualization. The visualization graph presented by MMiRNA-Viewer is supported by the Node JS Application (D3.js and JQuery: https//d3js.org) with browser side application prototype. The algorithm for drawing out the graph starts with the links between mRNA and miRNA pairs. Links indicate the databases that validated the connection and normal/tumor correlation. We use link color intensity to represent the number of predictions by different databases. Then nodes are drawn based on its type. Basically we use squares to represent mRNA while circles represent miRNA. Motion actions like click, drag, and double click are also attached to the nodes.
The MMiRNA-Viewer can be played using the following steps:
Upload input miRNA and mRNA pair data. The file uploaded to the this tool is a text file and should contain expression correlations between mRNA and microRNA, mRNA and microRNA normal/tumor P() and FDR values, Number of Databases that validated the connection and Normal/Tumor correlation, and Fold Changes values for miRNA and mRNA pairs.
Search and filter miRNA and mRNA pairs. Users can search for a specific node in the graph by inserting the gene name in the search box. After inputting an mRNA or microRNA name, the user can click the search button to center its position in the graph. The ID search is case-sensitive. Additionally, there are various filters that can be applied to filter data at will. For example, when users select Node Filter as “mRNA” and Connections Filter as “>10”, then only the nodes representing mRNA with connected nodes greater than 10 will be highlighted in the graph.
Show miRNA and mRNA pair data. Users can click on two nodes that are connected with each other to get the annotation values in normal and tumor samples, which are directly displayed in the table right below the “Filters”. Users can collapse and expand the legend on the top right corner in the graph by clicking the legend icon.
Calculate Pearson Correlation Coefficient (PCC) values between MMiRNA-Viewer and CrossHub for 100 miRNA and mRNA pairs in the HNSC tumor data set
To demonstrate merits of the co-relationship for mRNA-miRNA calculated by MMiRNA-Viewer, we compared our MMiRNA-Viewer with CrossHub , a tool that can also use the TCGA mRNA RNA-seq data and miRNA-seq data to calculate the expression correlations between miRNAs and mRNAs of each cancer type. To do so, we selected 100 unique genes and miRNAs from calculated HNSC_tumor_output data set, which contains 15,870 genes and 398 miRNAs, The shell script “miRNA-Seq-RNA-Seq.co-expr.sh” in CrossHub.1.3.3 was used to calculate the PCC values.
miRNA and mRNA data samples
The number of cancer and normal samples downloaded from TCGA for each cancer type are listed in Table 1.
Significant and inversely correlated miRNA and mRNA pairs
Upon excluding cancer types that do not have matched miRNA and mRNA samples and running customized expression calculation and database prediction scripts, the filtered expression data for the eight selected cancer types are generated. We obtained 14,505 total miRNA and mRNA pairs (both tumor and normal types) which met the statistical cutoff criteria (inverse correlation values and FDR < 0.1) for eight cancer types (Additional file 1). The target prediction results by three selected databases are shown in Fig. 2. It can be clearly seen that TargetScan consistently predicts more targeting pairs than the two other tools. The result indicates that the target prediction algorithms show similar prediction trends as in our previous study .
Opposite FC miRNA and mRNA pairs
Among 14,505 pairs, we identified 238 pairs meeting the condition of opposite FC between tumor and normal samples (Additional file 2). These pairs could be cases where mRNAs are regulated by miRNAs in opposite ways. In addition, we found that KICH, KIRP, and HNSC contain the most number of mRNA-miRNA pairs with opposite FC values between tumor and normal samples. On the other hand, LUSC is the cancer with the least number of pairs with opposite FC values between tumor and normal samples. We did not find any significant pairs with mRNA up-regulation (FC > 1) and miRNA down-regulation (FC < 1) in UCEC (Additional file 3).
Common miRNA and mRNA pairs across cancers
We also compared the number of common mRNA-miRNA pairs across the eight cancer types. We observed that the overlapping pairs between cancers are not consistent (Table 2). In particular, KIRP and HNSC have the highest overlap pairs without counting KIRP and KICH because both are kidney cancer related.
Distribution of upregulated, downregulated, and unchanged miRNAs and mRNAs between tumor and normal
In different cancers the ratio of upregulated, downregulated and unchanged miRNAs and mRNAs are different among cancers (Fig. 3). Especially, the high fraction of upregulation of miRNAs are constantly accompanied by the high fraction of downregulation of the mRNAs. The extreme case is the UCEC, which has all the miRNA upregulated and all the mRNA downregulated. There are relatively higher percentages of miRNAs upregulated in each of the BLCA, HNSC, KIRP, LIHC, LUSC, and UCEC, whereas there are fewer percentages of miRNAs upregulated in the KICH and THCA. The scatter plot of differentially expressed genes in miRNA and mRNA for all 238 pairs in eight cancers is shown in Additional file 4.
Connections in significant miRNA and mRNA correlation pair for eight cancers
We obtained connection groups for each selected cancer and calculated various centrality matrices as mentioned in the Method section. All selected cancers except LUSC contain only one connected group with greater than 20 connections/nodes (LUSC: 71; BLCA: 810; HNSC: 2954; KICH: 926; KIRP: 6109; LIHC: 1264; THCA: 1164; UCEC: 193). The nodes are selected genes and miRNAs in the largest group for each cancer. In LUSC’s case, there are three groups with node size greater than 20. A connection topology for one of the top clusters with 42 “nodes” in LUSC is shown in Fig. 4.
The centrality scores of genes and miRNAs based on various matrix measurement for the largest connected component of eight cancer types were calculated and shown in Additional file 5.
Cancer association annotation for genes and miRNAs with positive centrality measurement values
When looking at the gene and miRNA cancer association, the results revealed one hit of “both_exist”, 135 hits of “either_exist”, and 102 hits of “neither_exist”. The both_exist gene and miRNA pair is fibroblast growth factor receptor 3 (FGFR3) and hsa-miR-100 in the BLCA. FGFR3 and hsa-miR-100 pair in BLCA have been identified by this approach and both of them were reported as the bladder cancer gene and miRNA. In other words, in the BLCA cancer dataset, the top negatively correlated gene and miRNA in expression values were both annotated to be related to the cancer BLCA in the above-mentioned databases. Specifically, FGFR3 gene is reported to be involved in Bladder cancer based on the result (hsa05219) from the KEGG pathway mapping database . Previous studies  showed that the loss of hsa-miR-100 leads to upregulation of FGFR3 before its mutation. Therefore, literature provide the support that our bioinformatics pipeline can detect such pairs with the opposite regulation between mRNAs and miRNAs. Literature search results associated with cancers for other pairs which are reported here to have opposite FC values between mRNAs and miRNAs and inverse correlations between tumor and normal samples are shown in Additional file 6.
In the nodes of eight cancer types, there are 510 nodes (Genes/miRNAs) across cancer types, namely appearing in more than two different cancer types. There is one gene node across four cancer types.
There are 3 miRNA nodes across 7 cancer types. The hsa-miR-203 was decreased in esophageal adenocarcinoma compared to paired adjacent mucosa ; the hsa-miR-221 was downregulated in prostate cancer ; hsa-miR-222 was upregulated in anaplastic thyroid carcinoma . There are 4 miRNA nodes across 6 cancer types. The hsa-miR-182 was upregulated in the esophageal cancer . The hsa-miR-183 was upregulated in the metastatic breast cancer . The hsa-miR-21 was upregulated in the gastric cancer . The hsa-miR-629 that was functionally inhibited could suppress motility and invasion of the clear cell renal cell carcinoma . There are 9 miRNA nodes across 5 cancer types. The hsa-miR-106b was downregulated in the metastatic pancreatic cancer . The hsa-miR-1266 was significantly decreased in expression in gastric cancer tissues . The has-mir-141 could downregulate the expression of TM4SF1 to inhibit the pancreatic cancer cells’ migration and invasion . The hsa-miR-145 was downregulated in laryngeal squamous cell carcinoma . The hsa-miR-378 was downregulated in cutaneous squamous cell carcinoma . The hsa-miR-454 was downregulated in osteosarcoma . The hsa-miR-484 was upregulated in expression in the serum samples of early breast cancer patients . The hsa-miR-625 was upregulated in bile duct cancer . The hsa-miR-940 was downregulated in hepatocellular carcinoma tissues and cell lines .
Gene functional enrichment analysis
We searched the Database for Annotation, Visualization, and Integrated Discovery (DAVID)  for functional information about 238 pairs of genes with their predicted targeting miRNAs at high correlation values. Enrichment of these genes was found in several GO biological processes. Some of these genes are involved in protein transport and localization, Some of these genes are involved in regulation of protein modification process, some of these genes are associated with oxidation-reduction, and some genes are involved in protein modification metabolic process. Genes associated GO terms from 238 pairs are shown in Table 3.
We used Cytoscape ClueGO plugin to do the GO enrichment of the 32 files of the top 5 % expression correlation coefficient of different levels at Normal High (NH), Normal Low (NL), Tumor High (TH), Tumor Low (TL) genes of eight cancers. Only BLCA, HNSC, KICH, KIRP, LIHC, and THCA have GO enrichment results. BLCA GO terms were only found in TL dataset, so we only used HNSC, KICH, KIRP, LIHC, and THCA to discover GO terms that would be shared by NH and TL or by NL and TH in each of the five cancers.
The GO terms shared by NH and TL or by NL and TH in HNSC, KICH, KIRP, LIHC, and THCA were denoted by Venn diagrams (Additional file 7). Only HNSC, KICH, and KIRP had GO terms shared by NH and TL or by NL and TH. In LIHC and THCA there was no GO terms shared by NH and TL or by NL and TH. We found that there were 35 shared GO terms in HNSC_shared_GO_in_NH_TL, 2 in HNSC_shared_GO_in_NL_TH, 41 in KICH_shared_GO_in_NH_TL, 2 in KICH_shared_GO_in_NL_TH, 99 in KIRP_shared_GO_in_NH_TL, 47 in KIRP_shared_GO_in_NL_TH, 0 in LIHC_shared_GO_in_NH_TL, 0 in LIHC_shared_GO_in_NL_TH, 0 in THCA_shared_GO_in_NH_TL, and 0 in THCA_shared_GO_in_NL_TH.
It is interesting that the HNSC_shared_GO_in_NH_TL and KICH_shared_GO_in_NH_TL have many common GO terms like mitotic sister chromatid segregation, G1/S transition of mitotic cell cycle, regulation of transcription involved in G1/S transition of mitotic cell cycle, and others. HNSC_shared_GO_in_NL_TH and KICH_shared_GO_in_NL_TH have the exact two common GO terms, histone deacetylase complex and positive regulation of transporter activity. These common GO terms suggest that the HNSC and KICH cancer have genes of similar functions that interacts with the miRNA, but the expression correlation coefficients between these genes and miRNAs are reversed by both cancers. Genes associated GO terms for these larger datasets are shown in Additional file 8.
We have developed a prototype visualization tool -MMiRNA-Viewer with a friendly user interface. Currently MMiRNA-Viewer supports various types of miRNA-mRNA co-expression profile data. It also has the function of filtering various miRNAs and mRNA pairs for better prediction accuracy. The tool also presents analysis results in intuitive visualizations and support dynamic uploading and comparison of files to help users search biological annotation of customized miRNAs and mRNAs pairs. Specifically, the graph shows two types of nodes summarizing average expression information for mRNAs and miRNAs, and two types of links representing target relationships between miRNA and mRNA pairs in normal and tumor samples. Users can visualize detailed information about cancer-related gene expression changes, and also changes in the expression of transcription-regulating miRNAs for well-characterized cancer genomes. Users can also drag nodes and their associated partners to isolated positions for better visualization. This can help users to better study the interaction relationships of a miRNA and mRNA pair and can make a node-link diagram more suitable for publication. Moreover, with displayable gene/miRNA labeling for each node, users can capture the screenshots of some sub-networks for their publications. Not only does MMiRNA-Viewer allow for the viewing of a single sample’s mRNA miRNA pairs, but this tool also has the functionality to display pairs in common between two separate files (e.g. customized input file vs built-in TCGA datasets). An interface screenshot of MMiRNA-Viewer is shown in Fig. 5.
MMiRNA-Viewer is available at http://bioinf1.indstate.edu/MMiRNA-Viewer.
Comparison of Pearson Correlation Coefficient (PCC) values between MMiRNA-Viewer and CrossHub for 100 miRNA and mRNA pairs
When compared to a short running time (less than one minute) using MMiRNA-Viewer, it took CrossHub 9 min to calculate correlation coefficient values for 1836 pairs. The average value of absolute difference between two methods is ~0.07. The calculated correlation values by two methods are shown in Additional file 9.
In this study, we developed a web-based tool MMiRNA-Viewer to upload input data files and visualize the results in an intuitive way that user can quickly locate their interested pairs for further functional study. The current version has many interesting features, but its functionalities can be further expanded. For instance, we would like to incorporate other functionalities with publicly available literature search and pathways mapping study.
Our study assumed that the tumor is activated via mRNAs mutation directly targeted by miRNAs. Results that miRNAs and mRNAs are inversely correlated exist in 238 selected pairs for eight studied cancers. Our protocol that inversely correlated miRNA and mRNA pairs should be oppositely regulated in tumor samples could be useful for prioritizing cancer associated pairs.
Current example input files for MMiRNA-Viewer were generated using TCGA cancer sequencing data (Level 3) to compute the correlation coefficients and target relationships for miRNA and mRNA pairs between tumor and normal samples using three state-of-art database prediction algorithms. We took FDR cutoff (<0.1) to select significantly correlated miRNA and mRNA pairs in the initial step so that more pairs potentially associated with cancers through inverse correlation could be included. Under the assumption that miRNA and mRNA pairs are often inversely correlated and their FC values should be opposite between tumor and normal samples, we obtained 238 pairs for GO biological function and pathway studies.
We employed a modified spanning tree centrality method for calculation. We have found that due to the definition of spanning tree, some nodes have pre-determined values of simply 1 and 0 and thus are not very interesting and can be predicted. This does help to calculate the result easier as we can remove some nodes and still generate the exact results for all nodes that cannot be predicted, thus increasing speed and precision of the calculation. We can then go back and fill in or modify the predictable values. Further, we have introduced a method to modify the value of the 1 s to better reflect their actual centrality while still using the spanning tree method as a back bone. For KIRP cluster, we used an altered approach for the calculation of spanning tree due to the large number of connections in the cluster. Specifically, the algorithm first found the most distant point from each “node” (gene or miRNA), then broke the nodes into two groups and removed branches from each two halves to calculate adjusted spanning tree values. Finally, two sets of results were merged according to original connections to report final spanning tree values.
Unlike our algorithm that filters out the pairs using P-values and FDR of the calculated correlation coefficients and using the above-mentioned 3-database matching, CrossHub, doesn’t filter out the calculated correlation coefficients of miRNAs and mRNAs via their reported P-values, via unreported FDR of their correlation coefficients, and via their reported miRNA-mRNA targeting databases. The two programs are basically consistent with respect to the calculation of the expression correlation coefficients of the miRNA-mRNA in the same dataset (HNSC) in TCGA. The slight difference of calculated correlation coefficient values might be due to the fact that CrossHub adopted read number mapped to each genes, whereas MMiRNA-Viewer uses TPM, so the mapping process might cause differences. In CrossHub pipeline, “By default, CrossHub performs the analysis for top 30 overexpressed microRNA with read counts (across all the samples) from 20,000 to 4,000,000. These miRNA represent the most important fraction in the context of biological regulation”. In addition, the memory usage of CrossHub is huge and running time is quite long.
We developed an intuitive graph visualization tool MMiRNA-Viewer that provides a user interface for various cancer types of miRNA-mRNA expression data with significant interaction pairs. Our MMiRNA-Viewer supports dynamic queries through intuitive user interactions to help users search meaningful cancer biological findings. We also suggested that miRNA and mRNA pairs with opposite FC values of their expression and with inverted correlation values between tumor and normal samples might be potential pairs to be responsible for transcription decoupling for certain cancer types.
The Cancer Genome Atlas
National Cancer Institute
National Human Genome Research Institute
Baylor College Human Genome Sequencing Center
University of North Carolina at Chapel Hill
Transcripts Per Million
Pearson correlation coefficient
False discovery rate
Kyoto encyclopedia of genes and genomes
Catalogue of somatic mutations in cancer
Online mendelian inheritance in man
Human MicroRNA disease database
Database for annotation, visualization and integrated discovery
Bladder urothelial carcinoma
Breast invasive carcinoma
Cervical squamous cell carcinoma and endocervical adenocarcinoma
Lymphoid neoplasm diffuse large B-cell Lymphoma
FFPE Pilot Phase II
Head and neck squamous cell carcinoma
Kidney renal clear cell carcinoma
Kidney renal papillary cell carcinoma
Acute myeloid leukemia
Brain lower grade glioma
Liver hepatocellular carcinoma
Lung squamous cell carcinoma
Ovarian serous cystadenocarcinoma
Pheochromocytoma and Paraganglioma
Skin cutaneous melanoma
Testicular germ cell tumors
Uterine corpus endometrial carcinoma
Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol. 2015;19(1A):A68–77.
Nunez YO, Truitt JM, Gorini G, Ponomareva ON, Blednov YA, Harris RA, Mayfield RD. Positively correlated miRNA-mRNA regulatory networks in mouse frontal cortex during early stages of alcohol dependence. BMC Genomics. 2013;14(1):1–21.
Guo H, Ingolia NT, Weissman JS, Bartel DP. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature. 2010;466(7308):835–40.
Li P, Sheng C, Huang L, Zhang H, Huang L, Cheng Z, Zhu Q. MiR-183/-96/-182 cluster is up-regulated in most breast cancers and increases cell proliferation and migration. Breast Cancer Res. 2014;16(6):1–17.
Cao LL, Xie JW, Lin Y, Zheng CH, Li P, Wang JB, Lin JX, Lu J, Chen QY, Huang CM. miR-183 inhibits invasion of gastric cancer by targeting Ezrin. Int J Clin Exp Pathol. 2014;7(9):5582–94.
Shahab SW, Matyunina LV, Mezencev R, Walker LD, Bowen NJ, Benigno BB, McDonald JF. Evidence for the complexity of MicroRNA-mediated regulation in ovarian cancer: a systems approach. PLoS One. 2011;6(7):e22508.
Jansson MD, Lund AH. MicroRNA and cancer. Mol Oncol. 2012;6(6):590–610.
Jacobsen A, Silber J, Harinath G, Huse JT, Schultz N, Sander C. Analysis of microRNA-target interactions across diverse cancer types. Nat Struct Mol Biol. 2013;20(11):1325–32.
Liu Y, Baker S, Jiang H, Stuart G, Bai Y. Correlating bladder cancer risk genes with their targeting microRNAs using MMiRNA-Tar. Genomics Proteomics Bioinformatics. 2015;13(3):177–82.
Oulas A, Karathanasis N, Louloupi A, Iliopoulos I, Kalantidis K, Poirazi P. A new microRNA target prediction tool identifies a novel interaction of a putative miRNA with CCND2. RNA Biol. 2012;9(9):1196–207.
Agarwal V, Bell GW, Nam J-W, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife. 2015;4:e05005.
John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. Human MicroRNA targets. PLoS Biol. 2004;2(11):e363.
Qi X, Fuller E, Luo R, Zhang C-q. A novel centrality method for weighted networks based on the Kirchhoff polynomial. Pattern Recogn Lett. 2015;58:51–60.
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62.
Kinsella RJ, Kahari A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, Kersey P, Flicek P. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford). 2011;2011:bar030.
Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43(D1):D805–11.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database Issue):D514–7.
Vlachos IS, Paraskevopoulou MD, Karagkouni D, Georgakilas G, Vergoulis T, Kanellos I, Anastasopoulos IL, Maniou S, Karathanou K, Kalfakakou D, Fevgas A, Dalamagas T, Hatzigeorgiou AG. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucleic Acids Res. 2015;43(Database issue):D153–9.
Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui Q. An analysis of human MicroRNA and disease associations. PLoS One. 2008;3(10):e3420.
Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37 suppl 1:D98–104.
da Huang W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman W-H, Pagès F, Trajanoski Z, Galon J. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091–3.
Krasnov GS, Dmitriev AA, Melnikova NV, Zaretsky AR, Nasedkina TV, Zasedatelev AS, Senchenko VN, Kudryavtseva AV. CrossHub: a tool for multi-way analysis of The Cancer Genome Atlas (TCGA) in the context of gene expression regulation mechanisms. Nucleic Acids Res. 2016.
Catto JW, Miah S, Owen HC, Bryant H, Myers K, Dudziec E, Larre S, Milo M, Rehman I, Rosario DJ, et al. Distinct microRNA alterations characterize high- and low-grade bladder cancer. Cancer Res. 2009;69(21):8472–81.
Hezova R, Kovarikova A, Srovnal J, Zemanova M, Harustiak T, Ehrmann J, Hajduch M, Svoboda M, Sachlova M, Slaby O. Diagnostic and prognostic potential of miR-21, miR-29c, miR-148 and miR-203 in adenocarcinoma and squamous cell carcinoma of esophagus. Diagn Pathol. 2015;10:42.
Xiaoli Z, Yawei W, Lianna L, Haifeng L, Hui Z. Screening of target genes and regulatory function of miRNAs as prognostic indicators for prostate cancer. Med Sci Monit. 2015;21:3748–59.
Fassina A, Cappellesso R, Simonato F, Siri M, Ventura L, Tosato F, Busund LT, Pelizzo MR, Fassan M. A 4-MicroRNA signature can discriminate primary lymphomas from anaplastic carcinomas in thyroid cytology smears. Cancer Cytopathol. 2014;122(4):274–81.
Zhao JY, Wang F, Li Y, Zhang XB, Yang L, Wang W, Xu H, Liu DZ, Zhang LY. Five miRNAs considered as molecular targets for predicting esophageal cancer. Med Sci Monit. 2015;21:3222–30.
Marino AL, Evangelista AF, Vieira RA, Macedo T, Kerr LM, Abrahao-Machado LF, Longatto-Filho A, Silveira HC, Marques MM. MicroRNA expression as risk biomarker of breast cancer metastasis: a pilot retrospective case-cohort study. BMC Cancer. 2014;14:739.
Wang D, Fan Z, Liu F, Zuo J. Hsa-miR-21 and Hsa-miR-29 in tissue as potential diagnostic and prognostic biomarkers for gastric cancer. Cell Physiol Biochem. 2015;37(4):1454–62.
Jingushi K, Ueda Y, Kitae K, Hase H, Egawa H, Ohshio I, Kawakami R, Kashiwagi Y, Tsukada Y, Kobayashi T, et al. miR-629 Targets TRIM33 to Promote TGFbeta/Smad Signaling and Metastatic Phenotypes in ccRCC. Mol Cancer Res. 2015;13(3):565–74.
Luo ZL, Luo HJ, Fang C, Cheng L, Huang Z, Dai R, Li K, Tian FZ, Wang T, Tang LJ. Negative correlation of ITCH E3 ubiquitin ligase and miRNA-106b dictates metastatic progression in pancreatic cancer. Oncotarget. 2016;7(2):1477–85.
Chen L, Lu MH, Zhang D, Hao NB, Fan YH, Wu YY, Wang SM, Xie R, Fang DC, Zhang H. miR-1207-5p and miR-1266 suppress gastric cancer growth and invasion by targeting telomerase reverse transcriptase. Cell Death Dis. 2014;5:e1034.
Xu L, Li Q, Xu D, Wang Q, An Y, Du Q, Zhang J, Zhu Y, Miao Y. hsa-miR-141 downregulates TM4SF1 to inhibit pancreatic cancer cell invasion and migration. Int J Oncol. 2014;44(2):459–66.
Karatas OF, Suer I, Yuceturk B, Yilmaz M, Hajiyev Y, Creighton CJ, Ittmann M, Ozen M. The role of miR-145 in stem cell characteristics of human laryngeal squamous cell carcinoma Hep-2 cells. Tumour Biol. 2015.
Sand M, Skrygan M, Georgas D, Sand D, Hahn SA, Gambichler T, Altmeyer P, Bechara FG. Microarray analysis of microRNA expression in cutaneous squamous cell carcinoma. J Dermatol Sci. 2012;68(3):119–26.
Niu G, Li B, Sun J, Sun L. miR-454 is down-regulated in osteosarcomas and suppresses cell proliferation and invasion by directly targeting c-Met. Cell Prolif. 2015;48(3):348–55.
Zearo S, Kim E, Zhu Y, Zhao JT, Sidhu SB, Robinson BG, Soon P. MicroRNA-484 is more highly expressed in serum of early breast cancer patients compared to healthy volunteers. BMC Cancer. 2014;14:200.
Wang M, Wen TF, He LH, Li C, Zhu WJ, Trishul NM. A six-microRNA set as prognostic indicators for bile duct cancer. Int J Clin Exp Med. 2015;8(10):17261–70.
Yuan B, Liang Y, Wang D, Luo F. MiR-940 inhibits hepatocellular carcinoma growth and correlates with prognosis of hepatocellular carcinoma patients. Cancer Sci. 2015;106(7):819–24.
The results published here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. We thank Andrew Carrillo for assisting web design, Yihua Bai for assisting to debug Python scripts, and Hunter Gill, Meghan VanWanzeele, and Puni Rawala for helping with data check. We also thank Cheng Zhao for offering valuable mathematical advice.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 13, 2016: Proceedings of the 13th Annual MCBIOS conference. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-13.
This research and its publication were supported by the Start-Up funds, University Research Committee (URC) and COMPETE grants of Indiana State University, USA to YB.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its Supplementary files.
YB designed and supervised the project, provided biological interpretation, and wrote the manuscript. LD performed cancer association analyses and helped with data preparation. SB designed the web interface and deployed the software on the server. SB wrote C code to perform pairwise correlation calculation and target database prediction. JB downloaded the cancer data and calculated statistics and selected pairs in each step. ER conducted target prediction algorithm comparison and provided MMiRNA-Viewer documentation. FJ performed centrality calculation. JW performed the GO analysis. HJ wrote R code for statistical filtering and offered statistical advice. GS was involved in developing the MMiRNA-Viewer, interpreting findings, and critically reviewing this manuscript for content and accuracy. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
14,505 total miRNA and mRNA pairs (both tumor and normal types) which met the statistical cutoff criteria (inverse correlation values and FDR < 0.1). This file contains 14,505 pairs for eight cancer types. (XLSX 1655 kb)
238 miRNA and mRNA pairs (both tumor and normal types) which met the cutoff criteria of inverse correlation and opposite fold change values and FDR < 0.1. This file contains 14,505 pairs for eight cancer types. (XLSX 93 kb)
Statistics of the presence in eight cancer types for 238 miRNA and mRNA pairs (both tumor and normal types) which met the cutoff criteria of inverse correlation and opposite fold change values and FDR < 0.1. (XLSX 48 kb)
The scatter plot of differentially expressed genes in miRNA and mRNA for all 238 pairs in eight cancers. In this file, the expression mean values of the miRNA and mRNA of the 238 pairs of each cancer are plotted. The X-axis value of a dot in the plot is the miRNA or mRNA average expression value in the normal tissues. The Y-axis value of a dot in the plot is the miRNA or mRNA average expression value in the tumor tissues. (PDF 331 kb)
The centrality scores of genes and miRNAs based on various matrix measurement for the largest connected component of eight cancer types. This file contains results for the largest cluster for all seven other cancers and the second largest cluster for LUSC. (XLSX 886 kb)
Cancer association results for 238 pairs which are reported to have opposite FC values between mRNAs and miRNAs and inverse correlations between tumor and normal samples. (XLSX 23 kb)
The Venn Diagrams of GO terms overlapping for the top 5 % expression correlation coefficient of different levels at Normal High (NH), Normal Low (NL), Tumor High (TH), Tumor Low (TL) genes of eight cancers. (PDF 920 kb)
The Genes and their associated GO terms for the top 5 % expression correlation coefficient of different levels at Normal High (NH), Normal Low (NL), Tumor High (TH), Tumor Low (TL) genes of eight cancers. (XLSX 44 kb)
Comparison of correlation coefficient values between MMiRNA-Viewer and CrossHub (XLSX 156 kb)
About this article
Cite this article
Bai, Y., Ding, L., Baker, S. et al. Dissecting the biological relationship between TCGA miRNA and mRNA sequencing data using MMiRNA-Viewer . BMC Bioinformatics 17, 336 (2016). https://doi.org/10.1186/s12859-016-1219-y