An integrated network of microRNA and gene expression in ovarian cancer

Background Ovarian cancer is a deadly female reproductive cancer. Understanding the biological mechanisms underlying ovarian cancer could help lead to quicker and more accurate diagnosis and more effective treatments. Both changes in microRNA(miRNA) expression and miRNA/mRNA dysregulation have been associated with ovarian cancer. With the availability of whole-genome miRNA and mRNA sequencing we now have new potentials to study these associations. In this study, we performed a comprehensive analysis of miRNA and mRNA expression in ovarian cancer using an integrative network approach combined with association analysis. Results We developed an integrative approach to construct a network that illustrates the complex interplay among miRNA and gene expression from a systems perspective. Our method is composed of expanding networks from eQTL associations, building network associations in eQTL analysis, and then combine the networks into an integrated network. This integrated network takes account of miRNA expression quantitative trait loci (eQTL) associations, miRNAs and their targets, protein-protein interactions, co-expressions among miRNAs and genes respectively. Applied to the ovarian cancer data set from The Cancer Genome Atlas (TCGA), we created an integrated network with 167 nodes containing 108 miRNA-target interactions and 145 from protein-protein interactions, starting from 44 initial eQTLs. This integrated network encompassed 26 genes and 14 miRNAs associated with cancer. In particular, 11 genes and 12 miRNAs in the integrated network are associated with ovarian cancer. Conclusion We demonstrated an integrated network approach that integrates multiple data sources at a systems level. We applied this approach to the TCGA ovarian cancer dataset, and constructed a network that provided a more inclusive view of miRNA and gene expression in ovarian cancer. This network included four separate types of interactions among miRNAs and genes. Simply analyzing each interaction component in isolation, such as the eQTL associations, the miRNA-target interactions or the protein-protein interactions, would create a much more limited network than the integrated one.


Background
Ovarian cancer is the deadliest reproductive cancer in women, accounting for 5% of female cancer deaths. It is estimated that there will be 21,980 new cases and 14,270 deaths from ovarian cancer in 2014 [1]. The overall 5 year relative survival is less than 44%, but when diagnosed in the earliest stages the 5 year survival is over 90%. [2]. Therefore, a better understanding of the biological mechanisms of ovarian cancer is crucial to create earlier diagnosis and more effective treatment.
MicroRNAs (miRNAs) are small (~22 nucleotides) noncoding RNAs that regulate gene expression by targeting complementary mRNA, which triggers a translational blockade or degradation [3]. miRNAs have been increasingly recognized as an important participant of gene regulation activity [4] that affect various cellular processes [5] and contribute to disease pathogenesis in a wide variety of diseases [6] including cancer [7]. Changes in miRNA expression and miRNA dysregulation have been found in cancer, including ovarian cancer [8][9][10][11][12][13]. In another recent study, miRNAs were described as regulators of tumor phenotype [14]. Early studies show that miRNA expression profiles can be used to classify cancer phenotypes [15]. A study by The Cancer Genome Atlas (TCGA) found three miRNA subtypes in high grade serous ovarian cancer [16]. By exploiting the miRNA expression profiles and their effects on gene expression and other phenotypes in cancer, more profound markers for prognosis can be identified and utilized.
Genetic variation has also been determined to have functional effects on cancer through miRNA expression as well. Various studies have indicated the genetic variants such as single nucleotide polymorphisms (SNPs) are associated with cancer risk in individuals [17]. Particularly, eQTL analysis [18][19][20][21][22][23][24][25] provides a powerful tool to study how genetic variation affects quantitative traits, including measurable phenotypes like gene or miRNA expression. For instance, miRNA eQTL analysis [6,26,27] has also been performed to find genetic loci that introduce differences in miRNA expression.
In general, eQTL analysis analyzes the effect of genetic variation on gene expression by utilizing correlation or regression analysis between the genotypes of a genetic variant and a particular trait [25]. Such analysis assumes that genetic variants are independent, and that the expressions of different genes are independent as well. Nonetheless, many genetic variants are correlated, and the genes can be co-regulated as well. To address the correlation or interactions among these genetic variants or genes, new machine learning methods such as multi-task learning methods [28,29] have been designed to consider all the genes during regression while considering the network structure among genetic variants and genes [30][31][32]. New methods have also been developed to seek for other relationships, such as redundancy among eQTLs [33] where multiple genetic variants have the same effect on a particular trait and are thus exchangeable.
However, it is still unclear to what extent that changes in miRNA expression, an epigenetic marker, affect gene expression and bring about fluctuations in biological pathways and networks in ovarian cancer. It has recently been determined that some miRNAs promote epithelialto-mesenchymal transition (EMT) and in turn, EMT transition promotes cell mobility and invasion of cancerous cells [14]. In efforts to understand such complex effects of miRNA expression on gene expression, we develop an integrated network approach that combines expression quantitative trait locus (eQTL) analysis with network analysis using various types of relationships and interactions among miRNAs and genes.
In order to understand the effects discussed above, we perform a genome wide eQTL analysis between miRNA expression and gene expression in ovarian cancer. The suggested method can help identify a set of miRNA eQTLs that have significant impact on gene expression. Using these miRNA eQTLs and their perturbed genes as seed nodes for network expansion, we created an integrated network that utilizes multiple types of networks such as miRNA targets and protein-protein interactions. In parallel, we used another graph guided machine learning model for eQTL analysis that incorporates correlations among miRNAs and genes simultaneously [30]. With these various types of relationships combined, we are able to generate an integrated network that not only captures different types of interactions among these miRNAs and genes, but also includes important cancer genes embedded in a network view. Such an integrative approach thus generates insightful results that could shed light on the interaction of miRNA, gene expression and their contribution to the development and progression of ovarian cancer.

Methods
We developed an integrative approach that combines the strength of eQTL analysis with network analysis, toward the goal of building an integrated network to understand the relationship between miRNA expression and gene expression. The integrated network is composed of various types of relationships among miRNAs and genes, including miRNA-gene eQTL associations, miRNAs and their targets, protein-protein interactions, and correlation networks of both miRNA and gene expression.
The overall workflow of our integrative approach is illustrated in Figure 1. Our approach consisted of five phases, including data preprocessing, eQTL analysis, network association, network expansion and network integration. Prior to analysis, we performed data pre-processing in which, we collected and normalized the data. The next phase of our workflow was the eQTL analysis between miRNA and gene expression. Here, we used Matrix eQTL to perform the analysis. In the network expansion phase, we used TarBase [34] to identify targets based on the eQTLs generated from Matrix eQTL [35]. Next, we found protein-protein interactions connected with the identified miRNA targets. In order to select the most confident interactions, we chose the interactions with a p-value ≤ 0.05. From there, we expanded the network using the selected protein-protein interactions using DAPPLE [36], generating the initial integrated network. Simultaneously, in the network association phase we used MtLasso2G [30] to discover network associations between miRNAs to build miRNA correlation networks. Finally, we merged the eQTL expanded network and the miRNA correlation networks to create the integrated network.

Data preprocessing
We downloaded miRNA isoform and mRNA expression data for 480 samples of serous ovarian cancer from the Figure 1 The overall workflow of our integrative approach. The workflow to create the integrated network consisted of four phases; eQTL analysis, network association, network expansion and network integration. We began with data pre-processing in which we collected data and normalized it. The first phase of the workflow started with an eQTL analysis between miRNA and gene expression data using Matrix eQTL. Once the eQTL analysis was complete, we proceeded to the network expansion phase. In the network expansion phase, we used TarBase to identify miRNA targets based on the eQTLs found in Matrix eQTL. Next, we found protein-protein interactions for the identified targets. To select the most confident interactions, we chose the interactions with a p-value ≤ 0.05. Then we expanded the network with the selected protein-protein interactions using DAPPLE, creating the first integrated network. Simultaneously, we used MtLasso2G to discover network associations between miRNAs, creating miRNA correlation networks. We then combined the eQTL expanded network with the correlation networks to create the final integrated network. TCGA data portal [37]. Both data sets were generated on the Illumina HiSeq platform. To pre-process the data, we removed samples with missing data and extracted quantifications for 183 miRNAs and 13,536 genes. To reduce the variation between samples, we used between-sample quantile normalization to normalize both the miRNA and gene expression data.

Network expansion
To help exploit the relationships between miRNA and gene expression, we performed an eQTL analysis between the miRNA expression and gene expression profiles. The eQTL analysis was performed using Matrix eQTL, an R package [35]. Matrix eQTL was chosen because of its efficient computational time with large datasets [35] and its proven efficiency in many large studies [22,23]. We performed cis eQTL analysis, whereby we only tested miRNA and gene pairs that were within 1 MB of each other. To test for associations between the miRNA and genes, we used a linear regression model provided by Matrix eQTL. To correct for multiple comparisons, we utilized a multi-test correction based on false discovery rate (FDR) and chose the significant eQTLs with a FDR cutoff at < 0.01 for further analysis.
Using the eQTLs identified in Matrix eQTL as seed nodes for an initial network, we searched for linked targets of the miRNAs to expand the network. In order to identify miRNA targets, we used TarBase to generate a list of the miRNAs and their corresponding gene targets [34]. Tar-Base is a manually curated database of all experimentally verified miRNA targets. In generating the list, we discovered that some miRNAs may have multiple designated names associated with the 3' untranslated region (UTR) and the 5' untranslated region. In this case, we used both variations to keep it consistent with our data. For example, if there was an ambiguous miRNA name used in the TarBase search, it was replaced by both the -3' and -5' version of the name.
We used the miRNA target genes gathered from Tar-Base, in combination with the perturbed genes from the eQTLs, to further extend the network using proteinprotein interactions. We chose the Disease Associated Protein-Protein Link Evaluator (DAPPLE) [36] for this step of network expansion since it's been well evaluated in finding candidate genes through expanding networks on signals from genome wide association studies [36,38,39]. By using DAPPLE on the miRNA target genes and the miRNA perturbed genes in the eQTL associations, we were able to find protein-protein interactions for miRNA targets and the miRNA eQTL perturbed genes. To perform the link evaluation, DAPPLE uses a statistical model to extend a network from an input seed gene list. The protein-protein interactions were derived from the InWeb database [40].
At each network expansion step, DAPPLE selects a new edge from the existing node set with the highest probability based on its significance against random permutations of the protein-protein interaction network. During the permutations, the network topology is conserved. For our analysis, we performed 1000 random permutations of the network. As part of our selection criteria, we only selected the interactions that contained a seed gene with a corrected p-value of ≤ 0.05. This allowed us to keep those interactions with a high-confidence for further analysis. We then overlapped the selected protein-protein interactions, the eQTLs and the miRNA-target interactions to create an initial integrated network.

Network associations
In parallel with the network expansion, we used a twograph guided multitask Lasso (MtLasso2G) to perform a second eQTL analysis [30]. MtLasso2G is an orthologous approach capable of determining network associations. Unlike Matrix eQTL, which we used to perform linear regression for each pair of miRNA and gene independently for eQTL analysis, MtLasso2G uses a sparse learning model that incorporates the relationships among miRNAs and genes. Here, we used a correlation expression network among miRNAs, and another co-expressed network among genes that potentially captured the structure of coregulation networks. The MtLasso2G model accomplished eQTL analysis by adding two regularization terms based on the two graphs on features (i.e. miRNA expression) and labels (i.e. gene expression), which are encoded as miRNA co-expression network and gene co-expression network in our analysis respectively. The model can then be learned by minimizing the following objective function in a linear system: The two graphs, G 1 = (V 1 , E 1 ) and G 2 = (V 2 , E 2 ) represent the correlation networks among genes and miRNAs respectively. V 1 and V 2 are the nodes in G 1 and G 2 . E 1 and E 2 are the edges representing the correlations among the corresponding vertices. w(·) is the edge weight. Individual edges in G 1 and G 2 are represented as e m,l and e f,g respectively. The correlation between y m and y l is denoted as r m,l , and r f,g denotes the correlation between y f and y g . The learned b matrix from the model represents the association between each feature and label. Those nonzero terms stand for the identified eQTLs. As the MtLasso2G model captures the correlation structures among the miRNAs and genes, it produces not only a set of eQTLs, but also network associations. These network associations include a subnetwork capturing miRNA correlations, and another subnetwork representing the correlations among the perturbed genes.

Network integration
We finally created an integrated network by merging the expanded network extended using DAPPLE, and the network associations generated from the MtLasso2G model. Specifically, we added representations of the correlation graphs from MtLasso2G to the expanded network from DAPPLE as following. If one miRNA was present in the current integrated network, we added highly correlated miRNAs. If both genes were already present in the current integrated network, we added an edge to the network.
The edges between miRNAs and perturbed genes from network associations in MtLasso2G are added to the integrated network as well. By merging the networks generated from DAPPLE and MtLasso2G as stated above, we created the final integrated network which was then visualized using Cytoscape [41]. Integrating the networks in this manner allowed us to conserve many of the predetermined relationships found early on in the analysis.

Results and discussion
Our approach generates an integrated network (Figure 2) that incorporates the comprehensive relationship between miRNA and gene expression. We create this network from multiple sources of data and analysis including miRNA eQTL and associated genes, miRNAs and their target genes, protein-protein interactions among miRNA eQTL perturbed genes and miRNA target genes. This network thus captures the complexity of the interplay among miR-NAs and genes in ovarian cancer, and point to a list of candidate miRNAs and genes for future studies.
Our method is scalable and applicable to integrating many genomic datasets. Here, as a demonstration of our approach, we apply it to the miRNA and gene expression data in ovarian cancer from TCGA [37]. After preprocessing, we use the expression of 183 miRNAs, and the expression of 13,536 genes from 480 samples.
We identified 44 miRNA eQTLs with a FDR < 0.01 using Matrix eQTL. There were 16 unique miRNAs, and 44 unique genes represented in the eQTLs. We found 310 unique target genes using TarBase, of which 244 could be used as inputs for DAPPLE. Figure 3 shows the miRNA-taget and eQTL network. There were 236 direct connections found using DAPPLE, and 145 that contained a seed gene with a corrected p-value of <0.05. We added 9 correlated miRNAs, and 18 correlated gene edges to the network based on the MtLas-so2G correlation graph.
With MtLasso2G, we found 48 eQTLs, and were able to add 8 of them to the network, composed of the miRNA-target interactions found in TarBase, the eQTLs from Matrix eQTL, and the MtLasso2G correlations. This process created the final integrated network including 167 nodes and 277 edges shown in Figure 2. Table 1 summarizes the number of cancer genes found in each part of the analysis. Of the 145 gene nodes in this integrated network, 26 were found to be cancer related, and 11 of those were associated with ovarian cancer [42,43], as illustrated in Table 2.
One of the genes found in the integrated network, AKT 1, has been implicated in tumorigenesis, specifically in ovarian cancer [44,45]. In our analysis AT K1 was found in the DAPPLE extended network, the miRNAtarget network and the MtLasso2G gene-correlation graph. Figure 4 shows a subnetwork involving AKT 1 of our integrated network. This subnetwork involves 5 miR-NAs and 12 genes, connected by 31 edges. Three of the edges are eQTLs, and six edges were miRNA-target Figure 3 The miRNA-target and eQTL network. This network contains the eQTLs as well as the target genes from TarBase for the miRNAs. The initial integrated network is formed when the protein-protein interactions for the genes in this network are added.
interactions. Another gene in the subnetwork was ESR1, which has been associated with breast cancer, and ESR1 expression has been suggested as a predictor of ovarian cancer survival [46][47][48].
Interestingly, in addition to capturing known ovarian cancer genes, this subnetwork provides a set of miRNAs and genes interacting with these known cancer genes. We propose that these interactive miRNAs and genes offer a new list of potential miRNA and gene markers for tumorigenesis, particularly in ovarian cancer.
Another interesting gene in the integrated network is E2F 1 (Figure 2), which was targeted by three miRNAs. E2F 1 has been studied as a predictor of drug resistance in ovarian cancer, and the down regulation of E2F 1 has been shown to inhibit growth of ovarian cancer [49,50]. Therefore, E2F 1 and their interactive partners are worthy of follow up studies for tumorigenesis as well.
We then analyzed the role of the miRNA eQTLs in different diseases including cancer and particularly ovarian cancer. Using multiple datasets (HMDD2, miRCancer, Phenomir, miRdSNP, miR2Disease, miRPD and Marchini) [51][52][53][54][55][56][57], we found that 19 of the 22 miRNAs have been implicated in some type of disease. Of these miRNAs, 14 were associated with a form of cancer, and 12 were specifically associated with ovarian cancer. Table 3 lists these cancer and ovarian cancer related miRNAs in our integrated network. Some of these miRNAs have demonstrated biological and clinical utility and could serve as disease biomarkers. For example, increased expression of hsa-miR-200c, one of the miRNAs found in our analysis, has been associated with increased survival, decreased chance of relapse and increased sensitivity to paclitaxel, a chemotherapy drug [57][58][59]. hsa-mir-22 has been identified as an marker for both survival and recurrence of ovarian cancer [60]. hsa-mir-22 has also been suggested as an inhibitor of cell movement and invasion in ovarian cancer [61]. Both hsa-mir-22 and hsa-mir-200c are found in the subnetwork, with hsa-mir-22 targeting ESR1.   Figure 4 A subnetwork illustrating multiple data sources for network construction. This subnetwork illustrates how the multiple data sources are involved in the integrated network. If the network just contained the eQTLs in green, and the miRNA-target interactions in red, most of the connections would not exist. Adding the protein-protein interactions of the genes, allows us to visualize potential downstream effects.

Conclusions
Changes in miRNA and mRNA expression are known to be involved in both ovarian cancer development and progression. Pin-pointing the exact changes and the relationships that occur between them could lead to advances in how ovarian cancer is treated and diagnosed. Creating an integrated network involving eQTLs, miRNA targets, protein-protein interactions and correlation graphs is one way to explore these relationships. Integrating multiple data sources allows us to create a wider and more holistic view of the interactions in ovarian cancer. This can lead us to understanding more about the underlying mechanisms that affect progression in ovarian cancer. Therefore, in this paper we developed a new method of constructing an integrated network by combining the strength of eQTL study and network analysis. As a demonstration, we applied our method to TCGA ovarian cancer data and constructed an integrated network of miRNA and gene interactions in ovarian cancer. Utilizing disparate data sources allowed us to find relationships that would not have been visible otherwise. The network information we incorporated included protein-protein interactions, eQTL associations and miRNA-target interactions. None of those interactions alone would have shown all of the connections, but combining all these network information together allows us to see the complex interplay between miRNAs and genes in ovarian cancer. Our integrated analysis found miRNAs and genes linked to cancer and specifically to ovarian cancer. Utilizing integrated approaches like this one, could lead to a more comprehensive understanding of the biology that drives ovarian cancer.
In this study we used multiple data sources including one miRNA target database (i.e. TarBase), a proteinprotein interaction network (i.e. InWeb) and expression data from the TCGA data portal to construct an integrated network. Based on recent studies [15,17], it is known that miRNAs and genetic variation have a critical impact on cancer progression and to an extent, cancer phenotype. Our approach allows us to incorporate various data sources into an integrated network. The generated network contained miRNA to target gene relationships, eQTL associations, protein-protein interactions, the correlations or coexpressions among microRNAs and genes respectively.
In the future, we will extend our approach by incorporating multiple databases of a particular network type, such as the STRING database [62] for protein-protein interaction networks. We will also incorporate other types of network information. For example, we can include regulatory and signaling pathways accumulated in existing pathway databases. Using various data sources, we can exploit the relationships between different molecular components to help understand how ovarian cancer progresses. Although ovarian cancer is a very complex disease, our approach is a leading step for future investigative methods.