 Research
 Open Access
MicroRNA dysregulational synergistic network: discovering microRNA dysregulatory modules across subtypes in non-small cell lung cancers
BMC Bioinformatics volume 19, Article number: 504 (2018)
Abstract
Background
Lung cancer is the leading cause of cancer-related deaths, and there is a need for reliable diagnostic biomarkers to predict stages in non-small cell lung cancer cases. Recently, microRNAs were found to have potential as both biomarkers and therapeutic targets for lung cancer. However, the functions of some microRNAs are unknown, and their roles in cancer stage progression remain largely undiscovered in this clinically and genetically heterogeneous disease. As evidence suggests that microRNA dysregulations are implicated in many diseases, it is essential to consider the changes in microRNA-target regulation across different lung cancer subtypes.
Results
We proposed a pipeline that identifies microRNA synergistic modules with similar dysregulation patterns across multiple subtypes by constructing the MicroRNA Dysregulational Synergistic Network. From the network, we extracted microRNA modules and incorporated them as prior knowledge into a Sparse Group Lasso classifier. This leads to a more relevant selection of microRNA biomarkers, thereby improving cancer stage classification accuracy. We applied our method to the TCGA Lung Adenocarcinoma and Lung Squamous Cell Carcinoma datasets. In cross-validation tests, the area under the ROC curve for cancer stage prediction increased considerably when the learned microRNA dysregulation modules were incorporated. The modules extracted from multiple independent subtype differential analyses showed high agreement with microRNA family annotations, and they can also be used to identify mutual biomarkers between different subtypes. Among the top-ranked candidate microRNAs selected by the model, 87% were reported to be related to Lung Adenocarcinoma. Overall, the results demonstrate that clustering microRNAs by the dysregulation patterns between microRNAs and their targets yields biomarkers with high precision and recall with respect to known differentially expressed disease-associated microRNAs.
Conclusions
The results indicate that our method improves microRNA biomarker selection by detecting similar microRNA dysregulational synergistic patterns across multiple subtypes. Since microRNA-target dysregulations are implicated in many cancers, we believe this tool can have broad applications for the discovery of novel microRNA biomarkers in heterogeneous cancers.
Background
Lung cancer accounts for more than 1.5 million deaths globally per year and is the leading cause of cancer-related mortality. About 87% of lung cancer cases are classified as non-small cell lung cancer, and the 5-year survival rate across all stages is below 17%, largely because early disease is typically asymptomatic and the majority of patients (57%) are diagnosed at later stages [1]. Even for early-stage disease, the only recommended treatment is surgical resection, and up to 30% of successfully treated patients still die within five years of initial diagnosis [2]. Therefore, developing early diagnosis and treatment strategies is essential for controlling this deadly disease. Recently, microRNAs have been found to have potential as both biomarkers and therapeutic targets for lung cancer [3, 4].
MicroRNAs (miRNAs) are a recently discovered class of small non-coding RNAs. Approximately 22 nucleotides (nt) long, miRNAs post-transcriptionally target messenger RNAs (mRNAs) to regulate the translation of target genes. They play critical roles in various biological functions such as proliferation, differentiation, and apoptosis [5]. Thus, abnormal miRNA regulatory events can significantly affect many cellular functions, ultimately triggering complex events that lead to cancer. Increasing evidence suggests that miRNAs can have a causal role in tumorigenesis [6].
Because of the significant role of miRNAs in cancer biology, many existing lung cancer studies use miRNA expression profiles to predict lung cancer stages or subtypes [7, 8]. In a typical differential expression analysis, a univariate statistical test (e.g., Student's t-test with a false discovery rate threshold) is performed to select miRNAs that deviate significantly between normal and tumor sample groups. However, the results are not always satisfactory: a large-scale multi-omics analysis of non-small cell lung adenocarcinoma (LUAD) revealed distinct miRNA-to-target-mRNA interactions that are specific to histological subtypes [9]. In other words, a miRNA biomarker identified from analyses of one particular subtype may correctly classify tumors of that subtype but misclassify cases of other subtypes, in which it may target a different set of mRNAs. Therefore, for a more robust selection of miRNA biomarkers, the deviation in miRNA-target interactions between lung cancer subtypes should be analyzed to assess their potential as predictors for this heterogeneous disease.
Experimental evidence has shown that multiple miRNAs can target a gene through synergism, in which two or more miRNAs cooperatively co-regulate an individual gene [10]. Studying miRNA synergism within a specific cellular environment is another critical step toward determining their disease-specific functions at the system level. Constructing a miRNA co-regulation network from regulatory targets with similar functions [11] revealed a miRNA-miRNA functional synergistic network; however, the changes in miRNA-target interactions between different cancer subtypes have remained largely unexplored.
To further our understanding of the role of miRNAs in lung cancers, we aim to identify differentially expressed miRNAs while considering miRNA-target dysregulations among different cancer subtypes. We extended the miRNA-target dysregulation concept of Xu et al. [12] and propose a novel miRNA clustering strategy to identify miRNA dysregulatory modules. We hypothesize that by identifying context-specific group structures among miRNAs, the differential analysis procedure can yield a more robust selection of miRNA biomarkers that accurately predict cancer stages across different subtypes.
Methods
Dataset and notations
We denote the miRNA and mRNA expression profiles as column vectors \(\mathbf {x_{i}}=\left [x_{i}^{1}, x_{i}^{2},\ldots,x_{i}^{s}\right ]^{\top }\) and \(\mathbf {y_{j}}=\left [y_{j}^{1}, y_{j}^{2},\ldots,y_{j}^{s}\right ]^{\top }\), which represent the expression levels of miRNA i and mRNA j across s samples, respectively. To represent miRNA and mRNA expression for a specific group of samples, we use the column vectors \(\mathbf {x_{i}^{C}}=\left [x_{i}^{1}, x_{i}^{2}, \ldots, x_{i}^{n_{C}}\right ]^{\top }\) and \(\mathbf {y_{j}^{C}}=\left [y_{j}^{1}, y_{j}^{2}, \ldots, y_{j}^{n_{C}}\right ]^{\top }\), respectively, where \(n_{C}\) is the number of samples with a particular phenotype C, e.g., normal, stage I cancer, stage II cancer, etc. Boldface variables denote vectors and non-boldface variables denote scalars. For expression data, subscripts identify a specific miRNA or mRNA, and superscripts identify a sample or a sample group.
Identification of miRNA biomarkers for lung cancer
As an overview of our pipeline, illustrated in Fig. 1, we developed a novel approach to identify miRNA dysregulation modules by detecting changes in miRNA-target associations between different cancer subtypes. First, we identify significant deviations in miRNA-target correlations between two sample groups. For each significantly deviated miRNA-target pair, we form a connection to build a miRNA-target dysregulation association matrix. From the identified miRNA-target dysregulations, miRNA modules are extracted such that functionally similar miRNAs belong to the same module if they dysregulate similar targets across multiple cancer subtypes. To accomplish this, a miRNA-miRNA Dysregulational Synergism Network (MDSN) is constructed, and a graph partitioning method is applied to identify significant miRNA modules. In the final step, classification analysis predicts cancer stage and selects relevant biomarkers from miRNA expression profile data only. Sparse Group Lasso regularization is applied with the intuition that if a miRNA is relevant, the other miRNAs in the same module are probably also relevant.
Step 1: Identifying miRNA-target dysregulations between subtypes
For every putative miRNA-target pair, we used sample-matched miRNA and mRNA expression data from distinct sample groups to identify aberrant miRNA-target interactions. More specifically, the aim is to find regulatory changes through differential analysis of the miRNA-target pair's correlation values between two sample groups of different lung cancer subtypes. This dysregulation criterion was proposed by Xu et al. [12], who defined it as the difference of the Pearson's correlations between a tumor group and a non-tumor group for miRNA i and target j. For two sample groups A and B, it can be written as:
\[ Dys(i,j)=\left |\frac {\operatorname {cov}\left (\mathbf {x_{i}^{A}},\mathbf {y_{j}^{A}}\right)}{\sigma _{x_{i}^{A}}\,\sigma _{y_{j}^{A}}}-\frac {\operatorname {cov}\left (\mathbf {x_{i}^{B}},\mathbf {y_{j}^{B}}\right)}{\sigma _{x_{i}^{B}}\,\sigma _{y_{j}^{B}}}\right | \]
where \(\sigma _{x_{i}^{A}}\) and \(\sigma _{x_{i}^{B}}\) denote the standard deviations of miRNA i expression in sample groups A and B, respectively, and \(\sigma _{y_{j}^{A}}\) and \(\sigma _{y_{j}^{B}}\) are those of target j. To determine whether the deviation of the correlation between the two groups is significant, Xu et al. randomly reassigned patients to the two groups, recalculated Dys 10,000 times, and obtained a p-value as the frequency with which the random Dys exceeded the observed Dys.
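For concreteness, the permutation criterion of Xu et al. can be sketched in standard-library Python. This is an illustrative sketch, not Xu et al.'s code: it assumes Dys is the absolute difference between the two group-wise Pearson correlations, and it reports the p-value as the fraction of label permutations whose Dys exceeds the observed Dys. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def dys(x_a, y_a, x_b, y_b):
    """Assumed form: |corr(miRNA, target) in group A - corr(miRNA, target) in group B|."""
    return abs(pearson(x_a, y_a) - pearson(x_b, y_b))


def permutation_pvalue(x_a, y_a, x_b, y_b, n_perm=10_000, seed=0):
    """Fraction of random group assignments whose Dys exceeds the observed Dys."""
    rng = random.Random(seed)
    observed = dys(x_a, y_a, x_b, y_b)
    # Shuffle whole samples so each sample's (miRNA, mRNA) values stay paired.
    samples = list(zip(x_a, y_a)) + list(zip(x_b, y_b))
    n_a = len(x_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(samples)
        grp_a, grp_b = samples[:n_a], samples[n_a:]
        d = dys([s[0] for s in grp_a], [s[1] for s in grp_a],
                [s[0] for s in grp_b], [s[1] for s in grp_b])
        if d > observed:
            exceed += 1
    return exceed / n_perm
```

Each miRNA-target pair needs thousands of correlation recomputations, which becomes expensive over tens of millions of pairs; this cost is the motivation for the closed-form test described next.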
To reduce the computational cost of obtaining a significance value for the deviation between two correlation coefficients, we instead applied Fisher's transformation [13], as in our previous publication [14]. To summarize, for a given miRNA i and target j, we calculated the two Pearson's correlation values \(r_{A}\) and \(r_{B}\), one from each sample group, and obtained their corresponding z-values \(z_{A}\) and \(z_{B}\) through Fisher's transformation \(z=\frac {1}{2}\ln \left (\frac {1+r}{1-r}\right)\). The z-value for the difference between \(z_{A}\) and \(z_{B}\) is obtained by
\[ z_{AB}=\frac {z_{A}-z_{B}}{\sqrt {\frac {1}{n_{A}-3}+\frac {1}{n_{B}-3}}} \]
where \(n_{A}\) and \(n_{B}\) are the numbers of samples in groups A and B.
Finally, we converted the absolute value of \(z_{AB}\) to a two-tailed p-value, which gives the statistical significance of the difference between the two miRNA-target correlations. We set the p-value cutoff at 0.001, a threshold commonly used in correlation studies.
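The Fisher-transformation test fits in a few lines of standard-library Python. This is a sketch rather than the authors' implementation; it assumes independent groups with more than three samples each and uses the usual standard error \(\sqrt{1/(n_A-3)+1/(n_B-3)}\) for the difference of z-values. The correlation values and group sizes in the usage lines are made up for illustration.

```python
import math


def fisher_z(r):
    """Fisher's transformation z = 0.5 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def correlation_difference_pvalue(r_a, n_a, r_b, n_b):
    """Two-tailed p-value for H0: the two population correlations are equal.

    n_a and n_b are the group sizes; each must exceed 3.
    """
    z_ab = (fisher_z(r_a) - fisher_z(r_b)) / math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z_ab) / math.sqrt(2))


# Hypothetical pair: r = -0.6 in group A, r = 0.1 in group B, 40 samples each.
p = correlation_difference_pvalue(-0.6, 40, 0.1, 40)
is_dysregulated = p < 0.001  # cutoff used in the text
```

Because the p-value comes from a closed-form expression, each pair costs two correlations and one `erfc` call instead of thousands of permutations.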
Step 2: Building the miRNA-target dysregulation association matrix
One primary function of miRNAs is to cleave the transcripts of their target genes and thereby regulate gene expression. Thus, when identifying aberrant miRNA-target interactions, an inverse correlation should be a prerequisite for candidate miRNA-target pairs to avoid false positives. In other words, only miRNA-target pairs with a negative Pearson's correlation in at least one of the sample groups, A or B, were considered.
Furthermore, since the primary goal of this study is to discover novel miRNA biomarkers that help explain cancer stage progression, it is essential to consider as many miRNAs as possible. We therefore did not use miRNA-target prediction algorithms such as TargetScan 7.1 [15] and miRanda [16], because their interaction databases cover only 263 of the 1881 miRNAs present in the miRNA expression profiles.
For each putative miRNA i and target j, we repeated the dysregulation analysis of Step 1 for every pair of lung cancer subtypes, treating each as an independent dysregulation analysis. All significant miRNA-target dysregulations from one analysis were encoded in a binary matrix A, with entry \(A_{ij}\) equal to 1 if the p-value of the dysregulation between miRNA i and target j falls below the p-value threshold and 0 otherwise. The matrices A from the independent dysregulation analyses are then concatenated horizontally, so that each analysis contributes its own block of columns. The concatenated matrix is interpreted as a new feature set in which each row characterizes one miRNA's dysregulated targets across all subtype dysregulation analyses.
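Steps 1 and 2 combine into the matrix construction sketched below under stated assumptions. The per-analysis results are held in a hypothetical dictionary that maps each (miRNA, target) pair to its p-value and the two group correlations. The negative-correlation prerequisite is applied before encoding, and the per-analysis rows are concatenated so that each miRNA keeps a single row. The data layout and function names are illustrative only.

```python
def dysregulation_matrix(results, mirnas, targets, p_cut=0.001):
    """Binary rows for one A-vs-B analysis.

    results maps (miRNA, target) -> (p_value, r_a, r_b).
    """
    col = {t: j for j, t in enumerate(targets)}
    rows = {m: [0] * len(targets) for m in mirnas}
    for (m, t), (p, r_a, r_b) in results.items():
        # Keep significant pairs that are inversely correlated in at least one group.
        if p < p_cut and min(r_a, r_b) < 0:
            rows[m][col[t]] = 1
    return rows


def concatenate(analyses, mirnas, targets, p_cut=0.001):
    """Place the per-analysis blocks side by side: one row per miRNA."""
    combined = {m: [] for m in mirnas}
    for results in analyses:
        block = dysregulation_matrix(results, mirnas, targets, p_cut)
        for m in mirnas:
            combined[m].extend(block[m])
    return combined


# Toy input with two analyses, two miRNAs, and two targets.
analysis_1 = {("m1", "t1"): (0.0001, -0.5, 0.2),   # significant, negative in A -> 1
              ("m1", "t2"): (0.0001, 0.3, 0.4),    # significant, never negative -> 0
              ("m2", "t2"): (0.01, -0.5, 0.1)}     # not significant -> 0
analysis_2 = {("m2", "t1"): (0.0005, 0.2, -0.4)}   # significant, negative in B -> 1
matrix = concatenate([analysis_1, analysis_2], ["m1", "m2"], ["t1", "t2"])
```

With k analyses and T targets, every row has k * T entries, so two miRNAs can only look similar if they share dysregulated targets within the same subtype comparison.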
Step 3: Calculating miRNA-miRNA dysregulation functional similarity
It has been reported that functionally similar miRNAs tend to share targets. Using the identified miRNA-target dysregulations, we inferred the context-specific functional similarity between two miRNAs from their mutual dysregulated targets. The functional similarity score between miRNAs p and q is the cosine similarity, defined as
\[ s(p,q)=\frac {\mathbf {A_{p\cdot }}\cdot \mathbf {A_{q\cdot }}}{\left \| \mathbf {A_{p\cdot }}\right \| \left \| \mathbf {A_{q\cdot }}\right \| } \]
where \(\mathbf {A_{i\cdot }}\) is the row vector indicating the dysregulated targets of miRNA i. Because A is binary, the cosine similarity ranges over [0, 1] and can be interpreted as the number of dysregulated targets shared by two miRNAs, normalized by the geometric mean of their numbers of dysregulated targets. Computing the similarity for every miRNA-miRNA pair yields an adjacency matrix that defines a miRNA-miRNA similarity network. Since cluster structures are difficult to uncover in a dense network, the weaker miRNA-miRNA connections must be pruned.
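Here is a minimal sketch of the similarity computation on binary dysregulation rows. The dictionary layout and the miRNA names in the example are hypothetical. The example illustrates the interpretation above: for binary rows, s(p, q) equals the number of shared targets divided by the square root of the product of the two target counts.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dysregulation rows; 0.0 if either row is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_scores(rows):
    """rows maps miRNA -> binary row; returns s(p, q) for each unordered pair once."""
    names = sorted(rows)
    return {(p, q): cosine(rows[p], rows[q])
            for i, p in enumerate(names) for q in names[i + 1:]}


# 2 shared targets and 3 targets each, so s = 2 / sqrt(3 * 3) = 2 / 3.
rows = {"miR-a": [1, 1, 1, 0], "miR-b": [1, 1, 0, 1]}
scores = similarity_scores(rows)
```

Storing each unordered pair once halves memory compared with a full matrix, which matters when there are over a thousand dysregulated miRNAs.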
Step 4: Constructing the MDSN and pruning with scale-free thresholding
The scale-free topology property exists in most biological networks, including miRNA networks [17]. In such networks, node degrees follow a power-law distribution: most miRNAs have few neighbors, while a few hub miRNAs have many. We used the well-known Weighted Gene Co-expression Network Analysis (WGCNA) framework to prune lower-weight edges. The threshold was chosen so that the graph's scale-free property still holds while as many edges as possible are preserved.
After the cosine similarity scores of all miRNA-miRNA pairs are computed, they are used as edge weights in the MDSN, which is constructed from an adjacency matrix M with entries \(M_{pq}=s(p,q)\) for all miRNAs p and q. As in most biological networks, the miRNA node degrees are expected to exhibit a scale-free distribution under some thresholding. We applied the hard-thresholding technique in WGCNA [18], removing from the network any edge with weight lower than the threshold. We chose the least stringent threshold for which the degree distribution maintains a desirable power-law fitting score.
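The threshold search can be sketched as follows. This is a simplified, WGCNA-style procedure written from scratch, not WGCNA's own code. Several choices are assumptions: node degrees are binned into 10 equal-width bins, empty bins are skipped, and the R² of log10 p(k) against log10 k is negated when the slope is positive (WGCNA's signed-fit convention). The search returns the smallest candidate threshold whose fit reaches 0.8.

```python
import math


def loglog_r2(xs, ys):
    """Signed R^2 of a line fit of log10(ys) on log10(xs); negative if the slope is positive."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy * sxy / (sxx * syy)
    return r2 if sxy < 0 else -r2


def scale_free_r2(degrees, n_bins=10):
    """Bin non-zero degrees into equal-width bins and fit log10 p(k) against log10 k."""
    ks = [k for k in degrees if k > 0]
    if len(ks) < 2:
        return 0.0
    lo, hi = min(ks), max(ks)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for k in ks:
        bins[min(int((k - lo) / width), n_bins - 1)].append(k)
    filled = [b for b in bins if b]  # skip empty bins (log10 of 0 is undefined)
    if len(filled) < 2:
        return 0.0
    xs = [sum(b) / len(b) for b in filled]   # mean degree per bin
    ys = [len(b) / len(ks) for b in filled]  # fraction of nodes per bin
    return loglog_r2(xs, ys)


def degrees_after_threshold(sim, nodes, tau):
    """Hard thresholding: keep edges with similarity >= tau (each pair listed once in sim)."""
    deg = {n: 0 for n in nodes}
    for (p, q), w in sim.items():
        if w >= tau:
            deg[p] += 1
            deg[q] += 1
    return list(deg.values())


def choose_threshold(sim, nodes, candidates, min_r2=0.8):
    """Least stringent (smallest) candidate threshold that keeps a scale-free fit."""
    for tau in sorted(candidates):
        if scale_free_r2(degrees_after_threshold(sim, nodes, tau)) >= min_r2:
            return tau
    return None
```

Scanning candidates in increasing order implements "least stringent first". Raising the threshold beyond the first passing value only removes edges and discards information for module discovery.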
Step 5: Identifying miRNA dysregulation modules with community detection
After pruning the MDSN, we applied graph partitioning to extract miRNA modules, assigning miRNA nodes to communities with the Louvain method [19]. The Louvain method uses a fast, greedy, iterative procedure to assign nodes to communities by optimizing modularity. Modularity measures the density of links inside communities compared with links between communities.
The algorithm works as follows. Initially, each node is assigned to its own community. In the first phase, each node i considers each of its neighbors j and evaluates the modularity gain of placing i in j's community. It then moves i to the community with the maximum positive gain. This first phase is repeated until no move improves modularity. The algorithm then proceeds to the second phase, which builds a new network whose nodes are the communities found in the first phase. The two phases are repeated until no further modularity gain is possible. The result is a hierarchical community structure (a dendrogram) of all nodes in the MDSN. The partition in this dendrogram with the highest modularity is selected as the miRNA module assignment.
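The first (local-moving) phase and the modularity objective can be sketched in plain Python. This sketch is deliberately naive: it recomputes modularity from scratch for every tentative move instead of using the incremental gain formula of Blondel et al. It also omits the second (aggregation) phase. For real networks, a library implementation such as `louvain_communities` in recent NetworkX releases is the practical choice. The adjacency layout used here is an assumption.

```python
def modularity(adj, comm):
    """Newman-Girvan modularity of partition comm on a weighted undirected graph.

    adj maps node -> {neighbor: weight}; it must be symmetric with no self-loops.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # 2 * total weight
    if two_m == 0:
        return 0.0
    internal, total = {}, {}
    for u, nbrs in adj.items():
        c = comm[u]
        total[c] = total.get(c, 0.0) + sum(nbrs.values())  # sum of degrees in c
        for v, w in nbrs.items():
            if comm[v] == c:
                internal[c] = internal.get(c, 0.0) + w  # intra edges counted twice
    return sum(internal.get(c, 0.0) / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj):
    """Louvain phase one: move single nodes to neighboring communities while Q increases."""
    comm = {u: u for u in adj}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for u in adj:
            home = comm[u]
            best_c, best_q = home, modularity(adj, comm)
            for v in adj[u]:
                cand = comm[v]
                if cand == home:
                    continue
                comm[u] = cand  # tentative move
                q = modularity(adj, comm)
                comm[u] = home  # undo
                if q > best_q + 1e-12:
                    best_c, best_q = cand, q
            if best_c != home:
                comm[u] = best_c
                improved = True
    return comm


def from_edges(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj
```

On two triangles joined by a single edge, the local-moving phase recovers the two triangles as communities. Every accepted move strictly increases modularity, so the loop always terminates.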
Step 6: Classification of cancer stage with identified miRNA modules
Classifiers with ℓ1-norm regularization are commonly used for feature selection in "small n, large p" problems. However, for problems known to have grouped features, adding group information as prior knowledge can improve feature selection and classification performance. We applied a multiclass logistic classifier with Sparse Group Lasso (SGL) regularization. The intuition is that if a miRNA is found to be a relevant predictor of cancer stage, other miRNAs in the same module are also likely relevant, since they share similar dysregulated targets across the cancer subtypes.
SGL combines ℓ1 and Group Lasso ℓ2-norm regularization to achieve sparse solutions at both the group and within-group levels [20]; here it regularizes a linear logistic classifier. We used a one-hot indicator vector \(\mathbf {c_{i}}\in \{0,1\}^{k}\) to represent the reported cancer stage of the i-th sample. In this study, k is 5, indicating whether a sample is labeled as normal, stage I, II, III, or IV. The objective function is as follows:
where λ is the sparsity coefficient and α is the mixing coefficient between the ℓ1 and Group Lasso ℓ2 norms, which is defined as:
where g is the size of the group. We used the Python package pylearn-parsimony to train the logistic regression classifier with SGL regularization.
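To make the roles of λ, α, and the group size concrete, the following sketch uses the standard sparse-group lasso penalty of Simon et al. [20]: \(\lambda\left[(1-\alpha)\sum_{g}\sqrt{|g|}\,\lVert\beta_{g}\rVert_{2}+\alpha\lVert\beta\rVert_{1}\right]\). It also shows the proximal step that proximal-gradient solvers typically apply. It is an assumption that the authors' pylearn-parsimony configuration matches this exact form. The code does not use the pylearn-parsimony API, and the groups are assumed to be non-overlapping (one list of coefficient indices per miRNA module).

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def sgl_penalty(beta, groups, lam, alpha):
    """lam * ((1 - alpha) * sum_g sqrt(|g|) * ||beta_g||_2 + alpha * ||beta||_1)."""
    l1 = sum(abs(b) for b in beta)
    group_term = sum(math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
                     for g in groups)
    return lam * ((1 - alpha) * group_term + alpha * l1)


def sgl_prox(beta, groups, lam, alpha, step):
    """Proximal operator of step * sgl_penalty for non-overlapping groups."""
    out = list(beta)
    for g in groups:
        # Lasso part: shrink each coefficient toward zero.
        z = soft_threshold([beta[j] for j in g], step * lam * alpha)
        norm = math.sqrt(sum(x * x for x in z))
        # Group part: shrink the whole module, possibly to exactly zero.
        shrink = step * lam * (1 - alpha) * math.sqrt(len(g))
        scale = max(0.0, 1.0 - shrink / norm) if norm > 0 else 0.0
        for j, x in zip(g, z):
            out[j] = scale * x
    return out
```

The group part is what lets SGL drop an entire weak module at once, while the ℓ1 part can still zero out individual miRNAs inside a module that is kept.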
Results
Applications in the TCGA non-small cell lung adenocarcinoma dataset
We downloaded miRNA and mRNA expression data of the LUAD cohort from The Cancer Genome Atlas (TCGA) [9] using the TCGA-Assembler tool [21]. miRNA expression was quantified with the BCGSC miRNA profiling pipeline, and the mRNA expression profiles were obtained with Illumina HiSeq RNA-Seq (v2). The Reads Per Million miRNA Mapped (RPKM) values were log2-transformed and scaled to zero mean and unit standard deviation. In total, 1881 miRNAs and 20,484 mRNAs were profiled. The sample size characteristics of the LUAD subjects are shown in Table 1.
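The normalization described above can be sketched per feature (one miRNA or mRNA across samples). Two details are assumptions because the text does not state them: a pseudocount of 1 is added before the log2 transform to avoid taking the logarithm of zero counts, and the population standard deviation is used for scaling.

```python
import math


def log2_zscore(values, pseudocount=1.0):
    """log2(x + pseudocount), then scale to zero mean and unit (population) SD."""
    logged = [math.log2(v + pseudocount) for v in values]
    n = len(logged)
    mean = sum(logged) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logged) / n)
    # A constant feature carries no information; map it to all zeros.
    return [(v - mean) / sd if sd > 0 else 0.0 for v in logged]


# Toy expression values of one miRNA across four samples.
z = log2_zscore([0, 1, 3, 7])
```

Scaling each feature separately keeps highly expressed miRNAs from dominating the correlation and regression steps that follow.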
Identified miRNA-target dysregulations between LUAD subtypes
We tested every miRNA-target pair between the 1881 miRNAs and 20,484 mRNAs for significant dysregulation, that is, for a significant change in correlation between subtype sample groups. Because some subtypes had insufficient sample sizes, only four histological LUAD subtypes were selected for the subtype dysregulation analysis, as outlined in Table 1. To build the miRNA-target dysregulation matrix, we performed an independent dysregulation analysis for each pairwise combination of the four subtypes.
With the p-value threshold set at p<0.001, we obtained a total of 1,896,631 miRNA-target dysregulations from the union of six independent dysregulation analyses for the Acinar, Bronchioloalveolar, Colloid, and Papillary subtypes. These analyses compared Acinar vs. Bronchioloalveolar, Bronchioloalveolar vs. Colloid, Acinar vs. Colloid, and so on. False positives very likely exist among the identified miRNA-target dysregulations. We accounted for them by carefully selecting the threshold parameter used to prune weaker miRNA synergism similarities.
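The six analyses are the unordered pairs of the four subtypes, since C(4, 2) = 6. The sketch below enumerates them and merges per-analysis results. The dictionary layout is hypothetical. Under a literal union, a miRNA-target pair that is significant in several comparisons is counted only once.

```python
from itertools import combinations

subtypes = ["Acinar", "Bronchioloalveolar", "Colloid", "Papillary"]
# Each unordered pair of subtypes is one independent A-vs-B dysregulation analysis.
analyses = list(combinations(subtypes, 2))


def union_of_dysregulations(significant_by_analysis):
    """significant_by_analysis maps (subtype_a, subtype_b) -> set of (miRNA, target)."""
    merged = set()
    for pairs in significant_by_analysis.values():
        merged |= pairs
    return merged
```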
Selection of threshold parameter for the scale-free topology of the MDSN for the LUAD cohort
After identifying miRNA-target dysregulations among the lung cancer subtypes, we computed the miRNA-miRNA cosine similarity score for every pair of miRNAs to construct the MDSN. For every pair of the 1314 miRNAs found to be dysregulated, we computed a total of 754,086 cosine similarity scores. The power-law fitting score [18] is defined as \(\operatorname {corr}\left (\log _{10}(s),\log _{10}(p(s))\right)^{2}\), where s denotes the similarity scores and the distribution p(s) is modeled by a histogram of binned data samples. The \(R^{2}\) score computed over all miRNA-miRNA pairs was 0.9135. This satisfies the \(R^{2}>0.8\) criterion and indicates that the network has a scale-free topology. The similarity score power parameter was kept at \(\beta =1\).
Next, we selected a hard-threshold parameter to prune edges from the MDSN. This choice trades off maximizing the scale-free topology fit score against retaining information in the network for module discovery, as visualized in Fig. 2a. We selected the threshold at 0.55, where the scale-free topology score is above 0.8, and pruned all edges with cosine similarity lower than 0.55. After edge pruning, 423 non-isolated miRNA nodes remained in the MDSN. We then applied the Louvain community detection method to the reduced MDSN to identify miRNA modules; the assignment of miRNAs to modules is indicated by color in Fig. 3.
Applications in the TCGA lung squamous cell carcinoma dataset
We also obtained matched miRNA and mRNA expression profiles from the TCGA Lung Squamous Cell Carcinoma (LUSC) cohort [22]. The miRNA and mRNA expression profiles were preprocessed as in the LUAD cohort. An overview of the sample sizes and clinical characteristics is given in Table 2. According to the clinical data compiled by TCGA-Assembler [21], fewer than 20 samples had a histologic subtype label, and the majority of samples were labeled as Not Otherwise Specified. Because of this insufficient number of labeled samples, we could not perform the miRNA-target dysregulation analyses based on the provided LUSC histological subtype information.
One reason for this is that lung squamous cell carcinoma is known to be clinically and genetically heterogeneous, which makes it challenging to sub-stratify. However, Wilkerson et al. [23] discovered reproducible and clinically significant LUSC subtypes that can be predicted from mRNA expression profiles. The representative expression profile of each of the four subtypes, Primitive, Classical, Basal, and Secretory, was summarized by a cluster centroid consisting of 196 genes. Using these cluster centroids, we predicted the subtype of every LUSC sample with the nearest-centroid classification algorithm proposed in [24].
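A nearest-centroid subtype predictor can be sketched as follows. The exact similarity measure used in [24] is not stated here. This sketch assumes the common centroid-correlation rule, which assigns each sample to the centroid with the highest Pearson correlation over the centroid genes. The 4-gene centroids in the example are toy values, not the published 196-gene centroids.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def predict_subtype(sample, centroids):
    """Return the subtype whose centroid is most correlated with the sample.

    sample and every centroid must list the same centroid genes in the same order.
    """
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


# Toy centroids over four hypothetical genes.
centroids = {"Primitive": [1.0, 0.0, 0.0, -1.0], "Basal": [0.0, 1.0, -1.0, 0.0]}
label = predict_subtype([2.0, 0.1, 0.0, -2.0], centroids)
```

Correlation ignores a sample's overall expression scale, so a sample is matched on the pattern across centroid genes rather than on absolute levels. A Euclidean-distance rule would be the alternative if [24] uses one.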
Identified miRNA-target dysregulations between LUSC subtypes
After obtaining subtype predictions for the LUSC samples, we tested every miRNA-target pair between 1870 miRNAs and 20,472 mRNAs for significant dysregulation. Six independent dysregulation analyses were performed, one for each pairwise combination of the four subtypes, e.g., Primitive vs. Classical, Basal vs. Secretory, Primitive vs. Basal, and so on. The union of the six analyses contained a total of 1,560,419 miRNA-target dysregulations at the p-value cutoff of 0.001.
Selection of threshold parameter for the scale-free topology of the MDSN for the LUSC cohort
For every pair of the 1490 miRNAs found with dysregulation patterns across multiple LUSC subtypes, we computed a total of 754,086 cosine similarity scores. Following the procedure applied to the LUAD network, we selected the edge-pruning threshold at 0.50, where the scale-free topology criterion \(R^{2}\) score is higher than 0.8 (Fig. 2b). The number of non-isolated miRNA nodes remaining in the MDSN is 391.
Extracted miRNA modules are consistent between independent subtype dysregulation analyses
To evaluate the consistency of the miRNA modules extracted from independent differential analyses, we compared miRNA module assignments across several sources: the different pairwise subtype dysregulation analyses, a combined analysis of all subtypes, a normal-tumor dysregulation analysis, and miRNA family information. Agreement between two clustering assignments was measured with Normalized Mutual Information (NMI). As shown in Fig. 4, the extracted miRNA modules agreed across some of the independent subtype dysregulation analyses in both the LUAD and LUSC cohorts. For example, in Fig. 4a, the modules extracted from the MDSN built on "Bronchio vs. Colloid" dysregulations have a cluster structure similar to that of the modules extracted from "Acinar vs. Colloid." This may indicate that the same groups of miRNAs are dysregulated across the Acinar, Bronchioloalveolar, and Colloid subtypes. Similarly, in the LUSC cohort (Fig. 4b), the miRNA modules identified from "Classical vs. Primitive" are highly similar to those from "Basal vs. Primitive," indicating that the same groups of miRNAs are dysregulated in these three subtypes. Notably, the "tumor vs. normal" miRNA modules were not similar to those of any subtype dysregulation analysis.
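A minimal NMI sketch for comparing two module assignments follows. It normalizes mutual information by the arithmetic mean of the two entropies. This is one of several conventions (scikit-learn's `normalized_mutual_info_score` uses it by default), and the text does not say which one the authors used. Modules from different pruned networks may cover different miRNAs, so the helper compares only the miRNAs present in both assignments. That restriction is an assumption of this sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(u, v):
    """I(U; V) / ((H(U) + H(V)) / 2) for two label lists over the same items."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    mi = 0.0
    for (a, b), c in Counter(zip(u, v)).items():
        mi += (c / n) * math.log(c * n / (count_u[a] * count_v[b]))
    h = (entropy(u) + entropy(v)) / 2
    return mi / h if h > 0 else 1.0  # two single-cluster partitions agree trivially


def module_nmi(assign_a, assign_b):
    """Compare two miRNA -> module maps on the miRNAs present in both."""
    common = sorted(set(assign_a) & set(assign_b))
    return nmi([assign_a[m] for m in common], [assign_b[m] for m in common])
```

NMI ignores module labels, so two analyses that recover the same groupings under different module IDs still score 1.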
Incorporating miRNA module information improves prediction of LUAD cancer stage
We applied the logistic classifier with SGL, using the extracted miRNA modules as the group structure for the Sparse Group Lasso regularization. Using a one-vs-rest scheme for multiclass classification, SGL classifies normal, stage I, stage II, stage III, and stage IV samples, with the numbers of samples given in the first column of Table 1. We empirically set the sparsity parameters to λ=1.0 and α=0.5, which gave the best prediction performance in 5-fold cross-validation tests.
To assess whether adding miRNA module information improves stage prediction performance, we compared cross-validation scores between SGL and a logistic regression classifier with only ℓ1 regularization. For each classifier, we computed the area under the ROC curve (AUC) for each stage from a train-test split with a 20% test set, as shown in Fig. 5.
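Per-stage AUCs in a one-vs-rest setting can be computed with the rank-statistic definition of AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain O(P·N) version for clarity. The probability layout (one row per sample, one column per stage) and the toy values are assumptions.

```python
def auc(scores, labels):
    """P(random positive outranks random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob, y_true, classes):
    """prob[i][k] is the predicted probability of classes[k] for sample i."""
    return {c: auc([row[k] for row in prob], [y == c for y in y_true])
            for k, c in enumerate(classes)}


# Toy predictions for three test samples and two stages.
prob = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]]
per_stage = one_vs_rest_auc(prob, ["normal", "stage I", "normal"], ["normal", "stage I"])
```

Each stage's AUC treats that stage as the positive class and all others as negative, which matches the one-vs-rest scheme used for classification.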
MicroRNA groups lead to higher recall and precision of candidate miRNA biomarkers
To validate whether the extracted miRNA modules help the SGL classifier select relevant miRNA biomarkers, we investigated how many of the selected candidate miRNA biomarkers are known LUAD-associated miRNAs. As a benchmark, we used the differentially expressed LUAD miRNAs in dbDEMC [25]. At the time of writing (last updated June 2014), dbDEMC contains 545 miRNAs reported by high-throughput experiments to be differentially expressed in LUAD. We ran a normal vs. tumor binary classification experiment using SGL with the extracted miRNA modules. In this experiment, the top-ranked candidate miRNAs showed high precision and recall with respect to the known differentially expressed LUAD miRNAs in dbDEMC (Fig. 6).
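Precision and recall of a ranked candidate list against a benchmark set can be sketched as below. This sketch assumes that precision at k is the fraction of the top-k candidates found in the benchmark and that recall is measured against the whole benchmark set. The text does not define the exact recall denominator, and the miRNA names in the example are placeholders.

```python
def precision_recall_at_k(ranked, known, k):
    """ranked: candidates ordered by model importance; known: benchmark miRNA set."""
    top = ranked[:k]
    hits = sum(1 for m in top if m in known)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(known) if known else 0.0
    return precision, recall


# Toy ranking against a toy benchmark of three miRNAs.
precision, recall = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, 2)
```

Sweeping k over the ranking traces a precision-recall curve, which is how a ranked biomarker list is usually compared against a reference database.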
Discussion
In this study, we integrated paired miRNA and mRNA expression data to detect aberrant miRNA-target interactions between lung cancer subtypes and to discover novel miRNA biomarkers for predicting lung cancer stages. We developed an efficient method to identify dysregulations among millions of potential regulatory relationships between 1,881 miRNAs and more than 20,000 mRNAs across multiple lung cancer subtypes. Among all the regulatory relationships considered, 4.9% of the miRNA-target pairs were found to behave aberrantly across the different lung cancer subtypes. Since LUAD and LUSC are clinically and genetically heterogeneous diseases, this information provides a glimpse into the roles of miRNAs in the pathogenesis of specific lung cancer subtypes. This is apparent in Fig. 4, where some lung cancer subtypes possess similar groups of dysregulated miRNA modules across multiple independent subtype dysregulation analyses. For instance, the Primitive subtype in LUSC has high NMI values among the Secretory vs. Primitive, Classical vs. Primitive, and Basal vs. Primitive analyses. This indicates that Primitive subtype samples possibly contain a few groups of miRNAs whose consistent sets of dysregulated targets distinguish them from all other LUSC subtypes. A dedicated analysis of this group of miRNA-target dysregulations in the Primitive subtype would be of interest, as this subtype coincidentally has a worse survival outcome (p<0.05) than the other three subtypes [23]. Such an observation may not be apparent from a normal vs. tumor differential analysis alone. As Fig. 4 shows, the NMI values for the normal vs. tumor dysregulation analysis are near zero compared with those of all the subtype dysregulation analyses.
Although a growing number of miRNAs have been rigorously studied, the functions of most miRNAs are still unknown. Furthermore, the target prediction algorithms that provide databases of putative miRNA-mRNA relationships consider only a small fraction of miRNAs. By considering all potential miRNAs and their targets, our method can be used to discover novel miRNA functions. However, a primary concern is that the choice of the various thresholding hyperparameters may produce unstable results. We repeated the miRNA-target dysregulation analysis with p-value thresholds of 0.01 and 0.001 and found similar patterns in the NMI comparison of the extracted miRNA modules shown in Fig. 4. Furthermore, all subtype dysregulation analyses showed high NMI similarity with the miRNA family assignments, even though this prior knowledge was not incorporated. This implies that despite possible false positives in identifying miRNA-target dysregulations, the pruned MDSN can still be an excellent tool for revealing miRNA-miRNA functional synergism when inferring novel miRNA functions.
Conclusions
By utilizing a dysregulation metric that allows for the analysis of multiple cancer subtypes, we proposed a pipeline to cluster miRNAs with high functional synergism. The extracted miRNA modules, when applied to grouped feature selection, can improve phenotype prediction and yield biomarkers with high precision and recall with respect to known LUAD-associated miRNAs. Furthermore, the miRNA modules extracted from different subtype analyses can reveal common miRNA dysregulations across multiple subtypes in heterogeneous cancers. miRNA-target dysregulations are implicated in many cancers, yet multimodal differential analyses between multiple cancer subtypes remain largely unexplored. We therefore believe this tool can have broad applications in the development of new diagnosis and treatment strategies.
Abbreviations
AUC: Area under the curve
Dys: Dysregulation
FDR: False discovery rate
LUAD: Lung adenocarcinoma
LUSC: Lung squamous cell carcinoma
MDSN: MicroRNA dysregulational synergistic network
miRNA: microRNA
mRNA: messenger RNA
NMI: Normalized mutual information
ROC: Receiver operating characteristic
RPKM: Reads per million mapped
SCC: Squamous cell carcinoma
SGL: Sparse group lasso
TCGA: The Cancer Genome Atlas
WGCNA: Weighted gene co-expression network analysis
References
 1
DeSantis CE, Lin CC, Mariotto AB, Siegel RL, Stein KD, Kramer JL, Alteri R, Robbins AS, Jemal A. Cancer treatment and survivorship statistics, 2014. CA Cancer J Clin. 2014; 64(4):252–71.
 2
Chansky K, Sculier JP, Crowley JJ, Giroux D, Van Meerbeeck J, Goldstraw P. The international association for the study of lung cancer staging project: prognostic factors and pathologic tnm stage in surgically managed nonsmall cell lung cancer. J Thorac Oncol. 2009; 4(7):792–801.
 3
Inamura K, Ishikawa Y. MicroRNA In Lung Cancer: Novel Biomarkers and Potential Tools for Treatment. J Clin Med. 2016; 5(3):36.
 4
Wiggins JF, Ruffino L, Kelnar K, Omotola M, Patrawala L, Brown D, Bader AG. Development of a lung cancer therapeutic based on the tumor suppressor microrna34. Cancer Res. 2010; 70(14):5923–30.
 5
Calin GA, Croce CM. Microrna signatures in human cancers. Nat Rev Cancer. 2006; 6(11):857–66.
 6
EsquelaKerscher A, Slack FJ. Oncomirs—micrornas with a role in cancer. Nature Reviews Cancer. 2006; 6(4):259.
 7
Bishop JA, Bishop JA, Benjamin H, Benjamin H, Cholakh H, Cholakh H, Chajut A, Chajut A, Clark DP, Clark DP, Westra WH, Westra WH. Accurate Classification of NonSmall Cell Lung Carcinoma Using a Novel MicroRNABased Approach. Clin Cancer Res. 2010; 16(2):610–9.
 8
Saito M, Schetter AJ, Mollerup S, Kohno T, Skaug V, Bowman ED, Mathe EA, Takenoshita S, Yokota J, Haugen A, Harris CC. The Association of MicroRNA Expression with Prognosis and Progression in EarlyStage, NonSmall Cell Lung Adenocarcinoma: A Retrospective Analysis of Three Cohorts. Clin Cancer Res. 2011; 17(7):1875–82.
 9
Network CGAR, et al. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014; 511(7511):543–50.
 10
Wu S, Huang S, Ding J, Zhao Y, Liang L, Liu T, Zhan R, He X. Multiple micrornas modulate p21cip1/waf1 expression by directly targeting its 3 untranslated region. Oncogene. 2010; 29(15):2302.
 11
Xu J, Li CX, Li YS, Lv JY, Ma Y, Shao TT, Xu LD, Wang YY, Du L, Zhang YP, et al. miRNA–miRNA synergistic network: construction via co-regulating functional modules and disease miRNA topological features. Nucleic Acids Res. 2010; 39(3):825–36.
 12
Xu J, Li CX, Lv JY, Li YS, Xiao Y, Shao TT, Huo X, Li X, Zou Y, Han QL, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target–dysregulated network: case study of prostate cancer. Mol Cancer Ther. 2011; 10(10):1857–66.
 13
Davidson GS, Wylie BN, Boyack KW. Cluster stability and the use of noise in interpretation of clustering. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis). San Diego; 2001. p. 23–30.
 14
Tran N, Abhyankar V, Nguyen K, Ahmad I, Weidanz J, Gao J. MicroRNA dysregulational synergistic network: learning context-specific microRNA dysregulations in lung cancer subtypes. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Kansas City: IEEE; 2017. p. 142–5.
 15
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction of mammalian microRNA targets. Cell. 2003; 115(7):787–98.
 16
Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2007; 36(Suppl 1):154–8.
 17
Zhang W, Zang J, Jing X, Sun Z, Yan W, Yang D, Guo F, Shen B. Identification of candidate miRNA biomarkers from miRNA regulatory network with application to prostate cancer. J Transl Med. 2014; 12(1):66.
 18
Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005; 4(1): Article 17.
 19
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theory Exp. 2008; 2008(10):P10008.
 20
Simon N, Friedman J, Hastie T, Tibshirani R. A sparse-group lasso. J Comput Graph Stat. 2013; 22(2):231–45.
 21
Zhu Y, Qiu P, Ji Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nat Methods. 2014; 11(6):599–600.
 22
Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012; 489(7417):519.
 23
Wilkerson MD, Yin X, Hoadley KA, Liu Y, Hayward MC, Cabanski CR, Muldrew KL, Miller CR, Randell SH, Socinski MA, et al. Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important and correspond to different normal cell types. Clin Cancer Res. 2010; 16(19):4864–75.
 24
Hu Z, Fan C, Oh DS, Marron J, He X, Qaqish BF, Livasy C, Carey LA, Reynolds E, Dressler L, et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006; 7(1):96.
 25
Yang Z, Ren F, Liu C, He S, Sun G, Gao Q, Yao L, Zhang Y, Miao R, Cao Y, Zhao Y, Zhong Y, Zhao H. dbDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics. 2010; 11(Suppl 4):S5.
Acknowledgements
The authors thank the editor and the anonymous reviewers for their constructive comments to improve this work.
Funding
The authors are grateful for the GAANN support from the Department of Education under the grant P200A150192.
Availability of data and materials
The TCGA LUAD and LUSC data were obtained using the TCGA-Assembler tool at http://www.compgenome.org/TCGAAssembler/.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 20, 2018: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2017: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-20.
Author information
Contributions
JG directed the study and experimental procedure, reviewed the manuscript, and supervised the project. NT conceived the idea, designed the coding work, performed the computational experiments, and drafted the manuscript. VA, KN, and JW guided the aims of the study and interpreted the early results of the project. All authors read and approved the final version of the manuscript.
Corresponding author
Correspondence to Jean Gao.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Tran, N., Abhyankar, V., Nguyen, K. et al. MicroRNA dysregulational synergistic network: discovering microRNA dysregulatory modules across subtypes in non-small cell lung cancers. BMC Bioinformatics 19, 504 (2018). https://doi.org/10.1186/s12859-018-2536-0
Keywords
 microRNA dysregulation
 Differential analysis
 Biomarker discovery
 Scale-free network
 Synergistic module