Analyzing miRNA co-expression networks to explore TF-miRNA regulation
- Sanghamitra Bandyopadhyay^{1} and
- Malay Bhattacharyya^{1} (corresponding author)
DOI: 10.1186/1471-2105-10-163
© Bandyopadhyay and Bhattacharyya; licensee BioMed Central Ltd. 2009
Received: 14 January 2009
Accepted: 28 May 2009
Published: 28 May 2009
Abstract
Background
Ongoing microRNA (miRNA) research has led to a rapid accumulation of expression data from microarray experiments. Such experiments are generally performed over different tissues of a specific metazoan species. For disease diagnosis, microarray probes are also prepared with tissues taken from similar organs of different individuals of an organism. Expression data of miRNAs are frequently mapped to co-expression networks to study the functions of miRNAs, their regulation of genes, and the complex regulatory network that might exist between transcription factors (TFs), genes and miRNAs. These research directions are still not fully explored, and the construction of reliable and compatible methods for mining miRNA co-expression networks has therefore become an emerging area. This paper introduces a novel method for mining miRNA co-expression networks in order to obtain co-expressed miRNAs, under the hypothesis that these might be regulated by common TFs.
Results
Three co-expression networks, constructed from one patient-specific, one tissue-specific and one stem cell-based miRNA expression dataset, are studied to evaluate the proposed methodology. A novel compactness measure is introduced. The results establish the statistical significance of the sets of miRNAs obtained and the efficacy of the self-pruning phase employed by the proposed method. All these datasets yield similar network patterns and produce coherent groups of miRNAs. The existence of common TFs regulating these groups of miRNAs is empirically tested, with promising results. A novel visual validation method is also proposed that reflects the homogeneity as well as the statistical properties of the grouped miRNAs. This visual validation method provides a graphical tool for expression analysis.
Conclusion
A heuristic mining methodology whose motivation resembles that of clustering is proposed in this paper. However, there remains a basic difference between the mining method and a clustering approach. By employing a self-pruning phase, the heuristic approach produces priority modules (PMs) from an miRNA co-expression network, which are then analyzed for statistical and biological significance. The mining algorithm reduces the space/time complexity of the analysis and also handles noise in the data. In addition, the mining method shows promising results in the unsupervised analysis of TF-miRNA regulation.
Background
Over the last decade, much research has been devoted to uncovering the functionality of microRNAs (miRNAs), which are small (21–23 nt) non-coding RNAs that regulate mRNA stability and translation through the action of the RNA-induced silencing complex (RISC) [1–3]. Earlier investigations [2, 4] have found that miRNAs regulate a variety of key biological functions, including insulin secretion, apoptosis, cell proliferation and differentiation. More importantly, it has recently been hypothesized that miRNAs, when their function is disrupted, are indirectly responsible for a number of diseases, as they can dysregulate post-transcriptional gene expression [5]. Emerging evidence suggests that miRNAs regulate brain development, dendritic spine morphology and neurite outgrowth, i.e., processes that are hypothesized to be associated with schizophrenia neuropathology. Moreover, they are also implicated in diseases such as Tourette's syndrome, Fragile X syndrome [2], several types of cancer [4] and many others [5].
Microarray profiling is a high-throughput technique that can be used to measure the expression or repression of thousands of genes in parallel [6, 7]. In the recent past, microarray data have been studied extensively for gene expression analysis, leading to many methodological works. However, the analysis of miRNA microarrays is not well explored. The expression profiles of miRNAs derived from microarray experiments are most often tissue-specific in nature. In addition, miRNAs are sometimes profiled from the same tissue type (by location) in different patients for the purpose of disease diagnosis. Not surprisingly, due to the short length of miRNAs, the purity, variance and dimension of miRNA microarray datasets are smaller than those of gene datasets. Thus, it is imperative to develop efficient methods that can shed light on the underlying biological activity of miRNAs without depending on the methods developed for gene expression data [7–9].
A natural approach in microarray studies is to map the simultaneous overexpression/underexpression of miRNA pairs into a co-expression network. These co-expression networks are analyzed to study the functional enrichment and regulatory activities of miRNAs [10, 11]. However, the most important (and neglected) target remains the preparation of a blueprint of the complex regulatory network that hypothetically exists between transcription factors (TFs), genes and miRNAs. Some earlier studies suggested that miRNAs targeting the same gene together with a TF might be regulated by the same TF [12]. Drawing on the established knowledge in the TRANSFAC database and the microRNA registry, an earlier study examined TF and miRNA regulation in prostate cancer cells [13]. A recent study pursues the same hypothesis, adding that there are TF-miRNA pairs that participate in a complex recurring network and exert regulatory effects on each other [3]. However, these previous analyses either follow supervised learning based on the established results available in databases like TargetScan [14] and PicTar [15], or lack an exhaustive empirical study. There exists an impressive number of works on clustering miRNA co-expression networks with various motivations, such as identification of the set of miRNAs derived from common primary transcripts [10], co-expression analysis between neighboring miRNAs [11], study of diseases [4], and co-expression analysis of miRNA with mRNA [16]. Again, these approaches do not target the construction of TF-miRNA regulatory networks. Moreover, they employ clustering tools commonly used for gene expression analysis although, as mentioned earlier, the scale and other characteristics of miRNA expression data are somewhat different.
This paper introduces a novel unsupervised mining method that can heuristically self-prune a co-expression network constructed from miRNA microarray data. The iterative mining methodology produces a set of priority modules (PMs) from the dataset. The statistical (and hypothetically the biological) significance of the PMs decreases as they are generated by stepwise reduction. The results show that the transcription factor binding sites (TFBSs) of the grouped miRNAs in the 5' untranscribed region (UR) have large common portions, suggesting the existence of commonly regulating TFs. In a recent work with a similar goal, miRNAs were clustered based on their commonalities in loci [3]. Evidently, the putative upstream region defined there (<10 kb) will contain a large number of common TFs for the clustered miRNAs. That was a kind of supervised approach, and from this viewpoint the mining process discussed here is novel. A schizophrenia patient-specific, a tissue-specific and a stem cell-based microarray dataset are comprehensively analyzed. The studies show that these datasets are useful for exploring common TFs which might regulate a module of miRNAs. Such TF-miRNA regulation information might in turn accelerate the reconstruction of TF-miRNA regulatory networks.
A network (in general, a weighted undirected network) is often defined by the triplet (N, A, W), where N denotes a finite set of nodes {n_{1}, n_{2},..., n_{|N|}} (the cardinality of the set N is written |N|), A denotes a set of edges between node pairs, and W: A → [0, ∞) is a weight function associated with the edges. Here, a network (N, A, W) is referred to as an miRNA co-expression network if the node set N corresponds to a set of miRNAs and W: A → [0, 1] denotes a co-expression function defined on each miRNA pair in A.
In general, miRNA co-expression networks can be thought of as fuzzy complete graphs [17] by excluding the arcs having a co-expression value of zero. This transformation maps miRNAs to vertices and co-expression values to fuzzy membership values. Thus a module identified in a fuzzy complete graph denotes a set of miRNAs under this transformation. A recent study proposes an O(n^{2} log n) algorithm for identifying the largest dense N-vertexlet (a set of vertices of cardinality N) in a fuzzy scale-free graph [17]. The miRNA co-expression networks, being of this nature, can be mined step by step using a similar approach. To describe the proposed mining process, which integrates this earlier work [17], the following theoretical details are given.
Definition 1 (Fuzzy Complete Graph) A fuzzy complete graph (FCG), G = (V, E, Ω), is defined as a graph in which V denotes the set of vertices, E denotes the set of fuzzy relations (v_{i}, v_{j}) (v_{i} ≠ v_{j}, ∀v_{i}, v_{j} ∈ V) and Ω is a fuzzy membership function defined over the set E such that Ω: E → (0, 1].
In Eqn. (1), Ω(v_{i}, v_{j}) denotes the fuzzy membership value of the edge (v_{i}, v_{j}). This density definition computes the degree of participation of a single vertex with respect to a set of vertices. By imposing a lower bound on this density for every vertex within a group of vertices, the association density of an N-vertexlet is defined as follows.
Suppose an arbitrary association density value δ is given. If the association density of an N-vertexlet equals or exceeds δ, then it is called a dense N-vertexlet with respect to δ. The proposed method derives a set of modules, each comprising a set of vertices (corresponding to miRNAs here), which are equivalent to such dense N-vertexlets. Thus, mining an FCG for dense N-vertexlets is equivalent to finding modules in an miRNA co-expression network. Let the FCG induced by a node set N' in an FCG (V, E, Ω) be written (N', E_{N'}, Ω_{N'}), where E_{N'} and Ω_{N'} are the edge set and the fuzzy membership function induced by the node set N' in E and Ω, respectively. Then, a set of PMs in this FCG is defined as follows.
- 1.,
- 2.,
- 3..
The basic goal of this work is to determine a significant set of PMs from miRNA microarray data for the unsupervised analysis of TF-miRNA regulation.
Results and discussion
Again, the fuzzy membership values of the miRNAs, over all the tissues/patients, are computed for these datasets. These fuzzy membership values of the miRNAs for all the experiments, in the form of a histogram (shown in Additional file 1 section 2.2), reflect that a large number of miRNA pairs are highly co-expressed. The distributions of the miRNA sizes reflected in these histograms against the fuzzy membership values help to select the lower density threshold (δ_{ lower }) and the density decay constant (ξ) employed by the proposed method. By studying the histograms, we selected δ_{ lower }= 0.95 and ξ = 0.005 (for smooth tail) for the schizophrenia dataset, δ_{ lower }= 0.99 and ξ = 0.001 (for sharp tail) for the tissue-specific dataset, and δ_{ lower }= 0.93 and ξ = 0.003 (for smooth tail) for the stem cell dataset.
Table 1. An unsupervised algorithm for mining FCGs mapped from miRNA co-expression networks
Input: An FCG (V, E, Ω), a lower density threshold δ_{lower} and a density decay constant ξ. |
---|
Output: A set of k PMs. |
Formal steps: |
1: Set t ← 0 |
2: Set δ_{t} ← 1 |
3: while δ_{t} ≥ δ_{lower} do |
4: Find the largest PM, PM_{t}, in the current FCG with respect to the association density δ_{t} |
5: if PM_{t} ≠ ∅ then |
6: V ← V − PM_{t} |
7: Restrict E and Ω to the remaining vertex set V |
8: end if |
9: t ← t + 1 |
10: δ_{t} ← δ_{t−1}(1 − ξ) |
11: end while |
12: k ← t |
Table 2. The PMs obtained by applying the mining method over the schizophrenia dataset
t | δ _{ t } | Module size | SE | ΣSE | SI _{C/V} |
---|---|---|---|---|---|
0 | 1 | - | - | - | - |
... | ... | - | - | - | - |
3 | 0.9850 | 13 | 0.18 | 70.32 | 0.9966 |
4 | 0.9802 | 8 | 0.23 | 69.79 | 0.9953 |
5 | 0.9752 | 26 | 0.49 | 72.71 | 0.9913 |
6 | 0.9704 | 15 | 0.69 | 71.89 | 0.9857 |
7 | 0.9655 | 4 | 0.85 | 70.25 | 0.9758 |
8 | 0.9607 | 14 | 1.25 | 72.51 | 0.9729 |
9 | 0.9559 | 25 | 1.22 | 72.86 | 0.9786 |
10 | 0.9511 | 6 | 1.37 | 64.76 | 0.9904 |
Table 3. The PMs obtained by applying the mining method over the tissue-specific dataset
t | δ _{ t } | Module size | SE | ΣSE | SI _{ C/V } |
---|---|---|---|---|---|
0 | 1 | 14 | 3.68 | 1.31E6 | 1.0000 |
1 | 0.9990 | 20 | 26.72 | 1.36E6 | 1.0000 |
2 | 0.9980 | 29 | 139.19 | 1.44E6 | 0.9998 |
3 | 0.9970 | 16 | 489.49 | 1.32E6 | 0.9992 |
4 | 0.9950 | 17 | 1.42E3 | 1.33E6 | 0.9978 |
5 | 0.9940 | 3 | 1.47E3 | 1.23E6 | 0.9965 |
6 | 0.9920 | 9 | 3.75E3 | 1.27E6 | 0.9935 |
7 | 0.9900 | 10 | 5.87E3 | 1.28E6 | 0.9899 |
Table 4. The PMs obtained by applying the mining method over the stem cell dataset
t | δ _{ t } | Module size | SE | ΣSE | SI _{ C/V } |
---|---|---|---|---|---|
0 | 1 | - | - | - | - |
1 | 0.9970 | 4 | 4.23E5 | 7.26E9 | 0.9999 |
2 | 0.9940 | 20 | 2.0E6 | 7.4E9 | 0.9996 |
3 | 0.9910 | 16 | 4.37E6 | 7.38E9 | 0.9991 |
4 | 0.9881 | 10 | 8.53E6 | 7.33E9 | 0.9982 |
5 | 0.9851 | 14 | 1.13E7 | 7.37E9 | 0.9977 |
6 | 0.9821 | 17 | 1.73E7 | 7.41E9 | 0.9965 |
7 | 0.9733 | 24 | 4.04E7 | 7.52E9 | 0.9920 |
8 | 0.9646 | 8 | 6.31E7 | 7.38E9 | 0.9853 |
9 | 0.9617 | 14 | 6.78E7 | 7.43E9 | 0.9863 |
10 | 0.9588 | 15 | 8.0E7 | 7.43E9 | 0.9848 |
11 | 0.9531 | 20 | 1.17E8 | 7.56E9 | 0.9757 |
12 | 0.9417 | 12 | 1.68E8 | 7.54E9 | 0.9609 |
13 | 0.9304 | 7 | 2.53E8 | 7.58E9 | 0.9313 |
Another important clustering index, the Silhouette Index [7, 21], is measured to verify the inter-cluster dissimilarity between the PMs found. Here, the Silhouette Index (SI_{C/V}) is defined for a single cluster C with respect to a background set V [17]. Using this measure, the SI_{C/V} values have been computed (details in Additional file 1 section 2.3) for the PMs derived from all the datasets and are given in the last columns of Tables 2, 3 and 4. The value of SI_{C/V} lies within [-1, +1], with higher values indicating better mined modules. As expected, the SI_{C/V} values of the PMs generally decrease in the order of their derivation. Some exceptions to this trend may be noted (in Table 2 and Table 4) for the last few PMs of the schizophrenia and stem cell datasets, as was also seen in the case of the SE values. This might be due to the choice of the lower density threshold (δ_{lower}), which needs to be tuned more tightly.
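The exact form of SI_{C/V} is given only in the additional file, so the following is a minimal sketch under an assumed definition: for each member of the cluster C, the silhouette width of Rousseeuw [21] is computed with a as the mean distance to the other members of C and b as the mean distance to the background points outside C, and the widths are averaged. The function names and the use of Euclidean distance are illustrative assumptions, not the authors' implementation.

```python
import math

def dist(x, y):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def silhouette_vs_background(cluster, background):
    """Mean silhouette width of `cluster` points against `background` points.

    ASSUMPTION: `background` holds the points of V that are not in the cluster.
    """
    if len(cluster) < 2 or not background:
        raise ValueError("need at least two cluster points and one background point")
    widths = []
    for i, p in enumerate(cluster):
        # a: cohesion, mean distance to the other members of the cluster
        a = sum(dist(p, q) for j, q in enumerate(cluster) if j != i) / (len(cluster) - 1)
        # b: separation, mean distance to the background points
        b = sum(dist(p, q) for q in background) / len(background)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)

tight = silhouette_vs_background([[0, 0], [0, 1]], [[10, 10], [11, 10]])
loose = silhouette_vs_background([[0, 0], [10, 10]], [[5, 5]])
print(tight, loose)
```

A compact, well-separated module scores close to +1, while a module whose members are closer to the background than to each other scores below zero.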
The sizes of the miRNA groups found are validated using a method for deriving the upper bound on the order of a clique of a graph (see Additional file 1 section 2.4) introduced in [22]. The upper bound is found to be 119 by setting δ = 0.95, 121 by setting δ = 0.99, and 187 by setting δ = 0.93 for the schizophrenia, tissue-specific and stem cell datasets, respectively. These are the expected sizes of the most compact miRNA modules present in the networks. With the pruning method used here, the sizes of the significant sets of miRNAs are found to be 111 (~46%), 118 (~67%) and 181 (~41%), respectively. These are close to the theoretically derived upper bounds, which supports the sizes of the mined modules.
The motivation of the current work may suggest that the mining method is suitable only for miRNA expression data, or scalable only up to their typical size (as miRNA expression datasets have lower dimensions than gene expression datasets). This is not the case: the procedure applies equally to a gene expression dataset. The miRNA expression datasets are studied here to motivate our hypothesis on TF-miRNA regulation. To verify the effectiveness of the proposed method, a gene expression dataset was also considered. This dataset consists of expression values of 6167 genes over 52 time points (details in Additional file 1 section 2.5). The results show that the proposed method is applicable to this larger dataset, indicating its scalability. Moreover, the discussion of the algorithmic complexity (see Additional file 1 section 2.5) shows that it is polynomial. In the following subsections, we include an exhaustive analysis validating the PMs from the perspective of bioinformatics research, incorporating visual, statistical and biological analysis.
Visual Validation
Statistical Validation
For the statistical analysis of the PM s, we have used a randomized model [3]. Here, a cluster matrix of size n × k (n denotes the number of miRNAs selected in the PM s and k is the number of PM s) is first constructed from the information available about the PM s. An element (i, j) in the cluster matrix is assigned a value "1", if miRNA i is found in the PM j, otherwise it is set to "0". Depending on the matrix, an r-randomized degree preserving model is derived by randomly swapping the edges r times for computing the co-occurrence of miRNA pairs by chance. Using the model, the p-values (details in Additional file 1 section 2.7) of the co-occurrence of all the miRNA pairs in the PM s are computed for all the three datasets. We obtained the values 6.4E-3, 2E-15 and <1E-3 for the schizophrenia, tissue-specific and stem cell datasets, respectively. This shows that the results obtained are not by chance and the PM s are statistically significant.
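The randomization described above can be sketched as follows. This is a minimal illustration, not the procedure from the additional file: the cluster matrix is treated as a bipartite miRNA-PM edge list, each accepted swap exchanges the PM endpoints of two edges (so row and column sums are preserved), and the empirical p-value of a pair is the fraction of randomized matrices in which the pair co-occurs. The function names, the `(hits + 1) / (trials + 1)` estimator and the toy matrix are assumptions.

```python
import random

def edges_of(matrix):
    """List the (miRNA, PM) positions holding a 1 in the cluster matrix."""
    return [(i, j) for i, row in enumerate(matrix) for j, x in enumerate(row) if x]

def randomize(matrix, r, rng):
    """Attempt r degree-preserving swaps on the bipartite miRNA-PM edge list."""
    edges = edges_of(matrix)
    present = set(edges)
    for _ in range(r):
        a, b = rng.randrange(len(edges)), rng.randrange(len(edges))
        (i, p), (j, q) = edges[a], edges[b]
        if i == j or p == q or (i, q) in present or (j, p) in present:
            continue  # swap would be a no-op or would duplicate an edge
        present -= {(i, p), (j, q)}
        present |= {(i, q), (j, p)}
        edges[a], edges[b] = (i, q), (j, p)
    out = [[0] * len(matrix[0]) for _ in matrix]
    for i, j in present:
        out[i][j] = 1
    return out

def cooccur(matrix, i, j):
    """True if miRNAs i and j share at least one PM."""
    return any(x and y for x, y in zip(matrix[i], matrix[j]))

def cooccurrence_pvalue(matrix, i, j, r, trials, seed=0):
    """Empirical chance of i and j co-occurring under the randomized model."""
    rng = random.Random(seed)
    hits = sum(cooccur(randomize(matrix, r, rng), i, j) for _ in range(trials))
    return (hits + 1) / (trials + 1)

# Toy cluster matrix: 6 miRNAs (rows) assigned to 2 PMs (columns).
cluster_matrix = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
shuffled = randomize(cluster_matrix, 50, random.Random(1))
p = cooccurrence_pvalue(cluster_matrix, 0, 1, r=50, trials=200, seed=1)
print(p)
```

With realistic inputs, many small modules make chance co-occurrence rare, which is what drives small p-values for pairs that are grouped together.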
Table 5. Comparative ΣNSE values
Methods | Schizophrenia dataset | Tissue-specific dataset | Stem cell dataset |
---|---|---|---|
K-means | 0.9 | 780.95 | 5.94E7 |
Average linkage (UPGMA) | 0.81 | 2069.49 | 6.62E7 |
Complete linkage | 0.7 | 1975.34 | 4.67E7 |
DIANA | 0.71 | 2107.95 | 5.02E7 |
Fanny | 0.67 | 1558.66 | 4.45E7 |
SOM | 1.22 | 1126.4 | 2.46E8 |
Iclust | 12.8 | 1470.83 | 1.19E8 |
SiMM-TS | 23.71 | 1666.38 | 1.17E8 |
Proposed | 0.64 | 763.31 | 4.55E7 |
Biological Validation
From a biological perspective, it may be expected that the miRNAs within a single PM are regulated by common TFs. To verify this hypothesis, an exhaustive biological investigation has been conducted. Since the complete information related to TF-miRNA regulation is not yet available, we relied on the established knowledge of the conserved TFBSs based on the UCSC hg18 genome assembly [25]. We have used the wgRNA table under the sno/miRNA track of this database (details in Additional file 1 section 2.8) pertaining to the information about the location of miRNAs in the chromosomes. Motivated from an earlier study [3], the region 10 kb upstream of the start of an miRNA sequence is defined as the putative regulatory region of the miRNA assumed to contain the regulatory binding sites. After defining the putative regulatory regions of the miRNAs found in the individual PM s, we identified the TFs, which are known to bind to this region, from the tfbsConsSites table under the TFBS Conserved track in the UCSC Table Browser [25]. In this way the list of the miRNA pairs, containing the TFBSs of common regulatory TFs in their putative upstream region, belonging to a single PM are accumulated for the study (provided in Additional files 2, 3 and 4 for the schizophrenia, tissue-specific and stem cell datasets, respectively).
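The lookup described above can be sketched in a few lines. This is an illustration only: the miRNA and TF names, the coordinates, and the handling of the minus strand (taking the region past the end coordinate) are assumptions introduced here, and the site records merely stand in for rows exported from the wgRNA and tfbsConsSites tables.

```python
from itertools import combinations

def upstream_region(chrom, start, end, strand, span=10_000):
    """Putative regulatory region: `span` bases upstream of the miRNA start."""
    if strand == "+":
        return chrom, max(0, start - span), start
    return chrom, end, end + span  # ASSUMPTION: minus strand, upstream lies past the end

def tfs_binding(region, sites):
    """TFs with a conserved site lying entirely inside the region."""
    chrom, lo, hi = region
    return {tf for c, s, e, tf in sites if c == chrom and s >= lo and e <= hi}

def shared_tf_pairs(module, tf_sets):
    """miRNA pairs of one PM that have at least one TF in common."""
    pairs = {}
    for a, b in combinations(sorted(module), 2):
        common = tf_sets.get(a, set()) & tf_sets.get(b, set())
        if common:
            pairs[(a, b)] = common
    return pairs

# Hypothetical coordinates and binding sites, for illustration only.
mirnas = {
    "mir-A": ("chr1", 50_000, 50_080, "+"),
    "mir-B": ("chr1", 52_000, 52_090, "+"),
    "mir-C": ("chr2", 30_000, 30_070, "-"),
}
sites = [
    ("chr1", 45_000, 45_012, "TF1"),
    ("chr1", 41_000, 41_010, "TF2"),
    ("chr1", 43_500, 43_510, "TF3"),
    ("chr2", 31_000, 31_015, "TF1"),
]
tf_sets = {name: tfs_binding(upstream_region(*loc), sites) for name, loc in mirnas.items()}
pairs = shared_tf_pairs(mirnas.keys(), tf_sets)
print(len(pairs))
```

Counting the entries of `pairs` for each PM gives the kind of per-module tally of commonly regulated miRNA pairs that the study reports.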
Table 6. Statistics of the miRNA pairs explored regulated by the common TFs
Priority modules | Schizophrenia dataset | Tissue-specific dataset | Stem cell dataset |
---|---|---|---|
PM 1 | 589 | 1074 | 1585 |
PM 2 | 51 | 235 | 148 |
PM 3 | 81 | 0 | 211 |
PM 4 | 260 | 291 | 2 |
PM 5 | 0 | 12 | 8 |
PM 6 | 13 | 40 | 30 |
PM 7 | 1 | 20 | 110 |
PM 8 | 24 | 64 | 13 |
PM 9 | - | - | 0 |
PM 10 | - | - | 0 |
PM 11 | - | - | 13 |
PM 12 | - | - | 0 |
PM 13 | - | - | 73 |
Since computational analysis of miRNA regulation is still at a nascent stage, such information is biologically significant. The PMs provide information, in a compact form, about a set of miRNAs that might be regulated by common TFs. Interestingly, in many cases it has been observed that some miRNAs present in consecutive PMs (not in the same one) are associated with the same TFs. This might indicate that these miRNAs should have been within a single PM, but were separated because of the choice of the density decay constant (ξ). Thus an exhaustive sensitivity analysis of the method with respect to ξ needs to be carried out in future. Details are shown in Table 6. The choice of the association density threshold (δ) and the density decay constant (ξ) plays an important role in the selection of significant modules by the proposed mining methodology. These parameters, if not tuned properly, might cause the inclusion of irrelevant miRNAs in a selected module or might leave the module incomplete.
Biological Insight
Table 7. Computed p-values of the occurrence of commonly regulated miRNA pairs found by the proposed method in the three datasets
Dataset | p-value |
---|---|
Schizophrenia | < 1E-4 |
Tissue-specific | < 1E-4 |
Stem cell | < 1E-4 |
A deeper in silico analysis of the PMs derived by the heuristic mining procedure sheds light on some biological results hitherto unexplored. In a recent study [26], the molecular evolution of an miRNA cluster and its paralogs has been reconstructed. This cluster of miRNAs consists of hsa-miR-17, hsa-miR-18, hsa-miR-19a, hsa-miR-19b, hsa-miR-20, hsa-miR-25, hsa-miR-92, hsa-miR-93, hsa-miR-106a, and hsa-miR-106b. To study the co-expression similarity of this set of miRNAs, we investigated the PMs that contain these miRNAs in the results for the schizophrenia dataset. Most of these miRNAs are found in separate PMs or are pruned out, and are therefore not co-expressed. Strikingly, although hsa-miR-19a and hsa-miR-19b are known to be closely related mature sequences (generally represented as hsa-miR-Xa/b/...), they are not found in the same PM (or even in close ones). This might be because they are evolutionarily clustered. In short, they are not found to be co-expressed although they are paralogs. This indicates that the expression profiles might not depend on the evolutionary relationship of the miRNAs.
Conclusion
This paper introduces a novel unsupervised method of exploring commonly regulated modules of human miRNAs by targeting TFs. The method integrates a self-pruning subroutine to discard the portion of the microarray data that might be noisy or insignificant for the particular study. The method has a different motivation from a general clustering approach. It can produce priority-based modules bearing biological significance. For validating the efficacy of the pruning methodology, a novel tool is devised for visualizing the expression data from a statistical perspective. The results show the generation of a set of PMs in decreasing order of statistical significance. The coherence of these modules is validated with a novel compactness measure. Biologically, with respect to regulation by TFs, this ordering might not be important, even though these PMs are found to be effective in the exploration of TF-miRNA regulatory activity. By a deeper analysis, a large number of TFs are identified which might be regulating multiple miRNAs common to a module. Supporting an earlier study [3], these results might be significant for reconstructing the complex regulatory network that hypothetically exists between TFs and miRNAs. The results also indicate that miRNAs which are evolutionarily related may not be biologically co-regulated.
Methods
To apply the proposed heuristic mining process, we initially construct an FCG from the microarray data. As this study uses the concept of an FCG, which reflects a similarity measure within (0, 1], a normalized similarity measure is needed as the fuzzy membership function. Here, a fuzzy membership function based on the squared Euclidean distance is used. A commonly used normalization method is to perform the zero mean and unit normalization operation (see Additional file 1 section 2.1) on the entire dataset. However, with prior zero mean and unit normalization, the squared Euclidean distance becomes a linear function of the Pearson correlation coefficient and so ranks miRNA pairs identically. We employ a novel fuzzy membership function to compute the miRNA-miRNA membership value (relation) in the final FCG.
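The relation between the squared Euclidean distance and the Pearson correlation mentioned above can be checked numerically. The sketch below assumes that "unit normalization" means scaling each profile to unit (population) variance; under that assumption, two z-scored profiles of length m satisfy d² = 2m(1 − r). The vectors are arbitrary illustrative values, not data from the study.

```python
import math

def zscore(x):
    """Zero-mean, unit-variance normalization (population standard deviation)."""
    m = len(x)
    mean = sum(x) / m
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    return [(v - mean) / sd for v in x]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 9.0]
d2 = squared_euclidean(zscore(x), zscore(y))
r = pearson(x, y)
# After z-scoring, the distance carries the same information as r.
print(d2, 2 * len(x) * (1 - r))
```

Because the two quantities differ only by a fixed linear map, a membership function built on the z-scored distance would reproduce a correlation-based similarity, which is consistent with the choice of a separate membership function in the text.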
In Eqn. (6), ε_{1i}represents the i^{ th }element of the expression vector ε_{1} and NF denotes a normalization factor that is calculated as .
The FCG to be explored is prepared using the aforesaid measure. Once the FCG is prepared, it can be equivalently considered as a co-expression network. The proposed mining method produces a set of N-vertexlets (groups of miRNAs which we call PMs) by stepwise pruning of the constructed FCG until a stopping criterion is reached.
The proposed mining methodology is given in formal steps in Table 1. This complete process is followed by a post-processing technique. The basic algorithm groups the miRNAs in descending order of coherence and prunes out the insignificant residual part. It takes as inputs an miRNA co-expression network (in the form of an FCG) and two controlling parameters: a lower density threshold and a density decay constant. Starting from the zeroth time point (t = 0), at each iteration (time point) the algorithm discovers the largest PM (largest dense N-vertexlet) in the current co-expression network. This (step 4) is done using an algorithm proposed in a recent work to identify the largest N-vertexlets in a scale-free graph [17], which proceeds as follows.
1. For every single vertex in the FCG, a neighboring list of vertices is prepared. This contains the vertices in descending order of fuzzy membership value with respect to the corresponding vertex.
2. The vertex having the maximum association density with respect to the remaining ones is selected as the seed vertex.
3. The seed vertex is expanded heuristically by weighted combination of the neighboring list until a threshold of association density (here δ_{t}) is reached.
4. The final expanded list provides the largest PM.
The largest PM obtained with the above subroutine is extracted from the current network and the association density is then decayed. The decay is not linear; inspired by simulated annealing, the density is reduced geometrically with a decay constant ξ, i.e., δ_{t} = δ_{t−1}(1 − ξ). The decayed density and the residual network are taken as the current density and current network, respectively, in the subsequent iteration. Self-pruning continues until the lower density threshold is reached, and the left-out network is treated as the insignificant subpart of the original network. On completion of the iterations, the number of PMs is returned by the variable t. The output is produced in the form of a finite set of PMs. From the entire set V of miRNAs, a subset is mined as the significant part and the remaining portion is pruned out. Thus, the method integrates a noise-pruning step intended to improve the accuracy of the results.
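The iteration described above can be sketched as follows. This is a simplified illustration, not the algorithm of [17]: the association density of a vertex is assumed to be its mean membership to the other members of a group, the density of a group is assumed to be the smallest such value, and step 4 is replaced by a plain greedy expansion tried from every seed. All function names and the toy membership values are hypothetical.

```python
from itertools import combinations

def vertex_density(v, group, omega):
    """ASSUMED association density: mean membership of v to the other members."""
    others = [u for u in group if u != v]
    if not others:
        return 1.0
    return sum(omega[frozenset((v, u))] for u in others) / len(others)

def group_density(group, omega):
    """ASSUMED density of a vertexlet: the smallest vertex density inside it."""
    return min(vertex_density(v, group, omega) for v in group)

def largest_dense_group(vertices, omega, delta):
    """Greedy stand-in for step 4: grow from each seed, keep the biggest result."""
    best = set()
    for seed in sorted(vertices):
        group, candidates = {seed}, set(vertices) - {seed}
        while candidates:
            # Try the candidate that keeps the group density highest.
            nxt = max(sorted(candidates), key=lambda c: group_density(group | {c}, omega))
            if group_density(group | {nxt}, omega) < delta:
                break
            group.add(nxt)
            candidates.remove(nxt)
        if len(group) > len(best):
            best = group
    return best if len(best) > 1 else set()

def mine_priority_modules(vertices, omega, delta_lower, xi):
    """Self-pruning loop: extract a module, shrink the network, decay delta."""
    remaining, modules, delta = set(vertices), [], 1.0
    while delta >= delta_lower:
        pm = largest_dense_group(remaining, omega, delta)
        if pm:
            modules.append((delta, sorted(pm)))
            remaining -= pm
        delta *= 1.0 - xi  # geometric decay of the density threshold
    return modules, sorted(remaining)

# Toy membership values (hypothetical): two tight groups and one outlier "f".
names = ["a", "b", "c", "d", "e", "f"]
omega = {frozenset(p): 0.4 for p in combinations(names, 2)}
omega.update({frozenset("ab"): 0.995, frozenset("ac"): 0.992,
              frozenset("bc"): 0.993, frozenset("de"): 0.97})
modules, pruned = mine_priority_modules(names, omega, delta_lower=0.95, xi=0.01)
print(modules, pruned)
```

In this toy run the tighter group is extracted first, the looser pair follows once the threshold has decayed far enough, and the outlier is left in the pruned remainder, mirroring the decreasing-coherence ordering of the PMs.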
Subsequent to this mining procedure, a post-processing routine is performed on the final set of PMs produced as the output. These PMs are taken as a set of initial modules and the centers of these modules are computed. With respect to all the miRNAs, the modules are reconstructed by associating each miRNA with the closest module center. The module centers are then recomputed for the reconstructed modules and the same process is iterated. This finally produces the modules of miRNAs of importance.
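The post-processing routine resembles an iterative nearest-center reassignment seeded with the mined PMs. The sketch below is one possible reading under stated assumptions: centers are coordinate-wise means, closeness is measured by squared Euclidean distance, and iteration stops when assignments no longer change or after `max_iter` rounds. Which miRNAs are passed in `profiles` (only the mined ones or all of them) is left to the caller; the names and values are hypothetical.

```python
def centroid(vectors):
    """Coordinate-wise mean of a list of expression vectors."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]

def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def refine_modules(profiles, modules, max_iter=100):
    """profiles: name -> expression vector; modules: initial lists of names."""
    centers = [centroid([profiles[n] for n in mod]) for mod in modules]
    assignment = None
    for _ in range(max_iter):
        # Associate every miRNA with its closest module center.
        new_assignment = {
            name: min(range(len(centers)), key=lambda c: sq_dist(vec, centers[c]))
            for name, vec in profiles.items()
        }
        if new_assignment == assignment:
            break  # ASSUMED stopping rule: assignments are stable
        assignment = new_assignment
        for c in range(len(centers)):
            members = [profiles[n] for n, a in assignment.items() if a == c]
            if members:  # keep the old center if a module becomes empty
                centers[c] = centroid(members)
    return [sorted(n for n, a in assignment.items() if a == c)
            for c in range(len(centers))]

# Toy profiles (hypothetical): "e" was not in any initial module.
profiles = {"a": [0, 0], "b": [0, 1], "c": [10, 10], "d": [10, 11], "e": [1, 0]}
refined = refine_modules(profiles, [["a", "b"], ["c", "d"]])
print(refined)
```

Seeding the reassignment with the mined PMs, rather than with random centers, keeps the number and rough position of the modules fixed by the mining step.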
Supplementary materials along with the datasets are available at the webpage of the corresponding author: http://www.isical.ac.in/~malay_r/Supplementary.html.
Declarations
Acknowledgements
The authors wish to thank the anonymous reviewers for their valuable suggestions that greatly helped to improve the contents of this paper. Additionally, the first author gratefully acknowledges the financial support from the grant no.- DST/SJF/ET-02/2006-07 under the Swarnajayanti Fellowship scheme of the Department of Science and Technology, Government of India.
Authors’ Affiliations
References
1. Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004, 116(2):281–297. 10.1016/S0092-8674(04)00045-5
2. Perkins DO, Jeffries CD, Jarskog LF, Thomson JM, Woods K, Newman MA, Parker JS, Jin J, Hammond SM: microRNA expression in the prefrontal cortex of individuals with schizophrenia and schizoaffective disorder. Genome Biol 2007, 8(2):R27. 10.1186/gb-2007-8-2-r27
3. Shalgi R, Lieber D, Oren M, Pilpel Y: Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS Comput Biol 2007, 3(7):e131. 10.1371/journal.pcbi.0030131
4. Lu J, Getz G, Miska EA, Saavedra EA, Lamb J, Peck D, Cordero AS, Ebert BL, Mak RH, Ferrando AA, Downing JR, Jacks T, Horvitz HR, Golub TR: MicroRNA expression profiles classify human cancers. Nature 2005, 435: 834–838. 10.1038/nature03702
5. Brown D, Conrad R, Devroe E, Goldrick M, Keiger K, Labourier E, Moon I, Powers P, Shelton J, Shingara J: Methods and compositions involving microRNA. US Patent 20070161004 2007, A1.
6. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
7. Bandyopadhyay S, Mukhopadhyay A, Maulik U: An improved algorithm for clustering gene expression data. Bioinformatics 2007, 23(21):2859–2865. 10.1093/bioinformatics/btm418
8. Datta P, Datta S: Evaluation of clustering algorithms for gene expression data. BMC Bioinformatics 2006, 7: S17. 10.1186/1471-2105-7-S4-S17
9. Chopra P, Kang J, Yang J, Cho H, Kim HS, Lee MG: Microarray data mining using landmark gene-guided clustering. BMC Bioinformatics 2008, 9: 92. 10.1186/1471-2105-9-92
10. Sempere LF, Freemantle S, Pitha-Rowe I, Moss E, Dmitrovsky E, Ambros V: Expression profiling of mammalian microRNAs uncovers a subset of brain-expressed microRNAs with possible roles in murine and human neuronal differentiation. Genome Biol 2004, 5(3):R13. 10.1186/gb-2004-5-3-r13
11. Baskerville S, Bartel DP: Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005, 11: 241–247. 10.1261/rna.7240905
12. Hornstein E, Shomron N: Canalization of development by microRNA. Nat Genet (Supplementary) 2006, S20-S24. 10.1038/ng1803
13. Wang G, Wang Y, Feng W, Wang X, Yang JY, Zhao Y, Wang Y, Liu Y: Transcription factor and microRNA regulation in androgen-dependent and -independent prostate cancer cells. BMC Genomics 2008, 9(Suppl 2):S22. 10.1186/1471-2164-9-S2-S22
14. Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120: 15–20. 10.1016/j.cell.2004.12.035
15. Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet 2005, 37(5):495–500. 10.1038/ng1536
16. Tian Z, Greene AS, Pietrusz JL, Matus IR, Liang M: MicroRNA-target pairs in the rat kidney identified by microRNA microarray, proteomic, and bioinformatic analysis. Genome Res 2008, 18: 404–411. 10.1101/gr.6587008
17. Bandyopadhyay S, Bhattacharyya M: Mining the Largest Dense N-vertexlet in a Fuzzy Scale-free Graph. In Technical Report No. MIU/TR-03/08. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; 2008.
18. Laurent LC, Chen J, Ulitsky I, Mueller F, Lu C, Shamir R, Fan J, Loring JF: Comprehensive MicroRNA Profiling Reveals a Unique Human Embryonic Stem Cell Signature Dominated by a Single Seed Sequence. Stem Cells 2008, 26: 1506–1516. 10.1634/stemcells.2007-1081
19. Brock GN, Shaffer JR, Blakesley RE, Lotz MJ, Tseng GC: Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes. BMC Bioinformatics 2008, 9: 12.
20. Handl J, Knowles J, Kell DB: Computational cluster validation in post-genomic data analysis. Bioinformatics 2005, 21(15):3201–3212. 10.1093/bioinformatics/bti517
21. Rousseeuw P: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987, 20: 53–65. 10.1016/0377-0427(87)90125-7
22. Amin AT, Hakimi SL: Upper bounds on the order of a clique of a graph. SIAM J Appl Math 1972, 22(4):569–573. 10.1137/0122052
23. McGill R, Tukey JW, Larsen WA: Variations of Box Plots. The American Statistician 1978, 32: 12–16. 10.2307/2683468
24. Slonim N, Atwal GS, Tkačik G, Bialek W: Information-based clustering. Proc Natl Acad Sci USA 2005, 102(51):18297–18302. 10.1073/pnas.0507432102
25. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004, 32: D493-D496. 10.1093/nar/gkh103
26. Tanzer A, Stadler PF: Molecular evolution of a microRNA cluster. J Mol Biol 2004, 339(2):327–335. 10.1016/j.jmb.2004.03.065
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.