Multiobjective triclustering of time-series transcriptome data reveals key genes of biological processes
- Anirban Bhar^{1},
- Martin Haubrock^{1},
- Anirban Mukhopadhyay^{2} and
- Edgar Wingender^{1} (corresponding author)
https://doi.org/10.1186/s12859-015-0635-8
© Bhar et al.; licensee BioMed Central. 2015
Received: 5 November 2014
Accepted: 1 June 2015
Published: 26 June 2015
Abstract
Background
Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes. In such datasets, variations in the gene expression profiles are usually observed across replicates and time points. Thus, mining the temporal expression patterns in such multi-dimensional datasets may not only provide insights into the key biological processes governing organ growth and development but also facilitate the understanding of the underlying complex gene regulatory circuits.
Results
In this work we have developed an evolutionary multi-objective optimization for our previously introduced triclustering algorithm δ-TRIMAX. Its aim is to make optimal use of δ-TRIMAX in extracting groups of co-expressed genes from time series gene expression data, or from any 3D gene expression dataset, by adding the capabilities of an evolutionary algorithm to retrieve overlapping triclusters. We have compared the performance of our newly developed algorithm, EMOA-δ-TRIMAX, with that of other existing triclustering approaches using four artificial datasets and three real-life datasets. Moreover, we have analyzed the results of our algorithm on one of these real-life datasets, which monitors the differentiation of human induced pluripotent stem cells (hiPSC) into mature cardiomyocytes. For each group of co-expressed genes belonging to one tricluster, we identified key genes by computing their membership values within the tricluster. To a very high percentage, these key genes were significantly enriched in Gene Ontology categories or KEGG pathways that fit the biological context of cardiomyocyte differentiation very well.
Conclusions
EMOA-δ-TRIMAX has proven instrumental in identifying groups of genes in transcriptomic data sets that represent the functional categories constituting the biological process under study. The executable file can be found at http://www.bioinf.med.uni-goettingen.de/fileadmin/download/EMOA-delta-TRIMAX.tar.gz.
Keywords
Microarray gene expression data; Developmental biology; Tricluster; Multi-objective optimization; Eigen gene; Affirmation score; TRANSFAC
Background
One of the main aims of functional genomics is to understand the dynamic features encoded in the genome, such as the regulation of gene activities. It often relies on high-throughput approaches devised to gain a complete picture of all genes of an organism in one experiment. Several steps, such as transcription, RNA splicing and translation, are involved in the process of gene expression, which is subject to a great many regulatory mechanisms. Analysis of gene expression data provides considerable leverage for understanding the principles of cellular systems, disease mechanisms, molecular networks, etc. Genes having similar expression profiles are frequently found to be regulated by similar mechanisms. Previous studies elucidated the impact of highly connected intra-modular hub genes on such regulation [1–3]. Detecting hub genes and analyzing their roles may facilitate understanding the basal control mechanisms by which a certain normal or disease cellular phenotype develops.
Microarray technology is used to measure the expression of thousands of genes over a set of biological replicates simultaneously. In recent years, such expression signatures have increasingly been monitored for sets of time points in order to follow the course of biological processes. In such three-dimensional datasets, the activity of all genes is measured at each time point for a number of biological replicates. Although the experimental setups are kept identical for these replicates, variations between them can still occur. For instance, stochastic effects can result in delays or accelerations of a certain cell state transition. Thus, grouping similar biological replicates may facilitate the analysis of time series gene expression data. Moreover, expression profiles of genes may also vary over different time points. Appropriate computational methods are therefore required to analyze such high-throughput datasets, specifically to identify temporal expression patterns over biological replicates and time points. Clustering, one of the unsupervised learning approaches, has been used to explore two-dimensional gene expression datasets. Clustering algorithms aim to maximize similarity within clusters or to minimize similarity between clusters, based on a distance measure [4]. Clustering is able to group genes over a set of samples, or samples over a set of genes, but previous studies have reported that genes are not necessarily co-expressed over all samples. Hence, to find such local patterns, i.e. genes having similar expression profiles over a subset of samples in 2D gene expression datasets, biclustering algorithms are used [5]. In previous studies, biclusters have been found to be biologically more significant, as these algorithms aim to extract groups of correlated genes from a subset of samples. Such subspace clustering techniques find clusters in multiple overlapping subspaces.
When applied to time series gene expression datasets, biclustering algorithms fail to extract genes that have similar expression profiles over a subset of samples during a subset of time points. To perform co-expression analysis in such three-dimensional gene expression datasets, triclustering algorithms have to be employed. Zhao et al. proposed the TRICLUSTER algorithm, which aims to retrieve groups of genes that have similar expression profiles over a subset of samples during a subset of time points [6]. In a more recent work, Tchagang et al. proposed a triclustering algorithm (OPTricluster) for mining short time series gene expression datasets. OPTricluster effectively mines time series gene expression data having approximately 3-8 time points and 2-5 samples. According to their definition of a tricluster, genes belonging to a tricluster must have constant, coherent or order-preserving expression patterns over a subset of samples during a subset of time points. In the case of an order-preserving tricluster, there must be a permutation of the time points such that the expression levels of the genes form a monotonic function [7]. In our previous work we proposed the triclustering algorithm δ-TRIMAX, which introduces a novel mean squared residue (MSR) score to mine a 3D gene expression dataset; each tricluster must have an MSR score below a threshold δ [8, 9]. In spite of its proven merits [8, 9], δ-TRIMAX has some limitations: a) it cannot retrieve overlapping triclusters, and b) due to its greedy approach it often gets stuck at local optima. Finding overlapping triclusters is important in a biological context, since each gene may participate in several biological processes and thus be subject to multiple regulatory influences [10]. A subset of genes may therefore be involved in a set of biological processes and consequently belong to several triclusters. Furthermore, δ-TRIMAX pursues two goals, namely to maximize the volume and to minimize the MSR score of the resultant triclusters, and these goals conflict with each other.
Hence, the problem of optimizing such multiple conflicting objectives can be classified as a multi-objective optimization problem, where a set of alternative solutions of equivalent quality exists instead of one single optimal solution. To optimize the conflicting objectives of δ-TRIMAX we have used the non-dominated sorting genetic algorithm II (NSGA-II) [11] as a multi-objective optimization method to develop EMOA-δ-TRIMAX (Evolutionary Multi-objective Optimization Algorithm for δ-TRIMAX). We demonstrate that EMOA-δ-TRIMAX effectively copes with the problems of δ-TRIMAX.
Methods
Definitions
Time series gene expression dataset (D): Such a dataset can be modeled as a G × C × T matrix, of which each element \(d_{ijk}\) corresponds to the expression value of the ith gene over the jth sample and across the kth time point, where \(i \in (g_{1}, g_{2}, \ldots, g_{G})\), \(j \in (c_{1}, c_{2}, \ldots, c_{C})\) and \(k \in (t_{1}, t_{2}, \ldots, t_{T})\).
Tricluster (M): A tricluster can be defined as a sub-matrix \(M(I,J,K) = [m_{ijk}]\), where \(i \in I\), \(j \in J\), \(k \in K\). Sub-matrix M represents a subset of genes (I) that have similar expression profiles over a subset of samples (J) during a subset of time points (K).
Perfect shifting tricluster: A tricluster M(I,J,K) is called a perfect shifting tricluster if each element of the tricluster can be represented as \(m_{ijk} = \Gamma + \alpha_{i} + \beta_{j} + \eta_{k}\), where \(\Gamma\) is a constant value of the tricluster and \(\alpha_{i}\), \(\beta_{j}\) and \(\eta_{k}\) are the shifting factors of the ith gene, jth sample and kth time point, respectively.
Mean squared residue (MSR): The MSR score of a tricluster M(I,J,K) is computed from the following means: the mean of the ith gene is \(m_{\textit{iJK}} = \frac{1}{|J||K|}\sum_{j \in J, k \in K} m_{\textit{ijk}}\), the mean of the jth sample is \(m_{\textit{IjK}} = \frac{1}{|I||K|}\sum_{i \in I, k \in K} m_{\textit{ijk}}\), the mean of the kth time point is \(m_{\textit{IJk}} = \frac{1}{|I||J|}\sum_{i \in I, j \in J} m_{\textit{ijk}}\), and the mean of the tricluster is \(m_{\textit{IJK}} = \frac{1}{|I||J||K|}\sum_{i \in I, j \in J, k \in K} m_{\textit{ijk}}\).
The MSR score of a tricluster represents the level of coherence among the elements of the tricluster. Hence, a lower MSR score means a better quality of the tricluster. For a perfect shifting tricluster the MSR score is zero. A global normalization applied to the whole dataset, such as min-max normalization, does not affect the algorithm. Moreover, it can be shown that gene-wise Z-normalization applied only to a tricluster does not affect the MSR score. However, when a similar gene-wise normalization is applied to the whole dataset, it affects the triclusters and in turn our algorithm. We still prefer to normalize the dataset in order to eliminate the variability in gene expression profiles caused by experimental errors and noise. In addition, since normalization reduces the effects of scaling patterns, scaling patterns can also be identified in part.
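The relation between the four means and the MSR score can be illustrated with a minimal Python sketch. It assumes that the residue of an element takes the form \(r_{ijk} = m_{ijk} - m_{iJK} - m_{IjK} - m_{IJk} + 2m_{IJK}\) and that the MSR score is the mean of the squared residues; this form is an assumption of the sketch, chosen because it vanishes for every element of a perfect shifting tricluster. The function name `msr` and the toy values are hypothetical.

```python
from itertools import product


def msr(cube):
    """Mean squared residue of a tricluster stored as nested lists cube[i][j][k]."""
    n_i, n_j, n_k = len(cube), len(cube[0]), len(cube[0][0])
    cells = list(product(range(n_i), range(n_j), range(n_k)))

    # Gene, sample, time-point and overall means, as defined in the text.
    m_iJK = [sum(cube[i][j][k] for j in range(n_j) for k in range(n_k)) / (n_j * n_k)
             for i in range(n_i)]
    m_IjK = [sum(cube[i][j][k] for i in range(n_i) for k in range(n_k)) / (n_i * n_k)
             for j in range(n_j)]
    m_IJk = [sum(cube[i][j][k] for i in range(n_i) for j in range(n_j)) / (n_i * n_j)
             for k in range(n_k)]
    m_IJK = sum(cube[i][j][k] for i, j, k in cells) / len(cells)

    # Assumed residue form: r_ijk = m_ijk - m_iJK - m_IjK - m_IJk + 2 * m_IJK.
    squared = 0.0
    for i, j, k in cells:
        r = cube[i][j][k] - m_iJK[i] - m_IjK[j] - m_IJk[k] + 2 * m_IJK
        squared += r * r
    return squared / len(cells)


# Hypothetical perfect shifting tricluster: m_ijk = gamma + alpha_i + beta_j + eta_k.
gamma = 5.0
alpha = [0.0, 1.0, -2.0]
beta = [0.5, -0.5]
eta = [0.0, 2.0, 4.0, 1.0]
perfect = [[[gamma + a + b + e for e in eta] for b in beta] for a in alpha]

# Copy the tricluster and break the additive pattern in a single cell.
noisy = [[row[:] for row in plane] for plane in perfect]
noisy[0][0][0] += 1.0

print(msr(perfect))  # numerically zero for the perfect pattern
print(msr(noisy))    # strictly positive once the pattern is perturbed
```

Under these assumptions, the additive pattern scores zero up to floating-point error, whereas a single perturbed element already yields a positive score.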
Steps of EMOA-δ-TRIMAX
Multi-objective optimization problem
The multi-objective optimization problem is equivalent to finding the vector \(\bar{x}^{*} = [x_{1}^{*},x_{2}^{*},\ldots,x_{n}^{*}]^{T}\) of decision variables that optimizes the vector function \(\bar{f}(\bar{x}) = [f_{1}(\bar{x}),f_{2}(\bar{x}),\ldots,f_{r}(\bar{x})]^{T}\) while satisfying a number of equality and inequality constraints. Here the constraints define the feasible region F that holds all the acceptable solutions; \(\bar{x}^{*}\) stands for an optimal solution. For a minimization problem, Pareto optimality can be formally defined as follows: a decision vector \(\bar{x}^{*}\) is Pareto optimal if and only if there is no \(\bar{x}\) in F such that \(\forall i \in \{1,2,\ldots,r\}\), \(f_{i}(\bar{x}) \leq f_{i}(\bar{x}^{*})\) and \(\exists i \in \{1,2,\ldots,r\}\), \(f_{i}(\bar{x}) < f_{i}(\bar{x}^{*})\). In words, \(\bar{x}^{*}\) is Pareto optimal if no feasible vector \(\bar{x}\) decreases some criterion without simultaneously increasing at least one other criterion [11, 14].
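The dominance relation behind this definition can be written down in a few lines. The following Python sketch assumes that all objectives are to be minimized and that each candidate solution is represented only by its tuple of objective values; the function names `dominates` and `pareto_front` are hypothetical and are not taken from the EMOA-δ-TRIMAX implementation.

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    strictly_better = any(a < b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the objective vectors that are not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy objective vectors (f1, f2); (3, 3) is dominated by (2, 2).
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))
```

The three remaining vectors trade one objective against the other, which is why a multi-objective method returns a set of alternative solutions rather than a single optimum.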
Genetic algorithm
A genetic algorithm is a search heuristic that imitates the process of Darwinian evolution [11, 14]. Here the population is generated randomly and consists of a set of chromosomes that encode the parameters of the search space. A fitness function corresponds to the objective function to be optimized and is used to estimate the goodness of each chromosome in the population. Genetic operators such as selection, crossover and mutation are used to evolve subsequent generations. If some particular criterion is met or the maximum generation limit is reached, then the algorithm finishes its execution.
Encoding chromosome
Each chromosome encodes a possible tricluster and is represented by a binary string that has three parts. For a time series gene expression dataset having G genes, C samples and T time points, the first G bits correspond to the genes, the next C bits represent the samples and the last T bits stand for the time points. Hence, each string consists of (G+C+T) bits, each having a value of either 1 or 0. A value of 1 means that the corresponding gene, sample or time point is a member of the tricluster. For example, for a 3D gene expression dataset having 10 genes, 5 samples and 8 time points, the string {10010011100011101010101} indicates that genes \(\{g_{1}, g_{4}, g_{7}, g_{8}, g_{9}\}\), samples \(\{s_{3}, s_{4}, s_{5}\}\) and time points \(\{t_{2}, t_{4}, t_{6}, t_{8}\}\) are the members of the tricluster. The initial population consists of a set of randomly generated chromosomes. This encoding enables the retrieval of overlapping genes that belong to several triclusters. As each bit of a chromosome in the population represents the presence or absence of a gene, replicate or time point in one resultant tricluster, any two chromosomes often share positions containing a value of 1. Thus, different chromosomes can encode overlapping triclusters. Some genes, samples and/or time points could be added to the initial population in spite of lying far away from the feature space. To remove such nodes from the population, δ-TRIMAX has been used as a local search heuristic.
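The encoding can be made concrete with a short Python sketch that decodes the example string from the text. The function name `decode` and the second chromosome are hypothetical and serve only to illustrate how two chromosomes can describe overlapping triclusters.

```python
def decode(chromosome, n_genes, n_samples, n_times):
    """Split a (G+C+T)-bit string into 1-based gene, sample and time-point indices."""
    if len(chromosome) != n_genes + n_samples + n_times:
        raise ValueError("chromosome length must equal G + C + T")

    def members(bits):
        return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]

    genes = members(chromosome[:n_genes])
    samples = members(chromosome[n_genes:n_genes + n_samples])
    times = members(chromosome[n_genes + n_samples:])
    return genes, samples, times


# Example from the text: 10 genes, 5 samples, 8 time points.
example = "10010011100011101010101"
genes, samples, times = decode(example, 10, 5, 8)
print(genes, samples, times)

# A second, made-up chromosome that shares members with the first one.
other = "1100000000" + "00110" + "01100000"
o_genes, o_samples, o_times = decode(other, 10, 5, 8)
print(set(genes) & set(o_genes), set(samples) & set(o_samples), set(times) & set(o_times))
```

Because membership is stored per chromosome, nothing prevents the same gene, sample or time point from being set to 1 in several chromosomes.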
Objective functions
The third objective function, \(f_{3}\), is the non-parametric Spearman correlation coefficient [15] of the resultant triclusters, which is to be maximized. In its formula, \(d_{i}\) is the difference between the ranks of the average expression values (sorted in either ascending or descending order) over a subset of samples at the ith time point for each pair of genes in one tricluster, and n is the number of time points in that tricluster.
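For illustration, Spearman's coefficient for two genes can be computed from rank differences with the standard formula \(\rho = 1 - \frac{6\sum_{i} d_{i}^{2}}{n(n^{2}-1)}\), which holds when there are no tied values. The Python sketch below applies it to per-gene profiles that are assumed to be already averaged over the samples of a tricluster. How the pairwise coefficients are aggregated into \(f_{3}\) is not recoverable from the text, so the mean of absolute pairwise values in `mean_abs_spearman` is an assumption, and all function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks in ascending order; assumes that all values are distinct."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman's rho from rank differences (no-ties formula)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def mean_abs_spearman(profiles):
    """Assumed aggregation: mean absolute rho over all pairs of gene profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(abs(spearman(x, y)) for x, y in pairs) / len(pairs)


# Toy profiles over four time points (one list per gene).
profiles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [9.0, 7.0, 5.0, 3.0]]
print(spearman(profiles[0], profiles[1]))  # identical rank order
print(spearman(profiles[0], profiles[2]))  # reversed rank order
print(mean_abs_spearman(profiles))
```

Taking absolute values treats perfectly anti-correlated profiles like perfectly correlated ones, so a gene with a mirrored profile does not lower the score.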
Motivations of objective functions
As the aim of our proposed algorithm is to find triclusters having a lower MSR score and a higher volume, the first two objective functions (\(f_{1}\) and \(f_{2}\)) serve these goals. Moreover, the objective function \(f_{3}\) is used to maximize the correlation coefficients among genes belonging to the resultant triclusters. We have taken the absolute values of the correlation coefficients because co-regulated genes can be both up- and down-regulated by transcription factors across a subset of time points.
Genetic operators
Here, non-dominated sorting and crowding distance are used for fitness assignment and comparison [11]. A crossover can be regarded as a generalization of several mutations performed at once [16]; we have not applied it in this work. Instead, we have used bit string mutation with a high mutation probability to generate the offspring population from a parent population. In this case, the mutation occurs at random positions through bit flips. For instance, for a binary string {1011010010} we generate a random number between 0 and 1 for each bit of the string. If this random number for a particular bit is less than or equal to the mutation probability, mutation occurs and the bit is flipped, i.e. a value of 1 is changed to 0 and a value of 0 is changed to 1. The mutation probability is the same for every bit of the chromosome. After the mutation operator has been applied to each individual of the population, some genes, samples or time points that lie far away from the feature space may have been added to the population. To cope with this problem we have applied δ-TRIMAX as a local search heuristic.
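The bit-flip mutation described above can be sketched in Python as follows. The function name `mutate`, the seed and the use of Python's `random.Random` generator are choices made for this illustration only; the subsequent δ-TRIMAX local search step is not shown.

```python
import random


def mutate(chromosome, p_mut, rng):
    """Flip each bit independently when its random draw is <= p_mut."""
    flipped = []
    for bit in chromosome:
        if rng.random() <= p_mut:
            flipped.append("0" if bit == "1" else "1")
        else:
            flipped.append(bit)
    return "".join(flipped)


rng = random.Random(1)  # fixed seed so that the run is reproducible
parent = "1011010010"
child = mutate(parent, 0.9, rng)  # a high mutation probability
print(parent, child)

# With probability 1 every bit is flipped.
print(mutate(parent, 1.0, rng))
```

Each bit is tested independently with the same probability, so the number of flipped bits varies from one offspring to the next.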
Elitism
We have included elitism to keep track of non-dominated Pareto optimal solutions after each generation [11]. The stopping criterion is evaluated with the convergence metric given in equation (8).
Tricluster eigengene
KEGG pathway enrichment
To establish the biological significance of the genes belonging to each resultant tricluster for both datasets, we have performed a KEGG pathway enrichment analysis using the GOstats package in R with a cutoff of 0.05 on the BH-corrected p-value [18, 19].
TFBS enrichment analysis
Genes that exhibit similar expression profiles are supposed to be regulated by the same mechanism. To analyze the potential co-regulation of co-expressed genes, we have performed a transcription factor binding site (TFBS) enrichment analysis using the TRANSFAC library (version 2012.2) [20]. Here we used 52 million TFBS predictions that are conserved between human, mouse, dog and cow [21]. Out of these 52 million conserved TFBSs we have selected the highest-scoring 1 % for each TRANSFAC matrix to identify the most specific regulator (transcription factor)-target interactions. We have applied a hypergeometric test with Benjamini-Yekutieli FDR correction of the p-values to find over-represented binding sites (p-value ≤ 0.05) in the upstream regions of genes belonging to each tricluster [22, 23].
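The statistical part of such an enrichment analysis can be sketched in Python as a one-sided hypergeometric test followed by the Benjamini-Yekutieli step-up adjustment. The function names and all counts below are hypothetical, and the sketch does not reproduce the TRANSFAC-based pipeline itself.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from a
    population that contains `successes` genes carrying the binding site."""
    upper = min(successes, draws)
    favourable = sum(comb(successes, x) * comb(population - successes, draws - x)
                     for x in range(k, upper + 1))
    return favourable / comb(population, draws)


def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up example: 40 of 2000 background genes carry a site, and 8 of the
# 100 genes of a tricluster carry it.
p_value = hypergeom_upper_tail(8, 2000, 40, 100)
print(p_value)
print(benjamini_yekutieli([0.01, 0.04, 0.03]))
```

In a real analysis one p-value would be computed per TRANSFAC matrix and the whole list adjusted together before the 0.05 cutoff is applied.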
Datasets
Description of the artificial datasets
Artificial dataset 1 (AD1):
First, we have applied the proposed algorithm to an artificial dataset containing 1000 genes, 5 samples and 4 time points. We have then embedded 3 perfect shifting triclusters (standard deviation (σ)=0) of size 100 × 4 × 4, 80 × 4 × 4 and 60 × 4 × 4 into the dataset. In the next step, we have implanted 3 noisy triclusters with different levels of noise (σ=0.1,0.3,0.5,0.7,0.9) into the synthetic dataset.
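A dataset of this kind can be generated along the following lines. The Python sketch assumes uniformly distributed background values, uniformly drawn shifting factors and additive Gaussian noise with standard deviation σ; the text does not specify these distributions, so they are assumptions, as are the function names, the seed and the choice of tricluster members.

```python
import random


def random_cube(n_genes, n_samples, n_times, rng):
    """Background dataset data[i][j][k] filled with uniform random values."""
    return [[[rng.random() for _ in range(n_times)]
             for _ in range(n_samples)]
            for _ in range(n_genes)]


def embed_shifting_tricluster(data, genes, samples, times, sigma, rng):
    """Overwrite a sub-cube with gamma + alpha_i + beta_j + eta_k plus N(0, sigma) noise."""
    gamma = rng.random()
    alpha = {i: rng.random() for i in genes}
    beta = {j: rng.random() for j in samples}
    eta = {k: rng.random() for k in times}
    for i in genes:
        for j in samples:
            for k in times:
                data[i][j][k] = gamma + alpha[i] + beta[j] + eta[k] + rng.gauss(0.0, sigma)


rng = random.Random(42)
data = random_cube(1000, 5, 4, rng)  # dimensions of AD1
# One perfect (sigma = 0) tricluster of size 100 x 4 x 4 on the first members.
embed_shifting_tricluster(data, range(100), range(4), range(4), sigma=0.0, rng=rng)

# In a perfect shifting tricluster, the difference between two genes is the
# same for every sample and time point.
print(data[0][0][0] - data[1][0][0], data[0][3][2] - data[1][3][2])
```

Calling the embedding function with σ > 0 yields the noisy variants, in which the pattern holds only approximately.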
Artificial dataset 2 (AD2):
Moreover, we have generated another artificial dataset which contains 200 genes, 10 replicates and 10 time points. Afterwards, we have implanted 3 perfect shifting triclusters (standard deviation (σ)=0) of size 50 × 3 × 3, 50 × 3 × 3 and 50 × 3 × 3 into the dataset. In the next step, we have added different levels of noise (σ=0.1,0.3,0.5,0.7,0.9) into the synthetic dataset.
Artificial dataset 3 (AD3):
To evaluate the performance of the proposed algorithm on datasets containing different numbers of time points, we have generated three additional artificial datasets of size 200 (genes) × 10 (replicates) × 20 (time points), 200 (genes) × 10 (replicates) × 25 (time points) and 200 (genes) × 10 (replicates) × 30 (time points), in which we have embedded 3 perfect shifting triclusters of size 30 × 3 × 8, 30 × 3 × 6 and 30 × 3 × 4.
Artificial dataset 4 (AD4):
In order to show the performance of the algorithm for the dataset containing missing values, we have randomly deleted the values of 0.5 %, 1 %, 1.5 % and 2 % of all elements of one artificial dataset of size 200 × 10 × 20 containing three triclusters of size 30 × 3 × 8, 30 × 3 × 6 and 30 × 3 × 4.
Description of real-life datasets
Dataset 1:
In this work, this previously published dataset has only been used for comparing the performance of the proposed algorithm with that of the other existing triclustering algorithms since one of the algorithms we wanted to compare our approach with, OPTricluster, can only be efficiently applied to a short time series gene expression dataset and thus, was not suitable to be used for dataset 2 (see below) [7]. Dataset 1 holds 54675 Affymetrix human genome U133 plus 2.0 probe ids, 3 samples and 4 time points (0, 3, 6 and 12 hours) (GSE11324) [24]. The goal of this experiment was to determine cis-regulatory sites in previously uncharted genome regions, responsible for conveying estrogen responses, and to identify the cooperating transcription factors that also contribute to estrogen signaling in MCF7 breast cancer cells.
Dataset 2:
This dataset contains 48803 Illumina HumanWG-6 v3.0 probe ids, 3 replicates and 12 time points (days 0, 3, 7, 10, 14, 20, 28, 35, 45, 60, 90 and 120) (GSE35671) [12]. All these replicates are independent of each other. The aim of this study was to provide insights into the molecular regulation of hiPSC differentiation to cardiomyocytes.
Dataset 3:
This experiment was carried out to study the dynamics of expression profiles of 54675 Affymetrix human genome U133 plus 2.0 probe ids in response to IFN-beta-1b treatment across four time points over 6 patients (GSE46280) [25].
Results and discussion
Results on an artificial dataset
In the affirmation score \(SM^{*}(T_{\textit{im}}, T_{\textit{res}})\), \(T_{\textit{im}}\) is the set of implanted triclusters, \(T_{\textit{res}}\) represents the set of triclusters extracted by any triclustering algorithm, \(SM^{*}_{G}(T_{\textit{im}}, T_{\textit{res}})\) is the average gene affirmation score, \(SM^{*}_{C}(T_{\textit{im}}, T_{\textit{res}})\) is the average sample affirmation score and \(SM^{*}_{T}(T_{\textit{im}}, T_{\textit{res}})\) is the average time point affirmation score of \(T_{\textit{res}}\) with respect to \(T_{\textit{im}}\). The value of \(SM^{*}(T_{\textit{im}}, T_{\textit{res}})\) ranges from 0 to 1. If \(T_{\textit{res}} = T_{\textit{im}}\), then the affirmation score is 1.
Values of input parameters of EMOA-δ-TRIMAX, namely λ and δ, for different levels of noise in case of the artificial dataset 1 (AD1)
Noise levels (σ) | Values of λ | Values of δ |
---|---|---|
0 | 1.2 | 0.0002 |
0.1 | 1.2 | 0.025 |
0.3 | 1.2 | 0.115 |
0.5 | 1.2 | 0.26 |
0.7 | 1.2 | 0.49 |
0.9 | 1.2 | 0.85 |
Values of input parameters of EMOA-δ-TRIMAX, namely λ and δ, for different levels of noise in case of the artificial dataset 2 (AD2)
Noise Levels (σ) | Values of λ | Values of δ |
---|---|---|
0 | 1.2 | 0.00002 |
0.1 | 1.2 | 0.045 |
0.3 | 1.2 | 0.06 |
0.5 | 1.2 | 0.29 |
0.7 | 1.2 | 0.59 |
0.9 | 1.2 | 0.8 |
Values of input parameters of EMOA-δ-TRIMAX, namely λ and δ, for the artificial dataset 3 (AD3_a, AD3_b, AD3_c)
λ (AD3_a) | δ (AD3_a) | λ (AD3_b) | δ (AD3_b) | λ (AD3_c) | δ (AD3_c) |
---|---|---|---|---|---|
1.2 | 0.0002 | 1.2 | 0.02 | 1.2 | 0.02 |
Values of input parameters of EMOA-δ-TRIMAX, namely λ and δ, for different percentages of missing values in case of the artificial dataset 4 (AD4)
% of missing values | λ | δ |
---|---|---|
0.5 | 1.2 | 0.02 |
1 | 1.2 | 0.02 |
1.5 | 1.2 | 0.02 |
2 | 1.2 | 0.02 |
Comparison between EMOA-δ-TRIMAX, δ-TRIMAX and TRICLUSTER algorithm in terms of affirmation score for the artificial dataset 3 (AD3_a, AD3_b, AD3_c)
Dataset | EMOA-δ-TRIMAX | δ-TRIMAX | TRICLUSTER |
---|---|---|---|
AD3_a | 1 | 1 | 1 |
AD3_b | 1 | 1 | 1 |
AD3_c | 1 | 1 | 1 |
Robustness of the evolutionary algorithm
Standard deviations of the affirmation scores yielded by the EMOA-δ-TRIMAX algorithm for artificial dataset 1 (AD1) and 2 (AD2)
Noise levels (σ) | Standard deviation (AD1) | Standard deviation (AD2) |
---|---|---|
0 | 0.05 | 0.003 |
0.1 | 0.05 | 0.004 |
0.3 | 0.02 | 0.02 |
0.5 | 0.005 | 0.02 |
0.7 | 0.003 | 0.03 |
0.9 | 0.004 | 0.02 |
Results on real-life datasets
Values of input parameters of EMOA-δ-TRIMAX for each of the real-life datasets
Datasets | Dataset 1 | Dataset 2 | Dataset 3 |
---|---|---|---|
λ | 1.2 | 1.2 | 1.2 |
δ | 0.012382 | 0.008 | 0.008754 |
Number of generations | 100 | 100 | 100 |
Population Size | 100 | 100 | 100 |
Mutation probability | 0.9 | 0.9 | 0.9 |
Percentage of probe ids, replicates and time points covered by the resultant triclusters for each of the real-life datasets
Datasets | Dataset 1 | Dataset 2 | Dataset 3 |
---|---|---|---|
Coverage of probe ids | 99.02 % | 88.14 % | 93 % |
Coverage of replicates | 100 % | 100 % | 100 % |
Coverage of time points | 100 % | 100 % | 100 % |
Convergence of solutions
Performance comparison
Performance comparison between EMOA-δ-TRIMAX, δ-TRIMAX, TRICLUSTER and OPTricluster in terms of SDB score for Dataset 1
Algorithms | SDB score |
---|---|
EMOA-δ-TRIMAX | 2.49851 |
δ-TRIMAX | 2.140935 |
TRICLUSTER | 2.094091 |
OPTricluster | 0.4956035 |
Performance comparison between EMOA-δ-TRIMAX, δ-TRIMAX and TRICLUSTER in terms of SDB score for Dataset 2
Algorithms | SDB score |
---|---|
EMOA-δ-TRIMAX | 13.88559 |
δ-TRIMAX | 12.10529 |
TRICLUSTER | 7.520363 |
Performance comparison between EMOA-δ-TRIMAX, δ-TRIMAX, TRICLUSTER and OPTricluster in terms of SDB score for Dataset 3
Algorithms | SDB score |
---|---|
EMOA-δ-TRIMAX | 9.454915 |
δ-TRIMAX | 8.945816 |
TRICLUSTER | 7.076184 |
OPTricluster | 0.4383489 |
Biological significance
In the KEGG pathway hit score of a tricluster T, \({N^{i}_{T}}\) is the intersection gene set of tricluster T and its enriched KEGG pathway term i, and |T| is the total number of genes in tricluster T. A higher hit score signifies that more genes in T participate in a canonical pathway.
In the TFBS enrichment hit score of a tricluster T, \({P^{i}_{T}}\) is the intersection gene set of tricluster T and its enriched TRANSFAC matrix i, and |T| is the total number of genes in tricluster T. A higher hit score signifies that more genes in T are regulated by a common transcription factor.
Importance of clustering biological replicates in 3D gene expression datasets
Identifying key genes of triclusters and analyzing their roles during hiPSC differentiation into cardiomyocytes
Conclusion
In this work, we have shown that EMOA-δ-TRIMAX, the improved version of our previously proposed triclustering algorithm, outperforms the other algorithms when applied to the four synthetic datasets as well as to the three real-life datasets used in this work. Moreover, after retrieving groups of co-expressed and co-regulated genes over a subset of samples and across a subset of time points from a microarray gene expression dataset of hiPSC-derived cardiomyocyte differentiation, we have used singular value decomposition to detect tricluster key genes, most of which have already been shown or inferred to play instrumental roles in cardiac development. Thus, the other identified key genes can be hypothesized to be meaningful in this context as well, which needs to be validated experimentally. Furthermore, the biological processes enriched for the identified key genes of each tricluster comprised not only processes associated with stem cell differentiation into cardiomyocytes but also a set of metabolic processes, the majority of which are known to play crucial roles in preventing cardiac diseases. Thus, the identified metabolic processes may provide insights into potential therapeutic strategies for the treatment of cardiovascular diseases. Moreover, the triclusters whose key genes are found to be involved in heart development might help to unravel regulatory mechanisms during different stages of cardiomyocyte development.
Declarations
Acknowledgements
The authors are thankful to Torsten Schoeps for providing computer resources. AB gratefully acknowledges the financial support from Erasmus Mundus External Cooperation Window, Lot 13 - India - EURINDIA project and ExiTox project funded by German Federal Ministry of Education and Research (BMBF) (Grant no.- 031 A269C). AM acknowledges the support received from the German Academic Exchange Service (DAAD) scholarship for research stay at University of Goettingen. All authors are thankful to Goettingen University for the open access publication fund.
Authors’ Affiliations
References
- Saris CG, Horvath S, Vught van PW, Es van MA, Blauw HM, Fuller TF, et al. Weighted gene co-expression network analysis of the peripheral blood from amyotrophic lateral sclerosis patients. BMC Genomics. 2009; 10:405. 10.1186/1471-2164-10-405.View ArticlePubMedPubMed CentralGoogle Scholar
- Min JL, Nicholson G, Halgrimsdottir I, Almstrup K, Petri A, Barrett A, et al. Coexpression network analysis in abdominal and gluteal adipose tissue reveals regulatory genetic loci for metabolic syndrome and related phenotypes. PLoS Genet. 2012; 8:e1002505. 10.1371/journal.pgen.1002505.View ArticlePubMedPubMed CentralGoogle Scholar
- deJong S, Boks MP, Fuller TF, Strengman E, Janson E, Kovel de CG, et al. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012; 7:e39498. 10.1371/journal.pone.0039498.View ArticleGoogle Scholar
- Milligan GW, Cooper MC. Methodology Review: clustering Methods. Appl Psychol Meas. 1987; 11:329–54. 10.1177/014662168701100401.View ArticleGoogle Scholar
- Eren K, Deveci M, Kuecuektunc O, Catalyuerek UV. A comparative analysis of biclustering algorithms for gene expression data. Brief Bioinform. 2012; 14:279–92. 10.1093/bib/bbs032.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhao L, Zaki MJ. triCluster: An Effective Algorithm for Mining Coherent Clusters in 3D Microarray Data. In: Proc. of the 2005 ACM SIGMOD International Conference on Management of Data. New York: ACM Press: 2005. p. 694–705.Google Scholar
- Tchagang AB, Phan S, Famili F, Shearer H, Fobert P, Huang Y, et al. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm. BMC Bioinformatics. 2012; 13:54. 10.1186/1471-2105-13-54.View ArticlePubMedPubMed CentralGoogle Scholar
- Bhar A, Haubrock M, Mukhopadhyay A, Maulik U, Bandyopadhyay S, Wingender E. δ-TRIMAX: Extracting triclusters and analysing coregulation in time series gene expression data In: Raphael B, Tang J, editors. Algorithms in Bioinformatics, 12th International Workshop, WABI 2012, Ljubljana, Slovenia, September 10-12, 2012. Berlin Heidelberg: Springer: 2012. p. 165–77. LNBI 7534.Google Scholar
- Bhar A, Haubrock M, Mukhopadhyay A, Maulik U, Bandyopadhyay S, Wingender E. Coexpression and coregulation analysis of time-series gene expression data in estrogen-induced breast cancer cell. Algorithms Mol Biol. 2013; 8:9. 10.1186/1748-7188-8-9.View ArticlePubMedPubMed CentralGoogle Scholar
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 2004; 14:1085–1094.View ArticlePubMedPubMed CentralGoogle Scholar
- Deb K, Pratap A, Agarwal S, Meyarivan T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans Evol Comput. 2002; 6:182–97. 10.1109/4235.996017.View ArticleGoogle Scholar
- Babiarz JE, Ravon M, Sridhar S, Ravindran P, Swanson B, Bitter H, et al. Determination of the Human Cardiomyocyte mRNA and miRNA differentiation network by fine-scale profiling. Stem Cells Dev. 2012; 21:1956–1965. 10.1089/scd.2011.0357.View ArticlePubMedGoogle Scholar
- Hsu YC, Lee DC, Chiu IM. Neural stem cells, neural progenitors, and neurotrophic factors. Cell Transplant. 2007; 16:133–50.PubMedGoogle Scholar
- Maulik U, Mukhopadhyay A, Bhattacharyya M, Kaderali L, Brors B, Bandyopadhyay S, et al. Mining Quasi-Bicliques from HIV-1–human protein interaction network: a multiobjective biclustering approach. IEEE/ACM Trans Comput Biol Bioinform. 2013; 10:423–35. 10.1109/TCBB.2012.139.View ArticlePubMedGoogle Scholar
- Spearman C. The proof and measurement of association between two things. Am J Psychol. 1987; 100:441–71.View ArticlePubMedGoogle Scholar
- Fogel DB, Atmar JW. Comparing Genetic Operators with Gaussian Mutations in Simulated Evolutionary Processes Using Linear Systems. Biol Cybernet. 1990; 63:111–4.View ArticleGoogle Scholar
- Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol. 2007; 1:54.View ArticlePubMedPubMed CentralGoogle Scholar
- Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007; 23:257–8.View ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995; 57:289–300. 10.2307/2346101.
- Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, et al. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 2001; 29:281–3.
- Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature. 2005; 434:338–45.
- Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, et al. GO::TermFinder-open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics. 2004; 20:3710–715.
- Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001; 29:1165–88.
- Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, et al. Genome-wide analysis of estrogen receptor binding sites. Nat Genet. 2006; 38:1289–97.
- Hecker M, Thamilarasan M, Koczan D, Schroeder I, Flechtner K, Freiesleben S, et al. MicroRNA expression changes during interferon-beta treatment in the peripheral blood of multiple sclerosis patients. Int J Mol Sci. 2013; 14:16087–110. 10.3390/ijms140816087.
- Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000; 8:93–103.
- Maulik U, Bandyopadhyay S, Mukhopadhyay A. Multiobjective fuzzy biclustering in microarray data: method and a new performance measure. In: Evolutionary Computation, 2008. CEC 2008: 2008. p. 1536–1543. 10.1109/CEC.2008.4630996.
- Chen L, Wang H, Zhang L, Li W, Wang Q, Shang Y, et al. Uncovering packaging features of co-regulated modules based on human protein interaction and transcriptional regulatory networks. BMC Bioinformatics. 2010; 11:392. 10.1186/1471-2105-11-392.
- Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 1947; 18:50–60. 10.1214/aoms/1177730491.
- Schaefer EJ, Asztalos BF. Cholesteryl ester transfer protein inhibition, high-density lipoprotein metabolism and heart disease risk reduction. Curr Opin Lipidol. 2006; 17:394–8.
- Harmon HJ, Sanborn MR. Effect of naphthalene on respiration in heart mitochondria and intact cultured cells. Environ Res. 1982; 29:160–73.
- Deussen A, Lloyd HG, Schrader J. Contribution of S-adenosylhomocysteine to cardiac adenosine formation. J Mol Cell Cardiol. 1989; 21:773–82.
- Tian R, Ingwall JS. How does folic acid cure heart attacks? Circulation. 2008; 117:1772–4. 10.1161/CIRCULATIONAHA.108.766105.
- Herrmann H, Kram D. Incorporation of fucose in the intact heart and dissociated heart cells of the chick embryo. Exp Cell Res. 1977; 107:455–6.
- Tagliavini S, Genedani S, Bertolini A, Bazzani C. Ischemia- and reperfusion-induced arrhythmias are prevented by putrescine. Eur J Pharmacol. 1991; 194:7–10.
- Pegg AE, Hibasami H. Polyamine metabolism during cardiac hypertrophy. Am J Physiol. 1980; 239:372–8.
- Lopaschuk GD, Barr RL. Measurements of fatty acid and carbohydrate metabolism in the isolated working rat heart. Mol Cell Biochem. 1997; 172:137–47.
- Krishnamurthy M, Selvaraju M, Tamilarasan M. Turbinaria conoides (J. Agardh) sulfated polysaccharide protects rat’s heart against myocardial injury. Int J Biol Macromol. 2012; 50:1275–9. 10.1016/j.ijbiomac.2012.03.012.
- Schaefer S, Ramasamy R. Glycogen utilization and ischemic injury in the isolated rat heart. Cardiovasc Res. 1997; 35:90–8.
- Qiu H, Liu JY, Wei D, Li N, Yamoah EN, Hammock BD, et al. Cardiac-generated prostanoids mediate cardiac myocyte apoptosis after myocardial ischaemia. Cardiovasc Res. 2012; 95:336–45. 10.1093/cvr/cvs191.
- Nebigil CG, Maroteaux L. A novel role for serotonin in heart. Trends Cardiovasc Med. 2001; 11:329–35.
- Cole AG, Meinertzhagen IA. The central nervous system of the ascidian larva: mitotic history of cells forming the neural tube in late embryonic Ciona intestinalis. Dev Biol. 2004; 271:239–62.
- Murashov AK, Pak ES, Katwa LC. Parallel development of cardiomyocytes and neurons in embryonic stem cell culture. Biochem Biophys Res Commun. 2005; 332:653–6.
- Christoforou N, Liau B, Chakraborty S, Chellapan M, Bursac N, Leong KW. Induced pluripotent stem cell-derived cardiac progenitors differentiate to cardiomyocytes and form biosynthetic tissues. PLoS One. 2013; 8:e65963. 10.1371/journal.pone.0065963.
- Lian X, Hsiao C, Wilson G, Zhu K, Hazeltine LB, Azarin SM, et al. Robust cardiomyocyte differentiation from human pluripotent stem cells via temporal modulation of canonical Wnt signaling. Proc Natl Acad Sci U S A. 2012; 109:1848–57. 10.1073/pnas.1200250109.
- Otsuji TG, Kurose Y, Suemori H, Tada M, Nakatsuji N. Dynamic link between histone H3 acetylation and an increase in the functional characteristics of human ESC/iPSC-derived cardiomyocytes. PLoS One. 2012; 7:e45010. 10.1371/journal.pone.0045010.
- Heallen T, Morikawa Y, Leach J, Tao G, Willerson JT, Johnson RL, et al. Hippo signaling impedes adult heart regeneration. Development. 2013; 140:4683–690. 10.1242/dev.102798.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.