
Mining differential top-k co-expression patterns from time course comparative gene expression datasets



Frequent pattern mining analysis applied to microarray datasets appears to be a promising strategy for identifying relationships between gene expression levels. Unfortunately, this analysis method identifies too many itemsets (co-expressed genes), since it considers neither the importance of each gene within biological processes to a cellular response nor the temporal properties under matched biological treatment-control conditions in a microarray dataset.


We propose a method termed TIIM (Top-k Impactful Itemsets Miner), which only requires specifying a user-defined number k to explore the top k itemsets with the most significantly differentially co-expressed genes between 2 conditions in a time course. To give genes different weights, a table with impact degrees for each gene was constructed based on the number of neighboring genes that are differently expressed in the dataset within gene regulatory networks. Finally, the resulting top-k impactful itemsets were manually evaluated using previous literature and analyzed by a Gene Ontology enrichment method.


In this study, the proposed method was evaluated on 2 publicly available time course microarray datasets with 2 different experimental conditions. In both datasets, the method identified potential itemsets of co-expressed genes that were supported by the literature, and it showed higher accuracy than the 2 corresponding control methods: i) performing TIIM without considering the gene expression differentiation between 2 different experimental conditions and impact degrees, and ii) performing TIIM with a constant impact degree for each gene. Several new gene regulations involved in these itemsets may be useful for biologists and provide further insights into the mechanisms underpinning biological processes. The Java source code and other related materials used in this study are available at “”.


Identification of relationships between gene regulatory events is one of the main methods through which the biological effects of stimuli or changes in the environment are revealed. Microarrays are a highly efficient way to simultaneously measure the expression of massive numbers of genes. In these respects, multiple microarrays could be further used to quantify the expression of each gene during time course experiments. However, analysis and proper presentation of biological insights into these large-scale datasets is a big challenge.

Currently, frequent pattern-based mining analysis is widely used to identify groups of genes that are frequently co-expressed in most biological conditions in a microarray dataset. These methods include using the apriori algorithm [1], half-spaces [2], relational-based analysis [3], gene annotation integrated method [4], row enumeration-based method [5], column enumeration-based method [6], temporal-based method [7], rule induction [8], and FP-tree algorithm [9]. A gene itemset {gene x↑, gene y↓, gene z↑} states that upregulation of gene x, downregulation of gene y and upregulation of gene z frequently occur at the same time. Support is defined as the proportion of transactions in the dataset that contain the itemset. Only gene itemsets whose support values are no less than a user-set minimum support threshold are defined as frequent patterns. Therefore, a gene itemset with a greater support value could have a high probability of becoming an interactome within a biological process. Although methods for traditional frequent pattern-based mining have been successfully proposed in previously published studies, these methods give the same weight to each gene during the execution process. In other words, these methods assume that all genes have similar importance, which is often not true in actual applications. Based on these challenges, some preceding studies on utility mining [10-17] have become predominant topics for solving these problems in the field of data mining.

The utility value of an itemset is the summation of each item quantity multiplied by its matched weight/importance in the co-expression transactions. An itemset is called a high utility itemset as long as its utility value is not less than a user-specified minimum utility threshold. However, traditional methods [10-17] for mining such high utility itemsets could not ensure that the items contained in a high utility itemset individually possess high utility values, since a longer itemset containing more items would have a higher utility value than shorter itemsets. To tackle this problem, a newer algorithm for mining average utility itemsets [18, 19] was proposed, in which the utility of each discovered itemset is normalized by the number of items within the itemset. The resulting itemsets, namely high average utility itemsets, are preserved as long as their utility values are not less than a user-specified minimum average utility. To the best of our knowledge, none of the above-mentioned methods can be used to explore significantly differential itemsets between 2 different experimental conditions, e.g., biological treatment versus control, in time course datasets. How to choose those thresholds is also a big challenge: a low threshold yields too many unpromising itemsets, whereas a strict threshold yields few itemsets.

In this study, we propose a method called TIIM (Top-k Impactful Itemsets Miner) to identify the top-k impactful itemsets from time course comparative gene expression datasets. The proposed method only requires specifying a user-desired number k to explore the k most significantly differential gene itemsets between 2 experimental conditions on a microarray dataset. For each gene, the summation of frequencies at the same time point was defined as the quantity, and the number of neighboring genes that were differentially expressed in the dataset on the gene regulatory network (GRN) was defined as the impact degree, i.e., the importance of each gene. According to the quantity and impact degree, the impactful itemsets with most significant changes in gene expression can be efficiently explored. An impactful itemset considered more than just the node degrees (i.e., number of neighboring genes in the GRN) of each gene contained in the itemset. First, the quantity (transformed from the gene expression values) of each gene contained in an itemset was used as an important reference to calculate the impactful value of the itemset. Second, only the number (impact degree) of significant neighboring genes that were differently expressed between 2 comparative conditions in the microarray dataset on the GRN was calculated. Therefore, well-studied genes may not dominate others in terms of impact degree, i.e., well-studied genes may not always have higher impact degrees even if they had more neighboring genes in the GRN. Two baseline methods were considered as follows: i) performing TIIM without considering the deviations between gene expression levels of 2 experimental conditions and the impact degree, and ii) performing TIIM with a constant impact degree for each gene, and our proposed TIIM was performed on 2 real datasets from human and mouse microarrays.

The remainder of this paper is organized as follows: “Methods” provides some problem definitions and presents the proposed method. “Results and discussion” consists of the application of the approach to 2 real datasets in order to study the significance of the discovered impactful itemsets. Finally, we present conclusions based on our findings.


Methods

In this section, we first describe the TIIM (Top-k Impactful Itemsets Miner) algorithm as shown in Figure 1. Before we utilize the TIIM algorithm to discover the top-k impactful itemsets, microarray and GRN datasets must undergo a transformation process. Thereafter, the research problem is defined. Finally, Section “TIIM” shows the proposed TIIM algorithm in detail.

Figure 1

A flowchart of TIIM for discovering differential top-k impactful itemsets.

Gene expression data transformation

Conversion of gene expression into a transaction format

As with the frequent pattern-based method of gene expression data analysis proposed by Creighton and Hanash in 2003, each expression value in the dataset was transformed as up (↑; expressed; readings are greater than 0.2 for the log base 10 of fold-change 1.58 as an upper-bound), down (↓; repressed; readings are lower than −0.2 for the log base 10 of fold-change −1.58 as a lower-bound), or normal (neither expressed nor repressed). Based on most previous microarray analysis studies, the threshold value was set at a reasonable range (fold change from 1.5 to 2.0) to identify differentially expressed probes. Only the gene expression values transformed as up or down were preserved into the transaction dataset. Each time point was recognized as a transaction. After the process of transformation, gene x↑ (denoted as Gx↑) and gene x↓ (denoted as Gx↓) were defined as 2 different gene items. An example is shown in Figure 2. Suppose that we have a dataset in which a detection of 5 genes (G1 to G5) at 4 time points is performed on 6 samples (triplicate for 2 conditions). For the first condition (Condition 1) dataset, G5 in sample 1 and time point 1 (TP1) was transformed into G5↓ in transaction 1 (T1) since its expression level was less than −0.2.

Figure 2

Example of transforming gene expression data into the transaction data format.
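The thresholding step can be illustrated with a minimal Python sketch. It assumes the input is a mapping from gene name to a log base 10 ratio for one sample at one time point; the function name `to_items` and the sample values are hypothetical and are not taken from the datasets used in this study.

```python
UP, DOWN = "↑", "↓"

def to_items(log10_ratios, threshold=0.2):
    """Turn {gene: log10 ratio} into a set of gene items.

    Values above +threshold become 'gene↑', values below -threshold
    become 'gene↓', and everything else (normal) is dropped.
    """
    items = set()
    for gene, value in log10_ratios.items():
        if value > threshold:
            items.add(gene + UP)
        elif value < -threshold:
            items.add(gene + DOWN)
    return items

# Hypothetical readings for one sample at one time point.
sample = {"G1": 0.35, "G2": 0.05, "G3": -0.10, "G4": 0.21, "G5": -0.40}
items = to_items(sample)
```

Here G2 and G3 fall inside the normal range and are therefore not carried into the transaction.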

Transaction value integration step

In this step, for each gene item, the transformed values over repeated samples in the same transaction and condition were summed up as the quantity. Higher quantities represented stronger consistency and greater confidence in the gene items. Figure 3 gives an illustration of the transformed item value integration process. In the Condition 1 dataset of the above example, the quantity of G5↓ in T1 was 3.

Figure 3

Example of integrating transformed gene item values over repeated samples.
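As a rough sketch of the integration step, the quantity of a gene item can be obtained by counting in how many replicates the item appears at a given time point. The function name `integrate` and the replicate data below are illustrative assumptions; only the outcome that G5↓ reaches quantity 3 in T1 mirrors the example in the text.

```python
from collections import Counter

def integrate(replicate_items):
    """Sum item occurrences over replicates, per time point.

    replicate_items: one list per replicate, each holding one set of
    gene items per time point. Returns one Counter per transaction
    mapping gene item -> quantity.
    """
    n_time_points = len(replicate_items[0])
    transactions = [Counter() for _ in range(n_time_points)]
    for replicate in replicate_items:
        for t, items in enumerate(replicate):
            transactions[t].update(items)  # each item counts once per replicate
    return transactions

# Three hypothetical replicates, two time points each.
rep1 = [{"G5↓", "G1↑"}, {"G4↑"}]
rep2 = [{"G5↓"}, {"G4↑"}]
rep3 = [{"G5↓", "G1↑"}, set()]
transactions = integrate([rep1, rep2, rep3])
```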

Transaction value differentiation step

The purpose of this study was to discover significantly differentially expressed gene itemsets between 2 different conditions. Here, we show how quantity differences for each gene item in every transaction were calculated for the 2 conditions. An example is shown in Figure 4. In this example, the quantities of G4↑ in T4 for the 2 conditions were 3 and 1, respectively. Since G4↑ had a greater quantity in Condition 1, the quantity difference 2 was assigned to G4↑ in T4 of Condition 1, and the quantity 0 was assigned to G4↑ in T4 of Condition 2. In contrast, since the gene item G5↓ in T4 shown in Condition 2 had a greater quantity (3) than the matched gene in Condition 1 (0), the quantity difference 3 was predominant in Condition 2.

Figure 4

Example of identifying differential gene items.
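The differentiation rule can be sketched as follows. The function name `differentiate` is hypothetical, and the handling of ties (both conditions receive 0 when the quantities are equal) is an assumption, since the text does not state it explicitly. The T4 values reproduce the G4↑ and G5↓ example above.

```python
def differentiate(cond1, cond2):
    """Keep, per item and transaction, only the surplus quantity.

    cond1, cond2: lists of {gene item: quantity}, aligned by time point.
    The condition with the larger quantity receives the difference and
    the other receives 0 (assumed: ties give 0 to both).
    """
    out1, out2 = [], []
    for t1, t2 in zip(cond1, cond2):
        d1, d2 = {}, {}
        for item in set(t1) | set(t2):
            diff = t1.get(item, 0) - t2.get(item, 0)
            d1[item] = max(diff, 0)
            d2[item] = max(-diff, 0)
        out1.append(d1)
        out2.append(d2)
    return out1, out2

# Transaction T4 from the example in the text.
cond1_t4 = {"G4↑": 3}
cond2_t4 = {"G4↑": 1, "G5↓": 3}
diff1, diff2 = differentiate([cond1_t4], [cond2_t4])
```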

Building the impact degree table

To generate a table with a weight for each gene, we calculated the number (impact degree) of its neighboring genes in the GRN that were significantly differentially expressed in the microarray dataset. In Figure 5A, each node within the GRN represents a gene. If there is biological regulation between 2 genes, they are linked together. Student’s t-tests were performed to compare the expression of each gene between the 2 conditions at each time point. A gene was defined as a significant gene if it exhibited a significant change in expression (p-value < 0.05) between the 2 conditions at any time point. In Figure 5A, 4 significant genes, i.e., G1, G3, G4, and G5, are shown; G2 was not a significant gene and is therefore presented as a dotted node. According to the aforementioned definition, G1 had 3 significant neighboring genes (G3, G4 and G5) in the GRN, and therefore the impact degree of G1 was defined as 3 in the impact degree table of this dataset (Figure 5B). In this study, the impact degree of Gx represented the impact degree of both Gx↑ and Gx↓ gene items.

Figure 5

Generation of an impact degree table.
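A minimal sketch of the impact degree computation is given below. It takes the set of significant genes as given (the t-tests are not reproduced here) and treats the GRN as undirected, which is an assumption. The edge list is hypothetical: Figure 5A is not available in the text, so the edges were chosen only to be consistent with the stated impact degrees of G1 (3) and G5 (1).

```python
def impact_degrees(edges, significant):
    """Count, for every gene in the network, its significant neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {gene: len(nbrs & significant) for gene, nbrs in neighbors.items()}

# Hypothetical star-shaped network centered on G1.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5")]
significant = {"G1", "G3", "G4", "G5"}  # G2 is not significant
degrees = impact_degrees(edges, significant)
```

Although G1 has 4 neighbors in this toy network, only the 3 significant ones contribute to its impact degree.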

Basic definitions

Given a finite set of gene items I = {i_1, i_2, …, i_m}, each gene item i_x (1 ≤ x ≤ m) has a unique impact degree d(i_x). A gene itemset S is a set of l distinct gene items, namely an l-itemset; l is the length of S, denoted as l_S. A gene transaction database is defined as D = {T_1, T_2, …, T_n}. Each gene item i_x in the transaction T_y (1 ≤ y ≤ n) is associated with a unique quantity q(i_x, T_y).

Definition 1

The impactful value of a gene item i_x in D is denoted as i(i_x) and defined as shown in Formula 1. For example, according to the gene transaction table of Condition 2 in Figure 4 and the impact degree table in Figure 5B, the impactful value of gene item G5↑ is i(G5↑) = (d(G5↑) × q(G5↑, T1)) + (d(G5↑) × q(G5↑, T2)) = 1 × 1 + 1 × 2 = 3.

$$
i(i_x) = \sum_{i_x \in T_y \,\wedge\, T_y \in D} i(i_x, T_y) = \sum_{i_x \in T_y \,\wedge\, T_y \in D} d(i_x) \times q(i_x, T_y) \qquad (1)
$$
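Formula 1 can be checked numerically with a short sketch. The transaction table below is only a partial, assumed reconstruction of Condition 2: it contains just the quantities quoted in the text, because Figure 4 itself is not available. The function name `item_impact` is hypothetical.

```python
def item_impact(item, transactions, degree):
    """Impactful value of one gene item: sum of degree x quantity."""
    gene = item[:-1]  # strip the trailing arrow to look up the gene's degree
    return sum(degree[gene] * t.get(item, 0) for t in transactions)

# Partial Condition 2 table (T1..T4), limited to values quoted in the text.
condition2 = [
    {"G1↑": 2, "G5↑": 1},  # T1
    {"G5↑": 2},            # T2
    {},                    # T3 (contents not given in the text)
    {"G5↓": 3},            # T4
]
degree = {"G1": 3, "G5": 1}
value = item_impact("G5↑", condition2, degree)
```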

Definition 2

The impactful value of a gene itemset S in D is denoted as i(S) and defined as shown in Formula 2. The algorithm computes the impactful value of S and only considers transactions that contain S. For example, according to the gene transaction table of Condition 2 in Figure 4 and the impact degree table in Figure 5B, the impactful value of gene itemset {G1↑, G5↑} is i({G1↑, G5↑}) = (1 / l_S) × (d(G1↑) × q(G1↑, T1) + d(G5↑) × q(G5↑, T1)) = (1 / 2) × (3 × 2 + 1 × 1) = 3.5.

$$
i(S) = \sum_{S \subseteq T_y \,\wedge\, T_y \in D} i(S, T_y) = \frac{1}{l_S} \times \sum_{i_x \in S \,\wedge\, S \subseteq T_y \,\wedge\, T_y \in D} d(i_x) \times q(i_x, T_y) \qquad (2)
$$
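A small sketch of Formula 2 follows, again using a partial, assumed reconstruction of the Condition 2 table that holds only the quantities quoted in the text. The function name `itemset_impact` is hypothetical.

```python
def itemset_impact(itemset, transactions, degree):
    """Average impactful value of an itemset over transactions containing it."""
    total = 0
    for t in transactions:
        # Only transactions in which every item has a positive quantity count.
        if all(t.get(item, 0) > 0 for item in itemset):
            total += sum(degree[item[:-1]] * t[item] for item in itemset)
    return total / len(itemset)

condition2 = [
    {"G1↑": 2, "G5↑": 1},  # T1
    {"G5↑": 2},            # T2: lacks G1↑, so it is skipped for {G1↑, G5↑}
    {},                    # T3 (contents not given in the text)
    {"G5↓": 3},            # T4
]
degree = {"G1": 3, "G5": 1}
value = itemset_impact({"G1↑", "G5↑"}, condition2, degree)
```

Dividing by the itemset length is what keeps longer itemsets from scoring higher merely because they contain more items.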

Definition 3

The top-k is the user-defined number of impactful gene itemsets. A gene itemset S is a top-k impactful itemset if l_S is greater than 1 and there are no more than k − 1 gene itemsets whose impactful values are greater than that of S.
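Read literally, Definition 3 admits ties: every itemset whose value is at least the k-th largest value qualifies. The sketch below follows that reading over a precomputed table of impactful values; the function name `top_k` and the values are hypothetical, and this brute-force selection is not the pruned search performed by TIIM.

```python
import heapq

def top_k(values, k):
    """Select top-k impactful itemsets from {frozenset: impactful value}."""
    candidates = {s: v for s, v in values.items() if len(s) > 1}
    if not candidates:
        return []
    kth_value = heapq.nlargest(k, candidates.values())[-1]
    selected = [(s, v) for s, v in candidates.items() if v >= kth_value]
    return sorted(selected, key=lambda pair: -pair[1])

# Hypothetical impactful values.
values = {
    frozenset({"A↑"}): 9.0,               # single item: excluded by Definition 3
    frozenset({"A↑", "B↓"}): 5.0,
    frozenset({"A↑", "C↑"}): 3.5,
    frozenset({"B↓", "C↑"}): 3.5,
    frozenset({"A↑", "B↓", "C↑"}): 2.0,
}
result = top_k(values, 2)
```

With k = 2, the two itemsets tied at 3.5 are both returned, so the list holds 3 itemsets.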

Definition 4

The appearance pattern of a gene item i_x is a vector A_x = {e_1, e_2, …, e_n} that records the presence or absence of i_x in each transaction T_y (1 ≤ y ≤ n) in D. The element e_y (1 ≤ y ≤ n) in A_x is recorded as 1 when q(i_x, T_y) > 0; otherwise, it is recorded as 0.
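Definition 4 translates directly into a short sketch. The transaction table is a partial, assumed reconstruction of Condition 2 containing only the quantities quoted earlier in the text, and the function name `appearance_pattern` is hypothetical.

```python
def appearance_pattern(item, transactions):
    """Binary vector: 1 where the item has a positive quantity, else 0."""
    return tuple(1 if t.get(item, 0) > 0 else 0 for t in transactions)

condition2 = [
    {"G1↑": 2, "G5↑": 1},  # T1
    {"G5↑": 2},            # T2
    {},                    # T3 (contents not given in the text)
    {"G5↓": 3},            # T4
]
pattern_g5_up = appearance_pattern("G5↑", condition2)
pattern_g1_up = appearance_pattern("G1↑", condition2)
```

Returning a tuple makes the pattern hashable, so items can later be grouped by identical patterns.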

Property 1

If the impactful value of a gene l-itemset S is greater than the smallest impactful value among the top-k impactful itemsets, then at least one gene (l−1)-itemset contained in S is within the top-k impactful itemsets.
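One way to see why Property 1 is plausible is the following sketch. It assumes that all impact degrees and quantities are non-negative, as the transformation steps imply; the symbols $u_x$, $x^*$ and $S'$ are introduced here only for the argument and do not appear in the original definitions.

$$
\begin{aligned}
u_x &= \sum_{S \subseteq T_y \,\wedge\, T_y \in D} d(i_x) \times q(i_x, T_y), \qquad i(S) = \frac{1}{l_S} \sum_{i_x \in S} u_x, \\
x^* &= \arg\min_{x:\, i_x \in S} u_x, \qquad S' = S \setminus \{ i_{x^*} \}, \\
i(S') &\ge \frac{1}{l_S - 1} \sum_{i_x \in S'} u_x \;\ge\; \frac{1}{l_S} \sum_{i_x \in S} u_x = i(S).
\end{aligned}
$$

The first inequality holds because every transaction containing S also contains S', and any additional transactions contribute non-negative terms. The second holds because removing the smallest term from a mean cannot decrease it. Hence some (l−1)-subset of S has an impactful value at least as large as that of S. For l = 2 the subset is a single item, which Definition 3 excludes from the top-k list, so the argument applies most directly when l ≥ 3.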


TIIM

In this study, we propose the TIIM algorithm to identify impactful gene co-expression patterns from gene expression datasets. The main process of the algorithm is described as follows:

After the above processes, the algorithm can generate the top-k impactful itemsets. The TIIM contains 3 subroutines: checking 2-itemset in each cluster, checking 2-itemset between clusters, and checking l-itemsets functions.

In the “checking 2-itemset in each cluster” function, all of the generated gene 2-itemsets are contained in the same transactions. In other words, during the impactful value computation, it is not necessary to verify which transactions contain a generated gene 2-itemset. In addition, because the gene items within a cluster are sorted, each gene item is examined from top to bottom to determine whether its impactful value is larger than that of any of the current top-k impactful itemsets. Steps 10 to 13 eliminate the redundant tests, which saves considerable time.
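The shortcut described above can be illustrated under one assumption that the text does not state outright: that a cluster groups gene items with identical appearance patterns. If two items occur in exactly the same transactions, the impactful value of their 2-itemset is simply the mean of their individual impactful values, so no containment check is needed. The function names and data below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cluster_by_pattern(transactions):
    """Group gene items that share an identical appearance pattern."""
    items = {i for t in transactions for i, q in t.items() if q > 0}
    clusters = defaultdict(list)
    for item in sorted(items):
        pattern = tuple(1 if t.get(item, 0) > 0 else 0 for t in transactions)
        clusters[pattern].append(item)
    return clusters

def in_cluster_pairs(transactions, degree):
    """Impactful values of 2-itemsets formed inside each cluster."""
    results = {}
    for members in cluster_by_pattern(transactions).values():
        impact = {m: sum(degree[m[:-1]] * t.get(m, 0) for t in transactions)
                  for m in members}
        for a, b in combinations(members, 2):
            # Same transactions for both items: no containment test required.
            results[frozenset((a, b))] = (impact[a] + impact[b]) / 2
    return results

# Hypothetical differential transactions and impact degrees.
transactions = [
    {"G1↑": 2, "G3↓": 1, "G5↑": 1},
    {"G5↑": 2},
    {"G1↑": 1, "G3↓": 3},
]
degree = {"G1": 3, "G3": 1, "G5": 1}
pairs = in_cluster_pairs(transactions, degree)
```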

In the above function, according to Property 1, we only verify the impactful value of a gene l-itemset S that is formed by combining a gene (l−1)-itemset S_v from the (l−1) top-k list with a new item i_u (steps 4 to 7).
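The candidate generation implied by Property 1 can be sketched as a level-wise extension: each retained (l−1)-itemset is combined with one new item, and only those candidates are scored. This is a simplified illustration rather than the authors' procedure, whose step listing is not reproduced in this text; the function names, the cut-off rule, and the data are assumptions.

```python
def itemset_impact(itemset, transactions, degree):
    """Average impactful value over transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(t.get(i, 0) > 0 for i in itemset):
            total += sum(degree[i[:-1]] * t[i] for i in itemset)
    return total / len(itemset)

def extend_level(prev_level, items, transactions, degree, k):
    """Score l-itemsets built from retained (l-1)-itemsets plus one item."""
    candidates = {}
    for subset in prev_level:
        for item in items:
            if item in subset:
                continue
            candidate = subset | {item}
            if candidate not in candidates:  # avoid rescoring duplicates
                candidates[candidate] = itemset_impact(
                    candidate, transactions, degree)
    ranked = sorted(candidates.items(), key=lambda pair: -pair[1])
    return [(s, v) for s, v in ranked if v > 0][:k]

# Hypothetical differential transactions and impact degrees.
transactions = [
    {"G1↑": 2, "G3↓": 1, "G5↑": 1},
    {"G5↑": 2},
    {"G1↑": 1, "G3↓": 3},
]
degree = {"G1": 3, "G3": 1, "G5": 1}
prev_level = [frozenset({"G1↑", "G3↓"})]
level3 = extend_level(prev_level, ["G1↑", "G3↓", "G5↑"],
                      transactions, degree, k=5)
```

Itemsets that are not supersets of a retained (l−1)-itemset are never scored, which is where the saving comes from.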

As stated in section “TIIM”, the TIIM algorithm is more efficient since the verification of ineligible gene itemsets is not required.

Results and discussion

To evaluate the performance of our proposed method, we compared it with 2 control methods as baselines. With respect to computational design, the meaning of our TIIM-derived patterns was different from that of traditional frequent pattern mining algorithms. Traditional frequent patterns cannot be transformed into differential patterns through any post-processing. For each individual experimental condition, the co-expressed genes could be discovered by traditional frequent pattern mining algorithms, but TIIM was proposed to identify the differential co-expression of genes between 2 comparative conditions (e.g., wild-type and mutant samples) during a time period. Therefore, it may not be proper to compare these results directly in this study. On the other hand, technically, there were 2 ways to enforce such comparisons regardless of the meaning of the patterns: i) comparing patterns identified by different methods after tuning their optimal parameter values, or ii) using the same parameter values among the comparative methods. The former is hard to perform because the patterns differ in meaning, as stated above. The latter is also not feasible since there were no common parameters between our proposed TIIM and traditional frequent pattern mining methods. In spite of these limitations, we designed 2 additional control methods, termed “Undifferentiation” and “Constant degree”. The former was similar to traditional frequent pattern algorithms and did not consider the “Transaction value differentiation step” shown in Figure 4 and the impact degree shown in Figure 5. The latter assigned a constant degree of “1” to each gene that had a non-zero impact degree in the impact degree table.

In the first section below (“Dataset”), we provide a brief introduction to the gene regulatory data and 2 gene expression datasets for humans and mice. In the second section, titled “Evaluation with literature”, we present the evaluation results in which the identified top-50 impactful itemsets were manually evaluated using a survey of biological literature. Finally, in the section “GO enrichment analysis”, we made an attempt to analyze the biological characteristics of interesting genes derived from the top 50 to 200 gene itemsets to show that the itemset-contained genes correlated very well with the data from the original microarray experimental designs.


Dataset

Gene regulatory data from humans and mice were downloaded from the BioGRID [20] and KEGG [21] databases. The GRN of humans comprised 434 genes that interact with one another via 525 transcriptional regulation interactions. The GRN of mice consisted of 297 genes that interact with one another via 372 transcriptional regulation interactions.

We experimented with the proposed TIIM on 2 large-scale time course microarray datasets used in past studies. For the first dataset, Yoshizuka et al. investigated the key endogenous gene expression profiles of cell cycle arrest in response to a long period of human immunodeficiency virus type 1 (HIV-1) Vpr overexpression [22] using a human gene expression microarray. They compared the expression patterns of 21,794 genes in wild-type Vpr-expressing cells with the expression patterns of the same genes in mutant F72A/R73A-Vpr- or R80A-Vpr-expressing cells over 9 time points, including 0, 1, 2, 4, 6, 8, 12, 16, and 24 hours in 5 duplicate samples. For the second dataset, Sciuto et al. used a mouse model to design a genomic approach to observe genetic alterations involved in the process of reduction-oxidation in murine pulmonary tissues in response to exposure to carbonyl chloride (phosgene) [23]. Forty Crl:CD-1 (ICR) BR mice were exposed (whole-body) to either air or a concentration × time (c × t) amount of 32 mg/m³ (8 ppm) phosgene for 20 min (640 mg × min/m³). Lung tissue was collected from air- or phosgene-exposed mice at 0.5, 1, 4, 8, 12, 24, 48, and 72 hours post-exposure. Both microarray datasets can be readily retrieved from the Gene Expression Omnibus (GEO) database with accession numbers GSE2296 (human) and GSE2565 (mouse).

Evaluation with literature

An increasing number of studies on gene regulatory events have been conducted in response to high genetic associations relevant to most biological outcomes. According to the main focus of our method, since the genes involved in each itemset have a high probability of regulating or interacting with each other, the explored gene itemsets were then disassembled into a length of 2 as relationships in order to verify such regulation or interaction with the literature. In biology, genes involved in these relationships have some biological regulations/interactions that may occur through transcriptional regulation, post-transcriptional RNA processing or post-translational modification. In this regard, a big problem arises as to how many relationships are top priorities to be evaluated. Too many relationships would increase the difficulty of the evaluation process. Therefore, we manually scrutinized and validated numerous relationships between gene regulatory events derived from the top-50 impactful itemsets of the 2 individual datasets taken from the literature. For example, in Additional file 1: Table S6, a human dataset-derived gene itemset {BAX (581, 1181007_1) Up, KAT2B (8850, 1188483_1) Down, and TP53 (7157, 1193761_1) Down} in a length of 3 with an impactful value of 43.33 could be disassembled into 3 relationships of possible biological regulations/interactions, including {BAX (581) Up and KAT2B (8850) Down}, {BAX (581) Up, and TP53 (7157) Down} and {KAT2B (8850) Down and TP53 (7157) Down}. If a disassembled relationship has been reported in previous literature, the serial number of the corresponding paper is shown in the reference column. In contrast, serial numbers with an asterisk represent an opposite association between the relationship and evidence from the literature. In the current example, Zhao et al. demonstrated that depsipeptide caused little or no changes in the binding of human KAT2B protein and human TP53 protein in A549 cells [24]. 
However, opposite directions of gene expression for BAX and TP53 were not observed in previous studies, which have reported the following: human TP53 protein is necessary for activation of human BAX protein, which can be upregulated by human CCND1 protein in MCF7 cells [25]; human TP53 protein increases the expression of human BAX mRNA in Igrov1 cells [26]; and an S121F mutant form of human TP53 protein increases the transcription of the human BAX gene in Saos 2 cells [27]. In addition to these well-identified gene regulatory events, to the best of our knowledge, the regulations/interactions between human BAX and KAT2B have not yet been reported. Statistics for all of the evaluation results for human and mouse datasets are shown in Tables 1 and 2, respectively. Our proposed method, which considered the gene degrees in the GRNs, appeared to have dramatically higher accuracy compared to the 2 corresponding control methods in both datasets. In this regard, the disassembled relationships identified by our proposed method are more likely to reflect the actual biology of the original experimental design. For example, in Table 1, the 70 new gene relationships discovered by our method may play dominant functional roles in the process of cell cycle arrest in response to overexpression of mutant Vpr (F72A/R73A), whereas 53 new gene relationships were significantly expressed under baseline conditions (i.e., wild-type Vpr). For the mouse dataset shown in Table 2, 39 gene relationships were intimately associated with certain redox pathways induced by exposure to phosgene. However, well-studied genes do not necessarily dominate the high-utility itemsets merely because of their high neighbor count. We take the well-documented EP300 as an example; although the gene has the second highest degree number (18) in the raw GRN, it was not found in any discovered top-50 itemsets of the human dataset via our proposed TIIM algorithm. 
Taken together, our data demonstrated that using the proposed method to discover impactful itemsets allowed a great many new potential gene relationships to be efficiently identified. Through a literature evaluation process, these rules also showed higher accuracies compared to the 2 matched control methods. Therefore, the newly identified gene relationships may be valuable for biologists in terms of providing further insights into the mechanisms of time-dependent changes in gene expression.
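The disassembly of an itemset into pairwise relationships, as used in the evaluation above, amounts to enumerating all 2-combinations of its items. The sketch below uses the BAX/KAT2B/TP53 itemset from the text with simplified labels; the function name `disassemble` is hypothetical.

```python
from itertools import combinations

def disassemble(itemset):
    """Split an itemset into all relationships of length 2."""
    return list(combinations(sorted(itemset), 2))

itemset = {"BAX Up", "KAT2B Down", "TP53 Down"}
relationships = disassemble(itemset)
```

An itemset of length l yields l × (l − 1) / 2 relationships, so the 3-itemset above produces the 3 pairs listed in the text.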

Table 1 Evaluation of disassembled relations in human datasets from the literature
Table 2 Evaluation of disassembled relations in mouse datasets from the literature

GO enrichment analysis

In the previous section, although the identified gene relationships were manually evaluated with the literature, proving that the unreported/unpublished relationships derived from different top-k itemsets are reliable remained a challenge. In this section, we attempted to analyze all of the gene relationships shown in Tables 3 and 4 derived from the top 50 to 200 itemsets of each comparison for both species.

Table 3 Statistics for different impactful itemsets in human datasets
Table 4 Statistics for different impactful itemsets in mouse datasets

Gene Ontology (GO) is useful for analyzing the biological characteristics of a set of genes, including biological processes, cellular components, and molecular functions [28]. To test the enrichment of cell cycle-related terms for the genes identified in the human dataset and the redox-related terms for the genes identified in the mouse dataset, all of the lists of gene relationships were separately uploaded to the DAVID bioinformatics analysis tool [29] and analyzed with a web-based functional annotation tool. For each uploaded gene list, we selected and examined the p-value of the terms associated with the original experimental results for all items categorized as “GOTERM_BP_4”. To make a statistically significant comparison for each GO term in the different experimental conditions, we only focused on the GO terms that had at least one p-value less than 0.05. The original paper had already demonstrated that the human cell cycle could be altered by the HIV-1 Vpr protein. Hence, we tested whether our identified results were associated with the cell cycle. Genes involved in top-k gene itemsets that were identified by considering various gene degrees had overall dramatically high cell cycle enrichments compared to the matched control experiments with 2 baseline methods in both comparisons including wild-type versus F72A/R73A and wild-type versus R80A (Figures 6 and 7). For the mouse dataset, the experiment results of the original paper indicated that redox pathways played functional roles in response to exposure to the phosgene. However, we could not obtain any of the eligible GO terms. 
This may have been due to the following factors: (i) insufficient gene regulation coverage in the GRN, since additional gene regulatory events have since been identified and deposited in the online databases; (ii) probe reading noise, which is inherent to the microarray-based measurement of gene expression [30]; and (iii) the fact that GO term enrichment analysis is more suitable for analyzing a set of distinct genes than the relationships between certain genes. In spite of these limitations, we still showed that our identified gene relationships exhibited good performance in both datasets via the literature survey illustrated in the previous section. The proposed method could also be applied to other topics of interest.

Figure 6

GO enrichment analysis of wild-type and F72A/R73A mutant Vpr protein for the human dataset. GO1: GO:0006915 ~ apoptosis; GO2: GO:0043066 ~ negative regulation of apoptosis; GO3: GO:0042127 ~ regulation of cell proliferation; GO4: GO:0008284 ~ positive regulation of cell proliferation; GO5: GO:0007050 ~ cell cycle arrest; GO6: GO:0007346 ~ regulation of mitotic cell cycle; GO7: GO:0051726 ~ regulation of cell cycle.

Figure 7

GO enrichment analysis of wild-type and R80A mutant Vpr protein for the human dataset. GO1: GO:0043066 ~ negative regulation of apoptosis; GO2: GO:0006915 ~ apoptosis; GO3: GO:0042127 ~ regulation of cell proliferation; GO4: GO:0008285 ~ negative regulation of cell proliferation; GO5: GO:0008284 ~ positive regulation of cell proliferation; GO6: GO:0007050 ~ cell cycle arrest; GO7: GO:0045786 ~ negative regulation of cell cycle; GO8: GO:0007346 ~ regulation of mitotic cell cycle; GO9: GO:0051726 ~ regulation of cell cycle.


Conclusions

In this study, we proposed the TIIM algorithm to discover top-k impactful itemsets with stronger meanings in biology from 2 gene expression datasets to address the flaws of previous frequent pattern analysis methods. Our method adopted a top-k strategy by introducing a user-selected k to avoid producing redundant insignificant itemsets (below the top-k itemset). During the evaluation process, the gene relationships derived from the top-50 gene itemsets were manually verified with previous literature. This yielded higher accuracies in both microarray datasets compared to the 2 baseline methods. Moreover, GO term enrichment analysis also showed that our identified genes correlated very well with the original literature identifications. These good performance results may be attributed to the consideration of the number of differential neighboring genes in GRNs that could be easily retrieved from the inherent biological involvement of each array-involved gene. Our proposed method is therefore an effective means to provide biologists with further insights into the relationships of gene regulatory events and interactions in certain biological processes.


  1. 1.

    Creighton C, Hanash S: Mining gene expression databases for association rules. Bioinformatics. 2003, 19 (1): 79-86. 10.1093/bioinformatics/19.1.79.

    Article  CAS  PubMed  Google Scholar 

  2. 2.

    Georgii E, Richter L, Ruckert U, Kramer S: Analyzing microarray data using quantitative association rules. Bioinformatics. 2005, 21 (Suppl 2): ii123-ii129. 10.1093/bioinformatics/bti1121.

    Article  CAS  PubMed  Google Scholar 

  3. 3.

    Liu YC, Cheng CP, Tseng VS: Discovering relational-based association rules with multiple minimum supports on microarray datasets. Bioinformatics. 2011, 27 (22): 3142-3148. 10.1093/bioinformatics/btr526.

    Article  CAS  PubMed  Google Scholar 

  4. 4.

    Martinez R, Pasquier N, Pasquier C: GenMiner: mining non-redundant association rules from integrated gene expression data and annotations. Bioinformatics. 2008, 24 (22): 2643-2644. 10.1093/bioinformatics/btn490.

    Article  CAS  PubMed  Google Scholar 

  5. 5.

    McIntosh T, Chawla S: High confidence rule mining for microarray analysis. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2007, 4 (4): 611-623.

    Article  CAS  PubMed  Google Scholar 

  6. 6.

    Carmona-Saez P, Chagoyen M, Rodriguez A, Trelles O, Carazo JM, Pascual-Montano A: Integrated analysis of gene expression by Association Rules Discovery. BMC Bioinforma. 2006, 7: 54-10.1186/1471-2105-7-54.

    Article  Google Scholar 

  7. 7.

    Nam H, Lee K, Lee D: Identification of temporal association rules from time-series microarray data sets. BMC Bioinforma. 2009, 10 (3): S6-

    Article  Google Scholar 

  8. 8.

    Tran DH, Satou K, Ho TB: Finding microRNA regulatory modules in human genome using rule induction. BMC Bioinforma. 2008, 9 (12): S5-

    Article  Google Scholar 

  9. 9.

    Chen Q, Chen YP: Mining frequent patterns for AMP-activated protein kinase regulation on skeletal muscle. BMC Bioinforma. 2006, 7: 394-10.1186/1471-2105-7-394.

    Article  Google Scholar 

  10. 10.

    Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K: Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans on Knowl and Data Eng. 2009, 21 (12): 1708-1721.

    Article  Google Scholar 

  11. 11.

    Chan R, Yang Q, Shen Y-D: Mining high utility itemsets. Proceedings of the Third IEEE International Conference on Data Mining. 952150. 2003, Melbourne, Florida, USA: IEEE Computer Society, 19-26.

  12.

    Erwin A, Gopalan RP, Achuthan NR: Efficient mining of high utility itemsets from large datasets. Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining. 2008, Osaka, Japan: Springer-Verlag, 554-561.

  13.

    Li Y-C, Yeh J-S, Chang C-C: Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng. 2008, 64 (1): 198-217. 10.1016/j.datak.2007.06.009.

  14.

    Liu Y, Liao W-k, Choudhary A: A fast high utility itemsets mining algorithm. Proceedings of the 1st international workshop on Utility-based data mining. 2005, Chicago, Illinois: ACM, 90-99.

  15.

    Tseng VS, Wu C-W, Shie B-E, Yu PS: UP-Growth: an efficient algorithm for high utility itemset mining. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 2010, Washington, DC, USA: ACM, 253-262.

  16.

    Yao H, Hamilton HJ, Geng L: A unified framework for utility-based measures for mining itemsets. Second International Workshop on Utility-Based Data Mining. 2006, Philadelphia, PA, USA: ACM, 28-37.

  17.

    Yen S-J, Lee Y-S: Mining high utility quantitative association rules. DaWaK. Edited by: Song IY, Eder J, Nguyen TM. 2007, Springer, 283-292.

  18.

    Hong T-P, Lee C-H, Wang S-L: Mining high average-utility itemsets. Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics. 2009, San Antonio, TX, USA: IEEE Press, 2526-2530.

  19.

    Lin C-W, Hong T-P, Lu W-H: Efficiently mining high average utility itemsets with a tree structure. Proceedings of the Second international conference on Intelligent information and database systems: Part I. 2010, Hue, Vietnam: Springer-Verlag, 131-139.

  20.

    Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34 (Database issue): D535-D539. 10.1093/nar/gkj109.

  21.

    Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27 (1): 29-34. 10.1093/nar/27.1.29.

  22.

    Yoshizuka N, Yoshizuka-Chadani Y, Krishnan V, Zeichner SL: Human immunodeficiency virus type 1 Vpr-dependent cell cycle arrest through a mitogen-activated protein kinase signal transduction pathway. J Virol. 2005, 79 (17): 11366-11381. 10.1128/JVI.79.17.11366-11381.2005.

  23.

    Sciuto AM, Phillips CS, Orzolek LD, Hege AI, Moran TS, Dillman JF: Genomic analysis of murine pulmonary tissue following carbonyl chloride inhalation. Chem Res Toxicol. 2005, 18 (11): 1654-1660. 10.1021/tx050126f.

  24.

    Zhao Y, Lu S, Wu L, Chai G, Wang H, Chen Y, Sun J, Yu Y, Zhou W, Zheng Q, et al: Acetylation of p53 at lysine 373/382 by the histone deacetylase inhibitor depsipeptide induces expression of p21(Waf1/Cip1). Mol Cell Biol. 2006, 26 (7): 2782-2790. 10.1128/MCB.26.7.2782-2790.2006.

  25.

    Pratt MA, Niu MY: Bcl-2 controls caspase activation following a p53-dependent cyclin D1-induced death signal. J Biol Chem. 2003, 278 (16): 14219-14229. 10.1074/jbc.M209650200.

  26.

    Perego P, Giarola M, Righetti SC, Supino R, Caserini C, Delia D, Pierotti MA, Miyashita T, Reed JC, Zunino F: Association between cisplatin resistance and mutation of p53 gene and reduced bax expression in ovarian carcinoma cell systems. Cancer Res. 1996, 56 (3): 556-562.

  27.

    Menendez D, Inga A, Resnick MA: The biological impact of the human master regulator p53 can be altered by mutations that change the spectrum and expression of its target genes. Mol Cell Biol. 2006, 26 (6): 2297-2308. 10.1128/MCB.26.6.2297-2308.2006.

  28.

    Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29.

  29.

    Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003, 4 (5): P3. 10.1186/gb-2003-4-5-p3.

  30.

    Febbo PG, Kantoff PW: Noise and bias in microarray analysis of tumor specimens. J Clin Oncol. 2006, 24 (23): 3719-3721. 10.1200/JCO.2006.06.7942.



This research was partially supported by the National Science Council of Taiwan under grant no. NSC 100-2627-B-006-020 and the Top University Program by the Ministry of Education of Taiwan. We also thank Lukas Horak for helping to improve the writing of this paper.

Author information



Corresponding author

Correspondence to Vincent S Tseng.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

YCL and CPC wrote the paper. YCL developed the software and conducted the original experiments. YCL and CPC conceived and designed the experiments. YCL and CPC analyzed the experimental results. VST supervised the whole study. YCL, CPC, and VST read and approved the final manuscript.

Yu-Cheng Liu and Chun-Pei Cheng contributed equally to this work.


About this article

Cite this article

Liu, Y., Cheng, C. & Tseng, V.S. Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinformatics 14, 230 (2013).



  • Gene Regulatory Network
  • Microarray Dataset
  • Gene Item
  • Gene Expression Dataset
  • Phosgene