Extracting biologically significant patterns from short time series gene expression data
- Alain B Tchagang^{1},
- Kevin V Bui^{2},
- Thomas McGinnis^{1} and
- Panayiotis V Benos^{1}Email author
https://doi.org/10.1186/1471-2105-10-255
© Tchagang et al; licensee BioMed Central Ltd. 2009
Received: 27 January 2009
Accepted: 20 August 2009
Published: 20 August 2009
Abstract
Background
Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most of the time series data available today consist of few time points only, thus making the application of standard clustering techniques difficult.
Results
We developed two new algorithms that are capable of extracting biological patterns from short time point series gene expression data. The two algorithms, ASTRO and MiMeSR, are inspired by the rank order preserving framework and the minimum mean squared residue approach, respectively. However, ASTRO and MiMeSR differ from previous approaches in that they take advantage of the relatively few number of time points in order to reduce the problem from NP-hard to linear. Tested on well-defined short time expression data, we found that our approaches are robust to noise, as well as to random patterns, and that they can correctly detect the temporal expression profile of relevant functional categories. Evaluation of our methods was performed using Gene Ontology (GO) annotations and chromatin immunoprecipitation (ChIP-chip) data.
Conclusion
Our approaches generally outperform both standard clustering algorithms and algorithms designed specifically for clustering of short time series gene expression data. Both algorithms are available at http://www.benoslab.pitt.edu/astro/.
Background
Time series experiments have been widely used to study the dynamic behavior of the cells in a variety of biological processes, including cell proliferation [1], development [2], and response to extracellular stimuli [3, 4]. Time series data can be broadly divided into two classes: the short-time series with few sampled time points (typically 3–8) and long-time series with more than 10 time points sampled. Most algorithms used to analyze time series datasets initially were based on general clustering methods like hierarchical clustering [5], k-means [6], Bayesian networks [7], and self-organizing maps [8]. Although these methods are capable of revealing some biological features, they are not taking into consideration the sequential nature of the time series data. More recently, some groups suggested methodologies specifically designed for clustering time series expression data, including the use of continuous representation of expression profiles [9], hidden Markov models [10], and others [11–14]. However, algorithms such as those developed by Bar-Joseph et al. [9], De Hoon et al. [12] and Peddada et al. [13] perform better on long time series datasets where the statistical power is higher. For short time series data, which represent about 80% of the time series gene expression datasets [15], they are expected to perform less optimal due to data overfitting caused by the small number of sampled time points.
In order to avoid that, some researchers have suggested the use of predefined patterns of expression profiles (either taken directly from the data or from prior biological observations) and matching the observed data to these profiles using some cost function [15–18]. Such approaches usually identify a large number of patterns, but many of them may arise randomly from noise due to the small number of sampled time points. The algorithm proposed by Ernst et al. [15] is capable of partially correcting for this problem with the implementation of heuristics: the user is required to select a set of potential profiles that are expected to represent better the real biological nature of such data. Last but not least, almost all of the approaches mentioned above use a cost function followed by a greedy algorithm to find clusters. As we will show later, such approaches may miss some biologically significant characteristics of the data.
In this paper, we present two new algorithms, ASTRO and MiMeSR, respectively, which are specifically designed to identify biologically relevant clusters of genes from short time series data. ASTRO and MiMeSR are inspired by the order preserving framework and the minimum mean squared residue approach, respectively. Other algorithms have used the same principles in the past, but in the biclustering context [19–21], which makes such algorithms NP hard [21]. We demonstrate the utility of ASTRO and MiMeSR using several well-defined short time datasets. We show that our approaches are robust to noise and random patterns and they can correctly detect the temporal expression profile of relevant functional categories in linear time. Comparative analysis also showed that our approaches outperform both general clustering algorithms and algorithms designed specifically for short time series gene expression data.
Results and Discussion
Robustness to noise
Application on Saccharomyces cerevisiae amino acid starvation dataset
We tested the ability of ASTRO and MiMeSR to identify biologically relevant clusters from short time series data using the yeast amino acid (AA) starvation dataset [3]. Saccharomyces cerevisiae response to stress by AA starvation is measured at time points 0.5 h, 1 h, 2 h, 4 h, and 6 h and at the control (unstimulated) cells (time point 0 h). The data was filtered to remove genes with missing values and genes whose expression level did not change substantially between time points, filtering threshold ε < 2.0 for ASTRO and MiMeSR. The results show that both our approaches can correctly identify the temporal profiles of relevant functional groups. Statistical evaluation of our clusters was performed using external datasets, like the GO categories [23] and AA starvation ChIP-chip data [3, 24]. Compared to general clustering algorithms (e.g., k-means) and algorithms designed specifically for clustering short time series gene expression data [17, 25], our techniques were able to detect more significant patterns.
Evaluation using GO annotations
Evaluation of the clusters identified by ASTRO using Gene Ontology data
ASTRO (OPSM) | ||||
---|---|---|---|---|
Clusters | No. of genes | Top GO term | # of genes in category | p -value |
A1 | 70 | Cellular biosynthetic process | 41 (58.6%) | 7.0e-16 |
Biosynthetic process | 42 (60.0%) | 2.5e-13 | ||
B1 | 86 | Translation | 55 (64.0%) | 1.5e-33 |
Macromolecule biosynthetic process | 55 (64.0%) | 1.5e-27 | ||
C1 | 25 | Ribosome biogenesis and assembly | 14 (56.0%) | 9.8e-15 |
Ribonucleoprotein complex biogenesis and assembly | 14 (56.0%) | 9.6e-10 | ||
D1 | 74 | Ribosome biogenesis and assembly | 44 (59.5%) | 2.6e-34 |
Ribonucleoprotein complex biogenesis and assembly | 44 (59.9%) | 4.6e-31 | ||
E1* | 33 | Sulfur metabolic process | 06 (18.2%) | 9.0e-05 |
Sulfur amino acid metabolic process | 04 (12.1%) | 2.4e-03 | ||
F1* | 26 | Amino acid transport | 04 (15.4%) | 1.4e-03 |
Amine transport | 04 (15.4%) | 3.6e-03 |
Evaluation of the clusters identified by MiMeSR using Gene Ontology data
MiMeSR (MMSR) | ||||
---|---|---|---|---|
Clusters | No. of genes | Top GO term | # of genes in category | p -value |
A2 | 246 | Ribosome biogenesis and assembly | 103 (42.0%) | 2.0e-64 |
Gene expression | 167 (68.0%) | 1.0e-49 | ||
B2* | 66 | Sulfur metabolic process | 13 (20.0%) | 1.2e-12 |
Sulfur amino acid metabolic process | 8 (12.1%) | 4.4e-08 | ||
C2 | 133 | Ribosome biogenesis and assembly | 82 (62.0%) | 1.7e-68 |
Ribonucleoprotein complex biogenesis and assembly | 84 (63.2%) | 5.6e-65 | ||
D2 | 80 | Translation | 62 (77.5%) | 1.5e-46 |
Macromolecule biosynthetic process | 66 (82.5%) | 1.9e-45 | ||
E2 | 109 | Ribosome biogenesis and assembly | 73 (67.0%) | 1.7e-64 |
Ribonucleoprotein complex biogenesis and assembly | 73 (67.0%) | 8.1e-59 | ||
F2 | 60 | Ribosome biogenesis and assembly | 42 (70.0%) | 1.6e-37 |
Ribonucleoprotein complex biogenesis and assembly | 42 (70.0%) | 2.2e-34 | ||
G2 | 94 | Translation | 76 (81.0%) | 3.1e-60 |
Macromolecule biosynthetic process | 77 (82.0%) | 5.2e-53 |
Evaluation using ChIP-chip data
Evaluation of the clusters identified by ASTRO using ChIP-chip data.
ASTRO (OPSM) | ||||||
---|---|---|---|---|---|---|
TFs | A1 | B1 | C1 | D1 | E1 | F1 |
ARO80 | 2/3% (5e-02) | 3/3% (7e-02) | 1/4% (4e-02) | |||
BAS1 | 1/3% (8e-02) | |||||
CBF1 | 5/15% (1e-02) | |||||
CHA4 | 4/4% (9e-03) | |||||
DAL81 | 1/4% (4e-03) | |||||
FHL1 | 25/36% (3e-20) | 43/50% (3e-42) | 3/10% (1e-02) | 19/25% (1e-11) | ||
GCR2 | 3/4% (2e-02) | |||||
GCN4 | 3/9% (7e-02) | |||||
MET31 | 1/3% (7e-02) | |||||
MET32 | 4/12% (6e-12) | |||||
MET4 | 1/3% (7e-02) | |||||
SFP1 | 6/9% (1e-05) | 13/15% (2e-14) | 3/10% (1e-02) | 8/11% (5e-08) |
Evaluation of the clusters identified by MiMeSR using ChIP-chip data.
MiMeSR (MMSR) | |||||||
---|---|---|---|---|---|---|---|
TFs | A2 | B2 | C2 | D2 | E2 | F2 | G2 |
ARO80 | 4/3% (3e-02) | ||||||
BAS1 | |||||||
CBF1 | 10/4% (1e-02) | 9/14% (3e-03) | 3/3% (1e-02) | ||||
CHA4 | |||||||
DAL81 | |||||||
FHL1 | 86/35% (3e-20) | 27/20% (7e-14) | 48/60% (1e-52) | 5/5% (1e-02) | 3/5% (7e-02) | 62/65% (2e-70) | |
GCR2 | 4/5% (5e-03) | ||||||
GCN4 | |||||||
MET31 | |||||||
MET32 | 4/5% (9e-03) | 3/3% (9e-02) | |||||
MET4 | |||||||
SFP1 | 30/12% (1e-08) | 4/3% (1e-02) | 17/21% (1e-21) | 19/20% (1e-20) |
Comparison of the results in Tables 3 and 4 also shows that in general ASTRO finds more clusters that are statistically significantly enriched in genes bound by transcription factors. In other words, ASTRO performs better than MiMeSR in identifying co-regulated genes.
Comparison with other methods
Conclusion
In this study, we presented two new algorithms for analyzing short time series gene expression data: ASTRO that uses the order preserving matrix concept and MiMeSR that uses the minimum mean squared residue concept. Both these algorithms use linear algebra techniques to identify coherent gene clusters over all time points in linear time. This offers a significant advantage over existing methods that employ greedy approaches or heuristics, since ASTRO and MiMeSR avoid problems related to cost functions or the choice of predefined sets of expression profiles [19, 21, 25]. Also, the complexity of our methods is smaller than that of existing algorithms, even when it is compared to those that use a greedy approach to speed up their running time [9–15]. ASTRO identifies all gene clusters with coherent expression patterns, irrespectively of the magnitude of expression change. Genes belonging in such clusters are generally expected to participate in the same biological processes (see, also, Table 1). MiMeSR identifies all gene clusters with coherent expression change both in direction as well as in magnitude. Genes belonging in such clusters are expected to be regulated by the same set of transcription factors (although different transcription factors may act at different time points). In general, MiMeSR identifies clusters that are enriched for genes belonging to the same pathways, while ASTRO identifies clusters with genes that are regulated by the same transcription factors (Tables 1, 2, 3 and Figure 3). In terms of overall statistical significance of the identified clusters, MiMeSR outperforms ASTRO in most cases (Figure 3). In a direct comparison of the two algorithms in simulated data (Figure 1) we found that ASTRO outperforms MiMeSR when fewer data points are available, whereas MiMeSR performs better when more data points are available.
Testing ASTRO &MiMeSR in well-characterized short time series gene expression datasets showed that it is robust to noise and to random patterns, and that it can correctly predict the temporal expression profile of relevant functional categories as confirmed by statistical analysis of GO category membership over-representation and analysis of transcription factor occupancy in the promoters of the gene members of the various clusters. Our approaches are shown to outperform existing clustering algorithms, including the popular k-means, as well as FCV and STEM and they were able to distinguish between closely related but biologically distinct patterns. As expected, ASTRO finds more homogeneous clusters than MiMeSR, with respect to the percentage of genes associated with a given transcription factor. This is because it takes into consideration the magnitude of the change in gene expression, which is more closely related to the transcription factors involved.
In principle, ASTRO and MiMeSR can also be applied to long time series gene expression data (more than 10 time points) or gene expression data sampled over different conditions, but in this case the number of genes in each cluster is expected to be low. However, they can be adapted to identify local patterns, thus overcoming this problem.
Methods
General description of the algorithms
A time series gene expression dataset can be represented by an N × M matrix, A = [a_{ nm }], with rows corresponding to the genes from G = {g_{ 1 },..., g_{ n },..., g_{ N }}, and columns corresponding to the time point measurements from T = {t_{ 1 },..., t_{ m },..., t_{ M }}. The entry a_{ nm }is the expression level of gene n at time point t_{ m }or -simply- m. Given a short time-series gene expression dataset, our goal is to identify the sets of genes with coherent behavior, i.e., genes whose expression levels increase and/or decrease coherently across the time point experiments, by minimizing the effect of noise and random patterns. The input data are pre-processed to identify and remove from the matrix all genes whose expression level remains constant across time points. We consider the expression of a gene to be constant when the difference between the minimum and the maximum value (in log-scale) is less than a positive real number, ε. ε is a user-defined parameter and it can be based on prior knowledge on the expected level of noise on a given experiment.
ASTRO (Rank Order Preserving Matrix framework)
ASTRO seeks to identify sets of genes with similar expression profiles irrespectively of their expression fold-change. Therefore, only the direction of expression change is considered and not its magnitude. This reflects the biological fact that different gene products may be required at different quantities for a given cellular response or function. Under this assumption, the problem of finding a cluster of similarly expressed genes can be caste into a problem of finding an order preserving submatrix ( OPSM ) of the gene expression matrix A. A submatrix C of A is order preserving (OP) if there is a permutation of its columns under which the sequence of values in every row is strictly increasing or decreasing. In other words an OPSM is a set: C = {2-tuples (I,J), I ∈ G and J ∈ T}, such that each row induces the same rank order permutation on the columns. The problem of searching over all possible subsets of columns for identifying the most significant OPSM is NP hard [19]. However, taking advantage of the small number of sampled points in a short time series dataset, one may seek patterns of consistent gene expression over all time points. In such case the order will be required to be preserved in all columns (time-points) for the genes in a cluster (i.e. J = T). In fact, this assumption is now commonly used in analyzing short-time series experiments [9–15]. As we show below, this reduces the complexity of finding OPSMs, which offers an advantage over methods that use probabilistic models and greedy algorithms. ASTRO is guaranteed to find all OPSMs across all time points in O(NM) using linear algebra techniques.
Overview
In this part, we focus on genes with coherent behavior, i.e., genes whose expression levels increase and/or decrease coherently across all time points. ASTRO starts by filtering those genes with constant gene expression across all time points (using a threshold ε). It then proceeds by constructing the rank matrix of the time series gene expression data. Next, it identifies all distinct coherent patterns in the ranked matrix. Finally, it assigns each gene to its corresponding cluster by performing a row comparison between the set of distinct rows of the ranked matrix and the ranked matrix itself. See Additional file 1 for ASTRO pseudo codes.
Rank Matrix Construction
A rank matrix is an N × M matrix, R = [r_{ nm }], in which every row (gene) is a vector of the ranks of the corresponding expression values in A in increasing order. For example, if the expression levels of gene g_{ i }are A_{i*}= (5, 10, 15, 8) then the corresponding row in the rank matrix would be R_{i*}= (1, 3, 4, 2). The ranking is performed in increasing order. If more than two entries have the same value, the user can decide to give them the same ranking or rank them in the order they appear. In this study, we choose the former. By replacing each entry of the gene expression matrix with their rank along the rows, we are no longer considering the expression level of a given gene per se, but its dynamics over all time points. The advantage of this method is its speed and that it avoids the use of probabilistic models, greedy algorithms, or costly column permutations. Also, one will notice that for any k > 1 rows of the ranked matrix that are similar under any permutation of the columns of the gene expression matrix; they will always belong to the same cluster. Finally, the fact that the rank is conserved under any permutation of the columns of the gene expression matrix further reduces the chance that a random pattern might be picked up in a cluster (see, also, the Statistical Significance and Complexity Analysis section).
Pattern Identification
Given a gene expression data matrix, A, the exact number of distinct OP expression profiles that can be found in the dataset (time points t = 1,..., M) is the number of distinct rows, N_{ U }, of its corresponding ranked matrix R. The set of distinct OP patterns, U, can thus be identified by considering the rank matrix R as a set of rows and identify all subsets of identical rows in it. ASTRO is guaranteed to identify the exact number of distinct OP patterns in a given matrix in O(NM).
Identification of Order Preserving Clusters
Once the exact number of distinct OP patterns has been identified, ASTRO assigns each gene to one of the N_{ U }groups by comparing each distinct row U_{k*}of the ranked matrix to the rows R_{n*}of the ranked matrix itself, and assign gene n to cluster G{k} each time U_{k*}= R_{n*}. This approach is guaranteed to identify all OP clusters of size K × M, with K_{ min }≤ K ≤ N, and K_{ min }is the minimum number of genes in a cluster.
Statistical Significance and Complexity Analysis of ASTRO
We use this bound to estimate the significance of any given cluster of size M with K members. The best cluster is the one with the largest statistical significance, i.e., the one with the smallest Z(M,K). Therefore, as long as that upper bound probability is smaller than any desired significance level, the identified cluster in the real gene expression matrix will be statistically significant.
The overall complexity of ASTRO is ~O(NM). Recall that the time series gene expression A is an N × M matrix. The rank matrix can be identified with an O(NM) complexity. The number of distinct OP patterns and the set of distinct OP patterns can be identified with a complexity less than O(N). Finally, clusters can be identified with a complexity less than O(N). In all, the complexity of ASTRO is O(NM) + O(N) + O(N), which is ~O(NM), less that the complexity of existing approaches.
MiMeSR (Minimum Mean Squared Residue)
where c_{ iT }is the mean of the i^{ th }row (expression of gene i over all time points), c_{ Gj }is the mean of the j^{ th }column (expression of all genes at the j time point) and c_{ GT }is the mean of all the elements of C. When C = [c(i,j)] = [a_{ i } + b_{ j }] = [a_{ i }] + [b_{ j }], where [a_{ i }] a matrix with constant values on rows, and [b_{ j }] a matrix with constant values on columns, then it can be shown that the mean squared residue, H(C), of C is zero.
The MiMeSR algorithm that we develop in this study uses this concept to search for submatrices with mean squared residue smaller that a given threshold, δ → 0.
Cheng and Church have shown that when the search extends over all possible subsets of columns, then the solution is NP hard [20]. As we will show below, looking for patterns consistent over all time points (i.e. J = T) reduces the algorithmic complexity to O(NMK), and produces biologically relevant results. MiMeSR uses linear algebra and arithmetic tools to solve the problem, which is advantageous over greedy algorithms or the use of heuristics that were used in the past.
Overview
MiMeSR starts by filtering those genes whose expression levels do not change significantly during the time course (threshold ε). Then, it writes the gene expression matrix A as the sum of matrix Z_{1}, with constant values on columns, and Z_{2} = A - Z_{1}. Finally, it identifies submatrices with constant values on rows in Z_{2}, which correspond to the minimum mean squared residue clusters in the gene expression matrix A. See Additional file 1 for MiMeSR pseudo codes.
Identification of minimum mean squared residue clusters
Statistical Significance and Complexity Analysis of MiMeSR
For practical reasons, it is important to assess the effects of the ε parameter on the clusters that are identified by MiMeSR. This can be done by sensitivity analysis in which the parameter ε is perturbed and the results are compared. For this analysis, it is usually sufficient to consider one or two values above and below the originally selected value of ε. Only clusters that are consistently identified by MiMeSR as ε varies should be retained for further examination. Note that the number of genes in these clusters may also change. The user therefore needs to determine a rule for dealing with genes that may be dropped from the clusters as ε changes. The most conservative approach would be to retain only the genes that remain in the clusters for all values of ε around its selected value. It can be easily shown that the overall complexity of MiMeSR is ~O(NMK), where K is the number of minimum mean squared residue clusters in A. Note that K corresponds to the maximum number of constant columns matrices that can be constructed using the rows of A without identifying redundant clusters.
Declarations
Acknowledgements
This work was supported by NIH grants 1R01LM009657-01, and 2R24RR014214-05 and by NIH-NIAID contract no. NO1 AI-50018. P.V.B. was also supported by NIH grant 1R01LM007994-01.
Authors’ Affiliations
References
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.PubMed CentralView ArticlePubMedGoogle Scholar
- Arbeitman MN, Furlong EE, Imam F, Johnson E, Null BH, Baker BS, Krasnow MA, Scott MP, Davis RW, White KP: Gene expression during the life cycle of Drosophila melanogaster. Science 2002, 297(5590):2270–2275. 10.1126/science.1072152View ArticlePubMedGoogle Scholar
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 2000, 11(12):4241–4257.PubMed CentralView ArticlePubMedGoogle Scholar
- Guillemin K, Salama NR, Tompkins LS, Falkow S: Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc Natl Acad Sci USA 2002, 99(23):15136–15141. 10.1073/pnas.182558799PubMed CentralView ArticlePubMedGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285. 10.1038/10343View ArticlePubMedGoogle Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7(3–4):601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
- Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999, 96(6):2907–2912. 10.1073/pnas.96.6.2907PubMed CentralView ArticlePubMedGoogle Scholar
- Bar-Joseph Z, Gerber GK, Gifford DK, Jaakkola TS, Simon I: Continuous representations of time-series gene expression data. J Comput Biol 2003, 10(3–4):341–356. 10.1089/10665270360688057View ArticlePubMedGoogle Scholar
- Schliep A, Schonhuth A, Steinhoff C: Using hidden Markov models to analyze gene expression time course data. Bioinformatics 2003, 19(Suppl 1):i255–263. 10.1093/bioinformatics/btg1036View ArticlePubMedGoogle Scholar
- Ramoni MF, Sebastiani P, Kohane IS: Cluster analysis of gene expression dynamics. Proc Natl Acad Sci USA 2002, 99(14):9121–9126. 10.1073/pnas.132656399PubMed CentralView ArticlePubMedGoogle Scholar
- De Hoon MJ, Imoto S, Miyano S: Statistical analysis of a small set of time-ordered gene expression data using linear splines. Bioinformatics 2002, 18(11):1477–1485. 10.1093/bioinformatics/18.11.1477View ArticlePubMedGoogle Scholar
- Peddada SD, Lobenhofer EK, Li L, Afshari CA, Weinberg CR, Umbach DM: Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics 2003, 19(7):834–841. 10.1093/bioinformatics/btg093View ArticlePubMedGoogle Scholar
- Sharan R, Elkon R, Shamir R: Cluster analysis and its applications to gene expression data. Ernst Schering Res Found Workshop 2002, (38):83–108.PubMedGoogle Scholar
- Ernst J, Nau GJ, Bar-Joseph Z: Clustering short time series gene expression data. Bioinformatics 2005, 21(Suppl 1):i159–168. 10.1093/bioinformatics/bti1022View ArticlePubMedGoogle Scholar
- Lu X, Zhang W, Qin ZS, Kwast KE, Liu JS: Statistical resynchronization and Bayesian detection of periodically expressed genes. Nucleic Acids Res 2004, 32(2):447–455. 10.1093/nar/gkh205PubMed CentralView ArticlePubMedGoogle Scholar
- Moller-Levet CS, Cho KH, Wolkenhauer O: Microarray data clustering based on temporal variation: FCV with TSD preclustering. Appl Bioinformatics 2003, 2(1):35–45.PubMedGoogle Scholar
- Zhao LP, Prentice R, Breeden L: Statistical modeling of large microarray data sets to identify stimulus-response profiles. Proc Natl Acad Sci USA 2001, 98(10):5631–5636. 10.1073/pnas.101013198PubMed CentralView ArticlePubMedGoogle Scholar
- Ben-Dor A, Chor B, Karp R, Yakhini Z: Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 2003, 10(3–4):373–384. 10.1089/10665270360688075View ArticlePubMedGoogle Scholar
- Cheng Y, Church GM: Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 2000, 8: 93–103.PubMedGoogle Scholar
- Madeira SC, Oliveira AL: Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2004, 1(1):24–45. 10.1109/TCBB.2004.2View ArticlePubMedGoogle Scholar
- Yeung KY, Ruzzo WL: Principal component analysis for clustering gene expression data. Bioinformatics 2001, 17(9):763–774. 10.1093/bioinformatics/17.9.763View ArticlePubMedGoogle Scholar
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, (32 Database):D258–261.Google Scholar
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, et al.: Transcriptional regulatory code of a eukaryotic genome. Nature 2004, 431(7004):99–104. 10.1038/nature02800PubMed CentralView ArticlePubMedGoogle Scholar
- Ernst J, Bar-Joseph Z: STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 2006, 7: 191. 10.1186/1471-2105-7-191PubMed CentralView ArticlePubMedGoogle Scholar
- Ernst J, Vainas O, Harbison CT, Simon I, Bar-Joseph Z: Reconstructing dynamic regulatory maps. Mol Syst Biol 2007, 3: 74. 10.1038/msb4100115PubMed CentralView ArticlePubMedGoogle Scholar
- Jorgensen P, Rupes I, Sharom JR, Schneper L, Broach JR, Tyers M: A dynamic transcriptional network communicates growth potential to ribosome synthesis and critical cell size. Genes Dev 2004, 18(20):2491–2505. 10.1101/gad.1228804PubMed CentralView ArticlePubMedGoogle Scholar
- Blaiseau PL, Isnard AD, Surdin-Kerjan Y, Thomas D: Met31p and Met32p, two related zinc finger proteins, are involved in transcriptional regulation of yeast sulfur amino acid metabolism. Mol Cell Biol 1997, 17(7):3640–3648.PubMed CentralView ArticlePubMedGoogle Scholar
- Dudoit S, Yee Hwa Yang YH, Callow MJ, Speed TP: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica sinica 2002, 12: 111–139.Google Scholar
- Hassibi A, Vikalo H: A probabilistic model for inherent noise and systematic errors of microarrays. In IEEE International Workshop On Genomic Signal Processing and Statistics: 2005. New Port, Rhode Island, USA; 2005.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.