- Methodology article
- Open Access
Inferring biological functions and associated transcriptional regulators using gene set expression coherence analysis
© Kim et al; licensee BioMed Central Ltd. 2007
- Received: 09 May 2007
- Accepted: 17 November 2007
- Published: 17 November 2007
Gene clustering has been widely used to group genes with similar expression patterns in microarray data analysis. Subsequent enrichment analysis using predefined gene sets can provide clues about which functional themes or regulatory sequence motifs are associated with individual gene clusters. Despite their potential utility, gene clustering and enrichment analysis have been used on separate platforms, and developing an integrative algorithm that links the two methods remains challenging.
In this study, we propose an algorithm for the discovery of molecular functions and the elucidation of transcriptional logic using two kinds of gene information: functional and regulatory motif gene sets. The algorithm, termed gene set expression coherence analysis (GSECA), first selects functional gene sets with significantly high expression coherence. Those candidate gene sets are further grouped into a number of functionally related themes, or functional clusters, according to their expression similarities. Each functional cluster is then investigated for the enrichment of transcriptional regulatory motifs using a modified gene set enrichment analysis and regulatory motif gene sets. The method was tested on two publicly available expression profiles representing murine myogenesis and erythropoiesis. For the respective profiles, our algorithm identified myocyte- and erythrocyte-related molecular functions, along with the putative transcriptional regulators for the corresponding molecular functions.
As an integrative and comprehensive method for the analysis of large-scale gene expression profiles, our method is able to generate a set of testable hypotheses of the form: transcriptional regulator X regulates function Y under cellular condition Z. The GSECA algorithm is implemented in a freely available software package.
- Enrichment Analysis
- Transcription Factor Binding Site
- Regulatory Motif
- Functional Cluster
- Putative Transcriptional Regulator
Advanced high-throughput microarray technologies have facilitated the investigation of gene expression in a genome-wide manner [1, 2]. Because of the complex nature and large volume of the data, whole-genome expression profiles often require appropriate and comprehensive analytic methods. Gene clustering according to expression similarity has been widely used for this purpose, often as the first step of analysis. In addition, functional enrichment analysis, or pathway analysis, was proposed to explain global gene expression changes in the context of available knowledge, such as the functional annotation of genes. A classical enrichment analysis uses functionally annotated gene sets defined a priori from external gene databases (functional gene sets) and cross-references them with over- or under-expressed genes [5, 6]. Enrichment analysis can be extended to provide other kinds of biological insight. For example, co-expressed genes grouped by a clustering algorithm are likely to be under common transcriptional control. By using another type of gene set, classified by the presence or absence of known transcription factor binding sites (TFBS) in promoter regions (regulatory motif gene sets), enrichment analysis can identify overrepresented TFBS together with the corresponding putative transcriptional regulators [7, 8].
Despite its promising utility, conventional enrichment analysis of individual gene clusters has several limitations. First, the gene clusters or gene sets are often so small that the statistical evaluation is prone to ascertainment bias, i.e. the significance of enrichment for small gene sets is frequently over- or underestimated. A more advanced type of enrichment analysis, gene set enrichment analysis (GSEA), overcame this limitation by treating all genes represented on the array as a ranked gene list ordered by phenotypic correlation [9, 10]. However, GSEA is suited to the comparison of two dichotomous phenotypic classes, such as tumor versus normal, which limits its general use with gene clustering. Second, accumulating biological knowledge about genes has substantially increased the number of gene sets available for enrichment analysis. Although recently proposed enrichment analysis tools can generate rich descriptions with the help of extended gene sets [11–13], they often produce unmanageably large lists of candidate gene sets, especially when dealing with a large number of clusters. Rigorous statistical evaluation with correction for multiple testing might help to some extent; however, developing an integrative method that makes the results more comprehensive remains challenging.
In this study, we propose a method of gene set expression coherence analysis (GSECA) to provide a more advanced solution than merely combining gene clustering and enrichment analysis. The algorithm first selects functional gene sets with significantly high expression coherence as biologically relevant candidates for the corresponding expression profiles. Then, gene set clustering further reduces them into a number of groups of functionally related gene sets, or functional clusters. For each functional cluster, putative transcriptional regulators are further identified using a modified GSEA algorithm and regulatory motif gene sets. To demonstrate the applicability of our algorithm, we used two publicly available time-series gene expression profiles of murine myogenesis and erythropoiesis. For the respective profiles, our algorithm identified a number of functional themes and putative transcriptional regulators largely consistent with previous reports. As a comprehensive and integrative method, the GSECA algorithm has extended applicability for the analysis of multiple microarray expression datasets.
The overview of GSECA
The primary goal of the GSECA algorithm is the discovery of molecular functions along with the elucidation of transcriptional regulatory logic for the interpretation of microarray datasets. For this purpose, two kinds of gene information (functional annotations in public gene databases and the presence of regulatory motif sequences, or TFBS, in promoter regions) are used in the form of functional and regulatory motif gene sets, respectively. GSECA is composed of three major steps: selection of gene sets with significantly high expression coherence, clustering of functional gene sets into functional clusters, and identification of regulatory motifs associated with individual functional clusters.
Some of the candidate functional gene sets showed similar expression changes, making it possible to group them into a number of clusters. Thus, GSECA further categorizes those gene sets into several clusters using conventional clustering methods such as hierarchical or K-means clustering. The mean expression values of the gene sets are used for the clustering, and gene sets with similar expression patterns are assigned to the same functional cluster. The functional annotations of gene sets assigned to a single functional cluster are also likely to represent similar molecular functions or pathways. Thus, this clustering reduces a collection of functional gene sets into a more comprehensive set of functional clusters, and we refer to these procedures collectively as "functional clustering".
For each functional cluster, GSECA further identifies putative transcriptional regulators responsible for the expression pattern of that cluster. For this, GSECA uses a modified GSEA algorithm with regulatory motif gene sets predefined according to the presence of known TFBS in their promoter regions (Fig. 1B). To apply the GSEA algorithm, seed expression values of a functional cluster are first calculated for each time point by averaging the expression values of all genes belonging to the functional cluster. All genes on the array are then ordered by their expression similarity (PCC) to the seed values to make a ranked gene list. In the list, genes whose expression changes are similar to the seed values are positioned at the top. The gene members of a regulatory motif set are then matched to the ranked list and their enrichment is measured using the GSEA algorithm. The significance level of enrichment is determined by gene permutation tests. The use of PCC as the gene ordering metric is one of the distinguishing features of the GSECA algorithm and also extends the applicability of the conventional GSEA algorithm to the analysis of time-series expression profiles.
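The functional clustering step can be illustrated with a minimal Python sketch. It assumes average linkage, a distance of 1 − PCC between the mean profiles of gene sets, and a fixed distance cutoff for stopping the merges; the linkage rule, the `max_distance` value and all function names are assumptions made for illustration and are not taken from the GSECA software.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_profile(profiles):
    """Average several gene profiles time point by time point."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def functional_clusters(set_profiles, max_distance=0.5):
    """Average-linkage agglomerative clustering of gene sets.

    set_profiles maps a gene set name to its mean expression profile.
    Merging stops once the closest pair of clusters is farther apart than
    max_distance (a hypothetical cutoff, not a value from the paper).
    """
    clusters = [[name] for name in sorted(set_profiles)]

    def linkage(a, b):
        # Mean of 1 - PCC over all cross-cluster pairs of gene sets.
        dists = [1.0 - pearson(set_profiles[p], set_profiles[q])
                 for p in a for q in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)
```

Each entry of `set_profiles` would typically be built with `mean_profile` from the expression profiles of the genes in one functional gene set; the returned lists of gene set names then play the role of functional clusters.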
Application of GSECA to murine myogenesis and erythropoiesis expression profiles
Cellular differentiation represents a series of intricate and complex cellular events, the majority of which are under transcriptional control. Therefore, time-series gene expression profiles derived from an in vitro cell differentiation model are good candidates for the application of the GSECA algorithm. As test sets, we selected two publicly available time-series expression profiles representing the differentiation of murine myocytes and erythrocytes. First, we selected 1,206 functional gene sets containing 5–100 genes and calculated the expression coherence of each functional gene set. The significance level of expression coherence was determined by gene permutation tests and adjusted for multiple testing. As a result, 31 and 18 functional gene sets with significantly high expression coherence (P < 0.05, Bonferroni corrected) were identified in the myogenesis and erythropoiesis expression profiles, respectively. We further used hierarchical clustering to classify functional gene sets with similar expression patterns into individual functional clusters.
It is known that genes with general housekeeping functions, such as ribosomal genes, tend to be strongly correlated in expression profiles without direct evidence for their phenotypic association [16, 17]. This was also the case in the myogenesis dataset: the majority of functional gene sets identified with significantly high expression coherence (58%, 18/31 gene sets) were indicative of general housekeeping functions such as nucleotide or protein metabolism. Our study shows that genes with housekeeping functions have correlated expression patterns not only at the individual gene level but also at the gene set level. Thus, it is reasonable to treat them collectively as a single functional cluster representing general housekeeping function (functional cluster 4).
Among the 18 erythropoiesis-related functional gene sets (Fig. 2B), two gene sets with characteristic functions of red blood cells, oxygen binding and hemoglobin complex, were assigned to functional cluster 1. The higher expression coherence of the two gene sets suggests that genes with red blood cell functions undergo coordinated and marked transcriptional up-regulation during erythropoiesis. In addition, three gene sets with heterogeneous molecular functions, such as cell adhesion and neurotransmitter receptor activity, were assigned to functional cluster 2. Although speculative, those functions might represent potential functionalities with collaborative roles in erythropoiesis or hematopoiesis. As in myogenesis, 13 functional gene sets representing housekeeping functions showed similar expression changes throughout erythropoiesis and were collectively grouped into functional cluster 3.
Identification of putative transcriptional regulators with modified GSEA algorithm and regulatory motif gene sets
List of regulatory motif gene sets significantly enriched in individual functional clusters
Arnt, SREBP-1, Sp-1, MyoD, E2A, USF
Sp-1, USF, LBP-1, Myc
NRF-1, E2F, ATF/CREB, ETF, NF-Y, GABP, Elk-1, ZF5
SREBP-1, USF, GATA-1
NF-Y, NRF-1, ATF/CREB, E2F, Arnt, Tel-2, Egr-3, Myc, ETF, Sp-1, GABP, YY1, HIF-1, Elk-1, ZF5
In addition, functional cluster 4 of the myogenesis profile, representing the housekeeping functions, showed enrichment for multiple ubiquitous transcription factors such as NRF-1, E2F, CREB, NF-Y, and ZF5. This was also the case for functional cluster 3 of the erythropoiesis profile. The enrichment of multiple transcription factors might reflect the ubiquitous nature of the factors associated with general housekeeping functions [22, 23]. However, the heterogeneity of the functional gene sets might also have caused the enrichment of multiple regulatory motifs, because the gene sets with housekeeping functions were manually assigned to a single cluster.
Synergistic motif pairs in murine myogenesis and erythropoiesis
List of putative synergistic motif pairs (columns: Motif 1 (gene size/EC), Motif 2 (gene size/EC))
Comparison of GSECA results with conventional enrichment analysis
For the 7 myogenesis-related functional gene sets, enrichment analyses combined with either K-means or SOM clustering yielded low levels of significance that did not reach the threshold of GSECA (unadjusted P < 4 × 10⁻⁵). This was also the case for the two erythropoiesis-related functional gene sets. One plausible explanation for this low significance is the small size of the functional gene sets: those containing fewer than 10 genes (i.e., troponin complex and sarcoplasmic reticulum) showed the lowest levels of significance. The 2 erythropoiesis gene sets both have fewer than 10 genes and showed variable levels of significance across the different settings and clustering methods. This is consistent with our initial assumption that conventional enrichment analysis of small gene clusters or gene sets might be prone to over- or under-estimation of significance.
The significance of enrichment for regulatory motif gene sets was also improved in the GSECA analysis, as shown for 6 and 3 gene sets in the myogenesis and erythropoiesis expression profiles, respectively. The significance of enrichment for biologically relevant regulatory motifs such as MyoD and GATA-1 is two- to three-fold higher in the GSECA results. The improved statistical power in detecting the regulatory motifs of interest might be due to the modified GSEA algorithm used in our method [9, 10, 27], which is likely to retain the robustness and sensitivity of the GSEA algorithm.
Considerations on GSECA methodology
The initial assumption of GSECA is that functional gene sets with significantly high expression coherence suggest putative functionality. It must be noted that the annotated functions of gene sets with high expression coherence do not always correspond directly to the actual biological functions. Nonetheless, many physiological cellular responses require the simultaneous participation of gene products, and genes with central roles are likely to have similar regulatory control and expression patterns [28–30]. Comparative analysis has also shown that the co-expression patterns of many functionally related genes are conserved across diverse species. Thus, many, if not all, gene sets with significantly high expression coherence might represent the key molecular functions of the corresponding expression profiles.
Our algorithm also addresses how the functionality represented by functional clusters can be linked to regulatory motifs to identify putative transcriptional regulators. Care must be taken because the genes collected from the functional gene sets assigned to a functional cluster might not fully represent the putative transcriptional targets, given that current functional gene annotation is incomplete. To compensate for this, GSECA implements a modified GSEA algorithm that exploits the entire gene expression profile in terms of correlation with the seed values of functional clusters. Similarity-based gene ordering combined with the enrichment algorithm is likely to retain the robustness and sensitivity of the GSEA algorithm, as demonstrated by the comparison with the conventional strategy.
Building on the GSEA algorithm also makes it easier to adopt recently proposed extensions of GSEA that increase statistical power or improve biological insight. For example, by using absolute correlation as the ordering parameter, GSEA can detect unique functional categories whose gene members show both extreme transcriptional up- and down-regulation. If such a strategy were applied in the GSECA algorithm, it could detect putative regulatory motifs with dual roles as transcriptional enhancers and inhibitors in a given cellular context. However, one distinguishing feature of GSECA, the use of a metric such as PCC, also limits GSECA to time- or condition-series expression profiles, whereas conventional GSEA is oriented toward the comparison of two phenotypic classes.
We also provide an additional method to identify putative synergistic motif pairs among multiple transcription factors. The method was previously introduced and used to identify synergistic combinations of transcription factors in yeast and human. However, because of the large number of pairwise combinations of regulatory motif gene sets and the permutation tests to be considered, the method is often not feasible for general application. Thus, it would be beneficial to select a subset of putative regulatory motifs to reduce the computational workload, and GSECA can provide such plausible candidates for in-depth analyses of combinatorial actions between transcription factors. Expression coherence-based identification of motif synergy would provide clues about the complex structure of regulatory modules and a basis for further experimental validation. However, recent studies on the elucidation of transcription regulatory networks use more sophisticated network assumptions and detailed parameters on the motif sequences and their relationships [36, 37]. Moreover, results and significance levels based on in silico analysis must be interpreted with care because they do not always represent actual functionality or causality.
In addition, there have been efforts to incorporate biological knowledge into gene clustering to maximize the statistical efficiency and reliability of the analysis results. For example, functional gene annotations can be directly incorporated into the distance metric, or used to guide the clustering procedures [39, 40]. However, most methods of this kind use the functional GO categories as additional information for fine-tuning distance metrics to optimize the clustering, or to evaluate the results of conventional clustering algorithms. By contrast, the GSECA algorithm directly calculates the expression coherence of predefined gene sets and then categorizes them into a number of functional clusters by gene set clustering. The gene set-based clustering used in GSECA provides additional advantages, namely improved statistical power and comprehensive interpretation of the results, over the conventional strategy in which gene clusters are individually measured for enrichment with functional or regulatory motif gene sets.
In this study, we present an integrative method for the interpretation of multiple expression profiles in terms of two kinds of gene information: functional gene annotation and sequence information on TFBS in the regulatory regions. It measures two kinds of parameters, expression coherence and the extent of enrichment in a similarity-based ranked gene list, to identify the putative functionality and transcriptional regulators, respectively. Our method successfully identified the key molecular functions and putative transcriptional regulators for two test expression profiles, and these were largely consistent with literature-based knowledge. With improved statistical power over the conventional strategy, our algorithm has extended applicability for rich descriptions of high-throughput microarray expression data.
Test expression profiles
Example microarray datasets were downloaded from a public expression database, the NCBI Gene Expression Omnibus (GEO). We used two expression datasets representing time-series gene expression changes during the differentiation of murine myocytes (accession no. GDS586 in the GEO database) and erythrocytes (GDS568). Both datasets were prepared using the same expression microarray platform, Affymetrix MG-U74Av2, with similar hybridization protocols. The global expression profiles were median-centered and normalized so that the sum of the squares of the probe intensities was 1.0. We used NetAffx Gene Ontology Mining Tools to map the probes to Entrez Gene annotations. Throughout the study, we used Entrez Gene annotation as the common link between functional and regulatory motif gene sets.
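The normalization can be sketched as follows. The text does not state whether centering and scaling were applied per probe or per array, so this example assumes a per-probe operation across time points; the function name is hypothetical.

```python
from math import sqrt
from statistics import median


def normalize_profile(values):
    """Median-center one probe profile and scale it to unit sum of squares.

    Assumption: the operation is applied per probe across time points.
    A constant profile has no variation to scale and is returned as zeros.
    """
    med = median(values)
    centered = [v - med for v in values]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return centered
    return [c / norm for c in centered]
```

After this step every non-constant profile has a median of 0 and a sum of squares of 1.0, which puts probes with very different raw intensities on a common scale before correlations are computed.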
Preparation of functional and regulatory motif gene sets
We used NetAffx software for the functional categorization of genes to prepare the functional gene sets. The gene grouping was based on functional annotations in public gene databases: GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes) and GenMAPP (Gene Map Annotator and Pathway Profiler) [45–47]. A regulatory motif gene set, or TFBS-annotated gene set, is defined as a set of genes containing the sequence motif for the corresponding TFBS at least once in their regulatory regions. To prepare regulatory motif gene sets from a publicly available TFBS annotation database, the precomputed fingerprint files were downloaded from the Expander package [7, 49]. This database includes putative cis-regulatory sequences predicted from experimentally validated binding sequence information for known transcription factors. In total, 432 TFBS-annotated regulatory motif gene sets were prepared as previously described and used for enrichment analyses.
Functional clustering using expression coherence of functional gene sets
For each functional gene set, GSECA first determined the extent to which the gene members of the set correlate with each other. As a similarity measure, GSECA calculated the Pearson correlation coefficient (PCC) for all possible pairs of genes, omitting self-comparisons. The mean value of the PCC was used as the "expression coherence" of the functional gene set. For biological relevance, we only used gene sets containing 5–100 highly variable genes, because too few genes might lead to selection bias, and the functional annotation of large gene sets is commonly indicative of non-informative general function. To determine the significance level of expression coherence, we used gene permutation tests. For each gene set with n genes, expression coherence was calculated for n randomly selected genes, and the fraction of random sets that acquired higher expression coherence in 10⁶ tests was taken as the P value. The nominal P values were adjusted for multiple testing with Bonferroni correction accounting for the number of functional gene sets. For functional gene sets with significantly high expression coherence, the mean expression values of the gene members belonging to the gene set were calculated for each time point. Then, agglomerative hierarchical clustering with a PCC-based distance measure was used to group functional gene sets with similar expression patterns. We defined such clustered functional gene sets as individual "functional clusters".
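A minimal Python sketch of the expression coherence calculation and its gene permutation test is given below. The function names, the default number of permutations and the fixed random seed are assumptions chosen for illustration; the sketch is not the GSECA implementation.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expression_coherence(profiles):
    """Mean PCC over all distinct gene pairs (self-comparisons omitted)."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def coherence_p_value(expr, gene_set, n_perm=1000, seed=0):
    """Fraction of random gene sets of equal size with higher coherence.

    expr maps gene identifiers to expression profiles. n_perm and seed
    are illustrative defaults, far smaller than the counts in the text.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    observed = expression_coherence([expr[g] for g in gene_set])
    higher = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(gene_set))
        if expression_coherence([expr[g] for g in sample]) > observed:
            higher += 1
    return higher / n_perm


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1.0."""
    return min(1.0, p_value * n_tests)
```

With 1,206 functional gene sets, a Bonferroni-adjusted cutoff of 0.05 corresponds to an unadjusted P of about 4.1 × 10⁻⁵ (0.05/1206), so the number of permutations must be large enough to resolve P values of that order.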
Identification of transcriptional regulators for functional clusters
For each functional cluster, we collected the gene members included in the functional gene sets of the corresponding functional cluster. Mean expression values across the different time points were calculated as "seed" values representing the expression changes of the functional cluster. Then, using regulatory motif gene sets, we identified putative transcriptional regulators responsible for the seed expression values of individual functional clusters. The overall procedure is similar to that described for the conventional GSEA algorithm, while the most distinguishing feature of GSECA is that it uses the PCC as the gene ordering parameter, rather than the signal-to-noise ratio (SNR). First, the similarity of expression to the seed values of each functional cluster was calculated, in terms of the PCC, for every gene on the array. Then, the genes were ordered by PCC, so that genes with higher PCC, i.e. those more similar to the seed values, were top-ranked in the ordered gene list. Regulatory motif gene sets were matched to such gene lists, and the enrichment score (ES) was calculated using Kolmogorov-Smirnov statistics. The significance level of the ES was calculated using 5 × 10⁵ gene permutation tests and adjusted for multiple testing accounting for the number of regulatory motif gene sets. In the conventional GSEA algorithm, phenotypic permutation is preferred because it preserves gene-to-gene correlation [9, 10]. However, phenotypic permutation is often not feasible for common time-series expression datasets because of the small number of samples. To demonstrate that gene permutation tests can yield biologically relevant findings, we used gene permutation in adopting the modified GSEA algorithm. However, it must be noted that gene permutation often overestimates the significance levels.
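The ranking and scoring steps can be sketched in Python as follows. This is an illustration under stated assumptions: the running sum is an unweighted Kolmogorov-Smirnov-like statistic (steps of +1/hits and −1/misses), the null distribution is built from random gene sets of the same size, and all function names and defaults are hypothetical rather than taken from the GSECA software.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_genes_by_seed(expr, seed_profile):
    """Order genes from most to least similar (PCC) to the seed profile."""
    return sorted(expr, key=lambda g: pearson(expr[g], seed_profile), reverse=True)


def enrichment_score(ranked, motif_set):
    """Signed maximum deviation of an unweighted running sum from zero."""
    hits = [g in motif_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def enrichment_p_value(ranked, motif_set, n_perm=1000, seed=0):
    """Gene permutation test: random gene sets of the same size as the motif set."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked, motif_set)
    size = len(set(ranked) & set(motif_set))
    at_least = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked, size))
        if enrichment_score(ranked, random_set) >= observed:
            at_least += 1
    return at_least / n_perm
```

A motif gene set whose members concentrate at the top of the PCC-ranked list receives an ES close to 1, while a set concentrated at the bottom (genes anti-correlated with the seed) receives an ES close to −1.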
Identification of synergistic motif pairs using expression coherence
Pairs of putative transcriptional regulators acting in a combinatorial mode were investigated using a previously described method [33, 34]. For a candidate pair of regulatory motifs, expression coherence was calculated over all pairs of genes that occurred in both regulatory motif gene sets. The significance level of the expression coherence was measured by gene permutation tests. For the expression coherence of the n genes that occurred in both regulatory motif gene sets, sets of the same number of genes were randomly selected from the two regulatory motif gene sets and their expression coherence was calculated. The nominal P value was calculated as the fraction of random sets that acquired higher expression coherence in 5,000 permutation tests.
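The synergy test can be sketched in Python as follows. The description of the random sets leaves room for interpretation; this sketch assumes that each random set of n genes is drawn from the union of the two motif gene sets, which is only one possible reading. Function names and the seed are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expression_coherence(profiles):
    """Mean PCC over all distinct gene pairs."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def synergy_p_value(expr, motif_a, motif_b, n_perm=5000, seed=0):
    """Coherence of genes carrying both motifs, tested against random sets.

    Assumption: random sets of the same size are drawn from the union of
    the two motif gene sets. At least two shared genes are required.
    Returns (observed coherence, nominal P value).
    """
    shared = sorted(set(motif_a) & set(motif_b))
    pool = sorted(set(motif_a) | set(motif_b))
    if len(shared) < 2:
        raise ValueError("need at least two genes shared by both motif sets")
    rng = random.Random(seed)
    observed = expression_coherence([expr[g] for g in shared])
    higher = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(shared))
        if expression_coherence([expr[g] for g in sample]) > observed:
            higher += 1
    return observed, higher / n_perm
```

A small nominal P value suggests that genes carrying both motifs are more tightly co-expressed than genes drawn at random from either motif set, which is the signal interpreted here as putative synergy.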
Comparison of significance level with conventional strategy
N is the total number of genes on the array, M is the number of genes in the cluster, n is the size of the corresponding gene set, and k is the number of genes that occur in both the gene set and the cluster. For each setting, the most significant enrichment across the clusters was selected and assigned to the individual functional and regulatory motif gene sets.
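The symbols N, M, n and k correspond to the parameters of a hypergeometric test, a common choice for cluster-based enrichment analysis. Assuming that this is the test intended here, which the surviving text does not state explicitly, the upper-tail P value could be computed as in the following sketch (the function name is hypothetical):

```python
from math import comb


def hypergeom_enrichment_p(N, M, n, k):
    """Probability of at least k overlapping genes under a hypergeometric null.

    N: genes on the array, M: genes in the cluster,
    n: genes in the gene set, k: genes found in both.
    Assumes the conventional test is hypergeometric (not stated in the text).
    """
    upper = min(M, n)
    total = comb(N, M)
    tail = sum(comb(n, i) * comb(N - n, M - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with N = 10, M = 5 and n = 5, observing all five gene set members in the cluster (k = 5) gives P = 1/252, since only one of the 252 possible clusters contains the whole gene set.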
Implementation of GSECA algorithm
The overall procedures of GSECA are implemented in freely available software. Test files with the two expression profiles, along with functional and regulatory motif gene sets (human and mouse), are also available with the software package. The software package and technical manual can be downloaded from our website.
Project name: GSECA
Project home page: http://www.systemsbiology.co.kr/GSECA/
Operating system: Microsoft Windows
Programming language: VB.NET
Other requirements: .NET Framework 2.0 or greater
Any restrictions to use by non-academics: None
This work was supported by FG06-12-01 of the 21C Frontier Functional Human Genome Project from the Ministry of Science & Technology in Korea and by 0405-BC02-0604-0004 of Korea Health 21 R&D Project, Ministry of Health & Welfare, Republic of Korea.
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997, 278: 680–686. 10.1126/science.278.5338.680View ArticlePubMedGoogle Scholar
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell 2000, 102: 109–126. 10.1016/S0092-8674(00)00015-5View ArticlePubMedGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
- Curtis RK, Oresic M, Vidal-Puig A: Pathways to the analysis of microarray data. Trends Biotechnol 2005, 23: 429–435. 10.1016/j.tibtech.2005.05.011View ArticlePubMedGoogle Scholar
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004, 20: 1464–1465. 10.1093/bioinformatics/bth088View ArticlePubMedGoogle Scholar
- Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 2004, 20: 578–580. 10.1093/bioinformatics/btg455View ArticlePubMedGoogle Scholar
- Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y: Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 2003, 13: 773–780. 10.1101/gr.947203PubMed CentralView ArticlePubMedGoogle Scholar
- Kim TM, Jung MH: Identification of transcriptional regulators using binding site enrichment analysis. In Silico Biol 2006, 6: 531–544.PubMedGoogle Scholar
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34: 267–273. 10.1038/ng1180View ArticlePubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102: 15545–15550. 10.1073/pnas.0506580102
- Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, Elnakady YA, Muller R, Meese E, Lenhof HP: GeneTrail – advanced gene set enrichment analysis. Nucleic Acids Res 2007, 35: W186-W192. 10.1093/nar/gkm323
- Liu CC, Lin CC, Chen WS, Chen HY, Chang PC, Chen JJ, Yang PC: CRSD: a comprehensive web server for composite regulatory signature discovery. Nucleic Acids Res 2006, 34: W571-W577. 10.1093/nar/gkl279
- Al-Shahrour F, Minguez P, Tarraga J, Montaner D, Alloza E, Vaquerizas JM, Conde L, Blaschke C, Vera J, Dopazo J: BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res 2006, 34: W472-W476. 10.1093/nar/gkl172
- Tomczak KK, Marinescu VD, Ramoni MF, Sanoudou D, Montanaro F, Han M, Kunkel LM, Kohane IS, Beggs AH: Expression profiling and identification of novel genes involved in myogenic differentiation. FASEB J 2004, 18: 403–405.
- Welch JJ, Watts JA, Vakoc CR, Yao Y, Wang H, Hardison RC, Blobel GA, Chodosh LA, Weiss MJ: Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 2004, 104: 3136–3147. 10.1182/blood-2004-04-1603
- Lee HK, Braynen W, Keshav K, Pavlidis P: ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics 2005, 6: 269. 10.1186/1471-2105-6-269
- Pavlidis P, Lewis DP, Noble WS: Exploring gene expression data with class scores. Pac Symp Biocomput 2002, 474–485.
- Molkentin JD, Olson EN: Defining the regulatory networks for muscle development. Curr Opin Genet Dev 1996, 6: 445–453. 10.1016/S0959-437X(96)80066-9
- Wei Q, Paterson BM: Regulation of MyoD function in the dividing myoblast. FEBS Lett 2001, 490: 171–178. 10.1016/S0014-5793(01)02120-2
- Bessereau JL, Mendelzon D, LePoupon C, Fiszman M, Changeux JP, Piette J: Muscle-specific expression of the acetylcholine receptor alpha-subunit gene requires both positive and negative interactions between myogenic factors, Sp1 and GBF factors. EMBO J 1993, 12: 443–449.
- Sartorelli V, Webster KA, Kedes L: Muscle-specific expression of the cardiac alpha-actin gene requires MyoD1, CArG-box binding factor, and Sp1. Genes Dev 1990, 4: 1811–1822. 10.1101/gad.4.10.1811
- Ishida S, Huang E, Zuzan H, Spang R, Leone G, West M, Nevins JR: Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA microarray analysis. Mol Cell Biol 2001, 21: 4684–4699. 10.1128/MCB.21.14.4684-4699.2001
- Manni I, Mazzaro G, Gurtner A, Mantovani R, Haugwitz U, Krause K, Engeland K, Sacchi A, Soddu S, Piaggio G: NF-Y mediates the transcriptional inhibition of the cyclin B1, cyclin B2, and cdc25C promoters upon induced G2 arrest. J Biol Chem 2001, 276: 5570–5576. 10.1074/jbc.M006052200
- Levine M, Tjian R: Transcription regulation and animal diversity. Nature 2003, 424: 147–151. 10.1038/nature01763
- Griffin MJ, Sul HS: Insulin regulation of fatty acid synthase gene transcription: roles of USF and SREBP-1c. IUBMB Life 2004, 56: 595–600.
- Griffin MJ, Wong RH, Pandya N, Sul HS: Direct interaction between USF and SREBP-1c mediates synergistic activation of the fatty-acid synthase promoter. J Biol Chem 2007, 282: 5453–5467. 10.1074/jbc.M610566200
- Subramanian A, Kuehn H, Gould J, Tamayo P, Mesirov JP: GSEA-P: A desktop application for Gene Set Enrichment Analysis. Bioinformatics 2007.
- Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res 2002, 12: 37–46. 10.1101/gr.205602
- Segal E, Wang H, Koller D: Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics 2003, 19(Suppl 1):i264-i271. 10.1093/bioinformatics/btg1037
- Graeber TG, Eisenberg D: Bioinformatic identification of potential autocrine signaling loops in cancers from gene expression profiles. Nat Genet 2001, 29: 295–300. 10.1038/ng755
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science 2003, 302: 249–255. 10.1126/science.1087447
- Saxena V, Orgill D, Kohane I: Absolute enrichment: gene set enrichment analysis for homeostatic systems. Nucleic Acids Res 2006, 34: e151. 10.1093/nar/gkl766
- Pilpel Y, Sudarsanam P, Church GM: Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001, 29: 153–159. 10.1038/ng724
- Zhu Z, Shendure J, Church GM: Discovering functional transcription-factor combinations in the human cell cycle. Genome Res 2005, 15: 848–855. 10.1101/gr.3394405
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004, 5: 276–287. 10.1038/nrg1315
- Beer MA, Tavazoie S: Predicting gene expression from sequence. Cell 2004, 117: 185–198. 10.1016/S0092-8674(04)00304-6
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003, 34: 166–176.
- Cheng J, Cline M, Martin J, Finkelstein D, Awad T, Kulp D, Siani-Rose MA: A knowledge-based clustering algorithm driven by Gene Ontology. J Biopharm Stat 2004, 14: 687–700. 10.1081/BIP-200025659
- Huang D, Pan W: Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data. Bioinformatics 2006, 22: 1259–1268. 10.1093/bioinformatics/btl065
- Huang D, Wei P, Pan W: Combining gene annotations and gene expression data in model-based clustering: weighted method. OMICS 2006, 10: 28–39. 10.1089/omi.2006.10.28
- Datta S, Datta S: Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics 2006, 7: 397. 10.1186/1471-2105-7-397
- Cheng J, Sun S, Tracy A, Hubbell E, Morris J, Valmeekam V, Kimbrough A, Cline MS, Liu G, Shigeta R, Kulp D, Siani-Rose MA: NetAffx Gene Ontology Mining Tool: a visual approach for microarray data analysis. Bioinformatics 2004, 20: 1462–1463. 10.1093/bioinformatics/bth087
- Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR: GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 2002, 31: 19–20. 10.1038/ng0502-19
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32: D258-D261. 10.1093/nar/gkh066
- Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Res 2002, 30: 42–46. 10.1093/nar/30.1.42
- Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000, 28: 316–319. 10.1093/nar/28.1.316
- Shamir R, Maron-Katz A, Tanay A, Linhart C, Steinfeld I, Sharan R, Shiloh Y, Elkon R: EXPANDER – an integrative program suite for microarray data analysis. BMC Bioinformatics 2005, 6: 232. 10.1186/1471-2105-6-232
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.