Content-based microarray search using differential expression profiles
© Engreitz et al; licensee BioMed Central Ltd. 2010
Received: 30 June 2010
Accepted: 21 December 2010
Published: 21 December 2010
With the expansion of public repositories such as the Gene Expression Omnibus (GEO), we are rapidly cataloging cellular transcriptional responses to diverse experimental conditions. Methods that query these repositories based on gene expression content, rather than textual annotations, may enable more effective experiment retrieval as well as the discovery of novel associations between drugs, diseases, and other perturbations.
We develop methods to retrieve gene expression experiments that differentially express the same transcriptional programs as a query experiment. Avoiding thresholds, we generate differential expression profiles that include a score for each gene measured in an experiment. We use existing and novel dimension reduction and correlation measures to rank relevant experiments in an entirely data-driven manner, allowing emergent features of the data to drive the results. A combination of matrix decomposition and p-weighted Pearson correlation proves the most suitable for comparing differential expression profiles. We apply this method to index all GEO DataSets, and demonstrate the utility of our approach by identifying pathways and conditions relevant to transcription factors Nanog and FoxO3.
Content-based gene expression search generates relevant hypotheses for biological inquiry. Experiments across platforms, tissue types, and protocols inform the analysis of new datasets.
With the development of the DNA microarray and other technologies that probe gene expression on an "omic" scale, we are now able to discover associations between biological conditions based on their molecular underpinnings. Seminal work by Golub et al. classified leukemia samples by their global gene expression profiles, demonstrating that transcriptomic signatures can aid in functional prediction and improve our molecular understanding of disease. Hughes et al. predicted the effects of novel gene deletions and chemical treatments by profiling yeast mutants and comparing new arrays to this reference. More recent studies examined cellular transcriptional response to drug treatment [3, 4] and disease [5, 6] in order to identify novel relationships between apparently unrelated conditions and compounds. This work not only demonstrated the utility of expression-based discovery, but also suggested that functional studies about drugs and diseases can utilize data from different platforms and cell types. This general approach to hypothesis generation - namely, finding associations between diverse conditions based on gene expression - has great potential to further biological and biomedical research if implemented on a large scale.
Here we develop methods for content-based gene expression search using an entire experiment as a query. That is, given an input experiment comparing case to control, we aim to identify other experiments that show similar patterns of differential expression. This concept is exemplified by the Connectivity Map, which searches for relationships between treatment-control comparisons for small molecules. While the Connectivity Map focused on drug treatment and disease, a similar approach across a sufficiently large data source would allow for the identification of associations between gene knockdowns, diseases, drugs, and myriad other perturbations and phenotypes. Public repositories provide a wealth of data amenable to this task. The largest of these repositories, the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO), now contains over 400,000 individual samples from more than 17,000 experiments detailing the molecular characteristics of diverse cell types, diseases, and drug treatments. The European Bioinformatics Institute (EBI) ArrayExpress Repository and Stanford Microarray Database host additional data. While GEO supports searches of its content based on free-text and controlled-vocabulary annotations, there is increasing interest in methods for querying microarray databases based on the molecular measurements themselves [10–14]. The power of this approach would grow with the size of the repository.
Current methods for content-based search typically involve a two-step process: they identify a gene set of interest and then search for experiments in which this gene set is important. Several groups have introduced methods for identifying experiments that co-express or differentially express a given gene set. Recently, EBI implemented the Gene Expression Atlas, which provides this latter functionality over their curated array archive. These methods, however, require that both the query and target experiments differentially express genes above some hard threshold, and thus may miss more subtle or noisy relationships. Other approaches, typified by Gene Set Enrichment Analysis (GSEA), partially bypass this requirement by comparing a subset of genes to ranked profiles, using a hard threshold for the query experiment and a soft threshold for the queried experiments [3, 4].
While previous approaches require designating a group of differentially expressed genes, we explore the possibility of using as a query a differential expression (DE) profile, consisting of a complete list of features and associated expression scores. By examining all genes shared between query and queried experiments, we aim to identify experimental conditions and perturbations that exhibit similar transcriptional responses. A successful strategy in this effort should reconcile differences between species, platform types, and normalization methods, as well as overcome the confounding effects of noise and technical replicability. To achieve this, we consider combinations of methods for three tasks: data representation, dimension reduction, and search algorithm.
First, we consider the problem of data representation. Typical microarray analysis methods represent differential expression as a fold-change, comparing the expression in one set of samples to that in another. However, because public expression databases consist of a broad range of data types and experimental modalities, rank-based representations are often employed to account for the disparities in the distributions of observed data [4, 10]. Here we compare both parametric and nonparametric data representations to determine the best approach for comparing DE profiles. We also consider an alternate representation of gene expression data, and construct DE profiles based on the p-value of differential expression.
A second challenge is that gene expression profiles from high-throughput technologies consist of up to tens of thousands of measurements per sample. In addition to the computational complexity involved in handling these large datasets, high dimensionality often confounds data mining techniques [16, 18]. In particular, high-dimensional, multimodal data lends itself to over-fitting and reduced performance. Many solutions to this problem have been proposed, of which dimensionality reduction is the foremost. Matrix decomposition [20, 21], feature selection, and module or gene-set based approaches [16, 23] attempt to capture the most relevant data while removing redundant or noisy features.
Given an appropriate data representation for differential expression, the final challenge is how best to calculate the similarity between two experiments. While Fujibuchi et al. use Spearman rank correlation to compare individual microarrays, it is not clear whether a similar approach is appropriate for DE profiles. Several recent studies use a modified Pearson correlation measure on rank-normalized profiles [4, 5, 24]. Other work suggests that weighting expression values by each gene's variance may improve classification and analysis [25, 26].
Evaluation of data representation and similarity measures
The receiver operating characteristic (ROC) curves for the best-performing search methods indicated that, on average, about 50% of the true positives could be recovered with greater than 90% specificity. This high specificity is important for search because typically only the first few results, rather than the complete list, are examined. To evaluate the performance of our search over the top results of each query, we calculated the "precision at 4" for each of the 32 experiments, permuting labels to create a null model (see Additional file 2). The average precision for Duchenne muscular dystrophy and Huntington's disease exceeded the random model at the 95% confidence level for 13/15 and 8/8 experiments, respectively. The "precision at 4" for breast cancer, a genetically complex disease, was also high, but it significantly surpassed the random model in only 4/9 experiments.
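The "precision at 4" evaluation with a label-permutation null can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy ranking, the number of permutations, and the way the empirical p-value is computed are all assumptions made for the example.

```python
import random

def precision_at_k(ranked_labels, query_label, k=4):
    """Fraction of the top-k retrieved experiments that share the query's label."""
    top = ranked_labels[:k]
    return sum(1 for label in top if label == query_label) / k

def permutation_null(ranked_labels, query_label, k=4, n_perm=1000, seed=0):
    """Precision-at-k values obtained after randomly shuffling the result labels."""
    rng = random.Random(seed)
    labels = list(ranked_labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(precision_at_k(labels, query_label, k))
    return null

def empirical_p(observed, null):
    """Share of permuted scores at least as large as the observed score
    (with a +1 correction so the estimate is never exactly zero)."""
    return (1 + sum(1 for v in null if v >= observed)) / (1 + len(null))

# Toy ranking (hypothetical): disease labels of retrieved experiments, best match first.
ranked = ["DMD", "DMD", "DMD", "BRCA", "HD", "BRCA", "HD", "BRCA", "HD", "BRCA"]
observed = precision_at_k(ranked, "DMD")
null = permutation_null(ranked, "DMD")
print(observed, empirical_p(observed, null))
```

A query would be counted as exceeding the random model when its empirical p-value falls below the chosen level (0.05 for a 95% level).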
Constructing a network of GEO differential expression experiments
Application to Nanog knockdown in embryonic stem cells
In addition to mouse ESC datasets, this search produced interesting comparisons with different experimental systems. Result 14 supports a similarity between Nanog knockdown and the comparison of non-small cell lung carcinoma (NSCLC) to small cell lung cancer (SCLC). SCLC, the more aggressive disease, has been linked with expression of stem cell factor and the Hedgehog signaling pathway. These relationships suggest that, in a broad sense, SCLC compared to NSCLC may have a more stem-like transcriptional program.
Application to FoxO3 knockout in neural stem cells
Other matches from the FoxO3 search (Results 11, 12, 15) point to a role for FoxO3 in cellular response to cytokine interleukin-2 (IL-2) stimulation. All three of these matching profiles compare a cytotoxic T cell line (CTLL-2) at 1 hour after IL-2 stimulation to a later time point (6, 12, or 16 hours). From the direction of these comparisons, we would predict that IL-2 stimulates a transcriptional program that is similar to that of FoxO3 knockout. Indeed, IL-2 signaling leads to phosphorylation and inactivation of FoxO3 in CTLL-2 cells, confirming this hypothesis.
In optimizing our data processing and search pipeline, we found that linear combinations of gene expression features derived in a separate compendium benefited our analysis. The most effective dimension reduction technique involved projecting each DE profile into a feature-space identified by independent component analysis (ICA). We previously used ICA to identify fundamental components of human gene expression from a large compendium of 10,000 arrays, of which only a small subset overlap with the experiments examined here. The ICA projection method reduced the set of features from on the order of 20,000 to fewer than 500, allowing for rapid indexing and searching of large libraries of differential expression profiles. Furthermore, this approach outperformed module-based methods, possibly because the linear model incorporated data from all of the genes rather than only those that participate in discrete gene sets. Although these ICA features were derived from human data, they proved robust in identifying and ranking experiments in closely related species as well. Thus, our results support previous findings that gene expression features derived in one compendium can be useful for interpreting data from new datasets.
To calculate similarities between differential expression profiles, we introduced a novel weighting scheme that incorporates information about a feature's significance of differential expression. This approach provides an intuitive means for emphasizing the contributions of features that are significantly differentially expressed in both experiments, which may represent the most relevant common biology. At the same time, the weighted correlation incorporates even genes that are not significantly differentially expressed, potentially capturing the effects of broader transcriptional changes. We observed that this scheme worked well with Pearson correlation, but did not perform as well when combined with rank-based correlation. Future work will characterize the behavior of this similarity measure on a larger scale.
We used the most successful data processing pipeline to index all GEO DataSets. Our results with transcription factor experiments suggest that this approach can provide predictions for genes, phenotypes and perturbations that share functional similarities with a query experiment. Analysis of Nanog knockdown in ESCs successfully identified other ESC differentiation time courses, induced by a variety of factors, from amongst almost 10,000 other profiles (Figure 5). The same search predicted a link between small cell lung carcinoma and ESC transcriptional programs. For a less well characterized transcription factor, FoxO3, our method also succeeded in recapitulating known biology across species and experimental systems. Although it is clear that FoxO3 has lineage-specific effects [42, 49], we identified a role for FoxO3 in hypoxia response that appears to transcend tissue type [40, 41]. For uncharacterized comparisons, this information has the potential to provide useful hypotheses for phenotypes and pathways to investigate.
As in more traditional microarray analyses, however, interpretation of the most significant genes identified by our weighting scheme remains difficult. Our analysis of the FoxO3 search revealed a number of genes involved in both hypoxia and FoxO3 signaling, linking these two pathways. However, the top genes in the Nanog knockdown search failed to reveal convincing pathways that might explain the relationship between small cell lung carcinoma and ESCs. While we focus on the interpretation of several individual genes in this study, future efforts may benefit from the use of gene set enrichment tools to find pathways that are significantly represented in the top gene list.
As experimentalists continue to explore and deposit information about cellular processes and perturbations, the utility of content-based search approaches will increase. With a larger bank of transcriptomic data and a high chance of identifying overlapping and functionally related biology, an "experiment-omic" screen might be the first step in characterizing a novel dataset. To realize this, further ontological indexing of expression databases may also be necessary. Several groups have already begun to integrate expression with textual phenotype data to enable gene function prediction and automatic disease diagnosis from large databases. Even for expression-driven methods, controlled annotations for experimental variables, tissue types, and culture systems would allow for more accurate assessments of functional relevance. Finally, ontological indexing of textual annotations will enable the creation of more sophisticated connectivity maps linking not just diseases and drugs, but also gene knockdowns, over-expression studies, and genotype comparisons. These ontology-informed studies may not only search public repositories based on gene expression, but also provide meta-analysis across phenotypic categories.
We have explored computational methods needed to search large repositories for relevant experiments based on differential expression, using an experiment as a query. While previous studies use hard thresholding to select gene sets of interest [4, 11, 13], we propose a data-driven approach that uses information from all shared genes to compare two experiments. Differential expression profiles containing scores for each gene or feature were generated and compared using correlation metrics, following the hypothesis that this direct and intuitive method would perform well across diverse datasets. In a collection of 32 experiments comparing normal to diseased tissue, we achieved an average AUC of 0.737 for retrieving experiments that measure the same disease. We further demonstrated the ability of our method to identify functionally relevant experiments from a large database of studies. Future work will include implementing the principles learned here into a web-based application. Public deployment of these methods will enable discoveries in drug repurposing, disease classification, and systems repositioning as we explore the molecular underpinnings of diverse biological processes and phenotypes.
From a previous collection of disease-associated NCBI GEO microarray experiments, we collected 1,278 processed arrays comprising 32 experiments that compared normal to diseased tissue for Duchenne muscular dystrophy, breast cancer, and Huntington's disease. These experiments represented a variety of species, platforms, tissues, and normalization techniques, factors that might strongly influence the clustering of expression data.
Differential expression profiles
In transcriptomic studies, differential expression analysis identifies the genes and biological processes that vary between two samples. To represent this information from different datasets in a standardized manner, we mapped probesets to Entrez Gene identifiers using AILUN and generated differential expression (DE) profiles for each comparison using Bioconductor software. Here, a DE profile consists of a list of features (e.g., genes) each with an associated score (e.g., fold change). For each comparison, we represented this differential expression score in three ways.
Log fold-change profile
We converted all microarray data to log values by examining the maximum and minimum values of the normalized probe-level data and applying log2 transformation as needed. We aggregated probes to genes using the fixed effects meta-estimate, calculating an average for each gene in which each probe is weighted by its inverse variance. We calculated the fold-change difference between normal and disease by averaging samples within each group.
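The log fold-change steps can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the cutoff used to decide whether data are still on a raw scale (`raw_scale_cutoff`), the per-sample application of the inverse-variance weights, and the toy intensities are all hypothetical.

```python
import math
from statistics import mean, variance

def to_log2(rows, raw_scale_cutoff=100.0):
    """Log2-transform a probes-by-samples table only if it looks untransformed.
    Assumption: log2-scale microarray data rarely exceed the cutoff."""
    if max(max(r) for r in rows) > raw_scale_cutoff:
        return [[math.log2(v) for v in r] for r in rows]
    return [list(r) for r in rows]

def fixed_effects_estimate(estimates, variances):
    """Inverse-variance weighted average (fixed effects meta-estimate)."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

def gene_log_fold_change(probe_rows, is_disease):
    """Aggregate the probes of one gene, then subtract group means.
    Assumes every probe varies across samples (non-zero variance)."""
    rows = to_log2(probe_rows)
    variances = [variance(r) for r in rows]
    gene = [fixed_effects_estimate([r[s] for r in rows], variances)
            for s in range(len(is_disease))]
    disease = [g for g, d in zip(gene, is_disease) if d]
    normal = [g for g, d in zip(gene, is_disease) if not d]
    return mean(disease) - mean(normal)

# Toy data (hypothetical): two probes for one gene, two normal then two disease samples.
probes = [[256.0, 256.0, 1024.0, 1024.0],
          [128.0, 128.0, 512.0, 512.0]]
lfc = gene_log_fold_change(probes, [False, False, True, True])
print(lfc)
```

Because the weights are inverse variances, a noisy probe contributes less to the gene-level value than a stable one.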
For the p-value profile, probes were aggregated as for the log fold-change method. For each gene in each experiment, we determined the probability that the gene was differentially expressed with an empirical Bayes moderated t-statistic implemented in the limma R package (version 2.16.5). We corrected for multiple hypothesis testing using the Benjamini-Hochberg method. For DE profiles represented in terms of a reduced set of features (see below), we applied limma to assess the differential expression of that feature.
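The multiple-testing step can be illustrated with a short implementation of the Benjamini-Hochberg adjustment. The sketch covers only this correction; it does not reproduce the moderated t-statistic, which the study computed with limma in R, and the input p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```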
For the rank-based profile, we ranked probes within each sample of each experiment based on their raw expression score, then averaged these ranks for each probe to create a single score for the normal and disease sample groups. We mapped from probes to genes by taking the median of the differences between all pairwise combinations of probes for the same gene in normal and disease.
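One way to read the rank-based representation is sketched below. The tie handling, the direction of the subtraction (disease minus normal), and the toy data are assumptions; the text does not specify them.

```python
from itertools import product
from statistics import mean, median

def rank_within_sample(values):
    """Ranks 1..n within one sample (1 = lowest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def mean_group_ranks(samples):
    """samples: list of per-sample probe values. Returns the mean rank of each probe."""
    ranked = [rank_within_sample(s) for s in samples]
    return [mean(col) for col in zip(*ranked)]

def gene_rank_difference(normal_ranks, disease_ranks, probe_idx):
    """Median of disease-minus-normal differences over all probe pairs of one gene."""
    diffs = [disease_ranks[p] - normal_ranks[q]
             for p, q in product(probe_idx, repeat=2)]
    return median(diffs)

# Toy data (hypothetical): four probes; probes 0 and 1 map to the same gene.
normal = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
disease = [[4.0, 3.0, 2.0, 1.0], [5.0, 3.5, 2.5, 0.5]]
score = gene_rank_difference(mean_group_ranks(normal),
                             mean_group_ranks(disease), [0, 1])
print(score)
```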
For all three DE profile representations, we mapped genes to their human homologs using NCBI Homologene, removing genes that did not have one-to-one homologs between species (Additional file 7). While removing species-specific genes may result in loss of important biological information, we hypothesized that comparing global, conserved patterns of gene expression between experiments would prove sufficient to predict functional associations (see Additional file 1 for data on the number of genes mapped for each dataset discussed in the manuscript). Next, we applied one of two methods of dimension reduction.
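The one-to-one homolog filter can be sketched as follows, assuming homology records are available as (group, taxonomy ID, gene) triples. The record layout and the gene identifiers are hypothetical placeholders, not the actual HomoloGene file format; the taxonomy IDs 10090 (mouse) and 9606 (human) are the standard NCBI values.

```python
from collections import defaultdict

def one_to_one_map(homolog_rows, source_taxid, target_taxid=9606):
    """Map source-species genes to target-species genes, keeping only
    homology groups with exactly one gene in each of the two species."""
    groups = defaultdict(lambda: defaultdict(list))
    for group_id, taxid, gene in homolog_rows:
        groups[group_id][taxid].append(gene)
    mapping = {}
    for members in groups.values():
        src = members.get(source_taxid, [])
        tgt = members.get(target_taxid, [])
        if len(src) == 1 and len(tgt) == 1:
            mapping[src[0]] = tgt[0]
    return mapping

# Hypothetical records: group 2 has two mouse genes, group 3 has no human gene.
rows = [
    (1, 10090, "m1"), (1, 9606, "h1"),
    (2, 10090, "m2"), (2, 10090, "m3"), (2, 9606, "h2"),
    (3, 10090, "m4"),
]
mouse_to_human = one_to_one_map(rows, source_taxid=10090)
print(mouse_to_human)
```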
Projection onto independent components
where A is the final reduced profile (423 features), S is the component matrix (components × genes), and X is the original profile in gene-space.
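The equation these definitions refer to did not survive in this text. One reading consistent with the stated shapes (S is components × genes, X lives in gene-space, A has one entry per component) is the linear projection A = S X. The sketch below assumes that form; it should not be taken as the authors' exact formula, and the small matrices are made up.

```python
def project_profile(S, x):
    """Project a gene-space DE profile x onto components.
    S: list of rows, one per component, each with one loading per gene.
    Assumed form: A = S x (matrix-vector product)."""
    if any(len(row) != len(x) for row in S):
        raise ValueError("each component needs one loading per gene in x")
    return [sum(s * v for s, v in zip(row, x)) for row in S]

# Hypothetical example: 2 components, 3 genes.
S = [[0.5, 0.5, 0.0],
     [0.0, -1.0, 1.0]]
x = [2.0, 4.0, 1.0]
A = project_profile(S, x)
print(A)
```

With the compendium described in the text, S would have 423 rows, so any gene-space profile restricted to the shared genes maps to a vector of 423 feature scores.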
Fixed effect meta-estimate
To evaluate the performance of the ICA projection method, we also used a set of known features to reduce the dimensionality of our DE profiles. Given a collection of gene sets, we calculated a meta-score for each gene set using the fixed-effect meta-estimate, which represents an average across all genes in the set weighted by their inverse variance. This method summarizes the contributions of functionally coherent gene sets, and may be appropriate for expression analysis. We used MSigDB v2.5, a well described collection of 5,452 gene sets most often used in conjunction with GSEA. For comparison, we also derived gene sets from the ICA features described above: for each independent component, we created a module from all genes that scored three standard deviations above the mean in one direction. These 423 modules represented data-derived functionally coherent gene sets as determined by GO enrichment.
where w_i is the weight for feature i, p_ij is the FDR-corrected empirical Bayes p-value for feature i in experiment j, and C is a scaling factor. For this work, we empirically chose C = 2 because it delivered the best clustering of our disease compendium (data not shown). We used the ROCR package to evaluate the performance of various data processing methods.
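Both gene-set constructions in this paragraph can be sketched briefly. The dictionary-based inputs, the use of the population standard deviation, and the choice of the positive direction are assumptions made for the example.

```python
from statistics import mean, pstdev

def gene_set_score(scores, variances, gene_set):
    """Inverse-variance weighted mean of gene scores over the measured set members."""
    members = [g for g in gene_set if g in scores]
    weights = {g: 1.0 / variances[g] for g in members}
    return sum(weights[g] * scores[g] for g in members) / sum(weights.values())

def module_from_component(loadings, n_sd=3.0):
    """Genes whose loading exceeds mean + n_sd standard deviations (positive side only)."""
    values = list(loadings.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return {g for g, v in loadings.items() if v > cutoff}

# Hypothetical gene scores and variances; "zzz" is in the set but not measured.
scores = {"a": 1.0, "b": 2.0, "c": 5.0}
variances = {"a": 0.1, "b": 0.4, "c": 1.0}
set_score = gene_set_score(scores, variances, ["a", "b", "zzz"])

# Hypothetical component: twenty background genes and one strongly loaded gene.
loadings = {f"g{i}": 0.0 for i in range(20)}
loadings["g_hit"] = 10.0
module = module_from_component(loadings)
print(set_score, module)
```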
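The weighting equation itself is missing from this text, so the sketch below separates the two parts. `weighted_pearson` is the standard weighted Pearson correlation and accepts any non-negative weights. `assumed_weight` is a purely hypothetical placeholder that combines capped −log10 p-values from the two experiments; it is included only to show where C would enter and is not the authors' formula.

```python
import math

def assumed_weight(p1, p2, C=2.0):
    """HYPOTHETICAL weight: product of -log10 p-values, each capped at C.
    Requires p1, p2 in (0, 1]. Not taken from the source."""
    return min(-math.log10(p1), C) * min(-math.log10(p2), C)

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(vx * vy)

# Toy profiles (hypothetical): the last feature disagrees but carries zero weight.
x = [1.0, 2.0, 3.0, 10.0]
y = [1.0, 2.0, 3.0, -10.0]
r_weighted = weighted_pearson(x, y, [1.0, 1.0, 1.0, 0.0])
r_plain = weighted_pearson(x, y, [1.0, 1.0, 1.0, 1.0])
print(r_weighted, r_plain)
```

The example shows the intended effect of weighting: features judged uninformative contribute little or nothing, so the agreement among the remaining features determines the score.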
GEO DataSet search
To search GEO for experiments with similar transcriptional patterns, we indexed all GEO DataSets (GDSs). We downloaded processed data from GEO and used the GDS "Value type" field to transform the data to log2 space. Each GDS is manually annotated with one or more factors, e.g., "disease state" or "time", which outline the experimental conditions that vary between groups of samples. Within each GDS, we compared all combinations of groups for a single factor. For each of these comparisons, we created two DE profiles: one in gene-space, and one in the ICA feature-space described above. We calculated p-values for each gene and ICA feature using the empirical Bayes modified t-test as described. To search these DE profiles, we used the absolute value of the p-weighted Pearson correlation metric, since the direction of the comparison is arbitrary. To assess the significance of DE profile comparisons, we selected 10,000 random pairs of comparisons to serve as a background distribution of correlation scores. We estimated the false discovery rate (FDR) of our search results by calculating the percentage of these random comparisons that exceed a given similarity score. Because this random sampling may include true positive comparisons (e.g., two profiles from the same dataset), our corrected p-values may underestimate the significance of new comparisons.
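The background-distribution estimate described above can be sketched as follows. The profile container, the toy similarity function, and the function names are assumptions for illustration; in the study the similarity would be the absolute p-weighted Pearson correlation.

```python
import random
from bisect import bisect_right

def background_scores(profiles, similarity, n_pairs=10000, seed=0):
    """Similarity scores for randomly chosen pairs of distinct profiles, sorted ascending."""
    rng = random.Random(seed)
    keys = list(profiles)
    scores = []
    for _ in range(n_pairs):
        a, b = rng.sample(keys, 2)
        scores.append(similarity(profiles[a], profiles[b]))
    return sorted(scores)

def empirical_fdr(score, sorted_background):
    """Fraction of background scores strictly greater than the observed score."""
    exceed = len(sorted_background) - bisect_right(sorted_background, score)
    return exceed / len(sorted_background)

# Toy profiles (hypothetical) and a stand-in similarity in [0, 1].
toy_profiles = {f"cmp{i}": i / 10.0 for i in range(11)}
toy_similarity = lambda u, v: 1.0 - abs(u - v)
background = background_scores(toy_profiles, toy_similarity, n_pairs=500)
print(empirical_fdr(0.95, background))
```

As the text notes, random pairs can include genuinely related comparisons, which inflates the background and makes this estimate conservative.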
The authors thank Boris Oskotsky and Alex Skrenchuk for assistance and technical support; Nicholas Tatonetti for critical comments; and Anne Brunet and Ashley Webb for interpretation of FoxO3 results. Computing resources at the Stanford Center for Biomedical Informatics Research were funded by the Hewlett Packard Foundation and the Lucile Packard Foundation for Children's Health. Financial support was provided by the National Library of Medicine (R01 LM009719) and Howard Hughes Medical Institute.
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531–7. 10.1126/science.286.5439.531View ArticlePubMedGoogle Scholar
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell 2000, 102: 109–26. 10.1016/S0092-8674(00)00015-5View ArticlePubMedGoogle Scholar
- Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR: The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 2006, 313(5795):1929–35. 10.1126/science.1132939View ArticlePubMedGoogle Scholar
- Hassane DC, Guzman ML, Corbett C, Li X, Abboud R, Young F, Liesveld JL, Carroll M, Jordan CT: Discovery of agents that eradicate leukemia stem cells using an in silico screen of public gene expression data. Blood 2008, 111(12):5654–62. 10.1182/blood-2007-11-126003View ArticlePubMedPubMed CentralGoogle Scholar
- Dudley JT, Tibshirani R, Deshpande T, Butte AJ: Disease signatures are robust across tissues and experiments. Mol Syst Biol 2009, 5: 307. 10.1038/msb.2009.66View ArticlePubMedPubMed CentralGoogle Scholar
- Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ: Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol 2010, 6(2):e1000662. 10.1371/journal.pcbi.1000662View ArticlePubMedPubMed CentralGoogle Scholar
- 7. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Edgar R: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 2009, (37 Database):D885â€“90. 10.1093/nar/gkn764View ArticleGoogle Scholar
- 8. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, Holloway E, Lukk M, Malone J, Mani R, Pilicheva E, Rayner TF, Rezwan F, Sharma A, Williams E, Bradley XZ, Adamusiak T, Brandizi M, Burdett T, Coulson R, Krestyaninova M, Kurnosov P, Maguire E, Neogi SG, Rocca-Serra P, Sansone SA, Sklyar N, Zhao M, Sarkans U, Brazma A: ArrayExpress update-from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 2009, (37 Database):D868â€“72. 10.1093/nar/gkn889View ArticleGoogle Scholar
- 9. Hubble J, Demeter J, Jin H, Mao M, Nitzberg M, Reddy TBK, Wymore F, Zachariah ZK, Sherlock G, Ball CA: Implementation of GenePattern within the Stanford Microarray Database. Nucleic Acids Res 2009, (37 Database):D898â€“901. 10.1093/nar/gkn786View ArticleGoogle Scholar
- Fujibuchi W, Kiseleva L, Taniguchi T, Harada H, Horton P: CellMontage: similar expression profile search server. Bioinformatics 2007, 23(22):3103–4. 10.1093/bioinformatics/btm462View ArticlePubMedGoogle Scholar
- Chen R, Mallelwar R, Thosar A, Venkatasubrahmanyam S, Butte AJ: GeneChaser: identifying all biological and clinical conditions in which genes of interest are differentially expressed. BMC Bioinformatics 2008, 9: 548. 10.1186/1471-2105-9-548View ArticlePubMedPubMed CentralGoogle Scholar
- Caldas J, Gehlenborg N, Faisal A, Brazma A, Kaski S: Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics 2009, 25(12):i145–53. 10.1093/bioinformatics/btp215View ArticlePubMedPubMed CentralGoogle Scholar
- 13. Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A: Gene expression atlas at the European bioinformatics institute. Nucleic Acids Res 2010, (38 Database):D690â€“8. 10.1093/nar/gkp936View ArticleGoogle Scholar
- Huang H, Liu CC, Zhou XJ: Bayesian approach to transforming public gene expression repositories into disease diagnosis databases. Proc Natl Acad Sci USA 2010, 107(15):6823–8. 10.1073/pnas.0912043107View ArticlePubMedPubMed CentralGoogle Scholar
- Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 2007, 23(20):2692–9. 10.1093/bioinformatics/btm403View ArticlePubMedGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102(43):15545–50. 10.1073/pnas.0506580102View ArticlePubMedPubMed CentralGoogle Scholar
- Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006, 7: 55–65. 10.1038/nrg1749View ArticlePubMedGoogle Scholar
- Raychaudhuri S, Sutphin PD, Chang JT, Altman RB: Basic microarray analysis: grouping and feature reduction. Trends Biotechnol 2001, 19(5):189–93. 10.1016/S0167-7799(01)01599-2View ArticlePubMedGoogle Scholar
- Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, Wang Y: The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat Rev Cancer 2008, 8: 37–49. 10.1038/nrc2294View ArticlePubMedPubMed CentralGoogle Scholar
- Alter O, Brown PO, Botstein D: Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 2000, 97(18):10101–6. 10.1073/pnas.97.18.10101View ArticlePubMedPubMed CentralGoogle Scholar
- Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics 2002, 18: 51–60. 10.1093/bioinformatics/18.1.51View ArticlePubMedGoogle Scholar
- Saeys Y, Inza I, Larrañaga P: A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23(19):2507–17. 10.1093/bioinformatics/btm344View ArticlePubMedGoogle Scholar
- Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, Georgescu C, Romero R: A systems biology approach for pathway level analysis. Genome Res 2007, 17(10):1537–45. 10.1101/gr.6202607View ArticlePubMedPubMed CentralGoogle Scholar
- Liu CC, Hu J, Kalakrishnan M, Huang H, Zhou XJ: Integrative disease classification based on cross-platform microarray data. BMC Bioinformatics 2009, 10(Suppl 1):S25. 10.1186/1471-2105-10-S1-S25View ArticlePubMedPubMed CentralGoogle Scholar
- Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biol 2003, 4(5):R34. 10.1186/gb-2003-4-5-r34View ArticlePubMedPubMed CentralGoogle Scholar
- Sjögren A, Kristiansson E, Rudemo M, Nerman O: Weighted analysis of general microarray experiments. BMC Bioinformatics 2007, 8: 387.View ArticlePubMedPubMed CentralGoogle Scholar
- Engreitz JM, Daigle BJ Jr, Marshall JJ, Altman RB: Independent component analysis: Mining microarray data for fundamental human gene modules. J Biomed Inform 2010, 43: 932–44. doi:10.1016/j.jbi.2010.07.001
- Small CL, Shima JE, Uzumcu M, Skinner MK, Griswold MD: Profiling gene expression during the differentiation and development of the murine embryonic gonad. Biol Reprod 2005, 72(2):492–501. doi:10.1095/biolreprod.104.033696
- Wilhelm D, Palmer S, Koopman P: Sex determination and gonadal development in mammals. Physiol Rev 2007, 87: 1–28. doi:10.1152/physrev.00009.2006
- Loh YH, Wu Q, Chew JL, Vega VB, Zhang W, Chen X, Bourque G, George J, Leong B, Liu J, Wong KY, Sung KW, Lee CWH, Zhao XD, Chiu KP, Lipovich L, Kuznetsov VA, Robson P, Stanton LW, Wei CL, Ruan Y, Lim B, Ng HH: The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nat Genet 2006, 38(4):431–40. doi:10.1038/ng1760
- Mitsui K, Tokuzawa Y, Itoh H, Segawa K, Murakami M, Takahashi K, Maruyama M, Maeda M, Yamanaka S: The homeoprotein Nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell 2003, 113(5):631–42. doi:10.1016/S0092-8674(03)00393-3
- Chambers I, Colby D, Robertson M, Nichols J, Lee S, Tweedie S, Smith A: Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell 2003, 113(5):643–55. doi:10.1016/S0092-8674(03)00392-1
- Hailesellasse Sene K, Porter CJ, Palidwor G, Perez-Iratxeta C, Muro EM, Campbell PA, Rudnicki MA, Andrade-Navarro MA: Gene function in early mouse embryonic stem cell differentiation. BMC Genomics 2007, 8: 85. doi:10.1186/1471-2164-8-85
- Williams RL, Hilton DJ, Pease S, Willson TA, Stewart CL, Gearing DP, Wagner EF, Metcalf D, Nicola NA, Gough NM: Myeloid leukaemia inhibitory factor maintains the developmental potential of embryonic stem cells. Nature 1988, 336(6200):684–7. doi:10.1038/336684a0
- Hibi K, Takahashi T, Sekido Y, Ueda R, Hida T, Ariyoshi Y, Takagi H, Takahashi T: Coexpression of the stem cell factor and the c-kit genes in small-cell lung cancer. Oncogene 1991, 6(12):2291–6.
- Watkins DN, Berman DM, Burkholder SG, Wang B, Beachy PA, Baylin SB: Hedgehog signalling within airway epithelial progenitors and in small-cell lung cancer. Nature 2003, 422(6929):313–7. doi:10.1038/nature01493
- Young GD, Winokur TS, Cerfolio RJ, Van Tine BA, Chow LT, Okoh V, Garver RI Jr: Differential expression and biodistribution of cytokeratin 18 and desmoplakins in non-small cell lung carcinoma subtypes. Lung Cancer 2002, 36(2):133–41. doi:10.1016/S0169-5002(01)00486-X
- Cauffman G, De Rycke M, Sermon K, Liebaers I, Van de Velde H: Markers that define stemness in ESC are unable to identify the totipotent cells in human preimplantation embryos. Hum Reprod 2009, 24: 63–70. doi:10.1093/humrep/den351
- Delprat B, Schaer D, Roy S, Wang J, Puel JL, Geering K: FXYD6 is a novel regulator of Na,K-ATPase expressed in the inner ear. J Biol Chem 2007, 282(10):7450–6. doi:10.1074/jbc.M609872200
- Renault VM, Rafalski VA, Morgan AA, Salih DAM, Brett JO, Webb AE, Villeda SA, Thekkat PU, Guillerey C, Denko NC, Palmer TD, Butte AJ, Brunet A: FoxO3 regulates neural stem cell homeostasis. Cell Stem Cell 2009, 5(5):527–39. doi:10.1016/j.stem.2009.09.014
- Tothova Z, Gilliland DG: FoxO transcription factors and stem cell homeostasis: insights from the hematopoietic system. Cell Stem Cell 2007, 1(2):140–52. doi:10.1016/j.stem.2007.07.017
- Salih DAM, Brunet A: FoxO transcription factors in the maintenance of cellular homeostasis during aging. Curr Opin Cell Biol 2008, 20(2):126–36. doi:10.1016/j.ceb.2008.02.005
- Bakker WJ, Harris IS, Mak TW: FOXO3a is activated in response to hypoxic stress and inhibits HIF1-induced apoptosis via regulation of CITED2. Mol Cell 2007, 28(6):941–53. doi:10.1016/j.molcel.2007.10.035
- Kallio PJ, Wilson WJ, O'Brien S, Makino Y, Poellinger L: Regulation of the hypoxia-inducible transcription factor 1alpha by the ubiquitin-proteasome pathway. J Biol Chem 1999, 274(10):6519–25. doi:10.1074/jbc.274.10.6519
- Jögi A, Vallon-Christersson J, Holmquist L, Axelson H, Borg A, Påhlman S: Human neuroblastoma cells exposed to hypoxia: induction of genes associated with growth, survival, and aggressive behavior. Exp Cell Res 2004, 295(2):469–87.
- Shoshani T, Faerman A, Mett I, Zelin E, Tenne T, Gorodin S, Moshel Y, Elbaz S, Budanov A, Chajut A, Kalinski H, Kamer I, Rozen A, Mor O, Keshet E, Leshkowitz D, Einat P, Skaliter R, Feinstein E: Identification of a novel hypoxia-inducible factor 1-responsive gene, RTP801, involved in apoptosis. Mol Cell Biol 2002, 22(7):2283–93. doi:10.1128/MCB.22.7.2283-2293.2002
- Stahl M, Dijkers PF, Kops GJPL, Lens SMA, Coffer PJ, Burgering BMT, Medema RH: The forkhead transcription factor FoxO regulates transcription of p27Kip1 and Bim in response to IL-2. J Immunol 2002, 168(10):5024–31.
- Tamayo P, Scanfeld D, Ebert BL, Gillette MA, Roberts CWM, Mesirov JP: Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proc Natl Acad Sci USA 2007, 104(14):5959–64. doi:10.1073/pnas.0701068104
- Paik JH, Kollipara R, Chu G, Ji H, Xiao Y, Ding Z, Miao L, Tothova Z, Horner JW, Carrasco DR, Jiang S, Gilliland DG, Chin L, Wong WH, Castrillon DH, DePinho RA: FoxOs are lineage-restricted redundant tumor suppressors and regulate endothelial cell homeostasis. Cell 2007, 128(2):309–23. doi:10.1016/j.cell.2006.12.029
- Shah NH, Jonquet C, Chiang AP, Butte AJ, Chen R, Musen MA: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1. doi:10.1186/1471-2105-10-S2-S1
- Malone BM, Perkins AD, Bridges SM: Integrating phenotype and gene expression data for predicting gene function. BMC Bioinformatics 2009, 10(Suppl 11):S20. doi:10.1186/1471-2105-10-S11-S20
- Chen R, Li L, Butte AJ: AILUN: reannotating gene expression data automatically. Nat Methods 2007, 4(11):879. doi:10.1038/nmeth1107-879
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80. doi:10.1186/gb-2004-5-10-r80
- Stevens JR, Doerge RW: Combining Affymetrix microarray results. BMC Bioinformatics 2005, 6: 57. doi:10.1186/1471-2105-6-57
- Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004, 3: Article 3.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc B 1995, 57: 289–300.
- Kaufman L, Rousseeuw PJ: Finding groups in data: an introduction to cluster analysis. Hoboken, N.J.: Wiley; 2005. [http://www.loc.gov/catdir/enhancements/fy0626/2005278659-b.html]
- Hedges LV, Olkin I: Statistical methods for meta-analysis. Orlando: Academic Press; 1985. [http://www.loc.gov/catdir/description/els032/84012469.html]
- Sing T, Sander O, Beerenwinkel N, Lengauer T: ROCR: visualizing classifier performance in R. Bioinformatics 2005, 21(20):3940–1. doi:10.1093/bioinformatics/bti623
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.