A novel method for cross-species gene expression analysis
© Kristiansson et al.; licensee BioMed Central Ltd. 2013
Received: 25 June 2012
Accepted: 13 February 2013
Published: 27 February 2013
Analysis of gene expression from different species is a powerful way to identify evolutionarily conserved transcriptional responses. However, due to evolutionary events such as gene duplication, there is no one-to-one correspondence between genes from different species which makes comparison of their expression profiles complex.
In this paper we describe a new method for cross-species meta-analysis of gene expression. The method takes the homology structure between compared species into account and can therefore compare expression data from genes with any number of orthologs and paralogs. A simulation study shows that the proposed method results in a substantial increase in statistical power compared to previously suggested procedures. As a proof of concept, we analyzed microarray data from heat stress experiments performed in eight species and identified several well-known evolutionarily conserved transcriptional responses. The method was also applied to gene expression profiles from five studies of estrogen exposed fish and both known and potentially novel responses were identified.
The method described in this paper will further increase the potential and reliability of meta-analysis of gene expression profiles from evolutionarily distant species. The method has been implemented in R and is freely available at http://bioinformatics.math.chalmers.se/Xspecies/.
Keywords: Gene expression, Evolution, Meta-analysis, Orthologs, Paralogs, Microarray, RNA-seq
Gene expression microarrays and RNA-seq provide fast and cost-efficient measurement of mRNA abundance for thousands of genes simultaneously. The amount of gene expression data generated by these techniques is constantly increasing, and public repositories such as Gene Expression Omnibus and ArrayExpress today contain a large body of information from a wide range of species and experimental conditions [1, 2]. Large-scale gene expression assays are, however, plagued by high variability, which complicates data interpretation. The abundance of mRNA is stochastic by nature, both on a cellular and multicellular level [3, 4], and there is often large variability between gene expression patterns from different organisms. In addition, technical parameters such as tissue heterogeneity, probe affinities and batch effects may introduce substantial levels of noise [6-8]. Gene expression data is therefore non-trivial to analyze and to put into a biological context.
One way to increase the potential of large-scale gene expression analysis is to combine information between different species. If a biological process is evolutionarily conserved between two species, it is also likely that the transcriptional responses associated with that process share similarities. Indeed, cross-species meta-analysis of gene expression profiles has previously been used to address many questions in biology and medicine. For example, gene expression analyses performed in model species such as mouse and rat are commonly used to study human diseases including cancer [10, 11], Alzheimer’s disease, diabetes and hypertension. Comparative analysis of gene expression profiles in human and mouse embryonic stem cells has been used to identify similarities and differences associated with the developmental biology in these species. Cross-species meta-analysis has also proven useful in biogerontology, where evolutionarily conserved age-related gene expression responses have been identified based on data from several species, including the fruit fly Drosophila melanogaster and the worm Caenorhabditis elegans [16, 17]. Another example is ecotoxicology, where changes of molecular biomarkers are used to detect toxic effects and to monitor populations and ecosystem health. Such biomarkers should be as general as possible and thus responsive in a wide range of species. Meta-analysis of gene expression profiles from multiple species therefore provides a powerful tool for identification and evaluation of biomarkers [19, 20].
Cross-species meta-analysis is, however, not straightforward. Different species have different genomes and thus also essential differences in their transcriptomes. The evolutionary process of the eukaryotic genome includes events such as duplication and recombination, which create complex relations between genes. There is no guarantee that genes from different species with a shared common ancestry (orthologs) have a one-to-one correspondence, since gene duplications after speciation may have resulted in one or more additional gene copies (in-paralogs). For species with a relatively short evolutionary distance, such as human and mouse, the number of in-paralogs is low (5.9% of all homologs according to Homologene release 65). The numbers are, however, higher for species with larger evolutionary distance. For example, 9.6% of all human homologs in Drosophila melanogaster have at least one in-paralog, and the corresponding numbers for Saccharomyces cerevisiae and Arabidopsis thaliana are 13.2% and 51% respectively (Homologene release 65). The function of paralogous genes tends to diverge over time, and paralogs have in general a higher gene expression diversity than single-copy genes [22-27]. Hence, information from all genes, including both orthologs and paralogs, is vital for cross-species analysis of gene expression profiles.
Several methods have previously been suggested for cross-species analysis of gene expression profiles. Fisher’s combined probability test, which transforms p-values from any number of tests into one single p-value, has been a popular method for comparing multiple gene expression experiments [28-31]. Another approach, which was developed by Stuart et al., was used to compare gene expression of homologs (identified using reciprocal best BLAST hits) over a wide range of experimental conditions. Le et al. developed a computationally efficient procedure that compares the distance between ranks of genes from pairs of species. The method was then applied to a large set of microarrays from man and mouse. Another method, called mDEDS, was developed by Campain and Yang and uses several different statistical measures to perform cross-species comparison of gene expression profiles. Other methods include LOLA and L2L, which are both online tools for comparisons of ranking lists of differentially expressed genes from microarray studies, including lists from different species. However, all these methods assume a one-to-one correspondence between genes from different species. This assumption may be acceptable when comparing relatively closely related species such as mouse and man, but it makes these procedures inapplicable when comparing more distantly related species.
Lu and co-authors have previously developed methods for analysis of gene expression between different species that take many-to-many relations into account [36-38]. By using Markov random fields and belief propagation, they were able to identify cell cycling genes in human and yeast. The methods were also used to analyze genes which shared expression profiles in human and mice infected by various pathogens. However, the topology of the Markov random fields depends on the experimental design, which makes them hard to adapt to many forms of gene expression experiments. They also make explicit assumptions about the distribution of the gene expression, either in the form of an extreme value distribution or a Gaussian distribution. This makes them unsuitable for many heterogeneous datasets with observations from multiple measurement platforms, such as gene expression microarrays and RNA-seq. To enable cross-species meta-analysis of existing and future gene expression data, novel flexible methods that can handle many-to-many relationships between genes are needed [30, 39].
In this paper we describe a new statistical method for meta-analysis of gene expression profiles from different species. The method was derived to take all orthologous and co-orthologous genes into account. Similar to Fisher’s method, the proposed method uses gene-specific p-values, which makes it applicable to many forms of measurement platforms including microarrays and sequencing based techniques such as RNA-seq. A simulation study showed that the proposed method resulted in a substantial gain of statistical power for identification of differentially expressed genes. As a proof of concept, we used the method to identify evolutionarily conserved regulation of stress responsive genes in eight species subjected to heat stress. We also applied the method to gene expression data from aquatic vertebrates exposed to estrogens to demonstrate its applicability within ecotoxicology.
A novel method for cross-species analysis of gene expression
Assume that a number of large-scale gene expression experiments have been performed in a set of species investigating an evolutionarily conserved transcriptional response. Assume further that each experiment has been analyzed individually, resulting in a p-value for each measured gene describing the significance of the differential expression (e.g. between two treatments). We will also assume that there is a fixed and known evolutionary structure describing all groups of orthologous and co-orthologous genes present in the species of interest. Such homology groups are readily available from multiple sources, such as Homologene, OrthoMCL-DB and InParanoid, or can alternatively be inferred de novo by tools such as OrthoMCL.
The method proposed in this paper operates on the gene-specific p-values generated from each experiment. For each homology group and species, the method summarizes all in-paralogs into one single value by selecting the minimum (most significant) p-value. A weighted score is then calculated by summing the negative logarithms of the minimum p-values from each gene expression experiment. A combined p-value for each homology group is finally derived by comparing the observed score to the null distribution which has a known, but non-trivial, analytic form. Finally, a Benjamini-Hochberg false discovery rate (FDR) is calculated to control for the multiple testing of several homology groups (typically ∼10,000 homology groups are tested).
The weights used to combine the different experiments are based on the evolutionary structure. Under the assumption of no differential expression, genes with many in-paralogs are more likely to result in a lower minimum p-value than genes with few or no in-paralogs. The weights therefore decrease with the number of in-paralogs to generate an unbiased score. The weights also contain an arbitrary component, which can be used to weigh individual experiments up or down. For example, the arbitrary weights can be used to prevent bias if multiple experiments are performed in the same species.
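The bias can be made concrete with a short worked calculation. It assumes that the n in-paralog p-values of a gene are independent and uniformly distributed under the null hypothesis; the symbols n and x are introduced here for illustration only.

```latex
\begin{align*}
P\Bigl(\min_{k=1,\dots,n} p_k \le x\Bigr) &= 1 - (1 - x)^n, \\
n = 1,\ x = 0.05: &\quad 1 - 0.95 = 0.05, \\
n = 3,\ x = 0.05: &\quad 1 - 0.95^3 \approx 0.143.
\end{align*}
```

With three in-paralogs, an uncorrected minimum p-value would thus fall below 0.05 almost three times as often as that of a single-copy gene, which is why the contribution of such genes has to be weighted down.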
Full mathematical details, including the derivation of the weights and the analytical null distribution, can be found in Methods. An R implementation of the method is freely available at http://bioinformatics.math.chalmers.se/Xspecies/.
Evaluation of the statistical power
The proposed method: the most significant p-value of the in-paralogs in each species is combined across species.
The combination method: the expression data from in-paralogs are treated as independent biological replicates from the same gene.
The average method: expression data from in-paralogs are combined into one single observation by taking the average value of the raw expression data.
The random method: only expression data from one in-paralog is used (randomly selected). All other values are discarded.
For the combined, average and random methods, the cross-species p-value is calculated by Fisher’s combined probability test.
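Fisher’s combined probability test, used by the three comparison methods, can be sketched as follows. This is a minimal illustration that assumes independent p-values, one per species; the function name is hypothetical and the code is not taken from the released R implementation.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p) follows a chi-square
    distribution with 2m degrees of freedom, where m = len(pvalues).
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # For an even number of degrees of freedom the chi-square survival
    # function has the closed form exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

A single p-value is returned unchanged, and several moderately small p-values combine into a smaller one, which is the behaviour a cross-species analysis relies on.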
The methods were also evaluated using simulations in more diverse settings. When a second in-paralog was differentially expressed in the same direction, i.e. the same effect added to two genes, the performance of the combined and average method increased (Additional file 1: Figure A1). However, when an effect in the opposite direction was added to a second in-paralog (half of the effect subtracted), the power of the average method decreased substantially. At an effect of 6, the power of the average method was reduced from 0.68 to 0.28 while the power for the proposed method decreased from 0.82 to 0.71 (Figure 1 and Additional file 1: Figure A2). When the normal distribution was replaced by a t-distribution with five degrees of freedom, the power decreased equally for all methods (Additional file 1: Figure A3). A similar result was seen when errors were introduced in the homology structure by randomly replacing orthologous genes with non-orthologous genes from the same species (Additional file 1: Figures A4 and A5).
Evolutionarily conserved expression changes in response to heat stress
Table 1. A summary of the experiments used in the meta-analysis of heat stress
Analysis of the transcriptional responses to estrogens in fish
Table 2. A summary of the experiments used in the meta-analysis of estrogen-exposed fish
Exposure (compound, route, dose), with data source where stated:
- E2, injected, 10 mg/kg; pers. comm. TD Williams
- E2, water, 50 ng/L; pers. comm. TD Williams
- EE2, water, 10 ng/L
- E2, dietary, 5 ppm
- EE2, water, 10 ng/L
Furthermore, several significant homology groups contained genes that were not identified as estrogen responsive by any of the individual studies, e.g. fatty acid desaturase 2 (group 582, FDR = 1.5×10^-7), sodium/potassium-transporting ATPase subunit alpha-1 (group 61, FDR = 7.8×10^-6) and translocon-associated proteins delta and gamma (groups 561 and 1423, FDR = 2.9×10^-8 and 3.9×10^-9 respectively). These genes have all previously been shown to be estrogen responsive in mammals [68-70]. In addition, the translocon-associated protein subunit delta has been shown to be differentially expressed at the protein level in Danio rerio exposed to estrogen.
Meta-analysis of gene expression profiles is hampered by the lack of a one-to-one correspondence between orthologous genes from different species. Evolutionary events, such as gene duplications, have resulted in paralogous genes which makes traditional approaches for meta-analysis inapplicable. We therefore developed a new statistical method for meta-analysis of gene expression profiles between experiments performed in evolutionarily distant species. The method takes advantage of the homology structure between the species of interest and can therefore take any number of orthologous and co-orthologous genes into account. The method is general in the sense that it operates on p-values from individual gene expression experiments and is therefore independent of the type of the raw gene expression data. This makes the method applicable to any gene expression measurement platform, including DNA microarrays and quantitative PCR as well as techniques based on sequencing such as RNA-seq. Using p-values also makes it possible to include results from already analyzed experiments where the raw data is not publicly available or missing.
The proposed method can be seen as an extension of Fisher’s combined probability test, which is a widely used statistical method for meta-analysis. In fact, when no in-paralogous genes are present in any of the species, the proposed method and Fisher’s method are equivalent. Similar to Fisher’s combined probability test, the proposed method is dependent on the validity of the statistical models used to analyze the individual experiments. The combined cross-species p-values are calculated from an analytical distribution derived under the assumption that the gene-specific p-values are independently and uniformly distributed under the null hypothesis. An alternative approach, which is less dependent on the model assumptions, is to use permutations. For many experimental designs, the null distribution can be estimated by randomly permuting the labels of the samples in each experiment. However, permutation-based estimation of the null distribution requires a relatively large number of biological replicates in order to generate a sufficiently large number of permutations. The heat stress data analyzed in this study had, for example, too few observations for estimation of the null distribution using permutations.
Cross-species meta-analysis of gene expression is dependent on the evolutionary relationship between the orthologous and co-orthologous genes present in the species of interest. Identification of homologous genes in evolutionarily distant species is, however, complex and can result in false predictions. Such errors will either group non-related genes in the same homology group or, vice versa, scatter homologous genes between different homology groups. Since the proposed method assumes that the evolutionary structure is known and correct, such errors will affect the results negatively. Improved and more accurate algorithms for predicting homologous genes will thus further increase the potential of cross-species meta-analysis of gene expression. On the other hand, the conserved expression profiles generated by the proposed method can be used to correct false predictions of homology. In the heat stress analysis, Homologene group 111895 (HSP70-homologs, Homologene release 65) was found to be highly significant in all species except for D. melanogaster. Interestingly, a closer examination of that homology group showed that the HSP70 functional domain was missing from the D. melanogaster gene, which suggests that it may indeed not be a true homolog.
The statistical power of the proposed method and three previously suggested methods for combining multiple observations in microarray analysis was evaluated using simulations. The proposed method was the only solution that was explicitly developed to handle in-paralogous genes and its power was, not surprisingly, considerably higher (Figure 1). The resulting false discovery rate was also lower (Figure 2). When multiple in-paralogs from the same homology group had a similar transcriptional pattern, the difference in performance between the methods was reduced. However, when the multiple in-paralogs showed a divergent transcriptional pattern, the difference in performance increased in favor of the proposed method. This reflects the underlying assumptions: the proposed method assumes that only one of the in-paralogs in a homology group is differentially expressed while the others are non-responsive. The combination and average methods do, on the other hand, assume that all in-paralogs are affected by the treatment. It should also be noted that the conditions used in the simulations are idealized and the results should therefore be interpreted as such. Real gene expression data does not follow a Gaussian distribution and has a complex correlation structure, both between genes and samples [6, 73-75]. The simulation study shows, however, that the loss in statistical power of detecting differentially expressed genes in cross-species meta-analysis may be substantial if in-paralogs are not properly incorporated in the analysis.
The proposed method was used to compare the gene expression response to heat stress based on microarray data from eight eukaryotes. The analysis identified several well-known mechanisms involved in the transcriptional response to heat. Most pronounced was the up-regulation of molecular chaperones: 10 of the 15 most significant homology groups corresponded to heat stress proteins from four of the five major chaperone families (Figure 3). Functional enrichment of gene ontology terms revealed additional biological processes associated with the cellular response to heat. The number of significant homology groups was also shown to increase with the number of included species. These results show that the proposed method generated biologically relevant results by combining gene expression profiles from evolutionarily distant species. Analysis of evolutionarily conserved gene expression changes under heat stress has previously been suggested as an efficient approach to further understand the underlying biological processes. It is therefore plausible that a more in-depth analysis of our results from the cross-species meta-analysis may yield more insights and novel findings within this area.
Inter-species extrapolation is a cornerstone of ecotoxicological risk assessment since only a tiny fraction of the species present in the environment can be studied in the laboratory. Comparisons of inter-species gene expression profiles provide an attractive way to identify evolutionarily conserved modes of action and novel biomarkers of exposure or effect. We therefore used the proposed method to find common transcriptional responses in four different fish species. The analysis revealed several known and well-established responses to estrogen, some of which have been associated with adverse physiological effects. The method also identified differentially regulated genes that were not classified as estrogen responsive by the individual experiments. This shows that the method can be used to identify evolutionarily conserved transcriptional responses to toxicants in ecologically relevant species and it demonstrates the potential of cross-species meta-analysis within ecotoxicology.
Cross-species analysis of gene expression is dependent on the similarities in the transcriptional responses of the studied species. However, evolutionarily distant species have fundamental differences in their physiology, which makes it hard, or even impossible, to perform experiments under identical conditions. Even though the associated biological processes are evolutionarily conserved, the differences in experimental design and execution can introduce substantial variability in the transcriptional responses. In the cross-species analysis of heat stress we included data from eight species that were treated with different degrees of heat stress during different time spans. There were also differences in the designs of the estrogen exposures, e.g. exposure concentrations, times and routes. Our results show, however, that for both these examples of cross-species analysis, the experiments were similar enough to generate biologically relevant results. It is, on the other hand, hard to estimate which evolutionarily conserved transcriptional responses are not identified due to differences in the experimental designs.
Cross-species analysis of gene expression is complicated by the non-trivial relationships between genes from different species. The new statistical method proposed in this study takes the evolutionary structure into account and can therefore compare transcriptional profiles from species with any number of orthologous and co-orthologous genes. The performance of the proposed method, compared to other existing solutions, was therefore considerably higher when in-paralogous genes were present. As a proof of concept, the method was used to identify evolutionarily conserved transcriptional responses in microarray data from heat stress experiments performed in eight diverse species. The applicability of the method within ecotoxicology was also demonstrated by the identification of known and novel responses in fish exposed to estrogens. An implementation of the method for the statistical language R is available for free at http://bioinformatics.math.chalmers.se/Xspecies/.
It follows that any pair of genes g_ijk and g_ij′k′ in homology group i are in-paralogs if j = j′ and orthologs or co-orthologs if j ≠ j′.
where K_ij is a constant and w_j are arbitrary experiment-specific weights summing to 1.
S_i is thus a weighted sum of independent exponentially distributed random variables with intensity 1. The weights contain two parts, an experiment-specific weight w_j and 1/(k Exp[Y_ij]). The latter compensates for the number of paralogs in order to avoid bias from large homology groups. The weights w_j are arbitrary and can be set to weigh individual experiments up and down. This is for example useful when multiple experiments are performed in a single organism (see Estrogen exposure below for an example). However, more sophisticated weighting strategies are also possible, such as weights based on the evolutionary distance between the included species (e.g. evolutionary distinctiveness score).
The hypothesis in (1) can now be tested and a corresponding p-value calculated by comparing the observed value S_i with the null distribution of S_i.
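The test can be sketched as follows, under stated assumptions. The sketch assumes that Y_ij is the negative logarithm of the smallest in-paralog p-value in experiment j and that each term is divided by its null expectation, which for n independent uniform p-values is the harmonic number H_n = 1 + 1/2 + ... + 1/n. The analytic null distribution is replaced by a Monte Carlo estimate, and all function names are hypothetical; this is not the released R implementation.

```python
import math
import random


def harmonic(n):
    """H_n, the null mean of -ln(min of n independent uniform p-values)."""
    return sum(1.0 / k for k in range(1, n + 1))


def group_score(pvalues_by_experiment, weights):
    """Weighted score for one homology group.

    pvalues_by_experiment: one list of in-paralog p-values per experiment.
    weights: experiment-specific weights w_j (assumed to sum to 1).
    """
    score = 0.0
    for pvals, w in zip(pvalues_by_experiment, weights):
        y = -math.log(min(pvals))              # most significant in-paralog
        score += w * y / harmonic(len(pvals))  # assumed paralog correction
    return score


def null_pvalue(score, paralog_counts, weights, draws=100_000, seed=1):
    """Monte Carlo estimate of P(S >= score) under the null hypothesis."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        s = 0.0
        for n, w in zip(paralog_counts, weights):
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            p_min = min(1.0 - rng.random() for _ in range(n))
            s += w * (-math.log(p_min)) / harmonic(n)
        if s >= score:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (draws + 1)
```

With a single experiment and no in-paralogs the score reduces to -ln(p), so the estimated p-value should reproduce the input p-value up to simulation error. Simulation is used here only because it stays correct when several weights coincide; the closed-form distribution is preferable for very small p-values.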
Simulations were performed on homology groups from Homologene for the species Saccharomyces cerevisiae (4932), Schizosaccharomyces pombe (4896), Arabidopsis thaliana (3702), Oryza sativa (4530), Drosophila melanogaster (7227), Danio rerio (7955), Mus musculus (10090) and Homo sapiens (9606) (NCBI Taxonomy IDs are given in parentheses). Each gene was assumed to be measured in two different groups, one control and one treated, with three independent observations from each. Data was simulated from a Gaussian distribution with mean value 0 and variance 1, and p-values were calculated using a two-population t-test assuming equal variance. For differentially expressed orthologous groups (10%, randomly selected) an effect ranging from 0 to 10 was added to the treated group (i.e. changing the expected value from 0 to the effect). For groups and species with in-paralogous genes the effect was added to one single in-paralog (randomly selected). The weights w_ij in S_i were set to be uniform. For the combined method, all observations from in-paralogs were treated as independent replicated observations for one single gene (homology group). For the average method, an average was taken over the in-paralogs for each sample, generating one single observation per sample. For the random method, one of the in-paralogs was randomly selected and the others were discarded. For these three methods the cross-species p-value was calculated by Fisher’s combined probability test. The false discovery rate for homology group i was estimated by calculating the proportion of false positives among the i most significant groups.
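The false discovery rate estimate used in the simulations can be sketched as follows. The function name and the input layout (one p-value and one truth label per homology group) are assumptions made for illustration.

```python
def empirical_fdr(pvalues, is_differentially_expressed):
    """Proportion of false positives among the i most significant groups.

    Entry i-1 of the returned list is the estimate for the i most
    significant groups; ties are broken by input order.
    """
    order = sorted(range(len(pvalues)), key=lambda idx: pvalues[idx])
    fdr = []
    false_hits = 0
    for rank, idx in enumerate(order, start=1):
        if not is_differentially_expressed[idx]:
            false_hits += 1  # a null group ranked among the top `rank`
        fdr.append(false_hits / rank)
    return fdr
```

Because the truth labels are known only in a simulation, this estimate serves to compare methods and cannot be computed for real experiments.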
Meta-analysis of gene expression
Pre-processing and analysis of microarray data
Intensity data from Affymetrix type of microarrays was pre-processed using RMA while intensity data from two-channel microarrays was normalized using global loess. The quality of each microarray was assessed by inspecting scatter and MA plots of probe-wise intensity before and after normalization. For all included experiments, differentially expressed genes were identified using the moderated t-statistic implemented in the LIMMA R-package. Cross-species analysis was performed using the proposed method, where up- and down-regulated genes were tested separately using one-sided tests. The most significant p-value was then selected. The cross-species p-values were finally corrected for multiple testing using the Benjamini-Hochberg false discovery rate.
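The final multiple-testing step can be sketched as follows. This is a plain implementation of the standard Benjamini-Hochberg step-up adjustment; the function name is hypothetical and the code is not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the least to the most significant p-value so that the
    # adjusted values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A homology group is then reported as significant when its adjusted value falls below the chosen false discovery rate threshold.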
Gene expression data from eight experiments investigating the effects of heat stress in eight species were fetched from Gene Expression Omnibus and ArrayExpress (Table 1). Homologene release 65 was used to describe the evolutionary relationship between the genes from the different species. The arbitrary component of the weights was set to be uniform over the eight experiments. The homology groups were populated with Gene Ontology terms based on species-specific annotations retrieved from the GO Consortium FTP (ftp://ftp.geneontology.org/pub/go/gene-associations/). Only terms with an experimental evidence code (i.e. EXP, IDA, IPI, IMP, IGI and IEP) were considered. Functional enrichment was inferred using the topGO R package.
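The evidence-code filter can be sketched as follows. The sketch assumes that the annotations are read from tab-separated files in the GO Annotation File (GAF) format, with the gene identifier in column 2, the GO term in column 5 and the evidence code in column 7; the function name is hypothetical and the code is not taken from the study.

```python
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_annotations(gaf_lines):
    """Yield (gene_id, go_id) pairs backed by experimental evidence.

    Lines starting with '!' are treated as comments and skipped, as are
    lines with fewer than seven tab-separated fields.
    """
    for line in gaf_lines:
        if line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue
        if fields[6] in EXPERIMENTAL_CODES:
            yield fields[1], fields[4]
```

The resulting pairs can then be mapped from genes to homology groups before the enrichment analysis.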
The five gene expression experiments included in the analysis are summarized in Table 2. Gene expression data was retrieved from the Gene Expression Omnibus, ArrayExpress or through direct contact with the authors. Homology groups were inferred from the corresponding EST and transcript sequences using OrthoMCL with an inflation index of 1.5 (all other parameters had default values). To avoid bias from the multiple experiments performed in Oncorhynchus mykiss, the arbitrary weight component was set to 0.25, 0.25, 0.25, 0.125 and 0.125 (following the order in Table 2).
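These weights are consistent with giving each of the four fish species the same total weight and splitting it evenly over the experiments performed in that species. The sketch below implements that rule, which is inferred from the numbers and is not stated as the rule used in the study. It also assumes that the two experiments with weight 0.125 are the ones performed in Oncorhynchus mykiss; the function name is hypothetical.

```python
from collections import Counter


def species_balanced_weights(species_per_experiment):
    """Give every species the same total weight, split over its experiments."""
    counts = Counter(species_per_experiment)
    n_species = len(counts)
    return [1.0 / (n_species * counts[s]) for s in species_per_experiment]
```

Called with three single-experiment species followed by two experiments in the same species, the function returns 0.25, 0.25, 0.25, 0.125 and 0.125, which sum to 1 as required of the experiment-specific weights.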
This research was supported by the Life Science Area of Advance at Chalmers University of Technology, Sweden, the Swedish Research Council (VR), the Foundation for Strategic Environmental Research (MISTRA) and the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS) and Swedish Society for Medical Research (SSMF). We also acknowledge Timothy D Williams for providing gene expression data. Support from the Gothenburg Bioinformatics Network (GOTBIN) is also gratefully acknowledged.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A: NCBI GEO: archive for functional genomics data sets - 10 years on. Nucleic Acids Res 2011, 39: D1005-D1010. 10.1093/nar/gkq1184PubMed CentralView ArticlePubMedGoogle Scholar
- Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Holloway E, Kurbatova N, Lukk M, Malone J, Mani R, Pilicheva E, Rustici G, Sharma A, Williams E, Adamusiak T, Brandizi M, Sklyar N, Brazma A: ArrayExpress update-an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 2011, 39: D1002-D1004. 10.1093/nar/gkq1040PubMed CentralView ArticlePubMedGoogle Scholar
- Raser JM, O’Shea EK: Noise in gene expression: origins, consequences, and control. Science 2005, 309: 2010-2013. 10.1126/science.1105891PubMed CentralView ArticlePubMedGoogle Scholar
- Taniguchi Y, Choi PJ, Li GW, Chen H, M Babu JH, Emili A, Xie XS: Quantifying E coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 2011, 329: 533-538.View ArticleGoogle Scholar
- Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006, 7: 55-56. 10.1038/nrg1749View ArticlePubMedGoogle Scholar
- Kristiansson E, Sjögren A, Rudemo M, Nerman O: Weighted analysis of paired microarray experiments. Stat Appl Genet Mol Biol 2005, 4: Article 30.Google Scholar
- Consortium M: The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24: 1151-1161. 10.1038/nbt1239View ArticleGoogle Scholar
- Kuo WP, Liu F, Trimarchi J, Punzo C, Lombardi M, Sarang J, Whipple ME, Maysuria M, Serikawa K, Lee SY, McCrann D, Kang J, Shearstone JR, Burke J, Park DJ, Wang X, Rector TL, Ricciardi-Castagnoli P, Perrin S, Choi S, Bumgarner R, Kim JH, III GFS, Freeman MW, Seed B, Jensen R, Church GM, Hovig E, Cepko CL, Park P, Ohno-Machado L, Jenssen TK: A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat Biotechnol 2006, 24: 832-840. 10.1038/nbt1217View ArticlePubMedGoogle Scholar
- Ala U, Piro RM, Grassi E, Damasco C, Silengo L, Oti M, Provero P, Di Cunto F: Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput Biol 2009, 4: e1000043.View ArticleGoogle Scholar
- Segal E, Friedman N, Kaminski N, Regev A, Koller D: From signatures to models: understanding cancer using microarrays. Nat Genet 2005, 37: S38-S45. 10.1038/ng1561
- Sweet-Cordero A, Mukherjee S, You ASH, Roix JJ, Ladd-Acosta C, Mesirov J, Golub TR, Jacks T: An oncogenic KRAS2 expression signature identified by cross-species gene-expression analysis. Nat Genet 2005, 37: 48-55.
- Miller JA, Horvath S, Geschwind DH: Divergence of human and mouse brain transcriptome highlights Alzheimer disease pathways. Proc Natl Acad Sci 2010, 107: 220-229.
- Rasche A, Al-Hasani H, Herwig R: Meta-analysis approach identifies candidate genes and associated molecular networks for type-2 Diabetes mellitus. BMC Genomics 2008, 9: 310. 10.1186/1471-2164-9-310
- Marques FZ, Campain AE, Yang YHJ, Morris BJ: Meta-analysis of genome-wide gene expression differences in onset and maintenance phases of genetic hypertension. Hypertension 2010, 56: 319-324. 10.1161/HYPERTENSIONAHA.110.155366
- Ginis I, Luo Y, Miura T, Thies S, Brandenberger R, Gerecht-Nir S, Amit M, Hoke A, Carpenter MK, Itskovitz-Eldor J, Rao MS: Differences between human and mouse embryonic stem cells. Dev Biol 2004, 269: 360-380. 10.1016/j.ydbio.2003.12.034
- Pan F, Chiu CH, Pulapura S, Mehan MR, Nunez-Iglesias J, Zhang K, Kamath K, Waterman MS, Finch CE, Zhou XJ: Gene Aging Nexus: a web database and data mining platform for microarray data on aging. Nucleic Acids Res 2007, 35: D756-D759. 10.1093/nar/gkl798
- de Magalhaes JP, Curado J, Church GM: Meta-analysis of age-related gene expression profiles identifies common signatures of aging. Bioinformatics 2009, 25: 875-881. 10.1093/bioinformatics/btp073
- Gunnarsson L, Kristiansson E, Rutgersson C, Sturve J, Fick J, Förlin L, Larsson DGJ: Pharmaceutical industry effluent diluted 1:500 affects global gene expression, cytochrome P450 1A activity, and plasma phosphate in fish. Environ Toxicol Chem 2009, 28: 2639-2647.
- Gunnarsson L, Kristiansson E, Förlin L, Nerman O, Larsson DGJ: Sensitive and robust gene expression changes in fish exposed to estrogen - a microarray approach. BMC Genomics 2007, 8: 149. 10.1186/1471-2164-8-149
- Ung CY, Lam SH, Hiaing MM, Winata CL, Korzh S, Mathavan S, Gong Z: Mercury-induced hepatotoxicity in zebrafish: in vivo mechanistic insights from transcriptome analysis, phenotype anchoring and targeted gene expression validation. BMC Genomics 2010, 11: 212. 10.1186/1471-2164-11-212
- Kristensen DM, Wolf YI, Mushegian AR, Koonin EV: Computational methods for gene orthology inference. Brief Bioinform 2011, 12: 379-391. 10.1093/bib/bbr030
- Ohno S: Evolution by Gene Duplication. New York: Springer; 1970.
- Gu Z, Rifkin SA, White KP, Li WH: Duplicate genes increase gene expression diversity within and between species. Nat Genet 2004, 36: 577-579. 10.1038/ng1355
- Huminiecki L, Wolfe KH: Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res 2004, 14: 1870-1879. 10.1101/gr.2705204
- Lynch M, Katju V: The altered evolutionary trajectories of gene duplicates. Trends Genet 2004, 20: 544-549. 10.1016/j.tig.2004.09.001
- Studer RA, Robinson-Rechavi M: How confident can we be that orthologs are similar, but paralogs differ? Trends Genet 2009, 25: 210-216. 10.1016/j.tig.2009.03.004
- Chen X, Zhang J: The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data. PLoS Comput Biol 2012, 8: e1002784. 10.1371/journal.pcbi.1002784
- Fisher RA: Answer to question 14 on combining independent tests of significance. Amer Statistician 1948, 2: 30.
- Hu P, Greenwood CMT, Beyene J: Statistical methods for meta-analysis of microarray data: a comparative study. Inf Syst Front 2006, 8: 9-20. 10.1007/s10796-005-6099-z
- Campain A, Yang YH: Comparison study of microarray meta-analysis methods. BMC Bioinformatics 2010, 11: 408.
- Tseng GC, Ghosh D, Feingold E: Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res 2012, 40: 3785-3799. 10.1093/nar/gkr1265
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science 2003, 302: 249-255.
- Le HS, Oltvai ZN, Bar-Joseph Z: Cross-species queries of large gene expression databases. Bioinformatics 2010, 26: 2416-2423. 10.1093/bioinformatics/btq451
- Cahan P, Ahmad AM, Burke H, Fu S, Lai Y, Florea L, Dharker N, Kobrinski T, Kale P, McCaffrey TA: List of lists-annotated (LOLA): a database for annotation and comparison of published microarray gene lists. Gene 2005, 24: 78-82.
- Newman JC, Weiner AM: L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 2005, 6: R81. 10.1186/gb-2005-6-9-r81
- Lu Y, Rosenfeld R, Bar-Joseph Z: Identifying cycling genes by combining sequence homology and expression data. Bioinformatics 2006, 22: e314-e322. 10.1093/bioinformatics/btl229
- Lu Y, Mahony S, Benos PV, Rosenfeld R, Simon I, Breeden LL, Bar-Joseph Z: Combined analysis reveals a core set of cycling genes. Genome Biol 2007, 8: R146. 10.1186/gb-2007-8-7-r146
- Lu Y, Rosenfeld R, Nau GJ, Bar-Joseph Z: Cross species expression analysis of innate immune response. J Comput Biol 2010, 17: 253-268. 10.1089/cmb.2009.0147
- Ramasamy A, Mondry A, Holmes CC, Altman DG: Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 2008, 5: e184. 10.1371/journal.pmed.0050184
- Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, Feolo M, Fingerman IM, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Lu Z, Madden TL, Madej T, Maglott DR, Marchler-Bauer A, Miller V, Mizrachi I, Ostell J, Panchenko A, Phan L, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Slotta D, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Wang Y, Wilbur WJ, Yaschenko E, Ye J: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2011, 39: D38-D51. 10.1093/nar/gkq1172
- Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 2006, 34: D363-D368. 10.1093/nar/gkj123
- Berglund AC, Sjölund E, Östlund G, Sonnhammer ELL: InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res 2008, 36: D263-D266.
- Li L, Stoeckert CJ Jr, Roos DS: OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 2003, 13: 2178-2189. 10.1101/gr.1224503
- Grützmann R, Boriss H, Ammerpohl O, Lüttges J, Kalthoff H, Schackert HK, Klöppel G, Saeger HD, Pilarsky C: Meta-analysis of microarray data on pancreatic cancer defines a set of commonly dysregulated genes. Oncogene 2005, 24: 5079-5088. 10.1038/sj.onc.1208696
- Richter K, Haslbeck M, Buchner J: The heat shock response: life on the verge of death. Mol Cell 2010, 40: 253-266. 10.1016/j.molcel.2010.10.006
- Feder ME, Hoffman GE: Heat-shock proteins, molecular chaperones, and the stress response: evolutionary and ecological physiology. Annu Rev Physiol 1999, 61: 243-282. 10.1146/annurev.physiol.61.1.243
- Laramie JM, Chung TP, Brownstein B, Stormo GD, Cobb JP: Transcriptional profiles of human epithelial cells in response to heat: computational evidence for novel heat shock proteins. Shock 2008, 29: 623-630.
- Vallant B, Anderssson SP, Brown-Borg HM, Ren H, Kersten S, Jonnalagadda S, Srinivasan R, Corton J: Analysis of the heat shock response in mouse liver reveals transcriptional dependence on the nuclear receptor peroxisome proliferator-activated receptor α (PPARα). BMC Bioinformatics 2010, 11: 16. 10.1186/1471-2105-11-16
- Sorensen JG, Nielsen MM, Kruhoffer M, Justesen J, Loeschcke V: Full genome gene expression analysis of the heat stress response in Drosophila melanogaster. Cell Stress Chaperones 2005, 10: 312-328. 10.1379/CSC-128R1.1
- Hu W, Hu G, Han B: Genome-wide survey and expression profiling of heat shock proteins and heat shock factors revealed overlapped and stress specific response under abiotic stresses in rice. Plant Sci 2009, 176: 583-590. 10.1016/j.plantsci.2009.01.016
- Kilian J, Whitehead D, Horak J, Wanke D, Weinl S, Batistic O, D’Angelo C, Bornberg-Bauer E, Kudla J, Harter K: The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. Plant J 2007, 50: 347-363. 10.1111/j.1365-313X.2007.03052.x
- Chen D, Toone MW, Mata J, Lyne R, Burns G, Kivinen K, Brazma A, Jones N, Bahler J: Global transcriptional responses of fission yeast to environmental stress. Mol Biol Cell 2003, 14: 214-229. 10.1091/mbc.E02-08-0499
- Berry DB, Gasch AP: Stress-activated genomic expression changes serve a preparative role for impending stress in yeast. Mol Biol Cell 2008, 19: 4580-4587. 10.1091/mbc.E07-07-0680
- Purdom CE, Hardiman PA, Bye VJ, Eno NC, Tyler CR, Sumpter JP: Estrogenic effects of effluents from sewage treatment works. Chem Ecol 1994, 8: 275-285. 10.1080/02757549408038554
- Larsson DGJ, Adolfsson-Erici M, Parkkonen J, Pettersson M, Berg AH, Olsson PE, Förlin L: Ethinyloestradiol - an undesired fish contraceptive? Aquat Toxicol 1999, 45: 91-97. 10.1016/S0166-445X(98)00112-X
- Routledge EJ, Sheahan D, Desbrow C, Brighty GC, Waldock M, Sumpter JP: Identification of estrogenic chemicals in STW effluent. 2. In vivo responses in trout and roach. Environ Sci Technol 1998, 32: 1559-1565. 10.1021/es970796a
- Jobling S, Coey S, Whitmore JG, Kime DE, van Look KJ, McAllister BG, Beresford N, Henshaw AC, Brighty G, Tyler CR, Sumpter JP: Wild intersex roach (Rutilus rutilus) have reduced fertility. Biol Reprod 2002, 67: 515-524. 10.1095/biolreprod67.2.515
- Sumpter JP, Jobling S: Vitellogenesis as a biomarker for contamination of the aquatic environment. Environ Health Perspect 1995, 103: 173-178.
- Thomas-Jones E, Thorpe K, Harrison N, Thomas G, Morris C, Hutchinson T, Woodhead S, Tyler C: Dynamics of estrogen biomarker responses in rainbow trout exposed to 17β-estradiol and 17α-ethinylestradiol. Environ Toxicol Chem 2003, 22: 3001-3008. 10.1897/03-31
- Carnevali O, Maradonna F: Exposure to xenobiotic compounds: looking for new biomarkers. Comp Endocrinol 2003, 131: 203-208. 10.1016/S0016-6480(03)00105-9
- de Wit M, Keil D, van der Ven K, Vandamme S, Witters E, Coen WD: An integrated transcriptomic and proteomic approach characterizing estrogenic and metabolic effects of 17α-ethinylestradiol in zebrafish (Danio rerio). Gen Comp Endocrinol 2010, 167: 190-201. 10.1016/j.ygcen.2010.03.003
- Arukwe A, Goksøyr A: Eggshell and egg yolk proteins in fish: hepatic proteins for the next generation: oogenetic, population, and evolutionary implications of endocrine disruption. Comp Hepatol 2003, 2: 4. 10.1186/1476-5926-2-4
- Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, Mattingly CJ: The comparative toxicogenomics database: update 2011. Nucleic Acids Res 2011, 39: D1067-D1072. 10.1093/nar/gkq813
- Williams TD, Diab AM, George SG, Sabine V, Chipman JK: Gene expression responses of European flounder (Platichthys flesus) to 17-β estradiol. Toxicol Lett 2007, 168: 236-248. 10.1016/j.toxlet.2006.10.020
- Geoghegan F, Katsiadaki I, Williams TD, Chipman JK: A cDNA microarray for the three-spined stickleback, Gasterosteus aculeatus L., and analysis of the interactive effects of oestradiol and dibenzanthracene exposures. J Fish Biol 2008, 72: 2133-2153. 10.1111/j.1095-8649.2008.01859.x
- Martyniuk CJ, Gerrie ER, Popesku JT, Ekker M, Trudeau VL: Microarray analysis in the zebrafish (Danio rerio) liver and telencephalon after exposure to low concentration of 17α-ethinylestradiol. Aquat Toxicol 2007, 84: 38-49. 10.1016/j.aquatox.2007.05.012
- Tilton SC, Givan SA, Pereira CB, Bailey GS, Williams DE: Toxicogenomic profiling of the hepatic tumor promoters indole-3-carbinol, 17β-estradiol and β-naphthoflavone in rainbow trout. Toxicol Sci 2006, 90: 61-72.
- Sárvári M, Hrabovszky E, Kalló T, Galamb O, Solymosi N, Likó T, Molnár B, Tihanyi K, Szombathelyi Z, Liposits Z: Gene expression profiling identifies key estradiol targets in the frontal cortex of the rat. Endocrinology 2010, 151: 1161-1176. 10.1210/en.2009-0911
- Kwekel JC, Burgoon LD, Burt JW, Harkema JR, Zacharewski TR: A cross-species analysis of the rodent uterotrophic program: elucidation of conserved responses and targets of estrogen signaling. Physiol Genomics 2005, 23: 327-342. 10.1152/physiolgenomics.00175.2005
- Henríquez-Hernández LA, Flores-Morales A, Santana-Farré R, Axelson M, Nilsson P, Norstedt G, Fernández-Pérez L: Role of pituitary hormones on 17α-ethinylestradiol-induced cholestasis in rat. J Pharmacol Exp Ther 2007, 320: 695-705.
- Xu R, Li X: A comparison of parametric versus permutation methods with applications to general and temporal microarray gene expression data. Bioinformatics 2003, 19: 1284-1289. 10.1093/bioinformatics/btg155
- Chen F, Mackey AJ, Vermunt JK, Roos DS: Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One 2007, 2: e383. 10.1371/journal.pone.0000383
- Kristiansson E, Sjögren A, Rudemo M, Nerman O: Quality optimised analysis of general paired microarray experiments. Stat Appl Genet Mol Biol 2006, 5: Article 10.
- Klebanov L, Jordan C, Yakovlev A: A new type of stochastic dependence revealed in gene expression data. Stat Appl Genet Mol Biol 2006, 5: Article 7.
- Sjögren A, Kristiansson E, Rudemo M, Nerman O: Weighted analysis of general microarray experiments. BMC Bioinformatics 2007, 8: 387. 10.1186/1471-2105-8-387
- Forbes EV, Calow P: Extrapolation in ecological risk assessment: balancing pragmatism and precaution in chemical controls legislation. Bioscience 2002, 52: 249-257. 10.1641/0006-3568(2002)052[0249:EIERAB]2.0.CO;2
- Isaac NJB, Turvey ST, Collen B, Waterman C, Baillie JEM: Mammals on the EDGE: conservation priorities based on threat and phylogeny. PLoS One 2007, 2: e296. 10.1371/journal.pone.0000296
- Good IJ: On the weighted combination of significance tests. J Roy Statist Soc Ser B (Methodological) 1955, 17: 264-265.
- Bhoj DS, Schiefermayr K: Approximations to the distribution of weighted combination of independent probabilities. J Statist Comput and Simul 2008, 68: 153-159.
- Bolstad BM, Irizarry RA, Åstrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185
- Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 2002, 30: e15. 10.1093/nar/30.4.e15
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004, 3: Article 3.
- Alexa A, Rahnenführer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 2006, 22: 1600-1607. 10.1093/bioinformatics/btl140
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.