Human gene expression sensitivity according to large scale meta-analysis
- Pei Hao†1, 2, 3,
- Siyuan Zheng†1, 2,
- Jie Ping2, 4,
- Kang Tu1, 2,
- Christian Gieger5,
- Rui Wang-Sattler5,
- Yang Zhong3Email author and
- Yixue Li1, 2, 4Email author
© Hao et al; licensee BioMed Central Ltd. 2009
Published: 30 January 2009
Genes show different sensitivities in expression corresponding to various biological conditions. Systematical study of this concept is required because of its important implications in microarray analysis etc. J.H. Ohn et al. first studied this gene property with yeast transcriptional profiling data.
Here we propose a calculation framework for gene expression sensitivity analysis. We also compared the functions, centralities and transcriptional regulations of the sensitive and robust genes. We found that the robust genes tended to be involved in essential cellular processes. Oppositely, the sensitive genes perform their functions diversely. Moreover while genes from both groups show similar geometric centrality by coupling them onto integrated protein networks, the robust genes have higher vertex degree and betweenness than that of the sensitive genes. An interesting fact was also found that, not alike the sensitive genes, the robust genes shared less transcription factors as their regulators.
Our study reveals different propensities of gene expression to external perturbations, demonstrates different roles of sensitive genes and robust genes in the cell and proposes the necessity of combining the gene expression sensitivity in the microarray analysis.
Genes show divergent expression patterns under various biological conditions, therefore a common task for biologists and biostatisticians is to find the differentially expressed genes between different conditions, such as treatment versus control, or normal versus abnormal, so as to identify the condition specific gene markers [1–3]. With the high throughput microarray technology, expression levels of several thousands of genes can be detected simultaneously and compared in parallel between numerous biological samples [4, 5], thus facilitating the study of gene expression-environment interactions.
Although external environment has important influences on the gene expression profiles, genes show different susceptivity. An intuitive example is the housekeeping genes which are required for the maintenance of the basal cellular functions  and believed to constitutively express in most of the tissues, though different expression levels can be observed (data not shown). This hypothesis was once used to identify HK genes, and promoted the understanding of HK genes [7–9].
Recently J.H. Ohn et al  constructed a non-directed bipartite perturbation network to study the yeast gene expression sensitivity to external perturbations. Through an 'excess retention' approach , they show significant differences between perturbation sensitive genes and perturbation resistant genes in protein interaction network, regulatory network and functional categories. As an exploratory work their study was based on the transcriptional profiling of gene deletion experiments of yeast and got very significant results. It is worthy of generalizing such kind of idea to human genes based on general biological condition variations to obtain a global view of the intrinsic properties of human gene expression as a response to perturbations. For this purpose we selected human gene expression data resulted from divergent experiments stored at the GEO database  and developed a meta-analysis method to study gene expression sensitivity globally. Regarding to our calculations it was found that the human genes show different expression sensitivities and can be categorized into sensitive or robust groups, according to the properties of how they response to the perturbations. Furthermore, in order to know the detail properties about related functions and interaction properties of both gene groups we assigned them onto protein-protein interaction networks and gene transcriptional regulatory networks. It was discovered that the robust genes tend to be involved in essential cellular processes. In contrast, the sensitive genes perform their functions diversely. We also found even if genes from both groups show similar geometric centrality by coupling them onto integrated protein networks, the robust genes have higher vertex degree and betweenness than that of the sensitive genes. Finally, an interesting fact has been found, not alike the sensitive genes, the robust genes share less transcription factors as their regulators. These facts discovered here maybe are useful for deciphering functions and related regulatory mechanisms of genes.
Data collection and preprocessing
All the GDS data sets of Affymetrix HGU133a platform in the GEO database  were downloaded to incorporate as many as biological samples. The reason why we chose the Affymetrix HGU133a platform is that, it is one of the most widely used platforms, i.e. there are far more data sets of HGU133a (236 data sets) than that of HGU133plus2.0 (69 data sets) in GEO. Data sets with less than 10 arrays were discarded. For each sample, the expression values that were below 10 were truncated to 10, and then were logarithmic transformed (base 2). The expression values of all probes for a given gene were reduced to a single value by taking the maximum expression value in each sample.
Calculate the matrix of standard deviations
For every data set, calculate the standard deviation (sd) for each gene g. Because the data sets are heterogeneous, expression standard deviations from different data sets for gene g can not be compared directly, therefore the sd of every data set were rank ordered, generating a rank sd matrix.
The final SS is the maximum deviation from zero. SS ranges from -1 to 1, and more closer to 1, more expression robust and vice versa. To evaluate the significance of an observed SS, a null distribution of SS is generated by randomly permutating the L. By 1000 random permutations of L, SSnull was computed for each SS and the nominal P value was assigned as the negative or positive portion of the SSnull corresponding to the observed sign of SS.
Denote the SS from random permutations as SSπ, the observed SS distribution as SSα. For a specific SS > 0, calculate the percentage of SSπ > SS which SSπ > 0; calculate the percentage of SSα > SS which SSα > 0, the FDR (False Discovery Rate) for SS is computed as ratio of the two percentages when SS > 0, similarly if SS < 0.
Though the Affymetrix HGU133a microarray does not represent all the human genes, by calculating the Sensitivity Score (SS) we can identify gene classes which are assumed to be rich in expression sensitive and robust genes. We investigated the genomic characteristics of the respective groups, including functional enrichment, centralities in the protein interaction network and regulations in the transcriptional regulatory network.
Assignment of expression robust and sensitive genes
Housekeeping genes (HK genes) have constitutive expressions , therefore comparative small expression variations of HK genes are expected under the numerous biological conditions. Eisenberg et al. identified 575 human HK genes  with a transcriptional profiling data set . We compared the SS of aforementioned HK genes with the overall SS and found that the HK genes have significant higher SS (Wilcox rank sum test, p < 2.2e-16) (Figure 1B).
Based on the above observations that the SS is a reasonable measurement for gene expression sensitivity, we selected two groups of genes as representative expression robust (661 genes) and sensitive genes (441 genes) based on the SS score cutoff 0.55, -0.5 respectively (see supplementary text for the discussion of statistical significance of SS). The functional analysis results are not sensitive to the exact SS cutoff. To evaluate the robustness of this categorization to different microarray platforms, we conducted a similar analysis on the HGU133plus2.0 microarray data sets following the same pipeline and test if the robust/sensitive genes remain robust/sensitive. Although the HGU133plus2.0 microarray represents much more genes, the result shows that the robust/sensitive genes identified from HGU133a microarrays are still robust/sensitive on the HGU133plus2.0 microarray (Additional File 1).
Functional annotations of expression robust and sensitive genes
Enriched biological processes. The table shows some of the enriched biological processes of the sensitive and robust genes respectively.
Enriched Biological Process
Enriched Biological Process
protein metabolic process
RNA metabolic process
Comparisons of robust and sensitive genes in protein interaction network and transcriptional regulatory network
Topological characteristics of protein interaction network are associated with many gene properties, e.g. gene essentiality, gene duplicability . Here we focus on the centrality of robust and sensitive genes in the network. To make reliable inferences from the comparison result, we used a high-quality protein interaction data [19, 20]. We also confirmed the result with the HPRD  interaction data (Additional file 1).
Jeong et al. have shown that protein lethality is correlated with its degree in the protein interaction network . This correlation implicates the bigger importance of robust genes to the cell system, consistent with the higher betweenness which is originally designed to measure the influence of a node over the spread of information through the network . Closeness measures gene's geometric centrality in the network. The comparison of closeness indicates that no group is organizationally more central than others. It is noteworthy that similar results were obtained when comparing the geometric centrality of essential genes and non-essential genes in yeast with a measure called 'excentricity' . This phenomenon is believed to be due to the function compensations .
Transcriptional regulatory network differs with the protein interaction network that they reflect different layers of cellular activities. Transcription factors, which bind to the gene upstream promoter regions, have significant influences on the gene expressions. Therefore, a natural question is, do the robust genes and the sensitive genes have different extent of regulation by transcription factors? To answer this question, we compared the upstream binding transcription factors of these two gene classes. We used the TRANSFAC database to build the regulatory network. Though it is far from complete, it is the most reliable and confident data source till now. For the 641 robust genes, there are totally 26 transcription factors recorded in the TRANSFAC database that can bind to promoter regions of them, while for the 441 sensitive genes, the number of regulatory transcription factors rises to 155. This result is consistent with the previous report  that the expression of sensitive genes is under more regulations.
Gene expression sensitivity measures gene's responses to the external environment on the transcriptome level. In this study, we proposed a large scale meta-analysis strategy to categorize expression robust and sensitive genes. Further we found these two gene classes show significant differences in various aspects, including functions based on Gene Ontology classification , centralities in protein networks and regulations by transcription factors.
The Gene Ontology analysis shows distinct functional differences between the robust and sensitive genes. The enriched biological processes of robust genes concentrate on the cellular essential processes, for instances, protein, mRNA metabolic process, translation, ubiquitin cycle etc, while for sensitive genes, the enriched biological processes concentrate on some cell "response" processes to the surrounding environment, like the immune responses, cell-cell signalling. Such functional preferences confirm the implications of these gene classes and reflect their different roles in the cell. Centrality analysis reveals that although they have similar geometric positions in the interactome, they show different local characterization (degree) and different weight for the spread of information (betweenness) in the protein network. Jeong et al. have reported the correlation between protein lethality and its degree in the protein interaction network , and another study shows the high-betweenness proteins are more likely to be essential . Together with the function analysis, we come to the conclusion that there are connections between gene expression sensitivity and the genes' impact on the system. More transcription factors were found to bind to sensitive genes. This result is analogous to the finding that yeast non-essential genes are regulated by more transcription factors compared with essential genes . It seems the essential process related genes tend to have simpler regulatory mode, which makes the cell more stable.
Though our study incorporated large volume of microarray data, there are several potential limitations. For example, more data was generated for the hot spots of the biological research, thus decreased the diversity of the experimental samples in our study. In addition, the affymetrix HGU133a microarray represents 13441 genes on the chip, however, the number of human genes is estimated to be between 20,000 and 25,000 . Another restriction to our observations is, the current map of protein interactions and gene regulations is far from complete.
A major challenge of microarray analysis is interpreting the biological relevance of changes in expression . However, the current approaches tend to select genes with the largest changes in expression. Our analysis suggests that genes have different propensities corresponding to perturbations and such propensities should be considered in the gene expression data analysis.
Understanding gene expression sensitivity has important implications for choosing biomarkers, drug targets etc from transcriptional profiling data. Though we explored the general characteristics of expression robust and sensitive genes, the underlying mechanisms of gene transcription sensitivity still represent further challenges.
We would like to thank Xiaojing Wang, Yao Yu and Yun Li for helpful discussions in the algorithm design and implementations. This work is supported by National High-Tech R&D Program of China (863) (grant 2007DFA31040, 2007AA02Z304) and Shanghai Committee of Science and Technology (grant 07ZR14085, 07DZ22004).
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1
- Bhowmick D, Davison AC, Goldstein DR, Ruffieux Y: A Laplace mixture model for identification of differential expression in microarray experiments. Biostatistics 2006, 7(4):630–641. 10.1093/biostatistics/kxj032View ArticlePubMedGoogle Scholar
- Shanahan CM, Weissberg PL, Metcalfe JC: Isolation of gene markers of differentiated and proliferating vascular smooth muscle cells. Circ Res 1993, 73(1):193–204.View ArticlePubMedGoogle Scholar
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98(9):5116–5121. 10.1073/pnas.091062498PubMed CentralView ArticlePubMedGoogle Scholar
- Bier FF, von Nickisch-Rosenegk M, Ehrentreich-Forster E, Reiss E, Henkel J, Strehlow R, Andresen D: DNA microarrays. Adv Biochem Eng Biotechnol 2008, 109: 433–453.PubMedGoogle Scholar
- Wilkes T, Laux H, Foy CA: Microarray data quality – review of current developments. Omics 2007, 11(1):1–13. 10.1089/omi.2006.0001View ArticlePubMedGoogle Scholar
- Butte AJ, Dzau VJ, Glueck SB: Further defining housekeeping, or "maintenance," genes Focus on "A compendium of gene expression in normal human tissues". Physiol Genomics 2001, 7(2):95–96.PubMedGoogle Scholar
- Tu Z, Wang L, Xu M, Zhou X, Chen T, Sun F: Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics 2006, 7: 31. 10.1186/1471-2164-7-31PubMed CentralView ArticlePubMedGoogle Scholar
- Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet 2003, 19(7):362–365. 10.1016/S0168-9525(03)00140-9View ArticlePubMedGoogle Scholar
- De Ferrari L, Aitken S: Mining housekeeping genes with a Naive Bayes classifier. BMC Genomics 2006, 7: 277. 10.1186/1471-2164-7-277PubMed CentralView ArticlePubMedGoogle Scholar
- Ohn JH, Kim J, Kim JH: Genomic characterization of perturbation sensitivity. Bioinformatics 2007, 23(13):i354–358. 10.1093/bioinformatics/btm172View ArticlePubMedGoogle Scholar
- Wuchty S, Almaas E: Peeling the yeast protein network. Proteomics 2005, 5(2):444–449. 10.1002/pmic.200400962View ArticlePubMedGoogle Scholar
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles – database and tools update. Nucleic Acids Res 2007, (35 Database):D760–765. 10.1093/nar/gkl887Google Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102(43):15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, et al.: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003, 34(3):267–273. 10.1038/ng1180View ArticlePubMedGoogle Scholar
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al.: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA 2002, 99(7):4465–4470. 10.1073/pnas.012025199PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Yu H, Greenbaum D, Xin Lu H, Zhu X, Gerstein M: Genomic analysis of essentiality within protein networks. Trends Genet 2004, 20(6):227–231. 10.1016/j.tig.2004.04.008View ArticlePubMedGoogle Scholar
- Liang H, Li WH: Gene essentiality, gene duplicability and protein connectivity in human and mouse. Trends Genet 2007, 23(8):375–378. 10.1016/j.tig.2007.04.005View ArticlePubMedGoogle Scholar
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al.: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062):1173–1178. 10.1038/nature04209View ArticlePubMedGoogle Scholar
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6):957–968. 10.1016/j.cell.2005.08.029View ArticlePubMedGoogle Scholar
- Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al.: Human protein reference database – 2006 update. Nucleic Acids Res 2006, (34 Database):D411–414. 10.1093/nar/gkj141Google Scholar
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411(6833):41–42. 10.1038/35075138View ArticlePubMedGoogle Scholar
- Newman MEJ: A measure of betweenness centrality based on random walks. 2003.Google Scholar
- Wuchty S, Stadler PF: Centers of complex networks. J Theor Biol 2003, 223(1):45–53. 10.1016/S0022-5193(03)00071-7View ArticlePubMedGoogle Scholar
- Joy MP, Brock A, Ingber DE, Huang S: High-betweenness proteins in the yeast protein interaction network. J Biomed Biotechnol 2005, 2005(2):96–103. 10.1155/JBB.2005.96PubMed CentralView ArticlePubMedGoogle Scholar
- Stein LD: Human genome: end of the beginning. Nature 2004, 431(7011):915–916. 10.1038/431915aView ArticlePubMedGoogle Scholar
- Lu X, Jain VV, Finn PW, Perkins DL: Hubs in biological interaction networks exhibit low changes in expression in experimental asthma. Mol Syst Biol 2007, 3: 98. 10.1038/msb4100138PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.