Association of tissue lineage and gene expression: conservatively and differentially expressed genes define common and special functions of tissues
© Hao et al. 2010
Published: 14 December 2010
Skip to main content
© Hao et al. 2010
Published: 14 December 2010
Embryogenesis is the process by which the embryo is formed, develops, and establishes developmental hierarchies of tissues. The recent advance in microarray technology made it possible to investigate the tissue specific patterns of gene expression and their relationship with tissue lineages. This study is focused on how tissue specific functions, tissue lineage, and cell differentiation are correlated, which is essential to understand embryonic development and organism complexity.
We performed individual gene and gene set based analysis on multiple tissue expression data, in association with the classic topology of mammalian fate maps of embryogenesis. For each sub-group of tissues on the fate map, conservatively, differentially and correlatively expressed genes or gene sets were identified. Tissue distance was found to correlate with gene expression divergence. Tissues of the ectoderm or mesoderm origins from the same segments on the fate map shared more similar expression pattern than those from different origins. Conservatively expressed genes or gene sets define common functions in a tissue group and are related to tissue specific diseases, which is supported by results from Gene Ontology and KEGG pathway analysis. Gene expression divergence is larger in certain human tissues than in the mouse homologous tissues.
The results from tissue lineage and gene expression analysis indicate that common function features of neighbor tissue groups were defined by the conservatively expressed genes and were related to tissue specific diseases, and differentially expressed genes contribute to the functional divergence of tissues. The difference of gene expression divergence in human and mouse homologous tissues reflected the organism complexity, i.e. distinct neural development levels and different body sizes.
Investigating the expression divergence of multiple tissues is important to understand the organism complexity . Currently, the advance in microarray technology has provided a huge amount of quantitative data of tissue housekeeping and tissue specific gene expression in many species . Some studies have focused on the relationship among tissue specificity, expression conservation, expression level, and sequence conservation [3, 4]. For example, the evolutionary rate of expression divergence and that of coding sequence divergence were found to be weakly, but significantly positively correlated. Others paid attention to housekeeping, tissue-specific gene and tissue-specific transcriptional regulation for revealing molecular fundamental of tissue function and their evolutionary characters [5–7]. Results from these studies showed that different tissues shared a large number of ubiquitously expressed genes and the tissue specific genes are more likely to reveal tissue-specific function. A gene set (Gene Ontology - GO) based analysis proposed a “tissue-driven” hypothesis which describes the relationship between the stabilizing constraints on tissue-specific gene expression and individual GO categories .
Embryogenesis is the process by which the embryo is formed, develops, and establishes developmental hierarchies of tissues. It describes the developmental history from the single-celled zygote to the multi-celled adult [9, 10]. The construction of Caenorhabditis elegans cell fate map traces the sequence of cell division, migration, and apoptosis of each of the 671 cells . Fate maps of mammals  were proved necessary to construct developmental hierarchies of tissues. For example, during the early stages of embryonic development, the brain starts to form in three distinct segments: the prosencephalon, mesencephalon, and rhombencephalon. The cerebellum later develops from the rhombencephalon, which is the caudal portion of the embryonic brain . The gonads which in males are the testes and in females are the ovaries develop from the intermediate mesoderm, which is found between the paraxial mesoderm and the lateral plate [7, 14]. It is also from the intermediate mesoderm the mammalian kidney develops.
To investigate the gene expression similarity and specificity of tissues in association with the fate map of embryogenesis is of great important to understand how tissue specific functions, tissue lineage, and cell differentiation are correlated in the context of embryonic development and organism complexity. To predict the potential disorders of a tissue from its expression profile might lead to early diagnosis of certain diseases and identification of congenital defects in clinics. In the current study, by combining gene expression profiling from multiple tissues with the mammalian fate maps of embryogenesis, we investigated tissue lineage and cell differentiation through mammalian development at the molecular level. We introduced individual gene and gene set based approaches  for evaluating tissue expression similarity and divergence. To provide a valuable resource for the in-depth understanding of tissue development and tissue specific functions, we created a gene and gene set expression map along the paths of mammalian embryogenesis. Herein, we present our results on the conservatively, differentially and correlatively expressed genes/gene sets for each sub-group of tissues on the fate map. By Gene Ontology and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis, we further define the functions of each gene group. In addition, by comparing the data from human and mouse tissues, we investigate the inter-species divergence of expression profile in tissues. Our results provide valuable insight into how developmental hierarchies and gene expression are coordinated, and help understand tissue specific functions in relation to cell differentiation features.
The human (HG-U133A and GNFIH) and mouse (GNF1M) Affymetrix microarray data (Affymetrix, Snata Clara, CA)  were retrieved from Gene Expression Omnibus (GEO) of National Center for Biotechnology Information (NCBI)  (Human: GDS594 and GDS596; Mouse: GDS 592). The data resource contains the expression data from 73 human tissues (GDS594/GDS596) and 69 mouse tissues (GDS592). Only common homology tissues from human and mouse were considered in the following analysis (24 orthologous adult tissue listed in Additional file 1). And for those biological replicates we assigned the average expression value as the signal. After mapping probes to the microarray chip platform annotation files, the final datasets include 14746 human genes and 13048 mouse genes. The homology gene pairs’ information in human and mouse was gained from NCBI (http://www.nicbi.nlm.gov/HomoloGene). After extracting the unique hit homology pairs, we identified 5055 orthologous gene pairs have expression data in both human and mouse. We used packages from BioConductor (http://www.bioconductor.org/) for functional annotation of those genes. Package ‘org.Hs.eg.db’ and ‘org.Mm.eg.db’ (version 2.2.11) were used for Gene Ontology (GO) mapping of human and mouse genes respectively. Package ‘KEGG.db’ (version 2.2.11) was used for pathway mapping of genes. Only these GO modules with over five genes on the chip will be used in the following analysis.
Let p i,t1 and p i,t2 indicate the p-rank values of gene i in tissues t1 and t2. We calculated the gene divergence E i for every gene between each two tissues on the fate map as:
E i = |p i,t 1 – p i,t 2|
In each pair of tissues, those genes with divergence in the lowest 5% of all gene divergence values were classified as the conservatively expressed genes, which have the least expression divergence between two tissues. Furthermore, those genes which were expressed as the top 5% in a group of tissues or in all the tissues were considered as ubiquitously conservatively expressed genes in corresponding tissue group.
Gene Ontology (GO)  provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data. Consider a set of n genes in one Gene Ontology module; three kinds of measurements were defined for characterizing the profiling of this GO among the tissues. If expression distance was significantly small for a given GO module in a pair of tissues, the expression level of this module was considered conservative in the tissue pair. If a significant p value was obtained in the Kolmogorov–Smirnov test (KS test)  for the expression of a given GO module in a pair of tissues, the expression distribution of this module was referred to as a divergent pattern in these tissues. If Pearson correlation coefficient was significant for a given GO module in a pair of tissues, their expression relationship was referred to as a linear correlated pattern .
while n is the number of gene in GO g.
µ for the mean of D values and σ for the standard deviation of D values.
For a sub-group of tissues in the fate map’s branch, the intro group expression similarity of GO g is estimated using an integrated p value . Since D ~ N(µ,σ 2), D sub (the mean value of D values in a sub-group of tissues) should follow D sub ~ N(µ,σ 2 / N), N for the sample size in this group. Thus the integrated p value could be calculated as:
P g,t–sub = Pr(X ≤ x) = ϕ µ,σ 2 / N (x), x = D g,t1–t2
The GO module with a small p value or integrated p value was considered as conserved GO module in a pair or a group of tissues.
To identify differentially expressed GO modules in a pair of tissues, a nonparametric KS test was performed . The p-rank values of the genes in certain GO module from a pair of tissues were used. A p value (denoted as p g,t1-t2 ) less than 0.05 denoted that the difference of expression status of this GO module in two tissues should be considered as statistically significant.
, n for the number of pair wise p values in this group
If these GO modules were not signature modules of a group of tissues, the p g,ti-tj would follow uniform distribution. If p g,ti-tj followed uniform distribution, Z g,ti-tj would follow norm distribution. As a result, Z score would also follow norm distribution. A significant small value of Z comparing to normal distribution corresponded to the significantly being perturbed of the GO modules under these tissues.
, while n is the number of gene in GO m.
A high r indicates a high similarity of the expression profiles of the GO module in two tissues, thus this GO module was considered as correlated expressed GO module in tissues. The threshold for r between two tissues was set to be r > 0.9, while the threshold for r in a group of tissues was set to be r mean > 0.9.
The Bonferroni correction was used to adjust the p-values for liver enrichment gene identification based on the hyper-geometric distribution . When performing n tests, with each of them being significant with probability β, the probability that at least one of them comes out significant should be less than n*β. The calculation was conducted in R platform with ‘stats’ package (http://www.r-project.org).
The analysis workflow was presented in Additional file 3. All the supplementary files were available at our website: http://omics.biosino.org:14000/kweb/tissue_expression/
Certain individual genes are extremely conservatively expressed in a group of tissues from one branch of the fate map. These genes might always be closely related to the specific function of those tissues. For example, in human, gene CALM2, which is well known as the important phosphorylase kinase in calcium signaling pathway , vital for transduction of nerve stimulus -a basic function for the nerve cells, was conservatively expressed through all tissues derived from neural tube (Additional file 4). Interestingly, in mouse, the kinase-coding gene Camk2g with similar function was conservatively expressed in these tissues, too. From gene set enrichment analysis, we found that conservatively expressed gene sets in the tissue group from the same ancient segments in embryogenesis define certain common functions in the tissue group. Such phenomenon was quite explicit when we examined the tissue group of high order on the fate map, such as the group of tissues developed from mesoderm, ectoderm and endoderm, etc. For example, the GO modules for fundamental biological processes, i.e. nucleic acid binding (GO:0003676), structural constituent of ribosome (GO:0003735), and structural constituent of ribosome (GO:0003691), were identified to be extremely conserved in all the tissues (Additional file 6 for human and Additional file 7 for mouse) in human. Another example was the relationship between tissue-conserved genes and diseases. Those genes conservatively expressed in the cerebellum and olfactory bulb were significantly enriched in Prion diseases related pathway (HSA05020, Additional file 6).
KS tests were performed to detect significantly differentially expressed GO modules, of which the distribution of the gene expression levels varies greatly in different tissues. Tissues from mesoderm and endoderm segments contain more differentially expressed GO modules found by pair wise comparisons of these tissues (Figure 2). The GO modules for synaptic transmission (GO:0007268, GO:0019226) and transmission of nerve impulse (GO:0045202) were identified as the top three significantly differentially expressed ones in brain, while in the sub-group of brain (prefrontal cortex, amygdale, pituitary, hypothalamus) the expression profiles of these modules remain stable (Additional file 9). The cerebellum has an important function in the integration of sensory perception, coordination and motor control. In order to coordinate motor control and response to stimulation, genes in these modules are actively expressed in this region to link the cerebellum with the cerebral motor cortex (which sends information to the muscles causing them to move) and the spinocerebellar tract (which gives proprioceptive feedback on the position of the body in space) . Besides, the significantly differentially expressed GO modules in ectoderm contain function modules for immune system (GO:0002376, GO:0006955), response to defense (GO:0006952), response to stimulus (GO:0050896), etc. Thus, the differentially expressed GO modules contribute to the function divergence of tissues.
Using Pearson’s correlation coefficient, correlated expressed GO modules were identified. The expression of genes in these modules simultaneously up-regulated or down-regulated (or remain same) among different tissues (Additional file 10). Neural related tissues contain more correlated expressed GO modules than the tissues developed from mesoderm and endoderm. There were 827 and 810 highly correlated (correlation coefficient > 0.9) GO modules in neural tube and neural crest, respectively. However, the correlated expressed modules in intermediate mesoderm and primitive gut were 236 and 228, respectively. These results suggested that the gene expression in non-neural tissues is more divergent than in neural related tissues. Most of the correlated expression genes of neural tissues were enriched into such pathways like signal peptide processing (GO:0006465), G-protein signaling pathway (GO:0007189, GO:0010578 and GO:0010579), BMP signaling pathway regulation (GO:0030510). Comparatively the expression of these GO modules in endoderm and mesoderm tissues showed no significant correlation.
Gene ontology based tissue expression divergence.
Human (4207 GO modules)
Mouse (2321 GO modules)
pluripotent stem cell
endoderm & mesoderm
primitive gut 1 & 2 & 3
thymus & prostate
pancreas & liver
trachea & lung
skin & skeletal muscle
uterus & kidney
testis & ovary
brain 1 & 2
cerebellum & olfactory bulb
prefrontalcortex & amygdala & pituitary & hypothalamus
In this study, we mapped human and mouse tissues onto the mammalian fate map of embryogenesis and further analyzed the expression conservation of tissues with individual genes and gene set (Gene Ontology) based methods. We tested the hypothesis the tissues from the same branch on the fate map shared more similarity in gene expression profile with each other than their remote neighbors. Our results showed this hypothesis was true in ectoderm and mesoderm, where tissues from the same segments on the fate map share more similar expression patterns than those from different origins of embryogenesis. However, this assumption is not true in endoderm tissues. The molecular basis for such phenomenon may be related to gene imprinting that tissues inherit from their ancestors throughout the development process. The imprinting might involve DNA methylation, histone modification, and other epigenetic mechanisms that lead to the conservation of gene expression in neighbor tissues and divergence in distant segments. In addition, the expression divergence of genes or gene sets varied greatly in different sub-group of neighbor tissues on the fate map. Neural tissues developed from ectoderm were the most conserved sub-group. The expression pattern of genes in these tissues is more conservative to each other: more conserved individual genes, more conserved GO modules, less differentially expressed GO modules and more significantly correlated GO modules, compared with tissues developed from mesendoderm . Notably, tissues developed from endoderm appear to have a large number of differentially expressed genes, which might be related to the large functional variation of these tissues and the long developing time in embryo.
In most tissues, tissue-specific functions are reflected in the stably expressed genes and GO categories. For example, genes from ubiquitin mediated proteolysis and ribosome pathway show universal conservation in all tissues, and their expression levels in all sub-groups of tissues remain stable. Different tissue groups of the same origin have identical expression conserved genes. CALM of human genome or Camk2g of mouse genome are conservatively expressed in brain and other tissues developed from neural tube, whereas neither these genes nor the calcium signaling pathway is detected in the conservation and enrichment analysis of mesendoderm tissues. The construction of tissue specificity is more likely to be governed by those gene sets with specific molecular functions. Prion disease pathway was identified as conservative gene enriched pathway in the cerebellum and olfactory bulb. The Prnp (prion protein) was one of the conservative genes in this pathway, which also plays the role as the pathological agent in transmissible spongiform encephalopathies such as bovine spongiform encephalopathy in cattle, scrapie in sheep, and Creutzfeldt–Jakob disease in humans . This gene is considered a potential biomarker for early diagnosis of prion disease, as the Prnp-mutant over-expression changes the structure of its globular domain and induces a lethal spongiform encephalopathy .
The timeline difference of tissue development and maturity might account for the phenomenon that there is larger divergence of gene expression in tissues from ectoderm than in tissues from mesoderm and endoderm. By week 20 of gestation, fetal kidneys, urinary tract, and digestive system start functioning by producing urine and meconium. The neural system, however, reaches maturity much later. In Rhombic lip, a primary region that produces the neurons to make up the cerebellum, neurons migrate to the external granular layer until embryonic week 27. The later differentiation of neural tissues might lead to less conservative gene expression pattern at molecular level. Another important factor that might affect gene expression is microRNAs, which have been shown to regulate target gene expression in tissues at different developmental stages . By cross-species comparison, Cui et al. found that the expression variation of miRNA targets is significantly lower than that of other genes, which implies miRNAs function as constrain to reduce target gene expression variation. It remains interesting to see whether miRNAs play similar roles between mammalian species, i.e. mouse and human.
By comparison of gene and gene set expressions between human and mouse homologous tissues, we observed some major differences in certain tissues’ gene expression patterns, e.g. smaller gene expression divergence of neural tissue group in mouse (denoted as more conservative GO modules in mouse ectoderm tissues), and more conservative expression of paraxial mesoderm in mouse. Although the trend of “tissue-driven” (lower divergence of gene expression in more stringent tissues and higher divergence in more relaxed tissues) in gene expression remains unchanged between human and mouse, the difference in gene expression divergence between certain human and mouse tissues is mainly attributed to the highly developed human neural system supporting intelligence and the large difference in their body sizes.
The recent advance in microarray technology made it available to use a large amount of tissue expression data for the investigation of the tissue specific expression pattern in association with tissue lineage and development. This study was designed to learn how tissue specific functions, tissue lineage, and cell differentiation are correlated, which is essential to understand embryonic development and organism complexity. By performing individual gene and gene set based analysis on multiple tissues expression data in association with the classic topology of mammalian fate maps of embryogenesis, we identified conservatively, differentially and correlatively expressed genes or gene sets for each sub-group tissue on the fate map. The results from tissue gene expression, and Gene Ontology and KEGG pathway analyses indicate that common function features of neighbor tissue groups were defined by the conservatively expressed genes and were related to tissue specific diseases, and differentially expressed genes contributed to the functional divergence of tissues. The difference of gene expression divergence in human and mouse homologous tissues reflected the organism complexity, i.e. distinct neural development levels and different body sizes.
Kyoto Encyclopedia of Genes and Genomes
National Center for Biotechnology Information
Funding: This work is supported by National High-Tech R&D Program of China (863) (grant 2007AA02Z330, 2007AA02Z332), National Key Basic Research Program (Grant No. 30800210) and Shanghai Committee of Science and Technology (grant 07JC14048, 08JC1416600).
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 11, 2010: Proceedings of the 21st International Conference on Genome Informatics (GIW2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S11.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.