A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes
© Dean et al; licensee BioMed Central Ltd. 2009
Published: 17 September 2009
The osteocyte is a type of cell that appears to be one of the key endocrine regulators of bone metabolism and a key responder to initiate bone formation and remodeling. Identifying the regulatory networks in osteocytes may lead to new therapies for osteoporosis and loss of bone.
Using microarray analysis, we identified 269 genes over-expressed in osteocytes, many of which have known functions in bone and muscle differentiation and contractility. We determined the evolutionarily conserved and enriched TF binding sites in the 5 kb promoter regions of these genes. Using these data, we constructed a transcriptional regulatory network and subsequently partitioned it to identify cis-regulatory modules.
Our results show that many osteocyte-specific genes, including two well-known osteocyte markers DMP1 and Sost, have highly conserved clustering of muscle-related cis-regulatory modules, thus supporting the concept that a muscle-related gene network is important in osteocyte biology and may play a role in contractility and dynamic movements of the osteocyte.
It is well known that bone tissue has the capacity to alter its mass and structure in response to mechanical strain. Osteocytes are terminally differentiated cells derived from osteoblasts, which first become embedded in and surrounded by osteoid matrix that subsequently mineralizes. They are regarded as the mechanosensory cells that respond to mechanical loading and to a variety of hormones such as vitamin D and PTH, and send signals to other bone cells to initiate bone formation and remodeling. A better understanding of the gene networks regulating osteocytes can therefore lead to new therapies for osteoporosis and for loss of bone during space travel and extended bed rest. However, even though osteocytes are the most abundant cells in bone, the regulatory pathways controlling osteocyte biology have not been identified.
Because osteocytes are embedded within the bone matrix, with a complex network between the different stages of cells within the osteoblast-osteocyte lineage, studies of osteocytes have been hampered by their inaccessibility and by the lack of molecular and cell surface markers that could be used to isolate and characterize this cell population. Dentin matrix protein 1 (DMP1) has been shown to be a good marker for the osteocyte lineage and is specifically expressed along and in the canaliculi of osteocytes within the bone matrix, suggesting a role for DMP1 in osteocyte function. Recently, we generated a mouse model containing a DMP1 region, -7892 to +4439 bp (8 kb), driving GFP and thus directing expression to osteocytes. This enables us to separate osteocytes from osteoblasts using fluorescence-activated cell sorting, and to compare the gene expression profiles of these two cell types directly using microarrays.
In this work, we developed a systems biology approach to study osteocyte biology by integrating data from microarray experiments, functional annotations and comparative genomics. Approaches of this type have been shown to substantially reduce the noise contained in individual data sources and to improve the understanding of complex biological phenomena, such as Alzheimer's disease and cancer [3, 4]. Typically, such an approach starts by identifying a set of differentially expressed genes, then clusters the genes according to their expression profiles or functions, and finally analyzes the cis-regulatory elements present in the promoter sequences. Our method differs from those approaches in two important aspects. First, we only considered cis-regulatory elements that are both over-represented and evolutionarily conserved. This significantly reduced the effective lengths of the promoter regions searched for cis-regulatory elements, and therefore eliminated many spurious matches. Second, we developed a graph-theoretical method to identify cis-regulatory modules (CRMs) [5, 6], which revealed interesting combinatorial relationships between several transcription factors.
Briefly, from microarray experiments, we obtained 269 osteocyte-specific genes, many of which have functions in bone or muscle development and contractility. We then identified enriched and evolutionarily conserved cis-regulatory elements from the 5 kb upstream promoter regions of a subset of 98 bone- and muscle-related genes, and used these data to construct a transcriptional regulatory network that links TFs to their putative binding sites on these 98 genes. We further proposed a graph-partitioning algorithm to identify possible cis-regulatory modules [5, 6]. Our results show that many osteocyte-specific genes, including two well-known osteocyte markers DMP1 and Sost, have highly conserved clustering of muscle-related cis-regulatory modules, thus supporting the concept that a muscle-related gene network is important in osteocyte biology and may play a role in contractility and dynamic movements of the osteocyte.
Results and discussion
Bone and muscle-related genes are over-expressed in osteocyte cells
To identify potential regulatory networks of osteocytes, we obtained gene expression profiles from osteocytes purified from calvariae of 5- to 8-day-old mice expressing the 8 kb DMP1 promoter driving GFP. As a control, we also obtained gene expression profiles from GFP-negative cells, which contain about 60% osteoblasts at different stages (before the DMP1 gene turns on) and some macrophages. The microarray data were then normalized using GCRMA, and significantly differentially expressed genes were identified. We identified 269 genes that are over-expressed by at least 3-fold in osteocytes with an FDR-corrected p-value < 0.05 (see Methods).
Table: Functional annotation clusters
Columns: SP_PIR_KEYWORD or GOTERM | # of genes | Benjamini corrected p-value
- GOTERM_CC: Extracellular region
- GOTERM_BP: biomineral formation
- GOTERM_BP: system development
- GOTERM_BP: anatomical structure development
- GOTERM_CC: proteinaceous extracellular matrix
- GOTERM_CC: extracellular matrix
- SP_PIR_KEYWORD: muscle protein
- GOTERM_BP: muscle system process
Conserved cis-regulatory elements in osteocyte-specific genes
Genes with Mef2 sites
Modular structure of the transcriptional regulatory network
A putative model of the transcriptional network
In this paper, we introduced a systems biology method for identifying and analyzing transcriptional regulatory networks in the osteocyte. We integrated data from microarray experiments, functional annotations, comparative genomics, and graph-theoretic analysis to create a putative model of the transcriptional regulatory networks in osteocytes. Many parts of the network can be confirmed by the literature, and more direct experimental validations are underway. Our model shows that many osteocyte-specific genes, including two well-known osteocyte markers DMP1 and Sost, have highly conserved clustering of muscle-related cis-regulatory modules, thus supporting the concept that a muscle-related gene network is important in osteocyte biology and may play a role in contractility and dynamic movements of the osteocyte.
Microarray experimental procedures and analysis
Three independent experiments were carried out with mice carrying the DMP1-GFP transgene, which marks osteocytes by GFP expression. Following cell separation by fluorescence-activated cell sorting, RNA was isolated from the GFP-positive and GFP-negative cells. All experiments showed a 15- to 50-fold enrichment in DMP1 mRNA expression, a measure of osteocyte enrichment. This study focused on the experiments with the 50-fold enrichment of osteocytes (GFP-positive), with 3 replicate determinations of the expression levels of all genes.
Microarray experiments were conducted using the Affymetrix 430A mouse chip, which carries over 21,000 probe sets. The raw .cel files were normalized by GCRMA, and differential expression was assessed with Limma, using Bioconductor packages in R. Genes were selected using the B statistic, with B values greater than 3 and FDR = 0.05. This identified the top 269 genes out of the 21,000 that were differentially expressed between GFP-positive and GFP-negative cells.
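The gene-selection step can be illustrated with a minimal Python sketch. It is not the Limma analysis used in this study: it only shows a Benjamini-Hochberg adjustment combined with a fold-change filter, which is one common way to apply an FDR threshold. The function names, the gene labels and all numeric values are hypothetical and chosen for illustration only.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_overexpressed(genes, min_log2_fc, alpha=0.05):
    """genes: list of (name, log2 fold change, raw p-value) tuples.

    Keeps genes whose log2 fold change is at least min_log2_fc and whose
    adjusted p-value is below alpha.
    """
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    return [name for (name, lfc, _), q in zip(genes, adjusted)
            if lfc >= min_log2_fc and q < alpha]


# Hypothetical toy input, not data from the study.
toy = [
    ("gene_a", 2.0, 0.001),
    ("gene_b", 0.5, 0.01),
    ("gene_c", 1.8, 0.02),
    ("gene_d", 2.5, 0.2),
    ("gene_e", 3.0, 0.5),
]
selected = select_overexpressed(toy, min_log2_fc=math.log2(3))
```

Here `math.log2(3)` expresses a 3-fold threshold on the log2 scale. In the toy input, `gene_b` fails the fold-change filter, while `gene_d` and `gene_e` fail the FDR filter.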
Deriving the 98 gene set related to bone/muscle
The top 269 differentially expressed genes were functionally clustered using the DAVID Bioinformatics tool, which also provides an enrichment score for each cluster. For each GO term associated with a group of genes, a p-value is computed from the hypergeometric distribution and then adjusted for multiple testing using the Benjamini method. The enrichment score for a cluster is then calculated as the negative logarithm of the geometric mean of the individual GO p-values. Of these 269 genes, 98 fall in the functionally enriched skeletal/bone and muscle biology clusters, which have enrichment scores of 11.72 and 5.18, respectively.
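The two quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DAVID's implementation: the p-value is taken as the upper tail of a plain hypergeometric distribution, the logarithm is assumed to be base 10, and the function names are hypothetical.

```python
from math import comb, log10


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_score(p_values):
    """Negative log10 of the geometric mean of a cluster's term p-values.

    This equals the arithmetic mean of the individual -log10(p) values.
    """
    return -sum(log10(p) for p in p_values) / len(p_values)
```

For example, a cluster whose terms have p-values of 1e-2 and 1e-4 has a geometric mean p-value of 1e-3, so `enrichment_score([1e-2, 1e-4])` is 3 up to floating-point rounding.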
Building the transcriptional regulatory network
The 98-gene set was submitted to Whole Genome Vista (WGV) to discover conserved and over-represented TF binding sites in the 5 kb promoter sequence upstream of each gene's transcription start site. The motifs found by Vista are known motifs from the TRANSFAC database. The significance of a motif found on a gene is determined by a p-value based on the number of occurrences of the motif in the 5 kb upstream promoter region of this gene, compared with the total number of occurrences of the motif in the same 5 kb region of the rest of the RefSeq genes in the genome. A potential regulatory network was created from these data, in which an edge between a gene and a TF represents over-representation of that TF's binding site in the gene's promoter, as reported by WGV.
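As a rough sketch of the resulting data structure, the gene-TF network can be held as a mapping from each gene to the set of TFs with an over-represented site in its promoter. The record format, the function name `build_network` and the significance threshold `alpha` are assumptions made for illustration; they do not describe WGV's actual output format or the threshold used in this study.

```python
from collections import defaultdict


def build_network(hits, alpha=0.05):
    """Build a bipartite gene-TF network.

    hits: iterable of (gene, tf, p_value) records (hypothetical format).
    Returns a dict mapping each gene to the set of TFs whose binding
    site passes the significance threshold on that gene's promoter.
    """
    network = defaultdict(set)
    for gene, tf, p_value in hits:
        if p_value < alpha:
            network[gene].add(tf)
    return dict(network)


# Placeholder identifiers and p-values, not results from the study.
example_hits = [
    ("g1", "TF1", 0.01),
    ("g1", "TF2", 0.20),
    ("g2", "TF1", 0.001),
]
example_network = build_network(example_hits)
```

Storing TF sets per gene keeps the later gene-gene comparison simple, since shared TFs are just set intersections.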
Detecting network modules
To identify modules in the transcriptional regulatory network, we first assigned a cosine similarity score to each pair of genes according to their shared TFs. A weighted gene-gene network was then created in which the edge weight between two genes corresponds to their similarity score. This similarity matrix was then converted to a sparse network by connecting each gene to its k nearest neighbours (k = 7), subject to a similarity cutoff of 0.5. The network was then partitioned using the Qcut algorithm, resulting in gene sets that have many TF binding sites in common. The regulatory network was loaded into Cytoscape for visualization, along with the gene set partition information.
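The similarity and sparsification steps can be sketched as follows. This is an illustrative reading of the procedure, with several assumptions: each gene's TF profile is treated as a binary vector (so the cosine reduces to a set-overlap formula), the cutoff is applied as "greater than or equal to", and ties are broken by gene name. The Qcut partitioning itself is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two TF sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def knn_graph(gene_tfs, k=7, cutoff=0.5):
    """Connect each gene to its k most similar genes above the cutoff.

    gene_tfs: dict mapping gene -> set of TFs.
    Returns a dict mapping a sorted (gene_u, gene_v) pair to its weight.
    """
    edges = {}
    for gene, tfs in gene_tfs.items():
        scored = [(cosine(tfs, other_tfs), other)
                  for other, other_tfs in gene_tfs.items() if other != gene]
        # Highest similarity first; ties broken by gene name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for score, other in scored[:k]:
            if score >= cutoff:
                edges[tuple(sorted((gene, other)))] = score
    return edges


# Placeholder genes and TFs, not results from the study.
example = {"g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"C"}}
example_edges = knn_graph(example)
```

In this toy input, `g1` and `g2` share both TFs and are linked with weight 1.0, while `g3` shares none and stays isolated. The resulting edge dictionary is the kind of sparse weighted network that a community-detection algorithm would take as input.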
This work was supported in part by a UTSA faculty research award and an NIH grant 1SC3GM086305-01 to JR, NIH grants 5P01AR046798-080004 and 5R01AR054616-02 to SEH, and an NIH grant 5R03AR053275-02 to IK.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 9, 2009: Proceedings of the 2009 AMIA Summit on Translational Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S9.
- Yang W, Kalajzic I, Lu Y, Guo D, Harris MA, Gluhak-Heinrich J, Bonewald LF, Feng JQ, Rowe DW, Harris SE: In vitro and in vivo study on osteocyte-specific mechanical signaling pathways. J Musculoskelet Neuronal Interact 2004, 4(4):386–387.
- Kalajzic I, Braut A, Guo D, Jiang X, Kronenberg MS, Mina M, Harris MA, Harris SE, Rowe DW: Dentin matrix protein 1 expression during osteoblastic differentiation, generation of an osteocyte GFP-transgene. Bone 2004, 35(1):74–82. 10.1016/j.bone.2004.03.006
- Yan B, Yang X, Lee T, Friedman J, Tang J, Van Waes C, Chen Z: Genome-wide identification of novel expression signatures reveal distinct patterns and prevalence of binding motifs for p53, nuclear factor-κB and other signal transcription factors in head and neck squamous cell carcinoma. Genome Biol 2007, 8(5):R78. 10.1186/gb-2007-8-5-r78
- Ray M, Ruan J, Zhang W: Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases. Genome Biol 2008, 9(10):R148. 10.1186/gb-2008-9-10-r148
- Sharan R, Ovcharenko I, Ben-Hur A, Karp R: CREME: a framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics 2003, 19(Suppl 1):I283-I291. 10.1093/bioinformatics/btg1039
- Ivan A, Halfon M, Sinha S: Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs. Genome Biol 2008, 9(1):R22. 10.1186/gb-2008-9-1-r22
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 2004, 3(1): Article 3.
- Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA: The DAVID gene functional classification tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 2007, 8(9):R183. 10.1186/gb-2007-8-9-r183
- Loots GG, Ovcharenko I, Pachter L, Dubchak I, Rubin EM: rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res 2002, 12: 832–839.
- Ruan J, Zhang W: Identifying network communities with a high resolution. Physical Review E 2008, 77: 016104. 10.1103/PhysRevE.77.016104
- Matys V, Fricke E, Geffers R, Gössling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Münch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31: 374–378. 10.1093/nar/gkg108
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504. 10.1101/gr.1239303
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.