Integrated olfactory receptor and microarray gene expression databases
© Liu et al. 2007
Received: 12 December 2006
Accepted: 30 June 2007
Published: 30 June 2007
Gene expression patterns of olfactory receptors (ORs) are an important component of the signal encoding mechanism in the olfactory system since they determine the interactions between odorant ligands and sensory neurons. We have developed the Olfactory Receptor Microarray Database (ORMD) to house OR gene expression data. ORMD is integrated with the Olfactory Receptor Database (ORDB), which is a key repository of OR gene information. Both databases aim to aid experimental research related to olfaction.
ORMD is a Web-accessible database that provides a secure data repository for OR microarray experiments. It contains both publicly available and private data; accessing the latter requires authenticated login. The ORMD is designed to allow users to not only deposit gene expression data but also manage their projects/experiments. For example, contributors can choose whether to make their datasets public. For each experiment, users can download the raw data files and view and export the gene expression data. For each OR gene being probed in a microarray experiment, a hyperlink to that gene in ORDB provides access to genomic and proteomic information related to the corresponding olfactory receptor. Individual ORs archived in ORDB are also linked to ORMD, allowing users access to the related microarray gene expression data.
ORMD serves as a data repository and project management system. It facilitates the study of microarray experiments of gene expression in the olfactory system. In conjunction with ORDB, ORMD integrates gene expression data with the genomic and functional data of ORs, and is thus a useful resource for both olfactory researchers and the public.
The mammalian olfactory system, with its ability to detect numerous odor molecules in the environment, helps animals locate food, mates, and predators. The detection sensitivity and specificity largely depend on the olfactory receptors (ORs) expressed in the primary sensory neurons within the neuroepithelium in the nose. In rodents, there are approximately 10 million sensory neurons expressing >1000 different types of OR genes in each nostril. Each neuron expresses only one specific OR, which exhibits an optimal response to specific odorants. Axons from the sensory neurons that express the same OR converge onto one or a few glomeruli per olfactory bulb. The expression patterns of OR genes and the unique projection of their sensory neurons provide a molecular basis for olfactory signal coding in the brain.
Diverse sets of data have resulted from olfactory research carried out at the cellular, functional and behavioral levels [5–9]. Since the first identification of rat OR genes in 1991, thousands of ORs have been found, some in tissues not associated with the olfactory system [10, 11]. Functional imaging in the olfactory bulb has produced two-dimensional "odor maps" (unique activity patterns) in the glomerular layer [12–14]. Three key databases in the SenseLab [ORDB (OR database), OdorDB (odor database) and OdorMapDB (odor map database)], housed at the Yale University School of Medicine, provide an integrated, multidisciplinary model for the olfactory pathway [15–18]. The information contained in these databases illustrates the chain of events that starts from the exposure of the animals to odorous environment, to the binding of odorant ligands with specific ORs, and to the resulting spatial activity patterns in the olfactory bulb.
Recent microarray studies using Affymetrix (Santa Clara, CA) gene-chips have resulted in the accumulation of a large amount of OR gene expression data. These data show the differential expression of OR genes in the olfactory and non-olfactory tissues. Building an efficient microarray database specific for the ORs and integrating it with the SenseLab system has been challenging. Many large-scale, well-established public repositories for microarray experiments exist, e.g., ArrayExpress at the European Bioinformatics Institute, the Stanford Microarray Database, and the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI). However, they do not fit the needs specific to the chemosensory research community, i.e., to have a domain-specific database integrated with the SenseLab system.
This paper describes the Olfactory Receptor Microarray Database (ORMD), an informatics tool dedicated to disseminating information related to rodent OR microarray experiments. ORMD currently archives gene expression data of ORs in the olfactory epithelium as well as other tissues. ORMD is integrated with ORDB (which relies on an architecture that is common to all databases in SenseLab) through a dynamic link on the webpage for each OR. Both databases are designed to facilitate experimental research aimed at elucidating the mechanisms underlying the perception of smell.
The database schema of ORMD has been implemented in Oracle. It is primarily a traditional relational database. Some tables, however, implement the EAV (Entity-Attribute-Value) data model. Since the number and names of the parameters defining the experimental conditions may vary and be unpredictable, the EAV model has been chosen to store the values of some parameters for the experiments. This model keeps the system flexible, because new parameters can be stored without changing the schema. An EAV-like approach has also been used to store gene expression datasets, with the experiment being the entity, the probe set name being the attribute, and the signal, detection call, and P-value being the values. The Web application has been developed in JavaServer Pages (JSP) and runs on the Apache Tomcat server.
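The EAV layout described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module in place of Oracle, and every table name, column name, and sample value below is a hypothetical stand-in, not the actual ORMD schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not ORMD's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- EAV table: one row per (experiment, parameter) pair, so a new
-- experimental parameter needs no schema change.
CREATE TABLE experiment_param (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    attribute     TEXT NOT NULL,
    value         TEXT,
    PRIMARY KEY (experiment_id, attribute)
);
-- EAV-like table: experiment = entity, probe set = attribute,
-- (signal, detection call, P-value) = values.
CREATE TABLE expression (
    experiment_id  INTEGER NOT NULL REFERENCES experiment(experiment_id),
    probe_set      TEXT NOT NULL,
    signal         REAL,
    detection_call TEXT,
    p_value        REAL,
    PRIMARY KEY (experiment_id, probe_set)
);
""")

# Invented sample rows, for illustration only.
conn.execute("INSERT INTO experiment VALUES (1, 'Heart')")
conn.executemany(
    "INSERT INTO experiment_param VALUES (?, ?, ?)",
    [(1, "tissue", "heart"), (1, "strain", "C57BL/6")],
)
conn.execute(
    "INSERT INTO expression VALUES (?, ?, ?, ?, ?)",
    (1, "probe_0001_at", 152.3, "P", 0.002),
)


def experiment_params(conn, experiment_id):
    """Collect the EAV rows of one experiment into a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM experiment_param WHERE experiment_id = ?",
        (experiment_id,),
    )
    return dict(rows.fetchall())
```

Calling `experiment_params(conn, 1)` gathers whatever parameters were recorded for that experiment, however many there are. The trade-off of this design is that per-parameter typing and constraints move out of the schema and into application code.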
ORDB is one of the major databases in the SenseLab system. As previously reported, the archived OR properties are categorized as descriptive attributes and sequence data. The descriptive attributes include animal species name (i.e., organism), strain, source tissue, chromosomal location, data source, sequence laboratories [the principal investigator's (PI's) lab that cloned the OR gene or identified the gene from the genome], and references (including links to GenBank and Medline sources for the OR gene). The sequence data include the nucleotide residues of genomic DNA or cDNA (complementary DNA) and the amino acid residues of receptor proteins. Recently, functional data, including gene expression, molecular modelling and activity regulation, have been added as a new category of information about the receptors. In addition, ORDB is continuously being expanded to include recently identified receptors, as well as new species whose ORs have been cloned and identified. Currently, ORDB contains receptors from 50 different species with complete genomic repertoires for several species, including mouse, rat and human. It represents the work of more than 100 laboratories around the world.
ORMD is a secure, Web-based repository for OR gene expression data from microarray experiments. The tools to access the data are the same for both regular and logged-in users. The availability of individual microarray data, however, depends on the display attribute of the experiments set by the data owners, the ownership of the data, and the logged-in status of the user. Users who are not logged in can access only the data that owners have made "public", whereas logged-in users may access both the public data and their own private data. Only logged-in users may create projects and enter the details of data resulting from their microarray experiments. For each experiment, the gene expression datasets and the raw data files can be uploaded and stored in the database. Currently archived in the database are 31 microarray experiments from mice using the standard (Murine Genome Array U74Av2) and custom-designed Affymetrix gene-chips covering the mouse ORs. Fourteen experiments, with sample sources from the olfactory epithelium and various body organs including brain, testes, heart, and spleen, have been made available to the public.
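The visibility rule described above (public data for everyone, private data only for its logged-in owner) can be sketched as follows. The `Experiment` class, its field names, and the `can_view` and `visible` helpers are hypothetical and do not reflect ORMD's actual JSP implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Experiment:
    name: str
    owner: str       # account that deposited the data
    is_public: bool  # display attribute set by the owner


def can_view(exp: Experiment, user: Optional[str]) -> bool:
    """Decide access; `user` is the logged-in account name, or None if anonymous."""
    if exp.is_public:
        return True
    # Private data: only the logged-in owner may see it.
    return user is not None and user == exp.owner


def visible(experiments, user=None):
    """Filter a list of experiments down to those the user may see."""
    return [e for e in experiments if can_view(e, user)]
```

With one public and one private experiment owned by the same account, an anonymous visitor or a different account sees only the public one, while the owner sees both.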
Gene expression datasets can also be exported from ORMD. Figure 4B shows the Web interface for data export. To keep datasets from different experiments comparable, only data from the same gene-chip type are exported together. Once the chip type is selected, the list of related experiments is automatically refreshed. Users may choose one or more datasets and export and save them (to their local computers) as text-formatted or Microsoft Excel files. Users can then use third-party tools to analyse the exported data. Figure 4C shows part of a sample Excel file of gene expression data from the experiments named "Embryo," "Heart," and "Kidney." These files describe the OR expression profiles for these three samples. In addition, the system allows users to choose multiple experiments and generate pairwise scatterplots to visualize the gene expression patterns between samples. Raw data files are also available for download from the experiment detail page, allowing researchers to share the data with colleagues and others in the community.
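As a rough illustration of the export step, the sketch below merges the datasets of one chip type into tab-delimited text with one row per probe set and one column per experiment. The dictionary keys (`name`, `chip_type`, `signals`) and the function name are assumptions made for this example; the actual ORMD export format may differ.

```python
import csv
import io


def export_signals(datasets, chip_type):
    """Return tab-delimited text for all datasets of a single chip type.

    datasets: list of dicts with hypothetical keys 'name', 'chip_type',
    and 'signals' (a mapping of probe set name -> signal value).
    """
    # Only experiments run on the same chip type are comparable.
    selected = [d for d in datasets if d["chip_type"] == chip_type]
    # Union of all probe set names, sorted for a stable row order.
    probes = sorted(set().union(*(d["signals"] for d in selected)))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["probe_set"] + [d["name"] for d in selected])
    for probe in probes:
        # A probe missing from a dataset is exported as an empty cell.
        writer.writerow([probe] + [d["signals"].get(probe, "") for d in selected])
    return out.getvalue()
```

The resulting text opens directly in spreadsheet software, and any two of its experiment columns supply the value pairs for a pairwise scatterplot in a third-party tool.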
ORMD provides an informatics tool for researchers to manage their projects and experiments. It allows users to create/edit projects which may include multiple experiments. Individual experiments may be associated with one or more projects. This hierarchy helps researchers to systematically organize their data. It also allows efficient retrieval of the microarray datasets based on the projects.
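The project and experiment hierarchy is a many-to-many relationship: a project may hold several experiments, and an experiment may belong to several projects. A minimal in-memory sketch is given below; the `ProjectIndex` class and its method names are hypothetical, and in a relational database the same links would normally live in a join table.

```python
from collections import defaultdict


class ProjectIndex:
    """Many-to-many links between projects and experiments."""

    def __init__(self):
        self._by_project = defaultdict(set)
        self._by_experiment = defaultdict(set)

    def link(self, project, experiment):
        """Associate an experiment with a project (idempotent)."""
        self._by_project[project].add(experiment)
        self._by_experiment[experiment].add(project)

    def experiments_in(self, project):
        """Experiments of a project, sorted by name."""
        return sorted(self._by_project.get(project, ()))

    def projects_of(self, experiment):
        """Projects an experiment belongs to, sorted by name."""
        return sorted(self._by_experiment.get(experiment, ()))
```

Keeping both directions indexed makes retrieval by project, as well as the reverse lookup from an experiment to its projects, a single dictionary access.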
ORMD primarily serves as a private data repository for those who deposit data, but it also provides a resource of OR gene expression information to the community. The decision as to which datasets should be made available to the public lies with the data owners (i.e., account users). Owners can make private experimental data public, and they can also make publicly available data private again. This gives account users the freedom to protect their unpublished data. In general, the criterion for making data public is the publication of the study in a journal. Owners can also release unpublished data that they are willing to share with the community. The system allows public data to be adequately annotated, in particular for MIAME compliance of the published data.
This paper describes a database system for the storage and presentation of microarray gene expression data of olfactory receptors. The database is integrated with the olfactory databases in the SenseLab system. It is designed to facilitate experimental research in the olfactory field.
Investigating how odor information is transduced and processed by the olfactory system is essential to our understanding of the sense of smell. In essence, SenseLab uses the olfactory system as a model to develop informatics tools to facilitate experimental neuroscience research. The OdorDB, ORDB, and OdorMapDB in SenseLab archive the odor types, ORs, and odor map images, respectively. These olfactory databases are integrated, allowing a clear description of the chain of events from the odor stimuli to the unique activity pattern in the brain.
ORMD and ORDB differ not only in the type of data archived but also in the scope of user accessibility. ORDB is a knowledge-based database, with its content originating from published data. The data include the "normal" genetic information of the receptors in a given species. The volume of the ORDB grows as investigators identify new genes or extend their research into new species. On the other hand, ORMD archives gene expression data from diverse microarray experiments, dependent on tissue source and affected by biological or disease conditions. Whereas ORDB primarily provides a resource to the research community and general public (though facilities for private storage of cloned OR gene data prior to publication or deposition into GenBank exist), ORMD provides a data repository, management and sharing tool for researchers with user accounts and also allows public access to designated datasets.
The eventual utility of both ORDB and ORMD in olfactory research will be evident as the microarray approach is increasingly used to investigate the gene expression patterns of ORs in the olfactory system [11, 25]. Receptor genes archived in ORDB are characterized by their sequences, species, chromosome locations, etc. The expression of the receptor genes, however, depends on species, developmental stage, and tissue source. Strong expression of some genes in certain regions may help researchers uncover the relationships between animal behaviors and the stimulating odor types. Since the OR gene family contains hundreds to thousands of members in each species, the gene-chip approach provides a high-throughput, combinatorial, and powerful tool to examine the expression of the identified genes in the species simultaneously. The integrated olfactory databases described in this paper help archive and present large amounts of gene expression data, thus facilitating experimental research in understanding the molecular mechanism underlying olfactory detection and discrimination.
ORs have also been found in many other organs, such as testes, liver, and spleen. For example, the olfactory receptor hOR17-4 is found in human spermatogenic cells and may play a role in chemical communication between sperm and egg. The expression of olfactory receptors in non-olfactory tissues may be due to the fact that the ORs are members of a superfamily of membrane receptors known as G-protein-coupled receptors. Although the functions of ORs in other body organs remain elusive, a comprehensive investigation of ORs using microarray techniques will enhance our understanding of signal transduction in biological systems beyond olfaction.
Although ORMD is currently a data repository and management system for Affymetrix gene-chip data, it can serve as an open-source database easily adapted to house other types of microarray data. Many journals require that published microarray data conform to the MIAME (Minimum Information About a Microarray Experiment) guidelines. ORMD primarily serves as a private data repository, not as a portal for publication of experimental results. Although the database does not enforce the MIAME requirements for private data, it allows owners to annotate the data made public according to the MIAME checklist. In general, storing good-quality data remains a high priority from both the system-administrative and the scientific points of view. It would be helpful for a workgroup of account users to recommend and enforce MIAME compliance as a requirement for all data made accessible to the public.
As an informatics tool, ORMD is a secure management system for microarray projects and experiments. It can be used to facilitate microarray studies in olfactory as well as other systems. The authenticated login to access the private data and the regular backup of the database ensure security of the system and protection of the data.
We have described the development of ORMD and its integration with the established OR gene database ORDB. ORDB provides information on the receptor genes and proteins, while ORMD provides microarray gene expression data of the ORs. These databases include hyperlinks that connect the genes and their expression data. Together, they provide a resource for researchers using different investigative approaches to understand how mammalian organisms perceive odors.
Project name: ORMD
Project home page: http://neurolab.med.yale.edu/ormd/
ORDB home page: http://senselab.med.yale.edu/ORDB/
Operating system(s): Platform independent
Programming language: SQL, Java
Other requirements: Access to Oracle database
License: The SQL schema is freely available from the website
Any restrictions to use by non-academics: None
The work was supported by NIH grants K22LM008422, T15LM07056, P20LM07253, P01DC04732 and R01DC06213. The authors thank Dr. Rixin Wang and Ms. Jin Yang for technical support and Mr. George Michel for critical reading of the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.