Volume 11 Supplement 6
Two new ArrayTrack libraries for personalized biomedical research
© Xu et al; licensee BioMed Central Ltd. 2010
Published: 7 October 2010
Recent advances in high-throughput genotyping technology are paving the way for research in personalized medicine and nutrition. However, most of the genetic markers identified from association studies account for a small contribution to the total risk/benefit of the studied phenotypic trait. Testing whether the candidate genes identified by association studies are causal is critically important to the development of personalized medicine and nutrition. An efficient data mining strategy and a set of sophisticated tools are necessary to help better understand and utilize the findings from genetic association studies.
SNP (single nucleotide polymorphism) and QTL (quantitative trait locus) libraries were constructed and incorporated into ArrayTrack, with user-friendly interfaces and powerful search features. Data from several public repositories were collected in the SNP and QTL libraries and connected to other domain libraries (genes, proteins, metabolites, and pathways) in ArrayTrack. Linking the data sets within ArrayTrack allows searching of SNP and QTL data as well as their relationships to other biological molecules. The SNP library includes approximately 15 million human SNPs and their annotations, while the QTL library contains publicly available QTLs identified in mouse, rat, and human. The QTL library was developed for finding the overlap between the map position of a candidate or metabolic gene and QTLs from these species. Two use cases were included to demonstrate the utility of these tools. The SNP and QTL libraries are freely available to the public through ArrayTrack at http://www.fda.gov/ArrayTrack.
These libraries developed in ArrayTrack contain comprehensive information on SNPs and QTLs and are further cross-linked to other libraries. Connecting domain specific knowledge is a cornerstone of systems biology strategies and allows for a better understanding of the genetic and biological context of the findings from genetic association studies.
Genetic variations are a major factor in inter-individual differences in disease susceptibility and response to environmental exposures such as nutrients and drugs. Recent advances in microarray-based genotyping techniques have enabled researchers to rapidly scan for known single nucleotide polymorphisms (SNPs), one of the most common genetic variations, across complete genomes. Genome-wide association studies (GWAS) have identified putative variations that contribute to common, complex diseases such as asthma, cancer, diabetes, heart disease and mental illnesses. SNPs that have been associated with complex diseases may eventually be used to develop better strategies to detect, treat and prevent these diseases. A web-based catalog of GWAS publications has been created and is periodically updated at the National Human Genome Research Institute [1]. Such technology is contributing to the development of personalized medicine, in which the current one-size-fits-all approach to medical care will give way to more customized treatment strategies.
However, it is uncommon for GWAS to incorporate diet or environmental exposures which are known to influence disease susceptibility ([2, 3] and http://www.nugo.org/nutrialerts/39848). In addition, many GWAS have been done in European populations and their applicability to other populations and individuals has not been adequately studied ([4–6] and http://www.nugo.org/nutrialerts/40314 and http://www.nugo.org/nutrialerts/38373). GWAS results must therefore be further tested to determine whether the statistical associations found offer real-world potential to predict complex phenotypes or are useful in developing testable hypotheses about the development, progression, or treatment of a disease.
A novel strategy has been proposed to analyze gene-nutrient interactions, aiming to discover genes that contribute to individual risk factors [7–9]. This data mining strategy is based on analyzing candidate genes involved in nutrient metabolism or regulation and mapping those genes to quantitative trait loci (QTL) contributing to a particular trait or condition. A QTL is a region of DNA that is associated with a particular phenotypic trait. A common use of QTL data is to identify candidate genes underlying a trait within one or more QTL. This approach utilizes the available genomic, physiological, and environmental data to select candidate genes for further analyses.
A limitation of this type of strategy is that many databases are knowledge or domain specific – that is, they limit data to one discipline such as proteomics, genomics, or metabolomics. To address this limitation, we propose a solution through ArrayTrack. ArrayTrack is a publicly accessible microarray data management and analysis system developed by the FDA’s National Center for Toxicological Research [10, 11]. It has been extended to manage and analyze preprocessed proteomics and metabolomics experiment data. To facilitate data interpretation, ArrayTrack has integrated a rich collection of biological information for genes, proteins and pathways, which are drawn from public repositories and organized as individual yet cross-linked libraries. Thus it provides a one-stop solution for omics data analysis and interpretation in the context of gene-function relationship.
One focus of GWAS is to relate SNPs to genes and pathways in order to understand the mechanisms underlying the studied disease. The SNP-gene-pathway relationship should be interrogated dynamically in an interactive, integrated environment. ArrayTrack already provides a gene-pathway exploratory platform. By integrating the SNP library, which contains annotation summary information for SNPs and their mapped relationships to genes, ArrayTrack now enables dynamic analysis of the SNP-gene-pathway relationship and thus offers support to SNP studies. Identifying the SNP-gene-QTL relationship is the basis for testing whether a gene or SNP is associated with the etiology of a disease in animal models or human studies. The integration of the SNP and QTL libraries into ArrayTrack enables dynamic mining of such complex biological interactions and thus expands the utility of ArrayTrack.
Construction and content
A major goal of the SNP and QTL libraries is to collect dispersed data in one place, allowing researchers to easily access and compare data across multiple knowledge bases. Data have been downloaded from public repositories and reorganized as library components of ArrayTrack. The data in the SNP and QTL libraries link directly back to their sources, as well as to ArrayTrack’s existing collection of libraries.
Data field names and description of the SNP annotation summary database table.
Physical location start position in chromosome
Physical location end position in chromosome
Reference SNP identifier (rs#)
DNA strand (+/-) containing the observed alleles
The sequences of the observed alleles
Sample type from exemplar submission
The class of variant (single, in-del, insertion, microsatellite, etc.)
The validation status of the SNP
The average heterozygosity from all observations
The Standard Error for the average heterozygosity
The functional category (intron, synonymous, missense, etc.)
How the variant affects the reference sequence
For additional annotations, external links are provided for each SNP to the websites of dbSNP, UCSC Genome Browser, Ensembl, and the International HapMap Project [15–17]. These websites provide information about SNP allele frequency distributions among different populations, linkage with nearby genetic variants, functional annotations, and pathways involving the related genes [18, 19]. Major online SNP databases and resources are listed at http://www.nugo.org/nutrialerts/40615. The SNP library also maps SNPs to genes in ArrayTrack’s Gene library based on the relationships downloaded from dbSNP.
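As a rough illustration of how a row of the SNP annotation summary table might be represented and filtered in code, the Python sketch below mirrors the field descriptions listed above. The field names (`chrom_start`, `rs_id`, `func`, and so on) and the separate `chrom` field are assumptions, since the actual column names used by ArrayTrack are not given here. The sample records are invented and are not real dbSNP entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SnpAnnotation:
    """One row of a SNP annotation summary table (field names are hypothetical)."""
    chrom: str          # chromosome name (assumed field, e.g. "1")
    chrom_start: int    # physical start position on the chromosome
    chrom_end: int      # physical end position on the chromosome
    rs_id: str          # reference SNP identifier (rs#)
    strand: str         # "+" or "-"
    observed: str       # observed alleles, e.g. "A/G"
    variant_class: str  # single, in-del, insertion, microsatellite, ...
    func: str           # functional category: intron, synonymous, missense, ...


def filter_by_function(snps: List[SnpAnnotation], category: str) -> List[SnpAnnotation]:
    """Return the SNPs whose functional category equals `category`."""
    return [s for s in snps if s.func == category]


# Invented example records for illustration only.
example_snps = [
    SnpAnnotation("1", 1000, 1001, "rs_example_1", "+", "A/G", "single", "intron"),
    SnpAnnotation("1", 2500, 2501, "rs_example_2", "-", "C/T", "single", "missense"),
    SnpAnnotation("2", 400, 401, "rs_example_3", "+", "G/T", "single", "missense"),
]

missense_ids = [s.rs_id for s in filter_by_function(example_snps, "missense")]
print(missense_ids)
```

The same pattern would apply to filtering on any other annotated field, such as validation status or variant class.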
Data field names and description of the QTL annotation database table.
Species taxonomy ID
QTL’s ID assigned by the data source
QTL’s gene ID assigned by NCBI
QTL representation symbol (short name)
QTL full name (long description)
The chromosome that the QTL is positioned on
The chromosome strand that the QTL is positioned on
Estimated centiMorgan (cM) position on the chromosome
Base pair starting position on the chromosome
Base pair ending position on the chromosome
How position is determined
PubMed IDs for the original papers detailing the QTL
Other symbols that may have been used
Phenotype ontology annotation
Candidate genes mentioned by original papers
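The QTL table records both a genetic (cM) position and a physical (base pair) position. The following Python sketch, with hypothetical field names, shows one way such a record could be modeled so that QTLs lacking a physical position are handled explicitly. The taxonomy IDs are the standard NCBI identifiers for mouse (10090) and rat (10116); all other values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QtlRecord:
    """One row of a QTL annotation table (field names are hypothetical)."""
    tax_id: int                          # species taxonomy ID
    qtl_id: str                          # ID assigned by the data source
    symbol: str                          # short name
    name: str                            # long description
    chrom: str                           # chromosome the QTL is positioned on
    cm_position: Optional[float] = None  # estimated centiMorgan position
    bp_start: Optional[int] = None       # base pair start, if known
    bp_end: Optional[int] = None         # base pair end, if known
    pubmed_ids: List[int] = field(default_factory=list)
    candidate_genes: List[str] = field(default_factory=list)

    def bp_interval(self) -> Optional[Tuple[int, int]]:
        """Return (start, end) in base pairs, or None without a physical position."""
        if self.bp_start is None or self.bp_end is None:
            return None
        return (min(self.bp_start, self.bp_end), max(self.bp_start, self.bp_end))


# Invented records: one physically mapped, one with a genetic position only.
mapped = QtlRecord(10090, "Q1", "Exq1", "example QTL 1", "4",
                   cm_position=12.5, bp_start=9_000_000, bp_end=5_000_000)
genetic_only = QtlRecord(10116, "Q2", "Exq2", "example QTL 2", "7", cm_position=30.0)

print(mapped.bp_interval())
print(genetic_only.bp_interval())
```

Normalizing the interval so that start is never greater than end is a design choice made here so that later overlap tests do not depend on how the source reported the coordinates.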
Utility and discussion
Gene – nutrient interaction
Search the metabolic and regulatory pathways of a chosen nutrient to generate a list of genes regulated by or involved in the metabolism of such a nutrient. Examples used to develop this approach included thiamine, folic acid, riboflavin, glucose, fructose, vitamin A, vitamin D, and vitamin E. The pathway for each gene or metabolite is searched individually. This step may be accomplished through GeneGo or other similar pathway search tools.
Using the QTL library, map each gene to QTLs contributing to a phenotype. In this case, the metabolic genes were “mapped” to QTLs for obesity, T2DM, body weight, or other related phenotypes or to QTLs that contribute to those diseases (for example, insulin or glucose level QTLs). The chromosomal position of each gene is found with the specified species mapping information and then used to construct a chromosomal search region for QTLs with a user specified range of extension.
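The mapping step described above amounts to an interval-overlap search: the gene position is widened by a user-specified extension, and every QTL on the same chromosome that intersects the widened region is reported. The Python sketch below illustrates that idea only. The type and function names are hypothetical, the coordinates are invented, and it is not ArrayTrack's actual implementation.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # base pair start (inclusive)
    end: int    # base pair end (inclusive)


class Qtl(NamedTuple):
    symbol: str
    region: Interval


def search_region(gene: Interval, extension: int) -> Interval:
    """Widen the gene position by `extension` bp on each side, clamped at 0."""
    return Interval(gene.chrom, max(0, gene.start - extension), gene.end + extension)


def overlapping_qtls(gene: Interval, qtls: List[Qtl], extension: int = 0) -> List[str]:
    """Return symbols of QTLs on the same chromosome that overlap the search region."""
    region = search_region(gene, extension)
    return [
        q.symbol
        for q in qtls
        if q.region.chrom == region.chrom
        and q.region.start <= region.end
        and region.start <= q.region.end
    ]


# Invented coordinates for illustration only.
gene = Interval("4", 1_000_000, 1_050_000)
qtls = [
    Qtl("QTL_A", Interval("4", 900_000, 1_010_000)),
    Qtl("QTL_B", Interval("4", 2_000_000, 3_000_000)),
    Qtl("QTL_C", Interval("5", 900_000, 1_100_000)),
]

print(overlapping_qtls(gene, qtls))                       # no extension
print(overlapping_qtls(gene, qtls, extension=1_000_000))  # 1 Mb on each side
```

With no extension only `QTL_A` is found. Widening the region by 1 Mb also picks up `QTL_B`, while `QTL_C` is never reported because it lies on a different chromosome.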
Fructose metabolic pathway genes mapped to QTLs related to obesity, type 2 diabetes and cardiovascular diseases.
free fatty acid level 1
body weight QTL 18
epididymal fat weight
predicted fat percentage 3
HDL QTL 22
body weight, QTL 6
dietary obesity 2
HDL QTL 4
diabetes susceptibility QTL 2
HDL QTL 27
induction of brown adipocytes 8
HDL QTL 17
type 2 diabetes modifying QTL 1
blood glucose level 1
multigenic obesity 5
organ weight QTL 3
type 2 diabetes mellitus 2 in SMXA RI mice
Connecting GWAS results with QTLs
Obtain a list of trait-associated SNPs from published GWAS results for a chosen condition such as obesity, T2DM, or hypertension. This can be quickly accomplished through querying the GWAS Catalog.
Using ArrayTrack’s SNP library, map each SNP to genes based on chromosomal positions. The result of this step is a list of genes.
For each gene in the list, query ArrayTrack’s QTL library to find whether there are any nearby QTLs that may contribute to the studied condition.
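The three steps above can be sketched as a small pipeline: trait-associated SNPs are mapped to genes by chromosomal position, and each resulting gene is then checked for nearby QTLs. The Python code below is a minimal illustration under stated assumptions. The dictionary layouts, function names, and the `extension` parameter are hypothetical, and the inputs are invented stand-ins for GWAS Catalog results and the ArrayTrack library tables.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), hypothetical layout


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def snps_to_genes(snps: Dict[str, Tuple[str, int]],
                  genes: Dict[str, Region]) -> Dict[str, List[str]]:
    """Step 2: map each SNP to the genes whose span contains its position."""
    return {
        rs_id: sorted(name for name, span in genes.items()
                      if overlaps((chrom, pos, pos), span))
        for rs_id, (chrom, pos) in snps.items()
    }


def genes_to_qtls(gene_names: List[str], genes: Dict[str, Region],
                  qtls: Dict[str, Region], extension: int) -> Dict[str, List[str]]:
    """Step 3: for each gene, list QTLs within `extension` bp of the gene."""
    hits = {}
    for name in gene_names:
        chrom, start, end = genes[name]
        window = (chrom, max(0, start - extension), end + extension)
        hits[name] = sorted(q for q, region in qtls.items() if overlaps(window, region))
    return hits


# Step 1 stand-in: invented SNP positions, gene spans, and QTL regions.
snps = {"rs_a": ("1", 150), "rs_b": ("2", 900), "rs_c": ("1", 5000)}
genes = {"GENE1": ("1", 100, 200), "GENE2": ("2", 850, 1000)}
qtls = {"BP_QTL_X": ("1", 250, 400), "BP_QTL_Y": ("3", 0, 10_000)}

snp_genes = snps_to_genes(snps, genes)
gene_list = sorted({g for gs in snp_genes.values() for g in gs})
gene_qtls = genes_to_qtls(gene_list, genes, qtls, extension=100)
print(snp_genes)
print(gene_qtls)
```

In this toy example `rs_c` falls outside every gene and so drops out after step 2, and only `GENE1` has a QTL within the 100 bp window.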
Comparison of hypertension-related GWAS findings and QTLs in humans. SYMBOL and CHR stand for QTL symbol and chromosome, respectively.
Blood pressure QTL 30 (human)
Blood pressure QTL 32 (human)
Blood pressure QTL 33 (human)
Blood pressure QTL 34 (human)
Blood pressure QTL 34 (human)
Blood pressure QTL 35 (human)
Blood pressure QTL 35 (human)
Blood pressure QTL 46 (human)
Blood pressure QTL 6 (human)
Blood pressure QTL 89 (human)
Blood pressure QTL 89 (human)
Blood pressure QTL 9 (human)
Fibrinogen level QTL 4 (human)
Fibrinogen level QTL 4 (human)
Fibrinogen level QTL 5 (human)
Fibrinogen level QTL 5 (human)
Fibrinogen level QTL 6 (human)
Fibrinogen level QTL 6 (human)
Heart rate QTL 7 (human)
Heart rate variability QTL 2 (human)
Besides meeting the need for SNP interpretation and exploration, the integration of the SNP library with ArrayTrack’s library collection enables users to quickly explore and compare the biological pathways associated with SNPs of interest. Along with ArrayTrack’s library collection, the SNP and QTL libraries will be maintained and periodically updated as new data become available. As development of these libraries progresses, queries based on gene names will be added to the SNP library, and queries based on QTL symbols will be implemented for the QTL library.
The massive amount of data generated in biomedical research studies is often considered and organized as separate knowledge domains. We are developing strategies and tools such as the SNP and QTL libraries for data mining that will allow for more targeted research studies for developing the path to personalized nutrition, medicine, and healthcare.
Availability and requirements
The SNP and QTL libraries are freely available to the public through ArrayTrack at http://www.fda.gov/ArrayTrack.
The views presented in this article do not necessarily reflect those of the Food and Drug Administration. We would like to thank the ArrayTrack development team for providing invaluable support and a system platform as the foundation for the libraries described in this manuscript.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 6, 2010: Proceedings of the Seventh Annual MCBIOS Conference. Bioinformatics: Systems, Biology, Informatics and Computation. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S6.
- A Catalog of Published Genome-Wide Association Studies [http://www.genome.gov/GWAStudies]
- Kaput J: Nutrigenomics research for personalized nutrition and medicine. Curr Opin Biotechnol 2008, 19(2):110–120. doi:10.1016/j.copbio.2008.02.005
- Kaput J, Rodriguez RL: Nutritional genomics: the next frontier in the postgenomic era. Physiol Genomics 2004, 16(2):166–177.
- Myles S, Davison D, Barrett J, Stoneking M, Timpson N: Worldwide population differentiation at disease-associated SNPs. BMC Med Genomics 2008, 1(1):22. doi:10.1186/1755-8794-1-22
- Myles S, Tang K, Somel M, Green RE, Kelso J, Stoneking M: Identification and analysis of genomic regions with large between-population differentiation in humans. Ann Hum Genet 2008, 72(Pt 1):99–110.
- Adeyemo A, Rotimi C: Genetic Variants Associated with Complex Human Diseases Show Wide Variation across Multiple Populations. Public Health Genomics 2010, 13(2):72–79. doi:10.1159/000218711
- Kaput J, Swartz D, Paisley E, Mangian H, Daniel WL, Visek WJ: Diet-Disease Interactions at the Molecular Level: An Experimental Paradigm. J Nutr 1994, 124(8_Suppl):1296S-1305.
- Park EI, Paisley EA, Mangian HJ, Swartz DA, Wu M, O'Morchoe PJ, Behr SR, Visek WJ, Kaput J: Lipid Level and Type Alter Stearoyl CoA Desaturase mRNA Abundance Differently in Mice with Distinct Susceptibilities to Diet-Influenced Diseases. J Nutr 1997, 127(4):566–573.
- Wise C, Kaput J: A Strategy for Analyzing Gene - Nutrient Interactions in Type 2 Diabetes. J Diabetes Sci Technol 2009, 3(4):710–721.
- Tong W, Cao X, Harris S, Sun H, Fang H, Fuscoe J, Harris A, Hong H, Xie Q, Perkins R, et al.: ArrayTrack--supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ Health Perspect 2003, 111(15):1819–1826. doi:10.1289/ehp.6497
- Fang H, Harris SC, Su Z, Chen M, Qian F, Shi L, Perkins R, Tong W: ArrayTrack: An FDA and Public Genomic Tool. Methods Mol Biol 2009, 563:379–398.
- Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, et al.: The UCSC genome browser database: update 2010. Nucleic Acids Res 2010, 38(Database issue):D613-D619. doi:10.1093/nar/gkp939
- dbSNP: the NCBI Database of Genetic Variation [http://www.ncbi.nlm.nih.gov/SNP]
- The Ensembl Project [http://www.ensembl.org/Homo_sapiens/index.html]
- Consortium IHGS: The International HapMap Project. Nature 2003, 426(6968):789–796. doi:10.1038/nature02168
- The International HapMap C: A haplotype map of the human genome. Nature 2005, 437(7063):1299–1320. doi:10.1038/nature04226
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007, 449(7164):851–861. doi:10.1038/nature06258
- Illig T, Gieger C, Zhai G, Romisch-Margl W, Wang-Sattler R, Prehn C, Altmaier E, Kastenmuller G, Kato BS, Mewes HW, et al.: A genome-wide perspective of genetic variation in human metabolism. Nat Genet 2010, 42(2):137–141. doi:10.1038/ng.507
- Peng G, Luo L, Siu H, Zhu Y, Hu P, Hong S, Zhao J, Zhou X, Reveille JD, Jin L, et al.: Gene and pathway-based second-wave analysis of genome-wide association studies. Eur J Hum Genet 2010, 18(1):111–117. doi:10.1038/ejhg.2009.115
- Mouse Genome Database (MGD) at the Mouse Genome Informatics website, The Jackson Laboratory, Bar Harbor, Maine [http://www.informatics.jax.org]
- Twigger SN, Shimoyama M, Bromberg S, Kwitek AE, Jacob HJ, RGD Team: The Rat Genome Database, update 2007--Easing the path from disease to data and back again. Nucleic Acids Res 2007, 35(Database issue):D658-D662. doi:10.1093/nar/gkl988
- Tappy L, Le K-A: Metabolic Effects of Fructose and the Worldwide Increase in Obesity. Physiol Rev 2010, 90(1):23–46. doi:10.1152/physrev.00019.2009
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.