DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis
- Brad T Sherman†1,
- Da Wei Huang†1,
- Qina Tan1,
- Yongjian Guo4,
- Stephan Bour4,
- David Liu3,
- Robert Stephens3,
- Michael W Baseler5,
- H Clifford Lane2 and
- Richard A Lempicki1
© Sherman et al; licensee BioMed Central Ltd. 2007
Received: 21 May 2007
Accepted: 02 November 2007
Published: 02 November 2007
Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases in order to integrate comprehensive annotation information for their genes, which becomes one of the bottlenecks, particularly for the analytic task associated with a large gene list. Thus, a highly centralized and ready-to-use gene-annotation knowledgebase is in demand for high throughput gene functional analysis.
The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. The grouping of such identifiers improves the cross-reference capability, particularly across NCBI and UniProt systems, enabling more than 40 publicly available functional annotation sources to be comprehensively integrated and centralized by the DAVID gene clusters. The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner.
The DAVID Knowledgebase is designed to facilitate high throughput gene functional analysis. For a given gene list, it not only provides quick access to a wide range of heterogeneous annotation data in a centralized location, but also enriches the level of biological information available for each individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
In the post-genomic era, one of the challenges is to interpret, systematically and comprehensively, the large amounts of data that result from genome-wide experiments, such as gene lists derived from microarray or proteomics studies. Using the biological knowledge accumulated over the past decades, with the aid of computing algorithms, it is possible to assemble the potential biological pictures associated with these studies. Due to the complex and distributed nature of biological research, our current knowledge is spread over many redundant databases maintained by independent groups. One gene can have different identifiers within one database or across many. Similarly, the biological terms associated with different identifiers for the same gene may be collected at different levels of detail across databases. Thus, an integrated gene-annotation database with comprehensive data coverage is an essential first step for any high-throughput gene functional analysis algorithm. Some integrated databases, such as NCBI Entrez Gene, UniProt, and PIR, have made great efforts to integrate annotation resources in one centralized location and are considered a world-class foundation for general bioinformatics purposes. Several other projects, e.g. SOURCE, RESOURCERER, IDconverter, BioMart (formerly EnsMart), and UCSC Gene Sorter, were developed to be more suitable for high-throughput gene-annotation queries. However, further development is still needed in several areas to better meet the requirements of high-throughput gene analysis: 1) Many types of annotation are not included, e.g. Panther and BioCarta pathways are not covered by any of the above works. 2) The partial cross-reference between the NCBI and UniProt systems limits integration capability, e.g. Entrez Gene does not cover PIR ID or Affy ID at all. 3) The result format could be better suited to high-throughput analysis of multiple genes. 4) The web query is performed on one gene at a time or in small batches, e.g. only 100 genes at a time in Entrez Gene. 5) The database download is too large and complicated for regular users, e.g. Entrez Gene is tens of gigabytes in size and has a complicated, XML-like structure. 6) The full data for a given database are not always available, e.g. SOURCE does not offer downloads.
Due to the above limitations, most high-throughput functional annotation algorithms and data analyses are restricted to a small subset of the many annotation resources and ID systems available, which does not maximize their potential analytic power. For example, gene-annotation enrichment tools such as GOMiner, ermineJ, and GOStat use only the GO database as a backend annotation source and only NCBI Entrez Gene as a gene ID mapping source. Support for gene IDs and annotation content derived from UniProt is weak or absent in these packages. In addition, each tool requires a large amount of redundant effort to build its own backend database from public resources.
The goal of this work is to create a large gene-centered knowledgebase that integrates the most useful and highly regarded heterogeneous annotation resources in a centralized location, with improved cross-referencing capability between the NCBI and UniProt systems [1, 2] and easy-to-use pair-wise data files for download, making it more comprehensive and more suitable for high throughput data analysis. The work was originally conducted years ago and has successfully served as the comprehensive backend knowledgebase for various high throughput gene-annotation enrichment tools in the DAVID and EASE packages [13, 14]. The usefulness of the DAVID Knowledgebase in our own bioinformatics software motivated us to make it available to the public, in order to benefit high throughput data analysis projects in other research groups. The entire DAVID Knowledgebase is now freely downloadable or searchable through the DAVID Bioinformatics Resources web site.
The paper will describe the DAVID Knowledgebase regarding its unique strategy to integrate the redundant and heterogeneous annotation sources, the improved cross-reference capability across gene ID types, the large annotation content coverage, and the pair-wise text-format files for downloads and easy-to-use web-based query interface.
Construction and content
DAVID gene concept: a novel single-linkage algorithm to agglomerate redundant gene IDs into the DAVID gene clusters in order to improve cross-referencing capability
There are dozens of types of gene or protein sequence identifiers that are redundant within the same group or across several independent groups, such as GenBank Accession, GenBank ID, RefSeq Accession, PIR ID, PIR Accession, UniProt ID, UniProt Accession, etc. [1–3, 17–19]. The leading organizations, NCBI and UniProt, have made significant strides in addressing the cross-reference and redundancy issues associated with gene identifiers. NCBI GenBank, representing the largest redundant database of nucleotide sequences, exchanges data with two other worldwide nucleotide sequence databases, EMBL and DDBJ. In addition, UniProt, as the largest redundant annotated protein sequence database, unites Swiss-Prot, TrEMBL, and PIR. Moreover, the three organizations have been independently constructing non-redundant gene cluster databases: NCBI Entrez Gene, UniProt UniRef, and PIR-NREF, respectively. The resulting databases are presented in a non-redundant format by grouping the different gene/protein IDs for the same gene into one entry. At this point, the redundancy of nucleotide and protein IDs from different resources has been largely addressed by the leading bioinformatics organizations. However, while the gene clusters are comprehensive for the gene/protein IDs within their own organization, many cannot be cross-referenced with gene identifiers from other independent organizations (Figure 1A). For example, UniProt does not cover RefSeq IDs, and NCBI Entrez Gene does not reference PIR ID at all (Figure 1A). Therefore, the major challenge of annotation integration comes from the weak cross-reference of different types of gene/protein IDs between the NCBI and UniProt systems, since different annotation databases use one system or the other as their major gene identifier system, e.g. GeneRIF adopts NCBI IDs as its major associated identifiers, whereas InterPro uses UniProt/SwissProt.
Data coverage in the DAVID Knowledgebase.
Gene Identifiers (> 60 million)
Annotation Contents (> 90 million in total)
- Ontology (> 40 million records)
- Domain/Family (> 15 million)
- General Annotation (> 21 million)
- P-P Interaction (> 4 million)
- Functional Category (> 6.9 million)
- Disease Association (~9,000)
- Gene Expression (> 1.0 million)
- Literature (> 2.8 million)
Because a DAVID gene is built based on annotated gene clusters, only well-known or studied gene identifiers in the original gene clusters are included. This scope is well aligned with the high throughput functional annotation purpose of the DAVID Knowledgebase in the sense that any unclear or unstudied sequences, such as an EST, are not helpful for automatic analysis of high-throughput functional annotation.
DAVID genes are secondary gene clusters built on well-known and annotated gene clusters from NCBI Entrez Gene, PIR NRef100, and UniProt UniRef100. Thus, the agglomeration quality of a DAVID gene solely relies on the quality of the original databases. Since the original databases have been used by the scientific community for many years, they are well known and regarded as the highest-quality bioinformatics resources in the world. To further detect potential problems inherited from the original sources into DAVID gene clusters, a comprehensive quality control (QC) procedure was conducted by examining the sequence alignment of every protein member within a given DAVID gene cluster using the NCBI BlastClust program [22, 23] (Additional File 1 for detailed procedure of QC). The QC examination highlighted poor sequence alignment in ~10% of the DAVID genes, mainly caused by very short sequences (less than 20 amino acids), which were not handled well, or at all, by the BlastClust program. After filtering out those short sequences, less than 0.1% of the DAVID genes with poor alignment members needed to be corrected. The QC procedure reflects the high quality of the original resources, which is passed on to the DAVID Knowledgebase.
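The idea of building secondary clusters on top of existing source clusters can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal union-find version of single-linkage agglomeration, assuming that two source clusters are merged whenever they share at least one identifier. The function name `agglomerate` and all identifiers in the example are hypothetical.

```python
from collections import defaultdict


def agglomerate(source_clusters):
    """Merge source clusters that share at least one identifier.

    Single linkage: one shared identifier is enough to join two clusters,
    and the joins are transitive. This is an illustrative sketch only.
    """
    parent = {}

    def find(x):
        # Return the representative of x, creating a singleton if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for cluster in source_clusters:
        ids = list(cluster)
        if not ids:
            continue
        find(ids[0])  # register single-member clusters too
        for other in ids[1:]:
            union(ids[0], other)

    groups = defaultdict(set)
    for identifier in parent:
        groups[find(identifier)].add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical source clusters, standing in for entries of the kind found
# in Entrez Gene, UniRef100 and NRef100 (all identifiers are made up).
source_clusters = [
    {"ENTREZ:1", "REFSEQ:NP_1", "UNIPROT:P1"},
    {"UNIPROT:P1", "PIR:A1"},
    {"PIR:A1", "PIR:B1"},
    {"ENTREZ:2", "REFSEQ:NP_2"},
]
david_genes = agglomerate(source_clusters)
for members in david_genes:
    print(members)
```

In this toy input the first three source clusters are chained through `UNIPROT:P1` and `PIR:A1`, so they collapse into one cluster, while the fourth shares nothing and stays separate. The same chaining is why a single wrong link in a source database can merge unrelated genes, which is the kind of problem the QC step described above is meant to catch.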
Collection and integration of functional annotation contents: the heterogeneous annotation contents and their IDs from different annotation databases are assigned to and centralized by the common DAVID genes
An example of improved annotation coverage for an individual gene USP8 in the DAVID Knowledgebase.
Molecular Function (Gene Ontology)
GO:0004197: cysteine-type endopeptidase activity
GO:0004843: ubiquitin-specific protease activity
GO:0008233: peptidase activity
GO:0004221: ubiquitin thiolesterase activity
GO:0016787: hydrolase activity
GO:0008234: cysteine-type peptidase activity
Biological Process (Gene Ontology)
GO:0006512: ubiquitin cycle
GO:0008283: cell proliferation
GO:0006511: ubiquitin-dependent protein catabolic process
GO:0007265: Ras protein signal transduction
Cellular Component (Gene Ontology)
Protein Domain (InterPro)
EC 3.1.2.15: Ubiquitin thiolesterase
Protein 3-D Structure (PDB)
1WHB: Solution structure of the Rhodanese-like domain in human UBP8
2A9U: Structure of the N-terminal domain of human USP8
Disease Association (OMIM)
OMIM:603158: chronic myeloproliferative disorder
Inaccurate and conflicting annotation has been considered because of its potential negative impact on the knowledgebase. This impact is taken into account and corrected during DAVID cluster creation within the DAVID quality control pipeline, as stated previously. Incorrect annotation is not used in the creation of the DAVID clusters, which stops any magnification of error, but the annotation is still maintained in the knowledgebase. Because of the large amount of data available within the knowledgebase, including redundant and complementary annotation, inaccurate data become highly diluted. The DAVID Knowledgebase is intended for high throughput gene functional analysis, in which many biological aspects are considered together, either across a gene list or for any one gene. In that setting the negative impact of individual errors is negligible, because the true biology is generally supported by an overwhelming majority of the various sources. This may not be the case within an individual database if the supporting evidence for the true biology is not in the majority, or if a systematic error has occurred in the source's annotation process for a given gene.
Because the DAVID Knowledgebase is designed for high throughput gene functional analysis, a larger collection of integrated heterogeneous annotation sources and quick access to that larger amount of data are more important than immediate updates, simply because high throughput gene functional analysis relies on global annotation profiles rather than on any individual annotation source. Through automation of several tasks where appropriate, and through personnel additions, the DAVID Knowledgebase is intended to be updated quarterly. A complete list of the public databases contributing to the DAVID Knowledgebase is provided in Additional File 3, and a detailed update procedure can be found in Additional File 4.
Utility and discussion
The data structure of the entire knowledgebase for downloads
The DAVID Knowledgebase is available in two categories of pair-wise text files: gene index files (gene ID knowledge) and annotation index files (annotation knowledge). Gene index files, with names such as david2affy_id, david2genbank_accession, etc., are in a pair-wise format that links a DAVID gene identifier to a public gene identifier (Figure 2). These relationships were built from the results of the DAVID gene agglomeration step described in the "Construction and content" section. Thus, any given public gene identifier can be converted to a corresponding DAVID gene identifier, which represents a unique gene entry and serves as the internal link to all available annotation contents within the DAVID Knowledgebase. Annotation index files, with names such as david2pfam, david2kegg_pathway, david2omim, etc., are likewise in a format that pairs a DAVID gene identifier with an annotation term. All genes and annotation terms within the DAVID Knowledgebase are centralized by the common DAVID gene identifier. The unified DAVID gene identifier not only normalizes the cross-reference among heterogeneous databases, but also makes any search or calculation simpler and more efficient. For any given public gene identifier, the corresponding DAVID gene identifier can be obtained from the DAVID gene index files; any annotation terms within the DAVID Knowledgebase can then be queried from the DAVID annotation index files using that DAVID identifier (Figure 2B). In addition, each annotation resource and public gene identifier system is stored in its own file, which makes it easier for users to focus on the data they are most interested in, reduces file size, and simplifies the format for easy processing. The easily interpreted file names, such as david2genbank_accession and david2pfam, allow users to quickly identify the files relevant to their own data.
Of course, users can further combine their data files of interest into one large file or database table to search the data in a way best suited to the individual. The simple pair-wise text format provides the flexibility to either directly query the files or insert them into tables in a customized, in-house relational database without much, if any, file parsing or re-formatting. The format should be simple to regular users with some computational skills as well as extendable for expert users. Moreover, users may add any new annotation sources to the DAVID Knowledgebase as additional independent files, as long as the files are in a gene-annotation associated format. The DAVID Knowledgebase, with its large, diverse annotation categories and flexible format, provides the scientific community with a single, comprehensive platform for gathering annotation for specific studies.
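The two-step lookup described above (public identifier to DAVID gene identifier, then DAVID gene identifier to annotation terms) can be sketched in a few lines. This is only an illustration under stated assumptions: each file is taken to be tab-delimited with the DAVID gene identifier in the first column and the public identifier or annotation term in the second, and the in-memory strings stand in for files such as david2affy_id and david2pfam. All identifiers, and the helper names `read_pairs`, `build_lookup` and `annotate`, are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle):
    """Read a two-column, tab-delimited file into (left, right) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def build_lookup(gene_index_pairs):
    """Map each public identifier to the DAVID gene identifiers it belongs to.

    Assumes column order (david_id, public_id); this order is an assumption.
    """
    public_to_david = defaultdict(set)
    for david_id, public_id in gene_index_pairs:
        public_to_david[public_id].add(david_id)
    return public_to_david


def annotate(public_ids, public_to_david, annotation_pairs):
    """Return the annotation terms reachable from each public identifier."""
    david_to_terms = defaultdict(set)
    for david_id, term in annotation_pairs:
        david_to_terms[david_id].add(term)

    result = {}
    for public_id in public_ids:
        terms = set()
        for david_id in public_to_david.get(public_id, ()):
            terms |= david_to_terms.get(david_id, set())
        result[public_id] = sorted(terms)
    return result


# Made-up contents standing in for a gene index file and an annotation file.
gene_index_text = "D0001\t1000_at\nD0002\t1001_at\n"
annotation_text = "D0001\tPF00001\nD0001\tPF00002\n"

gene_pairs = read_pairs(io.StringIO(gene_index_text))
annotation_pairs = read_pairs(io.StringIO(annotation_text))
lookup = build_lookup(gene_pairs)
annotations = annotate(["1000_at", "1001_at", "unknown_at"], lookup, annotation_pairs)
for public_id, terms in annotations.items():
    print(public_id, terms)
```

With real downloads, `io.StringIO(...)` would be replaced by an opened file handle. Identifiers that are absent from the gene index file, or genes without terms in the chosen annotation file, simply yield an empty list.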
Web interface for batch query
A query comparison between DAVID Knowledgebase and Entrez Gene
Key advantages of the DAVID Knowledgebase
The DAVID Knowledgebase intends to integrate and organize the high quality, world-class bioinformatics databases into a centralized location in a gene-centric format. The work is particularly useful for high throughput data analysis with the following advantages: 1) Improved ID cross-referencing capability enhances comprehensiveness of integration of heterogeneous annotation resources, hence enriching the annotation coverage for individual genes. 2) The integration allows quick access to a wide range of annotation contents in a batch manner. 3) Simple pair-wise, gene-centric formatted files simplify the data structure so that all users may benefit. 4) The pair-wise data structure is more flexible and suitable for high throughput data access.
Special attention on the DAVID Knowledgebase
Even after several QC procedures, a certain error rate remains in the DAVID Knowledgebase, as in any other biological database. Users may report such errors directly to us by email or through the DAVID forum. In addition, users should be aware that the DAVID Knowledgebase is designed for high throughput gene functional screening of large gene lists at a gene-centric level, and is not a replacement for the original annotation databases, which may contain additional details, such as gene isoform-specific annotations, for drill-down analysis. Moreover, the quarterly update schedule of the DAVID Knowledgebase could result in a slight time lag relative to updates of the member databases.
The DAVID Gene Concept agglomerates diverse types of gene identifiers belonging to the same gene into one gene cluster. It allows large collections of heterogeneous annotations that are associated with different types of gene identifiers to be comprehensively integrated by a common DAVID gene. Combined with the simple pair-wise text format, the DAVID Knowledgebase provides not only a comprehensive, high-quality collection of gene annotation resources, but also the flexibility to cross-reference identifiers and annotations from several world-class, heterogeneous databases within one resource. To the best of our knowledge, its annotation data coverage and gene/protein ID cross-referencing capability far exceed those of the backend data sources of other high throughput gene functional annotation tools. Therefore, it can be used as the backend gene-annotation database of existing high throughput gene functional analysis tools to improve their discovery power. The DAVID Knowledgebase also lets researchers focus on data analysis or on the core development of new high-throughput functional data-mining algorithms, rather than spending time on gene-annotation data collection and integration.
Availability and requirements
The DAVID Knowledgebase is freely downloadable for nonprofit use under the URL http://david.abcc.ncifcrf.gov/knowledgebase. Data files are available in a tab-delimited text format which can be opened by any text editor in Windows, Mac or Unix systems.
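Because each file is a tab-delimited list of pairs, loading it into a relational table needs little parsing. The sketch below uses Python's built-in sqlite3 module with an in-memory database; it is an illustration rather than a recommended schema. The two-column layout (david_id, value), the column names, the helper names and all identifiers are assumptions made for the example; only the table names follow the file naming convention described in this paper.

```python
import sqlite3


def parse_tab_pairs(text):
    """Split tab-delimited text into (david_id, value) tuples."""
    return [tuple(line.split("\t", 1)) for line in text.splitlines() if "\t" in line]


def load_pairs(conn, table, rows):
    """Create a two-column table named after the file and insert the pairs."""
    # The table name is interpolated into SQL, so restrict it to identifiers.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (david_id TEXT, value TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)


conn = sqlite3.connect(":memory:")

# Made-up contents standing in for two downloaded files.
load_pairs(conn, "david2genbank_accession",
           parse_tab_pairs("D0001\tNM_000001\nD0002\tNM_000002\n"))
load_pairs(conn, "david2kegg_pathway",
           parse_tab_pairs("D0001\thsa00001\n"))

# Join a gene index table to an annotation table on the DAVID gene identifier.
rows = conn.execute(
    "SELECT g.value, k.value "
    "FROM david2genbank_accession AS g "
    "JOIN david2kegg_pathway AS k ON g.david_id = k.david_id "
    "ORDER BY g.value"
).fetchall()
print(rows)
```

Keeping one table per file mirrors the one-file-per-resource layout of the download, so additional annotation sources can be added as new tables without changing existing ones.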
DAVID: Database for Annotation, Visualization and Integrated Discovery
PIR: Protein Information Resource
NCBI: National Center for Biotechnology Information
BLAST: Basic Local Alignment Search Tool
EMBL: European Molecular Biology Laboratory
DDBJ: DNA Data Bank of Japan
SMART: Simple Modular Architecture Research Tool
OMIM: Online Mendelian Inheritance in Man
LIB: Laboratory of Immunopathogenesis and Bioinformatics
The authors are grateful to the referees for their constructive comments and thank David Bryant in the ABCC group for web server support. Thanks also go to Melaku Gedil, Ping Ren, Jun Yang in the Laboratory of Immunopathogenesis and Bioinformatics (LIB) group for biological discussion. We also thank Bill Wilton and Mike Tartakovsky for information technology and network support. The project has been funded with federal funds from the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), under Contract No. NOI-CO-56000. The contents of this tool and publication do not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the United States Government.
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33(Database issue):D54–8. 10.1093/nar/gki031
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 2004, 32(Database issue):D115–9. 10.1093/nar/gkh131
- Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, Vinayaka CR, Zhang J, Barker WC: The Protein Information Resource. Nucleic Acids Res 2003, 31(1):345–347. 10.1093/nar/gkg040
- Diehn M, Sherlock G, Binkley G, Jin H, Matese JC, Hernandez-Boussard T, Rees CA, Cherry JM, Botstein D, Brown PO, Alizadeh AA: SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res 2003, 31(1):219–223. 10.1093/nar/gkg014
- Tsai J, Sultana R, Lee Y, Pertea G, Karamycheva S, Antonescu V, Cho J, Parvizi B, Cheung F, Quackenbush J: RESOURCERER: a database for annotating and linking microarray resources within and across species. Genome Biol 2001, 2(11):SOFTWARE0002. 10.1186/gb-2001-2-11-software0002
- Alibes A, Yankilevich P, Canada A, Diaz-Uriarte R: IDconverter and IDClight: conversion and annotation of gene and protein IDs. BMC Bioinformatics 2007, 8: 9. 10.1186/1471-2105-8-9
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res 2004, 14(1):160–169. 10.1101/gr.1645104
- Kent WJ, Hsu F, Karolchik D, Kuhn RM, Clawson H, Trumbower H, Haussler D: Exploring relationships and mining data with the UCSC Gene Sorter. Genome Res 2005, 15(5):737–741. 10.1101/gr.3694705
- Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN: GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 2003, 4(4):R28. 10.1186/gb-2003-4-4-r28
- Lee HK, Braynen W, Keshav K, Pavlidis P: ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics 2005, 6: 269. 10.1186/1471-2105-6-269
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004, 20(9):1464–1465. 10.1093/bioinformatics/bth088
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556
- Dennis G Jr., Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4(5):P3. 10.1186/gb-2003-4-5-p3
- Hosack DA, Dennis G Jr., Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4(10):R70. 10.1186/gb-2003-4-10-r70
- DAVID Knowledgebase [http://david.abcc.ncifcrf.gov/knowledgebase]
- DAVID Bioinformatics Resources [http://david.abcc.ncifcrf.gov]
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res 2006, 34(Database issue):D16–20. 10.1093/nar/gkj157
- Pruitt KD, Tatusova T, Maglott DR: NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2005, 33(Database issue):D501–4. 10.1093/nar/gki025
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, 34(Database issue):D187–91. 10.1093/nar/gkj161
- Mitchell JA, Aronson AR, Mork JG, Folk LC, Humphrey SM, Ward JM: Gene indexing: characterization and analysis of NLM's GeneRIFs. AMIA Annu Symp Proc 2003, 460–464.
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH: InterPro, progress and status in 2005. Nucleic Acids Res 2005, 33(Database issue):D201–5. 10.1093/nar/gki106
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
- DAVID Knowledgebase Web Interface [http://david.abcc.ncifcrf.gov/knowledgebase_summary.jsp]
- DAVID API Services [http://david.abcc.ncifcrf.gov/content.jsp?file=DAVID_API.html]
- Cicala C, Arthos J, Selig SM, Dennis G Jr., Hosack DA, Van Ryk D, Spangler ML, Steenbeke TD, Khazanie P, Gupta N, Yang J, Daucher M, Lempicki RA, Fauci AS: HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc Natl Acad Sci U S A 2002, 99(14):9380–9385. 10.1073/pnas.142287999
- DAVID Forum [http://david.abcc.ncifcrf.gov/content.jsp?file=Contact.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.