Facilitating functional annotation of chicken microarray data
- Teresia J Buza1, 2,
- Ranjit Kumar1, 2,
- Cathy R Gresham2, 5,
- Shane C Burgess†1, 2, 3, 4 and
- Fiona M McCarthy†1, 2
© Buza et al; licensee BioMed Central Ltd. 2009
Published: 8 October 2009
Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the largest arrays and a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to the Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show the references linked to manually annotated functions. In addition, there is no tool that allows microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. As a result, researchers spend considerable time searching multiple GO databases for functional information.
We have improved the breadth of functional annotation of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets, we developed an Array GO Mapper (AGOM) tool that helps researchers quickly retrieve corresponding functional information for their datasets.
Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Using the AGOM tool, researchers will be able to quickly translate their microarray datasets into more reliable biological functional information. The diseases, disorders, gene types and drug targets revealed in this study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via the AgBase website and will be updated on a regular basis.
The development of high-throughput microarray screening platforms for chicken is an important step for profiling the gene expression changes that occur in birds in response to different challenges and stimuli [1–3]. The chicken research community uses microarrays for a wide range of applications, including gene expression analysis [1, 4], exon expression analysis [5–7], novel transcript discovery, genotyping [9, 10] and resequencing [11, 12]. Microarray analysis can also be combined with chromatin immunoprecipitation to perform genome-wide identification of transcription factors and their binding sites.
According to statistics obtained from "Gallus Expression in Situ Hybridization Analysis" (GEISHA; http://geisha.arizona.edu/geisha/microarray.jsp; 03/14/2009), significant resources have already been constructed for "Whole Genome" chicken microarrays. GEISHA lists: 1) the Arizona Gallus gallus 20.7 K Long Oligo Array; 2) the Affymetrix array, which covers 32,773 transcripts corresponding to over 28,000 chicken genes; 3) the FHCRC Chicken 13 K Array; 4) arrays from Larry Cogburn at the University of Delaware, namely UD_Liver_3.2 K, UD 7.4 K Metabolic/Somatic Systems, Chicken Neuroendocrine System 5 K and the DEL-MAR 14 K Integrated Systems; and 5) arrays from ARK Genomics, which offers a 1,153 clone chicken embryo array, a 5,000 cDNA chicken immune array, and a 4,800 clone chicken neuroendocrine array. The Gene Expression Omnibus (GEO), publicly accessible at http://www.ncbi.nlm.nih.gov/geo, is a curated public repository for high-throughput gene expression data [14, 15]. The Platform is one of GEO's central data entities; it contains the list of probes that defines which set of molecules may be detected, and it can easily be browsed, queried and retrieved to fit users' interests [14, 16].
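Because a GEO Platform record is essentially a probe table, it can be read programmatically. The sketch below assumes a platform downloaded in GEO's SOFT text format, where (as we understand that format) the probe table sits between `!platform_table_begin` and `!platform_table_end` lines and starts with a tab-delimited header row. The function name `read_platform_table` and the toy record are hypothetical, for illustration only.

```python
def read_platform_table(lines):
    """Return (header, rows) for the probe table of a GEO SOFT platform record."""
    header, rows, in_table = None, [], False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("!platform_table_begin"):
            in_table = True  # the next line is the column header
            continue
        if line.startswith("!platform_table_end"):
            break  # end of the probe table
        if not in_table or not line:
            continue  # skip metadata lines and blank lines
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return header, rows


# Toy, hypothetical platform record.
soft_record = """^PLATFORM = GPL0000
!Platform_title = toy platform
!platform_table_begin
ID\tGB_ACC
probe_1\tAB000001
probe_2\t
!platform_table_end
"""
header, probes = read_platform_table(soft_record.splitlines())
print(header)       # ['ID', 'GB_ACC']
print(len(probes))  # 2
```

Each probe becomes a dictionary keyed by the platform's column names, so probes without an accession (such as `probe_2`) are easy to spot.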
Comprehensive annotation of these arrays will benefit chicken researchers, because they will be able to functionally model their expression datasets to obtain relevant information about their biological system. However, most arrays are not associated with any functional information. The only array that is comprehensively annotated to GO is the Affymetrix chicken GeneChip array. This array is the most widely used for gene expression studies, according to a poll of the chicken research community in July 2008 (http://doodle.com/participation.html?pollId=zwvmhpt5t23tvfv8). The Affymetrix NetAffx database links probesets on Affymetrix GeneChip microarrays to GO using data from the GO Consortium. However, the GO evidence codes are not linked to the references used to make the functional assertions. This is a challenge for researchers who want to associate their datasets with functional information while also showing supporting evidence. For example, an experimental evidence code in a GO annotation should be associated with a paper that reports the physical characterization of the gene or gene product being annotated. This allows the researcher to access the detailed information that was used to make the GO annotation.
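The missing-reference problem described above can be checked mechanically on a tab-delimited GO annotation file. A minimal sketch, assuming GO Annotation File (GAF) column order (GO ID in column 5, reference in column 6, evidence code in column 7), treating EXP, IDA, IPI, IMP, IGI and IEP as the experimental evidence codes, and assuming publication references are written with a `PMID:` prefix. The function name is hypothetical, and the example rows are truncated to their first seven columns for brevity.

```python
# Classic GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def unreferenced_experimental(gaf_lines):
    """Return (object ID, GO ID, evidence) for experimental annotations without a PMID."""
    missing = []
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header comments
        cols = line.rstrip("\n").split("\t")
        object_id, go_id, references, evidence = cols[1], cols[4], cols[5], cols[6]
        if evidence not in EXPERIMENTAL_CODES:
            continue
        # The reference column may hold several references separated by "|".
        if not any(ref.startswith("PMID:") for ref in references.split("|")):
            missing.append((object_id, go_id, evidence))
    return missing


rows = [
    "!gaf-version: 1.0",
    "UniProtKB\tQ00001\tGENE1\t\tGO:0005515\tPMID:12345\tIPI",
    "UniProtKB\tQ00002\tGENE2\t\tGO:0006955\t\tIDA",
    "UniProtKB\tQ00003\tGENE3\t\tGO:0005634\tGO_REF:0000002\tIEA",
]
print(unreferenced_experimental(rows))  # [('Q00002', 'GO:0006955', 'IDA')]
```

Electronic (IEA) annotations are deliberately ignored, since they are not expected to cite a paper.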
In this study we have re-annotated all gene products associated with probesets on the Affymetrix chicken genome array using GO standards. However, the GO describes normal gene and gene product function, so information about which genes are associated with significant diseases and disorders, and which are known drug targets, is not captured by the GO. This type of information would clearly benefit researchers modeling diseases. We therefore used Ingenuity Pathway Analysis to identify the significant diseases, disorders, drug targets and gene types represented on the Affymetrix chicken genome array. Furthermore, we demonstrate how other microarrays can be annotated using the annotations from the Affymetrix chicken array.
Initial assessment of structural and functional annotation of chicken array
Table 1. Initial assessment of structural and functional annotation of chicken arrays
Name of microarray:
- ARK-Genomics G. gallus 20 K v1.0 (GPL5480)
- ARK-Genomics G. gallus 13 K v4.0 (GPL5673)
- Affymetrix GeneChip® chicken genome array
- Chicken 44 K custom Agilent microarray (GPL4993)
- Arizona Gallus gallus 20.7 K Oligo Array v1.0 (GPL6049)
- FHCRC Chicken 13 K Array (GPL1836)
- Custom 4 × 2 K miRNA microarray (#4166) (GPL7472)
- Chick Pineal 2004 (GPL1289)
- DEL-MAR 14 K Integrated Systems (GPL1731)
- Avian Innate Immunity Microarray (AIIM) (GPL1461)
- UD 7.4 K Metabolic/Somatic Systems (GPL1737)
- UD_Liver_3.2 K (GPL1742)
- Chicken_Neuroendocrine_System_5 K (GPL1744)
Functional annotation and GO annotation quality
Biological functions represented on the Affymetrix chicken GeneChip® array (P-value ranges by category)
Diseases and Disorders:
- 2.43E-53 – 6.86E-08
- 4.94E-52 – 6.76E-08
- 6.69E-37 – 6.69E-37
- 3.18E-36 – 6.17E-08
- 6.01E-30 – 6.17E-08
Molecular and Cellular Functions:
- 1.19E-55 – 6.51E-08
- Cellular growth and proliferation: 6.66E-42 – 4.87E-08
- 1.00E-35 – 5.68E-08
- 2.82E-35 – 2.21E-08
- 1.89E-32 – 6.78E-08
Physiological System Development and Function:
- 5.95E-38 – 1.18E-12
- 6.07E-36 – 4.30E-08
- 5.40E-34 – 5.36E-08
- 2.33E-33 – 4.90E-08
- 2.03E-27 – 1.20E-08
Tool for array GO mapping
The major challenge facing microarray researchers is interpreting hundreds of differentially expressed genes in a biologically relevant context. The Gene Ontology (GO) Consortium provides a controlled vocabulary to annotate the biological knowledge associated with genes or gene products. To make functional interpretation of microarray datasets less challenging, array developers can associate their arrays with functional information.
However, most chicken arrays either have no associated GO information or do not follow the GO annotation standards. In this study we have re-annotated and improved the GO annotation of the Affymetrix chicken genome array to facilitate annotation of other chicken arrays and microarray experimental datasets. Further, we developed the Array GO Mapper (AGOM) tool to generate GO annotations for chicken arrays with no GO information or for microarray experimental datasets, and demonstrated its utility by annotating the Arizona chicken array, which had no associated GO information. By using AGOM, researchers will obtain not only functional information for their experimental datasets but also the GAQ score associated with each GO term retrieved. This will help researchers assess the quality of the annotations made to their datasets and track the improvement contributed by additional GO annotations as updates are released.
We also provided additional functional information that is not covered by the GO but is associated with the Affymetrix chicken genome array. This additional data broadens the ability of array users to model their datasets, for example infectious disease datasets. The information obtained on the diseases, disorders and known drug targets represented on this array may also inform future research in drug and therapy development.
The improved amount and quality of GO annotations for gene products represented on the Affymetrix chicken genome array will help researchers map their genes of interest to high-quality functional information using the AGOM tool. Existing chicken microarray studies can also use AGOM to enhance their functional annotation. Annotation of microarrays from other species will be included in the future. The most significant diseases and disorders represented on the chicken array correlate well with how the chicken is used as a biomedical model organism to study human disease and development. The identified gene types and drug targets allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies.
Initial assessment of structural and functional annotation of chicken array
We downloaded 12 chicken array platforms deposited in the NCBI Gene Expression Omnibus (GEO: http://www.ncbi.nlm.nih.gov/geo/) database (Table 1). Affymetrix GeneChip chicken genome array annotations were downloaded from the Affymetrix website (http://www.affymetrix.com). For each array we assessed whether the printed transcripts were structurally linked to any gene, EST or protein. The presence of Gene Ontology (GO) annotation was used as the criterion for the initial assessment of functional annotation. The purpose of this assessment was to determine which whole chicken genome arrays could be used as a reference for structural and functional annotation of other arrays or experimental datasets. The Affymetrix chicken genome array was the only one that had been comprehensively annotated both structurally and functionally, and it was selected for further improvement.
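The assessment above boils down to two proportions per array: how many probes carry any structural link, and how many carry GO. A minimal sketch, assuming each array has already been read into a list of dictionaries (one per probe); the column names `GB_ACC`, `ENTREZ` and `GO` and the function name are hypothetical.

```python
def annotation_coverage(rows, id_columns, go_column=None):
    """Return (fraction of probes linked to any identifier, fraction with GO annotation)."""
    if not rows:
        return 0.0, 0.0
    # A probe counts as structurally linked if any identifier column is non-empty.
    linked = sum(1 for row in rows if any((row.get(c) or "").strip() for c in id_columns))
    with_go = 0
    if go_column is not None:
        with_go = sum(1 for row in rows if (row.get(go_column) or "").strip())
    return linked / len(rows), with_go / len(rows)


probes = [
    {"ID": "p1", "GB_ACC": "AB000001", "GO": "GO:0005515"},
    {"ID": "p2", "GB_ACC": "", "GO": ""},
    {"ID": "p3", "GB_ACC": "", "ENTREZ": "42"},
    {"ID": "p4"},
]
print(annotation_coverage(probes, ["GB_ACC", "ENTREZ"], "GO"))  # (0.5, 0.25)
```

An array with no GO column at all (as for most platforms in Table 1) simply reports a GO fraction of 0.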
Further assessment and improvement of the GO annotation of the Affymetrix chicken array was necessary, because the GO annotations associated with its probesets do not show detailed supporting information. For example, where experimental evidence codes are shown, no literature is referenced to support the annotation. We therefore re-annotated all gene products linked to the probesets on this array, regardless of their original annotations, to provide array users with high-quality, standards-compliant functional information. We first used GORetriever to download chicken GO annotations for all UniProtKB accessions linked to the probesets. Further annotations for linked gene products with RefSeq accessions and Ensembl gene identifiers were obtained from the AgBase community databases and the Gene Ontology Annotation (GOA) project using an in-house Perl script (GOMapper.pl). Additional GO annotations were retrieved using an in-house tool (ISO.pl) that transfers experimental GO annotations from human, mouse and rat proteins to their 1:1 chicken orthologs. The improved GO annotations will be made publicly available via AgBase.
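The ortholog-based step can be illustrated in outline. This is not ISO.pl; it is a sketch under stated assumptions: ortholog pairs are given as (mammalian protein, chicken protein) tuples, only pairs that are one-to-one in both directions are kept, and only annotations with experimental evidence codes are transferred, re-labelled with the GO evidence code ISO (inferred from sequence orthology) and with the source protein recorded in the with/from field. All identifiers and function names are hypothetical.

```python
from collections import Counter

EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def one_to_one(pairs):
    """Keep only (source, chicken) ortholog pairs that are unique in both directions."""
    source_counts = Counter(src for src, _ in pairs)
    chicken_counts = Counter(chk for _, chk in pairs)
    return {src: chk for src, chk in pairs
            if source_counts[src] == 1 and chicken_counts[chk] == 1}


def transfer_iso(annotations, ortholog_map):
    """Transfer experimental annotations to 1:1 chicken orthologs with evidence code ISO.

    annotations: iterable of (source protein, GO ID, evidence code, reference).
    Returns sorted (chicken protein, GO ID, "ISO", with/from) tuples.
    """
    transferred = set()
    for protein, go_id, evidence, _reference in annotations:
        chicken = ortholog_map.get(protein)
        if chicken is None or evidence not in EXPERIMENTAL_CODES:
            continue
        # Record the source protein in the with/from field of the new annotation.
        transferred.add((chicken, go_id, "ISO", protein))
    return sorted(transferred)


pairs = [("HUMAN_P1", "CHICK_C1"), ("HUMAN_P2", "CHICK_C2"), ("HUMAN_P2", "CHICK_C3")]
mapping = one_to_one(pairs)  # HUMAN_P2 has two chicken orthologs, so it is dropped
annotations = [
    ("HUMAN_P1", "GO:0005515", "IPI", "PMID:1"),
    ("HUMAN_P1", "GO:0005634", "IEA", "GO_REF:0000002"),
    ("HUMAN_P2", "GO:0006955", "IDA", "PMID:2"),
]
print(transfer_iso(annotations, mapping))  # [('CHICK_C1', 'GO:0005515', 'ISO', 'HUMAN_P1')]
```

Restricting the transfer to 1:1 orthologs and experimental evidence avoids propagating electronic annotations or ambiguous paralog assignments.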
Additional functional information
In addition to the molecular function, biological process and cellular component annotations provided via the GO, other functional information is also useful for researchers who want to assess the type of biological information represented by the transcripts printed on an array. For example, researchers will benefit from knowing which genes on the array are associated with diseases and disorders and which are known drug targets. We used Ingenuity Pathways Analysis (IPA) software to determine known drug targets and significant diseases and disorders. Fisher's exact test was used to calculate a P-value giving the probability that the biological functions, diseases or disorders assigned to the array dataset were due to chance alone.
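IPA's internal implementation is not described here, but the statistic itself can be sketched. Assuming a right-tailed (over-representation) test, the P-value for a function with k annotated genes among the n dataset genes, when K of the N reference genes carry that annotation, is the upper tail of the hypergeometric distribution: P = sum over i from k to min(n, K) of C(K, i) C(N − K, n − i) / C(N, n). The function name and argument names below are hypothetical.

```python
from math import comb


def fisher_right_tail(hits, sample_size, annotated, population):
    """One-sided (over-representation) Fisher's exact test P-value.

    hits: dataset genes annotated to the function (k)
    sample_size: genes in the dataset (n)
    annotated: reference genes annotated to the function (K)
    population: genes in the reference set (N)
    """
    if not (0 <= hits <= sample_size <= population and 0 <= annotated <= population):
        raise ValueError("inconsistent counts")
    upper = min(sample_size, annotated)
    total = comb(population, sample_size)
    # Sum the hypergeometric probabilities of observing `hits` or more annotated genes.
    tail = sum(comb(annotated, i) * comb(population - annotated, sample_size - i)
               for i in range(hits, upper + 1))
    return tail / total


print(fisher_right_tail(2, 2, 2, 4))  # 0.16666666666666666
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients even for array-sized gene counts; only the final division produces a float.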
Assessment of GO annotation quality (GAQ)
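The GAQ score combines how specific an annotated GO term is with how strong its supporting evidence is. As a minimal sketch, assuming that the score of a single annotation is the term's depth multiplied by its evidence-code rank and that a gene product's score is the sum over its annotations, the calculation looks like this. The rank values, depths and function name are hypothetical placeholders, not the published ranking.

```python
from collections import defaultdict

# Hypothetical evidence-code ranks for illustration only; replace them with the
# ranking used by the published GAQ method.
EVIDENCE_RANK = {"IDA": 5, "IMP": 5, "IPI": 4, "ISO": 3, "ISS": 2, "IEA": 1, "ND": 0}


def gaq_scores(annotations, term_depth, ranks=EVIDENCE_RANK):
    """Score annotations as depth x evidence rank and total them per gene product.

    annotations: iterable of (gene product, GO ID, evidence code).
    term_depth: dict mapping GO ID -> depth of the term in the ontology.
    """
    per_term = []
    per_product = defaultdict(int)
    for product, go_id, evidence in annotations:
        score = term_depth[go_id] * ranks.get(evidence, 0)  # unknown codes score 0
        per_term.append((product, go_id, evidence, score))
        per_product[product] += score
    return per_term, dict(per_product)


terms, totals = gaq_scores(
    [("GENE1", "GO:0005515", "IDA"), ("GENE1", "GO:0005634", "IEA"), ("GENE2", "GO:0005515", "ISS")],
    {"GO:0005515": 6, "GO:0005634": 3},  # hypothetical depths
)
print(totals)  # {'GENE1': 33, 'GENE2': 12}
```

Keeping the per-term scores alongside the totals mirrors the idea of reporting depth, evidence rank and GAQ score for each individual GO term.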
Development of Array GO Mapper (AGOM)
AGOM was developed to assign GO annotations to chicken arrays and chicken microarray experimental datasets using the improved Affymetrix GO annotations generated in this work. The tool is written in Perl and runs on both Windows and Linux. It requires a tab-delimited input file containing the cross-references of the microarray dataset for which GO annotations are searched. The improved Affymetrix GO data file serves as the database that is searched; it contains 6 cross-reference identifier types, which facilitate mapping between arrays and experimental datasets. AGOM works with any type of array (whole genome or specific array platform) and with any experimental dataset that shares at least one identifier type with the Affymetrix data. The gene associations are presented in 16 columns according to GO standards (Additional file 3). The depth of each GO term, its evidence code rank and the GAQ score of each individual GO term associated with the Affymetrix GO data are given in the last 3 columns of the file.
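AGOM itself is a Perl tool; the Python sketch below only illustrates the general idea of using the annotation file as a lookup database. It assumes a tab-delimited file with a header row, indexes every record under each non-empty cross-reference identifier, and assumes that a cell may hold several identifiers separated by "|". The column names (`GenBank`, `EntrezGene`, `GO_ID`, `Evidence`) and the function name are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_index(tsv_text, id_columns):
    """Map (identifier type, identifier) -> list of annotation records (dicts)."""
    index = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        for column in id_columns:
            # A cell may hold several identifiers separated by "|" (an assumption).
            for value in (record.get(column) or "").split("|"):
                value = value.strip()
                if value:
                    index[(column, value)].append(record)
    return dict(index)  # a plain dict, so lookups never create empty entries


database = (
    "GenBank\tEntrezGene\tGO_ID\tEvidence\n"
    "AB000001\t42\tGO:0005515\tIPI\n"
    "AB000002|AB000003\t\tGO:0006955\tISO\n"
)
index = build_index(database, ["GenBank", "EntrezGene"])
print(index[("GenBank", "AB000003")][0]["GO_ID"])  # GO:0006955
```

Keying on the identifier type as well as the value keeps, for example, a numeric Entrez Gene ID from colliding with an identically written identifier of another type.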
We demonstrated AGOM by retrieving GO annotations for the Arizona chicken array (GPL6049) from the improved Affymetrix chicken array GO data. The Arizona chicken array was chosen because it has no existing GO annotations associated with its gene products (Table 1). In addition, the Arizona array probes are linked to a variety of identifiers (GenBank accession, Entrez Gene ID and Ensembl ID) that can be used to search the Affymetrix GO data, whereas most other arrays contain only GenBank accessions (Additional file 3). In this study, the GenBank accessions, Entrez Gene IDs and Ensembl IDs linked to the Arizona array were searched against the improved Affymetrix GO annotations to retrieve the corresponding GO records. The output of the search includes the Arizona array identifiers in the first 5 columns: Oligo_ID (unique ID), GenBank accession, Entrez Gene ID, Ensembl ID and array spot number. When a match is found, the corresponding GO information is added to a tab-delimited output file.
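The search described above is essentially a join on whichever identifiers a probe carries. A minimal sketch, not the AGOM implementation: it assumes the GO data have already been indexed by (identifier type, identifier), with each GO record stored as a tuple of output fields, and it writes the five array columns followed by the matching GO fields. The short column names, identifier-type labels and function name are hypothetical.

```python
# Hypothetical short names for the five Arizona array identifier columns.
ARRAY_COLUMNS = ["Oligo_ID", "GenBank", "Entrez", "Ensembl", "Spot"]


def annotate_array(array_rows, index, column_to_type):
    """Join array rows to GO records through any shared identifier.

    array_rows: list of dicts keyed by ARRAY_COLUMNS.
    index: dict mapping (identifier type, identifier) -> list of GO records (tuples).
    column_to_type: which array column is searched as which identifier type.
    """
    output = []
    for row in array_rows:
        prefix = [row.get(column) or "" for column in ARRAY_COLUMNS]
        seen = set()  # avoid repeating a GO record matched via two identifiers
        for column, id_type in column_to_type.items():
            value = (row.get(column) or "").strip()
            if not value:
                continue
            for go_record in index.get((id_type, value), []):
                if go_record not in seen:
                    seen.add(go_record)
                    output.append(prefix + list(go_record))
    return output


index = {
    ("GenBank", "AB000001"): [("UniProtKB", "Q00001", "GO:0005515", "IPI")],
    ("EntrezGene", "42"): [("UniProtKB", "Q00001", "GO:0005515", "IPI"),
                           ("UniProtKB", "Q00001", "GO:0005634", "ISO")],
}
rows = [
    {"Oligo_ID": "oligo_1", "GenBank": "AB000001", "Entrez": "42", "Ensembl": "", "Spot": "1"},
    {"Oligo_ID": "oligo_2", "GenBank": "ZZ999999", "Entrez": "", "Ensembl": "", "Spot": "2"},
]
mapping = {"GenBank": "GenBank", "Entrez": "EntrezGene", "Ensembl": "Ensembl"}
for line in annotate_array(rows, index, mapping):
    print("\t".join(line))
```

Here `oligo_1` matches through both its GenBank accession and its Entrez Gene ID but each GO record is written once, while `oligo_2` has no match and produces no output line.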
AGOM is available via AgBase (http://www.agbase.msstate.edu/; see under Array annotation), where users can use the tool directly online or download it as a standalone program. When using the tool online, users are given options to retrieve any data associated with the Affymetrix chicken array (Additional file 3). The script is also available upon request, and advice is available by e-mail.
The project was supported by the National Research Initiative of the USDA Cooperative State Research, Education and Extension Service, grant number MISV-329140.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 11, 2009: Proceedings of the Sixth Annual MCBIOS Conference. Transformational Bioinformatics: Delivering Value from Genomes. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S11.
- Heidari M, Huebner M, Kireev D, Silva RF: Transcriptional profiling of Marek's disease virus genes during cytolytic and latent infection. Virus Genes 2008, 36(2):383–392. 10.1007/s11262-008-0203-7
- Sarson AJ, Read LR, Haghighi HR, Lambourne MD, Brisbin JT, Zhou H, Sharif S: Construction of a microarray specific to the chicken immune system: profiling gene expression in B cells after lipopolysaccharide stimulation. Can J Vet Res 2007, 71(2):108–118.
- Smith J, Speed D, Hocking PM, Talbot RT, Degen WG, Schijns VE, Glass EJ, Burt DW: Development of a chicken 5 K microarray targeted towards immune function. BMC Genomics 2006, 7:49. 10.1186/1471-2164-7-49
- Masker K, Golden A, Gaffney CJ, Mazack V, Schwindinger WF, Zhang W, Wang LH, Carey DJ, Sudol M: Transcriptional profile of Rous Sarcoma Virus transformed chicken embryo fibroblasts reveals new signaling targets of viral-src. Virology 2007, 364(1):10–20. 10.1016/j.virol.2007.03.026
- Clark TA, Schweitzer AC, Chen TX, Staples MK, Lu G, Wang H, Williams A, Blume JE: Discovery of tissue-specific exons using comprehensive human exon microarrays. Genome Biol 2007, 8(4):R64. 10.1186/gb-2007-8-4-r64
- Grigoryev DN, Ma SF, Shimoda LA, Johns RA, Lee B, Garcia JG: Exon-based mapping of microarray probes: recovering differential gene expression signal in underpowered hypoxia experiment. Mol Cell Probes 2007, 21(2):134–139. 10.1016/j.mcp.2006.09.002
- Xing Y, Kapur K, Wong WH: Probe selection and expression index computation of Affymetrix Exon Arrays. PLoS ONE 2006, 1:e88. 10.1371/journal.pone.0000088
- Cao W, Epstein C, Liu H, DeLoughery C, Ge N, Lin J, Diao R, Cao H, Long F, Zhang X, et al.: Comparing gene discovery from Affymetrix GeneChip microarrays and Clontech PCR-select cDNA subtraction: a case study. BMC Genomics 2004, 5(1):26. 10.1186/1471-2164-5-26
- Butcher LM, Meaburn E, Liu L, Fernandes C, Hill L, Al-Chalabi A, Plomin R, Schalkwyk L, Craig IW: Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav Genet 2004, 34(5):549–555. 10.1023/B:BEGE.0000038493.26202.d3
- Meaburn E, Butcher LM, Liu L, Fernandes C, Hansen V, Al-Chalabi A, Plomin R, Craig I, Schalkwyk LC: Genotyping DNA pools on microarrays: tackling the QTL problem of large samples and large numbers of SNPs. BMC Genomics 2005, 6(1):52. 10.1186/1471-2164-6-52
- Corless CE, Kaczmarski E, Borrow R, Guiver M: Molecular characterization of Neisseria meningitidis isolates using a resequencing DNA microarray. J Mol Diagn 2008, 10(3):265–271. 10.2353/jmoldx.2008.070152
- Lebet T, Chiles R, Hsu AP, Mansfield ES, Warrington JA, Puck JM: Mutations causing severe combined immunodeficiency: detection with a custom resequencing microarray. Genet Med 2008, 10(8):575–585.
- Chung HR, Kostka D, Vingron M: A physical model for tiling array analysis. Bioinformatics 2007, 23(13):i80–86. 10.1093/bioinformatics/btm167
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, et al.: NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 2009, 37(Database issue):D885–890. 10.1093/nar/gkn764
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30(1):207–210. 10.1093/nar/30.1.207
- Zhu Y, Davis S, Stephens R, Meltzer PS, Chen Y: GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics 2008, 24(23):2798–2800. 10.1093/bioinformatics/btn520
- GeneChip® Arrays: Chicken Genome Array [http://www.affymetrix.com/technology/index.affx, http://www.affymetrix.com/products/arrays/specific/chicken.affx]
- Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA: NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 2003, 31(1):82–86. 10.1093/nar/gkg121
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32(Database issue):D258–261.
- Buza TJ, McCarthy FM, Wang N, Bridges SM, Burgess SC: Gene Ontology annotation quality analysis in model eukaryotes. Nucleic Acids Res 2008, 36(2):e12. 10.1093/nar/gkm1167
- Rhee SY, Wood V, Dolinski K, Draghici S: Use and misuse of the gene ontology annotations. Nat Rev Genet 2008, 9(7):509–515. 10.1038/nrg2363
- McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC: AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Res 2007, 35(Database issue):D599–603. 10.1093/nar/gkl936
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.