- Oral presentation
- Open Access
Public microarray repository semantic annotation with ontologies employing text mining and expression profile correlation
© Ruau et al; licensee BioMed Central Ltd 2008
- Published: 30 October 2008
- Microarray Experiment
- Gene Expression Omnibus
- Text Mining
- Unified Medical Language System
- Semantic Annotation
Gene Expression Omnibus (GEO) [1] is the largest public web repository of microarray experiments. GEO, like ArrayExpress and the Stanford MicroArray Database, describes microarray experiments in free text. This makes the data difficult to search and to link comprehensively to other knowledge resources. Text mining techniques applied to microarray experiment annotations are hampered by poor or ambiguous free-text descriptions, and consequently they leave some objects unlabeled. Previous work used the Unified Medical Language System (UMLS) [3] to organize GEO entries at the level of series (GSE) and data sets (GDS) [2]. However, GSE and GDS descriptions are often too broad, and annotation quality improves when the GEO samples (GSM) are annotated directly.

Here we report a novel approach to annotating GSM objects that combines text mining with global gene expression similarity. We hypothesize that if an unlabeled object and a labeled object have highly similar expression values, the biological material analyzed on the two microarrays is related. The class (annotation) of the labeled object can therefore help annotate the unlabeled one. By combining both types of information stored in microarray databases, our method annotates a higher percentage of objects.
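The two-step idea can be illustrated with a minimal sketch. It is not the authors' implementation. Three assumptions are made here. First, a toy synonym dictionary (`CONCEPT_DICT`, with placeholder concept IDs rather than real UMLS CUIs) stands in for ProMiner. Second, Pearson correlation distance 1 - r serves as the expression-similarity measure; the abstract does not state which metric was used. Third, an unlabeled sample receives the annotation of its nearest text-labeled neighbor only if that neighbor lies within the threshold `delta` (δ). All function names are hypothetical.

```python
import math
import re

# Hypothetical synonym dictionary (synonym -> concept ID). It stands in for the
# UMLS-derived dictionaries used by a tagger such as ProMiner; IDs are placeholders.
CONCEPT_DICT = {
    "liver": "CONCEPT_LIVER",
    "hepatocyte": "CONCEPT_LIVER",
    "brain": "CONCEPT_BRAIN",
    "cerebral cortex": "CONCEPT_BRAIN",
}


def tag_description(text, dictionary=CONCEPT_DICT):
    """Return the concept IDs whose synonyms occur as whole words in text."""
    lowered = text.lower()
    return {
        concept
        for synonym, concept in dictionary.items()
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered)
    }


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def annotate(samples, delta):
    """Annotate GSM samples by text mining, then by expression similarity.

    samples maps a GSM ID to (free-text description, expression values measured
    on one shared platform). An unlabeled sample inherits the concepts of its
    most correlated text-labeled sample if the distance 1 - r is at most delta.
    """
    labels = {gsm: tag_description(desc) for gsm, (desc, _) in samples.items()}
    sources = [gsm for gsm, concepts in labels.items() if concepts]
    result = {}
    for gsm, concepts in labels.items():
        if concepts:
            result[gsm] = (concepts, "text")
            continue
        best, best_dist = None, math.inf
        for src in sources:
            dist = 1.0 - pearson(samples[gsm][1], samples[src][1])
            if dist < best_dist:
                best, best_dist = src, dist
        if best is not None and best_dist <= delta:
            result[gsm] = (labels[best], "propagated")
        else:
            result[gsm] = (set(), "unlabeled")
    return result


# Toy GSM records (invented for illustration, not GEO data).
samples = {
    "GSM1": ("Primary hepatocyte culture", [1.0, 2.0, 3.0, 4.0]),
    "GSM2": ("Postmortem cerebral cortex", [4.0, 3.0, 2.0, 1.0]),
    "GSM3": ("Control sample, replicate 2", [1.1, 2.1, 2.9, 4.2]),
    "GSM4": ("Sample B", [1.0, 1.0, 1.0, 1.0]),
}
for gsm, (concepts, source) in sorted(annotate(samples, delta=0.1).items()):
    print(gsm, sorted(concepts), source)
# GSM1 ['CONCEPT_LIVER'] text
# GSM2 ['CONCEPT_BRAIN'] text
# GSM3 ['CONCEPT_LIVER'] propagated
# GSM4 [] unlabeled
```

In this sketch only text-labeled samples act as sources, so a propagated label is never propagated a second time. That choice keeps errors from chaining, at the cost of lower coverage.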
Table: GSM object annotation coverage. Columns: (a) GSM object number; (b) objects labeled by ProMiner; (c) raw data available / ProMiner labeled; (d) propagated annotation.
Propagating the class (annotation) from a labeled object to an unlabeled object works only if a labeled object lies within the δ range of the unlabeled one. The chance of propagation therefore increases with the number of available objects. We plan to improve coverage in two ways: by merging data from different microarray platforms with tools such as AILUN [7], and by adding a confidence score to the propagated annotations. Ultimately, the annotation process will be automated and the resulting database made freely available.
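The two planned extensions can be sketched as follows. This is a hypothetical illustration, not the authors' design. First, probe-level values are averaged per gene through a probe-to-gene map (`probe_to_gene`). The map is an assumed stand-in for the reannotation a tool such as AILUN provides; AILUN's actual output format is not reproduced here. This lets samples from different platforms be compared on their shared genes. Second, every labeled profile within distance δ (again assuming the 1 - r correlation distance) casts a vote. The fraction of in-range votes that agree with the majority label serves as a simple confidence score. The voting rule, `min_shared`, and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def collapse_to_genes(probe_values, probe_to_gene):
    """Average probe-level values per gene; probes missing from the map are dropped.

    probe_to_gene is a hypothetical probe -> gene ID mapping, standing in for
    the reannotation a tool such as AILUN provides.
    """
    sums, counts = {}, {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unmapped probe: cannot be matched across platforms
        sums[gene] = sums.get(gene, 0.0) + value
        counts[gene] = counts.get(gene, 0) + 1
    return {gene: sums[gene] / counts[gene] for gene in sums}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def propagate_with_confidence(query, labeled, delta, min_shared=3):
    """Vote among labeled gene-level profiles with distance 1 - r <= delta.

    query is a gene -> value dict; labeled is a list of (label, gene -> value).
    Returns (majority label, fraction of in-range votes that agree), or
    (None, 0.0) when no labeled profile lies within delta.
    """
    votes = []
    for label, profile in labeled:
        shared = sorted(set(query) & set(profile))
        if len(shared) < min_shared:
            continue  # too few common genes for a meaningful correlation
        r = pearson([query[g] for g in shared], [profile[g] for g in shared])
        if 1.0 - r <= delta:
            votes.append(label)
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


# Toy data on two hypothetical platforms (A and B), invented for illustration.
map_a = {"a1": "G1", "a2": "G1", "a3": "G2", "a4": "G3", "a5": "G4"}
map_b = {"b1": "G1", "b2": "G2", "b3": "G3", "b4": "G4"}
query = collapse_to_genes({"a1": 1.0, "a2": 3.0, "a3": 4.0, "a4": 6.0, "a5": 8.0}, map_a)
labeled = [
    ("liver", collapse_to_genes({"b1": 1.0, "b2": 2.0, "b3": 3.0, "b4": 4.0, "bx": 9.0}, map_b)),
    ("liver", collapse_to_genes({"b1": 2.0, "b2": 3.0, "b3": 4.0, "b4": 5.0}, map_b)),
    ("kidney", collapse_to_genes({"b1": 1.0, "b2": 2.5, "b3": 2.8, "b4": 4.1}, map_b)),
    ("brain", collapse_to_genes({"b1": 4.0, "b2": 3.0, "b3": 2.0, "b4": 1.0}, map_b)),
]
print(query)  # {'G1': 2.0, 'G2': 4.0, 'G3': 6.0, 'G4': 8.0}
print(propagate_with_confidence(query, labeled, delta=0.2))  # ('liver', 0.6666666666666666)
```

Each labeled object within δ contributes a vote. A larger pool of labeled objects therefore raises the chance that at least one falls in range, which matches the limitation noted above. The agreement fraction also separates unanimous assignments from contested ones.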
- Gene Expression Omnibus [http://www.ncbi.nlm.nih.gov/geo/]
- Butte AJ, Kohane IS: Creation and implications of a phenome-genome network. Nat Biotechnol 2006, 24:55–62. doi:10.1038/nbt1150
- Unified Medical Language System [http://www.nlm.nih.gov/research/umls/]
- Hanisch D, Fundel K, Mevissen HT, Zimmer R, Fluck J: ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics 2005, 6(Suppl 1):S14. doi:10.1186/1471-2105-6-S1-S14
- The Open Biomedical Ontologies [http://obofoundry.org/]
- Dasarathy BV: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press; 1991.
- Chen R, Li L, Butte AJ: AILUN: reannotating gene expression data automatically. Nat Methods 2007, 4:879. doi:10.1038/nmeth1107-879
This article is published under license to BioMed Central Ltd.