Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis
© Zhang et al.; licensee BioMed Central Ltd. 2013
Received: 12 February 2013
Accepted: 26 April 2013
Published: 29 April 2013
Reliability and reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The microarray quality control (MAQC) project launched by the US Food and Drug Administration (FDA) showed that the lists of DEGs generated by intra- and inter-platform comparisons can reach a high level of concordance, which depends mainly on the statistical criteria used for ranking and selecting DEGs. Generally, combining fold change ranking with a non-stringent p-value cutoff produces reproducible lists of DEGs. For further interpretation of gene expression data, statistical methods of gene enrichment analysis provide powerful tools for associating the DEGs with prior biological knowledge, e.g. Gene Ontology (GO) terms and pathways, and are widely used in genome-wide research. Although the DEG lists generated from the same compared conditions proved to be reliable, reproducible enrichment results are still crucial to the discovery of the underlying molecular mechanism differentiating the two conditions. Therefore, it is important to know whether the enrichment results remain reproducible when the lists of DEGs are generated by different statistical criteria from inter-laboratory and cross-platform comparisons. In our study, we used the MAQC data sets to systematically assess the intra- and inter-platform concordance of GO terms enriched by Gene Set Enrichment Analysis (GSEA) and LRpath.
In intra-platform comparisons, the percentage of overlapping enriched GO terms was as high as ~80% when the input lists of DEGs were generated by fold change ranking and Significance Analysis of Microarrays (SAM), whereas the percentages decreased by about 20% when the lists of DEGs were generated by fold change ranking and t-test, or by SAM and t-test. Similar results were found in inter-platform comparisons.
Our results demonstrated that highly concordant lists of DEGs ensure highly concordant enrichment results. Importantly, based on the lists of DEGs generated by a straightforward method of combining fold change ranking with a non-stringent p-value cutoff, enrichment analysis will produce reproducible enriched GO terms for biological interpretation.
Keywords: DNA microarray; Intra-/inter-platform comparison; Gene Ontology enrichment; Microarray quality control (MAQC)
Over the last decade, DNA microarray technology has developed rapidly and found wide application in many areas of biology and medical science. One of its important applications is to identify differentially expressed genes (DEGs) across groups of samples or distinct biological conditions of interest [1, 2]. Biological interpretation of microarray data requires reliable and reproducible lists of DEGs. The microarray quality control (MAQC) project launched by the US Food and Drug Administration (FDA) showed that the lists of DEGs generated by intra- and inter-platform comparisons reached a high level of concordance, which largely depended on the statistical criteria used for ranking and selecting DEGs [3, 4]. For further biological interpretation, statistical methods of gene enrichment analysis provide powerful tools for associating the DEGs with prior biological knowledge, e.g. Gene Ontology (GO) terms and signaling pathways. Enrichment analysis uses such prior knowledge, e.g. GO categories [5, 6] or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [7, 8], to investigate whether predefined gene sets differ significantly between two biological states (phenotypes).
Many methods for enrichment analysis have been developed to discover the biological meaning of DEGs. Mootha et al. first proposed an early version of Gene Set Enrichment Analysis (GSEA), which used an equally weighted version of the Kolmogorov-Smirnov statistic for gene set enrichment without considering the correlation between genes and the phenotype [9]. Subramanian et al. extended this procedure in 2005 and successfully used it for analyzing molecular profiling data [10]. Kim and Volsky developed parametric analysis of gene set enrichment (PAGE) as an improvement on GSEA and identified more statistically significant gene sets. PAGE required less computational effort than GSEA because it used the normal distribution for statistical inference [11]. Oron et al. improved GSEA by using a linear regression diagnostic technique and identified an important factor influencing gene expression in acute lymphoblastic leukemia datasets [12]. Ji et al. proposed a new method, FDR-FET, to improve the sensitivity and selectivity of GSEA [13]. Kim et al. used z-statistics and permutation tests to identify significantly enriched gene sets [14]. In addition, other statistical methods including significance analysis of function and expression (SAFE) [15], BayGO [16], ProbCD [17], EasyGO [18], ProfCom [19], GlobalANCOVA [20], GOEAST [21] and LRpath [22] were also developed for enrichment analysis.
Based on the methods mentioned above, researchers can subsequently investigate pathological mechanisms from microarray data sets. Xu et al. identified two enriched gene sets associated with the glycolysis-related pathway from microarray data of non-recurrent prostate cancer patients. This pathway was considered a candidate negative modulator of AKT1-induced proliferation [23]. De Windt et al. used GSEA to analyze Niemann-Pick type C (NPC) disease and discovered 27 up-regulated and 33 down-regulated pathways. These affected pathways were proposed as targets for subsequent drug discovery projects [24]. In breast cancer research, Murohashi et al. found that the genes expressed in CD24-/low/CD44+ cell populations fell into significantly enriched gene sets, which were associated with the pathways of transforming growth factor-β, tumor necrosis factor, and interferon response. The signaling pathways enriched by GSEA were suggested as a means to identify molecular targets and biomarkers for tumour-initiating-like cells [25].
However, when mapping the DEGs to the predefined gene sets, any difference between two DEG lists may lead to different outputs of the enrichment analysis. For the same compared conditions, reproducible enrichment results are still crucial to the discovery of the underlying molecular mechanism differentiating the two conditions. Therefore, it is important to know whether the enrichment results remain reproducible when the lists of DEGs are generated by different statistical criteria from different commercial microarray platforms. As a part of the MAQC project, Guo et al. investigated the intra-laboratory overlap of enriched KEGG pathways and GO terms with a rat toxicogenomics dataset and revealed that, compared to p-value ranking, the use of fold change ranking (with a p < 0.05 cutoff) for DEG selection gave more consistent enrichment results [26]. In the previous study by Manoli et al. [27], the concordance of pathways enriched by Fisher’s exact test, global test and GSEA was investigated based on microarray data from the Affymetrix platform and on DEGs generated by significance analysis of microarrays (SAM) and mixed model analysis (MMA). The pathways found by Fisher’s exact test and global test were more concordant than those found by GSEA in all conditions. In the current study, the microarray data were collected from the large data sets provided by the MAQC project [3, 4], which included three major microarray platforms: Affymetrix (AFX), Agilent Technologies (AG1) and Illumina (ILM). The lists of DEGs for enrichment analysis were generated by using three statistical criteria: fold change ranking with a non-stringent p-value cut-off calculated by t-test, significance analysis of microarrays (SAM) [28] and t-test. Finally, we systematically investigated the intra- and inter-platform concordance of GO terms enriched by two common methods of enrichment analysis, namely gene set enrichment analysis (GSEA) [10] and LRpath [22].
The results showed that, based on the DEG lists generated by SAM and FC, the levels of intra- and inter-platform concordance of GO terms were generally high and sufficient for further biological interpretation.
Intra-platform concordance of enrichment results
Table 1 Number of DEGs selected by FC, SAM and t-test at different cutoffs
Selection criteria: |log2FC| > 1 (p < 0.01); |log2FC| > 1 (p < 0.05); SAM (p < 0.01); SAM (p < 0.05); t-test (p < 0.01); t-test (p < 0.05)
The results of the comparisons among different DEG selection criteria for AG1 and ILM are shown in Additional file 1: Figures S5-S8. For AG1, when the GO terms were enriched by GSEA and LRpath, the percentages of overlapping GO terms for the comparison of fold change ranking (p < 0.05) with SAM (p < 0.05) were always higher than ~77%, whereas the percentages of overlapping GO terms for the remaining comparisons varied from ~62% to ~92% (Additional file 1: Figures S5 and S6). Similar results can be seen for ILM (Additional file 1: Figures S7 and S8). When all GO terms meeting the FDR < 0.25 criterion were selected, the range of the percentages for the comparisons among the three DEG selection criteria became wider than that shown in Additional file 1: Figures S5 and S6.
Inter-platform concordance of enrichment results
Reproducible enrichment results are essential for further biological interpretation of microarray data when using statistical methods for gene enrichment analysis. The MAQC project showed that the levels of DEG concordance in inter-laboratory and cross-platform comparisons were generally high [3, 4]. For the subsequent enrichment analysis, it is important to know whether the DEG lists generated by different statistical criteria from different microarray platforms can ensure satisfactory reproducibility of the enrichment results. In our current study, we systematically investigated the intra- and inter-platform concordance of GO terms enriched by GSEA and LRpath. Note that GSEA is of the 'subject-sampling' type while LRpath treats the genes as the sampling units. In this study, we focused only on the concordance of GO terms enriched by the same enrichment analysis method. The comparison of different enrichment analysis methods will be discussed in our further research.
As the MAQC project proposed, combining fold change ranking with a non-stringent p-value cut-off provides reproducible DEG lists, and the levels of concordance of enriched GO terms were also high for this straightforward combined method. In inter-site comparisons for AFX, AG1 and ILM, all the percentages of overlapping GO terms enriched by GSEA and LRpath were above ~90% and ~80%, respectively, when GO terms meeting the FDR < 0.25 criterion were selected. The concordance of GSEA results did not differ significantly among inter-site comparisons. For a given test site, however, the concordance of LRpath results differed markedly across the comparisons among the three DEG selection criteria. For the cross-platform comparisons at each test site, the percentages were around 80% (varying from ~75% to ~84%) when the GO terms were enriched with input DEGs selected by fold change ranking (|log2FC| > 1 (p < 0.05)). By contrast, enriched GO terms were less reproducible when the input DEGs were selected by t-test. Although all the percentages of overlapping GO terms enriched by GSEA in inter-site comparisons were still greater than ~87%, the percentages varied from ~69% to ~74% when the input DEGs were generated by t-test and the GO terms were enriched by LRpath. Similarly, in the cross-platform comparisons for t-test (p < 0.05), the percentages of overlapping GO terms were ~78% for GSEA but dropped to ~50% for LRpath. In addition, for AFX and AG1, the percentages of overlapping GO terms for the comparison of fold change (|log2FC| > 1 (p < 0.05)) versus SAM (p < 0.05) were always higher than ~76% when comparing the different DEG selection criteria at each test site. This suggests that the concordance of enrichment results based on the DEG selection methods of fold change ranking and SAM was generally high.
To some extent, the number of selected DEGs affected the percentages of overlapping GO terms. In inter-site comparisons, most of the percentages for SAM (p < 0.01) were higher than ~85% when the GO terms were enriched by GSEA, except for the comparisons of ILM_1 versus ILM_2 and ILM_2 versus ILM_3 (Additional file 1: Figure S2a and S2c), because the number of DEGs selected from test site 2 was 3,059, about half of the numbers selected from test sites 1 and 3 (Table 1). It is worth noting that there were large discrepancies between the two reference RNA samples, UHRR and HBRR, which were designed solely for investigating the capabilities and limitations of microarray technology and of the corresponding data analysis approaches. Therefore, the number of DEGs selected in a typical biological study, such as control versus treatment, would be smaller than that selected from UHRR versus HBRR, which may decrease the percentages of overlapping GO terms. In addition, when comparing GO semantic similarity with real biological data sets, the hierarchical structure of the GO graph should be considered [29].
In our study, we conducted the intra- and inter-platform comparisons with MAQC data sets and inspected the concordance of GO terms enriched by GSEA and LRpath when the inputted DEG lists were generated by different statistical criteria. The percentages of overlapping GO terms for fold change ranking (|log2FC| > 1 (p < 0.05)) were as high as ~90% in inter-site comparisons when GO terms meeting FDR < 0.25 criterion were selected, and were around 80% in cross-platform comparisons. Our results demonstrated that the DEG lists generated by a straightforward method combining fold change ranking with a non-stringent p-value cut-off can ensure the reproducibility of the enrichment results. In addition, the tool GSEA for enrichment analysis can always yield relatively stable enrichment results.
The MAQC data sets were downloaded from the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO series accession number: GSE5350). The two compared RNA samples were a Universal Human Reference RNA (UHRR, marked as sample A) from Stratagene and a Human Brain Reference RNA (HBRR, marked as sample B) from Ambion, which were used as the two compared biological conditions for selecting DEGs. Microarray data generated from three commercial platforms, Affymetrix (AFX), Agilent Technologies (AG1) and Illumina (ILM), were collected from the MAQC data sets and used in our study. Each microarray platform was tested at three independent test sites and each RNA sample was replicated five times at each test site. Due to distinct probe-design strategies and manufacturing processes, different microarray platforms target different subsets of the whole human transcriptome. For convenience of intra- and inter-platform comparison, we focused directly on the expression of 12,091 common probes, which were summarized by the MAQC project and represented 12,091 unique Entrez genes [3, 4]. The results shown below were based on these 12,091 “common” genes. All the gene expression data were log2-transformed.
Student's t-test is extensively used in gene expression analysis. It tests whether the difference between two groups of samples is significant. In our study, the p-values calculated by t-test were used directly for gene filtering without any multiple-testing correction. The DEGs were obtained by setting two criteria of p < 0.05 and p < 0.01, and were input to GSEA and LRpath for further analysis.
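As an illustration of the per-gene filtering step, the following minimal sketch computes a pooled-variance Student t statistic for one gene and an uncorrected two-sided p-value. It is not the code used in the study: the function names `student_t` and `two_sided_p` are hypothetical, the pooled-variance form is an assumption (the text does not say which t-test variant was applied), and the p-value is obtained by simple numerical integration so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / se, df


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the t density on [0, |t|]."""
    if t == 0:
        return 1.0
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(abs(t))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical log2 intensities of one gene: five replicates per sample.
sample_a = [10.1, 10.3, 9.9, 10.2, 10.0]
sample_b = [8.0, 8.2, 7.9, 8.1, 8.3]
t_stat, dof = student_t(sample_a, sample_b)
is_deg = two_sided_p(t_stat, dof) < 0.05  # uncorrected cutoff, as in the text
```

With five replicates per sample the test has 8 degrees of freedom. Very small p-values from this integration are only approximate, which is acceptable for a fixed cutoff such as 0.05 or 0.01.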
Fold Change (FC)
The fold change is a widely used method for selecting DEGs from gene expression data and indicates to what extent a gene is differentially expressed between two groups of samples. After filtering the genes with the non-stringent p-value cutoff (p < 0.05 or p < 0.01), which was calculated by t-test, the remaining genes were ranked by their fold changes (sample A/sample B). Note that for each test site, the expression intensity of a gene in sample A or sample B was the average of the intensities of the five replicates. Then, at each given cut-off, a list of DEGs was obtained for the subsequent analysis.
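The selection rule described above can be sketched as follows. This is an illustrative reading, not the authors' script: the function name `select_degs_by_fc` and the input layout (dictionaries keyed by gene) are hypothetical, the p-values are assumed to have been computed beforehand by t-test, and the replicate average is assumed to be taken on the log2 scale, so that the log2 fold change is the difference of the two means.

```python
from statistics import mean


def select_degs_by_fc(expr_a, expr_b, pvalues, p_cut=0.05, lfc_cut=1.0):
    """Filter genes by p-value, then keep and rank those with |log2FC| > lfc_cut.

    expr_a, expr_b: dict mapping gene -> list of log2 intensities (replicates)
    pvalues:        dict mapping gene -> t-test p-value (computed elsewhere)
    Returns a list of (gene, log2FC) sorted by decreasing |log2FC|.
    """
    selected = []
    for gene, p in pvalues.items():
        if p >= p_cut:
            continue  # fails the non-stringent p-value filter
        log2fc = mean(expr_a[gene]) - mean(expr_b[gene])  # log2(A/B)
        if abs(log2fc) > lfc_cut:
            selected.append((gene, log2fc))
    selected.sort(key=lambda item: abs(item[1]), reverse=True)
    return selected


# Hypothetical toy data with five replicates per sample.
expr_a = {"g1": [10.0] * 5, "g2": [8.0] * 5, "g3": [9.0] * 5, "g4": [6.0] * 5}
expr_b = {"g1": [8.0] * 5, "g2": [7.5] * 5, "g3": [12.0] * 5, "g4": [9.5] * 5}
pvalues = {"g1": 0.001, "g2": 0.001, "g3": 0.2, "g4": 0.01}
degs = select_degs_by_fc(expr_a, expr_b, pvalues)
```

In this toy example g2 is dropped because its fold change is too small and g3 because its p-value exceeds the cutoff, leaving g4 and g1 ranked by the magnitude of their fold change.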
Significance Analysis of Microarrays (SAM)
Significance analysis of microarrays (SAM) identifies whether a gene is significantly different between two groups of samples by combining a gene-specific t-test-like statistic, the d value, with a permutation procedure [28]. DEGs selected by SAM were calculated with the siggenes package in Bioconductor 2.10 within R 2.12.1.
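For reference, the d value for gene i in a two-class comparison can be written as below, following the definition given by Tusher et al. The subscripts A and B for the two samples and the replicate counts n_A and n_B are notation chosen here to match this article, and s_0 is the small positive constant that SAM adds to the denominator. Significance is then assessed by recomputing d_i under permutations of the sample labels.

```latex
d_i = \frac{\bar{x}_{A,i} - \bar{x}_{B,i}}{s_i + s_0},
\qquad
s_i = \sqrt{\frac{1/n_A + 1/n_B}{n_A + n_B - 2}
      \left[\sum_{m \in A}\left(x_{m,i} - \bar{x}_{A,i}\right)^2
          + \sum_{m \in B}\left(x_{m,i} - \bar{x}_{B,i}\right)^2\right]}
```

With s_0 = 0 the statistic reduces to the pooled-variance t statistic; a positive s_0 keeps genes with very small variance from producing extreme d values.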
Gene Set Enrichment Analysis (GSEA)
Gene Set Enrichment Analysis (GSEA) is a commonly used approach for enrichment analysis. An early version of this method was first proposed by Mootha et al. [9]. In 2005, Subramanian et al. extended this procedure by considering the correlation between each of the genes and the phenotypes [10]. In our research, we used the GSEA methodology described by Subramanian et al. The GSEA software can be downloaded from its web site (http://www.broadinstitute.org/gsea/index.jsp). In the calculation procedure, genes were first ranked by signal-to-noise ratio (SNR) or another metric generated by statistical methods, such as p-values generated by standard t-test and SAM, or log2-transformed fold change values. Then an enrichment score (ES) corresponding to a Kolmogorov-Smirnov statistic was calculated from the ranked gene list for each predefined gene set and subsequently normalized according to the size of the gene set. Finally, based on the normalized enrichment score, a permutation-based false discovery rate (FDR) was generated to indicate the significance of enriched gene sets. The GO terms associated with the significantly enriched gene sets were identified and used for further biological interpretation.
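The enrichment score step can be illustrated with a short sketch of the weighted running-sum statistic described by Subramanian et al. It is a simplified illustration rather than the GSEA software itself: the function name `enrichment_score` is hypothetical, the weighting exponent is assumed to be 1, and the normalization and permutation-based FDR steps are omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum enrichment score.

    ranked:   list of (gene, score) pairs sorted by decreasing score
    gene_set: set of genes belonging to the predefined gene set
    p:        weighting exponent (p = 0 gives the unweighted statistic)
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hit_weights = [abs(score) ** p for gene, score in ranked if gene in gene_set]
    n_miss = len(ranked) - len(hit_weights)
    norm = sum(hit_weights)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / norm  # step up at a hit
        else:
            running -= 1.0 / n_miss  # step down at a miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of five genes.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0), ("g5", -2.0)]
es_top = enrichment_score(ranked, {"g1", "g2"})
es_bottom = enrichment_score(ranked, {"g4", "g5"})
```

A gene set concentrated at the top of the ranking gives a score near +1 and one concentrated at the bottom gives a score near -1, which corresponds to enrichment in the two phenotypes.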
LRpath is a logistic regression-based method for identifying significantly enriched gene sets, which describes the log-odds of a gene belonging to a specific category as a linear function of the statistical significance of its expression level, e.g. the p-value generated by t-test [22]. The slope parameter in the logistic regression equation was used to decide whether a predefined gene set is significantly enriched with the input DEGs. The p-values from the tests of the predefined gene sets were then adjusted for multiple testing by controlling the FDR. The LRpath program was run in R 2.12.1 and can be downloaded from its web site (http://eh3.uc.edu/lrpath).
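The idea behind the slope test can be sketched with a two-parameter logistic regression. This is not the LRpath program: the function name `lrpath_like_test` is hypothetical, the predictor is assumed to be -log10 of the per-gene p-value, the fit uses plain Newton-Raphson iterations, and the slope is assessed with a two-sided Wald test.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lrpath_like_test(neg_log_p, in_set, n_iter=50, tol=1e-10):
    """Fit log-odds(in_set) = b0 + b1 * x and test the slope b1.

    neg_log_p: list of x values, here -log10(p) per gene (an assumption)
    in_set:    list of 0/1 flags, 1 if the gene belongs to the gene set
    Returns (slope, two-sided Wald p-value).
    """
    b0 = b1 = 0.0
    h00 = det = 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(neg_log_p, in_set):
            prob = _sigmoid(b0 + b1 * x)
            w = prob * (1.0 - prob)
            g0 += y - prob            # gradient w.r.t. intercept
            g1 += (y - prob) * x      # gradient w.r.t. slope
            h00 += w                  # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Variance of the slope from the inverse information matrix evaluated
    # at the last iteration (essentially the fitted values at convergence).
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical toy data: 4 genes in the set, 8 genes outside it.
x = [3.0, 2.5, 2.0, 0.5, 0.2, 0.4, 0.6, 1.0, 2.2, 0.3, 0.8, 0.1]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
slope, wald_p = lrpath_like_test(x, y)
```

A positive slope means that genes with more significant p-values are more likely to belong to the gene set. Applied to every predefined set, the resulting p-values would then be adjusted for multiple testing as described above.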
Percentage of overlapping GO terms
The percentage of overlapping GO terms was calculated as (O_i / T_i) × 100%, where O_i is the number of pairs of overlapping GO terms and T_i is the total number of pairs in the two lists within the first i pairs (i = 1, 2,…, N). For GSEA, N is the length of the combined list, which consists of the shorter of the two compared lists of GO terms for each of the ‘pos’ and ‘neg’ phenotypes. For LRpath, N equals the length of the shorter of the two compared lists.
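One way to read the definition above is sketched below for two ranked lists of GO terms. The function name `overlap_percentages` and the GO identifiers are hypothetical, and the sketch assumes that O_i counts the terms shared by the top i entries of both lists and that T_i equals i. The original computation may have differed in detail, for example in how the ‘pos’ and ‘neg’ lists from GSEA were combined.

```python
def overlap_percentages(list_a, list_b):
    """Percentage of overlapping terms within the top i entries, i = 1..N.

    list_a, list_b: ranked lists of GO term identifiers without duplicates
    N is the length of the shorter list.
    """
    n = min(len(list_a), len(list_b))
    top_a, top_b = set(), set()
    percentages = []
    for i in range(1, n + 1):
        top_a.add(list_a[i - 1])
        top_b.add(list_b[i - 1])
        o_i = len(top_a & top_b)  # terms shared by both top-i lists
        percentages.append(100.0 * o_i / i)  # assumes T_i = i
    return percentages


# Hypothetical ranked GO term lists from two test sites.
site_1 = ["GO:A", "GO:B", "GO:C", "GO:D"]
site_2 = ["GO:B", "GO:A", "GO:E", "GO:C", "GO:F"]
pog = overlap_percentages(site_1, site_2)
```

The last value of the returned list is the overlap over the full length N, which is the kind of figure quoted in the comparisons between test sites and platforms.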
Abbreviations
SAM: Significance analysis of microarrays
FDR: False discovery rate
DEGs: Differentially expressed genes
GSEA: Gene set enrichment analysis
This work was supported by the National Natural Science Foundation of China (No. 21205085, No. 21175095).
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science. 1999, 286 (5439): 531-537. 10.1126/science.286.5439.531.
- Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, Frierson HF, Hampton GM: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res. 2001, 61 (16): 5974-5978.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006, 24 (9): 1151-1161. 10.1038/nbt1239.
- Wen ZSZ, Liu J, Ning B, Guo L, Tong W, Shi L: The Microarray quality control (MAQC) project and cross-platform analysis of microarray data. Handbook of statistical bioinformatics. Chapter 9. Edited by: Lu HH, Scholkopf B, Zhao H. 2011, Berlin: Springer, 171-192.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-D261. 10.1093/nar/gkh036.
- Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999, 27 (1): 29-34. 10.1093/nar/27.1.29.
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34: D354-D357. 10.1093/nar/gkj102.
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E: PGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34 (3): 267-273. 10.1038/ng1180.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
- Kim SY, Volsky DJ: PAGE: Parametric analysis of gene set enrichment. BMC Bioinformatics. 2005, 6: 144-155. 10.1186/1471-2105-6-144.
- Oron AP, Jiang Z, Gentleman R: Gene set enrichment analysis using linear models and diagnostics. Bioinformatics. 2008, 24 (22): 2586-2591. 10.1093/bioinformatics/btn465.
- Ji R-R, Ott K-H, Yordanova R, Bruccoleri RE: FDR-FET: an optimizing gene set enrichment analysis method. AABC. 2011, 4: 37-42.
- Kim S-B, Yang S, Kim S-K, Kim SC, Woo HG, Volsky DJ, Kim S-Y, Chu I-S: GAzer: gene set analyzer. Bioinformatics. 2007, 23 (13): 1697-1699. 10.1093/bioinformatics/btm144.
- Barry WT, Nobel AB, Wright FA: Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics. 2005, 21 (9): 1943-1949. 10.1093/bioinformatics/bti260.
- Vencio RZN, Koide T, Gomes SL, Pereira CAD: BayGO: Bayesian analysis of ontology term enrichment in microarray data. BMC Bioinformatics. 2006, 7: 86-96. 10.1186/1471-2105-7-86.
- Vencio RZN, Shmulevich I: ProbCD: enrichment analysis accounting for categorization uncertainty. BMC Bioinformatics. 2007, 8: 383-389. 10.1186/1471-2105-8-383.
- Zhou X, Su Z: EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics. 2007, 8: 246-249. 10.1186/1471-2164-8-246.
- Antonov AV, Schmidt T, Wang Y, Mewes HW: ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic Acids Res. 2008, 36: W347-W351. 10.1093/nar/gkn239.
- Hummel M, Meister R, Mansmann U: GlobalANCOVA: exploration and assessment of gene group effects. Bioinformatics. 2008, 24 (1): 78-85. 10.1093/bioinformatics/btm531.
- Zheng Q, Wang X-J: GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008, 36: W358-W363. 10.1093/nar/gkn276.
- Sartor MA, Leikauf GD, Medvedovic M: LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data. Bioinformatics. 2009, 25 (2): 211-217. 10.1093/bioinformatics/btn592.
- Xu Q, Majumder PK, Ross K, Shim Y, Golub TR, Loda M, Sellers WR: Identification of prostate cancer modifier pathways using parental strain expression mapping. PNAS. 2007, 104 (45): 17771-17776. 10.1073/pnas.0708476104.
- De Windt A, Rai M, Kytomaki L, Thelen KM, Luetjohann D, Bernier L, Davignon J, Soini J, Pandolfo M, Laaksonen R: Gene set enrichment analyses revealed several affected pathways in Niemann-Pick disease type C fibroblasts. DNA Cell Biol. 2007, 26 (9): 665-671. 10.1089/dna.2006.0570.
- Murohashi M, Hinohara K, Kuroda M, Isagawa T, Tsuji S, Kobayashi S, Umezawa K, Tojo A, Aburatani H, Gotoh N: Gene set enrichment analysis provides insight into novel signalling pathways in breast cancer stem cells. Brit J Cancer. 2010, 102 (1): 206-212. 10.1038/sj.bjc.6605468.
- Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol. 2006, 24 (9): 1162-1169. 10.1038/nbt1238.
- Manoli T, Gretz N, Grone H-J, Kenzelmann M, Eils R, Brors B: Group testing for pathway analysis improves comparability of different microarray datasets. Bioinformatics. 2006, 22 (20): 2500-2506. 10.1093/bioinformatics/btl424.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. PNAS. 2001, 98 (18): 10515-
- Pesquita C, Faria D, Falcao AO, Lord P, Couto FM: Semantic Similarity in Biomedical Ontologies. PLoS Comput Biol. 2009, 5 (7): e1000443. 10.1371/journal.pcbi.1000443.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.