Table 4 Precision and recall

From: Discovery of novel biomarkers and phenotypes by semantic technologies

| Benchmark | Benchmark corpus | InfoCodex corpus | Precision | Recall |
| --- | --- | --- | --- | --- |
| I2E raw | PubMed | PubMed | (exact) | (exact) |
| | | | <1% obesity | 5% obesity |
| | | | 3-5% diabetes | 9-11% diabetes |
| | | | 3-7% MDOB | 7% MDOB |
| I2E normalized | PubMed | PubMed | (exact) | (exact) |
| | | | 3-7% MDOB | 3-7% MDOB |
| I2E manual | PubMed | PubMed | 1-5% obesity | 9-33% obesity |
| | | | 3-11% diabetes | 9-31% diabetes |
| | | | 3-26% MDOB | 4-15% MDOB |
| UMLS + GO + OMIM | UMLS + GO + OMIM | PubMed | 1-4% | 3-22% |
| | | | 1-8% (unary) | 4-35% (unary) |
| Thomson Reuters | Thomson Reuters | PubMed | 7-36% obesity | 36% obesity |
| | | | 18% DM2 | 9-49% DM2 |
| | | | 22% DM1 | 25% DI |
| TGI | TGI | PubMed | 0-5% obesity | (exact) 2.5% |
| | | | 0-4% diabetes | |
| | | | 1-14% MDOB | |
| I2E manual | PubMed | ClinicalTrials.gov | (preferred terms) 27-59% | (preferred terms) 3-7% |
| UMLS + GO + OMIM | UMLS + GO + OMIM | ClinicalTrials.gov | (preferred terms) 1-2% | (preferred terms) <1% |
| I2E manual | PubMed | Merck internal | (preferred terms) 8-14% | (preferred terms) 1-2% |
| UMLS + GO + OMIM | UMLS + GO + OMIM | Merck internal | (preferred terms) <1% | (preferred terms) <1% |

  1. Precision and recall of InfoCodex candidate biomarkers/phenotypes compared to various benchmarks. “(exact)” and “(preferred terms)” refer to sub-ranges according to the 2x2 matching matrix described in the text under “Methods - Precision/recall”. “MDOB” refers to the InfoCodex output subset containing references to the 27 Merck D&O biomarkers. “(unary)” means that all InfoCodex candidate biomarkers/phenotypes were lumped together across obesity, diabetes, and MDOB, in contrast to the default binary criterion for matching.
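
For readers unfamiliar with the two measures, the following is a minimal sketch of how precision and recall are conventionally computed when a candidate list is compared against a benchmark list. It assumes simple case-insensitive exact string matching, which is a simplification: the matching matrix actually used is described in the Methods and is not reproduced here. The function name `precision_recall` and the toy term lists are hypothetical and are not taken from the study.

```python
def precision_recall(candidates, benchmark):
    """Return (precision, recall) for two collections of term strings.

    Assumption: a candidate counts as a hit only if it equals a benchmark
    term after trimming whitespace and lowercasing (exact matching).
    """
    cand = {t.strip().lower() for t in candidates}
    bench = {t.strip().lower() for t in benchmark}
    hits = cand & bench
    # Precision: share of candidates that appear in the benchmark.
    precision = len(hits) / len(cand) if cand else 0.0
    # Recall: share of benchmark terms that were recovered as candidates.
    recall = len(hits) / len(bench) if bench else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
candidates = ["Leptin", "adiponectin", "ghrelin", "resistin"]
benchmark = ["leptin", "insulin", "adiponectin", "glucagon", "HbA1c"]

p, r = precision_recall(candidates, benchmark)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=50%, recall=40%
```

In this toy case two of the four candidates appear in the benchmark (precision 50%) and two of the five benchmark terms are recovered (recall 40%). Ranges such as those in the table would arise from repeating the calculation under looser or stricter matching rules.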