
Table 3 Scalability of MM term evaluation for chemical names when applied to a large corpus, in this case approximately 13.1 million MEDLINE records that contain approximately 7.4 million abstracts. Using these estimates, the overall precision for chemical term entry into the database is 82.7%.

From: A scalable machine-learning approach to recognize chemical names within large text databases

| Cutoff | Sample 1 FP | Sample 2 FP | Sample 3 FP | Avg. Precision | Stdev | # Records | Errors (est.) |
|---|---|---|---|---|---|---|---|
| 1–2 | 42% | 46% | 48% | 54.7% | 3.1% | 203,985 | 92,473 |
| 2–5 | 27% | 25% | 22% | 75.3% | 2.5% | 319,000 | 78,687 |
| 5–10 | 5% | 3% | 5% | 95.7% | 1.2% | 202,655 | 8,782 |
| 10–20 | 2% | 0% | 1% | 99.0% | 1.0% | 164,286 | 1,643 |
| 21+ | 0% | 0% | 0% | 100.0% | 0.0% | 162,728 | - |
| | Weighted Average | | | 82.7% | Total | 1,052,654 | 181,584 |
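
The weighted average quoted in the caption can be reproduced from the table rows. The following is a minimal sketch in Python, assuming that each bin's estimated error count is its record count multiplied by the mean false-positive (FP) rate of the three samples; the variable names are illustrative and do not come from the paper.

```python
# Rows from Table 3: (cutoff bin, FP % in each of the three samples, record count)
rows = [
    ("1–2",   (42, 46, 48), 203_985),
    ("2–5",   (27, 25, 22), 319_000),
    ("5–10",  (5, 3, 5),    202_655),
    ("10–20", (2, 0, 1),    164_286),
    ("21+",   (0, 0, 0),    162_728),
]

total_records = sum(n for _, _, n in rows)

# Assumption: estimated errors per bin = records × mean FP rate over the samples
total_errors = sum(n * (sum(fps) / len(fps)) / 100 for _, fps, n in rows)

# Overall precision is the share of records not estimated to be errors
overall_precision = 1 - total_errors / total_records

print(f"records: {total_records:,}")
print(f"estimated errors: {total_errors:,.0f}")
print(f"overall precision: {overall_precision:.1%}")
```

Under this assumption the sketch gives 1,052,654 records, about 181,584 estimated errors, and an overall precision of 82.7%, matching the totals in the table.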