Table 3 Scalability of MM term evaluation for chemical names when applied to a large corpus, in this case approximately 13.1 million MEDLINE records that contain approximately 7.4 million abstracts. Using these estimates, the overall precision for chemical term entry into the database is 82.7%.

From: A scalable machine-learning approach to recognize chemical names within large text databases

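| Cutoff | Sample 1 FP (%) | Sample 2 FP (%) | Sample 3 FP (%) | Avg. precision | Stdev | # Records | Errors (est.) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 1–2 | 42% | 46% | 48% | 54.7% | 3.1% | 203,985 | 92,473 |
| 2–5 | 27% | 25% | 22% | 75.3% | 2.5% | 319,000 | 78,687 |
| 5–10 | 5% | 3% | 5% | 95.7% | 1.2% | 202,655 | 8,782 |
| 10–20 | 2% | 0% | 1% | 99.0% | 1.0% | 164,286 | 1,643 |
| 21+ | 0% | 0% | 0% | 100.0% | 0.0% | 162,728 | – |
| Weighted average / Total | | | | 82.7% | | 1,052,654 | 181,584 |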
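The derived columns appear to follow from simple arithmetic on the three sample false-positive (FP) rates. Per cutoff bin, precision is 100% minus the mean FP rate. Stdev matches the sample (n − 1) standard deviation of the three FP rates. Errors (est.) is # Records multiplied by the mean FP rate. The weighted average is then 1 − (total estimated errors / total records), that is, 1 − 181,584.4 / 1,052,654 ≈ 82.7%. The sketch below is a reconstruction from the published numbers under that assumption, not the authors' code; the names `BINS` and `summarize` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical input: (cutoff bin, FP rates in % for samples 1-3, number of records),
# values copied from Table 3.
BINS = [
    ("1–2", [42, 46, 48], 203_985),
    ("2–5", [27, 25, 22], 319_000),
    ("5–10", [5, 3, 5], 202_655),
    ("10–20", [2, 0, 1], 164_286),
    ("21+", [0, 0, 0], 162_728),
]


def summarize(bins):
    """Recompute the derived Table 3 columns (assumed formulas, see lead-in)."""
    rows = []
    total_records = 0
    total_errors = 0.0
    for cutoff, fp_rates, n_records in bins:
        mean_fp = mean(fp_rates)              # mean false-positive rate, in %
        precision = 100.0 - mean_fp           # precision = 100% - mean FP rate
        sd = stdev(fp_rates)                  # sample standard deviation (n - 1)
        errors = n_records * mean_fp / 100.0  # estimated false positives in this bin
        rows.append((cutoff, precision, sd, n_records, errors))
        total_records += n_records
        total_errors += errors
    # Weighted precision: share of all records not estimated to be errors.
    weighted_precision = 100.0 * (1.0 - total_errors / total_records)
    return rows, total_records, total_errors, weighted_precision


rows, n_total, e_total, p_weighted = summarize(BINS)
for cutoff, precision, sd, n, e in rows:
    print(f"{cutoff:>6} {precision:6.1f}% {sd:4.1f}% {n:>9,} {round(e):>7,}")
print(f"weighted precision {p_weighted:.1f}%, {n_total:,} records, "
      f"{round(e_total):,} estimated errors")
```

Summing the unrounded per-bin error estimates (rather than the rounded column values, which add up to 181,585) reproduces the published total of 181,584.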