BMC Bioinformatics

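Table 3. Scalability of MM term evaluation for chemical names when applied to a large corpus, in this case approximately 13.1 million MEDLINE records containing approximately 7.4 million abstracts. For each cutoff range, the average precision is 100% minus the mean false-positive (FP) rate of the three samples, and the estimated number of errors is the record count multiplied by that mean FP rate. Weighting each range by its record count, the overall precision for chemical term entry into the database is 82.7%.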

From: A scalable machine-learning approach to recognize chemical names within large text databases

Cutoff	Sample 1 FP	Sample 2 FP	Sample 3 FP	Avg. Precision	Stdev	# Records	Errors (est.)
1–2	42%	46%	48%	54.7%	3.1%	203,985	92,473
2–5	27%	25%	22%	75.3%	2.5%	319,000	78,687
5–10	5%	3%	5%	95.7%	1.2%	202,655	8,782
10–20	2%	0%	1%	99.0%	1.0%	164,286	1,643
21+	0%	0%	0%	100.0%	0.0%	162,728	-
	Weighted Average			82.7%	Total	1,052,654	181,584
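
The derived columns can be recomputed from the three sample FP rates and the record counts. The following is a minimal sketch, not the authors' code: the names `ROWS`, `summarize`, and `weighted_precision` are hypothetical, the cutoff labels use plain hyphens, and it assumes that the reported standard deviation is the sample standard deviation of the three FP rates.

```python
from statistics import mean, stdev

# (cutoff range, FP rates in % for samples 1-3, number of records),
# copied from the table rows above.
ROWS = [
    ("1-2", (42, 46, 48), 203_985),
    ("2-5", (27, 25, 22), 319_000),
    ("5-10", (5, 3, 5), 202_655),
    ("10-20", (2, 0, 1), 164_286),
    ("21+", (0, 0, 0), 162_728),
]


def summarize(rows):
    """Return per-cutoff precision, stdev, and estimated errors."""
    results = []
    for cutoff, fp_rates, records in rows:
        fp = mean(fp_rates)  # mean false-positive rate, in percent
        results.append({
            "cutoff": cutoff,
            "precision": 100 - fp,           # average precision, in percent
            "stdev": stdev(fp_rates),        # sample standard deviation (assumed)
            "errors": records * fp / 100,    # estimated erroneous records
        })
    return results


def weighted_precision(rows):
    """Overall precision, weighting each cutoff range by its record count."""
    total_records = sum(records for _, _, records in rows)
    total_errors = sum(records * mean(fp) / 100 for _, fp, records in rows)
    return 100 * (1 - total_errors / total_records)


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(f'{row["cutoff"]:>6}  {row["precision"]:6.1f}%  '
              f'{row["stdev"]:4.1f}%  {row["errors"]:10,.0f}')
    print(f"Weighted average precision: {weighted_precision(ROWS):.1f}%")
```

Under these assumptions the sketch reproduces the tabulated values, including the 82.7% weighted average. The error total in the table (181,584) matches the sum of the unrounded per-row estimates, not the sum of the rounded figures shown in each row.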


ISSN: 1471-2105
