
Table 2 Precision and Average Precision (rank dependent) for top 50 / 200 / 1000 predictions for 4 methods (TFIDF, Relative Frequency, Termine, Text2Onto) in terms of coverage of LMO and relevant vocabulary.

From: Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

 

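LMO

| Top | Precision: TFIDF | Precision: Termine | Precision: Text2Onto | Precision: RelFreq | Average Precision: TFIDF | Average Precision: Termine | Average Precision: Text2Onto | Average Precision: RelFreq |
|---|---|---|---|---|---|---|---|---|
| 50 | 35% | 19% | 17% | 35% | 65% | 54% | 38% | 54% |
| 200 | 20% | 10% | 12% | 22% | 42% | 28% | 23% | 37% |
| 1000 | 8% | 4% | 5% | 8% | 21% | 12% | 12% | 20% |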
 

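LMO + Domain expert

| Top | Precision: TFIDF | Precision: Termine | Precision: Text2Onto | Precision: RelFreq | Average Precision: TFIDF | Average Precision: Termine | Average Precision: Text2Onto | Average Precision: RelFreq |
|---|---|---|---|---|---|---|---|---|
| 50 | 75% | 67% | 33% | 56% | 86% | 89% | 52% | 70% |
| 200 | 55% | 40% | 49% | 49% | 74% | 65% | 38% | 60% |
| 1000 | 29% | 20% | 14% | 28% | 51% | 40% | 25% | 45% |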

The key finding is that the top 1000 predictions reach an average precision of up to 51% (TFIDF) for terms that are in the LMO or considered good terms by a domain expert, implying that automated term recognition can play an important role in semi-automated ontology design.
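To make the two measures in the table concrete, the following is a minimal sketch of how precision and rank-dependent average precision over the top k predictions could be computed. It assumes the common information-retrieval definition of average precision (the mean of the precision values at each rank where a relevant term occurs); the source does not spell out its exact formula, so this may differ in detail from the authors' calculation. The function names and the example terms are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k predicted terms that are in the reference set."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in relevant)
    # Divide by the number of predictions actually inspected (at most k).
    return hits / len(top)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of the precision values at each rank (within top k) holding a relevant term.

    Assumption: standard rank-dependent average precision, normalised by the
    number of relevant terms found in the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranked output of a term recognition method and a reference
# vocabulary (for example, ontology terms plus expert-approved terms).
ranked_terms = ["lipoprotein", "patient", "cholesterol", "study", "apolipoprotein"]
gold_terms = {"lipoprotein", "cholesterol", "apolipoprotein"}

print(f"precision@5: {precision_at_k(ranked_terms, gold_terms, 5):.2f}")
print(f"average precision@5: {average_precision_at_k(ranked_terms, gold_terms, 5):.2f}")
```

Under this definition, average precision rewards methods that place relevant terms near the top of the ranking, which is consistent with the average precision values in the table being higher than the corresponding precision values.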