Figure 3 | BMC Bioinformatics

Figure 3

From: Automated vocabulary discovery for geo-parsing online epidemic intelligence

Percentage of artificial out-of-vocabulary words. Percentage of "hidden" words when the dictionary size is reduced according to the minimum frequency threshold λ. At each λ value, the first bar shows the percentage of out-of-vocabulary words among the words of the corpus, and the second bar shows the percentage of location words outside the vocabulary among the words tagged as location references using the HealthMap gazetteer. The dictionary size corresponding to each λ is reported in brackets.
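The two quantities plotted at each λ can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: that the dictionary is built from the corpus's own word counts, that a word is kept when its count is at least λ, and that percentages are taken over word occurrences rather than distinct words. The function name `oov_percentages` and the arguments `tokens`, `location_flags` and `min_freq` are hypothetical.

```python
from collections import Counter


def oov_percentages(tokens, location_flags, min_freq):
    """Return (dictionary size, % hidden words, % hidden location words).

    Assumptions (not stated in the caption): the dictionary is derived from
    the corpus itself, a word is kept if its count is >= min_freq, and the
    percentages are computed over word occurrences.
    """
    if len(tokens) != len(location_flags):
        raise ValueError("tokens and location_flags must have the same length")
    if not tokens:
        raise ValueError("the corpus must contain at least one word")

    # Reduced dictionary: words whose corpus frequency reaches the threshold.
    counts = Counter(tokens)
    vocab = {word for word, count in counts.items() if count >= min_freq}

    # A word occurrence is "hidden" when it falls outside the reduced dictionary.
    hidden = [word not in vocab for word in tokens]
    pct_all = 100.0 * sum(hidden) / len(tokens)

    # Restrict to occurrences tagged as location references (second bar).
    hidden_locations = [h for h, is_loc in zip(hidden, location_flags) if is_loc]
    if hidden_locations:
        pct_loc = 100.0 * sum(hidden_locations) / len(hidden_locations)
    else:
        pct_loc = 0.0

    return len(vocab), pct_all, pct_loc


# Toy corpus; the flags mark words a gazetteer would tag as locations.
toy_tokens = ["flu", "in", "Paris", "flu", "in", "Lyon"]
toy_flags = [False, False, True, False, False, True]
for lam in (1, 2):
    print(lam, oov_percentages(toy_tokens, toy_flags, lam))
```

Calling the function once per threshold gives one pair of bars and one bracketed dictionary size per λ. On real data, the location flags would come from a gazetteer lookup rather than a hand-written list.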
