Figure 3
From: Automated vocabulary discovery for geo-parsing online epidemic intelligence

Percentage of artificial out-of-vocabulary words. Percentage of "hidden" words when the dictionary size is reduced according to the minimum frequency threshold λ. At each value of λ, the first bar shows the proportion of out-of-vocabulary words among the words of the corpus, and the second bar shows the proportion of location words outside the vocabulary among the words tagged as location references using the HealthMap gazetteer. The dictionary size corresponding to each λ is reported in brackets.
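
The two quantities plotted at each threshold λ can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a word stays in the dictionary when its corpus frequency is at least λ, that percentages are computed over tokens rather than distinct words, and that location tagging is supplied as a list of boolean flags. The function name `oov_percentages` and its parameters are hypothetical.

```python
from collections import Counter


def oov_percentages(tokens, location_flags, min_freq):
    """Return (dictionary_size, pct_oov_all, pct_oov_locations).

    tokens:         list of corpus words (strings)
    location_flags: list of booleans, True where the token was tagged
                    as a location reference (hypothetical input format)
    min_freq:       the minimum frequency threshold (lambda in the caption)
    """
    counts = Counter(tokens)

    # Assumption: keep a word in the dictionary if it occurs >= min_freq times.
    dictionary = {word for word, count in counts.items() if count >= min_freq}

    # Tokens that become "hidden" (out of vocabulary) after the reduction.
    hidden_all = sum(1 for tok in tokens if tok not in dictionary)

    # Restrict to tokens tagged as locations, then count the hidden ones.
    location_tokens = [tok for tok, is_loc in zip(tokens, location_flags) if is_loc]
    hidden_loc = sum(1 for tok in location_tokens if tok not in dictionary)

    pct_all = 100.0 * hidden_all / len(tokens) if tokens else 0.0
    pct_loc = 100.0 * hidden_loc / len(location_tokens) if location_tokens else 0.0
    return len(dictionary), pct_all, pct_loc


# Toy corpus, invented purely for illustration.
tokens = ["flu", "in", "paris", "flu", "in", "lyon", "flu", "cases"]
flags = [False, False, True, False, False, True, False, False]

for lam in (1, 2):
    size, pct_all, pct_loc = oov_percentages(tokens, flags, lam)
    print(f"lambda={lam} (dictionary size {size}): "
          f"{pct_all:.1f}% of all words hidden, "
          f"{pct_loc:.1f}% of location words hidden")
```

In this toy corpus the location words are rare, so raising the threshold hides them faster than it hides ordinary words. This is the kind of gap the two bars are meant to compare.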