Figure 4
From: Automated vocabulary discovery for geo-parsing online epidemic intelligence

Percentage of natural out-of-vocabulary words. This is the percentage of unique words from a separate evaluation set (500 alerts, 11,184 unique words) that fall inside or outside the dictionary extracted from the training set. It is shown for training sets T0 (1,000 alerts), T1 (2,500 alerts) and T2 (5,000 alerts). The percentage of location words is computed with respect to the locations found by the commercial geo-parser (see sect.).
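The following minimal Python sketch illustrates the measurement described in the caption. It is not the authors' implementation. It assumes simple lowercase whitespace tokenization and treats the location words as a given set, for example the locations reported by a geo-parser. The function names `vocabulary`, `oov_percentages` and `location_oov_percentage` are hypothetical.

```python
from typing import Iterable, Set, Tuple


def vocabulary(texts: Iterable[str]) -> Set[str]:
    """Unique lowercase whitespace-separated tokens (a simplifying assumption)."""
    words: Set[str] = set()
    for text in texts:
        words.update(text.lower().split())
    return words


def oov_percentages(train_texts: Iterable[str], eval_texts: Iterable[str]) -> Tuple[float, float]:
    """Percent of unique evaluation words inside / outside the training dictionary."""
    train_vocab = vocabulary(train_texts)
    eval_vocab = vocabulary(eval_texts)
    total = len(eval_vocab)
    if total == 0:
        return 0.0, 0.0
    inside = len(eval_vocab & train_vocab)   # evaluation words seen in training
    outside = len(eval_vocab - train_vocab)  # natural out-of-vocabulary words
    return 100.0 * inside / total, 100.0 * outside / total


def location_oov_percentage(train_texts: Iterable[str], eval_locations: Iterable[str]) -> float:
    """Percent of location words (hypothetical input, e.g. geo-parser output)
    that are absent from the training dictionary."""
    train_vocab = vocabulary(train_texts)
    locations = {w.lower() for w in eval_locations}
    if not locations:
        return 0.0
    return 100.0 * len(locations - train_vocab) / len(locations)


if __name__ == "__main__":
    train = ["cholera outbreak in Haiti", "flu cases in Paris"]
    evaluation = ["cholera cases in Lima"]
    print(oov_percentages(train, evaluation))              # (75.0, 25.0)
    print(location_oov_percentage(train, {"Lima", "Paris"}))  # 50.0
```

In this toy example, "cholera", "cases" and "in" appear in the training dictionary but "lima" does not. That gives 75% inside and 25% outside. Increasing the training set from T0 to T2 would, in this framing, shrink the set of unseen words.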