Leveraging graph topology and semantic context for pharmacovigilance through twitter-streams

BMC Bioinformatics

Table 1 Text Features used for Tweet Classification

No.	Tweet Text Feature
1	Number of hashtags
2	Number of words indicating negation
3	Number of URLs
4	Number of pronouns
5	Number of drug entities
6	Number of effect entities
7	Bag-of-words text representation

Table 1 The textual features extracted from Twitter text for classification. Many spam and irrelevant tweets are designed to redirect the user to a website of the sender's choosing, so the presence of URLs in the tweet text is a strong indicator of spam. Spam tweets also often contain many hashtags in an attempt to exploit trending topics, which makes the number of hashtags a highly informative feature as well.
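As an illustration only, the seven features in Table 1 could be computed per tweet roughly as follows. This is a minimal sketch, not the authors' pipeline: the negation and pronoun word lists, the regular expressions, and the tiny drug and effect lexicons (standing in for a real entity recognizer) are all hypothetical assumptions chosen for the example, and the function name `extract_features` is likewise illustrative.

```python
import re
from collections import Counter

# Hypothetical word lists: the negation and pronoun lexicons used in the
# original work are not specified here, so these are small stand-ins.
NEGATION_WORDS = {"no", "not", "never", "none", "nothing", "neither", "nor",
                  "cannot", "without", "don't", "didn't", "doesn't", "isn't",
                  "wasn't", "can't", "won't"}
PRONOUNS = {"i", "me", "my", "mine", "you", "your", "he", "him", "his", "she",
            "her", "it", "its", "we", "us", "our", "they", "them", "their"}

# Hypothetical dictionaries standing in for drug and adverse-effect entity
# recognition; a real system would use a dedicated entity tagger.
DRUG_LEXICON = {"ibuprofen", "prozac", "adderall"}
EFFECT_LEXICON = {"nausea", "headache", "insomnia", "dizziness"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def extract_features(tweet: str) -> dict:
    """Compute the seven Table 1 features for one tweet (illustrative sketch)."""
    urls = URL_RE.findall(tweet)
    # Strip URLs before further processing so URL fragments are not
    # counted as hashtags or words.
    text = URL_RE.sub(" ", tweet).lower()
    hashtags = HASHTAG_RE.findall(text)
    # Hashtag words (e.g. "#prozac" -> "prozac") are also kept as tokens.
    tokens = TOKEN_RE.findall(text)
    return {
        "num_hashtags": len(hashtags),
        "num_negations": sum(t in NEGATION_WORDS for t in tokens),
        "num_urls": len(urls),
        "num_pronouns": sum(t in PRONOUNS for t in tokens),
        "num_drug_entities": sum(t in DRUG_LEXICON for t in tokens),
        "num_effect_entities": sum(t in EFFECT_LEXICON for t in tokens),
        "bag_of_words": Counter(tokens),
    }


if __name__ == "__main__":
    example = ("I took #prozac and now I can't sleep, no joke. "
               "insomnia again http://spam.example.com #pharma")
    feats = extract_features(example)
    for name, value in feats.items():
        if name != "bag_of_words":
            print(name, value)
```

For the example tweet, the sketch counts two hashtags, one URL, two negations ("can't", "no"), two pronouns, one drug mention and one effect mention. In a full system the bag-of-words counts would typically be mapped onto a fixed vocabulary before being concatenated with the six count features.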

ISSN: 1471-2105