| Corpus | Documents (2000 – 2005) | Bigrams, all (2000 – 2005) | Bigrams, Df > 20 (2000 – 2005) | Emerging bigrams, Df > 20 (2000 – 2005) | Documents (1990 – 1999) | Bigrams, all (1990 – 1999) | Bigrams, Df > 20 (1990 – 1999) |
|---|---|---|---|---|---|---|---|
| BI journal corpus | 5,968 | 12,992 | 701 | 172 | 2,728 | 10,777 | 119 |
| BI query corpus | 90,082 | 50,248 | 15,406 | 4,666 | 52,574 | 33,438 | 8,862 |
| MI journal corpus | 3,330 | 8,604 | 257 | 15 | 2,979 | 8,569 | 186 |
| MI query corpus | 21,609 | 34,432 | 2,284 | 60 | 27,510 | 44,043 | 2,463 |
- Four sets of Medline abstracts were analyzed (see text). All documents were categorized as either recent documents (2000 – 2005) or past documents (1990 – 1999). Bigrams were extracted from the noun phrases of all documents (for details see text). The analysis was restricted to bigrams with a document frequency (Df) of at least 20. In the set of recent documents we identified those bigrams that were not mentioned before 2000 ("emerging"). The BI journal corpus and the MI journal corpus are similar in terms of their member documents and the bigrams they contain.
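
The filtering described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_df` parameter, and the toy bigrams are hypothetical. It assumes that each document is already reduced to a list of noun-phrase bigrams, and that "not mentioned before 2000" can be approximated by absence from the 1990 – 1999 documents.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each bigram, the number of documents that contain it."""
    df = Counter()
    for bigrams in docs:
        df.update(set(bigrams))  # count each bigram at most once per document
    return df


def emerging_bigrams(recent_docs, past_docs, min_df=20):
    """Return bigrams that are frequent in recent_docs and absent from past_docs.

    min_df=20 follows the caption ("document frequency of at least 20");
    treating absence from past_docs as "not mentioned before 2000" is an
    assumption of this sketch.
    """
    recent_df = document_frequency(recent_docs)
    past_df = document_frequency(past_docs)
    frequent = {b for b, n in recent_df.items() if n >= min_df}
    return {b for b in frequent if b not in past_df}


# Toy data (hypothetical bigrams); a low threshold keeps the example small.
recent = [
    [("gene", "chip"), ("stem", "cell")],
    [("gene", "chip")],
    [("stem", "cell")],
]
past = [[("stem", "cell")]]

result = emerging_bigrams(recent, past, min_df=2)
```

With this toy data, `("stem", "cell")` is frequent in the recent set but also occurs in the past set, so only `("gene", "chip")` is kept as emerging.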