
Table 1 Number of bigrams in the BI and MI corpora.

From: SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

 

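| Corpus            | Documents (2000 – 2005) | Bigrams, all (2000 – 2005) | Bigrams, Df > 20 (2000 – 2005) | Emerging bigrams, Df > 20 (2000 – 2005) | Documents (1990 – 1999) | Bigrams, all (1990 – 1999) | Bigrams, Df > 20 (1990 – 1999) |
|-------------------|-------------------------|----------------------------|--------------------------------|-----------------------------------------|-------------------------|----------------------------|--------------------------------|
| BI journal corpus | 5,968                   | 12,992                     | 701                            | 172                                     | 2,728                   | 10,777                     | 119                            |
| BI query corpus   | 90,082                  | 50,248                     | 15,406                         | 4,666                                   | 52,574                  | 33,438                     | 8,862                          |
| MI journal corpus | 3,330                   | 8,604                      | 257                            | 15                                      | 2,979                   | 8,569                      | 186                            |
| MI query corpus   | 21,609                  | 34,432                     | 2,284                          | 60                                      | 27,510                  | 44,043                     | 2,463                          |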

  1. Four different sets of Medline abstracts were analyzed (see text). All documents were categorized as either recent (2000 – 2005) or past (1990 – 1999). Bigrams were extracted from the noun phrases of all documents (for details see text). The analysis was restricted to bigrams with a document frequency of at least 20. Among the recent documents, we identified those bigrams that were not mentioned before 2000 ("emerging"). The BI journal corpus and the MI journal corpus are similar in terms of the number of documents and of bigrams they contain.
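The filtering described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions: the function names (`bigrams`, `document_frequency`, `emerging_bigrams`) are hypothetical, adjacent word pairs stand in for the noun-phrase extraction described in the text, the threshold follows the footnote's wording ("at least 20", i.e. `>=`), and "not mentioned before 2000" is read as "absent from every past document".

```python
from collections import Counter


def bigrams(tokens):
    """Return the set of adjacent word pairs in one tokenized document.

    Assumption: plain adjacent pairs replace the noun-phrase-based
    extraction used in the article.
    """
    return set(zip(tokens, tokens[1:]))


def document_frequency(docs):
    """Count, for each bigram, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(bigrams(tokens))
    return df


def emerging_bigrams(recent_docs, past_docs, min_df=20):
    """Bigrams with Df >= min_df in recent documents that never occur in past ones."""
    recent_df = document_frequency(recent_docs)
    past_df = document_frequency(past_docs)
    frequent = {b for b, n in recent_df.items() if n >= min_df}
    return {b for b in frequent if b not in past_df}


# Toy data (invented for illustration); min_df is lowered to 2 to fit it.
recent = [
    ["gene", "expression", "data"],
    ["gene", "expression", "profile"],
    ["systems", "biology", "model"],
    ["systems", "biology", "approach"],
]
past = [["gene", "expression", "analysis"]]

print(emerging_bigrams(recent, past, min_df=2))
```

On this toy input, "gene expression" passes the frequency threshold but already occurs in the past documents, so only "systems biology" is reported as emerging.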