
Table 2 Overlap between the query and the journal corpora.

From: SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

 

| | BiQueryCorpus (142,656 docs) | MiQueryCorpus (49,119 docs) |
|---|---|---|
| BiJournalCorpus (8,696 docs) | 3,837 | 731 |
| MiJournalCorpus (6,309 docs) | 215 | 3,925 |

  1. The table gives the number of Medline abstracts shared between each journal corpus and each query corpus; all four corpora were extracted from Medline (see text). As expected, the BI journal corpus overlaps strongly with the BI query corpus, and the MI journal corpus with the MI query corpus. The intersections between the BI journal corpus and the MI query corpus, and between the MI journal corpus and the BI query corpus, are small. This shows that selecting corpora by journal title already yields documents that represent the BI domain as distinct from the MI domain. For the BI journal corpus, less than half of the documents (3,837 of 8,696, about 44%) are contained in the BI query corpus. This finding indicates that the query terms for the BI query corpus may still be too restrictive to cover the whole of BI domain knowledge.
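
The proportions behind the footnote can be checked with a short calculation. The following is a minimal sketch in Python that uses only the counts reported in Table 2. The names `journal_sizes`, `overlap`, and `coverage` are illustrative assumptions and are not taken from the article.

```python
# Counts copied from Table 2; all identifiers here are illustrative.
journal_sizes = {"BiJournalCorpus": 8696, "MiJournalCorpus": 6309}

# Number of abstracts shared by a (journal corpus, query corpus) pair.
overlap = {
    ("BiJournalCorpus", "BiQueryCorpus"): 3837,
    ("BiJournalCorpus", "MiQueryCorpus"): 731,
    ("MiJournalCorpus", "BiQueryCorpus"): 215,
    ("MiJournalCorpus", "MiQueryCorpus"): 3925,
}


def coverage(journal, query):
    """Fraction of a journal corpus that also appears in a query corpus."""
    return overlap[(journal, query)] / journal_sizes[journal]


for (journal, query), shared in overlap.items():
    share = coverage(journal, query)
    print(f"{journal} in {query}: {shared} docs ({share:.1%})")
```

Under these counts, the BI query corpus covers roughly 44% of the BI journal corpus, while the MI query corpus covers roughly 62% of the MI journal corpus. The cross-domain overlaps stay below 10% of either journal corpus.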