Table 2. Relevance of inter-dictionary ambiguities for mining MEDLINE (amb.: ambiguous; nb.: number). The column 'nb. found abstracts' gives the number of MEDLINE abstracts (out of a set of approximately 7 million) that contain at least one gene/protein name of the respective organisms. The values in all other columns are percentages of the value in 'nb. found abstracts'.

From: Gene and protein nomenclature in public databases

Organism pair  nb. found abstracts  % amb. abstracts  % amb. + unique synonym  % amb. + unique organism  % amb. + unique synonym or organism
human-mouse    2 761 987            60.5              23.1                     37.8                      46.5
human-rat      2 238 212            64.5              27.2                     43.5                      52.1
mouse-rat      2 532 682            58.2              24.2                     17.1                      33.7
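Because every column after 'nb. found abstracts' is a percentage of that column, the table can be turned back into approximate abstract counts. The Python sketch below does this. The names PairRow, to_count and summarize are hypothetical. It also assumes that the "amb. + ..." columns count subsets of the ambiguous abstracts, which is consistent with every such value being lower than '% amb. abstracts'. Under that assumption, '% amb. abstracts' minus '% amb. + unique synonym or organism' is the share of abstracts that remain ambiguous with neither cue. The table rounds percentages to one decimal place, so the derived counts are only accurate to roughly 0.05 % of the row total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRow:
    """One row of Table 2 (hypothetical container, not from the source)."""
    pair: str
    found: int             # abstracts with >= 1 gene/protein name of the pair
    pct_amb: float         # % amb. abstracts
    pct_amb_syn: float     # % amb. + unique synonym
    pct_amb_org: float     # % amb. + unique organism
    pct_amb_either: float  # % amb. + unique synonym or organism


# Values copied from Table 2.
ROWS = [
    PairRow("human-mouse", 2_761_987, 60.5, 23.1, 37.8, 46.5),
    PairRow("human-rat", 2_238_212, 64.5, 27.2, 43.5, 52.1),
    PairRow("mouse-rat", 2_532_682, 58.2, 24.2, 17.1, 33.7),
]


def to_count(total: int, pct: float) -> int:
    """Convert a percentage of `total` back to an approximate abstract count."""
    return round(total * pct / 100)


def summarize(row: PairRow) -> dict:
    # Assumption: ambiguous abstracts with a unique synonym or organism are a
    # subset of all ambiguous abstracts, so the difference has neither cue.
    residual_pct = round(row.pct_amb - row.pct_amb_either, 1)
    return {
        "pair": row.pair,
        "ambiguous": to_count(row.found, row.pct_amb),
        "resolvable": to_count(row.found, row.pct_amb_either),
        "residual_pct": residual_pct,
        "residual": to_count(row.found, residual_pct),
    }


if __name__ == "__main__":
    for row in ROWS:
        print(summarize(row))
```

For human-mouse, for example, this gives about 1 671 002 ambiguous abstracts, of which about 386 678 (14.0 %) carry neither a unique synonym nor a unique organism mention. The mouse-rat pair has the largest residual share (24.5 %).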