Table 2 Relevance of inter-dictionary ambiguities for mining MEDLINE (amb.: ambiguous). The column 'nb. found abstracts' gives the number of MEDLINE abstracts (out of a set of approximately 7 million) that contain at least one gene/protein name of the respective organisms. The values in all other columns are percentages of the value in the column 'nb. found abstracts'.

From: Gene and protein nomenclature in public databases

| Organism pair | nb. found abstracts | % amb. abstracts | % amb. + unique synonym | % amb. + unique organism | % amb. + unique synonym or organism |
|---|---:|---:|---:|---:|---:|
| human-mouse | 2 761 987 | 60.5 | 23.1 | 37.8 | 46.5 |
| human-rat | 2 238 212 | 64.5 | 27.2 | 43.5 | 52.1 |
| mouse-rat | 2 532 682 | 58.2 | 24.2 | 17.1 | 33.7 |
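Because every percentage in Table 2 is taken relative to 'nb. found abstracts' (not relative to the ambiguous abstracts), approximate absolute counts can be recovered by simple multiplication. The Python sketch below does this. It also runs a sanity check, which assumes that the last column is the union of the 'unique synonym' and 'unique organism' columns (our reading of the header, not something the table states). Under that assumption, the last column must be at least the larger of the two and at most their sum. As an ambiguous subset, it must also be no larger than '% amb. abstracts'. The names `TABLE2`, `absolute_counts` and `union_is_consistent` are hypothetical and used only for illustration.

```python
# Rows of Table 2: pair -> (nb. found abstracts,
#   [% amb., % amb.+unique synonym, % amb.+unique organism,
#    % amb.+unique synonym or organism])
TABLE2 = {
    "human-mouse": (2_761_987, [60.5, 23.1, 37.8, 46.5]),
    "human-rat": (2_238_212, [64.5, 27.2, 43.5, 52.1]),
    "mouse-rat": (2_532_682, [58.2, 24.2, 17.1, 33.7]),
}


def absolute_counts(found, percentages):
    """Convert percentages of 'found' back to approximate abstract counts.

    The table rounds to one decimal place, so each count is only accurate
    to about +/- 0.05% of 'found'.
    """
    return [round(found * p / 100) for p in percentages]


def union_is_consistent(pcts, tol=0.1):
    """Check the assumed union column against set-size bounds.

    Assumes the last value is |synonym OR organism| and that all three sets
    are subsets of the ambiguous abstracts. 'tol' absorbs rounding error.
    """
    amb, syn, org, syn_or_org = pcts
    lower = max(syn, org) - tol          # a union is at least its largest part
    upper = min(syn + org, amb) + tol    # ...and at most the sum of its parts
    return lower <= syn_or_org <= upper


for pair, (found, pcts) in TABLE2.items():
    print(pair, absolute_counts(found, pcts), union_is_consistent(pcts))
```

For example, 60.5% of the 2 761 987 human-mouse abstracts corresponds to roughly 1.67 million ambiguous abstracts (`absolute_counts(2_761_987, [60.5])` gives `[1671002]`).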