Figure 3 | BMC Bioinformatics

Figure 3

From: Moara: a Java library for extracting and normalizing gene and protein mentions


Results for the code example when normalized to mouse and human. Gene/protein mentions are coloured yellow; normalization objects are coloured white and green. Mention objects contain the text that was extracted from the document, while the normalized objects present the Entrez Gene (human) or MGI (mouse) identifier, the synonym to which the mention text was matched, and the score obtained with the cosine similarity disambiguation strategy. The higher the score, the better the candidate; if only one candidate matched the mention, no disambiguation was performed and the score is therefore zero. The mention "Alu repeats" was not matched to any synonym in the human or mouse dictionaries. The mention "IL-1 beta" was matched to one candidate for both organisms, while other mentions, such as "interleukin -1 receptor", were matched to one candidate for mouse and three candidates for human. For human, mentions 2 and 4 are variations of the same entity and were therefore matched to the same candidates; two of these candidates were chosen by disambiguation analysis. The threshold for multiple disambiguation was automatically calculated for each mention as half the value of the highest score.
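The threshold rule in the caption can be illustrated with a short, self-contained Java sketch. This is not Moara's API: the class `DisambiguationSketch`, the method `selectCandidates`, and the candidate identifiers and scores are hypothetical names and values chosen for illustration. The sketch also assumes that a candidate scoring exactly at the threshold is kept, which the caption does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "half of the highest score" rule;
// none of these names come from the Moara library.
public class DisambiguationSketch {

    /**
     * Returns the identifiers kept for a single mention.
     * Keys are candidate identifiers, values are similarity scores.
     */
    public static List<String> selectCandidates(Map<String, Double> scores) {
        List<String> kept = new ArrayList<>();
        if (scores.isEmpty()) {
            return kept; // mention matched no synonym, nothing to keep
        }
        if (scores.size() == 1) {
            // Single candidate: no disambiguation is needed.
            kept.addAll(scores.keySet());
            return kept;
        }
        // Find the highest score among the candidates of this mention.
        double best = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            best = Math.max(best, s);
        }
        // Per-mention threshold: half the value of the highest score.
        double threshold = best / 2.0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            // Assumption: ties with the threshold are kept.
            if (e.getValue() >= threshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up scores for three candidates of one mention.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("candidateA", 0.40);
        scores.put("candidateB", 0.25);
        scores.put("candidateC", 0.10);
        System.out.println(selectCandidates(scores));
    }
}
```

With these made-up scores the threshold is 0.20, so two of the three candidates are kept, which mirrors the situation described for human mentions 2 and 4.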
