Distribution of scores in MEDLINE sets. For each set of MEDLINE references analyzed in this work, we plot the distribution of score values (using the average over all nouns). The distribution for the complete MEDLINE (black line with X's) has a maximum around 0.65. The training set, composed of 81,416 references annotated with MeSH terms related to stem cells (magenta with diamonds), has a maximum at 2.75 and a "hump" at 1.5. This shape arises because the set includes both references truly related to stem cells and references that are not, whose scores agree more closely with the general MEDLINE background distribution. The random set of 81,416 references (red with triangles) has, as expected, a distribution that matches that of the whole of MEDLINE. The 6,923 randomly selected MEDLINE references (green with squares) used for the recall and precision test also follow the background distribution. Of those, the 204 references evaluated as stem cell-related by a human expert (blue bars) had significantly higher scores than the background distribution of MEDLINE.