Figure 2 | BMC Bioinformatics

Figure 2

From: Exploring subdomain variation in biomedical language

Distributions over adjective lemmas as tagged by the C&C parser trained on Genia. Clockwise from the top left: the heatmap shows the pairwise Jensen-Shannon divergence (JSD) between subdomains (top half), its statistical significance (bottom half), and the homogeneity of each subdomain (diagonal). The dendrogram shows a hierarchical clustering based on the cosine distance between each subdomain's JSD values. The scatter plot is colored according to the best K-means clustering (as determined by the Gap statistic), projected onto the first two principal components (normalized). The line plot shows the intra-subdomain spread of JSD values generated by random sampling.
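
As a rough illustration of the pairwise comparison behind the heatmap, the following minimal sketch computes the Jensen-Shannon divergence between lemma count distributions. It is not the authors' implementation: the subdomain names and counts are invented toy data, the function names are hypothetical, and the choice of base-2 logarithms with no smoothing is an assumption, since the caption does not state these details.

```python
import math

def jensen_shannon_divergence(p_counts, q_counts):
    """JSD in bits (range 0 to 1) between two lemma count dictionaries."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    jsd = 0.0
    # Iterate over the union of lemmas seen in either distribution.
    for lemma in set(p_counts) | set(q_counts):
        p = p_counts.get(lemma, 0) / p_total
        q = q_counts.get(lemma, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 = 0).
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd

def pairwise_jsd(subdomains):
    """Return {(a, b): JSD} for every ordered pair of subdomain names."""
    names = sorted(subdomains)
    return {
        (a, b): jensen_shannon_divergence(subdomains[a], subdomains[b])
        for a in names
        for b in names
    }

# Toy adjective-lemma counts (invented for illustration only).
toy_subdomains = {
    "genetics": {"genetic": 40, "recessive": 10, "clinical": 5},
    "oncology": {"malignant": 30, "clinical": 20, "genetic": 5},
    "surgery": {"surgical": 35, "clinical": 15, "postoperative": 10},
}

matrix = pairwise_jsd(toy_subdomains)
for (a, b), value in sorted(matrix.items()):
    if a < b:
        print(f"{a} vs {b}: {value:.3f}")
```

With base-2 logarithms the divergence is symmetric and bounded between 0 (identical distributions) and 1 (no shared lemmas), which is what makes it convenient to display as a heatmap.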
