Figure 1 | BMC Bioinformatics

Figure 1

From: Gene prioritization and clustering by multi-view text mining

Conceptual scheme of clustering disease-relevant genes. Using these gene-by-term profiles, we evaluate clustering performance on a benchmark data set consisting of 620 disease-relevant genes categorized into 29 genetic diseases. The numbers of genes per disease are highly imbalanced; moreover, some genes are related to several diseases simultaneously. To obtain meaningful clusters and evaluations, we enumerate all pairwise combinations of the 29 diseases (406 combinations). For each combination, the genes relevant to either disease in the pair are selected and clustered into two groups, and the performance is then evaluated against the disease labels. Genes relevant to both diseases in a pair are removed before clustering (in total, fewer than 5% of the genes are removed). Finally, the average performance over all 406 paired combinations is used as the overall clustering performance.
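
The evaluation scheme described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `gene_to_diseases` mapping, the pluggable `cluster_fn`, and the simple label-agreement score are all assumptions introduced here, and the caption does not say which performance measure was actually used. Skipping a pair that has no genes left on one side is likewise an assumption.

```python
from itertools import combinations
from math import comb


def disease_pairs(diseases):
    """Enumerate all unordered pairs of diseases (29 diseases give 406 pairs)."""
    return list(combinations(sorted(diseases), 2))


def select_genes(gene_to_diseases, d1, d2):
    """Pick genes relevant to exactly one disease of the pair.

    Genes relevant to both diseases are dropped, as in the caption.
    Returns the gene names and their 0/1 disease labels.
    """
    genes, labels = [], []
    for gene, ds in sorted(gene_to_diseases.items()):
        in1, in2 = d1 in ds, d2 in ds
        if in1 != in2:
            genes.append(gene)
            labels.append(0 if in1 else 1)
    return genes, labels


def pair_agreement(labels, assignment):
    """Placeholder score: label agreement, invariant to swapping cluster ids."""
    agree = sum(1 for l, a in zip(labels, assignment) if l == a)
    return max(agree, len(labels) - agree) / len(labels)


def average_pairwise_performance(gene_to_diseases, cluster_fn,
                                 score_fn=pair_agreement):
    """Average the two-group clustering score over all disease pairs.

    cluster_fn is a hypothetical callable mapping a list of gene names
    to a list of 0/1 cluster assignments.
    """
    diseases = set().union(*gene_to_diseases.values())
    scores = []
    for d1, d2 in disease_pairs(diseases):
        genes, labels = select_genes(gene_to_diseases, d1, d2)
        if len(set(labels)) < 2:
            continue  # assumption: skip pairs with an empty side
        scores.append(score_fn(labels, cluster_fn(genes)))
    return sum(scores) / len(scores)


# Sanity check on the pair count quoted in the caption.
assert comb(29, 2) == 406
```

In practice `cluster_fn` would wrap a two-cluster algorithm applied to the gene-by-term profiles of the selected genes, and `score_fn` would be replaced by whichever clustering measure is being reported.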
