Table 2 Comparison on F-Score (FS), Entropy (E) and Rand Index (RI)

From: Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata

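| Key (ref. cluster number) | Weight α | Weight β | Weight γ | Our algorithm FS | Our algorithm E | Our algorithm RI | K-medoid FS | K-medoid E | K-medoid RI | DBSCAN FS | DBSCAN E | DBSCAN RI | APCluster FS | APCluster E | APCluster RI | StdHier FS | StdHier E | StdHier RI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age (2) | .44 | .01 | .55 | .94 | .34 | .87 | .86 | .51 | .67 | .87 | .43 | .69 | .68 | .59 | .54 | .81 | .60 | .63 |
| Cell line (4) | .65 | .11 | .24 | .46 | .78 | .56 | .60 | .78 | .54 | .49 | .78 | .40 | .59 | .70 | .64 | .52 | .82 | .43 |
| Disease (4) | .15 | .18 | .67 | .58 | .55 | .65 | .64 | .58 | .61 | .63 | .69 | .36 | .67 | .63 | .52 | .61 | .58 | .63 |
| Strain (4) | .85 | .00 | .15 | .58 | .69 | .62 | .43 | .68 | .61 | .50 | .76 | .35 | .42 | .68 | .46 | .48 | .78 | .35 |
| Tissue (9) | .80 | .00 | .20 | .43 | .73 | .37 | .41 | .69 | .56 | .49 | .77 | .27 | .35 | .74 | .58 | .40 | .68 | .45 |
| Treatment (4) | .57 | .00 | .43 | .78 | .41 | .74 | .69 | .58 | .67 | .76 | .69 | .47 | .68 | .69 | .50 | .81 | .58 | .66 |
| Average | | | | .63 | .58 | .64 | .61 | .64 | .61 | .62 | .69 | .42 | .57 | .67 | .54 | .60 | .67 | .52 |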
  1. A higher F-Score, a higher Rand Index, or a lower entropy indicates better clustering quality; the best values are shown in bold.
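
To make the direction of two of these metrics concrete, here is a minimal Python sketch of the Rand Index and a normalized cluster entropy. It assumes the common textbook definitions (pairwise agreement for RI; size-weighted entropy of reference labels within each cluster, normalized by the log of the number of reference classes). The table does not state the exact formulas the authors used, so the function names `rand_index` and `cluster_entropy` and the toy labels are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def rand_index(predicted, reference):
    """Fraction of item pairs on which the two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(reference)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (predicted[i] == predicted[j]) == (reference[i] == reference[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def cluster_entropy(predicted, reference):
    """Size-weighted mean entropy of reference labels inside each
    predicted cluster, normalized to [0, 1] (assumed definition)."""
    n_classes = len(set(reference))
    if n_classes < 2:
        return 0.0  # a single reference class leaves nothing to mix
    norm = log2(n_classes)
    total = 0.0
    for cluster in set(predicted):
        members = [r for p, r in zip(predicted, reference) if p == cluster]
        counts = Counter(members)
        h = -sum((c / len(members)) * log2(c / len(members))
                 for c in counts.values())
        total += (len(members) / len(reference)) * (h / norm)
    return total


# Toy example (hypothetical labels, not data from the table).
reference = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 1, 1]
print(round(rand_index(predicted, reference), 2))       # 0.67
print(round(cluster_entropy(predicted, reference), 2))  # 0.54
```

A perfect clustering gives a Rand Index of 1 and an entropy of 0 under these definitions, which matches the footnote's reading that higher RI and lower E are better.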