
Table 1 Summary of the simulation analyses that were conducted, and the variables that were varied in each set of simulations

From: Statistical power for cluster analysis

| Analysis | N | k (number of clusters; relative sizes) | Effect size | Covariance | Dimensionality reduction | Cluster algorithms |
|---|---|---|---|---|---|---|
| (1) What drives cluster separation | 1000 | 2 (10/90%); 2 (equal); 3 (equal) | Δ = 0.3–8.1 | 15 features: None; Random; Mixed | None; MDS; UMAP | K-means; Ward; Cosine; HDBSCAN |
| (2) Statistical power | 10–160 | 2 (10/90%); 2 (equal); 3 (equal); 4 (equal) | Δ = 1–10 | 2 features: None | None | K-means; HDBSCAN; C-means |
| (3) Discrete versus fuzzy clustering | 120 | 1; 2 (equal); 3 (equal); 4 (equal) | Δ = 1–10 | 2 features: None | None | K-means; C-means; Mixture model |
1. Each unique combination of the listed options was simulated. “Ward” and “Cosine” refer to agglomerative (hierarchical) clustering, using Ward linkage with Euclidean distance or average linkage with cosine distance, respectively. “Mixture model” refers to finite Gaussian mixture modelling.
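Simulating every unique combination of the listed options is a full factorial design. The following is a minimal Python sketch of that enumeration, assuming the categorical levels read off Table 1. The dictionary keys and the function name `factorial_grid` are hypothetical labels chosen for illustration. The N and Δ columns are left out because the table gives only their ranges, not the step sizes used.

```python
from itertools import product

# Categorical levels read off Table 1 (labels are illustrative, not the authors' code).
# N and effect size (delta) are omitted: the table gives ranges, not step sizes.
DESIGNS = {
    "(1) What drives cluster separation": {
        "k": ["2 (10/90%)", "2 (equal)", "3 (equal)"],
        "covariance": ["None", "Random", "Mixed"],
        "dim_reduction": ["None", "MDS", "UMAP"],
        "algorithm": ["K-means", "Ward", "Cosine", "HDBSCAN"],
    },
    "(2) Statistical power": {
        "k": ["2 (10/90%)", "2 (equal)", "3 (equal)", "4 (equal)"],
        "covariance": ["None"],
        "dim_reduction": ["None"],
        "algorithm": ["K-means", "HDBSCAN", "C-means"],
    },
    "(3) Discrete versus fuzzy clustering": {
        "k": ["1", "2 (equal)", "3 (equal)", "4 (equal)"],
        "covariance": ["None"],
        "dim_reduction": ["None"],
        "algorithm": ["K-means", "C-means", "Mixture model"],
    },
}


def factorial_grid(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every unique combination of factor levels (a full factorial design)."""
    names = list(factors)
    # product() varies the last factor fastest; dict order keeps names and levels aligned.
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


if __name__ == "__main__":
    for analysis, factors in DESIGNS.items():
        grid = factorial_grid(factors)
        print(f"{analysis}: {len(grid)} categorical conditions")
```

For analysis (1) this gives 3 × 3 × 3 × 4 = 108 categorical conditions, each of which would then be crossed with whichever N and Δ values are chosen. For the two agglomerative entries, scikit-learn's `AgglomerativeClustering(linkage="ward")` (Euclidean distance is the default) and `AgglomerativeClustering(linkage="average", metric="cosine")` would match the footnote's description; `metric` is the parameter name in scikit-learn 1.2 and later.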