Fig. 2 | BMC Bioinformatics

Fig. 2

From: Selecting single cell clustering parameter values using subsampling-based robustness metrics

chooseR generalizes across different clustering algorithms. a–d Results of applying chooseR using Seurat to a data set comprising 11,000 human PBMCs. a Silhouette score distribution showing the selection of the optimal clustering parameter (resolution) value. Each dot represents a cluster. Medians with 95% CI are shown for each resolution. The vertical red line marks the optimal resolution, and the horizontal blue line marks the decision threshold (“Methods” section). b Average co-clustering frequency at the optimal resolution value = 2, following clustering on 100 random sub-samples of the data using 80% of the cells. c UMAP representation of cells, colored by silhouette score at the suggested optimal resolution value = 2. d Same as c, but colored by predicted cluster. e–h Same as a–d, but using the scVI workflow for clustering. Here, the optimal resolution value = 1.6. UMAP coordinates in c, d, g, h are calculated in Seurat with 50 principal components. UMAP coordinates from scVI, colored by silhouette scores and clusters, are shown in Additional file 4: Fig. S4. i Bar plot comparing the number of recommended clusters using chooseR with Seurat and with scVI, as well as the recommended optimal cluster number (k) from the SC3 algorithm. j Heatmap showing the Dice coefficients between suggested Seurat and scVI clusters at optimal parameter values, indicating good overlap between the two partitionings. Colorbars on the top and right side of the heatmap indicate the within-cluster co-clustering score for each Seurat or scVI-derived cluster, showing a general trend that clusters with lower maximum Dice coefficient values are also the ones that are difficult to resolve clearly with either method. k Maximum Dice coefficients per cluster when comparing optimal versus sub-optimal parameter sets for clustering with Seurat and scVI. The two sets of dots (joined by lines) on the left indicate the maximum Dice coefficients per cluster obtained using Seurat with optimal parameters when compared to scVI-derived clusters using optimal (left) and sub-optimal parameter sets. The sets of dots on the right are the converse, showing the maximum Dice coefficient per scVI-derived cluster. Most clusters are less well-reproduced across Seurat and scVI when one or the other method is run with sub-optimal parameters, indicating that chooseR’s ability to identify near-optimal parameters leads to better reproducibility of clusters across different clustering algorithms.
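
To make the comparison in panels j and k concrete, the following Python sketch shows one way a Dice coefficient matrix between two partitionings of the same cells, and the maximum Dice coefficient per cluster, could be computed. It is an illustration under stated assumptions and is not taken from chooseR: the function name `dice_matrix` and the toy label vectors are hypothetical, and the standard definition 2|A ∩ B| / (|A| + |B|) is assumed.

```python
import numpy as np


def dice_matrix(labels_a, labels_b):
    """Dice coefficient between every pair of clusters from two partitionings.

    Hypothetical helper: labels_a and labels_b hold one cluster label per
    cell, in the same cell order. Returns (clusters_a, clusters_b, dice),
    where dice[i, j] = 2 * |A_i and B_j| / (|A_i| + |B_j|).
    """
    labels_a = np.asarray(labels_a).ravel()
    labels_b = np.asarray(labels_b).ravel()
    if labels_a.shape != labels_b.shape:
        raise ValueError("both label vectors must cover the same cells")

    # Map arbitrary labels to consecutive integer indices.
    clusters_a, idx_a = np.unique(labels_a, return_inverse=True)
    clusters_b, idx_b = np.unique(labels_b, return_inverse=True)

    # Contingency table: number of cells shared by each cluster pair.
    overlap = np.zeros((clusters_a.size, clusters_b.size), dtype=float)
    np.add.at(overlap, (idx_a, idx_b), 1.0)

    size_a = overlap.sum(axis=1, keepdims=True)  # cluster sizes in A
    size_b = overlap.sum(axis=0, keepdims=True)  # cluster sizes in B
    dice = 2.0 * overlap / (size_a + size_b)
    return clusters_a, clusters_b, dice


# Toy example (made-up labels): six cells clustered by two methods.
method_a = [0, 0, 0, 1, 1, 1]
method_b = ["x", "x", "y", "y", "y", "y"]

clusters_a, clusters_b, dice = dice_matrix(method_a, method_b)

# Best-matching counterpart for each cluster, as plotted per dot in panel k.
max_dice_per_a = dice.max(axis=1)
max_dice_per_b = dice.max(axis=0)
```

In this toy case, cluster 0 of the first method shares two of its three cells with cluster "x" (two cells in total), giving a Dice coefficient of 2·2 / (3 + 2) = 0.8. A cluster whose maximum Dice coefficient is low has no close counterpart in the other partitioning.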
