Fig. 1 | BMC Bioinformatics

Fig. 1

From: An interpretable framework for clustering single-cell RNA-Seq datasets

Overview of DendroSplit. (a) The workflow starts by preprocessing an N×M matrix of gene expression values before computing cell-cell pairwise distances, resulting in an N×N distance matrix. The distance matrix is fed into a hierarchical clustering algorithm to generate a dendrogram. The dynamic splitting step recursively splits the tree into smaller subtrees corresponding to potential clusters. Finally, the subtrees are merged during a cleanup step to produce the final clusters. (b) A split corresponds to the partitioning of a larger cluster into two smaller clusters. A split is deemed valid only if the separation score, a metric for how well separated two populations are, is above a predefined split threshold. Leveraging biological intuition, we rank how well each gene distinguishes the two subpopulations based on independent Welch's t-tests. We use the −log of the smallest p-value obtained as our separation score because of its interpretability and practical effectiveness. A split threshold of 10 would work for the example shown here. (c) During the merge step, the clusters obtained from the split step are compared to one another using pairwise separation scores. If the two closest clusters are not sufficiently far apart based on a predefined merge threshold, they are merged and the process is repeated. When all clusters are sufficiently far apart, the algorithm terminates and the final labels are output.
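
The separation score and the merge step described in the caption can be sketched in a few lines of Python. This is an illustrative sketch, not the DendroSplit implementation: the function names `separation_score` and `merge_clusters`, the `eps` floor, the use of a base-10 logarithm, and the reading of "closest" as "lowest pairwise separation score" are all assumptions made for this example. It relies on `scipy.stats.ttest_ind` with `equal_var=False`, which performs Welch's t-test.

```python
import numpy as np
from scipy.stats import ttest_ind


def separation_score(x, y, eps=1e-300):
    """Return -log10 of the smallest per-gene Welch's t-test p-value.

    x, y: arrays of shape (cells, genes) for the two candidate subpopulations.
    The base-10 logarithm and the eps floor are assumptions of this sketch.
    """
    # equal_var=False selects Welch's t-test; axis=0 tests each gene (column).
    _, pvals = ttest_ind(x, y, axis=0, equal_var=False)
    # Genes with undefined statistics (e.g. zero variance) yield NaN;
    # treat them as uninformative (p = 1).
    pvals = np.nan_to_num(pvals, nan=1.0)
    # Floor the p-value so that an underflow to 0 does not give log(0).
    return float(-np.log10(max(pvals.min(), eps)))


def merge_clusters(data, labels, merge_threshold):
    """Merge the least-separated pair of clusters until every pair
    has a separation score of at least merge_threshold.

    Assumption: "closest" clusters are the pair with the lowest score.
    """
    labels = np.asarray(labels).copy()
    while True:
        ids = np.unique(labels)
        if len(ids) < 2:
            return labels
        # Score every pair of current clusters.
        pairs = [
            (separation_score(data[labels == a], data[labels == b]), a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
        ]
        score, a, b = min(pairs, key=lambda t: t[0])
        if score >= merge_threshold:
            return labels  # all clusters are sufficiently far apart
        labels[labels == b] = a  # merge cluster b into cluster a
```

Under the same assumptions, a candidate split of a cluster into subpopulations `x` and `y` would be accepted when `separation_score(x, y)` exceeds the split threshold, for instance 10 as in panel (b).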
