Fig. 3 | BMC Bioinformatics

Fig. 3

From: An information-theoretic approach to single cell sequencing analysis

Heterogeneity is additively decomposable. The heterogeneity \(I(g)\) of a population of cells (5 cells in this illustration) with respect to the expression of a gene g can be decomposed into inter- and intra-cluster heterogeneities for any proposed clustering S (here, two subpopulations, or clusters, of 3 yellow and 2 purple cells). The inter-cluster heterogeneity \(H_S(g)\) is obtained by aggregating all transcripts (shown as horizontal lines) of each subpopulation in S and then taking the Kullback–Leibler divergence (KLD) of the resulting distribution over the C clusters from the distribution the transcripts would have over those clusters if they were spread uniformly across cells (that is, in proportion to cluster size). It measures how far the assignment of transcripts to clusters departs from this uniform baseline. The intra-cluster heterogeneity \(h_S(g)\) is the weighted sum of the heterogeneities of the constituent subpopulations, each considered independently, with weights given by the fraction of transcripts in each subpopulation. It represents the average heterogeneity of the proposed clusters, accounting for disparities in the number of transcripts assigned to each. In this toy example, the overall population heterogeneity of gene g, \(I(g)=0.55\), decomposes as the sum of the inter-cluster heterogeneity \(H_S(g)=0.33\) and the intra-cluster heterogeneity \(h_S(g)=0.22\); the weights used for the latter are \(2/10=0.2\) and \(8/10=0.8\). Further details and formulae are provided in the “Methods” section.
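
As a rough numerical check of this decomposition, the Python sketch below computes I(g), H_S(g) and h_S(g) from lists of per-cell transcript counts, one list per cluster. It is an illustration, not the authors' code: the function names `heterogeneity` and `decompose` are hypothetical, natural logarithms are assumed, and the per-cell counts (2, 0, 0) for the yellow cells and (4, 4) for the purple cells are assumed values chosen because they reproduce the figure's numbers; the caption itself does not list per-cell counts. The inter-cluster term is computed against cluster-size proportions, which is what makes the two terms sum exactly to I(g) (the chain rule for KL divergence).

```python
import math


def heterogeneity(counts):
    """KL divergence (nats) of a cell-level transcript distribution from uniform.

    Equivalent to log(N) - H(p), where N = len(counts) and
    p_i = counts[i] / sum(counts). Assumes sum(counts) > 0.
    """
    total = sum(counts)
    n_cells = len(counts)
    # Cells with zero transcripts contribute nothing (0 * log 0 = 0).
    return sum((c / total) * math.log(c * n_cells / total) for c in counts if c > 0)


def decompose(clusters):
    """Return (I, H_S, h_S) for a clustering given as lists of per-cell counts."""
    all_counts = [c for cluster in clusters for c in cluster]
    total = sum(all_counts)
    n_cells = len(all_counts)
    inter = 0.0  # H_S: between-cluster term
    intra = 0.0  # h_S: transcript-weighted within-cluster term
    for cluster in clusters:
        share = sum(cluster) / total        # fraction of transcripts in this cluster
        expected = len(cluster) / n_cells   # fraction if transcripts were uniform over cells
        if share > 0:
            inter += share * math.log(share / expected)
            intra += share * heterogeneity(cluster)
    return heterogeneity(all_counts), inter, intra


# Hypothetical counts: 3 yellow cells holding 2 transcripts, 2 purple cells holding 8.
I, H_S, h_S = decompose([[2, 0, 0], [4, 4]])
print(round(I, 2), round(H_S, 2), round(h_S, 2))  # 0.55 0.33 0.22
```

Because the reference distribution for the inter-cluster term is the cluster marginal of the uniform distribution over cells, the identity I(g) = H_S(g) + h_S(g) holds for any clustering, not only for this example.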
