SpatialCPie: an R/Bioconductor package for spatial transcriptomics cluster evaluation
BMC Bioinformatics volume 21, Article number: 161 (2020)
Technological developments in the emerging field of spatial transcriptomics have opened up an unexplored landscape where transcript information is put in a spatial context. Clustering commonly constitutes a central component in analyzing this type of data. However, deciding on the number of clusters to use and interpreting their relationships can be difficult.
We introduce SpatialCPie, an R package designed to facilitate cluster evaluation for spatial transcriptomics data. SpatialCPie clusters the data at multiple resolutions. The results are visualized with pie charts that indicate the similarity between spatial regions and clusters and a cluster graph that shows the relationships between clusters at different resolutions. We demonstrate SpatialCPie on several publicly available datasets.
SpatialCPie provides intuitive visualizations of cluster relationships when dealing with Spatial Transcriptomics data.
Clustering is a standard analysis operation used for grouping entities in complex datasets to bring order and find patterns of similarity. Typically, clusters are used for identification purposes and further downstream analysis, e.g., statistical identification of key drivers of dissimilarity. The clustering can be conducted in various ways. Common techniques include k-means clustering, hierarchical clustering, DBSCAN, and MCL [1]. Most clustering methods require prespecifying the number of clusters to use or otherwise choosing suitable hyperparameters for the dataset at hand.
Spatial Transcriptomics (ST) is a recent method to obtain spatial information during RNA-seq experiments [2]. Briefly, barcoded capture probes are grouped into “spots” and printed on a glass array. The tissue section is placed on the array and permeabilized so that transcripts diffuse down to the capture probes. After sequencing, the barcodes of the probes can be used to map the transcripts back to the spot in which they were captured.
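The barcode-to-spot mapping described above can be sketched as a simple lookup. This is a minimal illustration only: the barcodes, coordinates, gene names, and the `count_by_spot` helper are all hypothetical, and real pipelines typically also deal with details such as sequencing errors in barcodes, which are ignored here.

```python
from collections import Counter, defaultdict

# Hypothetical lookup table: spot barcode -> (x, y) position on the array.
barcode_to_spot = {
    "AAAC": (1, 1),
    "CCGT": (1, 2),
    "GGTA": (2, 1),
}

# Hypothetical sequenced reads, reduced to (barcode, gene) pairs.
reads = [
    ("AAAC", "MYH6"),
    ("AAAC", "MYH6"),
    ("CCGT", "ACTA2"),
    ("GGTA", "MYH7"),
    ("AAAC", "MYH7"),
    ("TTTT", "MYH7"),  # barcode not on the array
]


def count_by_spot(reads, barcode_to_spot):
    """Return {spot coordinates: Counter of per-gene read counts}."""
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        spot = barcode_to_spot.get(barcode)
        if spot is None:
            continue  # the read cannot be assigned to any spot
        counts[spot][gene] += 1
    return dict(counts)


spot_counts = count_by_spot(reads, barcode_to_spot)
```

The resulting spot-by-gene counts are the kind of input that the clustering step operates on.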
A common step in analyzing ST data is to cluster the gene expression profiles of the spots in order to identify and annotate regions of interest in the tissue section. This could, for example, be used to identify tumor regions or discover intra-tumor heterogeneity hidden to the human eye [3]. However, selecting appropriate hyperparameters, e.g., the right number of clusters to use, poses a challenge in these types of analyses. Indeed, it is often necessary to try out different sets of hyperparameters, as each may provide distinct insights about the data. Moreover, the relationships between clusters are not always clear, and common visualization strategies for high-dimensional data, for example those based on t-SNE, often produce results that are difficult to interpret [4]. An additional obstacle is the fact that each barcoded spot in ST normally captures multiple cells. Consequently, gene expression measurements are derived from mixtures of cells, obfuscating cluster-based cell-type identification.
While tools exist for visualizing clusters in the context of ST data, none fully address the above concerns. Most prominently, the ST viewer [5] can visualize clusters spatially, but classifications are binary and only a limited number of clustering algorithms are supported.
Here, we present SpatialCPie, an easy-to-use R package that gives the user an intuitive understanding of how clusters in ST data are related to each other and to what extent each region on the two-dimensional ST array is associated with each cluster. SpatialCPie is designed to be used as part of an R workflow, giving the user a high degree of flexibility to customize and quickly iterate on their analyses. The data is clustered at multiple resolutions (i.e., with different numbers of clusters or hyperparameter settings), which avoids the need to prespecify a single set of hyperparameters for the analysis, and the user can freely define which clustering algorithm to use. The results are visualized in two ways: with a cluster graph [6] that shows how clusters overlap between different resolutions, and with two-dimensional array plots in which each spot is represented by a pie chart indicating its similarity to the different cluster centroids.
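The idea of clustering at several resolutions with a user-supplied algorithm can be sketched as follows. This is an illustrative Python sketch, not the package's R interface: the names `kmeans_labels` and `cluster_at_resolutions` are hypothetical, the toy expression values are made up, and k-means is used only as a stand-in for whatever clustering function the user chooses.

```python
import math


def kmeans_labels(points, k, iterations=20):
    """Tiny deterministic k-means; returns one cluster label per point."""
    # Farthest-point initialisation keeps this sketch reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move every centre to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def cluster_at_resolutions(points, resolutions, assign=kmeans_labels):
    """Run the assignment function once per resolution: {k: labels}."""
    return {k: assign(points, k) for k in resolutions}


# Toy data: six spots, two genes each (values are illustrative only).
spots = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.3)]
clusterings = cluster_at_resolutions(spots, resolutions=[2, 3])
```

Passing the clustering routine as an argument (`assign`) mirrors the design described above, in which the choice of algorithm is left to the user and every resolution is computed up front rather than one being chosen in advance.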
Historically, pie charts have frequently been used to display spatial data on geographical maps [7, 8]. Recently, with the advent of spatial omics and in a similar vein as the work presented here, analogous visualizations have also successfully been applied to tissue maps [9].
The user interface of SpatialCPie is implemented in Shiny [10]. The interface consists of two main components: the cluster graph and the array plots, both described in detail below.
The cluster graph (Fig. 1, left) is a graph that visualizes the relationships between clusters over different resolutions. Clusters are represented as nodes in the graph, and edges show the degree to which clusters in consecutive resolutions overlap. Specifically, the opacity of an edge indicates the proportion of spots in the higher-resolution cluster that also belong to the lower-resolution cluster. The user can set a threshold on this proportion so that less informative edges (those representing only very small overlaps) are removed. Cluster relationships are further visualized by encoding the mean expression profile of each cluster in color space, so that nodes whose spots have similar expression receive similar colors. The user can hover over a node to see a summary of the most highly expressed genes in the cluster.
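The edge proportions and the threshold described above can be illustrated with a short sketch. The function name `overlap_edges` and the toy label vectors are hypothetical; the computation follows the definition in the text, namely the share of a higher-resolution cluster's spots that also belong to a given lower-resolution cluster.

```python
from collections import Counter


def overlap_edges(lower, higher, threshold=0.0):
    """Edges between two consecutive resolutions.

    lower, higher: one cluster label per spot, in the same spot order.
    Returns {(lower_cluster, higher_cluster): proportion}, where proportion
    is the fraction of the higher-resolution cluster's spots that also
    belong to the lower-resolution cluster. Edges below `threshold` are
    dropped.
    """
    if len(lower) != len(higher):
        raise ValueError("both labelings must cover the same spots")
    pair_counts = Counter(zip(lower, higher))
    higher_sizes = Counter(higher)
    edges = {}
    for (lo, hi), n in pair_counts.items():
        proportion = n / higher_sizes[hi]
        if proportion >= threshold:
            edges[(lo, hi)] = proportion
    return edges


# Toy labelings of six spots at two consecutive resolutions.
labels_k2 = [0, 0, 0, 1, 1, 1]
labels_k3 = [0, 0, 1, 1, 2, 2]

all_edges = overlap_edges(labels_k2, labels_k3)
strong_edges = overlap_edges(labels_k2, labels_k3, threshold=0.6)
```

In this toy case, higher-resolution cluster 1 draws half of its spots from each parent, so both of its edges fall below a threshold of 0.6 and are removed, while the edges into clusters 0 and 2 remain.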
The cluster graph shows the ancestry of clusters and allows the user to reconcile insights from different cluster resolutions (“Human developmental heart” section).
The array plot (Fig. 1, right) is a graphical representation of the ST array. A pie chart for each spot shows the similarity score between the spot and the cluster centroids. The similarity score between spot s and cluster k is defined as
where x_i is the gene expression vector of spot i, C(k) is the set of spots in cluster k, RMSD(a, b) is the root-mean-square deviation between gene vectors a and b, and λ is a user-selectable constant.
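To make these quantities concrete, the sketch below computes pie-chart shares from the symbols defined above. The functional form is an assumption made for illustration: each spot is scored against a cluster centroid as exp(−λ · RMSD(x_s, centroid of C(k))), and the scores of a spot are then normalized to sum to one so that they can be drawn as pie slices. Both the exponential form and the normalization, as well as all function names and toy values, are hypothetical and may differ from the package's definition.

```python
import math


def centroid(vectors):
    """Mean expression vector of a set of spots."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rmsd(a, b):
    """Root-mean-square deviation between two gene vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def similarity_scores(expression, labels, lam=1.0):
    """Return one {cluster: share} dict per spot; shares sum to 1.

    ASSUMPTION: score = exp(-lam * RMSD(spot, cluster centroid)),
    normalized per spot. This form is illustrative only.
    """
    clusters = sorted(set(labels))
    centroids = {
        k: centroid([x for x, label in zip(expression, labels) if label == k])
        for k in clusters
    }
    pies = []
    for x in expression:
        distances = {k: rmsd(x, centroids[k]) for k in clusters}
        # Subtracting the smallest distance leaves the normalized shares
        # unchanged but avoids every exponential underflowing to zero.
        nearest = min(distances.values())
        raw = {k: math.exp(-lam * (d - nearest)) for k, d in distances.items()}
        total = sum(raw.values())
        pies.append({k: value / total for k, value in raw.items()})
    return pies


# Toy data: four spots, two genes, two clusters.
expression = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
labels = [0, 0, 1, 1]

pies_sharp = similarity_scores(expression, labels, lam=1.0)
pies_soft = similarity_scores(expression, labels, lam=0.1)
```

Under this assumed form, a larger λ concentrates each pie on the nearest centroid, while a smaller λ spreads the shares more evenly across clusters.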
The pie charts show cluster membership as a matter of degree rather than as a hard assignment, making it possible to identify spatial trends in gene expression (Fig. S2).
In a typical analysis of ST data, it is often the case that some parts of the tissue cluster clearly at a low resolution and are of less interest for further exploration. Meanwhile, other regions may be interesting to study in finer detail by sub-clustering. This can be achieved by using the tool iteratively (“Human developmental heart” section and Fig. 3).
SpatialCPie can be used to analyze any dataset with spatially distributed count data. Here, we demonstrate its utility on three publicly available ST datasets [11–13]: the human developmental heart (“Human developmental heart” section), breast cancer in situ (section S2.1), and melanoma (section S2.2). In all cases, we normalize the data using Seurat [14] before passing it to SpatialCPie.
Human developmental heart
The tissue section is taken from a 5-week-old heart with well-defined anatomical regions (Fig. 2b).
The array plots (Fig. 2a) and cluster graph (Fig. 2c) show a clear separation between the outflow tract, atria, and ventricles across resolutions. It is also evident that the outflow tract is highly homogeneous; most of its spots exhibit high similarity scores to a single cluster (cluster 2), and this cluster is clearly separated in color space from other clusters.
There is evidence of subtle differences in gene expression within the ventricles, but the clusters there are more similar to each other than to other clusters, as indicated by their colors and shared ancestry (Fig. 2c). Sub-clustering the ventricles (Fig. 3) reveals the compact ventricular myocardium that spans the periphery of the tissue. Interestingly, we also find that the left and right ventricles exhibit slightly different cluster affinities, suggesting that their differences could be worth investigating further.
SpatialCPie provides a user-friendly interface for analyzing clusters in ST data and uses visualization techniques to help the analyst uncover and explore hidden gene expression patterns. Concretely, clustering is done at multiple resolutions, each providing a different level of granularity of the patterns in the data. Clusters over different resolutions are hierarchized in a cluster graph, and their spatial distributions are visualized in array plots. The array plots relativize cluster membership for each spatial region, thereby exposing gradients in gene expression that otherwise would be difficult to observe.
Overall, we find that the visual cues from multiple cluster resolutions in the array plots, the relationships between clusters in the cluster graph, and the color-coded expression profiles together give a comprehensive view of the spatial gene expression landscape in tissues.
Availability and requirements
Project name: SpatialCPie
Project home page: https://github.com/jbergenstrahle/SpatialCPie
Operating system(s): Platform independent
Programming language: R
Availability of data and materials
The fetal heart dataset was obtained from the authors of [11].
1. Xu D, Tian Y. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015; 2(2):165–93.
2. Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, Giacomello S, Asp M, Westholm JO, Huss M, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016; 353(6294):78–82.
3. Berglund E, Maaskola J, Schultz N, Friedrich S, Marklund M, Bergenstråhle J, Tarish F, Tanoglidi A, Vickovic S, Larsson L, et al. Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat Commun. 2018; 9(1):2419.
4. Buettner F, Theis FJ. A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst. Bioinformatics. 2012; 28(18):626–32.
5. Fernández Navarro J, Lundeberg J, Ståhl PL. ST viewer: a tool for analysis and visualization of spatial transcriptomics datasets. Bioinformatics. 2019. https://doi.org/10.1093/bioinformatics/bty714.
6. Zappia L, Oshlack A. Clustering trees: a visualisation for evaluating clusterings at multiple resolutions. bioRxiv. 2018:274035. https://doi.org/10.1093/gigascience/giy083.
7. Du X-H, Zhao Q, Xu J, Yang ZL. High inbreeding, limited recombination and divergent evolutionary patterns between two sympatric morel species in China. Sci Rep. 2016; 6(1):22434. https://doi.org/10.1038/srep22434.
8. Pischedda S, Barral-Arca R, Gómez-Carballa A, Pardo-Seco J, Catelli ML, Álvarez-Iglesias V, Cárdenas JM, Nguyen ND, Ha HH, Le AT, Martinón-Torres F, Vullo C, Salas A. Phylogeographic and genome-wide investigations of Vietnam ethnic groups reveal signatures of complex historical demographic movements. Sci Rep. 2017; 7(1):12630. https://doi.org/10.1038/s41598-017-12813-6.
9. Qian X, Harris KD, Hauling T, Nicoloutsopoulos D, Muñoz-Manchado AB, Skene N, Hjerling-Leffler J, Nilsson M. A spatial atlas of inhibitory cell types in mouse hippocampus. bioRxiv. 2018. https://doi.org/10.1101/431957.
10. Chang W, Cheng J, Allaire JJ, Xie Y, McPherson J. Shiny: web application framework for R. R package version 0.11. 2015; 1(4):106.
11. Asp M, Giacomello S, Larsson L, Wu C, Fürth D, Qian X, Wärdell E, Custodio J, Reimegård J, Salmén F, Österholm C, Ståhl PL, Sundström E, Åkesson E, Bergmann O, Bienko M, Månsson-Broberg A, Nilsson M, Sylvén C, Lundeberg J. A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart. Cell. 2019; 179(7):1647–1660.e19. https://doi.org/10.1016/j.cell.2019.11.025.
12. Thrane K, Eriksson H, Maaskola J, Hansson J, Lundeberg J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 2018; 78(20):5970–9.
13. Salmen F, Vickovic S, Larsson L, Stenbeck L, Vallon-Christersson J, Ehinger A, Hakkinen J, Borg A, Frisen J, Stahl P, et al. Multidimensional transcriptomics provides detailed information about immune cell distribution and identity in HER2+ breast tumors. bioRxiv. 2018:358937. https://doi.org/10.1101/358937.
14. Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015; 33(5):495–502. https://doi.org/10.1038/nbt.3192.
We would like to acknowledge the Spatial Transcriptomics group at SciLifeLab Stockholm for testing the package and providing helpful feedback.
Financial support for conducting this work was provided by the Knut and Alice Wallenberg Foundation, Swedish Foundation for Strategic Research, the Swedish Research Council, and Science for Life Laboratory. Open access funding provided by Royal Institute of Technology.
Ethics approval and consent to participate
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Bergenstråhle, J., Bergenstråhle, L. & Lundeberg, J. SpatialCPie: an R/Bioconductor package for spatial transcriptomics cluster evaluation. BMC Bioinformatics 21, 161 (2020). https://doi.org/10.1186/s12859-020-3489-7