 Software
 Open Access
 Published:
INFLECT: an Rpackage for cytometry cluster evaluation using marker modality
BMC Bioinformatics volume 23, Article number: 487 (2022)
Abstract
Background
Current methods of highdimensional unsupervised clustering of mass cytometry data lack means to monitor and evaluate clustering results. Whether unsupervised clustering is correct is typically evaluated by agreement with dimensionality reduction techniques or based on benchmarking with manually classified cells. The ambiguity and lack of reproducibility of sequential gating has been replaced with ambiguity in interpretation of clustering results. On the other hand, spurious overclustering of data leads to loss of statistical power. We have developed INFLECT, an Rpackage designed to give insight in clustering results and provide an optimal number of clusters. In our approach, a mass cytometry dataset is overclustered intentionally to ensure the smallest phenotypically different subsets are captured using FlowSOM. A range of metacluster number endpoints are generated and evaluated using marker interquartile range and distribution unimodality checks. The fraction of marker distributions that pass these checks is taken as a measure of clustering success. The fraction of unimodal distributions within metaclusters is plotted against the number of generated metaclusters and reaches a plateau of diminishing returns. The inflection point at which this occurs gives an optimal point of capturing cellular heterogeneity versus statistical power.
Results
We applied INFLECT to four publically available mass cytometry datasets of different size and number of markers. The unimodality score consistently reached a plateau, with an inflection point dependent on dataset size and number of dimensions. We tested both ConsenusClusterPlus metaclustering and hierarchical clustering. While hierarchical clustering is less computationally expensive and thus faster, it achieved similar results to ConsensusClusterPlus. The four datasets consisted of labeled data and we compared INFLECT metaclustering to published results. INFLECT identified a higher optimal number of metaclusters for all datasets. We illustrated the underlying heterogeneity within labels, showing that these labels encompass distinct types of cells.
Conclusion
INFLECT addresses a knowledge gap in highdimensional cytometry analysis, namely assessing clustering results. This is done through monitoring marker distributions for interquartile range and unimodality across a range of metacluster numbers. The inflection point is the optimal tradeoff between cellular heterogeneity and statistical power, applied in this work for FlowSOM clustering on mass cytometry datasets.
Background
It has been widely accepted in (mass) cytometry that highdimensional datasets are best approached through unsupervised clustering algorithms [1, 2], revealing structure in highdimensional space that is not well identified through bivariate sequential gating. Prior to the introduction of unsupervised clustering into flow cytometry, the gold standard in cell classification of conventional flow cytometry data involved manual gating. In this approach, researchers use manually defined coordinates (gates) to group cells in bivariate plots. This process is repeated in a sequential manner until all desired populations have been defined. Manual gating requires a priori knowledge, it is not easily scalable, it is susceptible to the observer’s experience and personal bias, and it misses on untargeted populations. Therefore, there has been a keen interest in the cytometry community to develop automated data analysis methods, such as unsupervised clustering. Opposite to manual gating, unsupervised clustering algorithms do not prioritize any of the given input parameters. Interpretation and visualization of the clustering results is often aided by dimensionality reduction. Additionally, these datadriven algorithms provide more reproducible results, removing researcher bias that comes with manually setting gates in bivariate plots. Since the publication of the first clustering method for cytometry data in 2007, many clustering algorithms have been published and their performance thoroughly compared [3,4,5,6,7]. However, every highdimensional analysis method makes assumptions on the underlying data that need to be understood by the researcher implementing these methods [8, 9]. The added complexity of clustering algorithms has led to shifting of ambiguity from gating to ambiguity surrounding clustering results. The question of how many meaningful clusters exist in a highdimensional dataset has proven to be very difficult to answer. Methods such as FlowGrid [7] or Phenograph [10] have automated cluster detection relying on intracluster (dis)similarity. However, subsets of rare cells often have high clinical relevance with little phenotypic distance to other subsets. Clustering algorithms specifically designed to capture rare subsets [11, 12] or relying on overclustering [9] present other limitations: a large number of (small) clusters leads to a loss of statistical power due to corrections for multiple testing. Automated metaclustering, the grouping of phenotypically similar clusters, is a solution to overclustering, but it is in turn susceptible to the same challenges as onestep clustering methods. ConcensusClusterPlus, the default metaclustering method implemented in FlowSOM, often has a very conservative result, leading to broad clusters encompassing multiple cell types.
Manual identification and optional supervised metaclustering is a laborious process but currently unavoidable when analyzing overclustered highdimensional data. It has become common for mass cytometry data to be presented in a way that classifies multiple found clusters as a single phenotype. Clusters identified in an unsupervised manner are grouped together, implying that these (sub)clusters represent different states of the given phenotype [13,14,15,16,17]. If combined into a larger single cluster, few methods exist to evaluate the results that come from any given clustering algorithm [18, 19], meaning researchers visually inspect heatmaps of median cluster expressions or assess concordance of unsupervised clustering results to dimensionality reduction techniques such as tSNE or UMAP. Furthermore, none of these approaches address the question of how to set the limit to the number of clusters (or metaclusters) that is adequate for each dataset.
Here, we aim to address this question by a computational method, INFLECT, that iteratively evaluates metaclustering performance to find the highest level of parameter unimodality, and lowest expression spread per metacluster. Evaluating clustering results using these 2 characteristics is based on the assumptions that a multimodal univariate distribution contains multiple, possibly overlapping, cell populations. A wide marker spread similarly indicates poor clustering. These multiple cell populations can be related, but represent different activation states of a large cell phenotype, or transitional cells differentiating. In turn this means that if a clustering process results in narrow unimodal marker distributions across all generated clusters, it has successfully captured the full cellular heterogeneity in the dataset. INFLECT uses FlowSOM [20, 21] for upstream clustering because of fast runtimes and widespread use. In concordance with the FlowSOM workflow, datasets are overclustered in initial SOMclustering, and the subsequent metaclustering steps are investigated. Results of the metaclustering are evaluated based on the marker expression distributions within formed metaclusters. For each marker, INFLECT tests for unimodality and assesses marker expression spread. Unimodality is determined through the dip test [22], a statistical test. Marker distribution spread is evaluated through the interquartile range and fails if this range is too high.
Implementation
The aim of INFLECT is to provide a datadriven evaluation of metaclustering results and calculate the optimal metacluster number where marker distribution quality is balanced with the number of identified clusters. It is written in R and made available here: www.github.com/jnverhoeff/GarciaVallejoLab/INFLECT. The process of INFLECT is illustrated in Fig. 1. In summary, the method takes as input a highdimensional dataset clustered using the FlowSOM wrapper function, and a set of metacluster targets to be evaluated. The output consist of diagnostic graphs and a determined optimal metacluster number. In the following sections we describe the steps implemented in the R package.
Iterative metaclustering automation
The process starts with calculating the metaclustering of clusters formed by selforganizing maps of FlowSOM. To ensure capture of the smallest subset of interest, we recommend an initial number of SOMclusters of at least 200. The subsequent maximum number of metaclusters k cannot exceed 90% of the number of SOMclusters due to resampling within ConsensusClusterPlus. This is done according to either ConsensusClusterPlus[23] or hierarchical clustering based on Minkowski distance and Ward’s linkage. To limit the computation time not every possible metacluster number is evaluated. Using the default settings, INFLECT increases the metacluster targets number by 5 from 50 metaclusters on, and by 10 from 150 metaclusters on. Target numbers are sparser at higher metacluster numbers, due to increased computational load. In our experience the changes between metaclustering results at higher target numbers vary less and thus contribute less information. However, users can specify the amount of metacluster numbers to evaluate using the input vector \(set\_i\). This step can be multithreaded to expedite calculations.
Metaclustering evaluation step
Each metaclustering result is evaluated separately, again enabling multithreading. For each metacluster per metaclustering result, all relevant marker distributions are evaluated for interquartile range and unimodality through the dip test. Results are collected in a matrix of m (= the amount of metaclusters) by n (= the amount of markers) on a pass/fail basis. A marker distribution fails if the interquartile range exceeds a given threshold or has a nonunimodal distribution. Default interquartile range threshold is set at a value of 2 after hyperbolic arcsine transformation of raw data (the standard transformation for mass cytometry data). The fraction of distributions that passed this quality control (QC) across all clusters is calculated and given as output U. Taken together the outputs form the set \({U}_{i}\), called the Unimodality set.
Diagnostic plotting
The values within the Unimodality set \({U}_{i}\), are plotted against the number of metaclusters. Subsequently, an Lfunction is applied to find the inflection point, where dividing the graph in two halves using straight fitted lines results in the least error. This can be done on the Unimodality set itself, or a sigmoidal curve fitted to the data. This implementation of the Lmethod is based on work by Salvador and Chan [24]. The horizontal coordinate of this inflection point is the optimal metacluster number for this dataset.
Application to public datasets
We implemented INFLECT on multiple publicly available datasets to assess robustness and applicability. Characteristics of the datasets used are summarized in Table 1. Labels for the events are taken from the datasets directly. In direct comparison of labels and INFLECT metaclusters only labeled data is considered.
INFLECT requires a single inputparameter, \(k\), for the amount of clusters generated in the initial SOMclustering. In the 4 datasets included in this study, the plateau in unimodal marker distributions was consistently reached before \(k=100\). For statistical power in the plateau we recommend a value of \(k\) of circa 200. In our experiments, to ensure we reached the point of spurious overclustering, we applied INFLECT to the datasets with \(k>350\).
All computations in this work were carried out using a 2X Intel® E52660 v3 computing cluster, clocked at 2.60 GHz and 128 GB of RAM. Where possible, processes were run in parallel in 10 threads.
Validation
KimmeyPBMC used 12 markers for clustering into 9 major phenotypes, whereas the panel by Bagwell uses 29 out of 30 markers for manual gating into 26 phenotypes. This variability revealed the broad applicability of INFLECT, where a broad range of mass cytometry datasets show formation of a stable plateau in metacluster unimodality.
The labeled data allowed for comparison of the inflection point metacluster number to the labeled populations. The number of labels was treated as a measure of dataset heterogeneity. Deviations between INFLECT and the number of labels were further investigated for causes and possible cellular heterogeneity within labels.
Results
Unimodal marker distribution of FlowSOM clusters reaches a plateau (for both methods of metaclustering)
Firstly, we investigated how the number of metaclusters related to cluster unimodality and if this could lead to an optimal metacluster number. To assess this we applied INFLECT to 4 publically available datasets. Because FlowSOM requires an input parameter defining the number of resulting clusters, we could iteratively increase the number of generated metaclusters in repeated runs.
By evaluating cluster uniformity as a fraction of unimodal markers across clusters, using the dip test for unimodality, we monitored clustering success. In the four labeled datasets we encountered a consistent formation of a plateau. At these plateaus, a (further) increase of metaclusters yielded only a small improvement in cluster unimodality. The Unimodality set \({U}_{i}\) is the combined set of cluster unimodality fractions for every metaclustering i. After fitting a sigmoidal curve we implemented the Lfunction [24] method to calculate the inflection point of the curve. To assess the effects of dataset size and different methods of metaclustering, we applied INFLECT with different combinations of subsamplings of the Levine32 dataset and metaclustering methods as shown in Fig. 2. Different methods of metaclustering yield slightly different sets of \({U}_{i}\) for the same dataset (Hierarchical clustering in Fig. 2A and ConsensusClusterPlus in 2C). The resampling processes in ConsensusClusterPlus [23] are a computationally expensive step, leading to high runtimes for higher values of \(i\). Run times are summarized in Table 2.
For hierarchical clustering \(i=(\mathrm{5,6},7\dots 150, 155, 160\dots 200, 210, 220\dots 320)\) were calculated in sequence. ConsensusClusterPlus was calculated for \(i = (5, 6, 7\dots 50, 55, 60\dots 100, 120, 140\dots 200, 225, 250, 275, 300)\). Therefore, we applied the much faster hierarchical clustering in all subsequent analysis.
The metacluster unimodality results \({{\varvec{U}}}_{{\varvec{i}}}\) are dependent on dataset size
To investigate the effect of the size of the data set, we generated smaller datasets by subsampling Levine32 (Fig. 2B). For the full size dataset down to 25% size, the inflection points and unimodality score plots are very similar, ranging from 36 metaclusters for the full dataset to 34 for the 25% sample. At 10% size, which is 60,000 cells for this dataset, the inflection point shifts to 19 metaclusters, and to 26 metaclusters for 5% size. Of note is that the cluster unimodality fraction of the plateau for higher metacluster numbers seems to drop for the smaller dataset sizes.
In Fig. 2D we display the marker performance in the dataset across all the metacluster numbers. Several markers pass the unimodality and marker spread checks for all metacluster numbers, such as CD13 or CD47. Other markers show a greater spread, having multimodal distributions for lower metacluster numbers but rising to 100% for very high metacluster numbers (CD123 and CD16) or never reaching 100% (CD8). This graph can inform researchers on the clustering power of different markers in the dataset. In Additional file 1: Fig. S1, the performance of the remaining clustering markers is shown.
The Lfunction provides a datadriven Inflect point for unimodality scores and fitted curves
To investigate the robustness of INFLECT cluster unimodality and the implementation of the Lfunction, we subsampled 90% of the Levine32 dataset 15 times. INFLECT was applied on the 15 replicates. Resulting unimodality sets (Fig. 3A) exhibit little variation. In four out of the 85 metacluster numbers evaluated, all 15 replicates yielded the same cluster accuracy. Greatest variability between replicates was 0.69%, which occurred at m = 9 metaclusters. Applying the Lfunction on fitted curves (Fig. 3B) vary between 36 and 42, with a mean of 38.90 and a SEM of 0.62. When applying the Lfunction directly on the Unimodality set \({U}_{i}\) (Fig. 3C) data points, the position of the inflection point is more vulnerable to noise, with a lower mean of 28 but a higher SEM of 0.90. Repeats shown in Fig. 3 represent the greatest variability between the in total 15 replicates. Remaining repeats are shown in Additional file 1: Fig. S2. In addition to lower variability, fitting a sigmoidal curve to the data lowers the amount of metaclusters that need to be evaluated, further speeding up calculations. In Additional file 1: Fig. S3 we show that even with lower amount of data points in the Unimodality set, the fitted curve provides a stable inflection point. Hence, applying the Lfunction to fitted curves gives more robust results.
INFLECT captures underlying heterogeneity in labeled clusters of Levine32
For visualization purposes we sampled 40,000 events from Levine32 and performed a tSNE embedding (Fig. 4A). As can be expected, INFLECT splits several of the more abundant populations into smaller metaclusters, while matching smaller distinct cell types such as CD16 natural killer (NK) cells and Plasma B cells to one metacluster each. In this case INFLECT was run only on labeled data, which was 65% of total events. The optimal metacluster number for this data was higher, at 61 metaclusters, shown in Fig. 4B. In Additional file 1: Fig. S4, we compare INFLECT to two common clustering evaluation tools [27], and the default method of metaclustering in FlowSOM. The 2 metaclusters most closely matching CD16 + NK cells were investigated (metacluster 60 and 61) through visual inspection of marker histograms (Fig. 4C and D). 3 markers were selected for illustrating the greatest variability between the metaclusters. CD16 + NK cells show no multimodality as a total population, passing the dip test for each marker distribution, however in Fig. 4D it is shown that metaclusters 60 and 61 have distinct levels of CD16 expression. This seems to correspond well to the tSNE mapping in 4A, where the light blue CD16 + NK cell population consists of two connected populations. Full concordance between labeled data and INFLECT metaclusters is summarized in Additional file 2.
The inflection point is consistent across mass cytometry datasets
Wider applicability of INFLECT was investigated by applying the method to 3 other datasets (Fig. 5). In the Bagwell PBMC dataset (Fig. 5A) all 5 samples (which are technical replicates acquired at different sites) showed a similar pattern of cluster unimodality. The fraction of unimodal distributions rose rapidly to 95% for all 5 samples. The fraction of unimodal distributions increased at a lower rate for increasing numbers of clusters, reaching 99.02% unimodality for 300 FlowSOM metaclusters. Applying the Lfunction output a median of 53 metaclusters as the optimal number of metaclusters for this dataset with a fraction of unimodal distributions \({U}_{53}= 95.61\%\). Diagnostic curve for KimmeyBM (Fig. 5B) showed a similar pattern, \({U}_{5\dots 73}\) rising to 95.60% unimodality, then yielded diminishing returns up to 99.70% fraction of unimodal distributions. Inflection point, resulting from the Lfunction was at 74 metaclusters.. The KimmeyPBMC dataset had fewer markers used for phenotyping than the bone marrow dataset. This resulted in a lower optimal cluster number, determined at 41 metaclusters with a fraction of unimodal distributions of 98.89%. For higher numbers of metaclusters, the diagnostic curve had a nearly level unimodality plateau starting at inflection point = 42. Unimodality increases up to 98.37% at \({U}_{42}\), ultimately reaching 99.70% at \({U}_{350}\).
Similar to the Levine32 dataset, the Bagwell and Kimmey datasets are labeled by the original authors. Comparisons between the original labels and INFLECTinformed FlowSOM metaclusters were made using tSNE embeddings, marker histograms, and concordance between original labels and INFLECT metaclusters are shown via matching matrixes (Additional file 1: Figs. S5–S7, Additional file 3–5).
For the Bagwell dataset labels were determined by a published sequential gating strategy, though overlapping gates lead to events receiving multiple labels. INFLECT was able to identify a small population of CD16^{low} and CD56^{high} NK cells within the Early NK cell label (Additional file 1: Fig. S5CD). Concordance between event labels and INFLECT metaclusters is summarized in Additional file 3.
Both datasets in the paper by Kimmey et al. were labeled by manually annotating a large number of SPADE clusters, 175 clusters for the bone marrow dataset and 150 clusters for the PBMC dataset. Concordance between event labels and INFLECT metaclusters are displayed in Additional file 4 and 5, and visualized in Figs. S6 and S7 (Additional file 1). In both datasets INFLECT was again able to identify phenotypically distinct populations within labels.
Across all datasets considered in this work, the fraction of unimodal distributions reaches a plateau of diminishing returns. The start of this plateau can be determined by the Lfunction, providing an optimal metacluster number for the particular dataset.
Discussion
In this work we outline the use of overall cluster unimodality as a quality control measurement for FlowSOM clustering results. The goal of clustering is to determine and enumerate unique cellular subsets, or in other words grouping cells with unimodal distributions in all markers. By using cluster unimodality this clustering process can be condensed to a single score for each set of FlowSOM results, collected in the Unimodality set \({U}_{i}\). The chosen datasets contain labels for known cell phenotypes through different analysis methods, each with its own advantages and assumptions. The datasets also vary in dimensionality and depth of phenotyping. This underscores the flexibility of INFLECT. The fraction of unimodal distributions for a dataset reaches a plateau, at an inflection point that is dependent on dataset size, number of markers, and expression patterns of markers. While use of marker unimodality has been described before in highdimensional analysis [11], many rely on Gaussian mixture modeling or expectation maximization [28,29,30,31]. These processes are computationally expensive, making them impractical for use in a quality control setting where we would want to evaluate iterative clustering runs. Moreover, unimodality testing through the dip test has been described as more stable and less prone to errors in literature [19].
This work is inspired by quality control functions that are part of SPADEVizR [18] and work on the Lfunction [24]. The hyperparameters used in INFLECT, the threshold for interquartile range (set at \(arcsinh(X/5)= 2\)) and the dip test (\(\alpha = 0.05\)) are the same as in SPADEVizR. When combined into INFLECT we show that they provide a datadriven metaclustering endpoint for FlowSOM. Though not investigated in this study, the application of INFLECT should work for any clustering method where the user can determine the number of resulting clusters. This can be either in cases where the number of clusters is used as input, such as SCAFFoLD [32] or where the smallest acceptable cluster is determined as in Citrus [12]. Additionally, there are methods where input from the user more indirectly affects cluster composition and total number of clusters, such as bandwidth selection in densitybased clustering (ACCENSE [33] or Gaussian mean shiftbased clustering [34]). INFLECT can be utilized to iteratively evaluate these input parameters aside from visual inspection of tSNE embeddings. These approaches come with the caveat that they likely require take a longer time to compute. The design of FlowSOM allows for low runtimes also for larger datasets [21], making it very suitable for testing many iterations to determine the best clustering. Applying INFLECT to clustering algorithms other than FlowSOM was outside of the scope of this work and would have to be validated before implementation.
While INFLECT was developed for use in mass cytometry, the principle behind cluster unimodality would still hold true for highdimensional cytometry techniques like spectral cytometry [35] or highparameter flow cytometry. However, extra care should be taken in preprocessing of the data regarding distributions around zero (and lower). Since mass cytometry has no negative values, the unimodality test is performed on positive distributions only. Unexpected minor crosstalk between markers in conventional flow cytometry could result in multimodal negative peaks and will likely lead INFLECT to recommend an optimal cluster number that is too high [36, 37]. The distributions of gene expressions in scRNAseq data are not comparable to protein expression in mass cytometry. Therefore we do not expect INFLECT to provide a benefit when assessing the clustering results of single cell RNA sequencing (scRNAseq) data.
When comparing INFLECT clustering with the cluster definitions as originally published for the dataset, the inflection point was consistently higher than the number of phenotypes determined by the authors. By visually inspecting histograms for representative INFLECTmetaclusters, we show that the clusters have distinct marker expression patterns. While it is unlikely that a wholly novel cell type is discovered, by splitting cell types into distinct subclusters INFLECT enumerates possible differentiation states or activation states. INFLECT thereby allows for differential abundance testing on these subclusters. However, it is still up to the researcher to determine if this cluster of events represents a biologically meaningful subset. For example the CD56^{bright} population of NK cells identified within the BMMC dataset of Levine et al., while small, is an important subset with a distinct function in the immune system [38]. The rationale for finding the datadriven endpoint of clustering is to limit the number of clusters, in turn to limit the number of statistical tests that will be performed and reduce occurrence of type 1 errors. Therefore, manually merging clusters that were found in an unsupervised manner is a valid strategy. The diagnostic curves provided by INFLECT can aid in this process, showing at what number of metaclusters cluster unimodality will drop dramatically.
Conclusion
INFLECT addresses a knowledge gap in highdimensional cytometry analysis, namely assessing clustering results. This is done through monitoring marker distributions for interquartile range and unimodality across a range of metacluster numbers. The fraction of unimodal distributions within metaclusters, collected in the Unimodality set \({U}_{i}\), plotted versus the number of clusters consistently reaches a plateau, providing a datadriven endpoint for metacluster number. The inflection point is the optimal tradeoff between cellular heterogeneity and statistical power, applied in this work for FlowSOM clustering on mass cytometry datasets.
Availability and requirements
The computational method proposed in this work is collected in the R package “INFLECT” and made available on github at www.github.com/jnverhoeff/GarciaVallejoLab. Project name: INFLECT. Project home page: https://www.github.com/jnverhoeff/GarciaVallejoLab. Operating system(s): Platform independent. Programming language: R. Other requirements: R4.1.2. License: GNU GPLv3. Any restrictions to use by nonacademics: license needed for commercial use.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the Flowrepository, Levine32 under FRFCMZZPH at http://flowrepository.org/id/FRFCMZZPH, both Kimmey et al. datasets under FRFCMZYR5 at http://flowrepository.org/id/FRFCMZYR5. The Bagwell et al. dataset is available on Cytobank, under Bagwell et al. at http://premium.cytobank.org/cytobank/experiments/221569.
Abbreviations
 QC:

Quality control
 BMMC:

Bone marrow mononuclear cells
 PBMC:

Peripheral blood mononuclear cell
 NK:

Natural killer
 SPADE:

Spanningtree progression analysis of densitynormalized events
 scRNAseq:

Cells, single cell RNA sequencing
 tSNE:

Tdistributed stochastic neighbor embedding
 UMAP:

Uniform manifold approximation projection
 SOM:

Selforganizing maps
References
Mair F, et al. The end of gating? An introduction to automated analysis of high dimensional cytometry data. Eur J Immunol. 2016;46(1):34–43.
Saeys Y, Van Gassen S, Lambrecht BN. Computational flow cytometry: helping to make sense of highdimensional immunology data. Nat Rev Immunol. 2016;16(7):449–62.
Kaushik A et al. A novel workflow for semisupervised annotation of celltype clusters in mass cytometry data. In: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 2019.
Ludwig J, et al. flowEMMi: an automated modelbased clustering tool for microbial cytometric data. BMC Bioinform. 2019;20(1):643.
Minoura K, et al. Modelbased cell clustering and population tracking for timeseries flow cytometry data. BMC Bioinform. 2019;20(23):633.
Weber LM, et al. diffcyt: Differential discovery in highdimensional cytometry via highresolution clustering. Commun Biol. 2019;2(1):1–11.
Ye X, Ho JWK. Ultrafast clustering of singlecell flow cytometry data using FlowGrid. BMC Syst Biol. 2019;13(Suppl 2):35–35.
Orlova DY, Herzenberg LA, Walther G. Science not art: statistically sound methods for identifying subsets in multidimensional flow and mass cytometry data sets. Nat Rev Immunol. 2018;18(1):77–77.
Saeys Y, Van Gassen S, Lambrecht B. Response to Orlova et al. “Science not art: Statistically sound methods for identifying subsets in multidimensional flow and mass cytometry data sets.” Nat Rev Immunol. 2018;18(1):78.
Levine JH, et al. Datadriven phenotypic dissection of AML reveals progenitorlike cells that correlate with prognosis. Cell. 2015;162(1):184–97.
Naim I, et al. SWIFT—scalable clustering for automated identification of rare cell populations in large, highdimensional flow cytometry datasets, Part 1: Algorithm design. Cytometry A. 2014;85(5):408–21.
Bruggner RV, et al. Automated identification of stratifying signatures in cellular subpopulations. Proc Natl Acad Sci. 2014;111(26):E2770–7.
Guo M, et al. Mass cytometry analysis reveals a distinct immune environment in peritoneal fluid in endometriosis: a characterisation study. BMC Med. 2020;18(1):3.
Kothari H, et al. Identification of human immune cell subtypes most responsive to IL1betainduced inflammatory signaling using mass cytometry. Sci Signal. 2021;14(673):5763.
Wong MT, et al. Mapping the diversity of follicular helper T cells in human blood and tonsils using highdimensional mass cytometry analysis. Cell Rep. 2015;11(11):1822–33.
Bengsch B, et al. Epigenomicguided mass cytometry profiling reveals diseasespecific features of exhausted CD8 T cells. Immunity. 2018;48(5):10291045.e5.
Patel AJ, et al. Regulatory B cell repertoire defects predispose lung cancer patients to immunerelated toxicity following checkpoint blockade. Nat Commun. 2022;13(1):3148.
Gautreau G, et al. SPADEVizR: an R package for visualization, analysis and integration of SPADE results. Bioinformatics. 2016;33(5):779–81.
Johnsson K, Linderoth M, Fontes M. What is a “unimodal” cell population? Using statistical tests as criteria for unimodality in automated gating and quality control. Cytometry A. 2017;91(9):908–16.
Van Gassen S, et al. FlowSOM: Using selforganizing maps for visualization and interpretation of cytometry data. Cytometry A. 2015;87(7):636–45.
Weber LM, Robinson MD. Comparison of clustering methods for highdimensional singlecell flow and mass cytometry data. Cytometry A. 2016;89(12):1084–96.
Hartigan JA, Hartigan PM. The dip test of unimodality. Ann Stat. 1985;13(1):70–84.
Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics (Oxford, England). 2010;26(12):1572–3.
Salvador SC, Philip. Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms.
Kimmey SC, et al. Parallel analysis of trimolecular biosynthesis with cell identity and function in single cells. Nat Commun. 2019;10(1):1185.
Bagwell CB, et al. Multisite reproducibility of a human immunophenotyping assay in whole blood and peripheral blood mononuclear cells preparations using CyTOF technology coupled with Maxpar Pathsetter, an automated data analysis system. Cytometry B Clin Cytom. 2020;98(2):146–60.
Liu X, et al. A comparison framework and guideline of clustering methods for mass cytometry data. Genome Biol. 2019;20(1):297.
Sörensen T, et al. immunoClust—an automated analysis pipeline for the identification of immunophenotypic signatures in highdimensional cytometric datasets. Cytometry A. 2015;87(7):603–15.
Boedigheimer MJ, Ferbas J. Mixture modeling approach to flow cytometry data. Cytometry A. 2008;73A(5):421–9.
Chan C, et al. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry A. 2008;73A(8):693–701.
Ge Y, Sealfon SC. flowPeaks: a fast unsupervised clustering for flow cytometry data via Kmeans and density peak finding. Bioinformatics (Oxford, England). 2012;28(15):2052–8.
Spitzer MH, et al. An interactive reference framework for modeling a dynamic immune system. Science. 2015;349(6244):1259425.
Shekhar K, et al. Automatic classification of cellular expression by nonlinear stochastic embedding (ACCENSE). Proc Natl Acad Sci. 2014;111(1):202–7.
van Unen V, et al. Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types. Nat Commun. 2017;8(1):1740.
Robinson JP. Multispectral cytometry: the next generation. Biophoton Int. 2004;2004:36–40.
Folcarelli R, et al. Transformation of multicolour flow cytometry data with OTflow prevents misleading multivariate analysis results and incorrect immunological conclusions. Cytometry A. 2022;101(1):72–85.
Azad A, Rajwa B, Pothen A. flowVS: channelspecific variance stabilization in flow cytometry. BMC Bioinform. 2016;17(1):291.
Poli A, et al. CD56bright natural killer (NK) cells: an important NK cell subset. Immunology. 2009;126(4):458–65.
Acknowledgements
We would like to thank Sofie van Gassen for fruitful discussions.
Funding
This work was funded by the Cancer Center Amsterdam—Amsterdam UMC. The funding body had no influence the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Authors and Affiliations
Contributions
JV, and JJGV designed research; JV performed research; JV analysed data; JV, SA, and JJGV wrote the paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Fig. S1:
Marker performance for Levine32 INFLECT. Marker performance diagnostic plots for remaining markers of Levine32. On the xaxis the amount of FlowSOMmetaclusters, and on the yaxis the amount of metaclusters that pass the unimodality and interquartile range checks as a percentage of total metaclusters for the listed marker; Supplementary Fig. S2: All 15 repeats of 90% subsamples of Levine32. A Lfunction applied to the sigmoidal curve fitted to set\({U}_{i}\). B Lfunction applied to the set \({U}_{i}\) itself. Resulting inflection points for both panels denoted with vertical dotted line; Supplementary Fig. S3: With fewer datapoints in Unimodality set \({U}_{i}\) INFLECT still produces a stable fitted sigmoidal curve. INFLECT diagnostic plots with different sizes of set \({U}_{i}\), from 81 calculations down to 9 datapoints. Resulting inflection point is stable around 40 metaclusters. The values of the Unimodality set \({U}_{i}\) are plotted on the yaxis versus the number of metaclusters assessed on the xaxis; Supplementary Fig. S4: Traditional (meta)clustering evaluations perform poorly on Levine32 dataset. A Davies Bouldin (DB) Index on the yaxis versus the amount of FlowSOMmetaclusters. A lower score indicates better clustering. In this case the DB index does not form a plateau, making interpretation difficult. B CalinskiHarabasz (CH) Index on the yaxis versus the amount of FlowSOMmetaclusters. A higher score indicates better clustering. CH index drops with increasing FlowSOMmetaclusters, making CH index less suitable for evaluation. C Diagnostic plot of ConsensusClusterPlus on Levine32 dataset. Relative change in area under cumulative distribution function (CDF) curve compared to \(k1\) clusters. FlowSOM implementation of ConsensusClusterPlus indicates 14 metaclusters as optimal\(k\). D tSNE embedding of Levine32, colored for the 14 ConsensusClusterPlus metaclusters. Multiple islands (CD4 T cells, CD8 T cells and monocytes) are grouped into 1 large metacluster; Supplementary Fig. S5: INFLECT reveals heterogeneity within labeled phenotypes of the replicate PBMC dataset from Bagwell et al. A tSNE embedding of 10,000 events from Bagwell dataset. Colored for the 26 manually gated phenotypes, plus lightpink for unlabeled cells and yellow for events with 2 or more labels. B Histograms of selected markers highlighting heterogeneity within the Early NK cells. Xaxis denote Arcsinh(x/5) transformed expression values. C Same tSNE embedding of 10,000 events as in A, here colored for the 52 INFLECT metaclusters. D Histograms of selected markers highlighting the difference between metacluster 24 and 29, which correspond to the 2 different cell populations within the Early NK label. Xaxis denote Arcsinh(x/5) transformed expression values. Metacluster 29 is a smaller CD56^{bright} CD16^{dim} population, whereas metacluster 24 has high levels of CD16 and intermediate CD56 expression; Supplementary Fig. S6: INFLECT applied to bone marrow dataset from Kimmey et al. reveals high degree of heterogeneity. A tSNE embedding of 10,000 events from KimmeyBM dataset. Colors denote the labels identified by the authors. Immature B cells are highlighted in rectangle. B Histograms of selected markers highlighting heterogeneity within Immature B cells. Xaxis denote Arcsinh(x/5) transformed expression values. C Same tSNE embedding as in A, now colored for the 74 INFLECT metaclusters. Highlighted in the rectangle are the 4 different metaclusters of the Immature B cells, which were separated into metaclusters 38, 41, 49, and 71. D Histograms of selected markers highlighting heterogeneity between the 4 metaclusters corresponding to the Immature B cell label. Xaxis denote Arcsinh(x/5) transformed expression values. Metacluster 71 has a CD38^{+} phenotype, metacluster 41 has a IgM^{dim} phenotype and metaclusters 38 and 49 are differentiated in expression levels of CD24 and to a lesser extent CD45RA; Supplementary Fig. S7: INFLECT algorithm applied to PBMC dataset from Kimmey et al. captures underlying heterogeneity of labeled populations. A tSNE embedding of 10,000 events from KimmeyPBMC dataset. Colors denote the labels identified by the authors. Rectangle highlights the cDCs label. B Histograms of selected markers highlighting heterogeneity within the cDCs label. Xaxis denote Arcsinh(x/5) transformed expression values. C Same tSNE embedding as in A, here colored for the 41 INFLECT metaclusters. In the rectangle the metaclusters 7 and 15 are highlighted, which correspond to the cDCs label. D Histograms of selected markers highlighting heterogeneity between metaclusters 7 and 15. Main difference between metaclusters 7 and 15 is the level of CD11c expression.
Additional file 2:
Matching matrix of Levine32 dataset.
Additional file 3:
Matching matrix of Bagwell dataset.
Additional file 4:
Matching matrix of KimmeyBM dataset.
Additional file 5:
Matching matrix of KimmeyPBMC dataset.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Verhoeff, J., Abeln, S. & GarciaVallejo, J.J. INFLECT: an Rpackage for cytometry cluster evaluation using marker modality. BMC Bioinformatics 23, 487 (2022). https://doi.org/10.1186/s1285902205018w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902205018w
Keywords
 Mass cytometry
 Data analysis
 Clustering results
 Clustering evaluation
 Unimodality
 Software
 Phenotyping