 Software
 Open access
 Published:
Sctensor detects manytomany cell–cell interactions from single cell RNAsequencing data
BMC Bioinformatics volume 24, Article number: 420 (2023)
Abstract
Background
Complex biological systems are described as a multitude of cell–cell interactions (CCIs). Recent singlecell RNAsequencing studies focus on CCIs based on ligand–receptor (L–R) gene coexpression but the analytical methods are not appropriate to detect manytomany CCIs.
Results
In this work, we propose scTensor, a novel method for extracting representative triadic relationships (or hypergraphs), which include ligandexpression, receptorexpression, and related L–R pairs.
Conclusions
Through extensive studies with simulated and empirical datasets, we have shown that scTensor can detect some hypergraphs that cannot be detected using conventional CCI detection methods, especially when they include manytomany relationships. scTensor is implemented as a freely available R/Bioconductor package.
Background
Complex biological systems and processes such as tissue homeostasis [1, 2], neurotransmission [3, 4], immune response [5], ontogenesis [6], and stem cell niches niche [7, 8] are composed of cell–cell interactions (CCIs). Many molecular biology studies have decomposed such systems into constituent parts (e.g., genes, proteins, and metabolites) to clarify their functions. Nevertheless, more sophisticated methodologies are required because CCIs essentially differentiate whole systems from functioning merely as the sum of their parts. Accordingly, microlevel measurements of such parts cannot always explain macrolevel biological functions.
Previous studies have investigated CCIs using technologies such as fluorescence microscopy [9,10,11,12,13], microdevicebased methods such as microwells, micropatterns, singlecell traps, droplet microfluidics, and micropillars [14,15,16,17,18,19,20,21,22], and transcriptomebased methods [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52]. In particular, the recent singlecell RNAsequencing (scRNAseq) studies have focused on CCIs based on ligand–receptor (L–R) gene coexpression. By investigating the detected cell types through scRNAseq and the L–R pairs specifically expressed in the cell types, CCIs can potentially be understood at high resolution.
Despite their wide usage, the analytical methods based on L–R pairs are still not mature; such methods implicitly assume that CCIs consist of onetoone relationships between two cell types and that the corresponding L–R coexpression is observed in a celltypespecific manner. One study even removed ligand and receptor genes expressed in multiple cell types from their data matrix, assuming onetoone CCIs [53]. In real empirical data, however, each ligand and receptor gene can be expressed across multiple cell types, and some studies have actually focused on manytomany CCIs [25, 33, 36, 48, 54]. Such a difference between actual CCI patterns composed of real data and the hypothesis assumed by a model will cause severe bias in the detection of CCIs.
For the above reason, we propose scTensor, which is a novel CCI prediction method based on a tensor decomposition algorithm. Our method regards CCIs as hypergraphs and extracts some representative triadic relationships consisting of ligandexpression, receptorexpression, and related L–R pairs. The main contributions of this article are summarized as follows.

We developed a novel simulator to model the CCIs as hypergraphs and quantitatively evaluate the performance of scTensor and other L–R detection methods.

We reimplement some L–R detection methods from scratch in order to analyze the same L–R database with all of these methods and focus on only the performance of L–R detection methods, not the slight differences in data preprocessing and the L–R database used.

We show that scTensor’s performance with respect to its accuracy of manytomany CCI detection, computation time, and memory usage are superior to the other L–R detection methods.

We describe the implementation of scTensor as an R/Bioconductor package to enable the reproducibility of data analyses as well as continuous maintenance and improvements. We provide some original visualization functions and a function to generate an HTML report in scTensor to enable detailed interpretation of the results. We have extended our framework to work with 125 species.
Results
CCI as a hypergraph
One of the simplest CCI representations is a directed graph, where each node represents a cell type and each edge represents the coexpression of all L–R pairs (Fig. 1a, left). The direction of each edge is set as the ligand expressing cell type \(\rightarrow\) the receptorexpressing cell type. Such a data structure corresponds to an asymmetric adjacency matrix, in which each row and column represents a ligandexpressing cell type and receptorexpressing cell type, respectively. If some combinations of cell types are regarded as interacting, the corresponding elements of the matrix are filled with 1 and otherwise 0. If the degree of CCI is not a binary relationship, weighted graphs and corresponding weighted adjacent matrices may also be used. The previous analytical methods are categorized within this approach [23, 24, 26,27,28,29,30,31,32,33,34, 36, 40, 43,44,45,46, 48, 49, 51, 52, 55].
The drawback of using an adjacency matrix to describe CCIs is that multiple L–R coexpression scores are collapsed into a single value by summation or averaging. Because the average is simply a constant multiple of the sum, here we discuss only the sum. The summed value has no meaning in which L–R pairs are related to the CCI, and therefore CCIs and the related L–R pair lists cannot be detected simultaneously.
In contrast to an adjacency matrix (i.e., graph), the triadic relationship of CCIs also can be described as directed hypergraphs (i.e., CCI as hypergraph; CaH), where each node is a cell type but the edges are distinguished from each other by the different related L–R pair sets (Fig. 1a, right). Such a contextaware edge is called a “hyperedge” and is described as multiple different adjacency matrices. The set of matrices corresponds to a “tensor”, which is a generalization of a matrix to expand its order.
Overview of scTensor
Here we introduce the procedure of scTensor. Firstly, a tensor data is constructed through the following steps (Fig. 1b). A scRNAseq matrix and the cellular labels specifying cell types are supposed to be provided by users. Firstly, the gene expression values of each cell are normalized by count per median of library size (CPMED [56,57,58]) and logarithm transformation, for variancestabilization, is performed to the data matrix [i.e., \(\log _{10}{(\textrm{CPMED} + 1)}\)].
Next, the data matrix is converted to a celltypelevel average matrix according to the cell type labels. Combined with an L–R database, two corresponding rowvectors of an L–R pair are extracted from the matrix. The outer product (direct product) of the two vectors is calculated, and a matrix is generated. The matrix can be considered as the similarity matrix of all possible celltype combinations for each L–R pair. Finally, for each L–R pair, the matrix is calculated, and the tensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\), where J is the number of cell types and K is the number of L–R pairs, is generated as the merged matrices. In this work, this tensor is called the “CCItensor”.
After the construction of the CCItensor, we use the nonnegative Tucker2 decomposition (NTD2) algorithm [59, 60]. NTD2 decomposes the CCItensor as a core tensor \(\mathcal {G} \in \mathbb {R}^{R1 \times R2 \times K}\), and two factor matrices \(\varvec{A}^{\left( 1\right) } \in \mathbb {R}^{J \times R1}\) and \(\varvec{A}^{\left( 2\right) } \in \mathbb {R}^{J \times R2}\), where R1 and R2 are the NTD2 rank parameters. The factor matrix \(\varvec{A}^{\left( 1\right) }\) describes the R1 of ligand gene expression patterns in each cell type and the factor matrix \(\varvec{A}^{\left( 2\right) }\) describes R2 of receptor gene expression patterns in each cell type, and core tensor \(\mathcal {G}\) describes the degree of association of all the combination (\(R1 \times R2\)) of the ligand and receptor expression patterns of each L–R pair.
The result of NTD2 is considered the sum of some representative triadic relationships. In this work, each triadic relationship is termed \(\texttt {CaH}{}\left( r1,r2\right)\), which refers to the outer product of three vectors, \(\varvec{A}_{:r1}^{\left( 1\right) }\), \(\varvec{A}_{:r2}^{\left( 2\right) }\), and \(\mathcal {G}_{r1,r2,:}\), where r1 (\(1 \le r1 \le R1\)) and r2 (\(1 \le r2 \le R2\)) are the indices of the columns of the two factor matrices (Fig. 1c). The CaHs are extracted in a datadriven way without the assumption of onetoone CCIs. Therefore, this approach can also detect manytomany CCIs according to the data complexity.
Evaluation of manytomany CCIs detection
To examine the performance of the CCI methods in terms of detecting CaH, we validated the CCI methods using both simulated and empirical datasets (Fig. 2).
We first prepared 90 simulated datasets considering five numbers of cell types (3, 5, 10, 20, or 30), two CCI styles (onetoone or manytomany including onetomany and manytoone), three numbers of CCI types (1, 3, or 5), and three threshold values (E2, E5, or E10) for recognition of differentially expressed genes (DEGs). According to these conditions, ground truth CCIs were determined (Additional file 1).
Next, we prepared five real empirical datasets (FetalKidney [36], GermlineFemale [25], HeadandNeckCancer [54], Uterus [33], and VisualCortex [48]), each of which focused on manytomany CCIs in their respective original papers.
There are many L–R scoring methods to quantify the degree of coexpression of ligand and receptor genes. We reimplemented four scores used in many CCI prediction methods to evaluate performance independent of software implementation (Table 1 and Additional file 1). Here, we selected as the four methods sum score (CellPhoneDB [37], Giotto [61], CrossTalkR [62], and Squidpy [63]), product score (NATME [64], FunRes [65], ICELLNET [66], and TraSig [67]) Halpern’s score [68], and Cabello \(\mathrm {}\) Aguilar’s score (SingleCellSignalR [69] and CellTalkDB [70]), each of which is widely used in many studies.
To differentiate significant CCIs from nonsignificant CCIs, many CCI methods introduce a label permutation test with a random permutation of celltype labels to simulate the null distribution of CCIs. This process is considered a kind of binarization (1 for significant CCIs, 0 for nonsignificant CCIs). For scTensor, binarization was realized by median absolute deviation (MAD) thresholding against each column vector in factor matrices calculated by tensor decomposition.
To quantitatively evaluate how selectively each CCI method was able to detect the ground truth CCIs before and after binarization, nine evaluation measures were introduced. Four of them were applied both before and after binarization, and the remaining five were applied to the results only after binarization.
scTensor could selectively detect manytomany CCIs in simulated datasets
The area under the curve of precision–recall (AUCPR) and Matthews correlation coefficient (MCC) values of 30 datasets with an E10 threshold value are shown in Fig. 3. For the details of all the evaluation results for all the conditions, see Additional files 3–12. Figure 3 shows that the AUCPR values can vary among the CCI methods. When the CCIstyle was onetoone (Fig. 3a, left), scTensor (NTD2) achieved the highest AUCPR scores, and Halpern’s score obtained the secondhighest AUCPR values on average. For Halpern’s score, however, binarization has significantly reduced the significant CCIs. This may be explained by Halpern’s score having the lowest FPR (Additional file 10) and the highest FNR (Additional file 11), and it suggests that Halpern’s score is a quite conservative method to detect onetoone CCIs. When the CCIstyle was set as manytomany, both the previous and current versions of scTensor (NTD3 and NTD2, respectively) achieved higher AUCPR values on average (Additional file 4), compared with the other methods (Fig. 3a, right).
Figure 3b shows that the MCC values also varied among the CCI methods. When the CCIstyle was onetoone (Fig. 3b, left), Halpern’s score achieved the highest MCC values and scTensor (NTD2) obtained the secondhighest values on average (Additional file 8). When the CCIstyle was manytomany, scTensor (NTD2) and sum score obtained the highest MCC values, compared with the other methods (Fig. 3b, right).
Characteristics of scTensor (NTD2), Halpern ’s score, and sum score
The comprehensive validation described that the three methods (scTensor (NTD2), Halpern’s score, and sum score) performed better than the others under certain conditions. To further examine the characteristics and trends of each method, we aggregated the number of CCIs detected in three datasets in which each of the three methods excelled (Fig. 4).
scTensor (NTD2): this method performed well when the CCI style was manytomany. For example, in Fig. 4a, most manytomany CCIs could be detected. Although there were some false negative (FN) CCIs that were not detected, there were fewer false positive (FP) CCIs. In contrast, Halpern’s score was too conservative against this dataset and failed to detect most of the CCIs by the label permutation test. At a first glance of Fig. 4a, the sum score appears to work well with these data, but under scrutiny at the level of individual L–R pairs, sum score results contain many FP and FN CCIs (Additional file 13).
Sum score: This method performed well when the number of cell types was small and the style of CCI was restricted to onetoone (Fig. 4b). Even though Halpern’s score and scTensor (NTD2) were able to detect similar CCIs, Halpern’s score was quite conservative and contained many FN CCIs because it considered many CCIs to not be significant. For sum score, there seemed to be a bias toward FP CCIs. If the degree of coexpression of an L–R pair is high between two cell types, this method seems to detect FP pairs in which only one of the L–R is highly expressed. In such cases, crossshaped patterns were observed in the heatmap in Fig. 4. In our simulated datasets, this crossshaped pattern of FP CCIs were observed more frequently in the sum score.
Halpern’s score: In most data sets, Halpern’s score was found to be too conservative, with many FN CCIs, but in a very specific situation, that is, when the CCIstyle was onetoall (or alltoone), it outperformed the other methods (Fig. 4c). In contrast, scTensor (NTD2) inferred many FN CCIs among these data, while the sum score identified many FP CCIs (Additional file 13).
scTensor could selectively detect manytomany CCIs in real datasets
Next, we applied these CCI methods to real empirical datasets (Table 2 and Additional files 3–12). As expected from the results of simulated datasets, scTensor (NTD2) outperformed the other methods on these real datasets, which contain manytomany CCIs. Regarding AUCPR (Fig. 5a) and MCC (Fig. 5b) values, scTensor (NTD2) achieved higher values compared with the other methods, although the difficulty of detecting the CCIs was highly dependent on the dataset (Additional files 4, 8). We further investigated the real empirical datasets and found that the known CCIs reported by the original papers were reproduced by scTensor (Table 3). Additionally, some predicted manytomany CCIs can be considered biologically plausible because the CCIs are related to the same signaling pathways of known CCIs, although the original papers did not refer to the CCIs. These results can be interactively investigated using the HTML report generated by scTensor (Additional files 13–17).
Computational complexity and memory usage
We also assessed the orders of computational complexity and memory usage of all the CCI methods (Table 4). All the L–R score methods require \(\mathcal {O}(N^{2}L)\) order both in the computation and in memory usage, where N is the number of cell types and L is the number of L–R pairs. The label permutation tests combining any L–R scores require \(\mathcal {O}(N^{2}LP)\) in computation, where P is the number of random shuffles of celltype labels. In many cases, P is typically set as a large value greater than 1000 [37, 69], making this a very timeconsuming calculation.
In contrast, scTensor (NTD2) does not perform the label permutation; instead, it simply utilizes the factor matrices after the decomposition of the CCItensor. Hence, the order of computational complexity is reduced to \(\mathcal {O}(N^{2}L(R1+R2))\), where R1 and R2 are the number of columns or “rank” parameters for the first and secondfactor matrices, respectively. The rank parameters are typically set as small numbers (e.g., 10), this leads to a substantial computational advantage compared with the label permutation test. The computation time and memory usage when analyzing the simulated and real empirical datasets show that scTensor has an advantage in computational complexity compared with the label permutation test (Additional files 5, 6).
Method comparisons
The first method similar to scTensor is CellChat. This method uses the communication probabilities (3rdorder tensor), which is a CCI tensor constructed with the authors’ original score and is normalized so that the sum in the second mode is 1. In addition to the label permutation test on each L–R pair in the thirdorder tensor, NMF on the matrix data summarized by the second mode of the 3rdorder tensor is performed to detect global CCI patterns. However, this summarization reduces the order of tensor (i.e., 3rd to 2nd) and loses information on which L–R pairs contributed to the CCI. In particular, as this study has shown, the label permutation test tends to detect onetoone CCI, whereas NMF may also detect manyvsmany CCI, making it difficult to consider manyvsmany CCI and the L–R pairs that contribute to it simultaneously, even when the two methods are combined. Therefore, models like scTensor that can handle higherorder data as is are preferable.
Inspired by our method, another method Tensorcell2cell [71] extended our approach to higherorder CCI tensors (e.g., 4thorder tensors) to consider CCIs and the CCI contexts (e.g., disease state, organismal life stage, and tissue microenvironment) simultaneously. Other than its effectiveness for such a higherorder CCI tensor, the main differences between Tensorcell2cell and scTensor may include the following. First, Tensorcell2cell is implemented by Python but scTensor is implemented by R. Python offers a wide range of machine learning/deep learning packages, while R offers data preprocessing and visualization with TidyVerse and bioinformaticsrelated packages with Bioconductor.
Second, there are differences in tensor decomposition models; Tensorcell2cell performs CANDECOMP/PARAFACtype nonnegative tensor decomposition but scTensor’s model is NTD2. The difference between these models is the number of rank parameters. Tensorcell2cell has only one rank parameter, while scTensor has rank parameters for the number of tensor orders (i.e. 3). This difference can be an advantage or a disadvantage; a small number of ranks reduces the computational time required to estimate the optimal ranks but it might make the model too simple.
Third, there are differences in the related tools. Tensorcell2cell assumes their text file for input, it only supports major species such as mouse and human, and it seems to assume to be used with LIANA [72, 73], another CCI tool of the authors. On the other hand, scTensor supports 124 species (September 5, 2024) in the Bioconductor package LRBase, and can be combined with various other singlecell packages via the SingleCellExperiment object and Seurat (see Implementations and Fig. 6). With an understanding of these differences, users should choose the tool they want to use according to what they want to do.
Implementations
scTensor is implemented as an R/Bioconductor package that is freely available. Both a scRNAseq dataset and L–R database are required for scTensor execution. The default format for a scRNAseq dataset is SingleCellExperiment, in which the gene IDs correspond to NCBI’s Gene database to allow links with other databases (Fig. 6a). A scRNAseq dataset can also be converted from a Seurat object. We provided instructions for this data conversion (https://bioconductor.org/packages/release/bioc/vignettes/scTensor/inst/doc/scTensor_1_Data_format_ID_Conversion.html#caseiiiumicount).
LRBase, which is the L–R database for scTensor, is stored on a remote server called AnnotationHub and is downloaded to the user’s machine on demand, only when called by the user (Fig. 6a). To extend out method to a wide range of organisms, in this work, we originally constructed and are providing the L–R lists for 125 organisms (https://github.com/rikenbit/lrbaseworkflow/blob/master/sample_sheet/sample_sheet.csv). The details of the data processing pipeline are summarized in the README.md of lrbaseworkflow (https://github.com/rikenbit/lrbaseworkflow), which is a workflow for constructing the LRBase for each of the species. For data sustainability, we offer the data files, including older versions, on the AnnotationHub server. The data files are biannually updated in conjunction with Bioconductor updates and are provided using lrbaseworkflow. Users can specify which version of the data is used for analysis, thus ensuring data reproducibility.
NTD2 was implemented as within the function of nnTensor [74] R/CRAN package and internally imported into scTensor. scTensor constructs the CCItensor, decomposes the tensor by the NTD2 algorithm, and generates an HTML report.
To enhance the biological interpretation of CaH results, we implemented some visualization functions (Fig. 6a) and these plots can be interactively investigated via web browser. A wide variety of genewise information is included in the report and can be linked to the L–R lists through the use of other R/Bioconductor packages; the gene annotation is assigned by biomaRt [75] (Gene Name, Description, Gene Ontology [GO], STRING, and UniProtKB), reactome.db [76] (Reactome) and MeSH.XXX.eg.db [77] (Medical Subject Headings [MeSH]), while the enrichment analysis (also known as overrepresentative analysis [ORA]) is performed by GOstats [78] (GOORA), meshr [77] (MeSHORA), ReactomePA [79] (ReactomeORA), and DOSE [80] (Disease Ontology (DO)ORA, Network of Cancer Genes (NCG)ORA, DisGeNETORA).
To validate that the detected the coexpression of L–R gene pairs is also consistently detected in the other data including tissue or celltypelevel transcriptome data, the hyperlinks to RefEx [81], Expression Atlas [82], SingleCell Expression Atlas [83], scRNASeqDB [84], and PanglaoDB [85] are embedded in the HTML report, facilitating comparisons of the L–R expression results with the data from largescale genomics projects such as GTEx [86], FANTOM5 [87], the NIH Epigenomics Roadmap [88], ENCODE [89], and the Human Protein Atlas [90]. Additionally, in consideration of users who might want to experimentally investigate detected CCIs, we embedded hyperlinks to Connectivity Map (CMap [91]), which provides relationships between perturbations by the addition of particular chemical compounds/genetic reagents and the resulting gene expression change.
Discussion
In this work, we regarded CCIs as CaHs, which represent the triadic relationships of ligandexpressing cell types, receptorexpressing cell types, and the related L–R pairs. We implemented a novel algorithm scTensor based on a tensor decomposition algorithm for detecting such CaHs. Our evaluations using both simulated and real empirical datasets suggest that scTensor can detect manytomany CCIs more accurately than the other conventional CCI methods. Additionally, the calculation time and memory usage performances of scTensor are also superior to those of the other CCI methods.
To extend the use of scTensor to a wide range of organisms, we also created multiple L–R datasets for 125 organisms. scTensor has been published as an R/Bioconductor package, facilitating the reproducibility of data analysis and the maintainability of datasets. We also implemented an HTML report function that simplifies checking the analysis results of scTensor. Like many CCI tools, scTensor can import an external L–R database.
In the development of many CCI tools, the authors also develop their own L–R databases and investigate the differences among various L–R databases, particulaly when comparing their method with other conventional methods [72]. This makes it difficult to distinguish whether the performance of a method is caused by differences in algorithms or databases. Although the primary CCI resources used for existing L–R tools are highly duplicated, even slight differences can influence the detection of CCIs [72]. Therefore, to separate these two comparisons and to focus only on the algorithmic differences, in this work, we compared several existing CCI algorithms, by reimplementing them and anchoring them to a common L–R database.
We were also able to examine several strengths and weaknesses of the methods other than scTensor. For example, Halpern’s score was found to be too conservative with many FN CCIs, but it was superior to the other methods with respect to the detection of onetoall (or alltoone) CCIs. A possible reason for this is that since the formula for this score includes the square root of the chisquare distribution with two degree of freedom (or an exponential distribution with an expected value of 2), and these distributions are known to be heavytailed to some extent, thus potentially inflating the number of significant L–R pairs.
The permutation test implicitly assumes that the interactions occur between very few cell types because the larger the observed L–R score is than the empirical distribution computed by label permutation, the more significant the test result is. However, if the expression levels of ligand and receptor genes are high in any cell of any cell type, the L–R scores calculated by the label permutation are will also be high, and thus, the observed value of the L–R score will be regarded as not a particularly high value in the empirical distribution; consequently, such a test result will be not significant. Therefore, detection of manytomany CCIs by label permutation test is difficult in principle. In the extreme case of alltoall CCIs, the current approaches (although it also includes scTensor) cannot avoid FN CCIs.
There are still some plans to improve scTensor to build on the advantages of this current framework. For example, the algorithm can be improved by utilizing acceleration techniques such as randomized algorithm/sketching methods [92], incremental algorithm/stochastic optimization [93, 94], or distributed computing on largescale memory machines [95] for tensor decomposition, as is now available.
To reduce the memory usage of scTensor, we are developing DelayedTensor [96], which is an R/Bioconductor package to perform various tensor arithmetic and tensor decomposition algorithms based on DelayedArray [97], another R/Bioconductor package for handling outofcore multidimensional arrays in R. We intend to reduce the memory usage of scTensor by supporting this data format.
Tensor data formats are very flexible ways to represent heterogeneous biological data structures [98], because they easily integrate supplemental information about genes or cell types in a semisupervised manner. Such information could extend the scope of the data and thus improve the accuracy of inferences. For example, there are some attempts to use the following additional information for CCI detection as well (for more details, see Additional file 1).

CCI inference via receptorreceptor and extracellular matrix data [51, 99].

Consideration of multisubunit complexes [37].

Comparison of CCIs across multiple conditions [62, 71, 100,101,102,103,104,105,106].

Consideration of downstream transcriptional factors, target genes, and signaling pathways [69, 107,108,109].

Integration with bulk RNASeq or other type of omics datasets [37, 64, 65, 107,108,109,110].

Integration with spatial transcriptome data [113].
In particular, in a recent benchmark study [113], the proximity of spatial coordinates on tissue sections measured by spatial transcriptome technology and the CCI detected by L–R data were correlated, and some studies have attempted to integrate these two kinds of datasets a single model ([113] and Additional file 1). Auxiliary Information such as the proximity in spatial coordinates can be incorporated as a regularization term to extend the tensor decomposition model [114,115,116].
Although it is beyond the scope of the present paper to cover all of the abovementioned topics, considering these in the framework of tensor decomposition is a promising research direction, so we aim to continuously work on these through the development of updates and releases of scTensor for Bioconductor.
Conclusion
In this work, we present and evaluate scTensor, a new method for detecting CCIs based on L–R coexpression in scRNAseq datasets. We also revealed that the widely used label permutation test has a bias that impedes the detection of manytomany CCIs and demonstrated that the proposed method is a viable alternative.
Materials and methods
Simulated datasets
The simulated singlecell gene expression data were sampled from the negative binomial distribution \(NB\left( f_{gc}m_{g},\phi _{g}\right)\), where \(f_{gc}\) is the foldchange (FC) for gene g and cell type c, and \(m_{g}\) and \(\phi _{g}\) are the average gene expression and the dispersion parameter of the expression of gene g, respectively.
The \(m_{g}\) value and genewise variance \(v_{g}\) were calculated from a real scRNAseq dataset of mouse embryonic stem cells (mESCs) measured by QuartzSeq [117], and the genewise dispersion parameter \(\phi _{g}\) was estimated as \(\phi _{g} = \left( v_{g}  m_{g}\right) /m_{g}^2\).
For the determination of differentially expressed genes (DEGs) and nonDEGs, \(f_{gc}\) values were calculated based on the nonlinear relationship of FC and the gene expression level \(\log _{10}{f_{gc}} = a \exp (b \log _{10}{\left( m_{g}+1\right) })\). To estimate the parameters a and b, we detected the DEGs using edgeR. By setting the threshold values (i.e., false discovery rate) of edgeR as \(10^{2}\) (E2), \(10^{5}\) (E5), and \(10^{10}\) (E10) and using the resulting DEGs, a and b values for each threshold were estimated as (0.701, 0.363), (1.907, 0.666), and (4.429, 0.814), respectively.
For genes identified as DEGs based on a threshold according to the nonlinear relationship above, the estimated \(f_{gc}\) value was used, otherwise, 1 is specified as \(f_{gc}\). If a ligand gene of a cell type and a receptor gene of a cell type were both DEGs, we defined the relationship between these cell types as the ground truth CCIs and used them for quantitative evaluation.
To simulate the “dropout” phenomenon of scRNAseq experiments, we also introduced the dropout probability \(p_{dropout}^{gc} = \exp (c f_{gc}m_{g}^2)\), which is used in ZIFA [118] (default, c=1), and the expression values were randomly converted to 0 according to the dropout probability.
To simulate various situations, we set many different CCI tensors, considering the number of cell types (3, 5, 10, 20, or 30), the style of CCIs (onetoone or manytomany including onetomany and manytoone), the number of types of CCIs (1, 3, or 5), and the DEG threshold value (E2, E5, or E10); in total, 90 synthetic CCI tensors were generated.
scRNAseq real datasets
The gene expression matrix of human FetalKidney data was retrieved from the GEO database (GSE109205), and only highly variable genes (HVGs: http://pklab.med.harvard.edu/scw2014/subpop_tutorial.html) with low \(P\) values (\(\le\) 1E−1) were extracted. The celltype label data were provided by the authors upon our request.
The gene expression matrix of human GermlineFemale data was retrieved from the GEO database (GSE86146), and only HVGs with low \(P\) values (\(\le\) 1E−7) were extracted.
The gene expression matrix and the celltype labels of human HeadandNeckCancer data were retrieved from the GEO database (GSE103322), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted.
The gene expression matrix of mouse Uterus data was retrieved from the GEO database (GSE118180), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted. The celltype labels were provided by the authors upon our request.
The gene expression matrix and the celltype labels of mouse VisualCortex data were retrieved from the GEO database (GSE102827), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted.
The gene expression values of each cell are normalized by CPMED [56,57,58] and logarithm transformation, for variancestabilization, is performed to the data matrix. For analyzing these real datasets, known L–R pairs in DLRP [119], IUPHAR [120], and HPMR [121] were searched in the data matrix. We defined the ground truth CCIs between two cell types if the CCIs were reported by the original studies. The L–R pairs associated with the CCIs were used for quantitative evaluation.
scTensor algorithm
CCItensor construction
Here, data matrix \(\varvec{Y} \in \mathbb {R}^{I \times H}\) is the gene expression matrix of scRNAseq data, where I is the number of genes and H is the number of cells. Matrix \(\varvec{Y}\) is converted into celltypewise average matrix \(\varvec{X} \in \mathbb {R}^{I \times J}\), where J is the number of cell types. The celltype labels are assumed to be specified by the user’s prior analysis, such as clustering or confirmation of marker gene expression. The relationship between the \(\varvec{X}\) and \(\varvec{Y}\) is described below:
where the matrix \(\varvec{A} \in \mathbb {R}^{H \times J}\) converts cellularlevel matrix \(\varvec{Y}\) to celltypelevel matrix X and each element of \({\varvec{A}}\) is
where \(n_{j}\) is the number of cells belonging to the j’s cell type.
Next, we search to determine whether any L–R pairs stored in the L–R database are both in the row names of matrix X, and if both IDs are found, corresponding Jlength rowvectors of the ligand and receptor genes (\(\varvec{x}_{L}\) and \(\varvec{x}_{R}\)) are extracted.
Finally, a \({J \times J}\) matrix is calculated as the outer product of \(\varvec{x}_{L}\) and \(\varvec{x}_{R}\) and incrementally stored. The stacked \({J \times J}\) matrices can be considered as a threedimensional array, which is also known as a threeorder tensor. The following outer product in the kth L–R pair (\(L\left( k\right)\) and \(R\left( k\right)\)) found is stored as the frontal slice (subtensor) of the CCItensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\):
Nonnegative Tucker3 decomposition (NTD3)
To extract the CaHs from the CCItensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\), we utilize NTD3 and NTD2, which are generalizations of nonnegative matrix factorization (NMF) to tensor data [59, 60]. The NMF approximates a nonnegative matrix data as the product of two lower rank nonnegative matrices (also known as factor matrices). Similar to NMF, NTD3 and NTD2 approximate a nonnegative tensor data as the product of some factor matrices and a core tensor.
To extend NMF to NTD3, we consider iterative updating \(\varvec{A}^{(n)} \mathcal {G}_{(n)} \varvec{A}^{(n)T} = \varvec{A}^{(n)} \mathcal {G}_{A}^{(n)} \left( n = 1,2,3 \right)\), which is the matricized expression of tensor decomposition. Here \(\varvec{A}^{(n)}\) is Kronecker product of the factor matrices without \(\varvec{A}^{(n)}\) and \(\mathcal {G}_{(n)}\) is the moden matricization of the core tensor \(\mathcal {G}\). For example, if \(n=1\), these become \(\varvec{A}^{(2)} \otimes \varvec{A}^{(3)}\) and \(\mathcal {G}_{(n)}\), respectively. By replacing X in the multiplicative update rule [60] with \(\varvec{A}^{(n)} \mathcal {G}_{(n)} \varvec{A}^{(n)T} \left( n = 1,2,3 \right)\), we can obtain the update rule for \(\varvec{A}^{(n)}\) as follows;
Similarly, the updating rule for core tensor \(\mathcal {G}\) is:
where \(\epsilon\) is a small value included to avoid generating negative values (default value 1E−10).
Nonnegative Tucker2 decomposition (NTD2)
The NTD3 has three rank parameters to be estimated, and it requires huge search space (\(R1 \times R2 \times R3\)). Additionally, the fewer the factor matrices, the more interpretable the results are. For these reasons, we further expanded the NTD3 into a model called the NTD2 [60] since v1.4.0 of scTensor.
In NTD2, the third factor matrix \(\varvec{A}^{(3)}\), which is related to L–R pairs, is replaced by an identity matrix \(I_{K}\), where the K diagonal elements are all 1 and the iteration step of \(\varvec{A}^{(3)}\) is skipped as follows:
Here, \(\mathcal {G}_{A}^{(1)} = \mathcal {G}_{1} [\varvec{A}^{(2)} I_{K}]\) and \(\mathcal {G}_{A}^{(2)} = \mathcal {G}_{2} [\varvec{A}^{(1)} I_{K}]\).
The updating rule for core tensor \(\mathcal {G}\) is
Rank estimation of NTD2
To extract the CaHs, scTensor estimates the NTD2 ranks for each matricized CCItensor (\(X^{(n)}\), \(n=1\) or 2). To be able to focus only on the dimensions that are informative and are not noisy, we used an ad hoc approach for NTD2 rank estimation.
Because NMF is performed in each matricized CCItensor in scTensor, we estimated each rank of NMF based on the residual sum of squares (RSS) [122] as
where \(\textrm{RSS}_{\textrm{max}}\) is the RSS by full rank NMF, \(\textrm{RSS}_{\textrm{min}}\) is the RSS by rank1 NMF, \(\textrm{RSS}_{\textrm{k}}\) is the RSS by rankk NMF, and \(\textrm{thr}_{\textrm{rank}}\) is the threshold value, ranging 0 to 1 (the default value is 0.8). RSS by rankk NMF is calculated between a data matrix X and the reconstructed matrix from W and H calculated by multiplicative updating rule [60] as follows:
RSS by fullrank and rank1 NMF is calculated by setting k as J and 1, respectively. With the estimated ranks \((\hat{R1}, \hat{R2})\), NTD2 was performed, and only the pairs (r1,r2) with large core tensor values are selected as CaHs. In its default mode, scTensor selects CaHs that explain the top 20 pairs sorted by the core tensor values.
Binarization
To binarize each column vector of the factor matrices obtained by NTD2, median absolute deviation (MAD), which is the median version of standard deviation (SD), was applied. Because we are only interested in the outliers of the elements of each vector in the positive direction, not the negative one, we focused only on the elements that deviate from the median in the positive direction as follows:
Here, \(\textrm{MAD}\left( x\right)\) is \(\textrm{median}\left( \Vert x  \textrm{median}\left( x\right) \Vert \right)\) and \(\textrm{thr}_{\textrm{bin}}\) is the threshold value (the default value is 1.0).
L–R scoring
Several methods have been proposed to score the degree of coexpression of a given L–R pair between two cell types, as described below.
Sum score
The gene expression of a ligand gene l can be averaged over cells belonging to the sth cell type within J cell types as follows:
Here, \(C_{s} \in \left( C_{1}, C_{2}, \ldots , C_{J} \right)\) and \(N_{C_{s}}\) is the number of cells belonging to cell type \(C_{s}\).
Likewise, the gene expression of a receptor gene r is averaged over cells belonging to the tth cell type within J cell types as follows:
Here, \(C_{t} \in \left( C_{1}, C_{2}, \ldots , C_{J} \right)\) and \(N_{C_{t}}\) is the number of cells belonging to cell type \(C_{t}\).
Using these values, the sum score is calculated as follows:
For example, some methods such as CellPhoneDB [37], Giotto [61], CrossTalkR [62], and Squidpy [63], essentially use this type of scoring (Table 1 and Additional file 1).
Product score
In some studies, the degree of coexpression is expressed as a product instead of a summation.
For example, some methods such as NATME [64], FunRes [65], ICELLNET [66], and TraSig [67], essentially use this type of scoring (Table 1 and Additional file 1).
Halpern’s score
Derived from the sum score, Halpern et al. proposed a score described below (Table 1 and Additional files 1).
In this score, Zscaling is firstly applied to both \(x_{l}^{C_{s}}\) and \(x_{r}^{C_{t}}\) over J cell types as follows:
Here, \(x_{L} = \left( x_{l}^{C_{1}}, x_{l}^{C_{2}}, \ldots , x_{l}^{C_{J}} \right)\) and \(x_{R} = \left( x_{r}^{C_{1}}, x_{r}^{C_{2}}, \ldots , x_{r}^{C_{J}} \right)\). Then, the square root of the sum of squares of these values is used as the degree of coexpression as follows:
Cabello \(\mathrm {}\) Aguilar’s score
Derived from the product score, Cabello \(\mathrm {}\) Aguilar et al. proposed a score described as follows (Table 1 and Additional file 1):
Here, \(\mu\) is the averaged value of the normalized read count matrix and is added as a scaling factor to avoid division by zero. This score is used in SingleCellSignalR [69] and CellTalkDB [70].
Label permutation method
To quantify the deviation of the observed scores obtained from real data, many studies employ \(P\) values in a statistical hypothesis testing framework. Typically, the label permutation method is widely used to calculate \(P\) values. In principle, this method can be used in combination with any L–R score as described above.
Here, we consider assigning a \(P\) value to any type of \(Score^{l,C_{s},r,C_{t}}\) above. In this method, the cluster labels of all the cells are randomly shuffled, and a synthetic score value is calculated. Performing this process 1000 times generates 1000 of the values. These values are used to generate the null distribution; for a combination of cell types, the proportion of the means which are “as or more extreme” than the observed mean is calculated as the \(P\) value. The label permutation test is performed as a onetailed test; there is a focus on L–R scores with significantly higher values compared to the null distribution, and not on L–R scores with significantly lower values. Because separating significant CCIs from nonsignificant CCIs by hypothesis testing can be regarded as a binarization process, label permutation results were compared with the results of scTensor binarization.
Quantitative evaluation of CCIs
The CCIs detected by the various methods tested in this paper were compared with ground truth CCIs to quantitatively evaluate the performance of each method. To evaluate the results, we used the metrics below.
Evaluation of the scoring before and after binarization
Each CCI method uses each corresponding L–R score to quantify the degree of coexpression of a given L–R pair between two cell types. To quantitatively evaluate the performance of each score, area under the curve of receiver operating characteristic (AUCROC) and area under the curve of precision–recall (AUCPR) were used.
A receiver operating characteristic (ROC) curve is a plot of the true positive rate (TPR, or the sensitivity = \(\frac{TP}{TP + FN}\)) versus the false positive rate (FPR, or 1  specificity, where specificity = \(\frac{TN}{TN + FP}\)) (where TP is the number of true positive CCIs, FP is the number of false positive CCIs, TN is the number of true negative CCIs, and FN is the number of false negative CCIs). The AUCROC value is the area under the ROC curve. AUCROC values range from 0 to 1, and the closer the value is to 1, the more the score indicates enrichment of the ground truch CCIs among the inferred CCIs.
A precision–recall curve is a plot of recall (i.e., sensitivity) versus precision (i.e., positive predictive value = \(\frac{TP}{TP + FP}\)). The AUCPR value is the value of the area under the precision–recall curve. AUCPR ranges from 0 to 1, and the closer the value is to 1, the more the score indicates enrichment of the ground truch CCIs among the inferred CCIs. AUCPR is known for its robustness against class imbalance, compared with AUCROC [123,124,125]. Hence, it seems that AUCPR is more appropriate than AUCROC because the number of significant CCIs are assumed to be less than that of nonsignificant CCIs in both simulated and real empirical data.
To evaluate whether the binarization was properly performed, we also applied these metrics to assess label permutation. As the label permutation test outputs \(P\) values, we utilized 1 − \(P\) value to quantify the degree of coexpression of elements of L–R pairs in the test. Because tensor decomposition is an unsupervised learning methods, we cannot distinguish which CaHs are enriched within the ground truth CCIs in advance. Additionally, we expected that scTensor could separate different styles of CCIs as multiple CaHs. Hence, we used the combination of CaHs from scTensor and ground truth CCIs with the maximum metrics values.
The calculation time and memory usage were evaluated by using the benchmark rules of Snakemake (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=benchmark#benchmarkrules).
Evaluation of the scoring after binarization
Each CCI method uses a threshold value (e.g., \(P\) value, or MAD for scTensor) to differentiate significant CCIs from nonsignificant CCIs. This process is considered a kind of binarization (1 for significant CCIs, 0 for nonsignificant CCIs), so we evaluated how well each thresholding strategy could selectively detect the ground truch CCI by comparing the metrics below.
Fmeasure is the harmonic mean of precision and recall and is defined as follows:
Matthews Correlation Coefficient (MCC) is a special case of Pearson correlation coefficient when two variables are both binary vectors. MCC is defined as follows:
MCC is widely used for binary classification evaluation and especially known for its robustness against the class imbalance, compared with the other metrics such as accuracy, balanced accuracy, bookmaker informedness, markedness, and Fmeasure [126,127,128]. Hence, it seems that MCC is more appropriate to use than Fmeasure because the number of significant CCIs are assumed to be less than that of nonsignificant CCIs in both simulated and real empirical data.
To distinguish whether the Fmeasure and MCC values correspond to the number of detected CCIs or their selectivity in focusing only the ground truth CCIs, we also compared the positive rate (PR), false positive rate (FPR), and false negative rate (FNR) values of all the methods.
Availability and requirements
R packages

scTensor: https://bioconductor.org/packages/devel/bioc/html/scTensor.html

nnTensor: https://cran.rproject.org/web/packages/nnTensor/index.html

AnnotationHub: https://bioconductor.org/packages/release/bioc/html/AnnotationHub.html

LRBaseDbi: https://bioconductor.org/packages/release/bioc/html/LRBaseDbi.html

Operating system: Linux, Mac OS X, Windows

Programming language: R (v\(\)4.1.0 or higher), Bioconductor version (v\(\)3.14.0 or higher)

License: Artistic\(\)2.0

Any restrictions to use by nonacademics: For nonprofit use only
Snakemake workflows

scTensorexperiments (for the analyses conducted in this study): https://github.com/rikenbit/scTensorexperiments

lrbaseworkflow (for the biannual updates of LRBase): https://github.com/rikenbit/lrbaseworkflow

Operating system: Linux, Mac OS X, Windows

Programming language: Python (v\(\)3.7.8 or higher), Snakemake (v\(\)6.0.5 or higher), Singularity (v\(\)3.8.0 or higher)

License: MIT

Any restrictions to use by nonacademics: For nonprofit use only
Abbreviations
 CCIs:

Cell–cell interactions
 scRNAseq:

Singlecell RNA sequencing
 L–R:

Ligand and receptor
 CaH :

CCI as hypergraph
 CMPED:

Count per median of library size
 NTD3:

Nonnegative Tucker3 decomposition
 NTD2:

Nonnegative Tucker2 decomposition
 AUCPR:

Area under the curve of precision–recall
 FPR:

False positive rate
 FNR:

False negative rate
 MCC:

Matthews correlation coefficient
 FP:

False positive
 FN:

False negative
 GO:

Gene ontology
 MeSH:

Medical subject headings
 ORA:

Overrepresentative analysis
 DO:

Disease ontology
 NCG:

Network of cancer genes
 CMap:

Connectivity map
 FC:

Foldchange
 mESCs:

Mouse embryonic stem cells
 DEGs:

Differentially expressed genes
 HVGs:

Highly variable genes
 NMF:

Nonnegative matrix factorization
 RSS:

Residual sum of squares
 MAD:

Median absolute deviation
 SD:

Standard deviation
 AUCROC:

Area under the curve
 TPR:

True positive rate
 TP:

True positive
 TN:

True negative
 PR:

Positive rate
References
Yu Y, Elble RC. Homeostatic signaling by cellcell junctions and its dysregulation during cancer progression. J Clin Med. 2016;5(2):26.
Livshits G, Kobielak A, Fuchs E. Governing epidermal homeostasis by coupling cell–cell adhesion to integrin and growth factor signaling, proliferation, and apoptosis. PNAS. 2012;109(3):4886–91.
Chao DL, Ma L, Shen K. Transient cell–cell interactions in neural circuit formation. Nat Rev Neurosci. 2009;10:262–71.
Kasukawa T, Masumoto K, Nikaido I, Nagano M, Uno KD, Tsujino K, Hanashima C, Shigeyoshi Y, Ueda HR. Quantitative expression profile of distinct functional regions in the adult mouse brain. PLoS ONE. 2011;6:23228.
Miller JFAP, Mitchell GF. Cell to cell interaction in the immune response v. target cells for tolerance induction. J Exp Med. 1970;131(4):675–99.
Pieters T, Roy VF. Role of cell–cell adhesion complexes in embryonic stem cell biology. J Cell Sci. 2014;127:2603–13.
Tweedell KS. The adaptability of somatic stem cells: a review. J Stem Cells Regen Med. 2017;13(1):3–13.
Plaks V, Kong N, Werb Z. The cancer stem cell niche: how essential is the niche in regulating stemness of tumor cells? Cell Stem Cell. 2015;16:225–38.
Hegerfeldt Y, Tusch M, Brocker EB, Friedl P. Collective cell movement in primary melanoma explants: plasticity of cellcell interaction, beta1integrin function, and migration strategies. Cancer Res. 2002;62:2125–30.
Hofschroer V, Koch KA, Ludwig FT, Friedl P, Oberleithner H, Stock C, Schwab A. Extracellular protonation modulates cell–cell interaction mechanics and tissue invasion in human melanoma cells. Sci Rep. 2017;7:42369.
Stein JV, Gonzalez SF. Dynamic intravital imaging of cell–cell interactions in the lymph node. Mech Allerg Dis. 2016;139(1):12–20.
ReinharKing CA, Dembo M, Hammer DA. Cell–cell mechanical communication through compliant substrates. Biophys J. 2008;95:6044–51.
Dewji NN, Mukhopadhyay D, Singer SJ. An early specific cell–cell interaction occurs in the production of betaamyloid in cell cultures. PNAS. 2006;103(5):1540–5.
Konry T, Sarkar S, Sabhachandani P, Cohen N. Innovative tools and technology for analysis of single cells and cell–cell interaction. Annu Rev Biomed Eng. 2016;18:259–84.
Rothbauer M, Zirath H, Ertl P. Recent advances in microfluidic technologies for celltocell interaction studies. Lab Chip. 2018;18(2):249–70.
Li R, Lv X, Zhang X, Saeed O, Deng Y. Microfluidics for cell–cell interactions: a review. Front Chem Sci Eng. 2016;10(1):90–8.
Wiklund M, Christakou AE, Ohlin M, Iranmanesh I, Frisk T, Vanherberghen V, Onfelt B. Ultrasoundinduced cell–cell interaction studies in a multiwell microplate. Micromachines. 2014;5:27–49.
Tauriainen J, Gustafsson K, Gothlin M, Gertow J, Buggert M, Frisk TW, Karlsson AC, Uhlin M, Onfelt B. Singlecell characterization of in vitro migration and interaction dynamics of t cells expanded with il2 and il7. Front Immunol. 2015;6:196.
Merouane A, ReyVillamizar N, Lu Y, Liadi I, Romain G, Lu J, Singh H, Cooper LJN, Varadarajan N, Roysam B. Automated profiling of individual cell–cell interactions from highthroughput timelapse imaging microscopy in nanowell grids (timing). Bioinformatics. 2015;31(19):3189–97.
Espulgar W, Yamaguchi Y, Aoki W, Mita D, Saito M, Lee JK, Tamiya E. Single cell trapping and cell–cell interaction monitoring of cardiomyocytes in a designed microfluidic chip. Sens Actuators B Chem. 2015;207:43–50.
Sarkar S, Sabhachandani P, Stroopinsky D, Palmer K, Cohen N, Rosenblatt J, Avigan D, Konry T. Dynamic analysis of immune and cancer cell interactions at single cell level in microfluidic droplets. Biomicrofluidics. 2016;10(5): 054115.
Dura B, Dougan SK, Barisa M, Hoehl MM, Lo CT, Ploegh HL, Voldman J. Profiling lymphocyte interactions at the singlecell level by microfluidic cell pairing. Nat Commun. 2015;6:5940.
Ramilowski JA, Goldberg T, Harshbarger J, Kloppmann E, Lizio M, Satagopam VP, Itoh M, Kawaji H, Carninci P, Rost B, Forrest ARR. A draft network of ligandreceptormediated multicellular signalling in human. Nat Commun. 2015;22(6):7866.
Camp JG, Sekin K, Gerber T, LoefflerWirth H, Binder H, Gac M, Kanton S, Kageyama J, Damm G, Seehofer D, Belicova L, Bickle M, Barsacchi R, Okuda R, Yoshizawa E, Kimura M, Ayabe H, Taniguchi H, Takebe T, Treutlein B. Multilineage communication regulates human liver bud development from pluripotency. Nature. 2017;546(7659):533–8.
...Li L, Dong J, Yan L, Yong J, Liu X, Hu Y, Fan X, Wu X, Guo H, Wang X, Zhu X, Li R, Yan J, Wei Y, Zhao Y, Wang W, Ren Y, Yuan P, Yan Z, Hu B, Guo F, Wen L, Tang F, Qiao J. Singlecell RNAseq analysis maps development of human germline cells and gonadal niche interactions. Cell Stem Cell. 2017;20:858–73.
Zhou JX, Taramelli R, Pedrini E, Knijnenburg T, Hunag S. Extracting intercellular signaling network of cancer tissues using ligandreceptor expression patterns from wholetumor and singlecell transcriptomes. Sci Rep. 2017;7(1):8815.
Skelly DA, Squiers GT, McLellan MA, Bolisetty MT, Robson P, Rosenthal NA, Pinto AR. Singlecell transcriptional profiling reveals cellular diversity and intercommunication in the mouse heart. Cell Rep. 2018;22(3):600–10.
Pavlicev M, Wagner GP, Chavan AR, Owens K, Maziarz J, DunnFletcher C, Lallapur SG, Muglia L, Jones H. Singlecell transcriptomics of the human placenta: inferring the cell communication network of the maternalfetal interface. Genome Res. 2017;27:349–61.
Joost S, Jacob T, Sun X, Annusver K, La Manno G, Sur I, Kasper M. Singlecell transcriptomics of traced epidermal and hair follicle stem cells reveals rapid adaptations during wound healing. Cell Rep. 2018;25(3):585–97.
Kramann R, Machado F, Wu H, Kusaba T, Hoeft K, Schneider RK, Humphreys BD. Parabiosis and singlecell RNA sequencing reveal a limited contribution of monocytes to myofibroblasts in kidney fibrosis. JCI Insight. 2018;3(9):99561.
Cohen M, Giladi A, Gorki AD, Solodkin DG, Zada M, Hladik A, Miklosi A, Salame TM, Halpern KB, David E, Itzkovitz S, Harkany T, Knapp S, Amit I. Lung singlecell signaling interaction map reveals basophil role in macrophage imprinting. Cell. 2018;175(4):1031–44.
Davidson S, Efremova M, Riedel A, Mahata B, Pramanik J, Huuhtanen J, Kar G, VentoTormo R, Hagai T, Chen X, Haniffa MA, Shields JD, Teichmann SA. Singlecell RNA sequencing reveals a dynamic stromal niche within the evolving tumour microenvironment. bioRxiv. 2018. https://doi.org/10.1101/467225.
Mucenski ML, Mahoney R, Adam M, Potter AS, Potter SS. Single cell RNAseq study of wild type and hox9,10,11 mutant developing uterus. Sci Rep. 2019;9:4557.
Wu H, Uchimura K, Donnelly EL, Kirita Y, Morris SA, Humphreys BD. Comparative analysis and refinement of human PSCderived kidney organoid differentiation with singlecell transcriptomics. Cell Stem Cell. 2018;23(6):869–81.
Chen L, Lee JW, Chou CL, Nair AV, Battistone MA, Paunescu TG, Merkulova M, Breton S, Verlander JW, Wall SM, Brown D, Burg MB, Knepper MA. Transcriptomes of major renal collecting duct cell types in mouse identified by singlecell RNAseq. PNAS. 2017;114(46):9989–98.
Menon R, Otto EA, Kokoruda A, Zhou J, Zhang Z, Yoon E, Chen YC, Troyanskaya O, Spence JR, Kretzler M, Cebrian C. Singlecell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney. Development. 2018;145(16):dev164038.
VentoTormo R, Efremova M, Botting RA, Turco MY, VentoTormo M, Meyer KB, Park JE, Stephenson E, Polanski K, Goncalves A, Gardner L, Holmqvist S, Henriksson J, Zou A, Sharkey AM, Millar B, Innes B, Wood L, WilbreyClark A, Payne RP, Ivarsson MA, Lisgo S, Filby A, Rowitch DH, Bulmer JN, Wright GJ, Stubbington MJT, Haniffa M, Moffett A, Teichmann SA. Singlecell reconstruction of the early maternalfetal interface in humans. Nature. 2018;563(7731):347–53.
Biton M, Haber AL, Rogel N, Burgin G, Beyaz S, Schnell A, Ashenberg O, Su CW, Smillie C, Shekhar K, Chen Z, Wu C, OrdovasMontanes J, Alvarez D, Herbst RH, Zhang M, Tirosh I, Dionne D, Nguyen LT, Xifaras ME, Shalek AK, von Andrian UH, Graham DB, RozenblattRosen O, Shi HN, Kuchroo V, Yilmaz OH, Regev A, Xavier RJ. T helper cell cytokines modulate intestinal stem cell renewal and differentiation. Cell. 2018;175(5):1307–20.
Kumar MP, Du J, Lagoudas G, Jiao Y, Sawyer A, Drummond DC, Lauffenburger DA, Raue A. Analysis of singlecell RNAseq identifies cell–cell communication associated with tumor characteristics. Cell Rep. 2018;25(6):1458–68.
Verma M, Asakura Y, Murakonda BSR, Pengo T, Latroche C, Chazaud B, McLoon LK, Asakura A. Muscle satellite cell crosstalk with a vascular niche maintains quiescence via VEGF and notch signaling. Cell Stem Cell. 2018;23(4):530–43.
...JerbyArnon L, Shah P, Cuoco MS, Rodman C, Su MJ, Melms JC, Leeson R, Kanodia A, Mei S, Lin JR, Wang S, Rabasha B, Liu D, Zhang G, Margolais C, Ashenberg O, Ott PA, Buchbinder EI, Haq R, Hodi FS, Boland GM, Sullivan RJ, Frederick DT, Miao B, Moll T, Flaherty KT, Herlyn M, Jenkins RW, Thummalapalli R, Kowalczyk MS, Canadas I, Schilling B, Cartwright ANR, Luoma AM, Malu S, Hwu P, Bernatchez C, Forget MA, Barbie DA, Shalek AK, Tirosh I, Sorger PK, Wucherpfennig K, Van Allen EM, Schadendorf D, Johnson BE, Rotem A, RozenblattRosen O, Garraway LA, Yoon CH, Izar B, Regev A. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell. 2018;175(4):984–97.
Kelleher AM, MilanoFoster J, Behura SK, Spencer TE. Uterine glands coordinate ontime embryo implantation and impact endometrial decidualization for pregnancy success. Nat Commun. 2018;9(1):2435.
...Yin J, Li Z, Yan C, Fang E, Wang T, Zhou H, Luo W, Zhou Q, Zhang J, Hu J, Jin H, Wang L, Zhao X, Li J, Qi X, Zhou W, Huang C, He C, Yang H, Kristiansen K, Hou Y, Zhu S, Zhou D, Wang L, Dean M, Wu K, Hu H, Li G. Comprehensive analysis of immune evasion in breast cancer by singlecell RNAseq. bioRxiv. 2018. https://doi.org/10.1101/368605.
Biase FH, Kimble KM. Functional signaling and gene regulatory networks between the oocyte and the surrounding cumulus cells. BMC Genom. 2018;19(1):351.
...Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, PortaPardo E, Gao GF, Plaisier CL, Eddy JA, Ziv E, Culhane AC, Paull EO, Sivakumar IKA, Gentles AJ, Malhotra R, Farshidfar F, Colaprico A, Parker JS, Mose LE, Vo NS, Liu J, Liu Y, Rader J, Dhankani V, Reynolds SM, Bowlby R, Califano A, Cherniack AD, Anastassiou D, Bedognetti D, Rao A, Chen K, Krasnitz A, Hu H, Malta TM, Noushmehr H, Pedamallu CS, Bullman S, Ojesina AI, Lamb A, Zhou W, Shen H, Choueiri TK, Weinstein JN, Guinney J, Saltz J, Holt RA, Rabkin CE, Network CGAR, Lazar AJ, Serody JS, Demicco EG, Disis ML, Vincent BG, Shmulevich L. The immune landscape of cancer. Immunity. 2018;48(4):812–30.
Han X, Chen H, Huang D, Chen H, Fei L, Cheng C, Huang H, Yuan GC, Guo G. Mapping human pluripotent stem cell differentiation pathways using high throughput singlecell RNAsequencing open access. BMC Genome Biol. 2018;19(1):47.
Costa A, Kieffer Y, ScholerDahirel A, Pelon F, Bourachot B, Cardon M, Sirven P, Magagna I, Fuhrmann L, Bernard C, Bonneau C, Kondratova M, Kuperstein I, Zinovyev A, Givel AM, Parrini MC, Soumelis V, VincentSalomon A, MechtaGrigoriou F. Fibroblast heterogeneity and immunosuppressive environment in human breast cancer. Cancer Cells. 2018;33(3):463–79.
Hrvatin S, Hochbaum DR, Nagy MA, Cicconet M, Robertson K, Cheadle L, Zilionis R, Ratner A, BorgesMonroy R, Klein AM, Sabatini BL, Greenberg ME. Singlecell analysis of experiencedependent transcriptomic states in the mouse visual cortex. Nat Neurosci. 2018;21(1):120–9.
Suryawanshi H, Morozov P, Straus A, Sahasrabudhe N, Max KEA, Garzia A, Kustagi M, Tuschl T, Williams Z. A singlecell survey of the human firsttrimester placenta and decidua. Sci Adv. 2018;4(10):4788.
Puram SV, Tirosh I, Parikh AS, Patel AP, Yizhak K, Gillespie S, Rodman C, Luo CL, Mroz EA, Emerick KS, Deschler DG, Varvares MA, Mylvaganam R, RozenblattRosen O, Rocco JW, Faquin WC, Lin DT, Regev A, Bernstein BE. Singlecell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell. 2017;171(7):1611–24.
Ximerakis M, Lipnick SL, Simmons SK, Adiconis X, Innes BT, Dionne D, Nguyen L, Mayweather BA, Ozek C, Niziolek Z, Butty VL, Isserlin R, Buchanan SM, Levine SR, Regev A, Bader GD, Levin JZ, Rubin LL. Singlecell transcriptomics of the aged mouse brain reveals convergent, divergent and unique aging signatures. bioRxiv. 2018. https://doi.org/10.1101/440032.
Sivakamasundari V, Bolisetty M, Sivajothi S, Bessonett S, Ruan D, Robson P. Comprehensive cell type specific transcriptomics of the kidney. bioRxiv. 2017. https://doi.org/10.1101/238063.
Peters AL, Luo Z, Li J, Mourya R, Wang Y, Dexheimer P, Shivakumar P, Aronow B, Bezerra JA. Single cell RNA sequencing reveals regional heterogeneity of hepatobiliary innate lymphoid cells in a tissueenriched fashion. PLoS ONE. 2019;14(4):e0215481.
Puram SV, Tirosh I, Parikh AS, Patel AP, Yizhak K, Gillespie S, Rodman C, Luo CL, Mroz EA, Emerick KS, Deschler DG, Varvares MA, Mylvaganam R, RozenblattRosen O, Rocco JW, Faquin WC, Lin DT, Regev A, Bernstein BE. Singlecell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell. 2017;171(7):1611–24.
Boisset JC, Vivie J, Murano MJ, Lyubimova A, van Oudenaarden A. Mapping the physical network of cellular interactions. Nat Methods. 2018;15:547–53.
...Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, Gregory MT, Shuga J, Montesclaros L, Underwood JG, Masquelier DA, Nishimura SY, SchnallLevin M, Wyatt PW, Hindson CM, Bharadwaj R, Wong A, Ness KD, Beppu LW, Deeg HJ, McFarland C, Loeb KR, Valente WJ, Ericson NG, Stevens EA, Radich JP, Mikkelsen TS, Hindson BJ, Bielas JH. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049.
van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, Burdziak C, Moon KR, Chaffer CL, Pattabiraman D, Bierie B, Mazutis L, Wolf G, Krishnaswamy S, Pe’er D. Recovering gene interactions from singlecell data using data diffusion. Cell. 2018;174(3):716–629.
Shekhar K, Lapan SW, Whitney IE, Tran NM, Macosko EZ, Kowalczyk M, Adiconis X, Levin JZ, Nemesh J, Goldman M, McCarroll SA, Cepko CL, Regev A, Sanes JR. Comprehensive classification of retinal bipolar neurons by singlecell transcriptomics. Cell. 2016;166(5):1308–23.
Kim YD, Choi S. Nonnegative tucker decomposition. In: IEEE conference on computer vision and pattern recognition (2007)
Cichocki A, Zdunek R, Phan AH, Amari S. Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation. New Jersey: Wiley; 2009.
Dries R, Zhu Q, Dong R, Eng CHL, Li H, Liu K, Fu Y, Zhao T, Sarkar A, Bao F, George RE, Pierson N, Cai L, Yuan GC. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. BMC Genome Biol. 2021;22(78):1–31. https://doi.org/10.1186/s13059021022862.
Nagai JS, Leimkühler NB, Schaub MT, Schneider RK, Costa IG. Crosstalker: analysis and visualization of ligandreceptor networks. Bioinformatics. 2021;37(22):4263–5. https://doi.org/10.1093/bioinformatics/btab370.
Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, Rybakov S, Ibarra IL, Holmberg O, Virshup I, Lotfollahi M, Richter S, Theis FJ. Squidpy: a scalable framework for spatial single cell analysis. bioRxiv. 2021. https://doi.org/10.1101/2021.02.19.431994.
Hou R, Denisenko E, Ong HT, Ramilowski JA, Forrest ARR. Predicting celltocell communication networks using NATMI. Nat Commun. 2020;11:5011. https://doi.org/10.1038/s4146702018873z.
Jung S, Singh K, Sol A. Funres: resolving tissuespecific functional cell states based on a cell–cell communication network model. Brief Bioinform. 2021;22(4):283. https://doi.org/10.1093/bib/bbaa283.
Noël F, MassenetRegad L, CarmiLevy I, Cappuccio A, Grandclaudon M, Trichot C, Kieffer Y, MechtaGrigoriou F, Soumelis V. Dissection of intercellular communication using the transcriptomebased framework ICELLNET. Nat Commun. 2021;12:1089. https://doi.org/10.1038/s4146702121244x.
Li D, Velazquez JJ, Ding J, Hislop J, Ebrahimkhani MR, BarJoseph Z. Inferring cell–cell interactions from pseudotime ordering of scRNASeq data. bioRxiv. 2021. https://doi.org/10.1101/2021.07.28.454054.
Halpern KB, Shenhav R, Massalha H, Toth B, Egozi A, Massasa EE, Medgalia C, David E, Giladi A, Moor AE, Porat Z, Amit I, Itzkovitz S. Pairedcell sequencing enables spatial gene expression mapping of liver endothelial cells. Nat Biotechnol. 2018;36(10):962–70. https://doi.org/10.1038/nbt.4231.
CabelloAguilar S, Alame M, KonSunTack F, Fau C, Lacroix M, Colinge J. Singlecellsignalr: inference of intercellular networks from singlecell transcriptomics. Nucleic Acids Res. 2020;48(10):55. https://doi.org/10.1093/nar/gkaa183.
Shao X, Liao J, Li C, Lu X, Cheng J, Fan X. Celltalkdb: a manually curated database of ligandreceptor interactions in humans and mice. Brief Bioinform. 2021;22(4):269. https://doi.org/10.1093/bib/bbaa269.
Armingol E, Baghdassarian HM, Martino C, PerezLopez A, Knight R, Lewis NE. Contextaware deconvolution of cell–cell communication with tensorcell2cell. Nat Commun. 2022;13:3665. https://doi.org/10.1038/s41467022313692.
Dimitrov D, Türei D, GarridoRodriguez M, Burmedi PL, Nagai JS, Boys C, Ramirez Flores RO, Kim H, Szalai B, Costa IG, Valdeolivas A, Dugourd A, SaezRodriguez J. Comparison of methods and resources for cell–cell communication inference from singlecell RNAseq data. Nat Commun. 2022;12(1):3224. https://doi.org/10.1038/s41467022307550.
Baghdassarian H, Dimitrov D, Armingol E, SaezRodriguez J, Lewis NE. Combining liana and tensorcell2cell to decipher cell–cell communication across multiple samples. bioRxiv. 2023. https://doi.org/10.1101/2023.04.28.538731.
Tsuyuzaki KT, Ishii M, Nikaido I. nnTensor: an R package for nonnegative matrix/tensor decomposition. JOSS. 2023;8:5015.
Durinck S, Spellman P, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRT. Nat Protocols. 2009;4:1184–91.
...Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, Haw R, Jassal B, Korninger F, May B, Milacic M, Roca CD, Rothfels K, Sevilla C, Shamovsky V, Shorser S, Varusai T, Viteri G, Weiser J, Wu G, Stein L, Hermjakob H, D’Eustachio P. The reactome pathway knowledgebase. Nucleic Acids Res. 2018;46(D1):649–55.
Tsuyuzaki K, Morota G, Ishii M, Nakazato T, Miyazaki S, Nikaido I. MeSH ORA framework: R/Bioconductor packages to support MeSH overrepresentation analysis. BMC Bioinform. 2015;16(45):1–17.
Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2007;23(2):257–8.
Yu G, He Q. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol BioSyst. 2016;12(12):477–9.
Yu G, Wang L, Yan G, He Q. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31(4):608–9.
Ono H, Ogasawara O, Okubo K, Bono H. Refex, a reference gene expression dataset as a web tool for the functional analysis of genes. Sci Data. 2017;4: 170105.
...Papatheodorou I, Fonseca NA, Keays M, Tang YA, Barrera E, Bazant W, Burke M, Fullgrabe A, Fuentes AM, George N, Huerta L, Koskinen S, Mohammed S, Geniza M, Preece J, Jaiswal P, Jarnuczak AF, Huber W, Stegle O, Vizcaino JA, Brazma A, Petryszak R. Expression atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 2018;46(Database issue):246–51.
...Papatheodorou I, Moreno P, Manning J, Fuentes AMP, George N, Fexova S, Fonseca NA, Füllgrabe A, Green M, Huang N, Huerta L, Iqbal H, Jianu M, Mohammed S, Zhao L, Jarnuczak AF, Jupp S, Marioni J, Meyer K, Petryszak R, Medina CAP, TalaveraLópez C, Teichmann S, Vizcaino JA, Brazma A. Expression atlas update: from tissues to single cells. Nucleic Acids Res. 2019;48:947.
Cao Y, Zhu J, Jia P, Zhao Z. scRNASeqDB: a database for RNASeq based gene expression profiles in human single cells. Genes (Basel). 2017;8(12):368.
Franzén O, Gan LM, Björkegren JLM. Panglaodb: a web server for exploration of mouse and human singlecell RNA sequencing data. Database 2019, 046
...Carithers LJ, Ardlie K, Barcus M, Branton PA, Britton A, Buia SA, Compton CC, DeLuca DS, PeterDemchok J, Gelfand ET, Guan P, Korzeniewski GE, Lockhart NC, Rabiner CA, Rao AK, Robinson KL, Roche NV, Sawyer SJ, Segre AV, Shive CE, Smith AM, Sobin LH, Undale AH, Valentino KM, Vaught J, Young TR, Moore HM, Consortium G. A novel approach to highquality postmortem tissue procurement: the GTEx project. Biopreservation Biobanking. 2015;13(5):311–9.
RIKEN: Collection  29 August 2017 The FANTOM5 project. https://www.nature.com/collections/jcxddjndxy
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA. The NIH roadmap epigenomics mapping consortium. Nat Biotechnol. 2010;28(10):1045–8.
Consotium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
...Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen JFP. Proteomics tissuebased map of the human proteome. Science. 2015;347(6220):1260419.
Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR. The connectivity map: using geneexpression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–35.
Wang Y, Tung HY, Smola A, Anandkumar A. Fast and guaranteed tensor decomposition via sketching. In: NIPS, vol 1, pp 991–999 (2015)
Maehara, T., Hayashi, K., Kawarabayashi, K.: Expected tensor decomposition with stochastic gradient descent. In: AAAI’16, pp 1919–1925 (2016)
Smith, S., Park, J., Karypis, G.: An exploration of optimization algorithms for high performance tensor completion. In: SC ’16 proceedings of the international conference for high performance computing, networking, storage and analysis, vol 31 (2016)
Shin K, Sael L, Kang U. Fully scalable methods for distributed tensor factorization. IEEE Trans Knowl Data Eng. 2017;29(1):100–13.
Tsuyuzaki K. Guidelines to handle largescale and complex tensor data in r. BioC Asia 2021 (2021). https://doi.org/10.7490/f1000research.1118833.1
DelayedArray: A Unified framework for working transparently with ondisk and inmemory arraylike datasets. R Package Version 0.24.0. https://bioconductor.org/packages/DelayedArray
Tsuyuzaki, K., Nikaido, I.: Biological systems as heterogeneous information networks: a minireview and perspectives. HeteroNAM’18 (2018)
Baccin C, AlSabah J, Velten L, Helbling PM, Grünschläger F, HernándezMalmierca P, NombelaArrieta C, Steinmetz LM, Trumpp A, Haas S. Combined singlecell and spatial transcriptomics reveal the molecular, cellular and spatial bone marrow niche organization. Nat Cell Biol. 2020;22(1):38–48. https://doi.org/10.1038/s4155601904396.
Lagger C, Ursu E, Equey A, Avelar RA, Pisco AO, Tacutu R, Magalhães JP. scAgeCom: a murine atlas of agerelated changes in intercellular communication inferred with the package scDiffCom. bioRxiv. 2021. https://doi.org/10.1101/2021.08.13.456238.
Yuan Y, Cosme C, Adams TS, Schupp J, Sakamoto K, Xylourgidis N, Ruffalo M, Kaminski N, BarJoseph Z. Cins: cell interaction network inference from single cell expression data. bioRxiv. 2021. https://doi.org/10.1101/2021.02.22.432206.
Jin S, GuerreroJuarez CF, Zhang L, Chang I, Ramos R, Kuan CH, Myung P, Plikus MV, Nie Q. Inference and analysis of cell–cell communication using cellchat. Nat Commun. 2021;12:1088. https://doi.org/10.1038/s41467021212469.
Raredon MSB, Yang J, Garritano J, Wang M, Kushnir D, Schupp JC, Adams TS, Greaney AM, Leiby KL, Kaminski N, Kluger Y, Levchenko A, Le N. Connectome: computation and visualization of cell–cell signaling topologies in singlecell systems data. Sci Rep. 2022;12(1):4187. https://doi.org/10.1038/s4159802207959x.
Solovey M, Scialdone A. Comunet: a tool to explore and visualize intercellular communication. Bioinformatics. 2020;36(15):4296–300. https://doi.org/10.1093/bioinformatics/btaa482.
Cillo AR, Kürten CHL, Tabib T, Qi Z, Onkar S, Wang T, Liu A, Duvvuri U, Kim S, Soose RJ, Oesterreich S, Chen W, Lafyatis R, Bruno TC, Ferris RL, Vignali DAA. Immune landscape of viral and carcinogendriven head and neck cancer. Immunity. 2020;52(1):183–1999. https://doi.org/10.1016/j.immuni.2019.11.014.
Yuanxin W, Ruiping W, Shaojun Z, Shumei S, Changying J, Guangchun H, Michael W, Jaffer A, Andy F, Wang L. iTALK: an R package to characterize and illustrate intercellular communication. Nature. 2019. https://doi.org/10.1101/507871.
Hu Y, Peng T, Gao L, Tan K. CytoTalk: De novo construction of signal transduction networks using singlecell transcriptomic data. Sci Adv. 2021;7(16):1356. https://doi.org/10.1126/sciadv.abf1356.
Browaeys R, Saelens W, Sayes Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Method. 2019;17(2):159–62. https://doi.org/10.1038/s4159201906675.
Zhang Y, Liu T, Hu X, Wang M, Wang J, Zou B, Tan P, Cui T, Dou Y, Ning L, Huang Y, Rao S, Wang D, Zhao X. Cellcall: integrating paired ligand–receptor and transcription factor activities for cell–cell communication. Nucleic Acids Res. 2021;49(15):8520–34. https://doi.org/10.1093/nar/gkab638.
Yu A, Li Y, Li I, Yeh C, Chiou AE, Ozawa MG, Taylor J, Plevritis SK. Reconstructing codependent cellular crosstalk in lung adenocarcinoma using REMI. bioRxiv. 2021. https://doi.org/10.1101/2021.05.02.440071.
Pham D, Tan X, Xu J, Grice LF, Lam PY, Raghubar A, Vukovic J, Ruitenberg MJ, Nguyen Q. stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell–cell interactions and spatial trajectories within undissociated tissues. bioRxiv. 2021. https://doi.org/10.1101/2020.05.31.125658.
Wang S, MacLean A, Nie Q. SoptSC: similarity matrix optimization for clustering, lineage, and signaling inference. bioRxiv. 2018. https://doi.org/10.1101/168922.
Liu Z, Sun D, Wang C. Evaluation of cell–cell interaction methods by integrating singlecell RNA sequencing data with spatial information. BMC Genome Biol. 2022;23(1):218. https://doi.org/10.1186/s1305902202783y.
Yokota T, Zhao Q, Cichocki A. Smooth PARAFAC decomposition for tensor completion. IEEE Trans Signal Process. 2016;64(20):5423–36. https://doi.org/10.1109/TSP.2016.2586759.
Ge H, Caverlee J, Zhang N, Squicciarini A. Uncovering the spatiotemporal dynamics of memes in the presence of incomplete information. In: Proceedings of CIKM’16, pp 1493–1502 (2016). https://doi.org/10.1145/2983323.2983782
Park, J.Y., Carr, K.T., Zheng, S., Yue, Y., Yu, R.: Multiresolution tensor learning for efficient and interpretable spatial analysis. In: Proceedings of ICML2020 695, pp 7499–7509 (2020)
Sasagawa Y, Nikaido I, Hayashi T, Danno H, Uno KD, Imai T, Ueda HR. QuartzSeq: a highly reproducible and sensitive singlecell RNA sequencing method, reveals nongenetic geneexpression heterogeneity. BMC Genome Biol. 2013;14(4):31.
Pierson E, Yau C. Zifa: dimensionality reduction for zeroinflated singlecell gene expression analysis. BMC Genome Biol. 2015;16(241):1–10.
Graeber TG, Eisenberg D. Bioinformatic identification of potential autocrine signaling loops in cancers from gene expression profiles. Nat Genet. 2001;29(3):295–300.
Harding SD, Sharman JL, Faccenda E, Southan C, Pawson AJ, Ireland S, Gray AJG, Bruce L, Alexander SPH, Anderton S, Bryant C, Davenport AP, Doerig C, Fabbro D, LeviSchaffer F, Spedding M, Davies NI. The IUPHAR/BPS guide to pharmacology in 2018: updates and expansion to encompass the new guide to immunopharmacology. Nucleic Acids Res. 2018;46(D1):1091–106.
BenShlomo I, Yu Hsu S, Rauch R, Kowalski HW, Hsueh AJ. Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction. Sci Signal. 2003;2003(187):9.
Hutchins LN, Murphy SM, Singh P, Graber JH. Positiondependent motif characterization using nonnegative matrix factorization. Bioinformatics. 2008;24(23):2684–90. https://doi.org/10.1093/bioinformatics/btn526.
Chicco D. Ten quick tips for machine learning in computational biology. BMC BioData Min. 2017;10:35. https://doi.org/10.1186/s1304001701553.
Saito T, Rehmsmeier M. The precisionrecall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE. 2015;10(3):0118432. https://doi.org/10.1371/journal.pone.0118432.
Davis J, Goadrich M. The relationship between precision–recall and roc curves. In: Proceedings of ICML’06, pp 233–240 (2006). https://doi.org/10.1145/1143844.1143874
Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over f1 score and accuracy in binary classification evaluation. BMC Genom. 2020;21(6):1–13. https://doi.org/10.1186/s1286401964137.
Boughorbel S, Jarray F, ElAnbari M. Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS ONE. 2017;12(6):0177678. https://doi.org/10.1371/journal.pone.0177678.
Chicco D, Tötsch N, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in twoclass confusion matrix evaluation. BMC BioData Min. 2021;14(13):1–22. https://doi.org/10.1186/s1304002100244z.
Acknowledgements
Some cell images used in figures are presented by Â© 2016 DBCLS TogoTV. We thank Mr. Akihiro Matsushima for their assistance with the IT infrastructure for the data analysis. We are also grateful to all members of the Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research for their helpful advice.
Funding
This work was supported by MEXT KAKENHI Grant Number 16K16152 to KT. This work was partially supported by JST CREST Grant Numbers JPMJCR16G3 and JPMJCR1926 to IN.
Author information
Authors and Affiliations
Contributions
KT and IN designed the study. KT designed the algorithm and benchmark test. KT retrieved and preprocessed the test data to evaluate the proposed method, implemented the source code, and performed all analyses. MI implemented the pipeline for biannual automatic updates of the R/Bioconductor packages. All authors have read and approved the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
. List of existing L–R scoring and CCI detection methods.
Additional file 2
. Ground truth CCIs in simulated datasets.
Additional file 3
. AUCROC values of all methods.
Additional file 4
. AUCPR values of all methods.
Additional file 5
. Memory values of all methods.
Additional file 6
. Computational time values of all methods.
Additional file 7
. Fmeasure values of all binarization methods.
Additional file 8
. MCC values of all binarization methods.
Additional file 9
. PR values of all binarization methods.
Additional file 10
. FPR values of all binarization methods.
Additional file 11
. FNR values of all binarization methods.
Additional file 12
. TR of all datasets.
Additional file 13
. Three LR pairs in which each of the three methods excelled.
Additional file 14
. HTML report of FetalKidney.
Additional file 15
. HTML report of GermlineFemale.
Additional file 16
. HTML report of HeadandNeckCancer.
Additional file 17
. HTML report of Uterus.
Additional file 18
. HTML report of VisualCortex.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Tsuyuzaki, K., Ishii, M. & Nikaido, I. Sctensor detects manytomany cell–cell interactions from single cell RNAsequencing data. BMC Bioinformatics 24, 420 (2023). https://doi.org/10.1186/s1285902305490y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902305490y