 Research
 Open Access
 Published:
Kernelized multiview signed graph learning for singlecell RNA sequencing data
BMC Bioinformatics volume 24, Article number: 127 (2023)
Abstract
Background
Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and sparsity of the single cell datasets render void the application of regular Gaussian assumptions for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire data. This could cause potential loss of information when single cell datasets are generated from multiple treatment conditions/disease states.
Results
To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method is based on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL with the kernel selected to suit the structure of single cell data.
Conclusions
scMSGL is shown to have superior performance over existing state of the art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies wellestablished regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
Background
Gene expression arises from a network of regulatory interactions between transcription factors, cofactors and signaling molecules [1, 2]. Elucidating the topology of this underlying transcriptomic network is essential for understanding the mechanisms that govern complex biological processes in human physiology and pathology. Identifying the differences in transcriptional regulation between normal and disease states helps in revealing the specific biological and biochemical pathways relevant to disease mechanisms and progression [3, 4].
A major focus area in clinical research lies in studying the changes in gene coexpression networks across different tissues, cell types/states, and conditions. For example, in the extensively studied breast cancer datasets from the cancer genome atlas, there are four main subtypes of breast cancer [5]. The variation between these subtypes holds the key to inferring how genes transcriptionally regulate each other and how their expressions and interactions change across subgroups. In addition one would expect the gene relationships corresponding to different subtypes to be similar to each other since they originate in the same tissue, but also posses crucial differences since they are in different stages of disease progression [6,7,8]. Thus, instead of estimating a single network for all the subtypes, constructing classspecific graphical models for different conditions will provide a more robust and deeper understanding of groupspecific characteristics.
Recent advances in next generation sequencing technologies have made it possible to profile the transcriptomes of individual cells, hence capturing expressions of thousands of genes at a cellular resolution. Dozens of algorithms have been proposed for the reconstruction of gene regulatory networks from single cell RNA sequencing (scRNAseq) datasets [9, 10]. This has further enabled novel insights into the transcriptional regulation underlying various biological processes, including cancer progression [11] and embryonic development [12]. Most of these algorithms, however estimate a single gene regulatory network, assuming the data samples to be identically and independently distributed; hence ignoring the presence of natural subgroups that may be present within the data. Given the assumption of a grouped dataset, one should be able to apply these algorithms to estimate networks from each subgroup separately; but this procedure of independent groupwise network estimation will fail to model the shared structures between the subgroups, eventually leading to information loss. Therefore, there is a pressing need to develop joint graph estimation models that would allow information borrowing across subgroups while retaining subgroup specific heterogenity.
Multiple algorithms have been proposed for joint estimation of networks from high dimensional data. Most of these methods assume that the data has a Gaussian distribution. Seminal papers by [6, 13] paved the way for penalized estimation of multiple Gaussian graphical models, and demonstrated the use of lasso based penalty functions for better estimation across multiple groups. Later, Danaher et al. [6] proposed the fused graphical lasso and the group graphical lasso penalties for better estimation. These methods however are not directly applicable to single cell datasets. Despite many advantages, scRNAseq datasets are undermined by a series of technical limitations, such as dropout events (expressed genes undetected by scRNAseq) and a high level of noise, which renders void the assumption of gaussianity [14,15,16]. Few methods have been proposed for joint estimation of multiple networks from scRNAseq datasets. Mukherjee et al. [17] developed PIPER, a penalized local Poisson graphical model [18] for joint estimation of multiple networks in scRNAseq datasets. One of the main limitations of PIPER is that the Poisson distribution has one single parameter characterizing both the mean and the standard deviation. Single cell datasets would be better characterized by a negative binomial distribution which has a separate dispersion parameter or a zero inflated negative binomial distribution which could account for the excessive zeroes. To account for the nonGaussian nature of the scRNAseq datasets, Wu et al. [19] proposed a modification of the joint Gaussian copula graphical model based on the Gaussian copula transformation proposed in [20]. To facilitate estimation of Kendall’s \(\tau\) correlation matrix in the presence of dropouts they propose a modified Kendall’s \(\tau\) metric that only utilizes the completely observed values, and excludes the missing values. Dong et al. [21] proposed a three step hybrid joint estimation strategy that relies on (a) integrated application of a Bayesian zero inflated Poisson based model imputation strategy and single cell imputation technique McImpute [22, 23], (b) data Gaussianization [24] and eventually (c) joint estimation of a Gaussian graphical model [6]. Contrary to [17], the last two proposed approaches estimate graphical models for continuous data and rely on a data transformation step for making the data continuous.
Recent work in graph signal processing (GSP) extends classical signal processing concepts to data defined on nodes of a graph, i.e. graph signals [25]. GSP based graph learning (GL) approaches infer the graph structure from the observed graph signals based on assumptions made about the relation between the signals and the unknown graph [26]. Since graph signals are represented explicitly in the graph frequency domain, GSP based GL has more flexibility in modeling signals compared to previous network inference methods, such as statistical models reviewed above for GRN inference. Therefore, in this work, we focus on GSP based GL for the joint inference of multiple GRNs, where gene expressions from cells are considered as graph signals on the unknown GRNs. Existing GL algorithms [27,28,29,30] have two important shortcomings for multiple GRN learning. First, they cannot learn signed graphs, which is a more suitable model for GRNs as they include activating and inhibitory edges. Second, with the exception of [30], they can only learn a single graph. Thus, they are not applicable to the joint inference of multiple GRNs problem.
In this paper, we present a multiple signed graph learning algorithm (scMSGL) for joint inference of GRNs from multiple classes (conditions/disease states). Based on the method developed in [31], scMSGL learns multiple GRNs by deriving an optimization problem using three assumptions: (i) expressions of genes connected with activating edges are similar to each other, (ii) expressions of genes connected with inhibitory edges are dissimilar to each other, and (iii) GRNs corresponding to the different datasets are related to each other. Thus, scMSGL optimizes the total variation of graph signals to learn signed graphs while ensuring that the learned signed graphs are similar to each other through regularization with respect to a learned signed consensus graph. The proposed method has several advantages over existing approaches. First, it performs joint GRN inference taking advantage of the shared information across datasets while not making any specific parametric assumptions about the data. Second, during application to single cell data, scMSGL is kernelized as in [31] to take the structure of scRNAseq data into account. For instance, it can employ proportionality measures to reflect relative rather than absolute abundance or zeroinflated Kendall’s tau to handle dropouts [32]. Finally, the proposed method learns an additional consensus graph, which captures the common structure across all graphs.
Methods
Graphs
A weighted undirected graph can be denoted as \(G=(V, E, {\textbf {W}})\) where V is the node set with \(V=n\), E is the edge set and \({\textbf {W}}\) is the adjacency matrix with \(W_{ij}\) the weight of the edge between nodes i and j. G is an unsigned graph, if edge weights are constrained to only positive values. Combinatorial Laplacian matrix of the unsigned graph G is \({\textbf {L}}= {\textbf {D}} {\textbf {W}}\) where \({\textbf {D}}\) is the diagonal matrix with node degrees, i.e. \(D_{ii} = \sum _{j=1}^n W_{ij}\). The Laplacian matrix is positive semidefinite, thus its eigendecomposition is \({\textbf {L}}= {\textbf {V}}\varvec{\Lambda }{\textbf {V}}^\top\) where \(\varvec{\Lambda }\) is the diagonal matrix of eigenvalues. We assume the eigenvalues of \({\textbf {L}}\) are ordered such that \(0 = \Lambda _{11} \le \Lambda _{22} \le \dots \le \Lambda _{nn}\).
If the edge weights are allowed to take on negative values, G is a signed graph. A signed graph G can be decomposed into two unsigned graphs, \({G}^{{+}} = (V, {E}^{{+}}, {{\textbf {W}}}^{{+}})\) and \({G}^{{}} = (V, {E}^{{}}, {{\textbf {W}}}^{{}})\), where \({W}^{{+}}_{ij} = W_{ij}\) (\({W}^{{}}_{ij} = W_{ij}\)) if \(W_{ij} > 0\) (\(W_{ij} < 0\)), and 0, otherwise.
Graph signals
A graph signal over an unsigned graph G is a function \(x: V \rightarrow \mathbb {R}\) and can be represented by a vector \({\textbf {x}}\in \mathbb {R}^n\) where each \(x_i\) is the signal value on node i. Graph Fourier transform (GFT) of \({\textbf {x}}\) can be defined using the spectrum of \({\textbf {L}}\) as its eigenvalues and eigenvectors provide a notion of frequency, i.e., small eigenvalues correspond to low frequencies and large ones to high frequencies [25, 33]. GFT of \({\textbf {x}}\) is defined as \(\widehat{{\textbf {x}}} = {\textbf {V}}^\top {\textbf {x}}\) and inverse GFT is \({\textbf {x}}= {\textbf {V}}\widehat{{\textbf {x}}}\). Thus, \({\textbf {x}}\) is the linear combination of eigenvectors of \({\textbf {L}}\) with the coefficients determined by \(\widehat{{\textbf {x}}}\). If most of the energy of \(\widehat{{\textbf {x}}}\) is concentrated in the entries corresponding to the small eigenvalues, \({\textbf {x}}\) has a lowfrequency representation in the graph Fourier domain. On the other hand, if its energy is concentrated in the entries corresponding to large eigenvalues, it has a highfrequency representation. The total variation of \({\textbf {x}}\) with respect to G can then be quantified as:
whose small(large) values indicate that \({\textbf {x}}\) has low(high) frequency representation on G.
For a graph signal \({\textbf {x}}\) defined on a signed graph G, total variation can be quantified based on the decomposition of G into \({G}^{{+}}\) and \({G}^{{}}\). Namely, let \({{\textbf {L}}}^{{+}}\) and \({{\textbf {L}}}^{{}}\) be the Laplacian matrices of \({G}^{{+}}\) and \({G}^{{}}\), then we define two total variation values for \({\textbf {x}}\): \(\textrm{tr}({\textbf {x}}^\top {{\textbf {L}}}^{{+}} {\textbf {x}})\) and \(\textrm{tr}({\textbf {x}}^\top {{\textbf {L}}}^{{}} {\textbf {x}})\).
Single view signed graph learning
Given a set of graph signals \(\{{\textbf {x}}_i \in \mathbb {R}^n\}_{i=1}^p\) that are defined on an unknown unsigned graph G, Dong et al. [27] proposed to learn the structure of G with the assumption that signals admit lowfrequency representation in the graph Fourier domain of G. Thus, G can be learned by minimizing (1) with respect to \({\textbf {L}}\) as follows:
where \({\textbf {X}}\in \mathbb {R}^{n\times p}\) is the data matrix whose columns are \({\textbf {x}}_i\)’s, \(\mathbb {L}=\{{\textbf {L}}: L_{ij} = L_{ji} \le 0\ \forall i\ne j,\ {\textbf {L}}\varvec{1}= \varvec{0}\}\) is the set of valid Laplacian matrices. The first term in (2) measures the total variation of the graph signals. The second term is the Frobenius norm of \({\textbf {L}}\) and controls the density of the learned graph. Finally, the last constraint is added to prevent the trivial solution \({\textbf {L}}= \varvec{0}\).
In [31], (2) is extended to learn an unknown signed graph G based on the assumption that the graph signals admit (i) lowfrequency (smooth) representation over \({G}^{{+}}\), and (ii) highfrequency (nonsmooth) representation over \({G}^{{}}\). Smoothness and nonsmoothness of the graph signals with respect to signed graphs are defined as follows: (1) Signal values on nodes that are connected with positive edges are similar to each other; (2) Signal values on nodes that are connected with negative edges are dissimilar to each other. These assumptions imply that if genes i and j are connected by an activating edge, their gene expressions should be similar, i.e lowfrequency. On the other hand, if i and j are connected by an inhibitory edge, their expressions should be dissimilar, i.e., high frequency. These assumptions are biologically reasonable and have been validated in [31]. Based on these assumptions, the signed graph G is learned by minimizing \(\textrm{tr}({\textbf {X}}^\top {{\textbf {L}}}^{{+}} {\textbf {X}})\) with respect to \({{\textbf {L}}}^{{+}}\) and maximizing \(\textrm{tr}({\textbf {X}}^\top {{\textbf {L}}}^{{}} {\textbf {X}})\) with respect to \({{\textbf {L}}}^{{}}\). This leads to the following optimization problem:
where \({\textbf {K}}^+ = {\textbf {X}}{\textbf {X}}^\top\), \({\textbf {K}}^ = {\textbf {X}}{\textbf {X}}^\top\) and we used the cyclic property of trace operation, i.e. \(\textrm{tr}({\textbf {X}}^\top {\textbf {L}}{\textbf {X}}) = \textrm{tr}({\textbf {X}}{\textbf {X}}^\top {\textbf {L}})\). \({{\textbf {L}}}^{{+}}\) and \({{\textbf {L}}}^{{}}\) are constrained to be in the set \(\mathbb {C}=\{({{\textbf {L}}}^{{+}}, {{\textbf {L}}}^{{}}):L_{ij}^+=0 \text { if } L_{ij}^\ne 0 \text { and } L_{ij}^=0 \text { if } L_{ij}^+ \ne 0\}\) to ensure that they are not nonzero at the same indices.
The optimization problem in (3) can be kernelized to exploit various (nonlinear) relations between graph signals. Kernelization is important for GRN inference as it is unclear which association measure between gene expressions is best for various scRNAseq data analysis [32]. Therefore, (3) is kernelized by changing \({\textbf {X}}{\textbf {X}}^\top\) with any positive semidefinite kernel matrix. In [31], three kernels are considered: correlation coefficient (r), proportionality measure (\(\rho\)) [34] and zeroinflated Kendall’s tau (\(\tau _{zi}\)) [35]. These kernels are considered because r is a commonly used metric for network inference, \(\rho\) is found as the best performing measure in [32] and \(\tau _{zi}\) can handle the dropouts in scRNAseq.
Multiview signed graph learning
Let \(\{{\textbf {X}}^i\}_{i=1}^N\) be a given set consisting of N matrices. \({\textbf {X}}_i \in \mathbb {R}^{n \times p_i}\) is a data matrix constructed from \(p_i\) graph signals defined on an unknown signed graph \(G^i=(V, E^i, {\textbf {W}}^i)\) with \(V = n\). It is assumed that \(E^i\)’s and associated edge weights are different but similar to each other. Based on this assumption, when learning \(G^i\)’s, one can have better performance by borrowing information across graphs. For example, when analyzing scRNAseq expressions from different disease states/conditions, the datasets generated from the varying groups are generally assumed to share a common genecoexpression structure. Thus, jointly learning celltype specficic graphs can improve inference by allowing information sharing across celltypes. To this end, we propose an optimization problem (scMSGL) that learns \(G^i\)’s simultaneously. In the proposed approach, the learned \(G^i\)’s are regularized to be close to a consensus graph G, which is also learned by combining information from \(G^i\)’s. Thus, the proposed formulation ensures that information is shared across graphs when learning \(G^i\)’s. Furthermore, the structure of G reflects the common connections shared across \(G^i\)’s, whose inference may be beneficial if one is interested in learning the common genecoexpression structure over the different celltypes/diseasestage subgroups.
Let \({\textbf {L}}^{i, +}\) and \({\textbf {L}}^{i, }\) be the Laplacian matrices of the positive and negative parts of \(G^i\), respectively. Similarly, define \({{\textbf {L}}}^{{+}}\) and \({{\textbf {L}}}^{{}}\) for the consensus graph \({\textbf {G}}\). Let \({\mathcal {L}}^{{+}} = \{{\textbf {L}}^{1, +}, \dots , {\textbf {L}}^{N, +}, {\textbf {L}}^+\}\) and \({\mathcal {L}}^{{}} = \{{\textbf {L}}^{1, }, \dots , {\textbf {L}}^{N, }, {\textbf {L}}^\}\). The optimization problem for jointly learning \(G^i\)’s and G is then:
where \({\textbf {K}}^{i,+} = {\textbf {K}}^i\), \({\textbf {K}}^{i,} = {\textbf {K}}^i\), and \({\textbf {K}}^i\) is a kernel matrix constructed from \({\textbf {X}}^i\) as described in “Single view signed graph learning” section. \(\Vert \cdot \Vert _{F,off}\) and \(\Vert \cdot \Vert _{1,off}\) are the Frobenius norm and the \(\varvec{\ell }_1\)norm of the offdiagonal entries, respectively. The first term in the summation measures the smoothness and nonsmoothness of \({\textbf {X}}\) over \(G^{i, +}\) and \(G^{i, }\), respectively. The second term controls the density of the learned \(G^{i, +}\) (\(G^{i, }\)) such that for larger values of \(\alpha _+\) (\(\alpha _\)), we learn denser graphs. The third term ensures that \(G^{i, +}\) (\(G^{i, }\)) are close to the positive (negative) part of consensus graph G with \(\beta _+\) (\(\beta _\)) controlling how close they should be. The last term is a regularizer that controls the sparsity of positive and negative parts of G with larger values of \(\gamma _+\) and \(\gamma _\) resulting in a sparser consensus graph. Finally, the constraints are the same as in (3). Algorithm 1 gives the workflow of scMSGL to learn multiple graphs jointly. See Additional file 1 for an ADMM based optimization for (4).
Hyperparameter selection procedure
scMSGL requires the selection of six hyperparameters, three of which control the properties of the positive parts of the learned graphs while the remaining control the negative parts. As mentioned above, \(\alpha _+\) (\(\alpha _\)) and \(\gamma _+\) (\(\gamma _\)) control the edge density of positive (negative) parts of the learned \(G^i\)’s and G, respectively. \(\beta _+\) (\(\beta _\)) controls how similar the learned \(G^{i, +}\)’s (\(G^{i, }\)’s) are to the consensus graph. We select these hyperparameters similar to that suggested in [6], where hyperparameter selection is guided to learn graphs with desired properties. Alternative to other model selection approaches, such as crossvalidation or Bayesian information criterion, this approach can achieve a model that is interpretable and plausible in practice. Thus, we tune the hyperparameters such that the obtained graphs have a desired edge density and view similarity. In particular, assume that one wants the densities of positive and negative edges in the learned \(G^i\)’s and G to be \(d_+\) and \(d_\), respectively. Furthermore, assume that the pairwise similarity between \(G^{i,+}\) and \(G^{j,+}\), \(\forall i \ne j\) is desired to be \(c_+\), where the similarity is quantified by the correlation coefficient. Similarly, let \(c_\) be the desired similarity amount for the negative edges of the graphs. Once \(d_+\), \(d_\), \(c_+\), \(c_\) are fixed, we select the six hyperparameters accordingly. The values of \(d_+\), \(d_\), \(c_+\), and \(c_\) are selected based on prior knowledge on the datasets under study as detailed in “Results” section.
Results
The performance of scMSGL is evaluated on both simulated and two real scRNAseq datasets. For simulated data, learned graphs are compared to ground truth networks to quantify the performance of scMSGL. Signed version of area under precision recall curve (AUPRC) is used as the performance metric during this analysis. We report AUPRC ratio, which is the ratio of AUPRC value of scMSGL to that of a random predictor. More details on how AUPRC is calculated can be found in Additional file 1. Simulated data are used to benchmark the performance of scMSGL against scSGL and three GRN inference algorithms, GENIE3, GRNBOOST2 and PIDC, which are found to be the best performing algorithms for scRNAseq data [10]. These methods and scSGL can only learn a single graph from each dataset at a time. Therefore, they are applied to each \({\textbf {X}}^i\) separately and the learned graphs are compared to ground truth \(G^i\)’s. In addition, we benchmark against Joint Graphical Lasso with fused lasso penalty (JGLFused) method [6], which learns multiple related Gaussian graphical models, and Joint Gene Networks with scRNAseq data (JGNsc) [21] algorithm, which jointly learns the graphs for multiple classes of single cell data. Other single cell joint graph learning algorithms discussed in “Background” section [17, 19] have not been considered due to the absence of publicly/on request available code.
Selected hyperparameter values
Hyperparameters of scMSGL are set as described in “Methods” section with \(d_+ = d_ = d\) and \(c_+ = c_ = c\). We used the BEELINE [10] pipeline to run GENIE3, GRNBOOST2 and PIDC. GENIE3 and GRNBOOST2 employs random forest and gradient boosting regressors, respectively and hyperparameters of these regressors are set to the default values used in GENIE3 and GRNBOOST2 toolboxes. PIDC uses mutual information to learn gene regulations and it requires a discretizer and an estimator for probability distribution estimation. We used the discretizer and estimator recommended by PIDC toolbox. scSGL requires \(\alpha _+\) and \(\alpha _\), which are determined the same way as \(\alpha _s\)’s of scMSGL, i.e., they are set to values such that learned graphs have desired edge densities of \(d_+ = d_ = d\). JGLFused requires two parameters \(\lambda _1\) and \(\lambda _2\), which are analogous to the parameter of scMSGL, \(\alpha _s\) and \(\beta _s\), respectively. Therefore, they are set the same way, i.e. we choose \(\lambda _1\) and \(\lambda _2\) such that the learned graphs’ desired edge densities satisfy \(d_+ + d_ = 2d\)^{Footnote 1} and view similarity of \(c_+ = c_ = c\). Finally, JGNsc consists of three steps: imputation, Gaussian transformation and GRN inference with JFLFused method. The hyperparameters of the first two steps are set to the default values provided in JGNsc toolbox and \(\lambda _1\) and \(\lambda _2\) of JGLFused step are set as described above.^{Footnote 2} Exact hyperparameter values of all methods are provided in Table 1 of Additional file 1.
For all datasets, we use \(c=0.5\). For simulated data, since benchmarking GRN inference methods (GENIE3, GRNBOOST2 and PIDC) learn fully connected graphs, we set \(d=0.4\) for fair comparison. For real data, we set \(d=0.1\) for ease of analysis. See Additional file 1 for a discussion on the sensitivity of scMSGL to the selection of d and c.
Simulated data
Data generation: To validate the performance of scMSGL, we simulate gene expression data from a multivariate zeroinflated negative binomial (ZINB) distribution. The ZINB distribution has been shown to accurately capture the characteristics of single cell datasets in several published studies [36, 37]. Given a known graph structure, we generate synthetic datasets using an algorithm developed by [38] and illustrated in [22, 31]. Two graph structures are considered for creating the baseline graph G with \(n=100\) genes: random graphs following an ErdősRényi (ER) model with an edge density of 0.1 and hub graphs following a BarabásiAlbert (BA) model with a degree distribution that follows the powerlaw. We then convert G to a signed graph by randomly selecting half of the edges and assigning a negative sign to them while assigning a positive sign to the other half. Next, we generate \(N=5\) individual networks \(\{G_i\}_{i=1}^{N}\) by adding \(0.9 \times \left( {\begin{array}{c}n\\ 2\end{array}}\right) \times \eta\) new edges to the baseline graph G. Half of the added edges are set as negative edges, while the other half are set as positive. The ZINB simulator is then used to generate datasets \(\{X_i\}_{i=1}^{N}\) from the underlying graphs \(\{G_i\}_{i=1}^{N}\). The three parameters of the ZINB distribution; \(\lambda\), k and \(\omega\), which control its mean, dispersion and degree of zeroinflation, respectively were determined using a real scRNAseq dataset [39]. Each simulation is repeated 10 times and the average performance over 10 realizations is reported. More details for data generation process can be found in Additional file 1.
Sensitivity to the number of cells: We first study the performance of the methods with varying number of cells when the dropout ratio is set to 0.26, \(\eta =0.1\), i.e. \(90\%\) of the edges are common across views and the correlation kernel is used for both scSGL and scMSGL. From left panel of Fig. 1, it can be seen that for the different cell numbers, scMSGL has higher AUPRC ratios than methods that learn from a single dataset. This indicates that the proposed method incorporates valuable information across views, which improves the performance. scMSGL also performs better than both joint graph learning methods JGLFused and JGNsc. Although JGNsc is based on JGLFused, its performance is worse than JGLFused. This could be due to a change in the data structure owing to multiple imputation and data transformation steps, which form a part of the JGNsc algorithm. As expected, the performance of all methods improves with increasing number of cells. These observations hold for both random graph models.
Sensitivity to dropout ratio: In the second analysis, we evaluate the performance of the different methods with increasing dropout ratio while fixing the number of cells to 400 and \(\eta =0.1\). Results are shown in the middle panel of Fig. 1 for both random graph models, with correlation kernel used for scSGL and scMSGL. Similar to cell sensitivity analysis, scMSGL performs better compared to all other methods irrespective of which graph model is used to generate the datasets. Except for PIDC, AUPRC ratios of all methods drop with increasing dropout ratio as expected. Performance of PIDC mostly remains the same. Since PIDC performs poorly at all dropout levels, this result does not imply robustness against dropouts.
Sensitivity to view similarity: Next, we study the effect of view similarity on the performance of algorithms. Datasets are generated with varying \(\eta\) values while fixing the number of cells to 400 and the dropout ratio to 0.26. Results are reported in right panel of Fig. 1, where the correlation kernel is employed for scSGL and scMSGL. When view similarity is 90%, the best performing algorithm is scMSGL, while for lower view similarity values JGLFused performs slightly better than scMSGL. The reason that JGLFused performs better than scMSGL for smaller view similarity values could be due to the difference in the regularization terms used to impose similarity across views. JGLFused uses a \(\ell _1\) norm penalty, while we employ a squared Frobenius norm. Compared to fused lasso, squared Frobenius norm is susceptible to outliers, which can degrade the performance. The performance of singleview algorithms does not get affected by the changes in view similarity, as they learn each view independently. On the other hand, there is a drop in the performances of all joint graph learning methods with decreasing view similarity. This is an expected behaviour, since both methods assume the dependence of views.
Kernel comparison: Formulation of scMSGL allows us to use various kernels. Therefore, we study how the performance changes with respect to the kernel type. Datasets are created using the BA model and results are shown in Fig. 2 for varying number of cells, dropout ratios and view similarities. The best performing kernel is \(\tau _{zi}\), followed by the correlation kernel. When Figs. 1 and 2 are compared, scMSGL has higher AUPRC ratios than singleview approaches and JGNsc irrespective of the kernel choice. The change in the performance of \(\tau _{zi}\) and \(\rho\) with varying cell numbers, dropout ratios and view similarity are very similar to that of the correlation. Finally, to better understand the effect of kernels, the performance of scMSGL without any kernels, i.e. \({\textbf {K}}^i = {\textbf {X}}^i{{\textbf {X}}^i}^\top\), is also reported. Figure 2 shows all kernels have significantly higher performance compared to when no kernel is used, which indicates the importance of kernel usage in GRN inference.
Time complexity comparison: We compare the different methods based on their run time complexity. We generated datasets using BA model with varying number of cells and number of genes. Table 1 reports the run time of scSGL, scMSGL, JGLFused and JGNsc in seconds. Run times of GENIE3, GRNBOOST2 and PIDC are not reported as they are shown to have higher time complexity than scSGL in [31]. Reported run times correspond to one run without hyperparameter search. Run time of scSGL is the total run time to infer all views.
In the first dataset, number of genes, dropout ratio and \(\eta\) are fixed to 100, 0.26, and 0.1, respectively and number of cells varies. Results for this dataset indicate that scMSGL is faster than joint graph learning methods JGLFused and JGNsc. JGLFused also uses an ADMM based optimization, however it needs singular value decomposition at each ADMM iteration. scMSGL does not need this expensive operation; thus, it runs much faster than JGLFused. scSGL is faster than scMSGL, which is expected as scMSGL optimization takes longer time to converge due to added regularization terms and consensus graph learning. Finally, all methods except JGNsc are observed to run faster with increasing number of cells, since the inference problem becomes easier with higher number of cells, which makes iterative optimization procedure used by all methods converge faster. JGNsc runs slower with increasing number of cells, as its imputation step needs to handle a larger data matrix.
In the second dataset with increasing number of genes, the number of cells, dropout ration and \(\eta\) are fixed to 500, 0.26, and \(\eta =0.1\), respectively. As before, scMSGL is faster than joint graph learning methods and is slower than scSGL. Increasing the number of genes is observed to increase run time complexity of all methods, as it makes the problem harder.
Analysis of scRNAseq data from mouse embryonic stem cell differentiation
Central to the differentiation process and many other cellular processes is the expression of right combination of genes or modules of genes. Accurate characterization of the coexpression networks for progenitor and multiple cell types can help in understanding the cascade of cellular state transitions [12]. In this section, we study the differentiation process of mouse embryonic stem cells (mESC) using single cell RNA sequencing datasets [40]. This data was generated using highthroughput dropletmicrofluidic approach and was primarily used to study differentiation in mESC before and after leukemia inhibitory factor (LIF) withdrawal. Since LIF maintains pluripotency of mESC, LIF withdrawal is considered to initiate the differentiation process. The dataset contains cells sampled from 4 states (or natural subgroups): before LIF withdrawal, day 0 and after the withdrawal for days 2, 4 and 7. The subgroups contain 933, 303, 683 and 798 cells, respectively. This dataset has been previously analyzed using joint graphical estimation in [17, 19] and similar to them we only consider the 72 stem cell markers in our application [41].^{Footnote 3}
We first estimated the subgroup specific and the consensus graphs. Based on the results of simulated data, we employ the zeroinflated Kendall’s tau kernel. Next, we calculate the signed node degrees of each gene, i.e., \(D_{ii}^+ = \sum _{j=1}^n W_{ij}^+\) and \(D_{ii}^ = \sum _{j=1}^n W_{ij}^\) from learned graphs \(G^+\) and \(G^\). We then consider the genes with top signed degrees as hub genes whose signed degrees are reported in Fig. 3. The result confirms the importance of regulator genes NANOG, SOX2, POU5F1, ZFP42, UTF1 in early stages of differentiation. NANOG has been reported to maintain pluripotency by inhibiting genes that activate differentiation to lineages associated with extraembryonic endoderm [43, 44]. Figure 3 clearly shows that the number of inhibitory relationships associated with NANOG decreases as the ES cells proceed to a matured state. POU5F1 and SOX2 also exhibit higher number of inhibitory relationships in the the first few days. SOX2, NANOG and POU5F1 are known to play a fundamental role in the selfrenewal and pluripotency of mouse embryonic stem cells [45]. Reduction in expression of NANOG has been shown to be correlated with the induction of gene GATA4 which initiates differentiation of pluripotent cells [46] and therefore GATA4 has been correctly identified as a hub gene in Days 2 and 4. Collectively, these results confirm the fundamental roles of SOX2, NANOG and POU5F1 in the pluripotency stage and how an eventual reduction in their expression initiates differentiation.
Analysis of scRNAseq data from medulloblastoma
Medulloblastoma (MB) is a highly malignant cerebellar tumor mostly affecting young children [47]. Several studies have been done to pinpoint the genetic drivers in each of the four distinct tumor subgroups: WNTpathwayactivated, SHHpathwayactivated, and the lesswellcharacterized Group 3 and Group 4 [47]. Among these subgroups, Group 3 and Group 4 tumors account for the majority of the MB diagnoses, with Group 3 MB having a metastatic diagnosis of approximately \(50\%\). Transcription factors (TFs) MYC2 and OTX2 have commonly been identified as key oncogenic TFs in Group 3 and 4 tumorogenesis. Dong et al. [21] used the joint single cell network algorithm to study the roles of MYC and OTX2 utilizing the MB scRNAseq data set (GSE119926) by [39].^{Footnote 4} Using the same selected samples from a subset of 17 individuals that were grouped into three subsets Group 3, Group 4 and an intermediate cell type, we estimate the joint gene regulatory network for the three groups for \(\sim 750\) genes among which most are enzymerelated genes from mammalian metabolic enzyme database [48]. Bulk profiling studies for MB cells have consistently observed overlapping transcriptional and epigenetic signatures in Group 3 and Group 4 tumors suggesting shared developmental origins [39, 49]. Based on this, we hypothesize that a joint analysis of the different MB celltypes would better capture the local functional interactions of MYC and OTX2 across different tumor subtypes and would eventually help in delineating their global role in regulating metabolic processes in MB cells.
Subgroup specific networks along with the consensus graph were estimated with zeroinflated Kendall’s tau kernel. Table 2 shows that the average edge weight for the MYC network is considerably higher for Group 3 compared to Group 4 and the intermediate subgroup. Figure 4 further shows that Group 3 MYC network has stronger edge connections and higher density in compared to the intermediate group. In Group 4, almost all the connections become activating except for Aldh3a2 and Eno2; which were found to be strongly downregulated in all the tumor subgroups confirming their role in cancer resistance [50, 51]. This varying network structure over the subgroups confirms the major role MYC plays in initiation, maintenance, and progression of Group 3 tumors [52]. In Fig. 4, it is shown that OTX2 has a denser network for Group 4 MB cells in comparison to the other groups. In Group 3 MB cells, OTX2’s connections to the metabolic genes are very distinct from the MYC’s. In addition, scMSGL detected relationships between OTX2 and metabolic genes PAICS and PPAT in Group 3 tumors. These genes related to the human purine biosynthesis pathways have been previously reported to be induced by MYC [53]. This confirms that OTX2 is functionally cooperating with MYC to regulate gene expression in medulloblastoma [52, 54]. Broadly these results suggest that MYC and OTX2 play significant roles in in the transcriptional regulation of the metabolic genes and the mechanisms underlying MYC and OTX2 mediated MB maintenance and progression likely vary in different subgroups of MB cells.
Conclusion
In this paper, we presented scMSGL for joint inference of multiple GRNs from scRNAseq datasets having multiple classes. scMSGL learns functional relationships between genes across multiple related classes of single cell gene expression datasets under the assumption that there exists a shared structure across classes. The main novelty of our paper lies in the formulation of a highly efficient optimization framework that extends the signed graph learning [31] approach to high dimensional datasets with multiple classes. The kernelization trick embedded within the algorithm renders it capable of handling sparse and noisy features; expected to demonstrate highly nonlinear relationships. Furthermore, the estimation of the consensus graph may help in understanding the joint structure existing within the multiple classes. Using simulation studies, we demonstrated the superior performance of scMSGL over single view learning and existing joint learning methods for ER and BA graph models. In addition, performance was ascertained by varying a number of simulation parameters such as dropout levels, cell numbers and view similarity and scMSGL demonstrated superior performance in all scenarios. Applying scMSGL to the mESC dataset, we robustly identified previously reported regulatory markers as the hub genes for the different days and captured the progression of the differentiation process by analyzing these changes in hubs over the days. For the medulloblastoma data, scMSGL efficiently captured the significant roles that key oncology markers MYC and OTX2 play in the transcriptional regulation of metabolic genes.
There are various aspects of the proposed method that can be considered for improvement as future work. One challenge in implementing scMSGL is how to select the kernel function. This challenge can be addressed by combining information from multiple kernels during learning. An open problem in graph learning literature is hyperparameter selection, which is also a limitation of the proposed method. Current work selects the hyperparameters by searching the values that would result in graphs with desired properties. Future work can improve the accuracy of the learned graphs through better hyperparameter selection and multikernel strategies. Computational complexity of scMSGL is quadratic with respect to the number of genes (similar to scSGL) and linear in number of views. Therefore, its application to datasets with very large number of genes is not feasible. However, recent developments in GSP to scale GL to largescale problems [55] can be exploited to scale scMSGL. Finally, additional sources of data that help in identifying direct interactions between TFs and target genes, can provide a way to filter out false positives. The current availability of singlecell epigenomic datasets has made it easier to further explore the regulatory relationship between TF and genes. Singlecell assay for transposaseaccessible chromatin with sequencing (scATACseq), for example, allows the identification of DNA regulatory elements within accessible genomic DNA regions in single cells, hence enabling the identification of direct regulations in GRNs. Integration of multiomics profiles within the framework of scMSGL could be an interesting avenue for future research.
Availability of data and materials
For real data, please see the cited references in “Results” section.The scMSGL code and simulated data are available at https://github.com/SingleCellGraphLearning/scMSGL.
Notes
JGLFused does not allow edge densities of the negative and positive parts of the learned graph to be controlled separately, therefore we learned a graph with edge density equal to 2d, which is the same edge density for scMSGL if the edge signs are not considered.
JGNsc [21] recommends to use Akaike information criterion for selection of \(\lambda _1\) and \(\lambda _2\). In our analysis, we found this selection technique does not perform well and its time complexity was high.
Abbreviations
 GRN:

Gene regulatory networks
 GSP:

Graph signal processing
 GL:

Graph learning
 GFT:

Graph Fourier transform
 scRNAseq:

Single cell RNA sequencing
 scSGL:

Single cell signed graph learning
 scMSGL:

Single cell multiple singed graph learning
References
Sanguinetti G, HuynhThu VA. Gene regulatory networks. Springer; 2019.
Yin W, Mendoza L, MonzonSandoval J, Urrutia AO, Gutierrez H. Emergence of coexpression in gene regulatory networks. PLoS ONE. 2021;16(4):0247671.
Yang Y, Han L, Yuan Y, Li J, Hei N, Liang H. Gene coexpression network analysis reveals common systemlevel properties of prognostic genes across cancer types. Nat Commun. 2014;5(1):1–9.
Van Der Wijst MG, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med. 2018;10(1):1–15.
13, B..W.H..H.M.S.C.L...P.P.J..K.R., data analysis: Baylor College of Medicine Creighton Chad J. 22 23 Donehower Lawrence A. 22 23 24 25, G., for Systems Biology Reynolds Sheila 31 Kreisberg Richard B. 31 Bernard Brady 31 Bressler Ryan 31 Erkkila Timo 32 Lin Jake 31 Thorsson Vesteinn 31 Zhang Wei 33 Shmulevich Ilya 31, I., et al. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61–70.
Danaher P, Wang P, Witten DM. The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc Ser B Stat Methodol. 2014;76(2):373.
Lee W, Liu Y. Joint estimation of multiple precision matrices with common structures. J Mach Learn Res. 2015;16(1):1035–62.
Ha MJ, Baladandayuthapani V, Do KA. Dingo: differential network analysis in genomics. Bioinformatics. 2015;31(21):3413–20.
Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinform. 2018;19(1):1–21.
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali T. Benchmarking algorithms for gene regulatory network inference from singlecell transcriptomic data. Nat Methods. 2020;17(2):147–54.
Azizi E, Carr AJ, Plitas G, Cornish AE, Konopacki C, Prabhakaran S, Nainys J, Wu K, Kiseliovas V, Setty M, et al. Singlecell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174(5):1293–308.
Matsumoto H, Kiryu H, Furusawa C, Ko MS, Ko SB, Gouda N, Hayashi T, Nikaido I. Scode: an efficient regulatory network inference algorithm from singlecell rnaseq during differentiation. Bioinformatics. 2017;33(15):2314–21.
Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15.
Fiers MW, Minnoye L, Aibar S, Bravo GonzálezBlas C, Kalender Atak Z, Aerts S. Mapping gene regulatory networks from singlecell omics data. Brief Funct Genomics. 2018;17(4):246–54.
Chen G, Ning B, Shi T. Singlecell rnaseq technologies and related computational data analysis. Front Genet. 2019;10:317.
Akers K, Murali T. Gene regulatory network inference in single cell biology. Curr Opin Syst Biol. 2021;26:87–97.
Mukherjee S, Carignano A, Seelig G, Lee SI. Identifying progressive gene network perturbation from singlecell rnaseq data. In: 2018 40th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2018. p. 5034–40.
Allen GI, Liu Z. A local Poisson graphical model for inferring networks from sequencing data. IEEE Trans Nanobiosci. 2013;12(3):189–98.
Wu N, Yin F, OuYang L, Zhu Z, Xie W. Joint learning of multiple gene networks from singlecell gene expression data. Comput Struct Biotechnol J. 2020;18:2583–95.
Liu H, Han F, Yuan M, Lafferty J, Wasserman L. Highdimensional semiparametric gaussian copula graphical models. Ann Stat. 2012;40(4):2293–326.
Dong M, He Y, Jiang Y, Zou F. Joint gene network construction by singlecell rna sequencing data. Biometrics. 2022.
Jia B, Xu S, Xiao G, Lamba V, Liang F. Learning gene regulatory networks from next generation sequencing data. Biometrics. 2017;73(4):1221–30.
Mongia A, Sengupta D, Majumdar A. Mcimpute: matrix completion based imputation for single cell rnaseq data. Front Genet. 2019;10:9.
Liu H, Lafferty J, Wasserman L. The nonparanormal: semiparametric estimation of high dimensional undirected graphs. J Mach Learn Res. 2009;10(10):2295–328.
Shuman DI, Narang SK, Frossard P, Ortega A, Vandergheynst P. The emerging field of signal processing on graphs: extending highdimensional data analysis to networks and other irregular domains. IEEE Signal Process Mag. 2013;30(3):83–98.
Dong X, Thanou D, Rabbat M, Frossard P. Learning graphs from data: a signal representation perspective. IEEE Signal Process Mag. 2019;36(3):44–63.
Dong X, Thanou D, Frossard P, Vandergheynst P. Learning Laplacian matrix in smooth graph signal representations. IEEE Trans Signal Process. 2016;64(23):6160–73.
Kalofolias V. How to learn a graph from smooth signals. In: Artificial intelligence and statistics. PMLR; 2016. p. 920–9.
Segarra S, Marques AG, Mateos G, Ribeiro A. Network topology inference from spectral templates. IEEE Trans Signal Inf Process Netw. 2017;3(3):467–83.
Navarro M, Wang Y, Marques AG, Uhler C, Segarra S. Joint inference of multiple graphs from matrix polynomials. J Mach Learn Res. 2022;23(76):1–35.
Karaaslanli A, Saha S, Aviyente S, Maiti T. scsgl: kernelized signed graph learning for singlecell gene regulatory network inference. Bioinformatics. 2022;38(11):3011–9.
Skinnider MA, Squair JW, Foster LJ. Evaluating measures of association for singlecell transcriptomics. Nat Methods. 2019;16(5):381–6.
Sandryhaila A, Moura JM. Discrete signal processing on graphs: frequency analysis. IEEE Trans Signal Process. 2014;62(12):3042–54.
Quinn TP, Richardson MF, Lovell D, Crowley TM. propr: an rpackage for identifying proportionally abundant features using compositional data analysis. Sci Rep. 2017;7(1):1–9.
Pimentel RS, NiewiadomskaBugaj M, Wang JC. Association of zeroinflated continuous variables. Stat Probab Lett. 2015;96:61–7.
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. Zinbwave: a general and flexible method for signal extraction from singlecell rnaseq data. bioRxiv. 2017;125112.
Hafemeister C, Satija R. Normalization and variance stabilization of singlecell rnaseq data using regularized negative binomial regression. Genome Biol. 2019;20(1):1–15.
Yahav I, Shmueli G. On generating multivariate Poisson data in management science applications. Appl Stoch Model Bus Ind. 2012;28(1):91–102.
Hovestadt V, Smith KS, Bihannic L, Filbin MG, Shaw ML, Baumgartner A, DeWitt JC, Groves A, Mayr L, Weisman HR, et al. Resolving medulloblastoma cellular architecture by singlecell genomics. Nature. 2019;572(7767):74–9.
Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW. Droplet barcoding for singlecell transcriptomics applied to embryonic stem cells. Cell. 2015;161(5):1187–201.
Przybyla LM, Voldman J. Attenuation of extrinsic signaling reveals the importance of matrix remodeling on maintenance of embryonic stem cell selfrenewal. Proc Natl Acad Sci. 2012;109(3):835–40.
Edgar R, Domrachev M, Lash AE. Gene expression omnibus: Ncbi gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10.
Chambers I, Colby D, Robertson M, Nichols J, Lee S, Tweedie S, Smith A. Functional expression cloning of nanog, a pluripotency sustaining factor in embryonic stem cells. Cell. 2003;113(5):643–55.
Mitsui K, Tokuzawa Y, Itoh H, Segawa K, Murakami M, Takahashi K, Maruyama M, Maeda M, Yamanaka S. The homeoprotein nanog is required for maintenance of pluripotency in mouse epiblast and ES cells. Cell. 2003;113(5):631–42.
Zhou Q, Chipperfield H, Melton DA, Wong WH. A gene regulatory network in mouse embryonic stem cells. Proc Natl Acad Sci. 2007;104(42):16438–43.
Hough SR, Clements I, Welch PJ, Wiederholt KA. Differentiation of mouse embryonic stem cells after rna interferencemediated silencing of oct4 and nanog. Stem Cells. 2006;24(6):1467–75.
Northcott PA, Robinson GW, Kratz CP, Mabbott DJ, Pomeroy SL, Clifford SC, Rutkowski S, Ellison DW, Malkin D, Taylor MD, et al. Medulloblastoma. Nat Rev Dis Primers. 2019;5(1):1–20.
Corcoran CC, Grady CR, Pisitkun T, Parulekar J, Knepper MA. From 20th century metabolic wall charts to 21st century systems biology: database of mammalian metabolic enzymes. Am J Physiol Renal Physiol. 2017;312(3):533–42.
Northcott PA, Korshunov A, Witt H, Hielscher T, Eberhart CG, Mack S, Bouffet E, Clifford SC, Hawkins CE, French P, et al. Medulloblastoma comprises four distinct molecular variants. J Clin Oncol. 2011;29(11):1408.
Moreb JS, Muhoczy D, Ostmark B, Zucali JR. Rnaimediated knockdown of aldehyde dehydrogenase class1a1 and class3a1 is specific and reveals that each contributes equally to the resistance against 4hydroperoxycyclophosphamide. Cancer Chemother Pharmacol. 2007;59(1):127–36.
Chang PMH, Chen CH, Yeh CC, Lu HJ, Liu TT, Chen MH, Liu CY, Wu AT, Yang MH, Tai SK, et al. Transcriptome analysis and prognosis of ALDH isoforms in human cancer. Sci Rep. 2018;8(1):1–10.
Roussel MF, Robinson GW. Role of MYC in medulloblastoma. Cold Spring Harb Perspect Med. 2013;3(11): 014308.
Liu YC, Li F, Handler J, Huang CRL, Xiang Y, Neretti N, Sedivy JM, Zeller KI, Dang CV. Global regulation of nucleotide biosynthetic genes by cMyc. PLoS ONE. 2008;3(7):2722.
Lu Y, Labak CM, Jain N, Purvis IJ, Guda MR, Bach SE, Tsung AJ, Asuthkar S, Velpula KK. OTX2 expression contributes to proliferation and progression in Mycamplified medulloblastoma. Am J Cancer Res. 2017;7(3):647.
Kalofolias V, Perraudin N. Large scale graph learning from smooth signals. In: International conference on learning representations. 2018.
Acknowledgements
Not applicable.
Funding
This work was supported by following grants: NSF CCF 2211645, NSF CCF 2006800, and NSF DMS 1924724.
Author information
Authors and Affiliations
Contributions
AK devised the scMSGL and performed data processing. SS designed the simulation experiments and analyzed the results. TM and SA supervised the project. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Additional file includes optimization process for scMSGL, definition of Signed AUPRC, simulated data generation process, results on sensitivity of scMSGL to hyperparameter selection and details about selected hyperparameters values of all methods.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Karaaslanli, A., Saha, S., Maiti, T. et al. Kernelized multiview signed graph learning for singlecell RNA sequencing data. BMC Bioinformatics 24, 127 (2023). https://doi.org/10.1186/s1285902305250y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902305250y
Keywords
 Gene regulatory networks
 Single cell
 Graph signal processing
 Graph learning