First, we review the related work on low-rank representation. Then, we introduce our proposed approach.
Related work
In recent years, the LRR method and its improved algorithms have been widely used in many fields. Furthermore, graph regularization based on manifold learning has also attracted the attention of researchers. In subsections Low-Rank Representation and The Symmetric Constraint for the Low-Rank Representation, we review the LRR method [9] and the symmetric constraint for the low-rank representation [16], respectively. Then, in subsection Manifold Learning for Graph Regularization, we give a detailed introduction to graph regularization based on manifold learning [29, 30].
Low-Rank Representation
The aim of the LRR method [9] is to learn the lowest rank representation matrix of an observation dataset. Given a noise-free observation data matrix X = [x1, x2, ..., xn] ∈ ℝm × n, suppose there is an overcomplete dictionary matrix A = [a1, a2, ..., ak] ∈ ℝm × k and a union of multiple low-dimensional independent subspaces, and assume that these subspaces can be linearly spanned by the dictionary matrix A. The observation data X can then be represented in these low-dimensional subspaces, and the relationship between the data X and the matrix A is X = AZ. In other words, X is a linear combination of the columns of the dictionary matrix A. The objective function of the LRR method is as follows:
$$ \underset{\mathbf{Z}}{\min}\mathit{\operatorname{rank}}\left(\mathbf{Z}\right)\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{AZ}. $$
(1)
Here, X = [x1, x2, ..., xn] ∈ ℝm × n is the observation data matrix, where m is the number of features and n is the number of samples. A = [a1, a2, ..., ak] is the overcomplete dictionary matrix, and Z = [z1, z2, ..., zn] ∈ ℝk × n is the low-rank representation matrix. The column zi of Z encodes the mapping from {xi| xi ∈ Xm × n, 1 ≤ i ≤ n} onto the dictionary matrix A. In general, the matrix Z is also called the coefficient matrix, and it is a new expression of X in terms of the dictionary matrix A. The purpose of the LRR method is to find the lowest rank representation matrix Z∗.
In practical analysis, the observation data matrix X itself is usually selected as the dictionary matrix A, which is a very important aspect of LRR [9, 15, 26, 27]. According to the matrix multiplication rule, the lowest rank representation matrix Z = [z1, z2, ..., zn] ∈ ℝn × n is then a square matrix. Equation (1) can be rewritten as follows:
$$ \underset{\mathbf{Z}}{\min}\mathit{\operatorname{rank}}\left(\mathbf{Z}\right)\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}. $$
(2)
In this case, the matrix Z∗ represents the relationships among the samples of X. In other words, the element \( {\mathbf{z}}_{ij}^{\ast } \) of matrix Z∗ represents the similarity between the samples xi and xj. Therefore, the element \( {\mathbf{z}}_{ij}^{\ast } \) should be equal to the element \( {\mathbf{z}}_{ji}^{\ast } \). That is, the matrix Z∗ is a symmetric matrix when the observation data matrix X is clean.
Because the rank function is nonconvex and has no closed-form minimizer, Eq. (2) is very difficult to solve directly. Related research has shown that the nuclear norm of a matrix is the tightest convex envelope of its rank [31,32,33]. Therefore, Eq. (2) can be relaxed to the following nuclear-norm convex optimization problem:
$$ \underset{\mathbf{Z}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast}\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}. $$
(3)
Here, ‖⋅‖∗ is the nuclear norm, i.e., the sum of all singular values of the matrix Z, which is the minimal convex envelope of the rank function [17]. In practice, the observation data are inevitably contaminated by noise or outliers. Therefore, a regularization term ‖⋅‖l is usually added to (3) to account for this interference. The improved formulation is as follows:
$$ \underset{\mathbf{Z},\mathbf{E}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\gamma {\left\Vert \mathbf{E}\right\Vert}_l\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}+\mathbf{E}. $$
(4)
Here, the matrix E denotes the noise or outliers, the parameter γ > 0 balances the two terms in (4), and ‖⋅‖l is the regularization constraint. In general, an appropriate regularization constraint ‖⋅‖l is selected according to the type of noise and outliers in the real environment. For example, ‖⋅‖2, 1, i.e., the l2, 1 norm, is used to extract sample-specific corruptions and small noise or outliers, and ‖⋅‖0, i.e., the l0 norm, is used to deal with significant noise or outliers [27]. Because solving the l0 norm is an NP-hard problem, it is usually replaced by ‖⋅‖1, i.e., the l1 norm.
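To make these norm choices concrete, the following minimal numpy sketch (ours, for illustration only, on toy matrices) computes the penalties mentioned above: the l1 norm, the l2,1 norm (the sum of the column-wise l2 norms), and the nuclear norm used for Z.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((5, 8))   # toy "error" matrix (features x samples)
Z = rng.standard_normal((8, 8))   # toy coefficient matrix

l1 = np.abs(E).sum()                     # ||E||_1: element-wise robustness to corruptions
l21 = np.linalg.norm(E, axis=0).sum()    # ||E||_{2,1}: sum of column l2 norms
                                         # (targets sample-specific corruptions)
nuclear = np.linalg.norm(Z, ord='nuc')   # ||Z||_*: sum of singular values,
                                         # convex surrogate of rank(Z)
```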
The above describes the classic LRR method. The LRR method deals with the observation data from a holistic perspective; that is, the global structural features of the observation data are captured by the lowest rank representation matrix Z∗. In addition, the LRR method maps the structures of the observation data from the high-dimensional space to low-dimensional subspaces, which reduces the difficulty of processing high-dimensional observation data.
The Symmetric Constraint for the Low-Rank Representation
With clean observation data, based on function (3), the ideal matrix Z∗ is a block-diagonal and strictly symmetric matrix, as shown in Fig. 1. However, according to function (4), the lowest rank representation matrix Z∗ is not strictly symmetric when using real data with noise and outliers, as shown in Fig. 2 [34]. In other words, because the element Zij is not equal to the element Zji, the degree of similarity of the i-th sample to the j-th sample is not equal to the degree of similarity of the j-th sample to the i-th sample. One question worth considering is which of the two elements is more suitable for reflecting the similarity between the two samples.
In general, an affinity matrix is usually constructed by the symmetrization operation (| Z∗| +|Z∗|T)/2 to reflect the similarity of the samples. Then, based on the affinity matrix, spectral clustering algorithms generally use the Ncuts clustering method for subspace clustering. To avoid this post-hoc symmetrization, Chen et al. imposed a symmetric constraint on the LRR to obtain a symmetric lowest rank representation matrix [16]. The improved method, named low-rank representation with symmetric constraint (LRRSC), can be expressed as follows:
$$ \underset{\mathbf{Z},\mathbf{E}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\gamma {\left\Vert \mathbf{E}\right\Vert}_l\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}+\mathbf{E},\mathbf{Z}={\mathbf{Z}}^{\mathrm{T}}. $$
(5)
The symmetric lowest rank representation matrix Z∗ largely preserves the subspace structures of the observation data. Therefore, an affinity matrix based on the angular information of the principal directions of the symmetric lowest rank representation matrix Z∗ can effectively reflect the similarity between samples. However, the LRRSC method does not consider the local geometrical structural information, so it may lose important information when obtaining the lowest rank representation matrix. In the next subsection, we use manifold learning with graph regularization to address this problem.
Manifold Learning for Graph Regularization
In practice, the given observation data X ∈ ℝm × n are usually high-dimensional. The local geometrical structural information resides in each data point and its k-nearest-neighboring data points, and capturing this information is important for the performance of subspace clustering. Fortunately, graph regularization based on manifold learning provides a feasible way to achieve this aim [29]: it can preserve the intrinsic local geometrical structures embedded in the high-dimensional data space.
In graph theory, the “manifold assumption” states that data points that are close in the local geometrical structure should remain close under a new basis [35]. If we map two adjacent data points xi and xj from the high-dimensional space to the low-dimensional space, their mapped points zi and zj should also be close in the low-dimensional space. Therefore, the local geometrical structural information of the data points xi and xj can be represented in the low-dimensional space. In other words, if data points have similar characteristics in the high-dimensional space, their mapped points can be clustered into the same class in the low-dimensional space.
We take each data point as a vertex, where the data points are the columns of the observation data X = [x1, x2, ..., xn]; therefore, the number of vertices is n. All n vertices form a graph G. The weight of the edge connecting vertices i and j in the graph G is denoted by wij, and it is assigned as follows:
$$ {\mathbf{w}}_{ij}=\begin{cases}1 & \mathrm{if}\ {\mathbf{x}}_i\in {\mathbf{N}}_k\left({\mathbf{x}}_j\right)\ \mathrm{or}\ {\mathbf{x}}_j\in {\mathbf{N}}_k\left({\mathbf{x}}_i\right)\\ 0 & \mathrm{otherwise},\end{cases} $$
(6)
where Nk(xi) denotes the set of the k-nearest neighbors of xi. As suggested in [26, 27], we set k = 5 for the experimental datasets. All of the elements wij make up a symmetric weight matrix W.
In the low-dimensional space, the new relationship of the data points is as follows:
$$ \underset{\mathbf{Z}}{\min }{\sum}_{ij}{\left\Vert {\mathbf{z}}_i-{\mathbf{z}}_j\right\Vert}^2{\mathbf{w}}_{ij}. $$
(7)
After a linear algebraic transformation, the above optimization problem (7) can be written as follows:
$$ \underset{\mathbf{Z}}{\min } tr\left(\mathbf{Z}\mathbf{L}{\mathbf{Z}}^{\mathrm{T}}\right). $$
(8)
Here, tr(⋅) is the trace of a matrix, and L is the graph Laplacian matrix, defined by L = D − W. The matrix D is a diagonal matrix whose element dii is the sum of the i-th row of W, i.e., \( {\mathbf{d}}_{ii}={\sum}_{j=1}^n{\mathbf{w}}_{ij} \).
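As a concrete illustration of Eqs. (6)–(8), the following numpy sketch (ours; it assumes the samples are the columns of X and uses k = 5 as in [26, 27]) builds the binary k-nearest-neighbor weight matrix W, the Laplacian L = D − W, and the graph regularizer tr(ZLZ^T).

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Binary k-NN weight matrix W (Eq. 6) and graph Laplacian L = D - W."""
    n = X.shape[1]                                    # samples are the columns of X
    d2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]             # k nearest neighbors, excluding i itself
        W[i, nbrs] = 1
    W = np.maximum(W, W.T)                            # w_ij = 1 if x_i in N_k(x_j) or x_j in N_k(x_i)
    D = np.diag(W.sum(axis=1))                        # d_ii = sum_j w_ij
    return D - W

def graph_regularizer(Z, L):
    """tr(Z L Z^T), the manifold term of Eqs. (8) and (9)."""
    return np.trace(Z @ L @ Z.T)
```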
sgLRR methodology
In this section, we introduce our method for multi-cancer sample clustering. First, we formulate the objective function and solve it using the linearized alternating direction method with adaptive penalty (LADMAP) [36]; we then provide the complete algorithm for easier understanding. Second, we combine our method with the Ncuts clustering method by learning an affinity matrix. Finally, the proposed method is used for the subspace segmentation of multi-cancer sample clustering.
Problem formulation and the solution
In this subsection, our goal is to propose a novel LRR model that preserves the intrinsic local geometrical structures of the observation data and simultaneously weakens the effects of noise and outliers in the dictionary matrix. We introduce graph regularization based on manifold learning and the symmetric constraint into the original LRR method as follows:
$$ {\displaystyle \begin{array}{l}\underset{\mathbf{Z},\mathbf{E}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\beta tr\left(\mathbf{ZL}{\mathbf{Z}}^{\mathrm{T}}\right)+\gamma \left\Vert \mathbf{E}\right\Vert {}_1\kern0.75em \\ {}\mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}+\mathbf{E},\mathbf{Z}={\mathbf{Z}}^{\mathrm{T}},\end{array}} $$
(9)
where β and γ are penalty parameters, L is the Laplacian matrix, and ‖ ⋅ ‖1 is the regularization constraint. ‖E‖1 is the sum of the absolute values of each element in the matrix E.
In addition, according to sparsity-based clustering methods, e.g., sparse coding combined with clustering, the sparsity constraint can be regarded as a strategy for information localization [37]. Thus, a coefficient matrix with a sparsity constraint can improve the performance of subspace clustering: by combining low-rankness and sparsity, the within-class affinities are dense and the between-class affinities are zero. Therefore, we introduce the sparsity constraint into Eq. (9), and the final objective function of our method is as follows:
$$ {\displaystyle \begin{array}{l}\underset{\mathbf{Z},\mathbf{E}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\lambda \left\Vert \mathbf{Z}\right\Vert {}_1+\beta tr\left(\mathbf{ZL}{\mathbf{Z}}^{\mathrm{T}}\right)+\gamma \left\Vert \mathbf{E}\right\Vert {}_1\ \\ {}\kern0.5em \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}+\mathbf{E},\mathbf{Z}={\mathbf{Z}}^{\mathrm{T}},\end{array}} $$
(10)
where λ is the penalty parameter, and ‖Z‖1 is the sparsity constraint for the low-rank representation matrix Z.
We call the objective function in (10) the graph-regularized low-rank representation with combined sparse and symmetric constraints (sgLRR) method. To obtain the globally optimal solution, we adopt LADMAP to solve problem (10).
First, we introduce an auxiliary variable J to separate the variables:
$$ {\displaystyle \begin{array}{l}\underset{\mathbf{Z},\mathbf{E}}{\min }{\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\lambda \left\Vert \mathbf{J}\right\Vert {}_1+\beta tr\left(\mathbf{ZL}{\mathbf{Z}}^{\mathrm{T}}\right)+\gamma \left\Vert \mathbf{E}\right\Vert {}_1\kern0.50em \\ {}\ \mathrm{s}.\mathrm{t}.\mathbf{X}=\mathbf{XZ}+\mathbf{E},\mathbf{Z}={\mathbf{Z}}^{\mathrm{T}},\mathbf{Z}=\mathbf{J}.\end{array}} $$
(11)
Second, problem (11) can be converted into an unconstrained optimization problem using the augmented Lagrange multiplier (ALM) method [38]. The formula is rewritten as follows:
$$ {\displaystyle \begin{array}{l}\ell \left(\mathbf{Z},\mathbf{E},\mathbf{J},{\mathbf{Y}}_1,{\mathbf{Y}}_2\right)={\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\lambda \left\Vert \mathbf{J}\right\Vert {}_1+\beta tr\left(\mathbf{Z}\mathbf{L}{\mathbf{Z}}^{\mathrm{T}}\right)+\gamma \left\Vert \mathbf{E}\right\Vert {}_1\\ {}\kern7.25em +\left\langle {\mathbf{Y}}_1,\mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\rangle +\left\langle {\mathbf{Y}}_2,\mathbf{Z}-\mathbf{J}\right\rangle \\ {}\kern7.5em +\frac{\mu }{2}\left({\left\Vert \mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\Vert}_F^2+{\left\Vert \mathbf{Z}-\mathbf{J}\right\Vert}_F^2\right),\end{array}} $$
(12)
Here, ‖⋅‖F is the Frobenius norm; λ, β, γ and μ are the penalty parameters; 〈A, B〉 = tr(ATB) denotes the Euclidean inner product between two matrices; and Y1 and Y2 are the Lagrange multipliers.
According to the LADMAP method, problem (12) is divided into the following three subproblems:
$$ {\displaystyle \begin{array}{l}{\ell}_1\left(\mathbf{Z}\right)={\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\beta tr\left(\mathbf{Z}\mathbf{L}{\mathbf{Z}}^{\mathrm{T}}\right)+\left\langle {\mathbf{Y}}_1,\mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\rangle \\ {}\kern2.25em +\left\langle {\mathbf{Y}}_2,\mathbf{Z}-\mathbf{J}\right\rangle +\frac{\mu }{2}\left({\left\Vert \mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\Vert}_F^2+{\left\Vert \mathbf{Z}-\mathbf{J}\right\Vert}_F^2\right),\end{array}} $$
(13)
$$ {\ell}_2\left(\mathbf{E}\right)=\gamma \left\Vert \mathbf{E}\right\Vert {}_1+\left\langle {\mathbf{Y}}_1,\mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\rangle +\frac{\mu }{2}{\left\Vert \mathbf{X}-\mathbf{XZ}-\mathbf{E}\right\Vert}_F^2, $$
(14)
$$ {\ell}_3\left(\mathbf{J}\right)=\lambda \left\Vert \mathbf{J}\right\Vert {}_1+\left\langle {\mathbf{Y}}_2,\mathbf{Z}-\mathbf{J}\right\rangle +\frac{\mu }{2}{\left\Vert \mathbf{Z}-\mathbf{J}\right\Vert}_F^2. $$
(15)
Following LADMAP, problem (13) is solved by linearizing its smooth part, i.e., by solving the following problem (16):
$$ \min {\left\Vert \mathbf{Z}\right\Vert}_{\ast }+\left\langle {\nabla}_{\mathbf{Z}}\mathbf{q}\left({\mathbf{Z}}_k\right),\mathbf{Z}-{\mathbf{Z}}_k\right\rangle +\frac{\eta_1}{2}{\left\Vert \mathbf{Z}-{\mathbf{Z}}_k\right\Vert}_F^2, $$
(16)
where \( {\nabla}_{\mathbf{Z}}\mathbf{q}\left({\mathbf{Z}}_k\right)=\beta \left({\mathbf{Z}}_k{\mathbf{L}}^{\mathrm{T}}+{\mathbf{Z}}_k\mathbf{L}\right)+{\mu}_k\left({\mathbf{Z}}_k-{\mathbf{J}}_k+{\mathbf{Y}}_2^k/{\mu}_k\right)+{\mu}_k{\mathbf{X}}^{\mathrm{T}}\left(\mathbf{X}{\mathbf{Z}}_k-\mathbf{X}+{\mathbf{E}}_k-{\mathbf{Y}}_1^k/{\mu}_k\right) \), \( {\eta}_1=2\beta {\left\Vert \mathbf{L}\right\Vert}_2+{\mu}_k\left(1+{\left\Vert \mathbf{X}\right\Vert}_2^2\right) \).
We use the following Lemma 1 to solve problem (16). Chen et al. have provided the rigorous mathematical derivation and a detailed proof of this lemma [16].
Lemma 1
Given a square matrix Q ∈ ℝn × n, optimization problem (17) has a unique closed-form solution.
$$ {\mathbf{P}}^{\ast }=\arg \underset{\mathbf{P}}{\min}\frac{1}{\mu }{\left\Vert \mathbf{P}\right\Vert}_{\ast }+\frac{1}{2}{\left\Vert \mathbf{P}-\mathbf{Q}\right\Vert}_F^2\kern0.75em \mathrm{s}.\mathrm{t}.\mathbf{P}={\mathbf{P}}^{\mathrm{T}}. $$
(17)
It is as follows:
$$ {\mathbf{P}}^{\ast }={\mathbf{U}}_r\left({\boldsymbol{\Sigma}}_r-\frac{1}{\mu}\cdot {\mathbf{I}}_r\right){\mathbf{V}}_r^{\mathrm{T}}. $$
(18)
Here, Ur, Σr and Vr are obtained from the skinny SVD \( \overset{\sim }{\mathbf{Q}}={\mathbf{U}}_r{\boldsymbol{\Sigma}}_r{\mathbf{V}}_r^{\mathrm{T}} \) of the symmetric matrix \( \overset{\sim }{\mathbf{Q}}=\left(\mathbf{Q}+{\mathbf{Q}}^{\mathrm{T}}\right)/2 \), where the skinny SVD keeps only the positive singular values of the normal SVD. Specifically, Σr = diag (σ1, σ2, ..., σr) contains the singular values of \( \overset{\sim }{\mathbf{Q}} \) that satisfy \( {\sigma}_i>\frac{1}{\mu } \), Ur and Vr are the corresponding singular vectors of \( \overset{\sim }{\mathbf{Q}} \), and Ir is the r × r identity matrix.
According to Lemma-1, we set Q = Zk − ∇Zq(Zk)/η1. Then, we solve problem (16) by using the iterative formula (19):
$$ {\mathbf{Z}}_{k+1}^{\ast }={\Theta}_{\frac{1}{\eta_1}}\left(\frac{\mathbf{Q}+{\mathbf{Q}}^{\mathrm{T}}}{2}\right). $$
(19)
Here, \( {\Theta}_{\varepsilon}\left(\mathbf{A}\right)={\mathbf{U}}_r{\mathbf{S}}_{\varepsilon}\left({\boldsymbol{\Sigma}}_r\right){\mathbf{V}}_r^{\mathrm{T}} \) is the singular value thresholding operator, with \( \mathbf{A}={\mathbf{U}}_r{\boldsymbol{\Sigma}}_r{\mathbf{V}}_r^{\mathrm{T}} \) the skinny SVD of A, and Sε(x) = sgn(x) max(| x| −ε, 0) is the soft-thresholding (shrinkage) operator.
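The Z-update in Eq. (19) can be sketched as follows (our illustration, with our own helper names): compute the gradient ∇Zq(Zk) and the weight η1 given after Eq. (16), form Q, symmetrize it, and apply singular value thresholding with threshold 1/η1, which is exactly the closed form of Lemma 1.

```python
import numpy as np

def sym_svt(A, eps):
    """Lemma 1: keep singular values of the symmetric A above eps and shrink them by eps."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # skinny SVD
    keep = s > eps
    return U[:, keep] @ np.diag(s[keep] - eps) @ Vt[keep, :]

def update_Z(Zk, X, E, J, L, Y1, Y2, beta, mu):
    """One Z-update of Eq. (19)."""
    grad = (beta * (Zk @ L.T + Zk @ L)                 # gradient of the smooth part q(Z) at Z_k
            + mu * (Zk - J + Y2 / mu)
            + mu * X.T @ (X @ Zk - X + E - Y1 / mu))
    eta1 = 2 * beta * np.linalg.norm(L, 2) + mu * (1 + np.linalg.norm(X, 2) ** 2)
    Q = Zk - grad / eta1
    return sym_svt((Q + Q.T) / 2, 1.0 / eta1)
```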
We update E and J by minimizing ℓ2(E) and ℓ3(J), respectively; E and J are independent of each other in this minimization problem. Setting \( \frac{\partial {\ell}_2}{\partial \mathbf{E}}=0 \) and \( \frac{\partial {\ell}_3}{\partial \mathbf{J}}=0 \) for the smooth parts, we obtain the following equations:
$$ \frac{\partial {\ell}_2}{\partial {\mathbf{E}}_k}={\mu}_k\left({\mathbf{E}}_k-\mathbf{X}+\mathbf{X}{\mathbf{Z}}_{k+1}-{\mathbf{Y}}_1^k/{\mu}_k\right)=0\Rightarrow {\mathbf{E}}_k=\mathbf{X}-\mathbf{X}{\mathbf{Z}}_{k+1}+{\mathbf{Y}}_1^k/{\mu}_k, $$
(20)
$$ \frac{\partial {\ell}_3}{\partial {\mathbf{J}}_k}={\mu}_k\left[{\mathbf{J}}_k-\left({\mathbf{Z}}_{k+1}+{\mathbf{Y}}_2^k/{\mu}_k\right)\right]=0\kern0.5em \Rightarrow {\mathbf{J}}_k={\mathbf{Z}}_{k+1}+{\mathbf{Y}}_2^k/{\mu}_k. $$
(21)
According to the NSHLRR method [24] and the shrinkage (soft-thresholding) operator used in the singular value thresholding algorithm [39], the iterative formulas for E and J are as follows:
$$ {\mathbf{E}}_{k+1}=\Psi {}_{\frac{\gamma }{\mu_k}}\left(\mathbf{X}-\mathbf{X}{\mathbf{Z}}_{k+1}+{\mathbf{Y}}_1^k/{\mu}_k\right), $$
(22)
$$ {\mathbf{J}}_{k+1}=\max \left\{{\Psi}_{\frac{\lambda }{\mu_k}}\left({\mathbf{Z}}_{k+1}+\frac{1}{\mu_k}{\mathbf{Y}}_2^k\right),0\right\}. $$
(23)
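For reference, a minimal sketch of the shrinkage operator Ψ and the updates in Eqs. (22) and (23) (function names are ours):

```python
import numpy as np

def shrink(A, eps):
    """Element-wise soft-thresholding: Psi_eps(A) = sgn(A) * max(|A| - eps, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - eps, 0)

def update_E(X, Z_new, Y1, mu, gamma):
    """Eq. (22)."""
    return shrink(X - X @ Z_new + Y1 / mu, gamma / mu)

def update_J(Z_new, Y2, mu, lam):
    """Eq. (23): soft-threshold, then keep the nonnegative part."""
    return np.maximum(shrink(Z_new + Y2 / mu, lam / mu), 0)
```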
Based on the above, we discuss the time complexity of sgLRR compared with that of the original LRR. As described in [36], the complexity of the LADMAP method for LRR is O(rmn), where r is the rank of the matrix Z and m and n are the dimensions of the observation data matrix X ∈ ℝm × n. For the sgLRR method, constructing the k-nearest-neighbor graph costs O(n2m). Therefore, the complexity of sgLRR is O(rmn + n2m).
Finally, Algorithm 1 provides the complete sgLRR algorithm.
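To show how the pieces fit together, the following sketch outlines the main loop of Algorithm 1 by reusing the helper functions from the sketches above (knn_graph_laplacian, update_Z, update_E, update_J). The Lagrange multiplier and penalty updates shown here follow the standard LADMAP scheme [36], and the initial values are placeholders rather than the exact settings used in our experiments.

```python
import numpy as np

def sgLRR(X, lam, beta, gamma, k=5, mu=1e-2, rho=1.5, mu_max=1e6,
          tol=1e-6, max_iter=500):
    """Sketch of the sgLRR iteration (Algorithm 1); returns the lowest rank matrix Z*."""
    m, n = X.shape
    L = knn_graph_laplacian(X, k)                 # graph Laplacian from Eq. (6)
    Z = np.zeros((n, n))
    J = np.zeros((n, n))
    E = np.zeros((m, n))
    Y1 = np.zeros((m, n))                         # Lagrange multipliers
    Y2 = np.zeros((n, n))
    for _ in range(max_iter):
        Z = update_Z(Z, X, E, J, L, Y1, Y2, beta, mu)   # Eq. (19)
        E = update_E(X, Z, Y1, mu, gamma)               # Eq. (22)
        J = update_J(Z, Y2, mu, lam)                    # Eq. (23)
        R1 = X - X @ Z - E                              # constraint residuals
        R2 = Z - J
        Y1 = Y1 + mu * R1                               # multiplier updates (standard LADMAP)
        Y2 = Y2 + mu * R2
        mu = min(rho * mu, mu_max)                      # adaptive penalty
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z
```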
sgLRR method combined with the Ncuts clustering method
We obtain the lowest rank representation matrix Z∗ by Algorithm 1. The obtained matrix Z∗ inherits and improves the grouping effect of the LRR method. The symmetry of Z∗ strictly reflects the similarity of the data samples, and data samples belonging to the same group highlight the same subspace of Z∗. However, as mentioned in [9], directly segmenting with the lowest rank representation matrix Z∗ may fail in complex applications and may not fully use the information within Z∗. Therefore, we combine the sgLRR method with the Ncuts clustering method to guarantee correct segmentation results.
First, we learn an affinity matrix H that links the sgLRR method and the Ncuts clustering method. The affinity matrix H exploits the angular similarity information of the principal directions of Z∗, and it can be viewed as an undirected similarity graph that further improves the grouping effect. Learning the matrix H proceeds as follows (a sketch of this procedure is given after Eq. (24)):
1. The matrix Z∗ is decomposed as Z∗ = U∗Σ∗(V∗)T using the skinny SVD.
2. Define the matrix M = U∗(Σ∗)1/2 or the matrix N = (Σ∗)1/2(V∗)T. Because Z∗ is symmetric, M and N are equivalent for learning the affinity matrix H.
3. The elements of the affinity matrix H are calculated using Eq. (24):
$$ {\mathbf{H}}_{ij}={\left(\frac{{\mathbf{m}}_i^{\mathrm{T}}{\mathbf{m}}_j}{{\left\Vert {\mathbf{m}}_i\right\Vert}_2{\left\Vert {\mathbf{m}}_j\right\Vert}_2}\right)}^2\kern0.75em \mathrm{or}\kern1em {\mathbf{H}}_{ij}={\left(\frac{{\mathbf{n}}_i^{\mathrm{T}}{\mathbf{n}}_j}{{\left\Vert {\mathbf{n}}_i\right\Vert}_2{\left\Vert {\mathbf{n}}_j\right\Vert}_2}\right)}^2, $$
(24)
where mi is the i-th row of M, and ni is the i-th row of N.
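A minimal numpy sketch of steps 1–3 above (the function name is ours); it builds H from the M variant of Eq. (24):

```python
import numpy as np

def affinity_from_Z(Z_star):
    """Affinity matrix H of Eq. (24) from the skinny SVD of the symmetric Z*."""
    U, s, Vt = np.linalg.svd(Z_star, full_matrices=False)
    keep = s > 1e-10                          # skinny SVD: positive singular values only
    M = U[:, keep] * np.sqrt(s[keep])         # M = U* (Sigma*)^(1/2); rows are m_i
    norms = np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    Mn = M / norms                            # normalize each row m_i
    return (Mn @ Mn.T) ** 2                   # H_ij = (m_i^T m_j / (||m_i|| ||m_j||))^2
```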
Next, we adopt the Ncuts clustering method to produce the final sample clustering results. The Ncuts clustering method was proposed by Shi et al. and is closely related to graph theory [40]. It reflects both the degree of similarity within classes and the degree of dissimilarity between classes, and it has been successfully applied in image segmentation as well as in many other fields and datasets, such as gene expression overlapping clustering based on the penalized weighted normalized cut [41].
Finally, we briefly summarize the process of the multi-cancer sample clustering algorithm. It is as follows.