Joint learning sample similarity and correlation representation for cancer survival prediction
BMC Bioinformatics volume 23, Article number: 553 (2022)
Abstract
Background
As a highly aggressive disease, cancer has become the leading cause of death around the world. Accurate prediction of survival expectancy for cancer patients is important because it helps clinicians choose appropriate therapeutic schemes. As high-throughput sequencing technology becomes more cost-effective, integrating multi-type genome-wide data has become a promising approach to cancer survival prediction. Several data-integration methods for cancer survival prediction have been proposed based on these genomic data. However, existing methods fail to simultaneously utilize the feature information and structure information of multi-type genome-wide data.
Results
We propose a Multi-type Data Joint Learning (MDJL) approach based on multi-type genome-wide data, which comprehensively exploits feature information and structure information. Specifically, MDJL exploits correlation representations between any two data types by cross-correlation calculation for learning discriminant features. Moreover, based on the learned multiple correlation representations, MDJL constructs sample similarity matrices for capturing global and local structures across different data types. With the learned discriminant representation matrix and the fused similarity matrix, MDJL constructs a graph convolutional network with Cox loss for survival prediction.
Conclusions
Experimental results demonstrate that our approach substantially outperforms established integrative methods and is effective for cancer survival prediction.
Introduction
Cancer has become the leading cause of death worldwide and seriously affects human health and quality of life [1, 2]. In addition, mortality rates increase year by year [3,4,5]. Prognosis prediction can significantly aid physicians in making decisions about the care and treatment of cancer patients [6, 7]. It is usually formulated as a censored survival analysis problem, which predicts whether and when a death will occur within a given time period [8, 9]. In the past few decades, many survival prediction methods have been proposed, such as standard Cox regression and its extensions [10], tree-based ensemble methods, random survival forests [11], and so on.
Historically, cancer survival prediction has relied mainly on histopathological descriptors and low-dimensional clinical data, such as sex, age at diagnosis, cancer grade, body fat rate and other clinical features [12,13,14]. However, clinical practice has found that genomic data tends to contain more molecular biomarkers associated with cancer and can therefore describe the cancer more comprehensively [15, 16]. Meanwhile, with the advance of the Human Genome Project, high-throughput sequencing technology has become cost-effective, which makes it progressively easier to obtain multiple and diverse genome-scale data sets to address clinical and biological questions [17]. In general terms, such multi-type data describing the same cancer can be regarded as multimodal data. Multimodal data has two basic characteristics [18,19,20]. On the one hand, the modalities share common information at both the feature level and the structure level. On the other hand, each modality has its own specific information at both levels. Compared with a single genetic data type, multiple genome-scale data sets can capture more comprehensive information about a cancer. Therefore, it is essential and feasible to develop new data-integration algorithms that utilize multi-type high-dimensional genomic data to capture comprehensive information about cancer.
Motivation
During the past several years, many researchers have devoted themselves to constructing data-integration methods based on binary classification models for cancer survival prediction. In this line of work, cancer patients are usually classified into a short or long survival group according to a predefined threshold (e.g., 3 years). For example, Zhang et al. [21] presented a multiple kernel machine learning method combined with the min-redundancy max-relevance (mRMR) feature selection algorithm to predict the 2-year survival rate of glioblastoma multiforme patients. Zhao et al. [22] studied various prediction methods, including ensemble models (Gradient Boosting and Random Forest), support vector machines and artificial neural networks, to predict the 5-year survival rate of breast cancer by fusing gene expression data, clinical data and pathological images. Unfortunately, this line of work reduces survival analysis to a classification problem, which is counter-practical and far less useful than the estimation of survival times. Another mainstream line of work is survival risk regression, such as the Cox proportional hazards (CoxPH) model [23, 24]. Different from binary classification methods, it focuses on whether a patient survives at a certain time point rather than when the patient dies, so it can handle both uncensored and censored samples. Therefore, patients who are still alive at a certain time point can also be used in modelling patient survival [25].
Although existing works have promoted the development of data-integration methods in cancer survival prediction, two limitations remain: (i) simultaneously utilizing structure information and feature information, especially for small-scale datasets; (ii) fully utilizing multi-type data for learning effective discriminant features. Here, structure information refers to the information about the data distribution within data types. Feature information refers to the information contained in the data (such as genes) within a sample. Discriminant features refer to the features learned from the original data (such as gene sequences) by feature learning algorithms, which are useful for separating samples with different survival times [26]. Existing data-integration methods for cancer survival prediction have yet to address all of these limitations together. In addition, thanks to its excellent feature learning ability, the neural network extension of the Cox model has been shown to perform better than traditional CoxPH models in survival prediction, especially for high-throughput sequencing data. Hence, we apply it in our work. We also introduce a similarity matrix to exploit the structure information hidden in multi-type data.
Inspired by the above analysis, we design a Multi-type Data Joint Learning (MDJL) approach to obtain a reliable similarity matrix for exploiting structure information and an effective discriminant feature representation for exploiting feature information. In the proposed MDJL, (a) structure information and feature information are utilized simultaneously; (b) the discriminant feature representations are obtained by learning correlation representations between any two data types, which ensures diversity and provides complementary information; (c) the constructed similarity matrices can extract useful structure information even from a small number of samples.
Contribution
The main contributions of our approach lie in three aspects:

1.
Different from existing survival prediction methods, we present a Multi-type genome-wide Data Joint Learning (MDJL) approach for cancer survival prediction, which obtains both a fused similarity matrix and an integrated discriminant feature representation for simultaneously utilizing structure information and feature information.

2.
MDJL exploits correlation representations between any two data types by cross-correlation calculation for learning discriminant features. Moreover, based on the learned correlation representations, MDJL constructs sample similarity matrices for capturing global and local structures across different data types. With the learned discriminant representations and similarity matrices, MDJL constructs a graph convolutional network with Cox loss for survival prediction.

3.
We conduct a number of experiments on four public cancer datasets. Experimental results show that our approach achieves higher prediction performance than competing methods. Further investigation not only demonstrates the effectiveness of each component of MDJL, i.e., the correlation representations extraction component and the similarity matrices construction component, but also indicates its robustness.
Organization
The rest of this paper is organized as follows: the Related works section reviews related cancer survival prediction works. The proposed approach and detailed algorithm are introduced in the Proposed method section. The Experiments section presents the experimental results. The Further investigation section conducts further experiments to investigate our approach. The final section concludes this paper.
Related works
Binary classification based survival prediction works
In the past few decades, a variety of binary-classification-based multimodal learning methods for survival prediction have been proposed. In general terms, a modality refers to a kind of data type. These methods mainly focus on learning a fused representation from multiple data sources, such as clinical data, histopathological image markers and genomic data [27,28,29,30,31]. To handle multiple types of data, data-integration strategies such as the joint-based strategy [32, 33] and the alignment-based strategy [34,35,36] have been presented. Joint-based methods utilize multi-type data mainly by concatenating them into one unified feature matrix. For example, Sun et al. [37] presented a triple model DNN to learn feature representations from gene expression, copy number alteration and clinical data separately, and then concatenated the learned representations into one unified matrix. To explore the inherent relation between samples and multi-type genomic data, Gao et al. [38] constructed bipartite graphs between patients and gene expression and copy number alteration data. Khademi et al. [39] integrated microarray data and clinical data through a probabilistic graphical model for breast cancer prognosis. Alignment-based methods utilize multiple types of data by maximizing the common information across different data types. For example, Wang et al. [40] designed a cluster-boosted multi-task learning approach to exploit the common information across different data types for survival analysis. Although these methods have promoted the development of multimodal cancer survival analysis, they are limited to the binary classification problem and are counter-practical.
Survival risk regression based survival prediction works
Different from binary classification methods, survival risk regression methods aim to calculate a risk score for each patient, typically with the CoxPH model and its extensions [41,42,43]. For example, Baek et al. [44] predicted individual survival time by integrating a hazard network and a distribution function network. Wang et al. [45] proposed a reweighted Lasso-Cox model for cancer survival prediction, which improves the generalization ability of the model by weighting topologically important genes based on a random walk. Considering that there are correlations between multi-type genomic data, Bichindaritz et al. [46] presented an adaptive multi-task learning approach for breast cancer survival prediction, which adds an auxiliary ordinal loss to the Cox model.
Recently, owing to their strong data representation and learning abilities, a variety of deep neural network extensions of the CoxPH model have been proposed [47,48,49,50]. For example, instead of learning a linear relationship as in the CoxPH model, both DeepSurv [51] and Cox-nnet [52] introduce neural networks to learn nonlinear feature representations. To fully utilize multi-omics data, Tong et al. [53] designed a concatenation autoencoder that concatenates the hidden representations learned from each data type. In addition, to obtain a consensus representation across multi-omics data, they designed a cross-modality autoencoder to maximize the agreement across modalities. Cheerla et al. [54] presented an unsupervised encoder extension of the Cox model to integrate multi-type data into one single feature matrix, which introduces a similarity loss to force four data sources to align their common information. To eliminate the estimation bias in processing datasets with a large number of censored samples, Zhang et al. [55] introduced Bayesian perturbation to approximate the prior knowledge of censored samples and thereby optimize the training process of the model. To address the limitation that deep networks tend to overfit when the sample size is small and the feature dimension is high, Qiu et al. [56] presented a meta-learning approach based on neural networks for cancer survival prediction. In addition, Kvamme et al. [57] imposed \(L_{1}\) and \(L_{2}\) regularization terms on the network parameters to reduce overfitting. However, these methods mainly exploit feature information and fail to exploit useful structure information.
Similarity matrices construction works
Similarity matrix construction has been widely used in multi-view clustering tasks. Usually, existing methods construct a similarity matrix for each data type and then learn a shared similarity matrix for all data types. For example, Zhan et al. [58] learned the consensus similarity graph by minimizing the disagreement between different views with a disagreement cost function. To address the limitation that incomplete multi-view clustering fails to exploit the hidden information of missing views and to handle the information imbalance across different views, Wen et al. [59] designed adaptive weights to balance the importance of different views. Wang et al. [60] designed a multi-view subspace clustering approach, which adopts the Hilbert-Schmidt Independence Criterion to enforce maximum dependence between the similarity matrices. Chen et al. [61] designed a nonlinear method for multi-view clustering, which jointly learns a kernel representation matrix and a similarity matrix. Zhang et al. [62] presented an anchor-based approach for multi-view semi-supervised learning, which constructs the affinity graphs with an anchor-based strategy and obtains the optimal consensus graph by using feature and label information. Considering that original multi-view data often contain abundant noise and outliers, Xie et al. [63] learned a latent feature representation based on an adaptively learned graph and introduced Laplacian embedding to maintain the local manifold structure. Zhang et al. [64] constructed a unified similarity matrix for multiple views by utilizing a latent representation explored from the underlying complementary information. Huang et al. [65] integrated similarity learning and local embedding into a unified framework, which constructs a fused similarity matrix and learns a latent low-dimensional representation for capturing the underlying structure. To preserve global structures and obtain local structures, Wan et al. [66] proposed an embedding method for multi-view clustering, which integrates all views into a combination weight matrix for maintaining global structures and imposes a constraint on the learned shared affinity matrix for obtaining the local structure.
Proposed method
In this paper, we propose a Multi-type Data Joint Learning (MDJL) approach for cancer survival prediction based on multi-type genome-wide data. First, instead of exploiting common feature information shared by all data types, we exploit correlation (common) feature information between any two data types to explore diverse and complementary feature information across multiple data types. Second, we fully utilize the global and local structure to construct similarity matrices based on the learned multiple correlation representations. Here, global structure refers to the similar structure information across different data types, and local structure refers to the neighborhood information within data types. The main architecture of our MDJL approach is illustrated in Fig. 1. MDJL consists of four components: (1) the correlation representations extraction component, which utilizes diverse and complementary feature information across multiple data types by learning correlation representations between any two data types; (2) the discriminant representations generation component, which fuses multiple correlation representations by concatenation; (3) the similarity matrices construction component, which generates the sample similarity matrix by fully utilizing both global and local structure across different data types; and (4) the graph convolutional network construction component, which predicts the survival risk of patients. Key notations used in this paper are listed in Table 1.
Correlation representations extraction
Suppose there are N samples and V different data types. Let \({\textbf {x}}^{v}=\left\{ x_{i}^{v}\in \mathbb {R}^{d_{v}} \right\} _{i=1}^{N}\) be the sample set of the vth data type, where \(x_{i}^{v}\) represents the ith sample of data type v, \(d_v\) is the feature dimensionality of \({\textbf {x}}^{v}\), and \(v=1,2,\ldots ,V\). For correlation representation extraction, we first define V neural networks \(\left\{ f_{v}\right\} _{v=1}^{V}\) to conduct feature learning and project \({\textbf {x}}^{v}\) from space \(\mathbb {R}^{d_{v}}\) into space \(\mathbb {R}^{d}\), that is,
\[ {\textbf {y}}^{v}=f_{v}\left( {\textbf {x}}^{v}\right) , \qquad (1) \]
where \({\textbf {y}}^{v}\in \mathbb {R}^{d\times N}\), and \(f_{v}\) denotes a neural network with \(L=3\) layers,
\[ {\textbf {h}}_{f_{v}}^{l}=\sigma \left( {\textbf {w}}_{f_{v}}^{l}{\textbf {h}}_{f_{v}}^{l-1}+{\textbf {b}}_{f_{v}}^{l}\right) . \qquad (2) \]
For the lth layer \(\left( l = 1,2,\ldots ,L\right)\), \({\textbf {w}}_{f_{v}}^{l}\in \mathbb {R}^{m_{l}\times m_{l-1}}\) denotes the weight matrix \(\left( m_{0} = d_{v}, m_{L}=d\right)\), \({\textbf {b}}_{f_{v}}^{l}\in \mathbb {R}^{m_{l}}\) is the bias vector, \({\textbf {h}}_{f_{v}}^{l}\in \mathbb {R}^{m_{l}}\) denotes the output of the lth layer \(\left( {\textbf {h}}_{f_{v}}^{0}={\textbf {x}}^{v}, {\textbf {h}}_{f_{v}}^{L}={\textbf {y}}^{v}\right)\), and \(\sigma\) is the activation function.
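The layer-wise mapping described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `init_mlp` and `forward_mlp`, the ReLU activation, the random initialization and the output size d = 64 are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Create (weight, bias) pairs; the l-th weight has shape (m_l, m_{l-1})."""
    return [
        (rng.normal(0.0, 0.01, size=(m_out, m_in)), np.zeros((m_out, 1)))
        for m_in, m_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward_mlp(params, x):
    """Apply h^l = sigma(w^l h^{l-1} + b^l) layer by layer.

    x has shape (d_v, N), one column per sample.
    ReLU is an assumed choice for the activation sigma.
    """
    h = x
    for w, b in params:
        h = np.maximum(w @ h + b, 0.0)
    return h

rng = np.random.default_rng(0)
x_v = rng.normal(size=(1000, 8))            # toy data type: d_v = 1000, N = 8
f_v = init_mlp([1000, 512, 128, 64], rng)   # L = 3 layers, d = 64 (assumed)
y_v = forward_mlp(f_v, x_v)                 # projected representation, shape (d, N)
```

One such network would be created per data type, so that all V data types end up in the same d-dimensional space before pairwise correlations are computed.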
To further explore the correlation representations between any two data types, we borrow the correlation computation proposed in [67]. Following [67], for the ith sample, the interactive map \(\chi _{i}^{v,u}\) of \(y_{i}^{v}\) and \(y_{i}^{u}\) can be defined as,
\[ \chi _{i}^{v,u}=y_{i}^{v}\otimes y_{i}^{u}, \qquad (3) \]
where \(v\ne u\), \(\otimes\) is outer product, \(\chi ^{v,u}=\left\{ \chi _{i}^{v,u}\in \mathbb {R}^{d\times d} \right\} _{i=1}^{N}\), \(\chi ^{v,u} = \chi ^{u,v}\).
Based on the interactive map set, we further construct a set of neural networks \(\psi =\left\{ \psi _{v,u}\right\} _{v,u=\left\{ 1,\ldots ,V\right\} ,v\ne u}\) to project each \(\chi ^{v,u}\) from space \(\mathbb {R}^{d\times d}\) into an embedded space \(\mathbb {R}^{d}\), which learns deep correlation representations between any two data types. That is,
where \({\textbf {y}}^{v,u}\in \mathbb {R}^{d\times N}\) is the correlation representation of \({\textbf {x}}^{v}\) and \({\textbf {x}}^{u}\), \({\textbf {w}}_{\psi _{v,u}}\in \mathbb {R}^{d\times d^{2}}\), \({\textbf {b}}_{\psi _{v,u}}\in \mathbb {R}^{d}\), \(\text {vec}\left( \cdot \right)\) represents the vectorization of a matrix.
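As a rough illustration of the two steps above, the sketch below forms the per-sample outer products and then maps each vectorized map back to d dimensions. The single affine layer used for the projection is an assumption (the exact form of \(\psi _{v,u}\) may include an activation), and the function names are hypothetical.

```python
import numpy as np

def interactive_maps(y_v, y_u):
    """Per-sample outer products; inputs have shape (d, N), output (N, d, d)."""
    return np.einsum("in,jn->nij", y_v, y_u)

def correlation_representation(chi, w, b):
    """Project vec(chi_i) to d dimensions with one affine layer (assumed form)."""
    n, d, _ = chi.shape
    vec = chi.reshape(n, d * d).T    # (d*d, N), row-major vectorization
    return w @ vec + b               # (d, N)

rng = np.random.default_rng(0)
d, n = 4, 5
y_v = rng.normal(size=(d, n))
y_u = rng.normal(size=(d, n))
chi = interactive_maps(y_v, y_u)
w = rng.normal(size=(d, d * d))      # plays the role of the d x d^2 weight matrix
b = np.zeros((d, 1))
y_vu = correlation_representation(chi, w, b)
```

The outer product makes every feature of one data type interact with every feature of the other, which is why the projection weight has \(d^{2}\) input columns.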
Discriminant representations generation
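Based on the above subsections, we have learned multiple correlation representations from multiple data types. The final fused correlation feature representation from all pairwise data types can be written as,
\[ {\textbf {y}}=\left[ {\textbf {y}}^{1,2};{\textbf {y}}^{1,3};\ldots ;{\textbf {y}}^{V-1,V}\right] . \qquad (5) \]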
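A minimal sketch of the fusion by concatenation, assuming one representation per unordered pair of data types and stacking along the feature axis. The placeholder matrices and the dictionary `pair_reps` are illustrative only.

```python
from itertools import combinations
import numpy as np

V, d, n = 3, 4, 5
rng = np.random.default_rng(0)

# Placeholder pairwise representations, one (d, N) matrix per unordered pair (v, u).
pair_reps = {
    (v, u): rng.normal(size=(d, n))
    for v, u in combinations(range(1, V + 1), 2)
}

# Stack all pairwise representations along the feature axis.
y = np.concatenate([pair_reps[p] for p in sorted(pair_reps)], axis=0)
```

With three data types this yields three pairs, so the fused matrix has 3d rows and one column per sample.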
Similarity learning of global and local structure
As mentioned above, MDJL aims to learn a fused similarity matrix based on multi-type data. Similarity matrices constructed from raw data may be severely corrupted by noise and outliers. To improve robustness to noise and outliers, we construct similarity matrices based on the learned multiple correlation representations. By correlation information learning, we collect M different correlation feature representations \(\left\{ {\textbf {o}}^{m}={\textbf {y}}^{v,u}\in \mathbb {R}^{d\times N}\right\} _{m=1}^{M}\), where \(M=V\left( V-1 \right) /2\). Based on the multiple correlation representations, similarity learning of global and local structure aims to obtain a fused similarity matrix that preserves sufficient local structure information of the samples while maintaining the global structure across different data types. First, we construct the similarity matrix \({\textbf {W}}^{m}=\left[ W^{m}(i,j) \right] _{N\times N}\) for the mth correlation representation \({\textbf {o}}^{m}\) with a Gaussian kernel. \(W^{m}(i,j)\) represents the similarity between samples \(x_{i}^{m}\) and \(x_{j}^{m}\) in the mth correlation representation. To integrate the similarity matrices constructed from multiple correlation representations, we introduce a normalized weight matrix \(P^{m}\) as follows:
where \(\sum _{j=1}^{N}P^{m}\left( i,j \right) =1\).
In order to measure local similarity, we design a sparse kernel based on K nearest neighbors (KNN), that is:
where \(N_{i}^{m}\) is the set of neighbors of \(y_{i}^{m}\). This operation sets the similarities between non-neighboring samples to zero, based on the pairwise sample similarity values.
To obtain the fused similarity matrix, we iteratively update \(P^{m}\) with its corresponding local similarity matrix \(S^{m}\) and the similarity matrices \(\left\{ P^{u}\right\} _{u=\left\{ 1,\ldots ,M \right\} \setminus m}\) of the other representations, so that the updated \(\left\{ P^{m}\right\} _{m=1}^{M}\) become more similar to each other while local similarity information is preserved.
For the mth correlation representation, we iteratively update \(P^{m}\) as follows:
After T iterations, the learned \(\left\{ P^{m}\right\} _{m=1}^{M}\) are sufficiently similar to each other. The fused similarity matrix is then defined as the average of \(\left\{ P^{m}\right\} _{m=1}^{M}\), that is:
\[ P=\frac{1}{M}\sum _{m=1}^{M}P^{m}. \qquad (9) \]
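The whole similarity-learning procedure can be sketched as follows. Several details are assumptions made for illustration only: the Gaussian bandwidth `sigma`, plain row normalization for both \(P^{m}\) and \(S^{m}\), and an update of the form \(S^{m}\cdot \text {mean}_{u\ne m}(P^{u})\cdot (S^{m})^{\top }\) followed by renormalization, in the style of similarity network fusion. All function names are hypothetical.

```python
import numpy as np

def gaussian_similarity(o, sigma=1.0):
    """o: (d, N). W[i, j] = exp(-||o_i - o_j||^2 / (2 sigma^2))."""
    sq = np.sum(o ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * o.T @ o, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def row_normalize(w):
    """Scale each row to sum to 1 (assumed normalization)."""
    return w / w.sum(axis=1, keepdims=True)

def knn_kernel(w, k):
    """Keep each sample's k most similar other samples, zero the rest."""
    w_off = w.copy()
    np.fill_diagonal(w_off, -np.inf)            # a sample is not its own neighbor
    idx = np.argsort(-w_off, axis=1)[:, :k]     # indices of the k largest similarities
    s = np.zeros_like(w)
    rows = np.arange(w.shape[0])[:, None]
    s[rows, idx] = w[rows, idx]
    return row_normalize(s)

def fuse(reps, k=3, iters=30):
    """Fuse M >= 2 representations into one (N, N) similarity matrix."""
    ws = [gaussian_similarity(o) for o in reps]
    ps = [row_normalize(w) for w in ws]
    ss = [knn_kernel(w, k) for w in ws]
    m = len(ps)
    for _ in range(iters):
        new_ps = []
        for i in range(m):
            others = sum(ps[j] for j in range(m) if j != i) / (m - 1)
            new_ps.append(row_normalize(ss[i] @ others @ ss[i].T))
        ps = new_ps
    return sum(ps) / m                           # average over all representations

rng = np.random.default_rng(0)
reps = [rng.normal(size=(4, 10)) for _ in range(3)]   # M = 3 toy representations
p_fused = fuse(reps, k=3, iters=5)
```

Each update diffuses the average similarity of the other representations through the sparse local graph of the current one, which is how local neighborhoods are preserved while the matrices are pulled toward agreement.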
Graph convolutional network
From correlation representation learning, we obtain the fused discriminant representation matrix \({\textbf {y}}\). From similarity matrix construction, we obtain the fused similarity matrix P. Then \({\textbf {y}}\) and P are used as the input of a graph convolutional network for model training and prediction. In this paper, we construct the graph convolutional network \(G = f({\textbf {y}},P)\) with three layers for training and prediction, that is,
\[ H_{g}^{l+1}=\sigma \left( \tilde{D}^{-\frac{1}{2}}\tilde{P}\tilde{D}^{-\frac{1}{2}}H_{g}^{l}W_{g}^{l}\right) , \qquad (10) \]
where \(\tilde{P}=P+I_{N}\) denotes the adjacency matrix of the undirected graph G with added self-connections, \(I_{N}\) is the identity matrix, \(\tilde{D}_{(i,i)}= \sum _{j}\tilde{P}_{(i,j)}\), \(W_{g}^{l}\) is the trainable weight matrix of the lth layer, \(H_{g}^{l}\) is the matrix of activations in the lth layer (\(H_{g}^{0}={\textbf {y}}\)), and \(\sigma\) is the activation function.
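A forward pass through such a network can be sketched as below, assuming the usual symmetric normalization of the adjacency matrix with self-connections, ReLU on hidden layers and a linear output layer. The function names, layer sizes and toy inputs are hypothetical; note that the feature matrix is passed with one row per sample.

```python
import numpy as np

def normalize_adjacency(p):
    """Return D^{-1/2} (P + I) D^{-1/2}, with D the row sums of P + I."""
    p_tilde = p + np.eye(p.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(p_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * p_tilde * d_inv_sqrt[None, :]

def gcn_forward(p, h0, weights):
    """h0: (N, features). Hidden layers use ReLU (assumed); the last is linear."""
    a_hat = normalize_adjacency(p)
    h = h0
    for i, w in enumerate(weights):
        h = a_hat @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
p = rng.random((6, 6))
p = (p + p.T) / 2.0                       # toy symmetric similarity matrix
h0 = rng.normal(size=(6, 12))             # toy fused features, one row per sample
weights = [rng.normal(size=(12, 32)), rng.normal(size=(32, 1))]
risk = gcn_forward(p, h0, weights)        # one scalar output per sample
```

Because every layer multiplies by the normalized similarity matrix, each patient's output depends on the features of similar patients, which is how the structure information enters the prediction.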
To describe the effect of quantitative variables on survival time, we introduce the Cox loss as the loss function [25], that is,
\[ L\left( \beta \right) =-\sum _{i:C(i)=1}\left[ \phi _{i}-\log \left( \sum _{t_{j}\geqslant t_{i}}e^{\phi _{j}}\right) \right] , \qquad (11) \]
where \(\phi _{i}\) denotes the log hazard ratio for sample i, \(z_{i}\) denotes the vector learned by the graph convolutional network, and \(\beta\) is the coefficient weight vector between \(z_{i}\) and the output \(\phi _{i}\). C(i) is the censorship flag: \(C(i)=1\) if sample i is uncensored and \(C(i)=0\) if sample i is censored. \(t_{i}\) is the survival time of patient i, who must be an uncensored sample. \(t_{j}\geqslant t_{i}\) indicates that the survival time of the jth sample is at least that of the ith sample, where patient j can be either an uncensored or a censored sample.
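For concreteness, the loss can be computed directly from its definition, assuming it is the negative Cox partial log-likelihood summed over uncensored samples. The function name `cox_loss` and the toy inputs are illustrative.

```python
import math

def cox_loss(phi, time, event):
    """Negative Cox partial log-likelihood.

    phi:   predicted log hazard ratios
    time:  observed survival or censoring times
    event: 1 if the sample is uncensored, 0 if censored
    """
    loss = 0.0
    for i, (t_i, c_i) in enumerate(zip(time, event)):
        if c_i != 1:
            continue                      # censored samples only enter risk sets
        # Risk set: every sample whose time is at least t_i.
        log_risk = math.log(sum(math.exp(p) for p, t in zip(phi, time) if t >= t_i))
        loss -= phi[i] - log_risk
    return loss

toy_loss = cox_loss(phi=[0.0, 0.0, 0.0], time=[1.0, 2.0, 3.0], event=[1, 1, 0])
```

In the toy call all risks are equal, so the two uncensored samples contribute log 3 and log 2 respectively; the censored third sample contributes no term of its own but still appears in both risk sets.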
Optimization
Feedforward and calculate the loss
For each of the V data types, the sample set \({\textbf {x}}^{v}\) is fed forward through MDJL as in Eq. 1, and the output of MDJL is denoted as \(\left\{ z_{i} \right\} _{i=1}^{N}\). The loss of the whole network is calculated as in Eq. 11, that is, \(L\left( \beta \right) =-\sum _{i:C(i)=1}\left[ \phi _{i}-\log \left( \sum _{t_{j}\geqslant t_{i} } e^{\phi _{j} }\right) \right]\).
Update neural networks
The network parameters of \(\left\{ \left\{ f_{v} \right\} _{v=1}^{V},\left\{ \psi _{v,u}\right\} _{v,u=\left\{ 1,\ldots ,V\right\} ,v\ne u},G \right\}\) are jointly optimized by minimizing Eq. 11. We perform batch gradient descent with the whole dataset in each iteration for network training.
Algorithm 1 Algorithm for MDJL

Input: sample set \(\left\{ {\textbf {x}}^{v}\in \mathbb {R}^{d_{v}\times N}\right\} _{v=1}^{V}\), sample survival time set, sample survival status set.
Initialize: hyperparameters K, T.
Update until convergence:
Forward propagation:
1. Perform \(f_{v}\) with Eq. 1 and then obtain \({\textbf {y}}^{v}\).
2. Compute interactive map \(\chi ^{v,u}\) with Eq. 3.
3. Obtain correlation representations \({\textbf {y}}^{v,u}\) with Eq. 4.
4. Obtain fused correlation representations with Eq. 5.
5. Construct normalized weight matrices \(\left\{ P^{m}\right\} _{m=1}^{M}\) with Eq. 6.
6. Construct sparse kernel matrices \(\left\{ S^{m}\right\} _{m=1}^{M}\) with Eq. 7.
7. Iteratively update \(\left\{ P^{m}\right\} _{m=1}^{M}\) with Eq. 8.
8. Obtain fused similarity matrix P with Eq. 9.
9. Construct graph convolutional network G with Eq. 10.
Back propagation:
Update network parameters of \(\left\{ f_{v}\right\} _{v=1}^{V}\), \(\left\{ \psi _{v,u} \right\} _{v,u=\left\{ 1,2,\ldots ,V\right\} ,v\ne u}\) and G by minimizing Eq. 11.
Output: The predicted hazard ratios of testing samples.
Algorithm 1 describes the process of cancer survival prediction by using MDJL.
Experiments
Datasets
Four cancer datasets, glioblastoma multiforme (GBM), kidney renal clear cell carcinoma (KRCCC), lung squamous cell carcinoma (LSCC) and breast invasive carcinoma (BIC), are used to evaluate our MDJL approach. For each dataset, we collect three types of genomic data: DNA methylation, mRNA expression and miRNA expression. The datasets are obtained from http://compbio.cs.toronto.edu/SNF/, where they are provided and preprocessed by the authors of [68]. That work downloads the data from The Cancer Genome Atlas (TCGA) website and performs three preprocessing steps: sample selection, missing-data imputation and normalization. In detail: (i) if a patient sample has more than 20% missing data in any data type, the sample is removed; (ii) if a gene has more than 20% missing values, the gene is filtered out; otherwise, k-nearest-neighbor imputation is used to fill in its missing values; (iii) the z-score transformation is used to normalize the data. Table 2 summarizes the datasets used in the experiments. Figure 2 shows the survival time distribution of each cancer as a box plot.
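The three preprocessing steps can be sketched for a single data type as follows. This is an approximation rather than the pipeline of [68]: column-mean imputation is swapped in for the k-nearest-neighbor imputation to keep the example short, and the function name `preprocess` and the toy matrix are hypothetical.

```python
import numpy as np

def preprocess(x, max_missing=0.2):
    """x: (samples, features) array with np.nan marking missing values."""
    # (i) Drop samples with more than 20% missing values.
    x = x[np.isnan(x).mean(axis=1) <= max_missing]
    # (ii) Drop features with more than 20% missing values, then impute the rest.
    x = x[:, np.isnan(x).mean(axis=0) <= max_missing]
    col_mean = np.nanmean(x, axis=0)       # mean imputation stands in for kNN here
    x = np.where(np.isnan(x), col_mean, x)
    # (iii) z-score each feature; constant features are left at zero.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std

x = np.arange(100, dtype=float).reshape(10, 10)
x[0, :5] = np.nan      # sample 0: 50% missing, removed
x[1:5, 9] = np.nan     # feature 9: 4 of the 9 remaining samples missing, removed
x[6, 2] = np.nan       # isolated gap, imputed
out = preprocess(x)
```

With several data types, step (i) would be applied jointly, removing a patient from every data type if any one of them exceeds the threshold.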
Experimental settings
Compared methods
To evaluate the performance of our MDJL approach, we compare it with several state-of-the-art cancer survival prediction methods:

MKL + Cox loss (MKLCox). MKL is a multiple kernel learning based binary classification method for cancer survival prediction, which fuses multi-type data using the joint strategy [21]. For a fair comparison, we extend MKL with Cox loss.

MDNNMD + Cox loss (MDNNMDCox). MDNNMD is a multimodal deep neural network based binary classification method for cancer survival prediction, which fuses multi-type data using the joint strategy [37]. For a fair comparison, we extend MDNNMD with Cox loss.

DLMR. DLMR is a multimodal deep neural network extension of the Cox model for cancer survival prediction, which fuses multi-type data using the alignment strategy [54].

CrossAE. CrossAE is a cross-modality autoencoder based survival prediction method that utilizes the consensus representations across multi-type data [53].

VAECox. VAECox is a deep transfer learning architecture for cancer survival prediction based on the alignment strategy [25].

DeepSurv. DeepSurv is a deep learning generalization of the Cox proportional hazards model, which predicts survival risks based on single-type data [51]. For comparison, we use the unified feature matrix concatenated from DNA methylation, mRNA and miRNA data as the input for DeepSurv.
The implementations of MDNNMDCox, DLMR, CrossAE, VAECox and DeepSurv are downloaded from the websites provided by their authors. Since there is no public code for MKLCox, we implement it ourselves.
Implementation details
All these methods are evaluated on the GBM, KRCCC, LSCC and BIC datasets. For each cancer dataset, we randomly select 70% of the data for training and use the remaining 30% for testing. The details of the network architecture for MDJL are as follows. For feature learning, we design the networks \(\left\{ f_{v}\right\} _{v=1}^{V}\) with second and third layers of size 512 and 128. For prediction, we construct a three-layer graph convolutional network whose hidden layer contains 32 nodes. We adopt the Adam optimizer and set the learning rate to 0.0001. In addition, we set the hyperparameters K = 20 and T = 30 in the similarity matrix fusion algorithm. The concordance index (C-index) is adopted to evaluate the performance of the competing survival prediction models; it measures the proportion of all comparable sample pairs for which the predicted and actual orderings are consistent. To ensure a fair and robust comparison, we conduct 20 trials for each compared method on each dataset and report the average performance over the 20 trials. For each trial, we resplit the data into training and testing sets (70% for training and 30% for testing) and refit the models. The corresponding Python code for carrying out our method is available at https://github.com/githyr/MDJL_Survival.
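The evaluation metric can be illustrated with a short reference implementation. It assumes the common pairwise definition in which a pair is comparable when the sample with the shorter time is uncensored, and ties in predicted risk count as half; the function name `concordance_index` is illustrative and the quadratic loop is written for clarity, not speed.

```python
def concordance_index(risk, time, event):
    """Fraction of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when sample i is uncensored and time[i] < time[j];
    it is concordant when the earlier failure has the higher predicted risk.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable   # undefined if there are no comparable pairs

c_perfect = concordance_index([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [1, 1, 1])
c_reversed = concordance_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1, 1, 1])
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0 means every pair is ranked in reverse.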
Experimental results
The predictive results of all competing methods are reported in Fig. 3, from which we can observe that our MDJL approach outperforms the other competing methods on all four cancer types in terms of the average concordance index (C-index). Compared with the second-best method, our approach improves the average prediction performance by 4.40%, 6.30%, 6.90% and 7.2% on the GBM, KRCCC, LSCC and BIC datasets, respectively. The reasons are twofold. First, our approach exploits correlation information between any two data types, which allows it to learn more useful information and reduce noise more thoroughly than joint-based and alignment-based methods. Second, we further explore structure information, which helps learn effective feature representations with a small sample size.
We further investigate our MDJL approach with survival analysis, a statistical method that considers both outcomes and survival time. The patient samples of each cancer type are divided into high-risk and low-risk groups based on their predicted hazard ratios: a patient is assigned to the high-risk group if their hazard ratio is higher than the median hazard ratio of all patient samples, and to the low-risk group otherwise. We illustrate the Kaplan-Meier (KM) curves in Fig. 4, which reflect the survival condition of each group. Each survival curve is a step function, with each step corresponding to a time point of death and each mark indicating a censored sample, and P-values are computed from the curves. From the figure, we can observe that the survival probability of each group gradually drops as survival time increases, and the P-values for GBM, KRCCC, LSCC and BIC are \(3.00\times 10^{-5}\), 0.02, 0.03 and \(4.91\times 10^{-4}\), respectively, which are all smaller than 0.05. From the KM curves and the P-values, we can conclude that our approach achieves convincing results in assigning a patient sample to the high-risk or low-risk group.
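The grouping and the survival curves can be sketched in plain Python, assuming the standard Kaplan-Meier product-limit estimator. The function names and toy values are illustrative, and the significance test that produces the P-values is not reproduced here.

```python
from statistics import median

def split_by_median(hazard):
    """Label each sample 'high' if its predicted hazard exceeds the median, else 'low'."""
    cutoff = median(hazard)
    return ["high" if h > cutoff else "low" for h in hazard]

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each distinct time at which a death occurs."""
    curve, surv = [], 1.0
    for t in sorted({ti for ti, ci in zip(time, event) if ci == 1}):
        at_risk = sum(1 for ti in time if ti >= t)
        deaths = sum(1 for ti, ci in zip(time, event) if ti == t and ci == 1)
        surv *= 1.0 - deaths / at_risk     # censored samples only shrink the risk set
        curve.append((t, surv))
    return curve

groups = split_by_median([0.1, 0.9, 0.5, 0.7])
curve = kaplan_meier(time=[1, 2, 3, 4], event=[1, 0, 1, 1])
```

In practice the estimator would be applied separately to the high-risk and low-risk groups, giving the two curves that are compared in each panel.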
Further investigation
Effectiveness of correlation representation extraction
In this section, we verify the effectiveness of correlation representation extraction. We integrate multiple data types to learn discriminant features by exploiting correlation information between any two data types, instead of exploiting common information shared by all data types or directly concatenating the original data types. We refer to the variant that exploits common information shared by all data types as CIAD, and to the variant that directly concatenates the original data types as COMD. For CIAD, we learn a shared feature matrix by constructing a feature learning network for each data type and imposing a Euclidean distance constraint between the learned feature representations of any two data types, and we construct similarity matrices from the original data types. For COMD, we concatenate the original data types into a unified feature matrix and likewise construct similarity matrices from the original data types.
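To make the two ablation variants concrete, the sketch below shows one plausible reading of each: COMD-style column-wise concatenation, and a CIAD-style penalty taken as the mean squared Euclidean distance between the representations of every pair of data types. The exact form of the constraint is an assumption, and the random linear encoders, dimensions and function names are hypothetical stand-ins for the feature learning networks.

```python
from itertools import combinations

import numpy as np

def concatenate_data_types(data_types):
    """COMD-style input: stack the feature columns of all data types per sample."""
    return np.concatenate(data_types, axis=1)

def pairwise_distance_penalty(representations):
    """CIAD-style constraint (assumed form): mean squared Euclidean distance
    between the learned representations of every pair of data types."""
    pairs = list(combinations(representations, 2))
    total = sum(np.sum((a - b) ** 2, axis=1).mean() for a, b in pairs)
    return total / len(pairs)

rng = np.random.default_rng(0)
n_samples = 5
# Three made-up data types with different feature dimensions.
raw = [rng.normal(size=(n_samples, d)) for d in (4, 6, 3)]
unified = concatenate_data_types(raw)

# Hypothetical per-type encoders: random linear maps into a shared 2-d space.
encoders = [rng.normal(size=(x.shape[1], 2)) for x in raw]
reps = [x @ w for x, w in zip(raw, encoders)]
penalty = pairwise_distance_penalty(reps)
```

Minimizing such a penalty pulls the per-type representations toward a common embedding, whereas concatenation leaves each data type's features untouched.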
We run MDJL, CIAD and COMD on each cancer dataset for 20 trials and record the C-index score of each trial. For each trial, we resplit the data into training and testing sets, with 70% of the data for training and 30% for testing, and refit the models. Figure 5 shows the C-index scores over the 20 trials as box plots. From the figure, we can observe that our approach outperforms the other two variants on all four cancer types. In summary, learning discriminant feature representations by exploiting correlation information between any two data types achieves better performance than exploiting common information shared by all data types or directly concatenating the original data types.
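The repeated resplitting protocol can be sketched as below. The sketch assumes a simple random permutation split. The callback `fit_and_score` is a hypothetical placeholder for "refit the model on the training indices and return the test C-index", and the stub used here returns the test fraction only so that the example runs without a model.

```python
import numpy as np

def repeated_holdout(n_samples, fit_and_score, n_trials=20, train_frac=0.7, seed=0):
    """Run n_trials random train/test resplits and collect one score per trial."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_samples))
    scores = []
    for _ in range(n_trials):
        perm = rng.permutation(n_samples)          # fresh split for every trial
        train_idx, test_idx = perm[:n_train], perm[n_train:]
        scores.append(fit_and_score(train_idx, test_idx))
    return np.array(scores)

def stub_fit_and_score(train_idx, test_idx):
    """Placeholder: checks that the split is disjoint, returns the test fraction."""
    assert len(np.intersect1d(train_idx, test_idx)) == 0
    return len(test_idx) / (len(train_idx) + len(test_idx))

scores = repeated_holdout(100, stub_fit_and_score)
# Five-number summary of the kind a box plot displays.
summary = np.percentile(scores, [0, 25, 50, 75, 100])
```

Replacing the stub with a function that trains a model and evaluates its C-index on the held-out indices yields the 20 scores per method that the box plots summarize.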
Effectiveness of learning structure information
In this section, we verify the effectiveness of learning structure information based on correlation representations. We compare three models: the model that learns structure information from correlation representations (MDJL), the model that learns structure information from the original data, and the model that does not learn structure information. We refer to the variant that uses the original multi-type data to construct similarity matrices as MDJLOS, and to the variant of MDJL without structure learning as MDJLSI. For MDJLOS, we use the original multi-type data to construct similarity matrices and learn discriminant feature representations from correlation information between any two data types. For MDJLSI, we learn discriminant feature representations from correlation information between any two data types and replace the graph convolutional network with a three-layer fully connected network.
We run MDJL, MDJLOS and MDJLSI on each cancer dataset for 20 trials and record the C-index score of each trial. For each trial, we resplit the data into training and testing sets, with 70% of the data for training and 30% for testing, and refit the models. Figure 6 reports the C-index scores over the 20 trials as box plots, from which we can see that: (1) MDJL performs better than MDJLOS and MDJLSI; (2) MDJLOS performs better than MDJLSI. These results confirm that: (1) compared with only utilizing feature information, jointly learning structure information and feature information achieves better performance; (2) compared with constructing similarity matrices from the original data, constructing them from the learned correlation features achieves better performance.
To further investigate the effectiveness of the fused similarity matrices learned from multiple correlation representations, we show the fused similarity matrices of the training sets on the four cancer datasets in Fig. 7. From the figure, we can observe that the outlines of the similarity matrices learned from multiple correlation representations are more distinct than those learned from the original data types on all four cancer datasets. The reason is that the original data are less suitable for estimating similarity matrices.
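The difference between a graph convolutional layer and the fully connected layer that replaces it in MDJLSI can be illustrated with a single layer. The sketch assumes a standard symmetric normalization of the similarity matrix followed by ReLU, which may differ from the propagation rule used in MDJL. All matrices and function names below are made up for illustration.

```python
import numpy as np

def normalize_similarity(S):
    """Symmetric normalization D^{-1/2} S D^{-1/2} (assumed propagation rule)."""
    S = np.asarray(S, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(S_norm, H, W):
    """Graph convolution: mix features across similar samples, then transform."""
    return relu(S_norm @ H @ W)

def fc_layer(H, W):
    """Fully connected layer: each sample is transformed independently."""
    return relu(H @ W)

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # 4 samples, 3 input features
W = rng.normal(size=(3, 2))   # layer weights
# Toy similarity matrix with two groups of mutually similar samples.
S = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

S_norm = normalize_similarity(S)
out_gcn = gcn_layer(S_norm, H, W)
out_fc = fc_layer(H, W)
out_identity = gcn_layer(normalize_similarity(np.eye(4)), H, W)
```

With an identity similarity matrix the graph layer reduces to the fully connected layer, so any gap between the two variants can be attributed to the structure encoded in the similarity matrix.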
Parameter analysis
In this section, we investigate the sensitivity of MDJL to the hyperparameters K and T by fixing one hyperparameter and varying the other. When K is evaluated, we set T to 50. When T is evaluated, we set K to 20. We repeat each execution 20 times and record the average C-index. For each trial, we resplit the data into training and testing sets, with 70% of the data for training and 30% for testing, and refit the models. Figure 8 shows the C-index of our MDJL approach versus different values of K and T on GBM and KRCCC. From the figure, we can observe that the C-index of MDJL on the GBM and KRCCC datasets fluctuates within a small range (< 0.2). In general, the proposed approach is insensitive to K in the range from 5 to 50 and to T in the range from 10 to 100.
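The one-at-a-time sweep can be sketched as follows. The grid spacing (steps of 5 for K and 10 for T) is an assumption, as only the ranges are given, and `placeholder_score` is a made-up function standing in for "average C-index over 20 resplits"; its values are not results.

```python
def one_at_a_time_sweep(evaluate, k_values, t_values, k_fixed=20, t_fixed=50):
    """Vary K with T fixed, then vary T with K fixed, as in the text."""
    k_curve = {k: evaluate(k, t_fixed) for k in k_values}
    t_curve = {t: evaluate(k_fixed, t) for t in t_values}
    return k_curve, t_curve

def fluctuation(curve):
    """Range of the metric over the swept values (max minus min)."""
    values = list(curve.values())
    return max(values) - min(values)

def placeholder_score(k, t):
    """Made-up smooth function of (K, T); NOT a real C-index."""
    return 0.70 + 0.0005 * k - 0.0001 * t

k_values = range(5, 55, 5)      # assumed grid: 5, 10, ..., 50
t_values = range(10, 110, 10)   # assumed grid: 10, 20, ..., 100
k_curve, t_curve = one_at_a_time_sweep(placeholder_score, k_values, t_values)
k_range, t_range = fluctuation(k_curve), fluctuation(t_curve)
```

In practice `placeholder_score` would be replaced by a function that trains the model with the given K and T over repeated resplits and returns the mean C-index, and the two fluctuation values would quantify the sensitivity.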
Computing time
In this section, we measure the computing time of MDJL and the other baselines as the model training time for 200 iterations over all the datasets. The computing time of all compared methods is collected on a computer with an Intel i7 quad-core 3.6 GHz CPU, an NVIDIA GTX 1080 Ti GPU, and 16 GB of memory. As seen from Table 3, the computing time of MDJL is acceptable.
Conclusion
In this paper, we propose a novel multi-type data joint learning approach and apply it to the cancer survival prediction task. MDJL integrates correlation representation learning, similarity learning and graph convolutional network construction into a unified framework. Correlation feature representations between any two data types are exploited to learn discriminant feature representations. Global and local structure information among samples is exploited to learn the relationships among samples.
Extensive experiments on four public cancer datasets demonstrate that our approach achieves better performance than other competing cancer survival prediction methods. The experiments also demonstrate the effectiveness of the individual modules of our approach.
Availability of data and materials
The datasets generated and analysed during the current study are available at http://compbio.cs.toronto.edu/SNF/.
Acknowledgements
Not applicable.
Funding
This work was supported by the NSFC Project under Grant Nos. 62176069 and 61933013, the Innovation Group of Guangdong Education Department under Grant No. 2020KCXTD014, and the 2019 Key Discipline project of Guangdong Province.
Author information
Contributions
YH: Conceptualization, Methodology, Writing—Original draft preparation. XYJ: Writing—Reviewing and Editing, Supervision, Data curation. QS: Visualization, Investigation, Software, Validation. All authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Hao, Y., Jing, XY. & Sun, Q. Joint learning sample similarity and correlation representation for cancer survival prediction. BMC Bioinformatics 23, 553 (2022). https://doi.org/10.1186/s12859-022-05110-1
DOI: https://doi.org/10.1186/s12859-022-05110-1