DeepMF: deciphering the latent patterns in omics profiles with a deep learning method

Background With recent advances in high-throughput technologies, matrix factorization techniques are increasingly being used to map quantitative omics profiling matrices into a low-dimensional embedding space, in the hope of uncovering insights into the underlying biological processes. Nevertheless, current matrix factorization tools fall short in handling noisy data and missing entries, both of which are common in real-life data. Results Here, we propose DeepMF, a deep neural network-based factorization model. DeepMF disentangles the association between molecular feature-associated and sample-associated latent matrices, and is tolerant of noisy and missing values. In cancer subtype discovery on mRNA, miRNA, and protein profiles of medulloblastoma, leukemia, breast cancer, and small blue round cell tumor, it achieved the highest clustering accuracy of 76%, 100%, 92%, and 100%, respectively. When analyzing data sets with 70% missing entries, DeepMF gave the best recovery capacity with silhouette values of 0.47, 0.6, 0.28, and 0.44, outperforming other state-of-the-art MF tools on the cancer data sets Medulloblastoma, Leukemia, TCGA BRCA, and SRBCT. Its embedding strength as measured by clustering accuracy is 88%, 100%, 84%, and 96% on these data sets, compared with 76%, 100%, 78%, and 87% for the best existing methods. Conclusion DeepMF demonstrated robust denoising, imputation, and embedding ability. It offers insights into underlying biological processes, for example through cancer subtype discovery. Our implementation of DeepMF can be found at https://github.com/paprikachan/DeepMF.

Introduction

Recent advances in high-throughput technologies have eased the quantitative profiling of biological data and enabled many in silico studies to elucidate complex biological processes [1]. In many cases, the biological data are captured in a matrix with molecular features such as gene, mutation locus, or species as rows and samples or repetitions as columns. Values in the matrices are typically measurements such as expression abundances, mutation levels, or species counts. Based on the assumption that samples with similar phenotype (or molecular features) that participate in a similar biological process will share a similar distribution of biological variation [1], patterns shared by a significant number of entries in these matrices may yield insights into important biological processes. Clustering methods such as k-means and hierarchical clustering have been used to identify these patterns [2,3]. The success of these studies is contingent on the ability of these clustering methods to capture the underlying structures or models of the interaction patterns.

Matrix factorization (MF), as given by the formula $A_{M \times N} \approx U_{M \times K} \times V_{K \times N}$ in Figure 1A, is a popular approach to the problem. Numerous studies have applied MF to identify latent structures ($U$ and $V$) from a given matrix of biological data ($A_{M \times N}$). A good factorization technique would ensure that as much information as possible from $A$ is conserved [1,2]. Here, we hope to uncover $K$ hidden signatures from the complex biological processes. We refer to $U$ as the molecular feature latent matrix, since the values in each column of $U$ are continuous weights illustrating the relative participation of a molecule in each inferred biological process signature. We call $V$ the sample latent matrix, as each row of $V$ depicts the fractions of samples in the matched biological process signature. Molecular feature or sample subgroups can be detected by finding patterns in the molecular feature latent matrix and the sample latent matrix, respectively. MF has been successfully applied to multiple data modalities [1]. For instance, it has been used to detect leukemia subtypes based on expression profiles. By combining gene expression and DNA methylation data, MF has been used to classify HPV subtypes in head and neck tumors [4]. It has also been used to define COSMIC mutational signatures in pan-cancer studies [5][6][7].

MF methods, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NMF), are widely used to extract the low-dimensional latent structure from high-dimensional biological matrices [1]. Intuitively, PCA finds the governing variation in high-dimensional data, securing the most important biological process signatures that differentiate between samples [8]. ICA separates a mixed signal matrix into statistically independent biological process signatures [9]. NMF-based approaches extract latent matrices under non-negativity constraints [10,11].

Despite the effectiveness of MF in interpreting biological matrices, several limitations persist in practice. First, real-world data are often plagued with many types of noise, e.g. systematic noise, batch effects, and random noise [12], which can mask signals in downstream processing. Second, high-throughput omics data frequently suffer from missing values due to various experimental settings [13], whereas the majority of MF tools do not support input matrices with missing values. At present, the standard practice for dealing with these two problems is to perform denoising and imputation prior to MF. However, even when these problems are mitigated, the MF techniques mentioned would still be unable to uncover any non-linear relationship, since they assume a linear association between molecular feature latent variables and sample latent variables.

In this work, we propose a deep neural network-based matrix factorization framework, DeepMF (Figure 1B), which learns the non-linear association between the molecular feature latent matrix and the sample latent matrix and is tolerant of noisy and missing entries. DeepMF demonstrated robust denoising, imputation, and embedding ability in simulated instances. It outperformed the existing MF tools on subtype discovery in omics profiles of medulloblastoma, leukemia, breast cancer, and small blue round cell tumor, with the highest clustering accuracy on all four data sets collected for this work. Furthermore, with 70% of the data randomly removed, DeepMF demonstrated the best recovery capacity with silhouette values of 0.47, 0.6, 0.28, and 0.44. It also displayed the best embedding power on the four data sets, with clustering accuracy of 88%, 100%, 84%, and 96%, respectively, compared with 76%, 100%, 78%, and 87% for the best existing methods.

In this section, we introduce the DeepMF architecture and the loss function used for its training. Unless stated otherwise, symbols in bold font refer to vectors or matrices.

In Figure 1C, assume the input matrix $A$ is of dimension $M \times N$, where $M$ is the number of features and $N$ is the number of samples. A row represents a feature, while a column represents a sample or a replicate. The element $A_{ij}$ refers to the measured value for feature $F_i$ on sample $S_j$. Matrix factorization assumes that the dot product of the feature latent factor $u_i$ and the sample latent factor $v_j$ captures the interaction between feature $F_i$ and sample $S_j$, where $u_i$ and $v_j$ are vectors of size $K$ that encode structures underlying the data; that is, the predicted element for feature $F_i$ on sample $S_j$ is calculated as

$$\hat{A}_{ij} = u_i^{\top} v_j = \sum_{k=1}^{K} u_{ik} v_{kj}.$$

The predicted matrix $\hat{A}$ can be thought of as the product of the feature latent factor matrix $U$ and the sample latent factor matrix $V$,

$$\hat{A} = U V.$$

DeepMF is modeled to learn the complex interactions between the two latent factors $U$ and $V$ with a neural network. Figure 1B illustrates the network architecture of DeepMF. The input layer has $M$ neurons, corresponding to the $M$ features in the matrix. The output layer has $N$ nodes to model the $N$ column samples. DeepMF aims to capture the non-linear interaction between $U$ and $V$. As illustrated in Figure 1B, the network uses $L$ hidden layers of $K$ nodes each. All the nodes in the hidden layers are fully connected and paired with the ReLU activation function. The number of nodes, $K$, corresponds to the dimensionality of the latent space in matrix factorization. The network must be sufficiently complex to approximate $f(U, V)$, a non-linear function for the interaction between feature and sample latent factors.

Training
The matrix $A \in \mathbb{R}^{M \times N}$ contains $M$ features. Each feature $F_i$ corresponds to one input data point $x_i \in \mathbb{R}^{M}$ and one output label $y_i \in \mathbb{R}^{N}$, where $x_i$ is one-hot encoded and $y_i$ is the $i$-th row of matrix $A$.
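
The architecture and input encoding described above can be sketched in plain Python, kept dependency-free for illustration. Several points are assumptions rather than details stated in the text: bias terms are omitted, the input-to-hidden weights are named `U` and the hidden-to-output weights `V`, and the helper names (`build_network`, `forward`) are hypothetical. This is a sketch of the forward pass only, not the authors' implementation.

```python
import random


def relu(x):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in x]


def matvec(x, W):
    """Row vector x (length a) times matrix W (a x b), returning length b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def init_weights(rows, cols, rng):
    """Small random weights; the initialisation scheme is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_network(M, N, K, L, seed=0):
    """M input neurons, L hidden layers of K nodes, N output nodes."""
    rng = random.Random(seed)
    U = init_weights(M, K, rng)                               # input -> first hidden layer
    hidden = [init_weights(K, K, rng) for _ in range(L - 1)]  # hidden -> hidden
    V = init_weights(K, N, rng)                               # last hidden layer -> output
    return U, hidden, V


def forward(i, net, M):
    """Predict row i of A_hat from the one-hot encoding of feature F_i."""
    U, hidden, V = net
    x = [1.0 if r == i else 0.0 for r in range(M)]  # one-hot input x_i
    h = relu(matvec(x, U))                          # selects row i of U, then ReLU
    for W in hidden:
        h = relu(matvec(h, W))
    return matvec(h, V)                             # N predicted values for feature F_i
```

With a one-hot input, the first matrix product simply selects row $i$ of the first weight matrix, which is why that row can be read as a $K$-dimensional latent vector for feature $F_i$ under the naming assumed here.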
The loss function consists of two parts, one for global trends and one for local trends. For a pair of feature $F_i$ and sample $S_j$, global proximity refers to the proximity between the real measurement $A_{ij}$ and the predicted value $\hat{A}_{ij}$. The preservation of global proximity is fundamental in matrix factorization. On the other hand, if two samples possess many common features, they tend to be similar. We refer to this similarity as sample local proximity. We define feature local proximity similarly. By introducing these local proximities into the loss function, we aim to identify and preserve the sample-pairwise and feature-pairwise structures in the low-dimensional latent space.

For global proximity, we minimize the L2-norm of the residual between $A$ and $\hat{A}$. For the local proximities, we use the feature local proximity $S^F_{M \times M}$ and the sample local proximity $S^S_{N \times N}$ as supervised information. They respectively constrain the similarity of the latent representations of features and samples. Given matrix $A_{M \times N}$, the entry $S^F_{kl}$ of the feature similarity matrix is computed from $A_{k\cdot}$ and $A_{l\cdot}$, the $k$-th and $l$-th rows of $A$, and the entry $S^S_{kl}$ of the sample similarity matrix is computed from $A_{\cdot k}$ and $A_{\cdot l}$, the $k$-th and $l$-th columns of $A$.

With $S^F$ and $S^S$, we define $L_{local}$ to preserve the local proximity of the learned latent matrices $U$ and $V$. It is computed over pairs $(U_k, U_l)$ and $(V_k, V_l)$, where $U_k$ and $U_l$ refer to the $k$-th and $l$-th rows of the feature latent matrix $U$, and $V_k$ and $V_l$ refer to the $k$-th and $l$-th columns of the sample latent matrix $V$.

The objective function $L_{local}$ incurs a penalty when similar features and similar samples are embedded far apart in the latent space. As a side effect, two features or samples with low similarity will also be driven nearer in the embedding space. To prevent this, we first identify the remote sample-sample or feature-feature pairs from the feature and sample local proximity matrices by k-means. We then set their local similarity to zero to exclude them from the $L_{local}$ constraints.
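
To make the effect of excluding remote pairs concrete, the sketch below assumes one common form for a local-proximity penalty: the squared distance between two latent vectors, weighted by their similarity. This form is an assumption chosen for illustration and may differ from the paper's definition of $L_{local}$; the function name `local_penalty` is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_penalty(Z, S):
    """Similarity-weighted squared-distance penalty (assumed form).

    Z: list of latent vectors (rows of U, or columns of V passed as vectors).
    S: square similarity matrix with S[k][l] the similarity of items k and l.
    """
    n = len(Z)
    return sum(S[k][l] * sq_dist(Z[k], Z[l]) for k in range(n) for l in range(n))


# Items 0 and 1 coincide in the latent space; item 2 lies far from both.
Z = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
S = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
with_remote_pair = local_penalty(Z, S)   # pair (0, 2) is pulled together

S[0][2] = S[2][0] = 0                    # mark the remote pair as excluded
without_remote_pair = local_penalty(Z, S)
```

Under this assumed form, any pair with a non-zero similarity contributes an attractive term, so zeroing the similarity of a remote pair is what removes its pull.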

To avoid overfitting and to constrain the latent matrices $U$ and $V$, an L2-norm regularization is applied to $U$, $V$, and the hidden layer weights $W_{hidden}$.
Our final loss function incorporates all the above constraints, with two additional hyperparameters $\alpha$ and $\beta$.

Dealing with missing values

To be tolerant of missing values, DeepMF discards the missing entries in back-propagation by using a variant of the L2-norm. Denote $\xi$ as a missing value; entries equal to $\xi$ are excluded from the norm. DeepMF can then infer a missing value $A_{\alpha\beta}$ by utilizing the trained model.

DeepMF architecture parameter selection

If the data are assumed to contain $C$ ($C \ge 2$) clusters with respect to samples, we recommend that the network structure be pruned as guided by the validation loss $L_{mix}$ in the range $K \in [2, C]$ and $L \in [1, +\infty)$. For a matrix $V_{K \times N}$ ($K < N$), a rank of $C$ is enough to represent the latent hierarchical structure for a $C$-clustering problem, thus $K \le C$. To extract simple patterns like the linear association between feature and sample, $L = 1$ suffices. A larger $L$ would provide more complexity in the latent space of DeepMF. For hyperparameter tuning, we recommend running each $(K, L)$ combination more than ten times with different random weight initializations to avoid possible local optima.
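
The search procedure recommended above can be sketched as a small grid search with random restarts. The callback `train_once(K, L, seed)` is hypothetical: it stands for training one network and returning its validation loss. The cap `max_L` is also an assumption, added because the stated range for $L$ is unbounded.

```python
import random


def search_structure(train_once, C, max_L=3, restarts=11, seed=0):
    """Return (best_loss, K, L) over K in [2, C] and L in [1, max_L].

    train_once: hypothetical callable (K, L, seed) -> validation loss.
    restarts:   runs per (K, L) pair; 11 matches "more than ten times".
    """
    rng = random.Random(seed)
    best = None
    for K in range(2, C + 1):
        for L in range(1, max_L + 1):
            for _ in range(restarts):
                # A fresh seed per run stands in for a new random weight initialisation.
                loss = train_once(K, L, rng.randrange(2 ** 31))
                if best is None or loss < best[0]:
                    best = (loss, K, L)
    return best
```

Keeping the run with the lowest validation loss across restarts mirrors the advice to guard against poor local optima from a single initialisation.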

DeepMF operates on the basis of the MF formula $\hat{A}_{M \times N} = f_{K,L}(U_{M \times K}, V_{K \times N})$, where $f_{K,L}$ refers to a collection of non-linear mappings, $U$ is the feature latent factor matrix, and $V$ is the sample latent factor matrix (Figure 1). It learns around missing values in training and imputes them in prediction. Since DeepMF is trained by minimizing the loss between $A$ and $\hat{A}$, denoising is built into the learning process.
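
One way to picture how missing entries can be skipped in training and filled in at prediction time is a masked squared error. This is a minimal sketch, assuming that missing values are encoded as `None` or NaN and that the reconstruction error is a plain sum of squares over observed entries; the function names are hypothetical.

```python
def is_missing(v):
    """True for None or NaN (NaN is the only value not equal to itself)."""
    return v is None or v != v


def masked_squared_error(A, A_hat):
    """Sum of squared residuals over observed entries only."""
    total = 0.0
    for row, row_hat in zip(A, A_hat):
        for a, p in zip(row, row_hat):
            if not is_missing(a):      # missing entries contribute no loss or gradient
                total += (a - p) ** 2
    return total


def impute(A, A_hat):
    """Keep observed entries; fill missing ones with the model's prediction."""
    return [[p if is_missing(a) else a for a, p in zip(row, row_hat)]
            for row, row_hat in zip(A, A_hat)]
```

Because missing cells are left out of the sum, they cannot pull the fit toward an arbitrary placeholder value, while the prediction still provides a value for every cell.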

The two extracted matrices $U$ and $V$ are modeled to uncover the underlying latent structures of the features and samples, respectively. They can hence be applied to feature- and sample-related clustering and pattern recognition tasks for data interpretation.

Cancer subtyping experiments
For real data, we used the following four cancer data sets.

Cancer data preparation
Medulloblastoma data set Gene expression profiles from childhood brain tumors (medulloblastomas) were obtained from Brunet's work [2]. The data set consists of classic and desmoplastic subtypes of size 25 and 9, respectively. We further extracted the top 100 differentially expressed genes using the "limma" R package [14].

Leukemia data set The Leukemia data set was obtained from the R package "NMF" with the command "data(esGolub)" [10]. It stores Affymetrix Hgu6800 microarray expression data from 38 leukemia patients: 19 patients with B-cell Acute Lymphoblastic Leukemia (B-cell ALL), 8 patients with T-cell Acute Lymphoblastic Leukemia (T-cell ALL), and 11 patients with Acute Myelogenous Leukemia (AML). The 236 most highly diverging genes were selected by comparing their coefficients of variation using the limma R package [14].

TCGA BRCA data set A subset of the human breast cancer data generated by The Cancer Genome Atlas Network (TCGA) was obtained from the R package mixOmics [15]. It holds 150 samples with three subtypes, Basal-like, Her2, and LumA, of size 45, 30, and 75, respectively. The top 55 correlated mRNA, miRNA, and proteins that discriminate the breast cancer subtypes Basal, Her2, and LumA were selected using the mixOmics DIABLO model.

We fitted all models with log-transformed matrices. All tools were executed with their recommended settings; that is, the prcomp function in package "FactoMineR"; fastICA with algorithm type "parallel", function "logcosh", alpha 1, method "R", row normalization 1, maxit 200, tol 0.0001; CoGAPS with 5000 iterations; NMF with method "brunet" and 200 runs.

As CoGAPS and NMF accept only non-negative values, we used NMF.posneg to transform the input matrices into corresponding non-negative matrices.

Imputation baselines
We evaluated the imputation efficiency of DeepMF by comparing it with two popular imputation approaches, MeanImpute and SVDImpute.

MeanImpute MeanImpute substitutes each missing entry with the mean of the observed values of that feature across all samples. We used the mean impute function in the R package "CancerSubtypes".
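
The idea behind mean imputation can be illustrated in a few lines. This is a sketch of the general technique, not a reproduction of the "CancerSubtypes" function: it assumes features are rows, missing values are encoded as `None`, and a feature with no observed values is filled with 0.0 (an arbitrary choice made here).

```python
def mean_impute(A):
    """Replace each missing entry with the mean of its row's observed values."""
    out = []
    for row in A:  # one row per feature, one column per sample
        observed = [v for v in row if v is not None]
        # Fallback of 0.0 for an entirely missing feature is an assumption.
        mean = sum(observed) / len(observed) if observed else 0.0
        out.append([mean if v is None else v for v in row])
    return out
```

Every missing cell of a feature receives the same value, so this baseline cannot restore sample-specific structure, which is consistent with its role as a simple reference point.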

SVDImpute SVDImpute first centers the matrix, replaces all missing values with 0, and decomposes the matrix into eigenvectors. It then predicts the missing values as a linear combination of the k most significant eigenvectors [19]. We chose SVDImpute as an imputation baseline since its mechanism is similar to that of DeepMF: the k most significant eigenvectors are analogous to the k-dimensional latent matrix in DeepMF. In practice, we used the R package "pcaMethods".

Silhouette width The silhouette width measures the similarity of a sample to its own class compared to other classes [20]. It ranges from -1 to 1. A higher silhouette value implies a more appropriate clustering. A silhouette value near 0 indicates overlapping clusters, and a negative value indicates that the sample has been assigned to the wrong cluster.

We adopted the silhouette width to evaluate the model's denoising and imputation power. We used the ground-truth subtype classes as the input cluster labels. The silhouette width for a given matrix was then calculated with Euclidean distance using the R package "cluster".
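
For readers who want to see the computation spelled out, the standard per-sample silhouette width with Euclidean distance can be written as follows. This is an illustrative plain-Python sketch, not the "cluster" package; it assumes samples are given as rows of `X`, and the function name is hypothetical.

```python
import math


def silhouette_widths(X, labels):
    """Per-sample silhouette width s = (b - a) / max(a, b).

    a: mean distance from a sample to the other members of its own cluster.
    b: smallest mean distance from the sample to the members of another cluster.
    """
    widths = []
    for i, xi in enumerate(X):
        by_cluster = {}
        for j, xj in enumerate(X):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(xi, xj))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Two tight, well-separated groups of samples.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_widths(X, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_widths(X, [0, 1, 0, 1])   # labels cut across the groups
```

Averaging the per-sample widths gives the single summary value of the kind reported for each matrix.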

Adjusted Rand Index We also used the adjusted Rand index to evaluate the clustering accuracy. The adjusted Rand index measures the similarity between predicted clustering results and actual clustering labels [21]. A value close to 0 indicates random labeling, and a value of 1 indicates 100% clustering accuracy.
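
The adjusted Rand index can be computed from pair counts in the contingency table of the two labelings. The sketch below uses the standard pair-counting formula in plain Python and is meant as an illustration rather than a substitute for the "fpc" package; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same samples."""
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as trivially identical
    # Pairs placed together in both labelings, in truth only, and in pred only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = together_truth * together_pred / total_pairs
    max_index = (together_truth + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (together_both - expected) / (max_index - expected)
```

The index depends only on which samples are grouped together, so renaming the clusters leaves it unchanged.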

To check the cancer subtyping effectiveness of the different matrix factorization tools, we first applied the R hierarchical clustering function "hclust" to the sample latent factor matrices to partition samples into subgroups, using Euclidean distance and "ward.D2" linkage. We then computed the adjusted Rand index to measure the clustering accuracy via the R package "fpc".

Denoising, imputation, and embedding evaluation on synthetic data
To evaluate the denoising, imputation, and embedding efficacy of DeepMF, we first generated three patterns A, B, and C, each of which consists of matrices of size 1000 × 600, 10 × 6, and 100 × 60 (Figures 2, S1, S2). Matrices with pattern A hold three subgroups in both feature and sample. Pattern B has two subgroups in feature and three subgroups in sample. Pattern C matrices are the transposes of the pattern B matrices, with dimensions 600 × 1000, 6 × 10, and 60 × 100. We then generated sparse matrices by randomly dropping entries at rates of 10%, 50%, and 70%.
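
The random dropout step can be sketched as below. Whether the original experiments removed an exact fraction of entries or dropped each entry independently is not stated; this sketch assumes an exact fraction, encodes dropped entries as `None`, and uses a hypothetical function name.

```python
import random


def random_dropout(A, rate, seed=0):
    """Return a copy of A with round(rate * size) entries set to None."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Exact-fraction sampling without replacement is an assumption of this sketch.
    dropped = set(rng.sample(cells, round(rate * len(cells))))
    return [[None if (i, j) in dropped else A[i][j] for j in range(n_cols)]
            for i in range(n_rows)]
```

Fixing the seed makes a given sparse matrix reproducible, which helps when the same dropout pattern must be passed to several methods for comparison.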

Figures 2, S1, and S2 show the performance of DeepMF on the raw and sparse matrices of size 1000 × 600, 10 × 6, and 100 × 60, respectively. In Figures 2A, S1A, and S2A, the matrices predicted by DeepMF show significantly fewer noisy and missing entries. In spite of the noise and 70% missing entries, the feature latent factors and sample latent factors generated by DeepMF consistently uncovered the ground-truth feature subgroups and sample subgroups with 100% accuracy. The same conclusion applies to pattern B and pattern C (Figures 2B-C, S1B-C, S2B-C). We note that pattern B matrices and pattern C matrices are transposes of each other, which suggests that DeepMF can uncover the feature and sample subclasses either from a feature-sample matrix or from its transpose. Since fitting a matrix with $N < M$ is more efficient than fitting a matrix with $N > M$ in DeepMF, it may be beneficial to transpose the input, as long as it is unnecessary to adhere to the paradigm of "treating the feature as row and sample as column" [1].

DeepMF accurately elucidates cancer subtypes on multiple cancer omics data sets
To discover complex biological processes from massive amounts of high-throughput matrix data, researchers customarily separate features or samples with similar profiles into biologically significant partitions, with the assistance of clustering or pattern recognition techniques. Here, to demonstrate how DeepMF can assist in this biological discovery, we collected a series of cancer omics data sets, namely the Medulloblastoma data set (mRNA) [2], Leukemia data set (mRNA) [2,10], TCGA BRCA data set (mRNA, miRNA, protein) [15], and small blue round cell tumor (SRBCT) data set (mRNA) [15,16]. We then employed them as benchmark sets for cancer subtyping analysis.

We first verified the correctness of the output matrices. Figure 3 shows that DeepMF reduced the noise in the raw matrices while preserving cancer subtype structures on all cancer omics data sets. Silhouette validation corroborated that the in-cluster similarity and out-cluster separation were enhanced after DeepMF processing; that is, the average silhouette value increased from 0.26 to 0.56 for the Medulloblastoma data set, from 0.35 to 0.66 for the Leukemia data set, from 0.19 to 0.47 for the TCGA BRCA data set, and from 0.31 to 0.58 for the SRBCT data set (see Figure 3).

We then checked whether the sample latent matrix produced by DeepMF preserves the cancer subtype information. We compared the decomposition efficiency of DeepMF against four traditional matrix factorization methods: PCA (FactoMineR [17]), ICA (fastICA [18]), Bayesian-based NMF (CoGAPS [11]), and gradient-based NMF (NMF [10]). We fed the high-dimensional raw matrices into DeepMF and the above four tools, and extracted the low-dimensional sample latent matrices with rank K = 2 for the Medulloblastoma data set, K = 3 for the Leukemia data set, K = 3 for the TCGA BRCA data set, and K = 4 for the SRBCT data set. The DeepMF structure configuration used in training is listed in Table S1. To escape from local optima caused by random weight initialization, we conducted ten different runs for each data set configuration and selected the latent matrices with minimal loss. Next, we applied hierarchical clustering to the obtained sample latent matrices (Figure S3). Clustering accuracy is evaluated by the adjusted Rand index, which measures the overlap between the inferred clusters and the ground-truth subtypes; a score of 0 signifies random labeling and 1 denotes perfect inference. In Figure 3, DeepMF outperforms all four methods and shows the best embedding strength, with the highest clustering accuracy of 76% for the Medulloblastoma data set, 92% for the TCGA BRCA data set, and 100% for the Leukemia and SRBCT data sets.

DeepMF captures the cancer subtype patterns despite 70% random dropouts
Several studies have suggested that missing values in large-scale omics data can drastically obstruct the interpretation of complex biological processes, such as unsupervised cancer subtyping [22]. At present, this is most commonly treated by imputing the missing values before performing downstream analysis of multi-omics data. To evaluate the imputation efficiency of DeepMF, we randomly dropped 70% of the entries in all four cancer data sets, then fed the sparse matrices into DeepMF and two imputation baselines: MeanImpute and SVDImpute. We selected MeanImpute for its popularity. From the perspective of imputation mechanism, we can regard SVDImpute as a linear analogue of DeepMF. The DeepMF structure configuration used in training is listed in Table S1. To avoid local optima, we conducted ten different runs for each data set configuration and picked the one with minimal loss. Figure 4 demonstrates that for all data sets with a 70% missing rate, both DeepMF and SVDImpute recovered distinctive cancer subtype structures, while the MeanImpute approach was unable to reconstruct a clearly visible pattern. Silhouette validation confirmed that DeepMF achieved the largest reduction in within-cluster heterogeneity and between-cluster similarity, with the largest average silhouette value of 0.47 for the Medulloblastoma data set, 0.6 for the Leukemia data set, 0.28 for the TCGA BRCA data set, and 0.44 for the SRBCT data set.

Recall that alongside the imputation process, DeepMF produces the sample latent matrix. To investigate whether missing entries hinder DeepMF in matrix decomposition, we applied hierarchical clustering to the sample latent matrices generated from the sparse matrices (Figure S4) and computed the clustering accuracy against the ground-truth subtype labels (Figure 4). Since the four matrix factorization tools do not accept input with missing values, we fed the high-dimensional matrices treated by MeanImpute and SVDImpute into the four baseline approaches, then obtained the corresponding low-dimensional sample latent matrices with rank K = 2 for the Medulloblastoma data set, K = 3 for the Leukemia data set, K = 3 for the TCGA BRCA data set, and K = 4 for the SRBCT data set. Figure 4E shows that in terms of clustering accuracy, DeepMF outperforms all eight imputation and factorization combinations, exhibiting the best embedding power with clustering accuracy of 88% for the Medulloblastoma data set, 100% for the Leukemia data set, 84% for the TCGA BRCA data set, and 96% for the SRBCT data set.

In this paper we presented DeepMF, a supervised learning approach to the dimension reduction problem. Unlike current approaches, the method is designed to have high tolerance with respect to noisy data and missing values. Experiments using synthetic and real data corroborated this, showing DeepMF to be particularly suited for subtype discovery on omics data.

Several issues remain unaddressed. The first concerns the choice of the three hyperparameters K, L, W in DeepMF. The choice of the reduced dimensionality K is arguably difficult, since it is an open problem for the entire dimension reduction research community. Different combinations of K and L might lead to distinct molecular feature and sample latent matrices. To find the optimal network structure for accurate biological signature interpretation, we defined $L_{mix}$ to guide the hyperparameter search. Otherwise, we resort to multiple trials for the tuning of these parameters.

In this paper we have used DeepMF only on mRNA, miRNA, and protein data. However, DeepMF is not limited to these data modalities. Human metabolome profiles can certainly benefit from analysis using DeepMF, since such data are known to suffer often from missing values. We intend to apply DeepMF to metabolome data and discover signatures beneficial to human health.

In this study, we only used the sample latent matrix for subtype detection. In future work, we plan to employ the molecular feature latent matrix to uncover gene functional pathways.

Conclusion
MF-based analyses are commonly used in the interpretation of high-throughput biological data. Our proposed DeepMF is an MF-based deep learning framework that overcomes traditional shortcomings such as noise and missing data. Our experiments on simulation data and four omics cancer data sets established DeepMF's feasibility in denoising, imputation, and in discovering the underlying structure of data.

Figure 4. DeepMF's imputation and factorization effect on cancer data sets with 70% random dropout. A-D The heatmap presentation and silhouette width of four cancer data sets with 70% random dropout. The gray tiles in the heatmaps indicate missing entries. From left to right: matrix with 70% random dropout, after MeanImpute, after SVDImpute, after DeepMF. A Medulloblastoma data set. B Leukemia data set. C TCGA BRCA data set. D SRBCT data set. E Clustering accuracy of cancer subtyping on sample latent matrices generated by two imputation tools and five matrix factorization tools on different cancer data sets with 70% random dropout.

Figure S1. DeepMF performance on 10 × 6 synthetic matrices. DeepMF denoising, imputation, and factorization performance on 10 × 6 synthetic matrices with different patterns. Inside each pattern, from left to right: raw matrix, 10% random dropout, 50% random dropout, 70% random dropout; from top to bottom: before DeepMF and after DeepMF. The horizontal line plots show the sample latent factors; the vertical line plots show the feature latent factors.

Figure S4. Hierarchical clustering plot for sample latent matrices generated from 70% random dropout data sets. Sample latent matrices are generated by two imputation tools and five matrix factorization tools on different cancer data sets with 70% random dropout. From top to bottom, each row represents sample latent matrices generated by MeanImpute + PCA, MeanImpute + ICA, MeanImpute + CoGAPS, MeanImpute + NMF, SVDImpute + PCA, SVDImpute + ICA, SVDImpute + CoGAPS, SVDImpute + NMF, and DeepMF. A Medulloblastoma data set. B Leukemia data set. C TCGA BRCA data set. D SRBCT data set.