InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation
BMC Bioinformatics volume 25, Article number: 41 (2024)
Abstract
Background
With the development of single-cell technology, many cell traits can now be measured. Furthermore, multi-omics profiling technologies can jointly measure two or more traits in a single cell simultaneously. Computational methods for multimodal data integration are needed to process these rapidly accumulating data.
Results
Here, we present inClust+, a deep generative framework for multi-omics data. It builds on our previous method inClust, which was designed for transcriptome data, and is augmented with two mask modules for multimodal data processing: an input-mask module in front of the encoder and an output-mask module behind the decoder. InClust+ was first used to integrate scRNA-seq and MERFISH data from similar cell populations and to impute MERFISH data based on scRNA-seq data. We then showed that inClust+ can integrate multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility, and protein abundance) in the presence of batch effects. Finally, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset with two labeled multimodal CITE-seq datasets, transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and generate the missing protein-abundance modality for the monomodal scRNA-seq data. In all these examples, the performance of inClust+ is better than or comparable to that of the most recent tools for the corresponding task.
Conclusions
InClust+ is a suitable framework for handling multimodal data. Moreover, the successful implementation of masks in inClust+ suggests that they can be added to other deep learning methods with a similar encoder-decoder architecture to broaden the application scope of those models.
Introduction
Recent progress in single-cell technologies (e.g. single-cell RNA sequencing (scRNA-seq) [1, 2], single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) [3], and single-cell bisulfite sequencing (scBS-seq) [4]) has made it possible to measure a variety of traits in a single cell. These methods have greatly advanced our understanding of cells: the heterogeneity of cell populations has been revealed [2, 5], trajectories of cell development have been inferred [6], and gene regulatory networks have been reconstructed [7]. However, data collected in one modality represent only a limited view of the cell state. To obtain more holistic and comprehensive information, data from different modalities need to be integrated to better reveal their biological significance.
Initially, the integration of data from different modalities was accomplished by computational approaches [8, 9]. Multi-omics profiling technologies that jointly profile multiple traits in a single cell were then developed [10]. Several methods (e.g. SNARE-seq [11], sci-CAR [12], Paired-seq [13], and SHARE-seq [14]) can simultaneously measure gene expression and chromatin accessibility in a single cell. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) jointly profiles gene expression and a panel of cell surface proteins [15, 16], and scNMT-seq profiles chromatin accessibility, DNA methylation, and transcription in single cells at the same time [17].
Several computational approaches have been developed to process and integrate data in single-cell analysis. Some are universal methods that can handle either multiple monomodal datasets profiled from different cells in similar populations or multimodal data extracted from single cells [18]. Others are specially designed for data generated by multi-omics profiling technologies [19, 20]. Typically, for such data, the information from each modality is first encoded by its own encoder and then integrated in the latent space. The one-to-one correspondence between encoders and modalities is mainly due to the fact that data from different modalities have different formats and lengths and therefore cannot be encoded by the same encoder. After encoding, the coded information from the different modalities is integrated in the latent space through adversarial losses [21], mixtures of experts [22], attention transfer [23], regulatory interaction information [24], and so on. Once the modalities are integrated, data from one modality can be used to impute data from another [18]; integration also makes translation between modalities possible [21]. Furthermore, multimodal data can serve as a reference to generate the missing modality in monomodal data [25].
Previously, we presented inClust (integrated clustering), a flexible all-in-one deep generative framework for transcriptome data [26]. Here, we extend inClust by adding two new modules: an input-mask module in front of the encoder and an output-mask module behind the decoder (Fig. 1A). We name the augmented model inClust+ and demonstrate that, by virtue of the mask modules, it can perform not only data integration but also gene imputation (Fig. 1B). Furthermore, for multimodal datasets whose modalities have different data types, inClust+ adopts a stacked encoder and decoder architecture used in conjunction with the mask modules, so it can integrate multimodal data whose modalities differ in data type (Fig. 1C). Finally, inClust+ with the stacked encoder-decoder and mask modules is used to tackle cross-modal generation (Fig. 1D). All results show that inClust+ is an ideal tool for multimodal data and that adding masks is a suitable way to augment models in the field of multi-omics.
Results
InClust+ imputes genes for MERFISH data after integrating scRNA-seq and MERFISH data
The rationale for integrating scRNA-seq and MERFISH data with inClust+ is simple: treat the data from the two modalities as scRNA-seq data from different batches. In addition, the input-mask module and the output-mask module enable gene imputation in MERFISH data by transferring knowledge from the scRNA-seq data (Fig. 2A). For scRNA-seq data, inClust+ uses the common genes to reconstruct both the common genes and the scRNA-seq-specific genes, and the reconstructed expression profiles are compared with the real profiles to update the model parameters (Fig. 2B). For MERFISH data, only the reconstructed expression profile of the common genes is used for parameter updating (Fig. 2C). Although the expression of scRNA-seq-specific genes in MERFISH data does not contribute to parameter updating, these genes are still reconstructed as a by-product. Since the same encoder and decoder are used for both scRNA-seq and MERFISH data, the reconstructed expression of scRNA-seq-specific genes in MERFISH data can draw on the knowledge transferred from the scRNA-seq data (Fig. 2A).
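To make the role of the output mask concrete, the sketch below shows how a reconstruction loss can be restricted to the common genes for MERFISH cells while covering all genes for scRNA-seq cells. This is a minimal NumPy illustration under assumed conditions (a squared-error loss and a made-up count of scRNA-seq-specific genes); it is not the released implementation.

```python
import numpy as np

def masked_mse(x_true, x_recon, output_mask):
    """Mean squared error restricted to genes kept by the output mask."""
    diff = (x_true - x_recon) * output_mask       # zero out masked-out genes
    return (diff ** 2).sum() / (output_mask.sum() * x_true.shape[0])

rng = np.random.default_rng(0)
n_cells, n_common, n_specific = 8, 153, 47        # 47 is a hypothetical count
x_true = rng.poisson(2.0, size=(n_cells, n_common + n_specific)).astype(float)
x_recon = rng.normal(size=(n_cells, n_common + n_specific))

# scRNA-seq cells: every gene contributes to the loss.
mask_rna = np.ones(n_common + n_specific)
# MERFISH cells: only the common genes contribute; the scRNA-seq-specific
# genes are still produced by the decoder, but never penalized.
mask_merfish = np.concatenate([np.ones(n_common), np.zeros(n_specific)])

loss_rna = masked_mse(x_true, x_recon, mask_rna)
loss_merfish = masked_mse(x_true, x_recon, mask_merfish)
```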
For comparison, we randomly selected 80% of the genes in the MERFISH data as common genes and kept the rest as test genes (treated as scRNA-seq-specific) awaiting imputation, as described in uniPort [18]. InClust+ first encodes the scRNA-seq and MERFISH data into the latent space separately. Like the input data (Fig. 3A), the encoded representations of the two modalities are separated in the latent space (Fig. 3B). After covariate (modality) removal by vector subtraction, the samples from the two modalities mix together and cluster by cell type (Fig. 3C). As in uniPort [18], imputation is evaluated by the median and average Spearman correlation coefficients (mSCC and aSCC) and the median and average Pearson correlation coefficients (mPCC and aPCC) over imputed and true test genes. InClust+ achieved higher mSCC (0.243), aSCC (0.255), mPCC (0.263), and aPCC (0.322) than uniPort (mSCC of 0.236, aSCC of 0.247, mPCC of 0.233, and aPCC of 0.274) (Fig. 3D).
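The four metrics can be computed as below. This sketch assumes correlations are taken per held-out gene across cells, which is the common convention; the exact axis used by uniPort's evaluation script is an assumption here.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def imputation_scores(true, imputed):
    """mSCC, aSCC, mPCC, aPCC over held-out genes (columns)."""
    sccs = [spearmanr(true[:, g], imputed[:, g])[0] for g in range(true.shape[1])]
    pccs = [pearsonr(true[:, g], imputed[:, g])[0] for g in range(true.shape[1])]
    return np.median(sccs), np.mean(sccs), np.median(pccs), np.mean(pccs)

rng = np.random.default_rng(0)
true = rng.poisson(3.0, size=(500, 30)).astype(float)     # placeholder data
imputed = true + rng.normal(scale=2.0, size=true.shape)   # noisy "imputation"
mscc, ascc, mpcc, apcc = imputation_scores(true, imputed)
```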
InClust+ integrates multi-omics datasets
Single-cell multi-omics can extract information from different cellular components of a single cell at the same time, yielding data with different types and lengths. By flexibly adjusting the input- and output-mask modules, inClust+ can be turned into a model dedicated to multimodal data processing. To use data from multiple modalities at the same time, the data from all modalities are stacked together in the input (Fig. 4A). Accordingly, the first layer of the encoder can be regarded as multiple independent neural network layers stacked together, each corresponding to one modality (Fig. 4B–F) (e.g. one for gene expression, one for protein abundance, and one for chromatin accessibility). The last layer of the decoder is likewise divided into multiple parts, each reconstructing the data of one modality. Training is divided into multiple stages, grouped into self-reconstruction and alternative-reconstruction. In self-reconstruction, inClust+ uses data from one modality to reconstruct that same modality (e.g. in Fig. 4B, gene expression reconstructs gene expression). In alternative-reconstruction, inClust+ uses data from one modality to reconstruct another modality (e.g. in Fig. 4E, protein abundance reconstructs gene expression). The rationale is as follows: in the self-reconstruction stages, each component of the encoder's first layer is coupled with the corresponding part of the decoder's last layer, forming an encoder/decoder combination for one modality that is updated relatively independently. In the alternative-reconstruction stage, each component of the encoder's first layer is coupled with an alternative part of the decoder's last layer, attempting to translate between modalities within a single cell and integrate them more thoroughly. Furthermore, batch effects between datasets can be explicitly removed by vector arithmetic in the latent space, as in the original inClust (Fig. 4A).
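The following sketch illustrates the stacked first layer, assuming small made-up modality sizes and randomly initialized weights, and assuming that the per-modality blocks produce separate hidden chunks that are concatenated (equivalently, a block-diagonal first-layer weight matrix); how the released model combines the stacked layers may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical); the real inputs are far larger.
sizes = {"rna": 20, "protein": 5, "atac": 30}
hidden_per_mod = 8

# One independent weight block per modality.
W_in = {m: rng.normal(size=(n, hidden_per_mod)) for m, n in sizes.items()}

def first_layer(stacked_x, active):
    """Encode the stacked input; the input mask keeps only `active`."""
    chunks, offset = [], 0
    for m, n in sizes.items():
        block = stacked_x[offset:offset + n] * (1.0 if m == active else 0.0)
        chunks.append(np.maximum(block @ W_in[m], 0.0))   # ReLU per block
        offset += n
    return np.concatenate(chunks)   # hidden units, one chunk per modality

x = rng.normal(size=sum(sizes.values()))     # stacked tri-modal input
h = first_layer(x, active="protein")         # e.g. a protein-driven stage
```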
We first applied inClust+ to integrate the multimodal PBMC data with scATAC-seq data and scRNA-seq data (Additional file 1: Fig S1). Before integration, the scATAC-seq and scRNA-seq data were separated in the original space (Additional file 1: Fig S2A). After integration by inClust+, the data from the two assays mix together in the latent space (Additional file 1: Fig S2B). As in uniPort, the Batch Entropy score is used to measure the degree of mixing across datasets, and the Silhouette coefficient is used to evaluate the separation of biological distinctions [18]. InClust+ obtained a Batch Entropy score of 0.686 and a Silhouette coefficient of 0.808, higher than those of uniPort, Harmony, and scVI (Batch Entropy scores of 0.619, 0.678, and 0.576; Silhouette coefficients of 0.64, 0.604, and 0.616) (Additional file 1: Fig S2C).
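The Silhouette coefficient is available in scikit-learn; a batch entropy can be computed from the batch composition of each cell's nearest neighbors. The sketch below uses a common formulation of batch entropy; uniPort's exact definition and normalization are assumptions here, and the data are placeholders.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors

def batch_entropy(latent, batches, k=30):
    """Mean entropy of batch labels among each cell's k nearest neighbors,
    normalized to [0, 1]; higher means better mixing."""
    idx = NearestNeighbors(n_neighbors=k).fit(latent).kneighbors(latent)[1]
    ents = []
    for neigh in idx:
        _, counts = np.unique(batches[neigh], return_counts=True)
        p = counts / counts.sum()
        ents.append(-(p * np.log(p)).sum())
    return float(np.mean(ents) / np.log(len(np.unique(batches))))

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 16))          # placeholder embedding
batches = rng.integers(0, 2, size=300)       # placeholder batch labels
cell_types = rng.integers(0, 5, size=300)    # placeholder cell types

mixing = batch_entropy(latent, batches)
separation = silhouette_score(latent, cell_types)   # biological separation
```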
We then applied our model to integrate multiple multimodal datasets with batch effects. In the first example, two CITE-seq datasets from different donors are used (Additional file 1: Fig S3); both the gene expression data (Additional file 1: Fig S4A) and the protein abundance data (Additional file 1: Fig S4B) carry batch effects. InClust+ integrates the data from the different modalities in the latent space (Additional file 1: Fig S4C), and vector arithmetic further integrates data from different batches (Additional file 1: Fig S4D). InClust+ obtained a Batch Entropy score of 0.641 and a Silhouette coefficient of 0.724, much higher than those of Harmony and scVI (Batch Entropy scores of 0.225 and 0.375; Silhouette coefficients of 0.416 and 0.39) (Additional file 1: Fig S4E). In the second example, a CITE-seq dataset (gene expression and protein abundance) and an ASAP-seq dataset (protein abundance and chromatin accessibility) are used (Fig. 4) [27]. There are three modalities (gene expression, protein abundance, and chromatin accessibility), with protein abundance measured in both datasets and subject to batch effects. As in the first example, inClust+ integrates the data from the different modalities in the latent space (Fig. 5A), and vector arithmetic further integrates data from different batches (Fig. 5B). We compared the integration results of inClust+ with those of scMoMaT using the adjusted Rand index (ARI) and the normalized mutual information (NMI), taking the seven previously identified cell-type labels as the ground-truth clustering labels [27]. InClust+ obtained an ARI of 0.957 and an NMI of 0.949, much higher than scMoMaT (ARI of 0.585 and NMI of 0.650) (Fig. 5C).
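ARI and NMI can be computed with scikit-learn by clustering the integrated embedding and comparing against the curated labels. This is a minimal sketch on placeholder data; the clustering algorithm used in the actual benchmark is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 16))           # placeholder embedding
true_labels = rng.integers(0, 7, size=100)    # seven curated cell types [27]

pred = KMeans(n_clusters=7, n_init=10).fit_predict(latent)
ari = adjusted_rand_score(true_labels, pred)
nmi = normalized_mutual_info_score(true_labels, pred)
```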
Cross-modal generation by inClust+
A multi-omics dataset contains data from multiple modalities and can be used as a reference to complete monomodal data into multimodal data. InClust+ can extract information from a multi-omics reference and translate monomodal data into another modality. As in multimodal integration, the first layer of the encoder and the last layer of the decoder can be regarded as multiple independent neural network layers stacked together to handle the stacked data of multiple modalities (Fig. 6A). Translation from gene expression into protein abundance in the multimodal reference is carried out in two stages in each round of training. In the first stage, inClust+ uses the gene expression data to reconstruct itself (Fig. 6B). In the second stage, inClust+ uses the gene expression data to reconstruct the protein abundance (Fig. 6C). A third stage handles the monomodal data to be completed: inClust+ uses the gene expression data in the monomodal dataset to reconstruct itself (Fig. 6B). After training, inClust+ can transfer labels from the gene expression data in the multimodal reference to the gene expression data in the monomodal dataset. Meanwhile, based on the gene expression data in the monomodal dataset, the corresponding protein abundance data can be generated by automatic translation.
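The three-stage schedule can be summarized as mask configurations, as in the schematic below. The names are illustrative, not the authors' code; the real training loop applies these masks to the stacked inputs and outputs and optimizes the corresponding masked reconstruction losses.

```python
# One training round expressed as mask configurations (1 = active).
stages = [
    # stage 1: CITE-seq RNA -> RNA (self-reconstruction)
    {"data": "cite", "in_mask": {"rna": 1, "protein": 0},
                     "out_mask": {"rna": 1, "protein": 0}},
    # stage 2: CITE-seq RNA -> protein (alternative-reconstruction)
    {"data": "cite", "in_mask": {"rna": 1, "protein": 0},
                     "out_mask": {"rna": 0, "protein": 1}},
    # stage 3: monomodal scRNA-seq -> RNA (self-reconstruction only)
    {"data": "mono", "in_mask": {"rna": 1, "protein": 0},
                     "out_mask": {"rna": 1, "protein": 0}},
]

# After training, proteins for the monomodal cells are generated by applying
# the stage-2 mask configuration to the scRNA-seq input at inference time.
```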
We evaluated the ability of inClust+ to complete a monomodal dataset into a multimodal dataset using two CITE-seq references and one scRNA-seq dataset. The UMAP plots show that inClust+ integrates the gene expression data from the different datasets well (Fig. 7A, Additional file 1: Fig S5). The label-transfer results are plotted as confusion matrices, which show that inClust+ outperforms sciPENN, with accuracies of 0.947 for inClust+ (Fig. 7B) and 0.915 for sciPENN (Fig. 7C). The protein abundance data generated by inClust+ were visualized by UMAP (Fig. 7D), and prediction accuracy was measured by the Pearson and Spearman correlations between predicted and real data. The results show that inClust+ (mSCC of 0.334, mPCC of 0.376) is comparable to sciPENN (mSCC of 0.356, mPCC of 0.405), which is specially optimized for protein abundance prediction from CITE-seq multimodal data [25] (Fig. 7E).
Discussion
In this paper, we described a means of enhancing inClust by adding an input-mask module and an output-mask module, and called the augmented model inClust+. We applied inClust+ to various settings, ranging from multiple monomodal (unpaired) datasets, to one or several multimodal datasets, to combinations of multimodal and monomodal data. In these examples, inClust+ demonstrated its capability for data integration, imputation, and data generation. First, by virtue of the mask modules, inClust+ was used to impute MERFISH data by referring to scRNA-seq data with a similar cell population. Then, the capability of inClust+ with its stacked encoder-decoder architecture and mask modules for multimodal integration was evaluated on three examples. The results show that inClust+ can not only mix data across modalities, but also preserve biological distinctions and remove batch effects. Finally, inClust+ was used to integrate monomodal and multimodal datasets together. The results show that inClust+ can transfer labels from multimodal data to monomodal data and complete the missing modality in monomodal data. The application of inClust+ is not limited to the above cases. For gene imputation, there may be situations where every dataset has its own specific genes, rather than just one dataset having unique genes. By adjusting the output mask, inClust+ can integrate the datasets based on the shared genes and impute the remaining genes in each dataset by referring to the specific genes in the corresponding dataset. Similarly, for missing-modality generation, there may be situations where every dataset has its own specific modalities; inClust+ can integrate the datasets based on the shared modalities and generate the missing modality in each dataset by referring to the specific modality in the corresponding dataset.
Because inClust+ extends inClust to multimodal applications, inClust+ and inClust can be considered together when compared with other integration methods. What distinguishes our model (inClust and inClust+) from other integration methods is its flexibility to adapt to different situations and its ability to integrate as much information as possible. The flexibility is reflected in two points. First, as described for inClust, label information can be flexibly handled [26]. InClust+ inherits this merit, reflected in its ability to transfer labels from a reference dataset to a query dataset in semi-supervised mode. Second, the two mask modules in inClust+ can be flexibly adjusted to deal with different inputs. The ability to integrate as much information as possible is embodied in two points. First, as shown for inClust, the model can use not only expression data, but also covariate information (e.g. batch) and label information [26]; inClust+ inherits this merit. Second, as shown here, the model can utilize not only the shared data (shared gene expression or shared modality) for integration, but also dataset-specific genes or modalities for imputing missing genes or generating missing modalities. In short, our model can not only integrate data, but also complete other downstream tasks on top of data integration (e.g. out-of-distribution generation, label transfer and new-type identification, spatial domain segmentation, and cross-modal imputation and generation).
Adding masks is a common way to enhance models in deep learning [28]. In inClust+, we augment our model with a pair of mask modules (the input-mask module and the output-mask module). The flexible design and use of masks enable the model to complete a series of tasks that would usually require multiple separate models. For example, inClust+ can utilize common and dataset-specific genes for integration and imputation, like uniPort [18]. Masking keeps this simple: the input-mask screens out the common genes, and the output-mask screens out the common and dataset-specific genes of the corresponding data. Meanwhile, inClust+ can integrate multimodal datasets to achieve multi-domain translation, like the cross-modal autoencoder [21]. The input-mask and output-mask turn inClust+ into multiple independent yet related encoder-decoder combinations, so inClust+ can not only compress and reconstruct data within one modality, but also compress data from one modality and reconstruct it as another, thus realizing cross-modal translation. Furthermore, inClust+ can integrate multimodal and monomodal datasets, transfer labels from multimodal to monomodal data, and complete monomodal data into multimodal data by generation, like sciPENN [25]: inClust+ refers to the multimodal dataset to generate the missing modality in the monomodal dataset. Generally speaking, as a model augmentation technique, adding a pair of masks is not limited to inClust; it can be extended to deep learning models with similar encoder-decoder structures, such as scArches [29].
Conclusions
InClust+ gains the ability to process multimodal data through its two mask modules. It can impute genes in MERFISH data by referring to scRNA-seq data with similar cell populations. It was also shown to integrate multimodal data (e.g. tri-modal data with gene expression, chromatin accessibility, and protein abundance) in the presence of batch effects. Furthermore, inClust+ was used to integrate an unlabeled monomodal scRNA-seq dataset with labeled multimodal CITE-seq datasets, transfer labels from the CITE-seq datasets to the scRNA-seq dataset, and generate the missing protein-abundance modality in the monomodal scRNA-seq data. Although these tasks differ, inClust+ can flexibly change its mask modules to adapt to each of them, and its performance is better than or comparable to that of the latest tools. The successful implementation of masks in inClust+ implies that augmentation through mask modules can be applied to other deep learning methods with a similar encoder-decoder architecture to broaden the application scope of these models.
Methods
Datasets and preprocessing
Brain scRNA-seq and MERFISH dataset: The mouse brain scRNA-seq and spatial transcriptomics (MERFISH) datasets were obtained from the Gene Expression Omnibus (GSE113576) [30] and the Dryad repository (https://doi.org/10.5061/dryad.8t8s248), respectively. The data were preprocessed according to the method of Cao et al. [18]. We obtained 30,370 cells in scRNA-seq and 64,373 cells in MERFISH, with 153 common genes.
Human PBMC paired multi-omics dataset: The single-cell paired-omics dataset covering DNA accessibility and gene expression comes from a publicly available dataset (PBMCs from C57BL/6 mice (v1, 150 × 150), Single Cell Immune Profiling Dataset by Cell Ranger 3.1.0, 10x Genomics, 2019) and was preprocessed according to the method of Cao et al. [18]. We obtained 11,259 cells with 2000 highly variable genes common across cells of all datasets.
Human PBMC dataset (scMoMaT): The CITE-seq and ASAP-seq human PBMC dataset is available at the Gene Expression Omnibus under accession number GSE156478 [31], and the filtered dataset is available at https://github.com/PeterZZQ/scMoMaT [27]. We preprocessed the dataset using the method of Zhang et al. [27] and selected two batches: batch 1 (5023 cells), measured for gene expression and protein abundance simultaneously using CITE-seq, and batch 2 (3517 cells), measured for protein abundance and chromatin accessibility simultaneously using ASAP-seq. The three-modality matrices used for integration contain 4768 overlapping genes, 17,442 regions, and 216 proteins.
Human PBMC and MALT CITE-seq dataset: The CITE-seq human PBMC dataset was obtained from the Gene Expression Omnibus (GSE164378) [32]. We selected highly variable genes (HVGs) according to the method of Lakkis et al. [25] and used scanpy 1.7.1 to normalize gene expression values [33]. For the PBMC dataset, we obtained 161,748 cells with 1000 HVGs and 224 proteins. Cells from donor 7 (25,827 cells) and donor 8 (26,208 cells) were used in the integration experiment. In the cross-modal generation experiment, cells from donor 6 (20,651 cells) and donor 7 were used as the multimodal dataset, and the scRNA-seq data of donor 8 were used as the monomodal dataset.
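A typical scanpy normalization and HVG-selection recipe is shown below on placeholder counts; the exact parameters used by Lakkis et al. [25] (target sum, HVG flavor) are assumptions here.

```python
import numpy as np
import scanpy as sc
from anndata import AnnData

# Placeholder counts standing in for the GSE164378 RNA matrix.
rng = np.random.default_rng(0)
adata = AnnData(rng.poisson(1.0, size=(500, 3000)).astype(np.float32))

sc.pp.normalize_total(adata, target_sum=1e4)    # library-size normalization
sc.pp.log1p(adata)                              # log-transform
sc.pp.highly_variable_genes(adata, n_top_genes=1000)
adata = adata[:, adata.var["highly_variable"]].copy()
```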
InClust+ : overview
InClust+ is based on inClust, which was developed for transcriptome data [26]. InClust+ is the multimodal version of inClust, with an input-mask module in front of the encoder and an output-mask module behind the decoder (Fig. 1A).
Architecture of inClust+
Input
InClust+ takes five inputs: input1 is the multimodal data; input2 is the covariate information, such as batch or modality; input3 is the optional label information; input4 is the input-mask for screening the input; and input5 is the output-mask for screening the output.
Input-mask module
The input-mask is a matrix of the same shape as the input, with each element being 0 or 1. The input is multiplied element-wise with the input-mask matrix to screen out the desired elements.
Encoder
The encoder is a three-layer neural network with a non-linear function as the activation function.
Latent sampling layer
A neural network without an activation function is used to estimate the mean (μz) and standard deviation (Σz). The reparameterization trick is used to sample the latent variable Z1.
Embedding layer
The embedding layer embeds the auxiliary information (input2) into the latent space as a real-valued vector.
Vector arithmetic layer
The vector arithmetic is performed in the latent space: the embedding vector E is subtracted from (or added to) the estimated mean (μz). The resulting vector Z2 retains the real biological information after removing the unwanted covariates or mixing in the auxiliary information.
Classifier
The real-valued vector Z2 passes through a neural network with softmax as the activation function; the output of the classifier is output2.
Decoder
The decoder is a three-layer neural network with a non-linear function as the activation function.
Output-mask module
The output-mask is a matrix of the same shape as the output, with each element being 0 or 1. The output is multiplied element-wise with the output-mask matrix to screen out the desired elements.
Pseudocode
1: screen out the input by masking: input_in = input1 ⊙ input4 (input-mask)
2: encode the input into latent space: h2 = relu(W2(relu(W1 · input_in)))
3: estimate the parameters: μz = Wμ h2, Σz = WΣ h2

Reconstruction:
4: reparameterization trick: Z1 = μz + Σz · ε, ε ~ N(0, 1)
5: decode the latent vector Z1 into an expression profile: output1 = relu(Wout(relu(W3 Z1)))
6: screen out the output by masking: output = output1 ⊙ input5 (output-mask)

Clustering:
3: embed the auxiliary information: E = WE · input2
4: latent space vector arithmetic: Z2 = μz − E (or μz + E)
5: classification (clustering): output2 = softmax(WC Z2)
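To make the pseudocode concrete, here is a minimal NumPy sketch of a single forward pass. The layer widths and random weights are placeholders, not the released implementation, and in practice Σz would typically be parameterized via a log-variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical layer widths; the released model sets these per dataset.
n_in, n_hid, n_lat, n_cov, n_cls = 100, 32, 10, 4, 7
W1, W2 = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))
W_mu, W_sig = rng.normal(size=(n_lat, n_hid)), rng.normal(size=(n_lat, n_hid))
W3, W_out = rng.normal(size=(n_hid, n_lat)), rng.normal(size=(n_in, n_hid))
W_E, W_C = rng.normal(size=(n_lat, n_cov)), rng.normal(size=(n_cls, n_lat))

def forward(input1, input2, input4, input5):
    x = input1 * input4                            # 1: input masking
    h2 = relu(W2 @ relu(W1 @ x))                   # 2: encode
    mu, sig = W_mu @ h2, W_sig @ h2                # 3: latent parameters
    # Reconstruction branch
    z1 = mu + sig * rng.standard_normal(n_lat)     # 4: reparameterization
    out1 = relu(W_out @ relu(W3 @ z1))             # 5: decode
    recon = out1 * input5                          # 6: output masking
    # Clustering branch
    E = W_E @ input2                               # embed auxiliary info
    z2 = mu - E                                    # vector arithmetic
    return recon, softmax(W_C @ z2)                # classification (clustering)

recon, probs = forward(rng.poisson(2.0, n_in).astype(float),  # expression
                       np.eye(n_cov)[1],                      # one-hot batch
                       np.ones(n_in), np.ones(n_in))          # trivial masks
```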
Methods for comparison
uniPort
UniPort, from the Python package uniPort, was used for comparison in data integration and imputation [18].
Harmony
Harmony, from the R package harmony, was used for comparison in data integration [34].
scVI
ScVI, from scvi-tools, was used for comparison in data integration [35].
scMoMaT
ScMoMaT, from the Python package scMoMaT, was used for comparison in data integration [27].
sciPENN
SciPENN, from the Python package sciPENN, was used for comparison in cross-modal generation [25].
Availability of data and materials
The inClust+ model is open source and available for download from https://github.com/wanglf19/inClust_plus. It is implemented in Python 3.6.
Abbreviations
- scRNA-seq: Single-cell RNA sequencing
- scATAC-seq: Single-cell assay for transposase-accessible chromatin sequencing
- scBS-seq: Single-cell bisulfite sequencing
- CITE-seq: Cellular indexing of transcriptomes and epitopes by sequencing
- inClust: Integrated clustering
- inClust+: Augmented integrated clustering
- mSCC: Median Spearman correlation coefficient
- aSCC: Average Spearman correlation coefficient
- mPCC: Median Pearson correlation coefficient
- aPCC: Average Pearson correlation coefficient
- ARI: Adjusted Rand index
- NMI: Normalized mutual information
References
1. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377–82.
2. Zheng C, Zheng L, Yoo JK, Guo H, Zhang Y, Guo X, Kang B, Hu R, Huang JY, Zhang Q, et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell. 2017;169(7):1342–56.
3. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523(7561):486–90.
4. Smallwood SA, Lee HJ, Angermueller C, Krueger F, Saadeh H, Peat J, Andrews SR, Stegle O, Reik W, Kelsey G. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods. 2014;11(8):817–20.
5. Guo X, Zhang Y, Zheng L, Zheng C, Song J, Zhang Q, Kang B, Liu Z, Jin L, Xing R, et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat Med. 2018;24(7):978–85.
6. Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol. 2019;37(5):547–54.
7. Aibar S, Gonzalez-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine JC, Geurts P, Aerts J, et al. SCENIC: single-cell regulatory network inference and clustering. Nat Methods. 2017;14(11):1083–6.
8. Argelaguet R, Cuomo ASE, Stegle O, Marioni JC. Computational principles and challenges in single-cell data integration. Nat Biotechnol. 2021;39(10):1202–15.
9. Efremova M, Teichmann SA. Computational methods for single-cell omics across modalities. Nat Methods. 2020;17(1):14–7.
10. Zhu C, Preissl S, Ren B. Single-cell multimodal omics: the power of many. Nat Methods. 2020;17(1):11–4.
11. Chen S, Lake BB, Zhang K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol. 2019;37(12):1452–7.
12. Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, Daza RM, McFaline-Figueroa JL, Packer JS, Christiansen L, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361(6409):1380–5.
13. Zhu C, Yu M, Huang H, Juric I, Abnousi A, Hu R, Lucero J, Behrens MM, Hu M, Ren B. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol. 2019;26(11):1063–70.
14. Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, Ding J, Brack A, Kartha VK, Tay T, et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell. 2020;183(4):1103–16.
15. Peterson VM, Zhang KX, Kumar N, Wong J, Li L, Wilson DC, Moore R, McClanahan TK, Sadekova S, Klappenbach JA. Multiplexed quantification of proteins and transcripts in single cells. Nat Biotechnol. 2017;35(10):936–9.
16. Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, Satija R, Smibert P. Simultaneous epitope and transcriptome measurement in single cells. Nat Methods. 2017;14(9):865–8.
17. Clark SJ, Argelaguet R, Kapourani CA, Stubbs TM, Lee HJ, Alda-Catalinas C, Krueger F, Sanguinetti G, Kelsey G, Marioni JC, et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun. 2018;9(1):781.
18. Cao K, Gong Q, Hong Y, Wan L. A unified computational framework for single-cell data integration with optimal transport. Nat Commun. 2022;13(1):7419.
19. Gong B, Zhou Y, Purdom E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biol. 2021;22(1):351.
20. Li G, Fu S, Wang S, Zhu C, Duan B, Tang C, Chen X, Chuai G, Wang P, Liu Q. A deep generative model for multi-view profiling of single-cell RNA-seq and ATAC-seq data. Genome Biol. 2022;23(1):20.
21. Yang KD, Belyaeva A, Venkatachalapathy S, Damodaran K, Katcoff A, Radhakrishnan A, Shivashankar GV, Uhler C. Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nat Commun. 2021;12(1):31.
22. Minoura K, Abe K, Nam H, Nishikawa H, Shimamura T. A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data. Cell Rep Methods. 2021;1(5):100071.
23. Zuo C, Dai H, Chen L. Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data. Bioinformatics. 2021;37(22):4091–9.
24. Cao ZJ, Gao G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat Biotechnol. 2022;40(10):1458–66.
25. Lakkis J, Schroeder A, Su K, Lee MYY, Bashore AC, Reilly MP, Li M. A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nat Mach Intell. 2022;4(11):940–52.
26. Wang L, Nie R, Zhang Z, Gu W, Wang S, Wang A, Zhang J, Cai J. A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of single-cell data. Cell Rep Methods. 2023;3(8):1–21.
27. Zhang Z, Sun H, Mariappan R, Chen X, Chen X, Jain MS, Efremova M, Teichmann SA, Rajan V, Zhang X. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat Commun. 2023;14(1):384.
28. Wang L, Nie R, Zhang J, Cai J. scCapsNet-mask: an updated version of scCapsNet with extended applicability in functional analysis related to scRNA-seq data. BMC Bioinformatics. 2022;23(1):539.
29. Lotfollahi M, Naghipourfar M, Luecken MD, Khajavi M, Büttner M, Wagenstetter M, Avsec Ž, Gayoso A, Yosef N, Interlandi M, et al. Mapping single-cell data to reference atlases by transfer learning. Nat Biotechnol. 2022;40(1):121–30.
30. Moffitt JR, Bambah-Mukku D, Eichhorn SW, Vaughn E, Shekhar K, Perez JD, Rubinstein ND, Hao J, Regev A, Dulac C, et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science. 2018;362(6416):5324.
31. Mimitou EP, Lareau CA, Chen KY, Zorzetto-Fernandes AL, Hao Y, Takeshima Y, Luo W, Huang TS, Yeung BZ, Papalexi E, et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol. 2021;39(10):1246–58.
32. Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573–87.
33. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15.
34. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16(12):1289–96.
35. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15(12):1053–8.
Funding
This work was supported by grants from the National Key R&D Program of China [2021YFF1200904 to C.J.]; the Beijing Natural Science Foundation [Z210011 to C.J.]; and the National Natural Science Foundation of China [32070795 to C.J. and 61673070 to J.Z.].
Author information
Contributions
LW, JC and JZ envisioned the project. LW implemented the model. LW and RN performed the analysis and wrote the paper. XM, YC, AW, HZ, JZ and JC provided assistance in writing and analysis.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests. Each of the funding bodies took part in the design of the study and collection, analysis, and interpretation of data, and the writing of the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Figures.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, L., Nie, R., Miao, X. et al. InClust+: the deep generative framework with mask modules for multimodal data integration, imputation, and cross-modal generation. BMC Bioinformatics 25, 41 (2024). https://doi.org/10.1186/s12859-024-05656-2