Fig. 2 | BMC Bioinformatics

Fig. 2

From: Analysis of single-cell RNA sequencing data based on autoencoders

The proposed workflow to integrate different samples. Given E different samples, their gene expression matrices are merged. Then, the top k highly variable genes (HVGs) are selected by taking the different samples into account: they are selected within each sample separately and then merged, which avoids selecting batch-specific genes. scAEspy is used to reduce the HVG space (k dimensions), and the resulting latent space can be (i) used to calculate a t-SNE space, (ii) corrected by Harmony, and (iii) used to infer an uncorrected neighbourhood graph. The latent space corrected by Harmony is then used to build a neighbourhood graph, which is clustered with the Leiden algorithm and used to calculate a UMAP space. Alternatively, BBKNN is applied to rebuild the uncorrected neighbourhood graph, taking possible batch effects into account. The corrected neighbourhood graph built by BBKNN is then clustered with the Leiden algorithm and used to calculate a UMAP space. To assign the correct label to the obtained clusters, the marker genes are calculated using the Mann–Whitney U test. Finally, the annotated clusters can be visualised in both t-SNE and UMAP space.
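
The per-sample HVG selection step described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes raw variance as the variability score (real pipelines typically use a normalised dispersion) and merges the per-sample selections by keeping the genes chosen in the most samples. The function names `top_variable_genes` and `merge_hvgs` and the toy data are hypothetical. Scanpy offers comparable behaviour through the `batch_key` argument of `highly_variable_genes`.

```python
from collections import Counter
from statistics import pvariance


def top_variable_genes(matrix, genes, k):
    """Return the k genes with the highest variance across the cells of one sample.

    matrix: list of cells, each a list of expression values aligned with `genes`.
    Assumption: plain population variance stands in for a proper HVG score.
    """
    variances = {
        gene: pvariance([cell[j] for cell in matrix])
        for j, gene in enumerate(genes)
    }
    return sorted(genes, key=lambda g: variances[g], reverse=True)[:k]


def merge_hvgs(samples, genes, k):
    """Select the top-k genes within each sample, then keep the k genes
    that were selected in the largest number of samples."""
    votes = Counter()
    for matrix in samples.values():
        votes.update(top_variable_genes(matrix, genes, k))
    # Ties are broken by the original gene order to keep the result deterministic.
    order = {gene: i for i, gene in enumerate(genes)}
    ranked = sorted(votes, key=lambda g: (-votes[g], order[g]))
    return ranked[:k]


# Toy data: two samples, three cells each, four genes (hypothetical values).
genes = ["A", "B", "C", "D"]
samples = {
    "s1": [[0, 5, 1, 1], [10, 5, 1, 2], [0, 5, 9, 1]],
    "s2": [[1, 5, 1, 1], [9, 5, 1, 1], [1, 5, 1, 8]],
}
selected = merge_hvgs(samples, genes, k=2)
print(selected)
```

In this toy example gene A is variable in both samples and is ranked first, whereas C and D are each variable in only one sample and gene B, which is constant everywhere, is never selected.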
