Fig. 5 | BMC Bioinformatics

Fig. 5

From: Contrastive self-supervised clustering of scRNA-seq data

Gene selection analysis on real data. Selecting the top variable genes (500–5000) was compared with no selection (all genes). The plots depict 3 runs on each of the 15 real datasets for all computed scores (a–d). On average, the best scores are achieved using the top 500 genes. Both internal and external quality decline when more than 1000 genes are used, which corresponds to including many genes with low expression levels. The dataset-level results, shown as ARI (e) and Silhouette scores (f), indicate that for some datasets a significant gain in performance can be attained when using up to 5000 genes (e.g. the Worm Neuron Cell dataset).
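
As an illustration of what selecting the top variable genes involves, the following is a minimal sketch, not the procedure used in the article. It assumes a cells-by-genes count matrix and ranks genes by the variance of their log-transformed counts; the function name `select_top_variable_genes` and the plain-variance criterion are assumptions made for this example, and the article's actual selection method may differ.

```python
import numpy as np


def select_top_variable_genes(counts, n_top):
    """Return column indices of the n_top most variable genes.

    counts: array-like of shape (n_cells, n_genes) with raw counts.
    Ranking criterion (an assumption for this sketch): variance of
    log1p-transformed counts across cells.
    """
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[1]
    # Asking for at least as many genes as exist means "no selection".
    if n_top >= n_genes:
        return np.arange(n_genes)
    log_counts = np.log1p(counts)
    variances = log_counts.var(axis=0)
    # Sort ascending, then reverse to get decreasing variance.
    order = np.argsort(variances, kind="stable")[::-1]
    return order[:n_top]


# Toy matrix: 4 cells x 5 genes; only genes 2 and 3 vary across cells.
toy_counts = np.array([
    [0, 1, 5, 0, 2],
    [0, 1, 0, 0, 2],
    [0, 1, 9, 3, 2],
    [0, 1, 0, 0, 2],
])
selected = select_top_variable_genes(toy_counts, n_top=2)
reduced = toy_counts[:, selected]  # matrix restricted to the selected genes
```

In a comparison like the one in this figure, a function of this kind would be called with `n_top` between 500 and 5000, with the full matrix serving as the "all genes" baseline.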