Fig. 5 | BMC Bioinformatics

Fig. 5

From: Large-scale labeling and assessment of sex bias in publicly available expression data

Leveraging within-study distributions of sample sex scores to identify high-confidence mislabeled samples. Each row is a study (randomly sampled from the list of mixed-sex tissue studies with multiple clusters). Samples are separated by metadata sex (on the y axis) and by our model's sample sex score, P(male) (on the x axis). Samples are colored by whether they show a high-confidence “match” (blue) or “mismatch” (red) between the metadata and expression-based sex, where high confidence means P(sample belongs to cluster) > 0.95. Samples that were not classified by the model are labeled “unclassified” (gray), and classified samples that do not pass the 0.95 threshold for their cluster are labeled “unclear” (purple). Clustering was obtained by fitting a mixture of Gaussians; the estimated mean (solid line) and 95% confidence interval (dashed line) for each cluster are shown.
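
The labeling scheme described in the caption can be illustrated with a minimal sketch. This is not the authors' code: it assumes exactly two clusters per study, fits a one-dimensional two-component Gaussian mixture with a hand-written EM loop, and treats the cluster with the higher mean P(male) as the male cluster. The function names (`posteriors`, `fit_two_gaussians`, `label_samples`), the initialization, and the `min_sd` floor are all hypothetical choices made for illustration; only the 0.95 threshold and the four labels come from the caption.

```python
import math


def posteriors(x, means, sds, weights):
    """Posterior probability that score x belongs to each cluster."""
    # Work in log space so very small densities do not underflow to zero.
    logs = [math.log(w) - math.log(s) - 0.5 * ((x - m) / s) ** 2
            for m, s, w in zip(means, sds, weights)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_two_gaussians(xs, n_iter=100, min_sd=0.01):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    # Assumed initialization: one cluster at each extreme of the scores.
    means = [min(xs), max(xs)]
    sds = [0.1, 0.1]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities of each cluster for each sample.
        resp = [posteriors(x, means, sds, weights) for x in xs]
        # M-step: re-estimate mean, standard deviation, and weight.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty cluster unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), min_sd)  # floor avoids collapse
            weights[k] = max(nk / len(xs), 1e-12)
    return means, sds, weights


def label_samples(scores, metadata_sex, threshold=0.95):
    """Label each sample as match, mismatch, unclear, or unclassified.

    scores: P(male) per sample, or None if the model did not classify it.
    metadata_sex: "male" or "female" per sample, from the study metadata.
    """
    classified = [s for s in scores if s is not None]
    means, sds, weights = fit_two_gaussians(classified)
    # Assumption: the cluster with the higher mean P(male) is the male one.
    male_k = 0 if means[0] > means[1] else 1
    labels = []
    for score, meta in zip(scores, metadata_sex):
        if score is None:
            labels.append("unclassified")
            continue
        post = posteriors(score, means, sds, weights)
        k = 0 if post[0] >= post[1] else 1
        if post[k] <= threshold:
            labels.append("unclear")
            continue
        expr_sex = "male" if k == male_k else "female"
        labels.append("match" if expr_sex == meta else "mismatch")
    return labels
```

For example, calling `label_samples` on a study where one sample annotated as female has a score of 0.90, alongside well-separated female and male clusters, would flag that sample as a “mismatch”. A real analysis would more likely use an established mixture-model implementation and choose the number of clusters per study rather than fixing it at two.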
