Fig. 4 | BMC Bioinformatics

Fig. 4

From: Comparison study of differential abundance testing methods using two large Parkinson disease gut microbiome datasets derived from 16S amplicon sequencing

Hierarchical clustering of genera and methods based on similarity in replicated differential abundance signatures. Hierarchical clustering was performed to group genera (rows) and methods (columns) by similarity in their replicated differential abundance (DA) signatures, and the result was visualized as a heatmap. Clustering revealed three groups of genera: (1) genera that were more likely to be called differentially abundant by the majority of methods in both datasets, (2) genera that were mostly enriched in PD and called differentially abundant in both datasets by a subset of methods, and (3) genera that were called differentially abundant by only 1–3 methods. Notably, group 2 contained a sub-group of rarer genera with larger effect sizes (2.A) than the other group 2 sub-groups (2.B and 2.C) and groups 1 and 3. All but one of the genera in 2.A were enriched in PD, and they were detected almost exclusively by methods that consistently yielded lower-than-average concordances. Clustering also revealed two groups of methods: methods that mainly replicated the DA signatures of group 2 (a), and the remaining methods (b). Hierarchical clustering was based on method results from filtered taxonomic data, and only genera that were detected as differentially abundant in both datasets by at least one method were included in the clustering and heatmap (61 in total). Each cell indicates whether a differential abundance signature was replicated across datasets (value = 1, color = black) or not replicated (value = 0, color = grey). Mean relative abundance ratios for genera in dataset 1 (MRAR_1) and dataset 2 (MRAR_2) are plotted next to the heatmap with a color gradient from red (lowest MRAR) through light grey (MRAR ~ 1) to blue (highest MRAR). Mean relative abundances of genera in dataset 1 (MRA_1) and dataset 2 (MRA_2) are also plotted next to the heatmap with a color gradient from light grey (lowest MRA) to dark green (highest MRA).
GLM: generalized linear model; CLR: centered log-ratio; KW: Kruskal–Wallis; TSS: total sum scaling (relative abundances); rCLR: robust centered log-ratio transformation with matrix completion; RLE: relative log expression; NBZI: negative binomial zero-inflated
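
As a rough illustration of the heatmap input described above, the following sketch builds a binary genus-by-method matrix in which a cell is 1 when a method called a genus differentially abundant in both datasets. It is a minimal sketch under stated assumptions, not the authors' pipeline: the method and genus names, the example calls, and the function `replication_matrix` are all hypothetical, and "replicated" is assumed here to mean only "called in both datasets" (the direction of the effect is not checked).

```python
# Hypothetical per-dataset results: method name -> set of genera called
# differentially abundant. All names and values are made up for illustration.
calls_ds1 = {
    "method_A": {"genus_1", "genus_2", "genus_3"},
    "method_B": {"genus_1", "genus_3"},
    "method_C": {"genus_4"},
}
calls_ds2 = {
    "method_A": {"genus_1", "genus_2"},
    "method_B": {"genus_1", "genus_4"},
    "method_C": {"genus_3"},
}


def replication_matrix(ds1, ds2):
    """Return (genera, methods, matrix) with matrix[i][j] == 1 when
    methods[j] called genera[i] in both datasets, else 0.

    Assumption: a signature counts as replicated when the same method
    calls the genus in both datasets; effect direction is ignored.
    """
    methods = sorted(set(ds1) & set(ds2))
    # Genera each method called in both datasets.
    replicated = {m: ds1[m] & ds2[m] for m in methods}
    # Keep only genera replicated by at least one method.
    genera = sorted(set().union(*replicated.values()))
    matrix = [[1 if g in replicated[m] else 0 for m in methods] for g in genera]
    return genera, methods, matrix


genera, methods, matrix = replication_matrix(calls_ds1, calls_ds2)
for genus, row in zip(genera, matrix):
    print(genus, row)
```

A matrix of this shape could then be passed to a hierarchical clustering routine to order rows and columns; the distance metric and linkage used for the figure are not stated in the caption, so none is assumed here.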
