Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results

BMC Bioinformatics

Table 4 Count matrix sparsity for the three simulated datasets

	Dataset 1 (true sparsity: 63.03%; raw data sparsity: 72.56%)		Dataset 2 (true sparsity: 56.61%; raw data sparsity: 67.91%)		Dataset 3 (true sparsity: 91.26%; raw data sparsity: 94.34%)
Imputation	Mean (%)	SD (%)	Mean (%)	SD (%)	Mean (%)	SD (%)
None	72.56	0.00	67.91	0.00	94.34	0.00
scImpute	69.50	0.01	55.84	0.00	87.07	0.01
DrImpute	62.67	0.04	42.32	0.00	80.03	1.42
LLSimpute	21.94	3.25	23.19	1.75	23.10	1.71
zCompositions_SQ	0.00	0.00	0.00	0.00	0.00	0.00
zCompositions_CZM	0.00	0.00	0.00	0.00	0.00	0.00

The sparsity of the preprocessed datasets was aggregated by the zero-imputation method included in the pipeline; the mean and standard deviation were calculated across the different normalization approaches. The ground-truth and raw data sparsity of each dataset are reported in the table header row. “None” identifies pipelines in which no zero-imputation step was performed, i.e. normalization-only pipelines.
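
The aggregation described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `sparsity` and `aggregate_by_imputation`, the toy matrices, and the labels `impX`, `normA`, and `normB` are all hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def sparsity(matrix):
    """Return the percentage of zero entries in a count matrix (list of rows)."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return 100.0 * zeros / total


def aggregate_by_imputation(pipelines):
    """Group pipelines by imputation method and summarize their sparsity.

    `pipelines` maps (imputation, normalization) to a preprocessed matrix.
    Returns {imputation: (mean sparsity %, SD of sparsity %)}.
    Assumption: sample SD; a single pipeline per method yields SD 0.0.
    """
    grouped = {}
    for (imputation, _normalization), matrix in pipelines.items():
        grouped.setdefault(imputation, []).append(sparsity(matrix))
    return {
        imputation: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for imputation, values in grouped.items()
    }


# Hypothetical toy data: two imputation settings, two normalizations each.
toy_pipelines = {
    ("None", "normA"): [[0, 3, 0, 5], [1, 0, 0, 2]],
    ("None", "normB"): [[0, 1.5, 0, 2.5], [0.5, 0, 0, 1.0]],
    ("impX", "normA"): [[1, 3, 0, 5], [1, 0, 2, 2]],
    ("impX", "normB"): [[1, 3, 1, 5], [1, 0, 2, 2]],
}

for method, (avg, sd) in aggregate_by_imputation(toy_pipelines).items():
    print(f"{method}\t{avg:.2f}\t{sd:.2f}")
```

In this toy example the normalization-only pipelines leave every zero in place, so their sparsity is identical across normalizations and the standard deviation is zero, which mirrors the 0.00 SD reported in the “None” row.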

ISSN: 1471-2105