BMC Bioinformatics

Table 5 SMAPE and Aitchison distance between the ground truth and different pipelines for the three simulated scenarios

From: Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results

Dataset 1				Dataset 2				Dataset 3
SMAPE (Raw: 12.109)		Aitchison distance (Raw: 19.039)		SMAPE (Raw: 15.493)		Aitchison distance (Raw: 23.754)		SMAPE (Raw: 2.717)		Aitchison distance (Raw: 4.972)
Pipeline	Mean (SD)	Pipeline	Mean (SD)	Pipeline	Mean (SD)	Pipeline	Mean (SD)	Pipeline	Mean (SD)	Pipeline	Mean (SD)
DrImpute**	7.648 (0.263)	DrImpute*	18.683 (0.170)	scImpute**	12.059 (0.007)	zCompositions_SQ*	23.354 (0.192)	None	2.717 (0)	None	4.972 (0)
scImpute**	11.193 (0.001)	None	19.039 (0)	None	15.493 (0)	None	23.754 (0)	scImpute	6.188 (0.034)	zCompositions_CZM	7.919 (2.384)
None	12.109 (0)	zCompositions_SQ*	19.141 (0.041)	DrImpute	28.437 (2.566)	DrImpute*	24.044 (2.384)	DrImpute	15.187 (1.555)	zCompositions_SQ	12.498 (1.880)
zCompositions_CZM	73.427 (0.674)	zCompositions_CZM*	20.614 (1.549)	zCompositions_CZM	70.274 (1.563)	scImpute	25.112 (0.001)	LLSimpute	78.645 (1.215)	DrImpute	19.280 (3.263)
zCompositions_SQ	77.280 (0.045)	scImpute	26.408 (0.001)	zCompositions_SQ	71.647 (0.594)	zCompositions_CZM*	28.250 (4.941)	zCompositions_CZM	95.605 (0.135)	scImpute	25.240 (0.001)
LLSimpute	77.717 (3.246)	LLSimpute	105.394 (8.463)	LLSimpute	80.013 (2.001)	LLSimpute	81.979 (0.702)	zCompositions_SQ	95.614 (0.061)	LLSimpute	56.107 (3.932)

For each metric and dataset, pipelines are listed from best to worst performance (lower values are better). Raw data results are reported in the second row of the table header. For each zero-imputation tool, the table reports the mean and standard deviation, across all pipelines using that tool, of the median SMAPE and median Aitchison distance computed across the samples in each dataset. “None” identifies pipelines where no zero-imputation step was performed, i.e. normalization-only pipelines. Imputation tools that always achieve a statistically significant improvement (i.e. a lower metric than the raw data) are marked with “**”. Imputation tools that achieve a statistically significant improvement only in some of the associated pipelines (i.e. only when combined with some normalization methods) are marked with “*”.
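
The aggregation described in the note above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the table page does not give the exact formulas, so the SMAPE variant used here (absolute difference divided by the mean of the two absolute values, in percent), the pseudocount added before the log-ratio transform, the use of the population standard deviation, and the function names `smape`, `clr`, `aitchison_distance` and `summarize` are all assumptions. The Aitchison distance itself is computed in the standard way, as the Euclidean distance between centred log-ratio (clr) vectors.

```python
import numpy as np

def smape(truth, estimate):
    """SMAPE for one sample, in percent (assumed variant, range 0 to 200)."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    diff = np.abs(truth - estimate)
    denom = (np.abs(truth) + np.abs(estimate)) / 2.0
    # Entries where both values are zero contribute 0 instead of 0/0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return float(100.0 * ratio.mean())

def clr(x, pseudocount=0.5):
    """Centred log-ratio transform; the pseudocount (an assumption) avoids log(0)."""
    logx = np.log(np.asarray(x, dtype=float) + pseudocount)
    return logx - logx.mean()

def aitchison_distance(truth, estimate, pseudocount=0.5):
    """Euclidean distance between the clr-transformed vectors of one sample."""
    return float(np.linalg.norm(clr(truth, pseudocount) - clr(estimate, pseudocount)))

def summarize(truth, pipeline_outputs, metric):
    """Mean and SD, across pipelines, of the per-pipeline median metric.

    truth: array of shape (samples, taxa).
    pipeline_outputs: one (samples, taxa) array per pipeline that uses
    the same zero-imputation tool.
    """
    medians = [
        np.median([metric(t, e) for t, e in zip(truth, output)])
        for output in pipeline_outputs
    ]
    # Population SD; the source does not say which estimator was used.
    return float(np.mean(medians)), float(np.std(medians))
```

For example, `summarize(truth, outputs, smape)` would give one "Mean (SD)" cell of the table for a given tool and dataset, where `outputs` holds that tool's results under each normalization method.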

ISSN: 1471-2105
