Table 5 SMAPE and Aitchison distance between the ground truth and different pipelines for the three simulated scenarios

From: Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results

Dataset 1

| SMAPE (Raw: 12.109): Pipeline | Mean | (SD) | Aitchison distance (Raw: 19.039): Pipeline | Mean | (SD) |
|---|---|---|---|---|---|
| DrImpute** | 7.648 | (0.263) | DrImpute* | 18.683 | (0.170) |
| scImpute** | 11.193 | (0.001) | None | 19.039 | (0) |
| None | 12.109 | (0) | zCompositions_SQ* | 19.141 | (0.041) |
| zCompositions_CZM | 73.427 | (0.674) | zCompositions_CZM* | 20.614 | (1.549) |
| zCompositions_SQ | 77.280 | (0.045) | scImpute | 26.408 | (0.001) |
| LLSimpute | 77.717 | (3.246) | LLSimpute | 105.394 | (8.463) |

Dataset 2

| SMAPE (Raw: 15.493): Pipeline | Mean | (SD) | Aitchison distance (Raw: 23.754): Pipeline | Mean | (SD) |
|---|---|---|---|---|---|
| scImpute** | 12.059 | (0.007) | zCompositions_SQ* | 23.354 | (0.192) |
| None | 15.493 | (0) | None | 23.754 | (0) |
| DrImpute | 28.437 | (2.566) | DrImpute* | 24.044 | (2.384) |
| zCompositions_CZM | 70.274 | (1.563) | scImpute | 25.112 | (0.001) |
| zCompositions_SQ | 71.647 | (0.594) | zCompositions_CZM* | 28.250 | (4.941) |
| LLSimpute | 80.013 | (2.001) | LLSimpute | 81.979 | (0.702) |

Dataset 3

| SMAPE (Raw: 2.717): Pipeline | Mean | (SD) | Aitchison distance (Raw: 4.972): Pipeline | Mean | (SD) |
|---|---|---|---|---|---|
| None | 2.717 | (0) | None | 4.972 | (0) |
| scImpute | 6.188 | (0.034) | zCompositions_CZM | 7.919 | (2.384) |
| DrImpute | 15.187 | (1.555) | zCompositions_SQ | 12.498 | (1.880) |
| LLSimpute | 78.645 | (1.215) | DrImpute | 19.280 | (3.263) |
| zCompositions_CZM | 95.605 | (0.135) | scImpute | 25.240 | (0.001) |
| zCompositions_SQ | 95.614 | (0.061) | LLSimpute | 56.107 | (3.932) |

  1. For each metric and dataset, results are ordered by decreasing performance (lower is better). Raw data results are reported in parentheses in the table header. For each zero-imputation tool, the table shows the mean and standard deviation, calculated across the pipelines using that tool, of the median SMAPE and Aitchison distance calculated across the samples in each dataset. “None” identifies pipelines where no zero-imputation step was performed, i.e. normalization-only pipelines. Imputation pipelines that always achieve a statistically significant improvement (i.e. a lower metric compared to raw data) are indicated with “**”. Imputation pipelines that achieve a statistically significant improvement only in some of the associated pipelines (i.e. only when combined with some normalization methods) are indicated with “*”.
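
The aggregation described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions: the SMAPE form used here (absolute difference divided by the mean of the absolute values, in percent), the Aitchison distance computed as the Euclidean distance between centered log-ratio (CLR) vectors on strictly positive inputs, the use of the sample standard deviation, and all function names.

```python
import math
import statistics


def smape(truth, estimate):
    """SMAPE in percent between two abundance vectors.

    Assumed form: mean of |e - t| / ((|t| + |e|) / 2).
    Terms where both values are zero contribute 0.
    """
    terms = []
    for t, e in zip(truth, estimate):
        denom = (abs(t) + abs(e)) / 2
        terms.append(abs(e - t) / denom if denom > 0 else 0.0)
    return 100 * sum(terms) / len(terms)


def clr(x):
    """Centered log-ratio transform; requires strictly positive values."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the CLR-transformed compositions."""
    return math.dist(clr(x), clr(y))


def summarize_tool(per_pipeline_sample_scores):
    """Mean and SD, across pipelines, of each pipeline's median score.

    `per_pipeline_sample_scores` holds one list of per-sample scores
    for every pipeline that uses the same zero-imputation tool.
    The sample SD is an assumption; a single pipeline yields SD 0.
    """
    medians = [statistics.median(s) for s in per_pipeline_sample_scores]
    sd = statistics.stdev(medians) if len(medians) > 1 else 0.0
    return statistics.mean(medians), sd
```

Under these assumptions, each "Mean (SD)" cell corresponds to one call of `summarize_tool`, where the per-sample scores come from `smape` or `aitchison_distance` applied to the ground truth and the output of each pipeline.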