Table 1 Number of target and non-target sequences in simulated SAG data

From: SAG-QC: quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions

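| Target sequences (E. coli/M. magneticum) | Contaminant sequences (Pseudomonas) | Contaminant sequences (Delftia) | Proportion of contaminant sequence [%] |
|---|---|---|---|
| 1000 | 0 | 0 | 0 |
| 900 | 75 | 25 | 10 |
| 800 | 150 | 50 | 20 |
| 700 | 225 | 75 | 30 |
| 600 | 300 | 100 | 40 |
| 500 | 375 | 125 | 50 |
| 400 | 450 | 150 | 60 |
| 300 | 525 | 175 | 70 |
| 200 | 600 | 200 | 80 |
| 100 | 675 | 225 | 90 |
| 0 | 750 | 250 | 100 |
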
  1. We used public bacterial sequences to simulate SAG datasets, with Escherichia coli and Magnetospirillum magneticum defined as the target species. Their sequences were mixed with sequences of Pseudomonas and Delftia to simulate contaminated samples, in several proportions representing different contamination levels.
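
The counts in Table 1 can be reproduced with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes each simulated dataset holds 1000 sequences in total and that contaminants are split 3:1 between Pseudomonas and Delftia, both of which are inferred from the table rows rather than stated in the text. The function names `mixing_counts` and `mix_sequences`, the random sampling without replacement, and the fixed seed are illustrative assumptions.

```python
import random


def mixing_counts(contam_pct, total=1000):
    """Return (target, pseudomonas, delftia) sequence counts.

    Assumes `total` sequences per dataset and a 3:1
    Pseudomonas:Delftia split, as inferred from Table 1.
    """
    n_contam = total * contam_pct // 100
    n_pseudomonas = n_contam * 3 // 4
    n_delftia = n_contam - n_pseudomonas
    n_target = total - n_contam
    return n_target, n_pseudomonas, n_delftia


def mix_sequences(target_pool, pseudomonas_pool, delftia_pool,
                  contam_pct, seed=0):
    """Draw one simulated dataset from three sequence pools.

    Sampling without replacement with a fixed seed is an assumption;
    the source does not describe how sequences were selected.
    """
    rng = random.Random(seed)
    n_target, n_pseudomonas, n_delftia = mixing_counts(contam_pct)
    mixed = (rng.sample(target_pool, n_target)
             + rng.sample(pseudomonas_pool, n_pseudomonas)
             + rng.sample(delftia_pool, n_delftia))
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Print the rows of Table 1 for contamination levels 0% to 100%.
    for pct in range(0, 101, 10):
        print(pct, *mixing_counts(pct))
```

Each pool passed to `mix_sequences` must contain at least as many sequences as the count requested for that contamination level, because sampling is done without replacement.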