(a) Number of disjoint random ℓ-mers within a read

| (d,ℓ) | L=400: 0 | L=400: 1 | L=400: between 1 and ≈L/ℓ | L=400: ≈L/ℓ | L=1000: 0 | L=1000: 1 | L=1000: between 1 and ≈L/ℓ | L=1000: ≈L/ℓ |
|---|---|---|---|---|---|---|---|---|
| (0,20) | 0.9% | 0.7% | 69.1% | 29% | 0.6% | 0.19% | 85.6% | 13.5% |
| (0,40) | 0.7% | 0.4% | 39.4% | 59.1% | 0.54% | 0.17% | 62.6% | 36.68% |
| (3,40) | 6.3% | 8% | 84.8% | 0.8% | 2.26% | 1.49% | 96.2% | 0.04% |
| (0,80) | 0.7% | 0.3% | 4.2% | 94.8% | 0.49% | 0.17% | 8.2% | 91.1% |

(b) Number of low-repeat ℓ-mers within a read

| $$\mathcal{L}_{s,1}$$ | d=0: 0 | d=0: between 1 and 80 | d=0: between 81 and ≈L/ℓ | d=3: 0 | d=3: between 1 and 80 | d=3: between 81 and ≈L/ℓ |
|---|---|---|---|---|---|---|
| 5 | 56.73% | 11.94% | 30.98% | 32.01% | 30.97% | 36.88% |
| 10 | 54.46% | 4.9% | 4.9% | 26.21% | 19.81% | 53.83% |
| 20 | 52.75% | 0.08% | 46.81% | 21.90% | 4.06% | 73.89% |
| 40 | 52.75% | 0.08% | 46.81% | 21.69% | 0.07% | 78.09% |

(c) Number of low-repeat ℓ-mers within a read

| $$\mathcal{L}_{s,1}$$ | d=0: 0 | d=0: between 1 and 80 | d=0: between 81 and ≈L/ℓ | d=3: 0 | d=3: between 1 and 80 | d=3: between 81 and ≈L/ℓ |
|---|---|---|---|---|---|---|
| 5 | 52.31% | 12.28% | 35.08% | 17.18% | 20.15% | 62.59% |
| 10 | 50.22% | 4.91% | 44.54% | 14.07% | 11.67% | 74.18% |
| 20 | 48.64% | 0.1% | 50.93% | 11.75% | 2.24% | 85.93% |
| 40 | 48.64% | 0.1% | 50.93% | 11.64% | 0.04% | 88.23% |
1. (a) Percentage of reads of lengths L=400 and L=1000 from chromosome 19 of hg19, grouped by the number of disjoint random ℓ-mers they contain. (b) and (c) Fraction of the reads remaining after the first step, grouped by their number of low-repeat ℓ-mers, for list sizes $$\mathcal{L}_{s,1}\in\{5,10,20,40\}$$ and ℓ=40. In (b), we assume that all ℓ-mers are used at the first step; in (c), only non-overlapping ℓ-mers are used.
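As a rough aid to reading the column headings of panel (a), the sketch below computes the upper bound ≈L/ℓ on the number of disjoint ℓ-mers that fit in a read and assigns a per-read count to one of the four columns. The function names `max_disjoint_lmers` and `bucket` are hypothetical, and treating ≈L/ℓ as exactly ⌊L/ℓ⌋ is an assumption made for illustration; the source does not state how the boundary is handled.

```python
def max_disjoint_lmers(read_len: int, l: int) -> int:
    """Upper bound on non-overlapping l-mers in a read of length read_len.

    Assumption: the '≈L/ℓ' column is taken to mean floor(L / l).
    """
    return read_len // l


def bucket(count: int, read_len: int, l: int) -> str:
    """Map a per-read count of disjoint l-mers to a column of panel (a).

    Labels are illustrative: '0', '1', 'between' (more than 1 but below
    the upper bound), and 'max' (the upper bound floor(L / l) or more).
    """
    upper = max_disjoint_lmers(read_len, l)
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    if count < upper:
        return "between"
    return "max"


if __name__ == "__main__":
    # Parameter pairs taken from the rows and column groups of panel (a).
    for read_len in (400, 1000):
        for l in (20, 40, 80):
            print(read_len, l, max_disjoint_lmers(read_len, l))
```

For example, a read of length L=400 holds at most 10 disjoint 40-mers, so a read with 10 such 40-mers falls in the last column and one with 5 falls in the "between" column.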