From: Efficient counting of k-mers in DNA sequences using a bloom filter

k-mer distribution. Distribution of coverage levels for k-mers in the sequence reads from chromosome 21. There is a clear distinction between the coverage levels of the 31.7M observed k-mers that are found in the hg18 reference genome sequence and those of the 48.7M k-mers that are not in hg18. Of the k-mers not found in hg18, 44.5M, or about 91.4%, are observed only once and are likely sequencing errors. A small fraction of k-mers that do not match hg18 are observed many times in the data; these likely represent SNP differences between the sequenced individual and hg18 and would be retained by the Bloom filter.
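The retention behaviour described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `BloomFilter` class, the `count_repeated_kmers` function, the filter size, and the number of hash functions are all hypothetical choices made for illustration. The sketch also ignores reverse complements and does not correct for Bloom filter false positives, so a k-mer seen only once could occasionally be reported with a count of 2.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; the size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(2, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_repeated_kmers(reads, k):
    """Count k-mers, storing only those seen at least twice."""
    bloom = BloomFilter()
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # Second sighting: promote the k-mer to the count table.
                counts[kmer] = 2
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTT"]
    print(count_repeated_kmers(reads, 4))
```

In this sketch, a k-mer observed only once (the likely sequencing errors in the caption) occupies a few bits in the filter but never enters the count table, while a k-mer observed repeatedly, such as one carrying a true SNP, is promoted and counted.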

