Schematic illustration of sequence selection by Neatfreq pipeline. A). Blue blocks represent fragment-only reads. The left side of the figure shows 2 retention bins (20% and 85% retention) created by the ratio between the RMKF of the read and the cutoff input by the user. The random bin selection method extracts a random subset of reads from each bin up to the count denoted by its retention level and the number of reads available. B). The targeted bin selection method, run as fragment-only sequences, is illustrated on a 20% retention bin (left block). Within each retention bin, reads were clustered by the cd-hit-est alignment  program based on similarity and sorted by uniqueness, or the population of the sub-bin cluster (middle block). Illustrated here, reads from each intra-bin sub-cluster were selected randomly from within each cluster approached within a bin. C). Green blocks represent forward mates and red blocks reverse mates, with dark green brackets indicating 2-sided mate relationships. When applying the targeted bin selection method to the same bin containing mate pairs (paired-ends), analysis is identical to that described for fragments except that the retrieval of 2-sided mates across all bins, and their sub-bin clusters, is prioritized. Note that highly unique clusters containing only fragments are still given priority in selection.