From: Efficient motif finding algorithms for large-alphabet inputs
Input: set of k-mers with associated sequence index, distance parameter d |
---|
Output: set of k-mers at distance d from each input string |
1. Pick d positions and remove from the k-mers symbols at the corresponding positions to obtain a set of (k − d)-mers. |
2. Use counting sort to order (lexicographically) the resulting set of (k − d)-mers. |
3. Scan the sorted list and, for each run of identical (k − d)-mers, create the list of all sequences in which the corresponding k-mers appear. |
4. Output the k-mers that appear in every input string. |
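
The four steps above can be sketched in Python for a single choice of d positions. This is a minimal illustration rather than the authors' implementation: the function name `select_kmers` and the parameters `num_sequences` and `positions` are hypothetical, and Python's built-in comparison sort stands in for the counting sort of step 2, which changes the running time but not the result.

```python
def select_kmers(kmers, num_sequences, positions):
    """Sketch of the selection step for one choice of d positions.

    kmers         -- list of (kmer, sequence_index) pairs (assumed input layout)
    num_sequences -- number of input strings (hypothetical parameter)
    positions     -- the d positions (0-based) to remove from every k-mer
    """
    drop = set(positions)

    # Step 1: remove the symbols at the chosen positions to get (k - d)-mers.
    # Tuples are used as keys so that any alphabet (characters, integers) works.
    reduced = [
        (tuple(sym for i, sym in enumerate(kmer) if i not in drop), kmer, seq)
        for kmer, seq in kmers
    ]

    # Step 2: order the (k - d)-mers lexicographically.
    # Assumption: the built-in sort replaces counting sort here.
    reduced.sort(key=lambda entry: entry[0])

    # Steps 3 and 4: scan runs of identical (k - d)-mers, collect the sequences
    # each run touches, and keep the k-mers of runs that cover every sequence.
    selected = set()
    start = 0
    while start < len(reduced):
        end = start
        sequences_seen = set()
        while end < len(reduced) and reduced[end][0] == reduced[start][0]:
            sequences_seen.add(reduced[end][2])
            end += 1
        if len(sequences_seen) == num_sequences:
            selected.update(kmer for _, kmer, _ in reduced[start:end])
        start = end
    return selected


# Two input strings, k = 4, d = 1, removing position 2.
example = [("ACGT", 0), ("TTTT", 0), ("ACCT", 1), ("GGGG", 1)]
print(sorted(select_kmers(example, 2, [2])))  # ['ACCT', 'ACGT']
```

In the example, ACGT and ACCT both reduce to ACT once position 2 is removed, and together they cover both input strings, so both are kept. TTTT and GGGG each occur in only one string and are discarded.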