From: RefSelect: a reference sequence selection algorithm for planted (l, d) motif search
Notation | Explanation |
---|---|
\|x\| | The length of a string, the size of a set, or the number of elements in a matrix. |
D, D' | D is the set of input sequences and D' is the set of reference sequences: D = {s_{1}, s_{2}, …, s_{t}} and D' = {s_{r_1}, s_{r_2}, …, s_{r_k}}, with D' ⊂ D. |
t | The number of sequences in the input sequence set D, i.e., \|D\| = t. |
k | The number of required reference sequences, i.e., \|D'\| = k. |
n | The length of each input sequence. |
x ∈_{l} s | The string x is an l-length substring of the sequence s; in other words, x is an l-mer of s. |
s[i] | The ith character in the string s. |
s[i…j] | The substring of the string s from the ith position to the jth position. |
d_{H}(x, x') | The Hamming distance between two strings x and x' of the same length. |
M_{d}(x, x') | The set of common candidate motifs of two l-mers x and x': M_{d}(x, x') = {y : \|y\| = \|x\| = \|x'\|, d_{H}(y, x) ≤ d, d_{H}(y, x') ≤ d}. |
N_{r}(D') | The number of candidate motifs generated from the reference sequence set D', calculated by Eq. (1). |
N_{r}(s_{i}, s_{j}) | The number of candidate motifs generated from two sequences s_{i} and s_{j}, calculated by Eq. (2). |
min(i, j) | The smaller of two integers i and j: min(i, j) = i if i ≤ j, and j otherwise. |
sim(s_{i}, s_{j}) | The similarity of two sequences s_{i} and s_{j}. |
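
Three of the definitions in the table (the l-mers of a sequence, the Hamming distance, and the common candidate motif set M_{d}(x, x')) can be illustrated with a minimal Python sketch. This is a brute-force illustration of the definitions only, not the method used by RefSelect. The function names `lmers`, `hamming`, and `common_candidate_motifs` and the DNA alphabet `ACGT` are assumptions made for this example; note also that Python strings are indexed from 0.

```python
from itertools import product

# Assumption for this sketch: sequences are over the DNA alphabet.
ALPHABET = "ACGT"


def lmers(s, l):
    """Return every l-length substring x of s (x ∈_l s), in order of position."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def hamming(x, y):
    """Hamming distance d_H(x, y) between two strings of the same length."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def common_candidate_motifs(x, y, d, alphabet=ALPHABET):
    """M_d(x, y): all strings within Hamming distance d of both x and y.

    Enumerates all |alphabet|^l strings, so it is only usable for small l.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be l-mers of the same length")
    motifs = set()
    for chars in product(alphabet, repeat=len(x)):
        cand = "".join(chars)
        if hamming(cand, x) <= d and hamming(cand, y) <= d:
            motifs.add(cand)
    return motifs
```

For example, `lmers("ACGTA", 3)` lists the three 3-mers of the sequence, and `common_candidate_motifs("AA", "AT", 1)` returns the strings that differ from both `"AA"` and `"AT"` in at most one position.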