From: RefSelect: a reference sequence selection algorithm for planted (l, d) motif search
Notation | Explanation |
---|---|
$\lvert x \rvert$ | The length of a string, the size of a set, or the number of elements in a matrix. |
$D$, $D'$ | $D$ is the set of input sequences and $D'$ is the set of reference sequences: $D = \{s_1, s_2, \ldots, s_t\}$ and $D' = \{s_{r_1}, s_{r_2}, \ldots, s_{r_k}\}$, satisfying $D' \subset D$. |
$t$ | The number of sequences in the input sequence set $D$, namely $\lvert D \rvert = t$. |
$k$ | The number of required reference sequences, namely $\lvert D' \rvert = k$. |
$n$ | The length of each input sequence. |
$x \in_l s$ | The string $x$ is an $l$-length substring of the sequence $s$. In other words, $x$ is an $l$-mer in the sequence $s$. |
$s[i]$ | The $i$th character in the string $s$. |
$s[i \ldots j]$ | The substring of the string $s$ from the $i$th position to the $j$th position. |
$d_H(x, x')$ | The Hamming distance between two strings $x$ and $x'$ of the same length. |
$M_d(x, x')$ | The common candidate motifs of two $l$-mers $x$ and $x'$: $M_d(x, x') = \{y : \lvert y \rvert = \lvert x \rvert = \lvert x' \rvert,\ d_H(y, x) \le d,\ d_H(y, x') \le d\}$. |
$N_r(D')$ | The number of candidate motifs generated from the reference sequence set $D'$, calculated by (1). |
$N_r(s_i, s_j)$ | The number of candidate motifs generated from two sequences $s_i$ and $s_j$, calculated by (2). |
$\min(i, j)$ | The minimum of two integers $i$ and $j$: $\min(i, j) = i$ if $i \le j$, and $j$ otherwise. |
$\mathrm{sim}(s_i, s_j)$ | The similarity of two sequences $s_i$ and $s_j$. |
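
To make the definitions of the Hamming distance, of an l-mer, and of the common candidate motif set concrete, the following is a minimal Python sketch. It is only an illustration of the notation, not the paper's algorithm: the function names `hamming`, `lmers`, and `common_candidates`, and the DNA alphabet `ACGT`, are assumptions introduced here, and the candidate set is enumerated by brute force over all strings of the given length, which is practical only for small l.

```python
from itertools import product

# Assumption for illustration: sequences are over the DNA alphabet.
ALPHABET = "ACGT"


def hamming(x, y):
    """Hamming distance d_H(x, y) between two strings of equal length."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


def lmers(s, l):
    """All l-length substrings (l-mers) of s, in order of starting position."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def common_candidates(x, x_prime, d, alphabet=ALPHABET):
    """Brute-force M_d(x, x'): every string y with |y| = |x| = |x'|
    that is within Hamming distance d of both x and x'."""
    if len(x) != len(x_prime):
        raise ValueError("l-mers must have the same length")
    result = set()
    for chars in product(alphabet, repeat=len(x)):
        y = "".join(chars)
        if hamming(y, x) <= d and hamming(y, x_prime) <= d:
            result.add(y)
    return result


if __name__ == "__main__":
    print(hamming("ACGT", "AGGA"))                      # 2
    print(lmers("ACGTA", 3))                            # ['ACG', 'CGT', 'GTA']
    print(sorted(common_candidates("AAA", "AAC", 1)))   # ['AAA', 'AAC', 'AAG', 'AAT']
    print(common_candidates("AAA", "CCC", 1))           # set()
```

The last call shows that the set is empty whenever the two l-mers are more than 2d apart, since by the triangle inequality no string can be within distance d of both.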