Table 1 Notations used in this paper

From: RefSelect: a reference sequence selection algorithm for planted (l, d) motif search

Notation

Explanation

|x|

The length of a string, the size of a set, or the number of elements in a matrix.

D, D'

D is the set of input sequences and D' is the set of reference sequences: D = {s_1, s_2, …, s_t} and D' = {s_{r_1}, s_{r_2}, …, s_{r_k}}, satisfying D' ⊆ D.

t

The number of sequences in the input sequence set D, namely |D| = t.

k

The number of required reference sequences, namely |D'| = k.

n

The length of each input sequence.

x ∈_l s

The string x is an l-length substring of the sequence s. In other words, x is an l-mer in the sequence s.

s[i]

The ith character in the string s.

s[i…j]

A substring of the string s starting from the ith position to the jth position.

d_H(x, x')

The Hamming distance between two strings x and x' of the same length.

M_d(x, x')

The common candidate motifs of two l-mers x and x': M_d(x, x') = {y : |y| = |x| = |x'|, d_H(y, x) ≤ d, d_H(y, x') ≤ d}.

N_r(D')

The number of candidate motifs generated from the reference sequence set D', calculated by Eq. (1).

N_r(s_i, s_j)

The number of candidate motifs generated from two sequences s_i and s_j, calculated by Eq. (2).

min(i, j)

The minimum of two integers i and j: min(i, j) = i if i ≤ j, and j otherwise.

sim(s_i, s_j)

The similarity of two sequences s_i and s_j.
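
As an illustration of three of the notations above (the Hamming distance, l-mer membership, and the common candidate motif set), the following is a minimal Python sketch. It is not the method of the paper: the function names `hamming`, `lmers`, and `common_candidates` are hypothetical, the DNA alphabet {A, C, G, T} is an assumption that the table does not state, and the candidate set is enumerated by brute force over all strings of the given length, which is only practical for small l.

```python
from itertools import product

# Assumption: DNA alphabet. The notation table does not fix an alphabet.
ALPHABET = "ACGT"


def hamming(x: str, y: str) -> int:
    """Hamming distance: number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def lmers(s: str, l: int) -> list[str]:
    """All l-length substrings of s, in order of starting position."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def common_candidates(x: str, y: str, d: int) -> set[str]:
    """Brute-force common candidate motifs: strings within distance d of both x and y."""
    if len(x) != len(y):
        raise ValueError("l-mers must have the same length")
    return {
        cand
        for cand in map("".join, product(ALPHABET, repeat=len(x)))
        if hamming(cand, x) <= d and hamming(cand, y) <= d
    }


if __name__ == "__main__":
    print(hamming("ACGT", "ACCA"))
    print(lmers("ACGTA", 3))
    print(sorted(common_candidates("AC", "AG", 1)))
```

The brute-force enumeration examines |Σ|^l strings for an alphabet Σ, so it serves only to make the set definition concrete; a practical implementation would generate the d-neighbourhoods of x and x' directly.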