SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

doi:10.1186/s12859-018-2242-y

BMC Bioinformatics

Table 1 Notations used in this paper


Notation	Explanation
|x|	The length of a string or the size of a set.
Σ	The DNA alphabet, Σ = {A, C, G, T}.
l-mer	An l-length string over Σ.
s[i]	The ith character in the string s.
s[i..j]	A substring of the string s from the ith position to the jth position.
s∙s’	The concatenation of two strings s and s’.
x ∈_l s	The string x is an l-length substring of the string s. In other words, x is an l-mer in the string s.
x ∈_l D	The string x is an l-length substring of some sequence in the set D. In other words, there exists s ∈ D such that x ∈_l s.
D = {s₁, s₂, …, s_t}, t, n, q, l, d	Notations for the input. D is the input DNA sequence set, where each sequence s_i is an n-length string over Σ; t = |D|; n = |s_i| for 1 ≤ i ≤ t; q is the proportion of the input sequences containing motif instances in D; l is the motif length and d is the maximum number of mismatches between a motif and its instance.
D’, t’, q’	Notations for the output. D’ is a sample sequence set selected from D, i.e., D’ ⊂ D; t’ = |D’|; q’ is the proportion of the sequences in D’ that contain motif instances.
count_k(x)	The count (number of occurrences) of a string x in D with up to k mismatches, defined by Eq. (4).
count(x)	The count (number of occurrences) of a string x in D.
d_H(y, x)	The Hamming distance between two strings y and x of equal length.
B_k(x)	The set of k-neighbors of a string x, i.e., the set of strings with Hamming distance no more than k from x. B_k(x) = {y: y ∈ Σ^|x|, d_H(y, x) ≤ k}.
stn(y)	The integer obtained by conversion from a string y over Σ. The characters A, C, G and T are converted to binary numbers 00, 01, 10 and 11, respectively. Because of the need to compute count_k(y), y is first reversed and then converted to an integer. For example, if y = AC, then y is converted to the binary number 0100, i.e., the decimal number 4.
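
Several of the notations above can be illustrated with a minimal Python sketch. The function names hamming, neighbors, count_k and stn are hypothetical and chosen only to mirror the table; they are not taken from the authors' implementation. In particular, count_k here assumes that an occurrence is any |x|-length window of a sequence in D within Hamming distance k of x, which may differ in detail from Eq. (4).

```python
from itertools import combinations, product

SIGMA = "ACGT"  # the DNA alphabet
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # 2-bit encoding from the table


def hamming(y, x):
    """d_H(y, x): number of mismatching positions between equal-length strings."""
    if len(y) != len(x):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(y, x))


def neighbors(x, k):
    """B_k(x): all strings over SIGMA within Hamming distance k of x."""
    result = set()
    for m in range(min(k, len(x)) + 1):
        for positions in combinations(range(len(x)), m):
            # Substituting any letter (including the original one) at m
            # positions yields strings at distance <= m from x.
            for letters in product(SIGMA, repeat=m):
                y = list(x)
                for p, c in zip(positions, letters):
                    y[p] = c
                result.add("".join(y))
    return result


def count_k(x, D, k=0):
    """Assumed reading of count_k(x): windows of D within k mismatches of x.

    With k = 0 this reduces to count(x), the number of exact occurrences.
    """
    l = len(x)
    return sum(
        1
        for s in D
        for i in range(len(s) - l + 1)
        if hamming(s[i:i + l], x) <= k
    )


def stn(y):
    """stn(y): reverse y, then read the 2-bit codes as one binary integer."""
    value = 0
    for ch in reversed(y):
        value = (value << 2) | CODE[ch]
    return value
```

With these definitions, stn("AC") evaluates to 4, matching the example in the table, and neighbors("AC", 1) contains 7 strings (AC itself plus three substitutions at each of the two positions).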


ISSN: 1471-2105
