Table 2 Motif discovery algorithms used in the performance comparison. Nuisance parameters are parameters that cannot be precisely defined without knowledge of the true binding sites (such as motif length, number of occurrences and orientation). For MotifSampler and wConsensus, the lower part of the range indicates required parameters, while the upper part indicates the total number of parameters, including "power user" parameters that the program authors stress should typically be left as default. Motif model abbreviations: cons = consensus; PWM = position weight matrix; mis = consensus with predefined number of allowed non-position-specific mismatches.

From: A novel ensemble learning method for de novo computational identification of DNA binding sites

| Program | # Nuisance Parameters | Motif Model | Search Strategy | Citation |
| --- | --- | --- | --- | --- |
| Oligo analysis (RSAT) | 3 | cons | Exhaustive enumeration of short and bipartite oligos. Clusters overlapping motifs. Uses a binomial approximation to the hypergeometric score, similar to the overrepresentation objective function. | [14, 33, 34] |
| Yeast Motif Finder (YMF) | 2 | cons | Exhaustive enumeration of short and bipartite oligos. Alphabet is {ACGTYR}. Uses the Normal approximation to the hypergeometric function, similar to the overrepresentation objective function. | [35] |
| AlignAce (AA) | 2 | PWM | Gibbs sampling to optimize a Maximum a Posteriori (MAP) score. | [36] |
| MotifSampler (MS) | 3–5 | PWM | Gibbs sampling with higher order Markov model. | [37] |
| BioProspector (Biopros) | 7 | PWM | Gibbs sampling with higher order Markov model. Designed for long and bipartite motifs common in prokaryotes. | [16, 38] |
| MEME | 4 | PWM | Expectation Maximization over a modified information content. | [39] |
| Improbizer (Imp) | 8 | PWM | Expectation Maximization. Uses 2nd order Markov model and optionally accounts for positional restrictions using a Gaussian model. | [40] |
| MITRA | 1 | mis | Tree-based search for long bipartite motifs with many mismatches. Uses a hypergeometric score similar to the overrepresentation objective function. | [41] |
| wConsensus (wCons) | 1–13 | PWM | Greedy enumeration to maximize information content. Infers motif length. | [42] |
| Weeder | 4 | mis | Bounded enumeration using a suffix tree. Tries all motif lengths from 6–12. | [43] |
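
Several entries in Table 2 mention a hypergeometric score and approximations to it. As a rough illustration only, the sketch below compares an exact hypergeometric tail probability with its binomial approximation. It is not the scoring code of RSAT, YMF or MITRA, nor the overrepresentation objective function of this paper. The function names, the argument names and the framing (a motif present in `K` of `N` background sequences and in `k` of `n` selected sequences) are assumptions made for the example.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """Exact P(X >= k) for a hypergeometric draw.

    Assumed setup (hypothetical, for illustration): N background sequences,
    K of which contain the motif; n sequences are selected without
    replacement and X counts how many of them contain the motif.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): sampling with replacement."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Made-up numbers: 6000 background sequences, 300 containing the motif,
    # 20 selected sequences, 5 of which contain the motif.
    N, K, n, k = 6000, 300, 20, 5
    exact = hypergeom_tail(k, n, K, N)
    approx = binom_tail(k, n, K / N)
    print(f"hypergeometric tail: {exact:.6g}")
    print(f"binomial approximation: {approx:.6g}")
```

When the selected set is small relative to the background (`n` much smaller than `N`), sampling without replacement behaves almost like sampling with replacement, which is why the binomial tail with `p = K / N` tracks the exact value closely while being cheaper to evaluate.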