Table 3 Data Sets

From: A specialized learner for inferring structured cis-regulatory modules

| Data Set | Organism | Description |
| --- | --- | --- |
| Lee et al. | S. cerevisiae | 25 sets of genes with strong evidence (p-value ≤ 0.01) from the genome-wide location analysis of Lee et al. [15] that a specific pair of regulators binds to their upstream regions. This is a recreation of the data sets used by Segal et al. [2]. For each data set, we use 100 yeast promoters chosen at random as negative examples. |
| Gasch et al. | S. cerevisiae | Three sets of genes associated with the environmental stress response (ESR) in yeast, described in [16]. We use promoter sequences from non-ESR yeast genes as negative examples. |
| Sinha et al.-Yeast | S. cerevisiae | A set of six yeast sequences where MCM1 and MATα2 are known to bind, described in Sinha et al. [3]. As negative examples, we use nine promoter sequences that contain binding sites for either MCM1 or MATα2, but not both. |
| Sinha et al.-Fly | D. melanogaster | A set of eight fly genes associated with the gap gene system, described in Sinha et al. [3]. We use 10 kb promoter sequences, with 100 promoter sequences selected at random from the fly genome as negative examples. |

Summary of the data sets on which we test our algorithms.
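
Several of the data sets above draw their negative examples at random from a pool of promoters. The following is a minimal sketch of that sampling step, not the authors' procedure. The function name `sample_negatives`, the gene identifiers, the fixed seed, and the choice to exclude the positive genes from the pool are all assumptions made for illustration; the table does not state how the random draw was performed.

```python
import random


def sample_negatives(promoters, positives, n=100, seed=0):
    """Draw n promoter IDs at random to serve as negative examples."""
    # Assumption: positive genes are removed from the candidate pool.
    # The table does not say whether the original sampling did this.
    pool = sorted(set(promoters) - set(positives))
    if len(pool) < n:
        raise ValueError("not enough candidate promoters to sample from")
    # A seeded generator keeps the draw reproducible across runs.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no promoter repeats.
    return rng.sample(pool, n)


# Hypothetical gene identifiers, for illustration only.
all_promoters = [f"gene{i:04d}" for i in range(1000)]
positive_set = all_promoters[:25]
negatives = sample_negatives(all_promoters, positive_set)
```

Sorting the pool before sampling makes the result depend only on the seed and the inputs, not on set iteration order.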