A specialized learner for inferring structured cis-regulatory modules

BMC Bioinformatics

Table 1 Train Function

TRAIN(trainset, aspects, phases, metric, K)
    queue ← {NULL_SOLUTION}
    CRM ← NULL_SOLUTION
    for phase ∈ phases
        while queue is not empty
            current ← POP(queue)
            for each applicable CRM change in aspects allowed in phase
                alt ← APPLY(change, current)
                if there is a sufficiently low χ² test probability that the trainset
                        predictions of current, alt are from the same distribution
                    then insert alt into queue
            sort queue by metric
            limit queue to K solutions
            if current has a better score than CRM given trainset, metric
                then CRM ← current
        repopulate queue with the best K solutions from phase
    return CRM

The Train function takes: trainset, a set of labeled DNA sequences; aspects, a list of CRM aspects that can be included (e.g., the maximum number of binding sites, or whether distance constraints are allowed); phases, a list of phases, each specifying the set of model changes allowed in that phase; metric, a CRM model scoring metric; and K, the maximum queue size (beam width). For each phase, Train searches from the current list of solutions by applying the changes allowed in that phase. It returns the best CRM model it finds.
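
The search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: a CRM is reduced to a set of required motifs, a "change" is adding one motif, a phase is the set of motifs that may be added, the χ² test is a 2×2 homogeneity test on predicted positive/negative counts with a hypothetical threshold alpha, and a seen set is added to avoid revisiting models. The names predict, accuracy, and chi2_p_value are likewise hypothetical.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Model = FrozenSet[str]        # toy stand-in for a CRM: a set of required motifs
Example = Tuple[str, bool]    # (DNA sequence, label)

def predict(model: Model, seq: str) -> bool:
    """Toy rule (assumption): positive if the sequence contains every motif."""
    return all(motif in seq for motif in model)

def accuracy(model: Model, trainset: Sequence[Example]) -> float:
    """Example metric: fraction of training sequences classified correctly."""
    return sum(predict(model, s) == y for s, y in trainset) / len(trainset)

def chi2_p_value(a: List[bool], b: List[bool]) -> float:
    """p-value of a 2x2 chi-square homogeneity test (1 degree of freedom)."""
    table = [[sum(a), len(a) - sum(a)], [sum(b), len(b) - sum(b)]]
    rows = [len(a), len(b)]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = len(a) + len(b)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 d.o.f. is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def train(trainset: Sequence[Example], aspects: Sequence[str],
          phases: Sequence[Set[str]],
          metric: Callable[[Model, Sequence[Example]], float],
          k: int, alpha: float = 0.05) -> Model:
    null_solution: Model = frozenset()
    queue: List[Model] = [null_solution]
    best = null_solution
    for phase in phases:
        seen = set(queue)             # assumption: skip models already queued
        explored: List[Model] = []
        while queue:
            current = queue.pop(0)
            explored.append(current)
            cur_pred = [predict(current, s) for s, _ in trainset]
            for motif in aspects:
                if motif not in phase or motif in current:
                    continue          # change not allowed or not applicable
                alt = current | {motif}
                if alt in seen:
                    continue
                alt_pred = [predict(alt, s) for s, _ in trainset]
                # Keep alt only if its predictions differ significantly.
                if chi2_p_value(cur_pred, alt_pred) < alpha:
                    queue.append(alt)
                    seen.add(alt)
            queue.sort(key=lambda m: metric(m, trainset), reverse=True)
            del queue[k:]             # beam width limit
            if metric(current, trainset) > metric(best, trainset):
                best = current
        # Seed the next phase with the best K solutions found in this one.
        explored.sort(key=lambda m: metric(m, trainset), reverse=True)
        queue = explored[:k]
    return best

# Toy data (fabricated for illustration only): positives carry both motifs.
trainset = ([("ACTATAGGGATACC", True)] * 20
            + [("ACTATAGGCCCCCC", False)] * 10
            + [("CCCCCCGGGATACC", False)] * 10)
best = train(trainset, ["TATA", "GATA"], [{"TATA"}, {"TATA", "GATA"}],
             accuracy, k=2)
print(sorted(best))
```

In this toy setting the first phase may only add the motif TATA, while the second phase may add either motif, mirroring how each phase restricts the set of allowed model changes.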

ISSN: 1471-2105