
Table 1 Train Function

From: A specialized learner for inferring structured cis-regulatory modules

TRAIN(trainset, aspects, phases, metric, K)

1 queue ← {NULL_SOLUTION}

2 CRM ← NULL_SOLUTION

3 for phase ∈ phases

4     while queue is not empty

5        current ← POP(queue)

6        for each applicable CRM change in aspects allowed in phase

7           alt ← APPLY(change, current)

8           if there is a sufficiently low χ² test probability that the trainset

9              predictions of current, alt are from the same distribution

10              then insert alt into queue

11                 sort queue by metric

12                 limit queue to K solutions.

13        if current has a better score than CRM given trainset, metric

14           then CRM ← current

15     repopulate queue with the best K solutions from phase

16 return CRM

  1. The TRAIN function takes: trainset, a set of labeled DNA sequences; aspects, a list of CRM aspects that may be included (e.g. the maximum number of binding sites, or whether distance constraints are allowed); phases, a list of phases, each specifying the set of model changes allowed in it; metric, a CRM model scoring metric; and K, a maximum queue size (beam width). For each phase, TRAIN searches from the current list of solutions by applying the changes allowed in that phase. It returns the best CRM model it finds.
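The phased beam search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The callables `propose` and `differs`, the `null_solution` argument, and the toy motif-set model are all hypothetical. `differs` is a crude stand-in for the χ² test, and "the best K solutions from phase" is read here as the best K models expanded during that phase.

```python
def train(trainset, aspects, phases, metric, k, propose, differs, null_solution):
    """Phased beam search following the TRAIN pseudocode (illustrative sketch)."""
    def score(model):
        return metric(model, trainset)

    queue = [null_solution]
    crm = null_solution
    for phase in phases:
        explored = []  # every model popped during this phase
        while queue:
            current = queue.pop(0)
            explored.append(current)
            # propose() yields each model reachable by one allowed change.
            for alt in propose(current, aspects, phase):
                # Assumption: differs() replaces the chi-squared test.
                if differs(current, alt, trainset):
                    queue.append(alt)
                    queue.sort(key=score, reverse=True)
                    del queue[k:]  # keep at most k candidates (beam width)
            if score(current) > score(crm):
                crm = current
        # Assumption: seed the next phase with the k best explored models.
        queue = sorted(explored, key=score, reverse=True)[:k]
    return crm


# Toy model (hypothetical): a frozenset of motifs; a sequence is predicted
# positive only if it contains every motif in the set.
def predict(model, seq):
    return all(motif in seq for motif in model)

def accuracy(model, trainset):
    return sum(predict(model, s) == y for s, y in trainset) / len(trainset)

def propose(current, aspects, phase):
    if len(current) >= aspects["max_sites"]:
        return
    for motif in phase:  # here a phase is simply the motifs it may add
        if motif not in current:
            yield current | {motif}

def differs(current, alt, trainset):
    # Accept alt if its predictions change on at least one training sequence.
    return any(predict(current, s) != predict(alt, s) for s, _ in trainset)


trainset = [
    ("ACGTTGCA", True),
    ("ACGTAAAA", False),
    ("TTTTTGCA", False),
    ("GGGGGGGG", False),
    ("ACGTCCTGCA", True),
]
best = train(
    trainset,
    aspects={"max_sites": 2},
    phases=[["ACGT", "GGGG"], ["TGCA"]],
    metric=accuracy,
    k=2,
    propose=propose,
    differs=differs,
    null_solution=frozenset(),
)
```

In this toy run the first phase can only add `ACGT` or `GGGG`, and the second phase refines the surviving candidates with `TGCA`. The inner loop ends only because `max_sites` bounds how far a model can grow; a real set of aspects would need a comparable bound.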