A specialized learner for inferring structured cis-regulatory modules

BMC Bioinformatics

Table 2 Select-Train Function

SELECT-TRAIN(trainset, tuneset, aspects, phases, metric, K)
  CRM ← TRAIN(trainset, aspects, phases, metric, K)
  repeat
    unjustified_aspects ← { }
    for aspect ∈ aspects
      alt_CRM ← TRAIN(trainset, aspects – aspect, phases, metric, K)
      if there is not a sufficiently low χ² test probability that the tuneset predictions of CRM, alt_CRM are from the same distribution or CRM scores better on tuneset than alt_CRM
        then unjustified_aspects ← unjustified_aspects ∪ aspect
    aspects ← highest scoring set resulting from removing one of unjustified_aspects based on tuneset
    CRM ← alt_CRM associated with these aspects
  until unjustified_aspects is empty
  final_CRM ← TRAIN(trainset + tuneset, aspects, phases, metric, K)
  return final_CRM

The Select-Train algorithm takes trainset, a set of labeled DNA sequences; tuneset, held-aside evaluation data; aspects, a list of CRM aspects to consider; and phases, metric, and K, which are arguments to the Train algorithm. It removes from the original list those aspects that a statistical test on the tuning set shows do not contribute. Finally, it returns a CRM trained on all the data (training and tuning sets combined), using the chosen CRM aspects.
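The selection loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The callables train, score, and differs are hypothetical stand-ins: train wraps the Train algorithm with phases, metric, and K already fixed; score evaluates a model on the tuning set; and differs plays the role of the χ² test. The sketch adopts one reading of the removal condition, namely that an aspect is kept only when removing it changes the tuning-set predictions significantly and the model that includes it scores better. The toy models and scores at the bottom are invented purely to exercise the loop.

```python
def select_train(trainset, tuneset, aspects, train, score, differs):
    """Backward elimination over CRM aspects (illustrative sketch).

    train(data, aspects) -> model       hypothetical wrapper around TRAIN
    score(model, tuneset) -> float      higher is better
    differs(a, b, tuneset) -> bool      True if predictions differ significantly
    """
    aspects = frozenset(aspects)
    crm = train(trainset, aspects)
    while True:
        # Map each unjustified aspect to the model trained without it.
        alternatives = {}
        for aspect in aspects:
            alt_crm = train(trainset, aspects - {aspect})
            # Assumed reading: justified only if significant AND better.
            justified = (differs(crm, alt_crm, tuneset)
                         and score(crm, tuneset) > score(alt_crm, tuneset))
            if not justified:
                alternatives[aspect] = alt_crm
        if not alternatives:
            break
        # Drop the one unjustified aspect whose removal scores highest.
        dropped = max(alternatives, key=lambda a: score(alternatives[a], tuneset))
        aspects = aspects - {dropped}
        crm = alternatives[dropped]
    # Retrain on all available data with the surviving aspects.
    return train(trainset + tuneset, aspects)


# Toy stand-ins (assumptions for illustration only).
USEFUL = frozenset({"order", "distance"})

def toy_train(data, aspects):
    # A "model" is just the aspect set plus the amount of training data.
    return (frozenset(aspects), len(data))

def toy_score(model, tuneset):
    # Useful aspects help; useless ones cost a little.
    return len(model[0] & USEFUL) - 0.1 * len(model[0] - USEFUL)

def toy_differs(a, b, tuneset):
    # Crude threshold standing in for a chi-squared test.
    return abs(toy_score(a, tuneset) - toy_score(b, tuneset)) >= 0.5

final_crm = select_train(["s1", "s2", "s3"], ["s4", "s5"],
                         ["order", "distance", "strand"],
                         toy_train, toy_score, toy_differs)
print(sorted(final_crm[0]), final_crm[1])
```

In this toy run the "strand" aspect is dropped, because removing it does not change the score by enough to count as significant. The final model is then retrained on all five sequences.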

ISSN: 1471-2105