A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

Seitzer, Phillip; Wilbanks, Elizabeth G; Larsen, David J; Facciotti, Marc T

doi:10.1186/1471-2105-13-317

BMC Bioinformatics

Table 2 Ability to discover the TFB motif in ChIP-seq data sets

Motif-finding strategy	TP	TN	FN	Sensitivity	Specificity	F-measure
Theoretical best	28	8	0	1.00	1.00	1.00
MEME ZOOPS	5.5	8	22.5	0.20	1.00	0.33
Iterative MEME/MAST MC	16	8	12	0.57	1.00	0.73
Gibbs recursive	15	8	13	0.54	1.00	0.70
Iterative Gibbs/MAST MC	18	8	10	0.64	1.00	0.78
Iterative Gibbs/MAST MC +Ideal	23	8	5	0.82	1.00	0.90

Five groups of seven sequence data sets were constructed from the putative binding sites derived from the TfbB, TfbD, and TfbG ChIP-seq experiments. Three groups used sequence windows of different lengths surrounding each site (60 bp, 100 bp, and 200 bp); a fourth used 60-bp segments displaced 60 bp from the ChIP-seq sites (displaced); and a fifth was built by randomly shuffling binding sites from the seven TFB binding-site groups into new groups of equal size (shuffled). A data set of 126 60-bp segments from the Halobacterium sp. NRC-1 genome was generated as an additional control (random). Evaluating these 36 data sets with four alternative motif-finding strategies revealed distinct differences in each strategy's ability to discover the putative TFB motif. For all data sets except the random and displaced data sets, discovery of a strong match to the TFB motif was scored as a true positive (TP). Failure to discover a TFB motif in the random and displaced data sets was scored as a true negative (TN). A weak match to the TFB motif in any data set other than a random or displaced one was scored as half a TP (0.5). One hundred runs were carried out for each of the Iterative MEME/MAST MC and Iterative Gibbs/MAST MC strategies. In addition, a number of ‘ideal’ seeds were created artificially and were found to converge to the TFB motif (+Ideal). Given a sufficiently large number of runs, an ideal or near-ideal seed is expected to occur by chance, so the TFB motif would also be recovered in these cases. For both MEME and the Gibbs recursive sampler, applying the MotifCatcher extension significantly improved the finder's performance (sensitivity rose from 0.20 to 0.57 for MEME and from 0.54 to 0.64 for the Gibbs recursive sampler).
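The scoring rules in the caption, and how the Sensitivity, Specificity, and F-measure columns follow from the TP/TN/FN counts, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the outcome labels and the names `Tally`, `score`, and `metrics` are hypothetical; a motif hit in a control data set is assumed to count as a false positive (the table has no FP column and every strategy scores TN = 8); a weak match is assumed to add 0.5 to FN as well as 0.5 to TP, which is inferred from TP + FN = 28 in every row; and the F-measure is assumed to be the harmonic mean of precision and sensitivity, which reproduces each F-measure in the table because no false positives occur.

```python
from dataclasses import dataclass

# Hypothetical labels for the best match a strategy reports on one data set.
STRONG, WEAK, NONE = "strong", "weak", "none"


@dataclass
class Tally:
    tp: float = 0.0
    tn: float = 0.0
    fn: float = 0.0
    fp: float = 0.0  # assumption: a TFB motif reported in a control data set


def score(outcomes):
    """Tally (is_control, match) pairs, one pair per data set.

    is_control is True for the random and displaced data sets.
    """
    t = Tally()
    for is_control, match in outcomes:
        if is_control:
            # Controls: finding no TFB motif is the correct (TN) outcome.
            if match == NONE:
                t.tn += 1
            else:
                t.fp += 1
        elif match == STRONG:
            t.tp += 1
        elif match == WEAK:
            # Weak match: half a TP; the other half is counted as FN so that
            # TP + FN always equals the number of non-control data sets.
            t.tp += 0.5
            t.fn += 0.5
        else:
            t.fn += 1
    return t


def metrics(t):
    """Return (sensitivity, specificity, F-measure) for a tally."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    precision = t.tp / (t.tp + t.fp) if t.tp + t.fp else 0.0
    if precision + sensitivity == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure


# Non-control sets: 7 each in the 60 bp, 100 bp, 200 bp, and shuffled groups (28).
# Controls: 7 displaced data sets plus 1 random data set (8).
controls = [(True, NONE)] * 8

# Hypothetical breakdown that reproduces the MEME ZOOPS row (TP 5.5, FN 22.5).
meme = score([(False, STRONG)] * 5 + [(False, WEAK)] + [(False, NONE)] * 22 + controls)
print([round(x, 2) for x in metrics(meme)])  # [0.2, 1.0, 0.33]

# Iterative Gibbs/MAST MC row: 18 strong matches, 10 misses.
gibbs_mc = score([(False, STRONG)] * 18 + [(False, NONE)] * 10 + controls)
print([round(x, 2) for x in metrics(gibbs_mc)])  # [0.64, 1.0, 0.78]
```

The per-data-set breakdown for MEME ZOOPS is only one split consistent with the published counts (for example, four strong and three weak matches would give the same TP); the table reports totals, not individual outcomes.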


ISSN: 1471-2105
