Fig. 2 | BMC Bioinformatics

Fig. 2

From: DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data

Algorithm of DiNAMO for parameters L=4, d=4, p=0.05 and fixed position mode. The algorithm takes two input files, a positive file, \(\mathcal{P}\), and a negative file, \(\mathcal{N}\) (step 1). Here \(\mathcal{P}\) and \(\mathcal{N}\) each contain 8 sequences. The positive file \(\mathcal{P}\) contains 4 distinct L-mers, whose numbers of occurrences in \(\mathcal{P}\) and \(\mathcal{N}\) are stored in a hashtable (step 2). The hashtable is used in step 3 to construct the IUPAC lattice. We start from the 4 L-mers, which form the bottom level, and generate IUPAC motifs level by level. Each node at level i corresponds to an IUPAC motif for which all instances are present in the initial set of L-mers. A link between two nodes at levels i and i+1 indicates that the IUPAC motif at level i is a subset of the IUPAC motif at level i+1. We do not consider all IUPAC motifs. For example, there is no node for YAST, which could have been obtained from the combination of YACT and CAST, because the instance TAGT of YAST is not present in \(\mathcal{P}\). For each node, we also construct a contingency table using the counts from the hashtable, and we calculate its mutual information (MI). In step 4, the lattice is simplified in order to keep only the IUPAC motifs that maximize the MI. For example, the MI of TACT is higher than the MI of YACT, so we remove YACT. The final step consists of computing the Fisher’s exact test P-value in order to identify the significantly over-represented motifs.
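
The steps in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DiNAMO's actual implementation: the toy sequences and their counts are invented (they are not the data shown in the figure), fixed position mode is assumed to mean reading one L-mer at a given offset in each sequence, and all function names (`count_lmers`, `is_valid_node`, `contingency`, `mutual_information`, `fisher_over`) are hypothetical. The sketch only checks individual lattice nodes; it does not build or simplify the full lattice.

```python
from collections import Counter
from itertools import product
from math import comb, log2

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_lmers(seqs, L, offset=0):
    """Step 2 (assumed form): hashtable of the L-mer at a fixed offset."""
    return Counter(s[offset:offset + L] for s in seqs if len(s) >= offset + L)

def instances(motif):
    """All plain A/C/G/T L-mers matched by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in motif))]

def is_valid_node(motif, pos_counts):
    """Step 3: a motif gets a node only if every instance occurs in P."""
    return all(pos_counts[inst] > 0 for inst in instances(motif))

def contingency(motif, pos_counts, neg_counts, n_pos, n_neg):
    """2x2 table: (with motif in P, with motif in N, without in P, without in N)."""
    a = sum(pos_counts[i] for i in instances(motif))
    b = sum(neg_counts[i] for i in instances(motif))
    return a, b, n_pos - a, n_neg - b

def mutual_information(a, b, c, d):
    """MI (in bits) between motif presence and file membership."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    # Empty cells contribute nothing to the sum.
    return sum(x / n * log2(x * n / (row * col)) for x, row, col in cells if x)

def fisher_over(a, b, c, d):
    """One-sided Fisher's exact test P-value for over-representation in P."""
    n = a + b + c + d
    with_motif, positives = a + b, a + c
    upper = min(with_motif, positives)
    tail = sum(comb(with_motif, k) * comb(n - with_motif, positives - k)
               for k in range(a, upper + 1))
    return tail / comb(n, positives)

# Invented toy input: 8 positive and 8 negative sequences of length 4.
positive = ["TACT"] * 4 + ["CACT"] * 2 + ["CAGT"] + ["GGCA"]
negative = ["CACT"] * 2 + ["CAGT"] * 2 + ["AAAA"] * 4
pos_counts = count_lmers(positive, L=4)
neg_counts = count_lmers(negative, L=4)

for motif in ("TACT", "YACT", "CAST", "YAST"):
    if not is_valid_node(motif, pos_counts):
        print(motif, "has no node: some instance is absent from P")
        continue
    table = contingency(motif, pos_counts, neg_counts, len(positive), len(negative))
    print(motif, table, mutual_information(*table), fisher_over(*table))
```

With this toy input, YAST is rejected because its instance TAGT never occurs in the positive set, and TACT scores a higher MI than YACT, mirroring the two examples given in the caption. A real run would compare each node's MI against its neighbours in the lattice before applying the P-value threshold.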
