Figure 4 | BMC Bioinformatics

Figure 4

From: Hidden Markov Model Variants and their Application

The y-axis shows the mutual information between base pairs in the V. cholerae genome; the x-axis is the gap size between the bases used to construct the pairs. Mutual information, MI(X;Y), is: $\mu = \sum_x \sum_y p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$. If X and Y are independent random variables, then MI = 0. If we have a DNA sequence $x_1 \ldots x_i x_{i+1} x_{i+2} \ldots x_n$ (where $x_k \in \{a, c, g, t\}$), then we can get counts on pairs $x_i x_{i+1}$ for $i = 1, \ldots, n-1$, and, assuming stationarity of the data and large enough n, we can speak of the joint probability p(X,Y). Calculation of MI(X;Y) then gives an indication of the linkage between base probabilities in dinucleotide probabilities. This can be extended to linkages when the two bases aren't sequential (have a base gap between them greater than zero), such as pairs based on $x_i x_{i+2}$ (gap = 1), etc. This type of statistical framework can then be iterated to higher-order MI calculations in a variety of ways to explore a number of statistical linkages and build towards a motif identifier based on such linkages (gIMM). Such an analysis on the V. cholerae Chr. I genome, above, clearly indicates a three-component encoding of data, i.e., a 3-element codon structure is revealed. Furthermore, judging from the strong linkages for gaps 1 through 5, it is also clear that hexamer Markov model statistics will be strong in many regions of the genome.
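
The gapped mutual information calculation described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `gapped_mutual_information` is hypothetical, probabilities are estimated directly from pair counts (a simple plug-in estimate), and the logarithm base (2, giving bits) is an assumption, since the caption does not state one.

```python
import math
from collections import Counter


def gapped_mutual_information(seq, gap=0, base=2.0):
    """Estimate MI between bases x_i and x_(i+gap+1) from pair counts.

    gap=0 uses adjacent bases (dinucleotides); gap=1 skips one base, etc.
    The log base is an assumption (base 2 gives bits).
    """
    step = gap + 1
    # Collect every pair of bases separated by `gap` intervening bases.
    pairs = [(seq[i], seq[i + step]) for i in range(len(seq) - step)]
    if not pairs:
        raise ValueError("sequence too short for this gap")
    n = len(pairs)
    joint = Counter(pairs)                  # counts for p(x, y)
    left = Counter(x for x, _ in pairs)     # counts for p(x)
    right = Counter(y for _, y in pairs)    # counts for p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (left * right).
        ratio = count * n / (left[x] * right[y])
        mi += p_xy * math.log(ratio, base)
    return mi
```

Under these assumptions, a profile like the one plotted in the figure could be obtained by evaluating `gapped_mutual_information(genome, gap=g)` for each gap `g` of interest, where `genome` is a string over the alphabet a, c, g, t.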
