Figure 1
From: Clustering metagenomic sequences with interpolated Markov models

Markov models. In a standard w-th order Markov chain model, the next base b in the DNA sequence is assigned a probability conditioned on the previous w bases (underlined above for w = 6). w should be chosen so that the data contain a sufficient number of instances of all 4^w substrings of length w. An interpolated Markov model (IMM) uses all of the Markov models from order 0 to w and computes the probability of the next base by interpolating among them. Our version of the IMM takes this a step further: rather than using the w immediately preceding positions, we use the most "informative" positions (shown above with arrows) among the previous w, selected by a recursive mutual information calculation.
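To make the interpolation idea in the caption concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`train_counts`, `order_k_prob`, `interpolated_prob`), the pseudocount, and the count-based weight `n / (n + strength)` are all hypothetical choices made here, and the sketch always uses the immediately preceding bases rather than the mutual-information position selection the caption describes.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(seqs, w):
    """Count next-base occurrences after every context of length 0..w.

    counts[k][context][base] is the number of times `base` followed the
    length-k string `context` in the training sequences.
    """
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(w + 1)]
    for seq in seqs:
        for i, base in enumerate(seq):
            for k in range(min(i, w) + 1):
                counts[k][seq[i - k:i]][base] += 1
    return counts


def order_k_prob(counts, k, context, base, pseudo=1.0):
    """Fixed order-k estimate of P(base | context), with a pseudocount."""
    ctx = counts[k].get(context, {})
    total = sum(ctx.values())
    return (ctx.get(base, 0) + pseudo) / (total + pseudo * len(ALPHABET))


def interpolated_prob(counts, history, base, w, strength=10.0):
    """Blend the order-0..w estimates, trusting longer contexts more
    when they were observed more often.

    The weight lam = n / (n + strength) is an assumed, illustrative
    scheme, not the weighting used in the paper.
    """
    prob = order_k_prob(counts, 0, "", base)
    for k in range(1, min(w, len(history)) + 1):
        context = history[-k:]
        n = sum(counts[k].get(context, {}).values())
        lam = n / (n + strength)
        prob = lam * order_k_prob(counts, k, context, base) + (1 - lam) * prob
    return prob


if __name__ == "__main__":
    counts = train_counts(["ACGTACGTACGT"], w=2)
    for b in ALPHABET:
        print(b, round(interpolated_prob(counts, "AC", b, w=2), 4))
```

Because each order-k estimate is a proper distribution over the four bases and the blend is a convex combination, the interpolated probabilities still sum to 1. A rarely seen long context gets a weight near 0, so the estimate falls back toward the lower-order models, which is the motivation for interpolating rather than fixing a single w.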