Figure 1 | BMC Bioinformatics

Figure 1

From: Gene finding in metatranscriptomic sequences

The Hidden Markov Model employed in TransGeneScan. The model consists of 9 super-states in two modules: 4 for the sense (coding) strand (top module), representing coding regions (i), start codons (ii), stop codons (iii) and un-translated regions (iv), respectively; and 5 for the antisense strand (bottom module), representing start codons (v), stop codons (vi), coding regions (vii), and un-translated regions (viii and ix), respectively. The un-translated regions in the antisense strand are represented as two distinct states, one for the 5' un-translated region and one for the 3' un-translated region, to prohibit the transition from the coding regions in one gene to those in another (because antisense transcripts are often part of a gene on the opposite strand). Furthermore, an idle start state is used to ensure that the annotation (hidden state) sequence can only initiate from the un-translated regions in the positive strand (but can initiate from any state in the negative strand). Transitions from the hidden states in one strand to the states in the other strand are prohibited. Each of the two super-states for coding regions (i and vii) consists of six consecutive match states (M1 to M6, and M1- to M6-, respectively) represented by diamonds, which collectively correspond to a six-periodic inhomogeneous HMM. Compared with the HMM used in FragGeneScan [22], this model contains no insertion or deletion states, based on the assumption that transcripts assembled from metatranscriptomic sequences contain no frameshift errors.
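
The constraints stated in the caption can be sketched in a few lines of Python. This is only an illustration, not TransGeneScan's implementation: the names `SENSE`, `ANTISENSE`, `allowed_initial_states`, `match_states` and `check_path` are hypothetical, and the sketch encodes only what the caption states (strand membership, the initial-state rule, the ban on cross-strand transitions, and the six match states per coding super-state). The within-strand transitions drawn in the figure are not modelled here.

```python
# Super-states as labelled in the caption, grouped by strand.
# All identifiers below are illustrative, not taken from TransGeneScan.
SENSE = {
    "i": "coding region",
    "ii": "start codon",
    "iii": "stop codon",
    "iv": "un-translated region",
}
ANTISENSE = {
    "v": "start codon",
    "vi": "stop codon",
    "vii": "coding region",
    "viii": "un-translated region",
    "ix": "un-translated region",
}

# Map each super-state to its strand ("+" sense, "-" antisense).
STRAND = {**{s: "+" for s in SENSE}, **{s: "-" for s in ANTISENSE}}


def allowed_initial_states():
    """States reachable from the idle start state, per the caption:
    only the un-translated region on the sense strand, but any state
    on the antisense strand."""
    return {"iv"} | set(ANTISENSE)


def match_states(super_state):
    """Names of the six consecutive match states inside a coding super-state."""
    if super_state == "i":
        return [f"M{k}" for k in range(1, 7)]
    if super_state == "vii":
        return [f"M{k}-" for k in range(1, 7)]
    raise ValueError(f"{super_state!r} is not a coding super-state")


def check_path(path):
    """Return None if a super-state path respects the two caption-level
    constraints checked here, otherwise a short reason string.
    Within-strand topology is deliberately NOT checked."""
    if not path:
        return None
    if path[0] not in allowed_initial_states():
        return f"path cannot start in state {path[0]}"
    for prev, cur in zip(path, path[1:]):
        if STRAND[prev] != STRAND[cur]:
            return f"cross-strand transition {prev} -> {cur}"
    return None
```

For example, `check_path(["iv", "ii", "i", "iii", "iv"])` passes both checks, whereas a path that begins in state i, or one that moves from iv to v, is rejected.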
