In its quest for survival, the bacterium *Staphylococcus aureus* secretes *α*-hemolysin monomers that bind to the outer membrane of susceptible cells, where seven such units can oligomerize to form a water-filled transmembrane channel [1–4]. The channel can kill the target cell by rapidly discharging vital molecules (such as ATP) and disturbing the membrane potential.

Suspended in a lipid bilayer [see Additional File 1], the *α*-hemolysin channel becomes a sensor when large molecules interact with the nanopore and modulate the otherwise uniform ionic flow through the channel. Driven by the transmembrane potential, single-stranded DNA or RNA molecules translocate through the nanopore [5, 6], while more complex hairpins either unzip and translocate [7, 8] or toggle in the channel's vestibule [8, 9] [see Additional File 1]. The durations of ionic flow blockade events in these experiments are important signatures of the composition of the interacting nucleic acid fragments [7, 10] or, in certain cases, characterize the molecular length [11].

Two distinct approaches to duration modelling have been proposed for the HMM framework by the speech recognition community: *explicit* duration modelling, which is normally implemented with histograms or parametric distributions, and *implicit* duration modelling, which is based on a set of geometrically distributed self-recurring nodes [12]. The most common way of implementing an explicit duration model is the Generalized Hidden Markov Model (GHMM), where each state can emit more than one symbol at a time [13]. Following [14], the optimal GHMM parse can be expressed by the following equation

$\begin{array}{lll}{\phi}_{optimal} & = & \underset{\phi}{\mathrm{arg\,max}}\; P(\phi |S) \\ & = & \underset{\phi}{\mathrm{arg\,max}}\; \frac{P(\phi ,S)}{P(S)} \\ & = & \underset{\phi}{\mathrm{arg\,max}}\; P(\phi ,S) \\ & = & \underset{\phi}{\mathrm{arg\,max}}\; P(S|\phi )P(\phi ) \\ & = & \underset{\phi}{\mathrm{arg\,max}}\; {\displaystyle \prod _{i=1}^{n}{P}_{e}({S}_{i}|{q}_{i},{d}_{i})\,{P}_{t}({q}_{i}|{q}_{i-1})\,{P}_{d}({d}_{i}|{q}_{i})} \end{array}$

(1)

where *ϕ* is a parse of the sequence consisting of a series of states *q*_{i} and state durations *d*_{i}, 0 ≤ *i* ≤ *n*, with each state *q*_{i} emitting a subsequence *S*_{i} of length *d*_{i}, so that the concatenation *S*_{0}*S*_{1} ... *S*_{n} produces the complete output sequence *S*. *P*_{e}(*S*_{i}|*q*_{i}, *d*_{i}) denotes the probability that state *q*_{i} emits subsequence *S*_{i} of duration *d*_{i}, *P*_{t}(*q*_{i}|*q*_{i-1}) is the GHMM transition probability from state *q*_{i-1} to state *q*_{i}, and *P*_{d}(*d*_{i}|*q*_{i}) is the probability that state *q*_{i} has duration *d*_{i}. The primary objective expressed in (1) is to combine the probability returned by a probabilistic content model (such as an HMM) with the duration probability to find the optimal parse. Both the GHMM implementation and the HMM-with-Duration approach mentioned in [15] require an explicit duration histogram to be assigned before Viterbi decoding can be run.
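To make the product in (1) concrete, the following minimal Python sketch scores a single candidate parse in log space. It is an illustration under stated assumptions, not the implementation used in this study: the emission term *P*_{e} is assumed to factorize over independent per-symbol emissions, and the state names (`open`, `blocked`), the symbols (`H`, `L`), all probability values and the helper names are hypothetical. One state uses the geometric duration implied by a self-recurring node (the implicit model) and the other uses an explicit duration histogram.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else float("-inf")


def geometric_duration(self_loop_prob):
    """Duration pmf implied by a self-recurring node: P(d) = a^(d-1) * (1 - a)."""
    def pmf(d):
        if d < 1:
            return 0.0
        return self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)
    return pmf


def histogram_duration(counts):
    """Explicit duration pmf built from a histogram {duration: count}."""
    total = sum(counts.values())

    def pmf(d):
        return counts.get(d, 0) / total
    return pmf


def parse_log_score(parse, signal, emission, transition, duration, start="start"):
    """Log of the product in Eq. (1) for one parse.

    parse      : list of (state, duration) pairs covering the whole signal
    emission   : emission[state][symbol] -> per-symbol probability (assumed i.i.d.)
    transition : transition[previous_state][state] -> P_t
    duration   : duration[state] -> callable d -> P_d(d | state)
    """
    if sum(d for _, d in parse) != len(signal):
        raise ValueError("state durations must sum to the signal length")
    log_p, pos, prev = 0.0, 0, start
    for state, d in parse:
        log_p += _log(transition[prev].get(state, 0.0))   # P_t(q_i | q_{i-1})
        log_p += _log(duration[state](d))                 # P_d(d_i | q_i)
        for symbol in signal[pos:pos + d]:                # P_e(S_i | q_i, d_i)
            log_p += _log(emission[state][symbol])
        pos += d
        prev = state
    return log_p


# Hypothetical toy model: "H" = high current, "L" = low (blockaded) current.
emission = {"open": {"H": 0.9, "L": 0.1}, "blocked": {"H": 0.2, "L": 0.8}}
transition = {
    "start": {"open": 0.5, "blocked": 0.5},
    "open": {"blocked": 1.0},
    "blocked": {"open": 1.0},
}
duration = {
    "open": geometric_duration(0.5),                   # implicit (geometric) model
    "blocked": histogram_duration({2: 1, 3: 2, 4: 1}), # explicit histogram
}

signal = "HHLLLH"
parse_a = [("open", 2), ("blocked", 3), ("open", 1)]
parse_b = [("open", 3), ("blocked", 2), ("open", 1)]
score_a = parse_log_score(parse_a, signal, emission, transition, duration)
score_b = parse_log_score(parse_b, signal, emission, transition, duration)
```

In this toy setting `parse_a`, whose segment boundaries coincide with the changes in the signal, receives a higher score than `parse_b`. A full decoder would search over all parses (for example with a duration-aware Viterbi recursion) rather than compare two hand-picked candidates.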

When classifying a single DNA base pair from nanopore ionic flow blockade signals [16], we frequently have to deal with a sequence of blockades resulting from complex molecular interactions with unknown states. For this reason, we are interested in *de novo* learning of the emission content and duration distributions corresponding to these stationary blockade states. In this study we investigate several approaches to the problem of learning duration and content sensors in the context of nanopore ionic flow blockade analysis.