Table 3 Comparison of oligo kernels (OK) with inhomogeneous Markov models of order 0 (MM1) and order 1 (MM2), based on monomer and dimer occurrences, respectively. All higher-order Markov models led to a severe breakdown in performance, with the error rising to ≈ 30 percent. To stress the importance of position information, the comparison also includes the best spectrum kernel (SK, listed as SP2) among the position-independent oligo kernels (σ → ∞) with K = 1,...,6. The table shows the mean classification error on the test sets, in percent, averaged over 50 runs on randomly partitioned data; medians are given in parentheses. The lowest classification error is achieved by the combined oligo kernel OK1...6, obtained by simply adding the kernels for lengths 1,...,6. It is closely followed by the best single-length kernel, the trimer kernel OK3, which still performs better than the two Markov-model-based methods. The "best" position-independent kernel SP2, based on dimer occurrences, performs worst, only slightly better than classification by chance.

From: Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites

| method | OK3 | OK1...6 | MM1 | MM2 | SP2 |
| --- | --- | --- | --- | --- | --- |
| mean (median) error (%) | 8.9 (8.7) | 8.1 (7.8) | 11.4 (11.4) | 11.3 (11.4) | 44.6 (44.9) |
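
The caption's two constructions, the position-independent limit σ → ∞ and the combined kernel formed by adding the kernels for lengths 1,...,6, can be illustrated with a minimal Python sketch. It assumes the oligo kernel takes the Gaussian position-smoothing form, a sum of exp(-(p - q)² / (4σ²)) over all pairs of positions p, q at which the same K-mer occurs in the two sequences. The σ-dependent prefactor is dropped here so that the limit σ → ∞ stays finite. The function names and toy sequences are hypothetical and are not taken from the article.

```python
import math
from collections import defaultdict


def oligo_positions(seq, k):
    """Map each k-mer in seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def oligo_kernel(s, t, k, sigma):
    """Unnormalised oligo kernel for k-mers (assumed Gaussian form).

    Each pair of occurrences of the same k-mer contributes
    exp(-(p - q)^2 / (4 sigma^2)). With sigma = math.inf every pair
    contributes 1, so the value reduces to a product of k-mer counts,
    i.e. a position-independent spectrum-type kernel.
    """
    pos_s = oligo_positions(s, k)
    pos_t = oligo_positions(t, k)
    total = 0.0
    for kmer, plist in pos_s.items():
        for p in plist:
            for q in pos_t.get(kmer, ()):
                if math.isinf(sigma):
                    total += 1.0
                else:
                    total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return total


def combined_oligo_kernel(s, t, lengths=range(1, 7), sigma=1.0):
    """Combined kernel: plain sum of the single-length kernels."""
    return sum(oligo_kernel(s, t, k, sigma) for k in lengths)


if __name__ == "__main__":
    a, b = "AC", "CA"
    # Same monomers, shifted by one position: a small sigma penalises the
    # shift, while sigma -> infinity ignores positions entirely.
    print(oligo_kernel(a, b, k=1, sigma=0.1))
    print(oligo_kernel(a, b, k=1, sigma=math.inf))
    print(combined_oligo_kernel("ACGT", "ACGT"))
```

In this sketch a small σ makes the kernel sensitive to where an oligomer occurs, whereas σ → ∞ only counts shared oligomers, which is the distinction the table is meant to highlight. In practice the kernel values would be passed to a kernel classifier as a precomputed Gram matrix; that step is omitted here.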