EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance

BMC Bioinformatics

Table 2 Specificity, sensitivity and precision estimates for different gene finders in E. coli.

Data set	EasyGene	Glim	rbs-Glim	Orpheus	Gm24	GmS	Gmhmm	Frame
A'-% found	98.4	98.9/98.9	98.9	98.0/95.3	91.5	97.2	98.1	97.0
A'-% exact	93.8	98.9/95.3	84.1	95.1/92.4	41.6	88.0	85.7	93.2
B'-% found	98.4	98.5/98.6	98.6	95.9/96.5	90.2	96.6	97.2	96.4
T-% found	98.1(98.0)	98.3/98.4	98.4	96.5/95.6	89.8	96.3	97.1	96.1
Genome	4145	6827/5756	5756	9333/7543	3552	4064	4230	4064
zero order	7	169/211	211	6761/5430	6	153	1459	0
first order	7	545/723	723	6836/4804	13	241	830	0
third order	1	2423/2694	2694	6582/4817	43	659	866	1
shadows	0	19/21	21	22/9	1	0	2	0

The upper part shows, for different gene finders and sets of high-confidence genes in E. coli, the percentage of genes found exactly (both 5' and 3' ends correct; rows "% exact") and at least partially (only the 3' end required to be correct; rows "% found"). For Glimmer and Orpheus, the numbers before the "/" are based exclusively on their ORF scores and recommended thresholds, whereas the numbers after the "/" are obtained after their post-processing procedures. The "Genome" row gives the number of genes predicted in the whole genome, which should be compared to the 4288 annotated genes in E. coli. The lower part shows the number of false positives predicted in random sequences generated by Markov chains of order 0, 1 and 3; the last row gives the number of false predictions in the shadows of the high-confidence genes in data set A. All values listed for EasyGene are based on an R-value threshold of R = 2.
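
The two match criteria used in the upper part of the table can be illustrated with a minimal sketch. It is not the evaluation code used for the table; the `Gene` record, its field names and the function `found_and_exact` are hypothetical, and the sketch assumes that a gene counts as "found" when some prediction on the same strand shares its 3' end, and as "exact" when that prediction also shares its 5' end.

```python
from typing import Iterable, NamedTuple, Tuple


class Gene(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    strand: str       # '+' or '-'
    five_prime: int   # coordinate of the start codon (5' end)
    three_prime: int  # coordinate of the stop codon (3' end)


def found_and_exact(reference: Iterable[Gene],
                    predicted: Iterable[Gene]) -> Tuple[float, float]:
    """Return (% found, % exact) of the reference genes.

    Assumption: "found" means a prediction on the same strand has the
    same 3' end; "exact" means it has the same 5' end as well.
    """
    reference = list(reference)
    if not reference:
        raise ValueError("reference set is empty")

    # Index predicted 5' ends by (strand, 3' end) for fast lookup.
    starts_by_stop = {}
    for p in predicted:
        starts_by_stop.setdefault((p.strand, p.three_prime), set()).add(p.five_prime)

    found = exact = 0
    for g in reference:
        starts = starts_by_stop.get((g.strand, g.three_prime))
        if starts is None:
            continue          # 3' end not predicted: gene is missed
        found += 1            # 3' end matches
        if g.five_prime in starts:
            exact += 1        # 5' end matches too

    n = len(reference)
    return 100.0 * found / n, 100.0 * exact / n


# Toy data (made up, not taken from the E. coli sets).
reference = [
    Gene('+', 100, 400),
    Gene('+', 500, 800),
    Gene('-', 1500, 1200),
    Gene('+', 2000, 2300),
]
predicted = [
    Gene('+', 100, 400),    # both ends match
    Gene('+', 530, 800),    # 3' end matches only
    Gene('+', 1500, 1200),  # wrong strand, so not counted
]
pct_found, pct_exact = found_and_exact(reference, predicted)
```

Under these assumptions every exactly predicted gene is also counted as found, so "% exact" can never exceed "% found" for the same data set.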

ISSN: 1471-2105