Figure 2 | BMC Bioinformatics

Figure 2

From: Re-annotation of genome microbial CoDing-Sequences: finding new genes and inaccurately annotated genes


Assignment of a status to additional CDSs. A. The annotated Genes Not Found by the AMIGA method (CDSd). B. The potential AMIGA New Genes (CDSa). The procedure takes into account the length of each CDS, its coding probability, the results of a similarity search against the non-redundant protein databank, and overlaps between adjacent CDSs, where one is an AMIGA CDS (CDSa) and the other a databank CDS (CDSd) (see text). Although the procedure examines all situations, some paths are clearly preferred (thick arrows). For example, a CDSa from the lst-NG>=Sure-Pc list is often found with no overlap with a CDSd. In this case, the CDSa is often shorter than 300 bp and has either no similarity (AMBIGUOUS status) or similarity (NEW status) with proteins in the databank. When a CDSa does overlap a CDSd, the CDSd often has a weak coding probability and no similarity with proteins in the databank; in this case, the CDSa receives the NEW status. In contrast, it is extremely rare to find a CDSa from the lst-NG>=Sure-Pc list overlapping a CDSd that has a strong coding probability, with a large overlap between the two CDSs (broken arrows). For A. pernix and P. horikoshii, the CDSd length threshold was set to 600 bp instead of 300 bp. This choice reflects the annotation procedure used by the authors of these genome sequences (see text). (L) length; (Pc) coding probability; (lst-NG>=Sure-Pc) list of CDSa with a coding probability above 0.4; (lst-GNF<Min-Pc) list of CDSd with a coding probability below 0.2.
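The preferred paths described for panel B can be written as a small decision function. This is a minimal sketch under stated assumptions, not the AMIGA implementation:

- It encodes only the thick-arrow cases named in this caption, together with the thresholds the caption quotes.
- The `CDS` record and all function names are hypothetical.
- Every situation the caption defers to the main text returns the placeholder label `UNRESOLVED`. This label is not a status used by the method.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the figure caption.
SURE_PC = 0.4        # lst-NG>=Sure-Pc: CDSa with coding probability above 0.4
MIN_PC = 0.2         # lst-GNF<Min-Pc: CDSd with coding probability below 0.2
SHORT_CDSA_BP = 300  # "length below 300 bp" on the no-overlap path


@dataclass
class CDS:
    """Hypothetical record holding the three features the caption mentions."""
    length_bp: int
    pc: float             # coding probability
    has_similarity: bool  # hit in the non-redundant protein databank


def in_sure_list(cdsa: CDS) -> bool:
    # The list name uses ">="; the caption text says "above 0.4".
    return cdsa.pc >= SURE_PC


def in_min_list(cdsd: CDS) -> bool:
    return cdsd.pc < MIN_PC


def cdsd_length_threshold(genome: str) -> int:
    # 600 bp for A. pernix and P. horikoshii, 300 bp otherwise (per the caption).
    return 600 if genome in {"A. pernix", "P. horikoshii"} else 300


def preferred_status(cdsa: CDS, overlapping_cdsd: Optional[CDS]) -> str:
    """Status along the thick-arrow paths only; 'UNRESOLVED' is a hypothetical placeholder."""
    if not in_sure_list(cdsa):
        return "UNRESOLVED"
    if overlapping_cdsd is None:
        # No overlap and a short CDSa: the status depends on databank similarity.
        if cdsa.length_bp < SHORT_CDSA_BP:
            return "NEW" if cdsa.has_similarity else "AMBIGUOUS"
        return "UNRESOLVED"
    # Overlap with a weakly coding CDSd that has no databank similarity.
    if in_min_list(overlapping_cdsd) and not overlapping_cdsd.has_similarity:
        return "NEW"
    return "UNRESOLVED"
```

For example, `preferred_status(CDS(250, 0.6, False), None)` follows the no-overlap path and yields `"AMBIGUOUS"`. The rare broken-arrow case, a CDSa overlapping a strongly coding CDSd, falls through to the placeholder because the caption does not say how it is resolved.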
