Fig. 3 | BMC Bioinformatics

Fig. 3

From: FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences

Comparison of the performance of predicted annotations in three model species: a–c A. thaliana, d–f O. sativa and g–i Z. mays. Annotation Edit Distance (AED) measures how well a predicted annotation agrees with the evidence and was used as a quality control metric. A value of 0 denotes complete agreement between two annotations, while a value of 1 denotes that the ‘gold standard’ reference annotation was not detected. Transcripts from the ‘gold standard’ reference annotations that were not detected in any of the predicted annotations were removed from the analysis. a, d, g Distribution of AED scores. Violin plots that are wider at the base indicate a high density of annotations with low AED. FINDER produced the gene models with the lowest AED, which results in a wide base. Gene models generated by FINDER were enhanced by adding predictions made by BRAKER and by including protein evidence. The Wilcoxon signed-rank test was used to compare the AED scores of FINDER with those of the other annotation pipelines. The “***” symbol indicates that the AED scores of FINDER gene models were significantly lower (p-value < 0.01) than the AED scores of the gene models reported by the other pipelines. b, e, h Bar plot of the F1 scores of the different annotation approaches. A high nucleotide F1 (Base F1) or a high exon F1 score is not sufficient to conclude that an annotation is good. A high transcript F1 score indicates good gene models with high sensitivity and high specificity. c, f, i Stacked bar plot showing the percentage of transcripts in each of the four AED groups. A higher number of transcripts with low AED denotes a better annotation. In each of the three species, FINDER generated a higher percentage of transcripts with low AED than the other annotation techniques. (Generated using ggplot2 v3.3.3)
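
The caption does not spell out how AED and F1 are computed. As a rough illustration only, the sketch below uses the commonly cited base-level formulation, AED = 1 - (sensitivity + specificity) / 2, where "specificity" is used in the gene-annotation sense (the fraction of predicted bases that are correct). It is an assumption that the figure was produced with exactly this formulation; the function names and the inclusive (start, end) exon representation are hypothetical and are not taken from FINDER.

```python
def exon_bases(exons):
    """Return the set of base positions covered by inclusive (start, end) exons."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def aed(reference_exons, predicted_exons):
    """Base-level AED between one reference and one predicted transcript.

    Assumed formulation: AED = 1 - (sensitivity + specificity) / 2.
    Returns 0.0 for identical exon structures and 1.0 when nothing overlaps.
    """
    ref = exon_bases(reference_exons)
    pred = exon_bases(predicted_exons)
    if not ref or not pred:
        return 1.0  # nothing to compare: treat the reference as not detected
    overlap = len(ref & pred)
    sensitivity = overlap / len(ref)    # fraction of reference bases recovered
    specificity = overlap / len(pred)   # fraction of predicted bases that are correct
    return 1.0 - (sensitivity + specificity) / 2.0


def f1_score(sensitivity, specificity):
    """Harmonic mean of sensitivity and specificity (annotation-style F1)."""
    if sensitivity + specificity == 0:
        return 0.0
    return 2.0 * sensitivity * specificity / (sensitivity + specificity)


if __name__ == "__main__":
    reference = [(1, 100)]
    print(aed(reference, [(1, 100)]))    # identical structures
    print(aed(reference, [(51, 150)]))   # half of each transcript overlaps
    print(aed(reference, [(201, 300)]))  # no overlap
```

Representing exons as sets of positions keeps the sketch short; for genome-scale data an interval-based overlap computation would be the more practical choice.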
