Fig. 3 | BMC Bioinformatics


From: SamPler – a novel method for selecting parameters for gene functional annotation routines


Scheme of the semi-automated method implemented in merlin's EC number annotation tool. Merlin proposes a sample of entries (5 to 10% of the sequences to be annotated) for manual curation; these curated entries become the standard of truth. Then, for each α value, the corresponding automatic annotations are retrieved and assessed against the standard of truth. After assessing each entry for each α value, merlin calculates a confusion matrix for each (threshold, α value) pair. This multidimensional array of confusion matrices allows calculating the accuracy of each α value and the precision and negative predictive value (NPV) of each pair. Finally, the records lying between the thresholds (taking into account the error allowed in precision and NPV) are assigned as entries to be curated, and the curation ratio score, computed as the highest accuracy divided by the ratio of entries to be curated, helps determine the best α value and thresholds. An error of up to 25% is allowed in both precision and NPV.
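The selection loop described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not merlin's implementation or API: every function and field name (`confusion`, `precision`, `npv`, `evaluate_alpha`, `best_alpha`, `AlphaResult`) is hypothetical, and so is the input data. The sketch assumes that each curated entry has an annotation score in [0, 1] for each α, that entries scoring at or above the upper threshold are accepted automatically and entries below the lower threshold are rejected automatically, and that the 25% error allowance means precision = TP/(TP+FP) and NPV = TN/(TN+FN) must each be at least 0.75. It further assumes that the accuracy of an α value is the fraction of curated entries whose automatic annotation matches the standard of truth.

```python
from dataclasses import dataclass

# Hypothetical input: for each alpha, one (score, correct) pair per curated
# entry; `correct` is True when the automatic annotation matched the curated
# one (the standard of truth).


@dataclass(frozen=True)
class Confusion:
    tp: int  # score >= t (accepted) and correct
    fp: int  # score >= t (accepted) but incorrect
    tn: int  # score < t (rejected) and incorrect
    fn: int  # score < t (rejected) but correct


@dataclass(frozen=True)
class AlphaResult:
    score: float    # curation ratio score: accuracy / ratio of entries to curate
    accuracy: float
    lower: float    # entries scoring below this are rejected automatically
    upper: float    # entries scoring at or above this are accepted automatically
    to_curate: int  # entries with lower <= score < upper


def confusion(entries, t):
    tp = sum(1 for s, ok in entries if s >= t and ok)
    fp = sum(1 for s, ok in entries if s >= t and not ok)
    tn = sum(1 for s, ok in entries if s < t and not ok)
    fn = sum(1 for s, ok in entries if s < t and ok)
    return Confusion(tp, fp, tn, fn)


def precision(c):
    # Undefined when nothing is accepted; treated as 1.0 here (assumption).
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 1.0


def npv(c):
    # Undefined when nothing is rejected; treated as 1.0 here (assumption).
    return c.tn / (c.tn + c.fn) if c.tn + c.fn else 1.0


def evaluate_alpha(entries, thresholds, max_error=0.25):
    n = len(entries)
    accuracy = sum(ok for _, ok in entries) / n
    # One confusion matrix per (threshold, alpha) pair.
    matrices = {t: confusion(entries, t) for t in thresholds}
    required = 1 - max_error
    # Upper threshold: lowest t whose precision is within the allowed error.
    upper = min((t for t in thresholds if precision(matrices[t]) >= required),
                default=float("inf"))
    # Lower threshold: highest t <= upper whose NPV is within the allowed error.
    lower = max((t for t in thresholds
                 if t <= upper and npv(matrices[t]) >= required),
                default=min(thresholds))
    to_curate = sum(1 for s, _ in entries if lower <= s < upper)
    # Count at least one entry to avoid dividing by zero (assumption).
    ratio = max(to_curate, 1) / n
    return AlphaResult(accuracy / ratio, accuracy, lower, upper, to_curate)


def best_alpha(curated, thresholds, max_error=0.25):
    results = {a: evaluate_alpha(e, thresholds, max_error)
               for a, e in curated.items()}
    best = max(results, key=lambda a: results[a].score)
    return best, results[best]


# Illustrative, made-up data for two alpha values over the same 11 entries.
thresholds = [i / 10 for i in range(11)]
curated = {
    0.3: [(0.9, True), (0.8, True), (0.7, True), (0.65, False), (0.55, True),
          (0.5, False), (0.45, True), (0.4, False), (0.2, False), (0.1, False),
          (0.05, False)],
    0.7: [(0.95, True), (0.9, True), (0.85, True), (0.8, True), (0.75, False),
          (0.45, True), (0.42, False), (0.25, True), (0.2, False), (0.1, False),
          (0.05, False)],
}
best, result = best_alpha(curated, thresholds)
print(best, result)
```

In this toy data both α values leave two entries between the thresholds, so the α with the higher accuracy wins. Dividing accuracy by the curation ratio rewards α values that are both accurate and leave few entries for manual review; the exact threshold search and tie handling used by merlin may differ from this sketch.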
