Figure 3 | BMC Bioinformatics


From: DomSign: a top-down annotation pipeline to enlarge enzyme space in the protein universe


DomSign performance compared with BLASTP and the FS model by 1,000-fold cross-validation of the “sprot protein” dataset. Three levels of 1,000-fold cross-validation were conducted for each method: homologs of a query above a given identity threshold (“identity ≤ 100%”, “identity ≤ 60%” and “identity ≤ 30%”; see Methods) were removed from the reference dataset, so that each reference dataset retained only sequences at or below the given threshold. In this test, an 80% specificity threshold, an E-value cutoff of 10⁻³ and default parameters were applied to DomSign, BLASTP and the FS model, respectively. The relative standard errors were negligible (<1%) and are therefore not shown. (A) Evaluation results for the three methods. As shown on the right, four attributes are defined to evaluate the annotation results against the “true EC number” (see Methods for details). (B) The EC hierarchy-level distribution in the annotation results of the three methods. Seven attributes are defined here to describe the annotation results. Among them, “No best hit” is specific to BLASTP, and “More than one EC” is specific to the FS model because this dataset encompasses only non-enzymes and enzymes with single EC numbers; this attribute is regarded as “OP” in Panel A. The annotation results “Non-enzyme” and “EC = −.-.-.-” (shown in Figure 2) were merged into one unified group, “Non-enzyme”, because the latter has no EC number assigned and occupies only a small fraction of the annotation results (the “EC = −.-.-.-” subclass accounts for only 1.4% of the “identity ≤ 100%” group for DomSign).
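The homolog-removal step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: `pairwise_identity` is a toy stand-in for a real alignment-based identity measure (e.g. computed from BLASTP alignments), and the sequence names are invented. The point is only the filtering logic: for each cutoff, reference sequences whose identity to the query exceeds the cutoff are dropped before annotation.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Toy identity: fraction of matching positions over the shorter length.
    A real pipeline would use alignment-based identity (e.g. from BLASTP)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n

def filter_reference(query: str, reference: list[str], cutoff: float) -> list[str]:
    """Keep only reference sequences with identity <= cutoff to the query,
    mirroring the three cross-validation levels in the caption."""
    return [seq for seq in reference if pairwise_identity(query, seq) <= cutoff]

# Invented example sequences for illustration only.
query = "MKTAYIAK"
reference = ["MKTAYIAK",  # 100% identical to the query
             "MKTAYIAA",  # 87.5% identical
             "MKQLYIGG"]  # 50% identical

for cutoff in (1.00, 0.60, 0.30):
    kept = filter_reference(query, reference, cutoff)
    print(f"identity <= {cutoff:.0%}: {len(kept)} reference sequences kept")
```

With these toy sequences, the 100% level keeps all three references, the 60% level keeps only the 50%-identical one, and the 30% level keeps none, which is why annotation accuracy drops as the cutoff tightens.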
