Figure 1 | BMC Bioinformatics

Figure 1

From: Integrating phenotype and gene expression data for predicting gene function

Pseudocode for the annotation prediction algorithm. The algorithm predicts whether gene g should be annotated with annotation a, and consists of four key steps. First, threshold_low is calculated: the lowest similarity of any gene known to have a to all other genes known to have a. Next, threshold_high is calculated: the highest similarity of any gene known not to have a to all genes known to have a. Then, total_similarity of g to all genes known to have a is calculated. Finally, the prediction is made. If total_similarity exceeds threshold_high, g is always predicted to have annotation a. If total_similarity is less than threshold_low, g is never predicted to have annotation a. If total_similarity falls between threshold_low and threshold_high, it is linearly interpolated between the two thresholds to produce a number between 0 and 1. Specifically, the formula for the linear interpolation is interpolated_sim = (total_similarity - threshold_low) / (threshold_high - threshold_low). A predefined cutoff, such as 0.5, is then used to decide whether to assign the annotation to gene g. Thus, if cutoff = 0.5 and interpolated_sim = 0.6 for gene g and annotation a, gene g would be predicted to have annotation a.
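The four steps in the caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise similarity function `sim` is supplied by the caller, the similarity of a gene "to all genes known to have a" is assumed to be a sum of pairwise similarities, a score equal to the cutoff is assumed to count as a positive prediction, and a zero-width threshold interval is assumed to score 1. All function names and the toy data are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a pairwise gene similarity function supplied by the caller.
Similarity = Callable[[str, str], float]


def total_similarity(gene: str, genes_with_a: Sequence[str], sim: Similarity) -> float:
    """Sum sim(gene, h) over every other gene h known to have annotation a.

    Assumption: similarity "to all genes" is aggregated by summation.
    """
    return sum(sim(gene, h) for h in genes_with_a if h != gene)


def interpolated_sim(
    g: str,
    genes_with_a: Sequence[str],
    genes_without_a: Sequence[str],
    sim: Similarity,
) -> float:
    """Return a score in [0, 1] for assigning annotation a to gene g.

    Both gene lists must be non-empty, otherwise min/max raise ValueError.
    """
    # Step 1: lowest total similarity among genes known to have a.
    threshold_low = min(total_similarity(p, genes_with_a, sim) for p in genes_with_a)
    # Step 2: highest total similarity among genes known not to have a.
    threshold_high = max(total_similarity(n, genes_with_a, sim) for n in genes_without_a)
    # Step 3: total similarity of the query gene.
    total = total_similarity(g, genes_with_a, sim)
    # Step 4: clamp outside the thresholds, interpolate linearly between them.
    if total > threshold_high:
        return 1.0
    if total < threshold_low:
        return 0.0
    if threshold_high == threshold_low:
        return 1.0  # Assumption: the caption does not cover a zero-width interval.
    return (total - threshold_low) / (threshold_high - threshold_low)


def predict_annotation(
    g: str,
    genes_with_a: Sequence[str],
    genes_without_a: Sequence[str],
    sim: Similarity,
    cutoff: float = 0.5,
) -> bool:
    """Predict the annotation when the interpolated score reaches the cutoff.

    Assumption: a score exactly equal to the cutoff counts as a positive.
    """
    return interpolated_sim(g, genes_with_a, genes_without_a, sim) >= cutoff


# Toy data (invented for illustration): symmetric pairwise similarities.
_TOY_SCORES = {
    frozenset(pair): score
    for pair, score in {
        ("p1", "p2"): 0.2, ("p1", "p3"): 0.2, ("p2", "p3"): 0.6,
        ("n1", "p1"): 0.5, ("n1", "p2"): 0.5, ("n1", "p3"): 0.4,
        ("n2", "p1"): 0.1, ("n2", "p2"): 0.1, ("n2", "p3"): 0.1,
        ("g", "p1"): 0.3, ("g", "p2"): 0.4, ("g", "p3"): 0.3,
    }.items()
}


def toy_sim(x: str, y: str) -> float:
    """Look up the toy similarity of an unordered gene pair."""
    return _TOY_SCORES[frozenset((x, y))]


with_a = ["p1", "p2", "p3"]
without_a = ["n1", "n2"]
score = interpolated_sim("g", with_a, without_a, toy_sim)
prediction = predict_annotation("g", with_a, without_a, toy_sim, cutoff=0.5)
```

With this toy data, threshold_low is about 0.4, threshold_high is about 1.4, and gene "g" has a total similarity of about 1.0, so its interpolated score is roughly 0.6 and it is predicted to have the annotation at cutoff = 0.5, mirroring the example in the caption.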
