Table 1 Description of annotation for the three newly introduced features

From: A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

Feature

Principle

Scoring criteria

Result

Pseudogenes linked to HPs

It is generally believed that the majority of HPs are the products of pseudogenes. As a follow-up to BLAST, a hit that has no ATG start codon in any of the six reading frames may be assumed to be a pseudogene.

Predicted and synthetic sequences, as well as sequences with end-to-end alignment, are ignored. Sequences from Homo sapiens with an E-value close to zero are considered.

Sequences that do not start with methionine and meet all the above criteria are scored 1, otherwise 0.

Homology Modelling

Because sequence and structure imply function, a function can be assigned to an HP if the protein can be modelled to reveal any interacting domains.

Based on % identity between query and PDB template

If the identity is more than 30%, score 1, otherwise 0.

Non-coding RNAs associated with HPs

Most of the HPs from GenBank lack protein-coding capacity, and some of them may themselves be non-coding RNAs.

The top three hits are considered for sequences from Homo sapiens, while the top five hits are considered when there is no considerable difference between scores.

If the above criterion is met, score 1, otherwise 0.
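
The binary scoring described in the table can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The `BlastHit` record, all function names, the `e_value_cutoff` placeholder and the `rel_tolerance` used to decide when hit scores show "no considerable difference" are assumptions introduced here, since the table gives no numeric values for them. Only the 30% cutoff comes from the table; it is applied to percent identity, as the scoring criteria column states. The six-frame check reads "no ATG start codon" as "no reading frame begins with ATG", which is one possible interpretation.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def starts_with_atg_in_any_frame(seq: str) -> bool:
    """True if any of the six reading frames begins with the codon ATG."""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):  # three frames per strand
            if strand[offset:offset + 3] == "ATG":
                return True
    return False


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit (field names are assumptions)."""
    organism: str
    e_value: float
    subject_seq: str          # nucleotide sequence of the hit
    predicted: bool = False
    synthetic: bool = False
    end_to_end: bool = False  # hit aligns end to end with the query


def pseudogene_score(hit: BlastHit, e_value_cutoff: float = 1e-5) -> int:
    """Score 1 if the hit passes the filters and lacks an ATG start, else 0.

    e_value_cutoff is a placeholder; the table gives no numeric threshold.
    """
    if hit.predicted or hit.synthetic or hit.end_to_end:
        return 0
    if hit.organism != "Homo sapiens" or hit.e_value > e_value_cutoff:
        return 0
    return 0 if starts_with_atg_in_any_frame(hit.subject_seq) else 1


def homology_score(percent_identity: float) -> int:
    """Score 1 if query/template identity exceeds 30%, else 0."""
    return 1 if percent_identity > 30.0 else 0


def select_top_hits(scores: list[float], rel_tolerance: float = 0.05) -> list[float]:
    """Return the top three scores, or the top five when they are nearly equal.

    "Nearly equal" is assumed to mean that the first and fifth scores differ
    by at most rel_tolerance, relative to the first score.
    """
    ranked = sorted(scores, reverse=True)
    top5 = ranked[:5]
    if len(top5) == 5 and top5[0] > 0:
        if (top5[0] - top5[4]) / top5[0] <= rel_tolerance:
            return top5
    return ranked[:3]
```

For example, `homology_score(35.0)` gives 1 and `homology_score(30.0)` gives 0, because the table requires more than 30%. The non-coding RNA criterion is only sketched up to hit selection, because the table does not say how the selected hits are turned into a score.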