Table 1 Description of annotation for the three newly introduced features

From: A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

| Feature | Principle | Scoring criteria | Result |
|---|---|---|---|
| Pseudogenes linked to HPs | It is generally believed that the majority of HPs are the products of pseudogenes. | Follow-up of BLAST: if the hits do not have starting codon ATG across six reading frames, then it may be assumed to be a pseudogene. Predicted and synthetic sequences, sequences with end-to-end alignment are ignored. Sequences from Homo sapiens with E-value less than zero are considered. | Sequences starting without methionine and meeting all the above criteria were given 1, otherwise 0. |
| Homology Modelling | As sequence-structure implies function, it is possible to assign function to HP if we could model the protein to find any interacting domains. | Based on % identity between query and PDB template. | If there is more than 30% similarity, score = 1, otherwise 0. |
| Non-coding RNAs associated to HPs | Most of the HPs from GenBank lack protein coding capacity and some of them may themselves be noncoding RNAs. | The top three hits are considered for sequences from Homo sapiens, while the top five hits are considered when there is no considerable difference between scores. | If the above criterion is met, score 1, otherwise 0. |
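
The binary scoring in the table can be illustrated with a minimal Python sketch of the pseudogene and homology modelling criteria. This is one possible reading of the table, not the authors' implementation: the function names, the `predicted_or_synthetic` and `end_to_end` flags, and the interpretation of "starting codon ATG across six reading frames" as checking the first codon at each of the three offsets on both strands are all assumptions made for illustration. Filtering of hits by organism and E-value is assumed to happen upstream and is not shown, and the non-coding RNA criterion is omitted because the table does not quantify a "considerable difference between scores".

```python
# Hypothetical helpers illustrating two of the table's 0/1 scores.
# Names and parameters are illustrative assumptions, not from the paper.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def starts_with_atg_in_any_frame(seq: str) -> bool:
    """Check whether the first codon of any of the six reading frames is ATG.

    Assumption: the six frames are offsets 0, 1 and 2 on the forward
    strand and on the reverse-complement strand.
    """
    forward = seq.upper()
    strands = (forward, reverse_complement(forward))
    return any(
        strand[offset:offset + 3] == "ATG"
        for strand in strands
        for offset in range(3)
    )


def pseudogene_score(hit_seq: str,
                     predicted_or_synthetic: bool = False,
                     end_to_end: bool = False) -> int:
    """Score 1 if the hit looks like a pseudogene under the table's rule.

    Predicted/synthetic hits and end-to-end alignments are ignored
    (scored 0), following the 'Scoring criteria' column.
    """
    if predicted_or_synthetic or end_to_end:
        return 0
    return 0 if starts_with_atg_in_any_frame(hit_seq) else 1


def homology_score(percent_identity: float) -> int:
    """Score 1 if query/template identity is strictly above 30%."""
    return 1 if percent_identity > 30.0 else 0
```

Under these assumptions, `pseudogene_score("CCCCCC")` yields 1 because no frame begins with ATG, while `homology_score(30.0)` yields 0 because the table requires more than 30%.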