
Table 1 The feature set. Features made available to the machine learning software (Weka) for building the alternating decision tree.

From: Speeding disease gene discovery by sequence based candidate prioritization

| Feature | Source | Description |
|---|---|---|
| Gene length | EnsemblMart 22.1 | Length of the gene (bp). |
| CDS length | EnsemblMart 22.1 | Length of the coding sequence (bp). |
| cDNA length | EnsemblMart 22.1 | Length of the complementary DNA (bp). |
| Protein length | EnsemblMart 22.1 | Length of the protein (aa). |
| Length of 3' UTR | EnsemblMart 22.1 | Length of the 3' untranslated region (UTR) (bp). |
| Length of 5' UTR | EnsemblMart 22.1 | Length of the 5' untranslated region (UTR) (bp). |
| Distance to nearest neighbouring gene | EnsemblMart 22.1 | Distance to the nearest known gene on the same chromosome, on either strand (bp). |
| Number of exons | EnsemblMart 22.1 | Number of exons in the gene. |
| GC | EnsemblMart 22.1 | GC content of the gene (%). |
| Transmembrane | EnsemblMart 22.1 | Predicted transmembrane domain (1 = yes, 0 = no). |
| Signal peptide | EnsemblMart 22.1 | Predicted signal peptide (1 = yes, 0 = no). |
| Paralog | EnsemblMart 22.1 | Whether the gene has a paralog in the human genome (1 = yes, 0 = no). |
| Paralog % identity | EnsemblMart 22.1 | Percentage protein identity of the best paralog in the human genome; "unknown" for genes without a paralog. |
| Mouse homolog % identity | Homologene | Percentage protein identity of the mouse homolog; 0 for genes without a mouse homolog. |
| Rat homolog % identity | Homologene | Percentage protein identity of the rat homolog; 0 for genes without a rat homolog. |
| Worm homolog % identity | Homologene | Percentage protein identity of the worm homolog; 0 for genes without a worm homolog. |
| Fly homolog % identity | Homologene | Percentage protein identity of the fly homolog; 0 for genes without a fly homolog. |
| Yeast homolog % identity | Homologene | Percentage protein identity of the yeast homolog; 0 for genes without a yeast homolog. |
| Arabidopsis homolog % identity | Homologene | Percentage protein identity of the Arabidopsis homolog; 0 for genes without an Arabidopsis homolog. |
| Mouse homolog Ka | Homologene | Non-synonymous substitution rate (Ka) between the human gene and its mouse homolog. |
| Mouse homolog Ks | Homologene | Synonymous substitution rate (Ks) between the human gene and its mouse homolog. |
| Mouse homolog Ka / Ks | Homologene | Ratio of Ka to Ks. |
| CpG island at 3' end of gene | EnsemblMart 22.1 | Whether a CpG island lies at the 3' end of the gene (1 = yes, 0 = no). |
| CpG island at 5' end of gene | EnsemblMart 22.1 | Whether a CpG island lies at the 5' end of the gene (1 = yes, 0 = no). |
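Two of the features in Table 1, GC content and the mouse Ka / Ks ratio, are simple derived quantities. The following is a minimal Python sketch of how they could be computed, assuming a plain nucleotide string as input. The table does not say how ambiguous bases such as N were handled, so excluding them from the denominator is an assumption. Returning `None` when Ks is 0 is also an assumption made here, because the ratio is then undefined.

```python
from typing import Optional


def gc_percent(seq: str) -> float:
    """GC content of a nucleotide sequence, as a percentage.

    Assumption: only A, C, G and T count toward the denominator;
    ambiguous symbols such as N are ignored.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Ka / Ks; returns None when Ks is 0 because the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


if __name__ == "__main__":
    print(gc_percent("ATGCNN"))  # 50.0 (the two N symbols are ignored)
    print(ka_ks_ratio(0.1, 0.0))  # None
```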
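Weka reads training data in its ARFF format, where `?` marks a missing value. The sketch below shows one way some of the Table 1 features might be written out for Weka. Several details are illustrative assumptions and do not come from the authors' files: the attribute names, the `class` attribute with values `disease` and `control`, and the mapping of the "unknown" paralog identity to `?`.

```python
from typing import Mapping, Optional, Sequence, Tuple, Union

Value = Optional[Union[int, float, str]]

# Hypothetical attribute names for a subset of Table 1, with ARFF types.
ATTRIBUTES: Sequence[Tuple[str, str]] = [
    ("gene_length", "numeric"),
    ("gc_percent", "numeric"),
    ("transmembrane", "{0,1}"),           # binary flag as a nominal attribute
    ("paralog_pct_identity", "numeric"),  # "unknown" in the source -> missing
    ("mouse_pct_identity", "numeric"),    # 0 when there is no mouse homolog
    ("class", "{disease,control}"),       # hypothetical class labels
]


def arff_value(value: Value) -> str:
    """Render one value; None becomes '?', ARFF's missing-value marker."""
    return "?" if value is None else str(value)


def to_arff(relation: str,
            attributes: Sequence[Tuple[str, str]],
            rows: Sequence[Mapping[str, Value]]) -> str:
    """Build an ARFF document: relation, attribute declarations, data rows."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} {kind}" for name, kind in attributes]
    lines += ["", "@data"]
    for row in rows:
        # A key absent from the row is also treated as missing ('?').
        lines.append(",".join(arff_value(row.get(name)) for name, _ in attributes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        # Made-up values for illustration only.
        {"gene_length": 12000, "gc_percent": 41.5, "transmembrane": 1,
         "paralog_pct_identity": None, "mouse_pct_identity": 85.2,
         "class": "disease"},
    ]
    print(to_arff("candidate_genes", ATTRIBUTES, example))
```

In this sketch, binary features such as Transmembrane are declared as nominal `{0,1}` attributes. Weka would also accept them as `numeric`. The table does not state which encoding the authors chose.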