Table 1. The feature set. The list of features that were made available to the machine learning application (Weka) to build the alternating decision tree.

From: Speeding disease gene discovery by sequence based candidate prioritization

| Feature | Source | Description |
| --- | --- | --- |
| Gene length | EnsemblMart 22.1 | Length of gene in bp. |
| CDS length | EnsemblMart 22.1 | Length of coding sequence in bp. |
| cDNA length | EnsemblMart 22.1 | Length of complementary DNA in bp. |
| Protein length | EnsemblMart 22.1 | Length of protein in aa. |
| Length of 3' UTR | EnsemblMart 22.1 | The length of the 3' untranslated region (UTR) in bp. |
| Length of 5' UTR | EnsemblMart 22.1 | The length of the 5' untranslated region (UTR) in bp. |
| Distance to nearest neighbouring gene | EnsemblMart 22.1 | Distance to the next known gene on the same chromosome on either strand in bp. |
| Number of exons | EnsemblMart 22.1 | Number of exons in the gene. |
| GC | EnsemblMart 22.1 | GC content (as a %) of gene. |
| Transmembrane | EnsemblMart 22.1 | Prediction of transmembrane domains (1 for yes or 0 for no). |
| Signal peptide | EnsemblMart 22.1 | Prediction of signal peptide (1 for yes or 0 for no). |
| Paralog | EnsemblMart 22.1 | If the gene has a paralog in the human genome (1 for yes or 0 for no). |
| Paralog % identity | EnsemblMart 22.1 | % protein identity of best paralog in the human genome. Genes without paralogs have "unknown" entered here. |
| Mouse homolog % identity | Homologene | % protein identity of mouse homolog. Genes without a mouse homolog have "0" entered here. |
| Rat homolog % identity | Homologene | % protein identity of rat homolog. Genes without a rat homolog have "0" entered here. |
| Worm homolog % identity | Homologene | % protein identity of worm homolog (potentially 0, see above). |
| Fly homolog % identity | Homologene | % protein identity of fly homolog (potentially 0, see above). |
| Yeast homolog % identity | Homologene | % protein identity of yeast homolog (potentially 0, see above). |
| Arabidopsis homolog % identity | Homologene | % protein identity of Arabidopsis homolog (potentially 0, see above). |
| Mouse homolog Ka | Homologene | Measure of non-synonymous changes between human and mouse homolog. |
| Mouse homolog Ks | Homologene | Measure of synonymous changes between human and mouse homolog. |
| Mouse homolog Ka / Ks | Homologene | Ratio of above two fields. |
| CpG island at 3' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 3' end of the gene (1 or 0). |
| CpG island at 5' end of gene | EnsemblMart 22.1 | If a CpG island exists at the 5' end of the gene (1 or 0). |
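
The encoding conventions in Table 1 (1/0 flags, "0" for a missing homolog, "unknown" for a missing paralog identity, and Ka / Ks as a derived ratio) can be illustrated with a small record type. This is only a sketch covering a subset of the features: the class `GeneFeatures`, its field names, the helper `to_row`, and the example values are all hypothetical and do not come from the original study. How Ka, Ks, and their ratio were recorded for genes without a mouse homolog is not stated in the table, so treating them as missing here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneFeatures:
    """Hypothetical record for a subset of the Table 1 features."""
    gene_length_bp: int
    num_exons: int
    gc_percent: float
    transmembrane: int                  # 1 for yes, 0 for no
    has_paralog: int                    # 1 for yes, 0 for no
    paralog_identity: Optional[float]   # None stands in for "unknown"
    mouse_identity: float               # 0 when there is no mouse homolog
    mouse_ka: Optional[float] = None    # assumption: None when unavailable
    mouse_ks: Optional[float] = None    # assumption: None when unavailable

    @property
    def mouse_ka_ks(self) -> Optional[float]:
        """Ratio Ka / Ks, or None if either value is missing or Ks is 0."""
        if self.mouse_ka is None or self.mouse_ks is None or self.mouse_ks == 0:
            return None
        return self.mouse_ka / self.mouse_ks


def to_row(g: GeneFeatures) -> List[str]:
    """Flatten a record to strings, writing "unknown" for missing values."""
    values = [
        g.gene_length_bp, g.num_exons, g.gc_percent, g.transmembrane,
        g.has_paralog, g.paralog_identity, g.mouse_identity,
        g.mouse_ka, g.mouse_ks, g.mouse_ka_ks,
    ]
    return ["unknown" if v is None else str(v) for v in values]


# Made-up values for illustration only; not data from the paper.
example = GeneFeatures(
    gene_length_bp=42000, num_exons=12, gc_percent=47.5,
    transmembrane=1, has_paralog=0, paralog_identity=None,
    mouse_identity=88.0, mouse_ka=0.05, mouse_ks=0.5,
)
row = to_row(example)
```

Keeping the ratio as a derived property, rather than a stored field, means it cannot drift out of step with the Ka and Ks values it is computed from.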