
Table 1 Automated Candidate Gene Prediction Systems

From: Comparison of automated candidate gene prediction systems using genes implicated in type 2 diabetes by genome-wide association studies

Semi-Automated Systems

GeneSeeker is a semi-automated web-server tool which selects positional candidates based on expression and phenotypic data from human and mouse. Queries must be formulated by the end-user using Boolean expressions [13, 33]. ♠ ◇

Systems Biology Techniques

Prioritizer uses pathway and interaction data from KEGG [17, 34], Reactome [35], and HPRD [36]. Interactions are also predicted using a Bayesian technique based on GO keywords [23] and other databases [5].

In Gentrepid Common Pathway Scanning (CPS), pathways are associated with phenotypes either by using known disease genes or by searching for enrichment of pathways across multiple disease intervals associated with the phenotype [4]. ♠ ◇

Oti et al. use protein-protein interaction data from HPRD [36], Y2H [37, 38], and PCP [39, 40], giving coverage of 10 894 human genes [24].

Genotype-Phenotype Mapping Methods

G2D [32] uses biomedical literature to associate pathological conditions with GO terms [23]. Candidate genes are identified by homology to GO-annotated disease-associated genes. ♠ ◇

Gentrepid Common Module Profiling (CMP) searches for enrichment of particular domains in gene clusters associated with a particular phenotype. Domains are extracted either from known disease genes or by comparison of multiple disease intervals [4]. ♠ ◇

POCUS searches for over-representation of functional annotation among multiple loci associated with the same disease. Functional annotation is based on keywords from InterPro domains [22] and GO [23]. No a priori knowledge of the phenotype is required [3]. ♠

Techniques based on a bipartite distribution of "disease" and "non-disease" genes

The eVOC system uses text mining of biomedical literature to associate a phenotype with anatomy terms and links these with human expression data to produce a ranked list of disease genes. The classifier is a machine-learning technique, based on a bipartite training set of 17 known "disease genes" and 306 "non-disease genes" [30]. ♠

DGP (Disease Gene Prediction) is a web tool which selects genes based on protein sequence properties. The properties analysed by DGP include protein length, degree of sequence conservation, the extent of phylogenetic relationship and paralogy patterns [31, 41]. ♠

PROSPECTR (PRiOritization by Sequence and Phylogenetic Extent of CandidaTe Regions) uses an alternating decision tree to discriminate "disease genes" from "non-disease genes" using a classifier based on sequence features such as gene length, protein length, and similarity of homologs in other species [12]. ♠

Hybrid techniques

SUSPECTS combines a genotype-phenotype mapping method based on disease-gene-associated keywords from InterPro and GO, and expression libraries, with the PROSPECTR Boolean classifier. Disease genes are prioritized [21]. ♠ ◇

  1. ♠ Assessed here, ◇ Webserver.
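
Several systems in the table (Gentrepid CPS and CMP, POCUS) rely on finding annotations that are over-represented among candidate genes. The following is a minimal sketch of that general idea using a hypergeometric tail test. It is not the published scoring scheme of any tool listed here: the function names, the `alpha` cut-off, and the gene-to-term dictionary layout are all assumptions made for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(candidate_genes, annotations, alpha=0.05):
    """Return (term, p_value) pairs over-represented among the candidates.

    annotations: dict mapping every background gene to a set of terms
    (hypothetical layout; a real tool would read GO, InterPro or pathway data).
    """
    N = len(annotations)
    candidates = [g for g in candidate_genes if g in annotations]
    n = len(candidates)

    # Count how many background genes carry each term.
    background_counts = {}
    for terms in annotations.values():
        for t in terms:
            background_counts[t] = background_counts.get(t, 0) + 1

    # Count how many candidate genes carry each term.
    candidate_counts = {}
    for g in candidates:
        for t in annotations[g]:
            candidate_counts[t] = candidate_counts.get(t, 0) + 1

    results = []
    for t, k in candidate_counts.items():
        p = hypergeom_tail(k, n, background_counts[t], N)
        if p <= alpha:
            results.append((t, p))
    return sorted(results, key=lambda r: r[1])
```

With a toy background of ten genes in which only the three candidates share a term, that term is reported, while a term carried by every gene is not, because its tail probability is 1.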