Figure 1 | BMC Bioinformatics

Figure 1

From: Comparison of automated candidate gene prediction systems using genes implicated in type 2 diabetes by genome-wide association studies

Data sources and approaches used in automated candidate gene prediction methods. (A): Most systems draw on at least two types of data. SUSPECTS [21] (not shown) uses keywords from InterPro [22] and GO [23] together with co-expression data, and also incorporates the PROSPECTR module [12] (shown on right). (B): Upper left: Gene clustering approaches associate a gene cluster with a phenotype via a group member. For example, Systems Biology approaches [4, 5, 24] group genes whose protein products interact, and link each group to a phenotype through a member gene already associated with that phenotype. Systems Biology methods assume that oligogenic diseases are associated with disruption in proteins that participate in a common complex or pathway [25]. Other gene clustering systems look for enrichment of keywords or domains associated with particular phenotypes and suggest candidate genes with similar properties. These systems are based on the principle that candidate genes have similar functions to disease genes already determined [26–28]. Upper right: Phenotype clustering approaches, such as that of Freudenberg & Propping [29], group related phenotypes into superphenotypes. Lower left: Most of the Machine Learning approaches do not use phenotype information and are based on the concept that the genes of the genome fall into two classes: those which cause diseases, and those which do not. By analysing these two gene sets with respect to discriminating variables, a profile for "disease genes" and "non-disease genes" is produced, which enables training of a classifier. A novel gene submitted to the classifier is flagged as either "disease-causing" or "non-disease-causing". Systems include eVOC [30], PROSPECTR [12], SUSPECTS [21] and DGP [31]. Lower right: Finally, G2D is a transitive method that maps phenotypes to genes [32] by interfacing literature- and keyword-based ontologies.
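
The gene clustering idea in the upper left panel can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited systems. It assumes, purely for illustration, that a "group" is a connected component of the protein interaction graph, and the function names, gene names (`GENE_A` and so on) and the phenotype label `phenotype_X` are all hypothetical.

```python
from collections import defaultdict


def interaction_clusters(interactions):
    """Group genes into clusters. Assumption: a cluster is a connected
    component of the undirected interaction graph."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    clusters = []
    for gene in sorted(neighbours):
        if gene in seen:
            continue
        # Depth-first walk collecting every gene reachable from `gene`.
        component = set()
        stack = [gene]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


def candidates_by_association(interactions, known_gene_phenotypes):
    """Link each cluster to a phenotype via a member gene already associated
    with it, and propose the remaining cluster members as candidates."""
    candidates = defaultdict(set)
    for cluster in interaction_clusters(interactions):
        phenotypes = {
            p for g in cluster for p in known_gene_phenotypes.get(g, ())
        }
        for phenotype in phenotypes:
            already_known = {
                g for g in cluster
                if phenotype in known_gene_phenotypes.get(g, ())
            }
            candidates[phenotype] |= cluster - already_known
    return dict(candidates)


# Hypothetical toy data: two interaction groups, one known disease gene.
interactions = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
known_gene_phenotypes = {"GENE_A": ["phenotype_X"]}

result = candidates_by_association(interactions, known_gene_phenotypes)
print(sorted(result["phenotype_X"]))  # ['GENE_B', 'GENE_C']
```

In this toy example `GENE_B` and `GENE_C` become candidates for `phenotype_X` because they share an interaction group with `GENE_A`, while `GENE_D` and `GENE_E` are not proposed because their group contains no gene associated with a phenotype.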
