Data sources and approaches used in automated candidate gene prediction methods. (A): Most systems draw on at least two types of data. SUSPECTS (not shown) uses keywords from InterPro and GO, co-expression data, and also incorporates the PROSPECTR module (shown on right). (B): Upper left: Gene clustering approaches associate a gene cluster with a phenotype via a group member. For example, Systems Biology approaches [4, 5, 24] group genes whose protein products interact, and link each group to a phenotype through a member gene already associated with that phenotype. Systems Biology methods assume that oligogenic diseases are associated with disruption of proteins that participate in a common complex or pathway. Other gene clustering systems look for enrichment of keywords or domains associated with particular phenotypes and suggest candidate genes with similar properties. These systems are based on the principle that candidate genes have functions similar to those of disease genes already identified [26–28]. Upper right: Phenotype clustering approaches such as that of Freudenberg & Propping group related phenotypes into superphenotypes. Lower left: Most of the Machine Learning approaches do not use phenotype information and are based on the concept that the genome consists of a bipartite distribution of genes: those that cause disease and those that do not. Analysing these two gene sets with respect to discriminating variables produces profiles for "disease genes" and "non-disease genes", which are used to train a classifier. A novel gene submitted to the classifier is flagged as either "disease-causing" or "non-disease-causing". Systems include eVOC, PROSPECTR, SUSPECTS and DGP. Lower right: Finally, G2D is a transitive method that maps phenotypes to genes by interfacing literature- and keyword-based ontologies.
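The gene clustering idea in the upper left panel can be illustrated with a minimal sketch. It is not the implementation of any of the cited systems. It assumes that a "group" is simply a connected component of an undirected protein interaction graph, which is a simplification of the complex- or pathway-based groupings described above. The function names (`interaction_groups`, `candidates_by_association`) and all gene and phenotype labels are hypothetical.

```python
from collections import defaultdict


def interaction_groups(interactions):
    """Group genes into connected components of an undirected interaction graph.

    Assumption: one component stands in for one complex or pathway.
    """
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collecting every gene reachable from `start`.
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbours[gene] - component)
        seen |= component
        groups.append(component)
    return groups


def candidates_by_association(interactions, known_disease_genes):
    """Link each group to a phenotype via a known member gene.

    Returns a dict mapping phenotype -> other genes in the same group,
    which are proposed as candidates for that phenotype.
    """
    groups = interaction_groups(interactions)
    candidates = defaultdict(set)
    for gene, phenotype in known_disease_genes.items():
        # Genes already linked to this phenotype are not reported again.
        already_known = {g for g, p in known_disease_genes.items() if p == phenotype}
        for group in groups:
            if gene in group:
                candidates[phenotype] |= group - already_known
    return dict(candidates)


# Made-up toy data, for illustration only.
interactions = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
known = {"GENE_A": "phenotype_1"}
result = candidates_by_association(interactions, known)
```

With this toy input, GENE_B and GENE_C are proposed as candidates for phenotype_1 because they share a group with GENE_A, while GENE_D and GENE_E belong to a group with no known disease gene and are not proposed.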