Figure 1

From: Automatic discovery of cross-family sequence features associated with protein function

Outline of approach: simultaneous sequence and annotation classifications. Part of the dataset is shown, with sequences on the left and Swiss-Prot annotation words on the right. The evolutionary search produces two independent classifiers, one acting on each type of information. Fictional examples of these classifiers are shown. Applying each classifier to its respective input produces a binary vector. Ideally, a pair of classifiers would produce identical (non-trivial) binary vectors. The goal of the evolutionary search is to maximise the correlation between these vectors.
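
The scoring step described in the caption can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two classifier rules, the four toy entries, and the choice of the Pearson correlation (equivalent to the phi coefficient for binary vectors) as the measure, since the caption does not state which correlation measure is used. A constant vector is scored as 0 to reflect the caption's requirement that the vectors be non-trivial.

```python
import math
import re


def sequence_classifier(sequence):
    """Hypothetical sequence rule: 1 if a C-x-x-C pattern occurs, else 0."""
    return int(re.search(r"C.{2}C", sequence) is not None)


def annotation_classifier(words):
    """Hypothetical annotation rule: 1 if any chosen keyword is present, else 0."""
    return int(bool({"zinc", "metal-binding"} & set(words)))


def binary_correlation(u, v):
    """Pearson correlation of two equal-length 0/1 vectors.

    Returns 0.0 when either vector is constant (all 0s or all 1s),
    so trivial classifiers receive no credit.
    """
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


# Toy dataset (invented): each entry pairs a sequence with its annotation words.
dataset = [
    ("MKCAACLLV", {"zinc", "finger"}),
    ("MSTNPKPQR", {"kinase"}),
    ("GGCPLCAAA", {"metal-binding"}),
    ("AAAAGGGTT", {"transport"}),
]

# One binary vector per classifier, with one element per dataset entry.
seq_vector = [sequence_classifier(seq) for seq, _ in dataset]
ann_vector = [annotation_classifier(words) for _, words in dataset]

# The quantity an evolutionary search would try to maximise.
fitness = binary_correlation(seq_vector, ann_vector)
print(seq_vector, ann_vector, fitness)
```

In this toy case the two classifiers agree on every entry, so the correlation reaches its maximum. In a search, candidate classifier pairs would be ranked by this score.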

