Predictor map. The self-supervised genetic programming approach produced a total of 500 sequence-to-function predictors. In this figure, an 8 × 8 Self-Organizing Map (SOM) clusters the predictors according to their pattern of sequence-based test-set classifications, so that predictors which classify similar subsets of the sequences are localised to the same region of the map. Each SOM node carries three annotations (the examples are taken from the node at row 3, column 2): the number of "A-type" and "B-type" predictors that map to the node (e.g. "4A + 2B"); the common target words for the annotation-based classifier and their frequencies (e.g. "2 biosynthesis, 2 mitochondrial"); and, in the inset boxes, the annotation words that are over-represented in the test-set sequences positively classified by the sequence-based classifier (e.g. "oxidised"). See Methods for detailed information.
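The clustering step can be illustrated with a minimal sketch, assuming each predictor is represented by a binary vector of its test-set classifications (1 = positively classified). The data below are synthetic, and the function names (`train_som`, `best_matching_unit`, `toy_predictors`), the learning-rate and neighbourhood schedules, and the number of epochs are illustrative assumptions rather than the settings used for the figure; see Methods for the actual procedure.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the (row, col) of the node whose weight vector is closest to v."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], v)),
    )


def train_som(vectors, rows=8, cols=8, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM (assumed schedules: linear decay)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per grid node.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for v in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            br, bc = best_matching_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = lr * math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Pull this node's weights towards v, scaled by grid proximity.
                for i in range(dim):
                    w[i] += h * (v[i] - w[i])
            step += 1
    return weights


def toy_predictors(n_groups=3, per_group=10, n_seqs=20, seed=1):
    """Synthetic stand-in data: noisy copies of a few classification patterns."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_groups):
        proto = [rng.randint(0, 1) for _ in range(n_seqs)]
        for _ in range(per_group):
            out.append([b ^ 1 if rng.random() < 0.05 else b for b in proto])
    return out


predictors = toy_predictors()
weights = train_som(predictors)

# Group predictor indices by the node they map to.
node_members = {}
for idx, v in enumerate(predictors):
    node_members.setdefault(best_matching_unit(weights, v), []).append(idx)

for node, members in sorted(node_members.items()):
    print(node, members)
```

In this sketch, predictors with similar classification vectors tend to share a node or fall on neighbouring nodes, which is the property the map relies on. The per-node annotations (A-type and B-type counts, target words, over-represented annotation words) would be computed afterwards from the members of each node.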