Figure 1 | BMC Bioinformatics

Figure 1

From: Efficient discovery of responses of proteins to compounds using active learning

An active learning pipeline for an experimental space with N proteins and M compounds.
(a) A round of active learning begins with the data for all of the experiments that have been observed so far.
(b) A separate model is constructed for each protein, using the compound features to predict the effect of each compound on the activity of that protein. This is illustrated for Protein 2, for which regression on the observed experiments for Compounds 2 and 5 predicts that Compound 4 would show an activity of 6. This model is referred to as CFO.
(c) A separate model is constructed for each compound, using the protein features to predict the effect of that compound on the activity of each protein. This is illustrated for Compound 4, for which regression on the observed experiments for Proteins 4 and N predicts that Protein 2 would show an activity of 2. This model is referred to as PFO.
(d) For the CCT approach, if predictions from both methods are available, they are averaged. (In the early rounds, when no experiments may yet have been observed for a given protein or compound, one or both models may be unable to make a prediction.)
(e) The complete set of observations and predictions is shown, together with the experiments that different methods would choose for the next round of acquisition: greedy selection would pick the experiments with the highest predicted values, while density selection would pick experiments for compounds and proteins that are most different from those previously selected. The results for the chosen experiments are added to those observed so far to begin the next round of active learning.
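The round described in panels (a) to (e) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption says only "regression", so ridge regression is swapped in as a stand-in; unobserved experiments are encoded as NaN; and every function and parameter name (`ridge_predict`, `per_row_predictions`, `cct_predictions`, `greedy_select`, `lam`) is hypothetical. Falling back to a single model's prediction when only one is available is also an assumption, since the caption specifies only the case where both exist. Density selection is omitted.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_all, lam=1.0):
    """Fit ridge regression with an intercept; predict for every row of X_all.
    (Ridge is an assumed stand-in for the unspecified regression model.)"""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    A = Xc.T @ Xc + lam * np.eye(X_train.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y_train - mu_y))
    return (X_all - mu_x) @ w + mu_y

def per_row_predictions(Y, col_features, lam=1.0):
    """One model per row of Y, trained on that row's observed entries.
    Rows with no observations keep NaN predictions."""
    preds = np.full(Y.shape, np.nan)
    for i in range(Y.shape[0]):
        seen = ~np.isnan(Y[i])
        if seen.any():
            preds[i] = ridge_predict(col_features[seen], Y[i, seen],
                                     col_features, lam)
    return preds

def cct_predictions(Y, protein_features, compound_features, lam=1.0):
    """Return (cfo, pfo, cct) prediction matrices, each of shape (N, M)."""
    # CFO: one model per protein, using compound features (panel b).
    cfo = per_row_predictions(Y, compound_features, lam)
    # PFO: one model per compound, using protein features (panel c).
    pfo = per_row_predictions(Y.T, protein_features, lam).T
    # CCT: average where both exist (panel d); otherwise use whichever
    # prediction is available (an assumption of this sketch).
    stacked = np.stack([cfo, pfo])
    count = (~np.isnan(stacked)).sum(axis=0)
    total = np.nansum(stacked, axis=0)
    cct = np.where(count > 0, total / np.maximum(count, 1), np.nan)
    return cfo, pfo, cct

def greedy_select(Y, preds, k):
    """Pick up to k unobserved experiments with the highest predictions."""
    scores = np.where(np.isnan(Y) & ~np.isnan(preds), preds, -np.inf)
    order = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(idx, Y.shape))
            for idx in order if np.isfinite(scores.flat[idx])]

# Toy usage: 3 proteins, 4 compounds, three observed experiments.
rng = np.random.default_rng(0)
protein_features = rng.normal(size=(3, 2))
compound_features = rng.normal(size=(4, 2))
Y = np.full((3, 4), np.nan)
Y[0, 0], Y[0, 1], Y[1, 1] = 1.0, 2.0, 3.0
cfo, pfo, cct = cct_predictions(Y, protein_features, compound_features)
next_batch = greedy_select(Y, cct, k=2)  # (protein, compound) index pairs
```

With the caption's own numbers, a CFO prediction of 6 and a PFO prediction of 2 for the same protein-compound pair would give a CCT prediction of 4. After the chosen experiments are run, their results would be written into `Y` and the round repeated.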
