Fig. 2 | BMC Bioinformatics

Fig. 2

From: CysPresso: a classification model utilizing deep learning protein representations to predict recombinant expression of cysteine-dense peptides

Dataset partitioning and time-series classification improve prediction of knottin expressibility. A Splitting the dataset into non-knottin and knottin partitions improves prediction of knottin but not non-knottin CDP expressibility when using random forest classifiers. B Transforming AlphaFold2 protein representations with ROCKET time-series classification further improves prediction of knottin but not non-knottin CDP expressibility. Error bars represent standard deviation of the mean. C Mean AUC rank for various models at predicting non-knottin expressibility (50 permutations). Training random forest algorithms on the combined AlphaFold2 representation provided the best performance. Ranks that are not significantly different are connected by horizontal lines. D Mean AUC rank for various models at predicting knottin expressibility (50 permutations). Utilizing ROCKET on the combined AlphaFold2 representation provided the best performance. Ranks that are not significantly different are connected by horizontal lines. E Confusion matrix of the final machine learning model for non-knottin CDPs evaluated by leave-one-out cross-validation. F Confusion matrix of the final machine learning model for knottins evaluated by leave-one-out cross-validation.
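
The mean AUC rank shown in panels C and D can be illustrated with a minimal sketch. It assumes that each model receives one AUC per permutation, that models are ranked within each permutation (rank 1 is the highest AUC, ties share the average rank), and that the ranks are then averaged. The function names, model labels, and numbers below are hypothetical and are not taken from the paper. The significance test behind the horizontal connecting lines is not reproduced here.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc_rank(auc_by_model):
    """Average each model's within-permutation rank (1 = highest AUC).

    auc_by_model maps a model name to a list with one AUC per permutation.
    Tied models share the average of the ranks they occupy.
    """
    names = list(auc_by_model)
    n_perm = len(auc_by_model[names[0]])
    ranks = {name: [] for name in names}
    for i in range(n_perm):
        values = [auc_by_model[name][i] for name in names]
        for name in names:
            own = auc_by_model[name][i]
            better = sum(v > own for v in values)
            tied = sum(v == own for v in values)  # includes the model itself
            ranks[name].append(better + (tied + 1) / 2)
    return {name: mean(r) for name, r in ranks.items()}


# Hypothetical AUCs for three models over two permutations (toy values only).
toy = {
    "model_a": [0.9, 0.8],
    "model_b": [0.7, 0.8],
    "model_c": [0.6, 0.5],
}
toy_ranks = mean_auc_rank(toy)
```

With these toy values, `model_a` obtains the lowest (best) mean rank. In the figure the same averaging would run over 50 permutations per model.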
