Learning curves for drug response prediction in cancer cell lines

BMC Bioinformatics

Table 2 Prediction errors of all dataset-model combinations

Dataset	ML Model	\(m_K=|T|\)	\(\tilde{y}_K\)	\(\Delta_{\tilde{y}}\)	\(\tilde{y}(m=2|T|)\)	\(m(\tilde{y}=0.9\tilde{y}_K)\)
GDSC1	dGBDT	115,863	0.0665	N/A	0.0661 (0.68%)	N/A
	hGBDT		0.0611	8.16%	0.0586 (4.14%)	649,056 (x5.6)
	sNN		0.0602	9.46%	0.0560 (7.07%)	312,381 (x2.7)
	mNN		0.0574	13.69%	0.0532 (7.33%)	304,224 (x2.6)
GDSC2	dGBDT	78,423	0.0586	N/A	0.0581 (0.93%)	N/A
	hGBDT		0.0518	11.69%	0.0496 (4.15%)	598,003 (x7.6)
	sNN		0.0512	12.70%	0.0478 (6.58%)	232,820 (x3.0)
	mNN		0.0509	13.21%	0.0477 (6.26%)	247,656 (x3.2)
CTRP	dGBDT	203,650	0.0497	N/A	0.0495 (0.34%)	N/A
	hGBDT		0.0429	13.63%	0.0407 (5.15%)	789,843 (x3.9)
	sNN		0.0384	22.60%	0.0345 (10.17%)	402,308 (x2.0)
	mNN		0.0355	28.58%	0.0302 (14.96%)	322,865 (x1.6)
NCI-60	dGBDT	675,000	0.0554	N/A	0.0554 (0.04%)	N/A
	hGBDT		0.0326	41.16%	0.0313 (3.93%)	18,355,942 (x27.2)
	sNN		0.0333	39.95%	0.0311 (6.59%)	2,109,907 (x3.1)
	mNN		0.0321	42.17%	0.0305 (4.69%)	5,175,827 (x7.6)

\(m_K=|T|\): full training set size. \(\tilde{y}_K\): prediction error of the model trained on the full training set. \(\Delta_{\tilde{y}}\): relative improvement in prediction error over the dGBDT baseline. \(\tilde{y}(m=2|T|)\): expected prediction error if the training set size is doubled (in parentheses, the percentage reduction in error relative to \(\tilde{y}_K\)). \(m(\tilde{y}=0.9\tilde{y}_K)\): training set size required to reduce the error by 10% relative to \(\tilde{y}_K\) (in parentheses, that size expressed as a multiple of \(|T|\)).
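
The derived columns defined in the footnote can be recomputed from the raw entries. The following is a minimal sketch, not the authors' code: the function and variable names are hypothetical, and the inputs are the GDSC1 values copied from Table 2. Because the printed errors are rounded to four decimals, the recomputed percentages may differ from the printed ones in the second decimal place.

```python
# Values copied from the GDSC1 rows of Table 2.
FULL_SIZE = 115_863        # m_K = |T|, the full training set size
BASELINE_ERROR = 0.0665    # dGBDT error at the full training set size

# model -> (error at |T|, extrapolated error at 2|T|, size needed for 0.9 * error)
GDSC1 = {
    "hGBDT": (0.0611, 0.0586, 649_056),
    "sNN": (0.0602, 0.0560, 312_381),
    "mNN": (0.0574, 0.0532, 304_224),
}


def relative_reduction(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


def derived_columns(rows, full_size, baseline_error):
    """Recompute the three derived quantities for every model in `rows`."""
    out = {}
    for model, (err_full, err_double, size_needed) in rows.items():
        out[model] = {
            # Improvement over the dGBDT baseline at the full training size.
            "improvement_vs_baseline_pct": relative_reduction(baseline_error, err_full),
            # Error reduction expected from doubling the training set.
            "reduction_if_doubled_pct": relative_reduction(err_full, err_double),
            # Required training size as a multiple of |T|.
            "size_factor": size_needed / full_size,
        }
    return out


if __name__ == "__main__":
    for model, cols in derived_columns(GDSC1, FULL_SIZE, BASELINE_ERROR).items():
        print(
            f"{model}: "
            f"{cols['improvement_vs_baseline_pct']:.2f}% vs baseline, "
            f"{cols['reduction_if_doubled_pct']:.2f}% if doubled, "
            f"x{cols['size_factor']:.1f} samples for a 10% reduction"
        )
```

The sample-size multiples reproduce the printed factors exactly after rounding to one decimal, which suggests the parenthesized factors are simply the required size divided by \(|T|\).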

ISSN: 1471-2105