
Table 2 Prediction errors of all dataset-model combinations

From: Learning curves for drug response prediction in cancer cell lines

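| Dataset | ML Model | \(m_K=\lvert T\rvert\) | \(\tilde{y}_K\) | \(\Delta _{\tilde{y}}\) | \(\tilde{y}(m=2\lvert T\rvert)\) | \(m(\tilde{y}=0.9\tilde{y}_K)\) |
|---|---|---|---|---|---|---|
| GDSC1 | dGBDT | 115,863 | 0.0665 | N/A | 0.0661 (0.68%) | N/A |
| GDSC1 | hGBDT | 115,863 | 0.0611 | 8.16% | 0.0586 (4.14%) | 649,056 (x5.6) |
| GDSC1 | sNN | 115,863 | 0.0602 | 9.46% | 0.0560 (7.07%) | 312,381 (x2.7) |
| GDSC1 | mNN | 115,863 | 0.0574 | 13.69% | 0.0532 (7.33%) | 304,224 (x2.6) |
| GDSC2 | dGBDT | 78,423 | 0.0586 | N/A | 0.0581 (0.93%) | N/A |
| GDSC2 | hGBDT | 78,423 | 0.0518 | 11.69% | 0.0496 (4.15%) | 598,003 (x7.6) |
| GDSC2 | sNN | 78,423 | 0.0512 | 12.70% | 0.0478 (6.58%) | 232,820 (x3.0) |
| GDSC2 | mNN | 78,423 | 0.0509 | 13.21% | 0.0477 (6.26%) | 247,656 (x3.2) |
| CTRP | dGBDT | 203,650 | 0.0497 | N/A | 0.0495 (0.34%) | N/A |
| CTRP | hGBDT | 203,650 | 0.0429 | 13.63% | 0.0407 (5.15%) | 789,843 (x3.9) |
| CTRP | sNN | 203,650 | 0.0384 | 22.60% | 0.0345 (10.17%) | 402,308 (x2.0) |
| CTRP | mNN | 203,650 | 0.0355 | 28.58% | 0.0302 (14.96%) | 322,865 (x1.6) |
| NCI-60 | dGBDT | 675,000 | 0.0554 | N/A | 0.0554 (0.04%) | N/A |
| NCI-60 | hGBDT | 675,000 | 0.0326 | 41.16% | 0.0313 (3.93%) | 18,355,942 (x27.2) |
| NCI-60 | sNN | 675,000 | 0.0333 | 39.95% | 0.0311 (6.59%) | 2,109,907 (x3.1) |
| NCI-60 | mNN | 675,000 | 0.0321 | 42.17% | 0.0305 (4.69%) | 5,175,827 (x7.6) |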

  1. \(\tilde{y}_K\): prediction error of models trained with the full training set size. \(\Delta _{\tilde{y}}\): percentage reduction in prediction error relative to the dGBDT baseline. \(\tilde{y}(m=2|T|)\): expected prediction error if the training size is doubled (in parentheses is the percentage reduction in the error score as compared with \(\tilde{y}_K\)). \(m(\tilde{y}=0.9\tilde{y}_K)\): training size required to reduce the error score by 10% (in parentheses is the required increase in sample size, expressed as a multiple of |T|).
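
The derived quantities in the table follow from simple ratios of the reported scores. The sketch below recomputes them for the GDSC1 hGBDT row as an illustration; the function and variable names are hypothetical and not taken from the paper. Because it starts from the rounded table entries, the percentages can differ slightly in the second decimal place from the published ones, which were presumably computed from unrounded scores.

```python
def relative_reduction(reference: float, value: float) -> float:
    """Fractional reduction of `value` relative to `reference`."""
    return (reference - value) / reference


def size_factor(required_size: int, full_size: int) -> float:
    """Required training size expressed as a multiple of the full size |T|."""
    return required_size / full_size


# Rounded GDSC1 entries copied from the table (names are illustrative).
m_K = 115_863          # full training set size |T|
y_K_dgbdt = 0.0665     # baseline (dGBDT) error at full size
y_K_hgbdt = 0.0611     # hGBDT error at full size
y_2T_hgbdt = 0.0586    # expected hGBDT error at m = 2|T|
m_90_hgbdt = 649_056   # size at which hGBDT error reaches 0.9 * y_K

# Improvement over the dGBDT baseline (the Delta column).
delta = relative_reduction(y_K_dgbdt, y_K_hgbdt)

# Reduction in error expected from doubling the training set.
doubling_gain = relative_reduction(y_K_hgbdt, y_2T_hgbdt)

# Sample-size multiple needed for a 10% error reduction.
factor = size_factor(m_90_hgbdt, m_K)

print(f"improvement over baseline: {delta:.2%}")   # about 8.12% (table: 8.16%)
print(f"gain from doubling data:   {doubling_gain:.2%}")  # about 4.09% (table: 4.14%)
print(f"required size factor:      x{factor:.1f}")  # x5.6
```

The same two helpers apply to every row; only the dGBDT rows lack a baseline improvement and a required size, which the table marks as N/A.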