Table 6 Performance scores for models constructed on the Dunedin data and tested on validation data sets

From: A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator

| Estimator | Feature-selection method | Features in estimator (CpGs) | EXTEND MAE | EXTEND MAPE | EXTEND Correlation [CI 83.4%] | TWIN MAE | TWIN MAPE | TWIN Correlation [CI 83.4%] |
|---|---|---|---|---|---|---|---|---|
| Baseline | None | 832 | 0.550 | 38.31 | 0.166 | 0.711 | 114.50 | 0.070 |
| PCA-EN TL | PCA | 111* | 0.570 | 39.37 | 0.295 [0.201, 0.384] | 0.718 | 112.46 | 0.074 [−0.031, 0.177] |
| F-test-0.01-EN TL | F-test (0.01) | 3893 | 0.614 | 42.93 | −0.003 [−0.103, 0.097] | 0.728 | 106.52 | 0.119 [0.015, 0.221] |
| F-test-0.05-EN TL | F-test (0.05) | 5448 | 0.626 | 43.35 | 0.07 [−0.031, 0.169] | 0.752 | 102.35 | 0.092 [−0.012, 0.194] |
| r-EN TL | Pearson’s R | 251 | 0.599 | 41.31 | 0.136 [0.036, 0.233] | 0.718 | 116.93 | −0.102 [−0.204, 0.002] |
| Boost-EN TL | Gradient Boosting | 446 | 0.601 | 41.39 | 0.085 [−0.016, 0.184] | 0.725 | 114.49 | 0.052 [−0.053, 0.155] |
| MI-EN TL | Mutual Information | 407 | 0.640 | 43.51 | 0.203 [0.105, 0.297] | 0.753 | 107.83 | −0.067 [−0.17, 0.038] |
| LSVR-EN TL | Linear SVR | 4945 | 0.620 | 43.13 | 0.114 [0.014, 0.212] | 0.760 | 108.37 | 0.006 [−0.098, 0.11] |
| RF-EN TL | Random Forest | 1059 | 0.615 | 41.83 | 0.135 [0.035, 0.232] | 0.762 | 106.23 | 0.044 [−0.061, 0.148] |
Metrics are MAE, MAPE and the Pearson correlation between predicted and actual TL. Confidence intervals are shown for correlations. The number of features in each estimator is also shown. The estimator Baseline refers to the model that used elastic net regression with no prior feature-selection stage. Estimator names denote the feature-selection method and the regression algorithm used; e.g., F-test-0.01-EN TL refers to the F-test feature-selection stage with an FDR of 0.01, followed by application of elastic net. * denotes principal components. Values in bold text relate to a selection of the best models.
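As an illustration of the three metrics named in the note, the following is a minimal sketch of how MAE, MAPE and a Pearson correlation with an 83.4% confidence interval could be computed. The interval uses the Fisher z-transformation, which is an assumption: the table does not state how the intervals were derived. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_ci(r, n, level=0.834):
    """Confidence interval for r via the Fisher z-transformation (assumed method).

    Requires n > 3. The default level matches the 83.4% in the column header.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Toy values for illustration only (hypothetical, not study data).
actual_tl = [1.0, 2.0, 4.0, 5.0]
predicted_tl = [1.5, 1.5, 4.5, 4.0]

r = pearson_r(predicted_tl, actual_tl)
low, high = pearson_ci(r, n=len(actual_tl))
print(f"MAE={mae(actual_tl, predicted_tl):.3f}")
print(f"MAPE={mape(actual_tl, predicted_tl):.2f}")
print(f"r={r:.3f} [{low:.3f}, {high:.3f}]")
```

Under this assumed method, the interval is symmetric on the Fisher z scale but not on the correlation scale, which is consistent with the slightly asymmetric intervals seen in the table.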