 Methodology article
 Open Access
 Published:
Data integration by multituning parameter elastic net regression
BMC Bioinformatics volume 19, Article number: 369 (2018)
Abstract
Background
To integrate molecular features from multiple highthroughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlations structures. Subtle but important features may be missed by shrinking all features equally.
Results
We propose an Elastic net (EN) model with separate tuning parameter penalties for each platform that is fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning penalties. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multituning parameter EN yields more predictive models. Moreover, the multituning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of multituning parameter EN using real cancer datasets.
Conclusion
The proposed multituning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared to the traditional method where a single penalty parameter is used for all features in different platforms.
Background
As multiplatform profiling of tissues enabled by advances in highthroughput ‘omic’ technologies becomes routine, efficient statistical methods to integrate multiomic data is becoming increasingly important. Multiomic profiling has been used to successfully investigate prognostic biomarkers and identify aberrant pathways in cancer [1, 2], enhance clustering and subclassification [3, 4], and improve prediction of cancer prognosis and therapeutic response [5,6,7]. Multiomic data integration can be challenging for several reasons. First, different data types will typically have different scales of measurement. To meaningfully integrate data sets with diverse scales, proper standardization and/or data transformation is required. A second challenge is the increasingly highdimensionality of multiplatform ‘omic’ data. This could be addressed by feature prescreening methods such as sure independence screening [8], or by dimensional reduction techniques such as principal component analysis [9] or partial least squares [10]. The third challenge, which to the best of our knowledge has not been yet fully addressed, is the potentially different contributions of individual data types in the final prediction models.
Regularized regression with a sparsity inducing penalty (e.g. LASSO [11], Elastic Net [12], SCAD [13]) is a common approach to feature selection for building predictive models based on highdimensional data, and can be effectively used for a joint analysis of multiomic profiles measured on the same samples. For example, Taskesen et al. used the standard LASSO for classification of samples measured on methylation and expression platforms [14] by including all the features from both platforms. In regularized regression, a tuning parameter controls the degree of shrinkage applied to the regression coefficients, and penalties that induce sparsity shrink many coefficients to exactly zero, performing in effect model selection. However, typical regularized regression approaches, would apply the same degree of shrinkage to all features regardless of their omic type, which can be suboptimal if the number and effect sizes of predictive features differ between data types. If for example, there were fewer predictive gene expression features than DNA methylation features, but the predictive expression features had larger effects than the predictive methylation features, forcing a common degree of shrinkage could result in all methylation feature coefficients shrunk to zero, and a final model containing only expression features. Independently predictive methylation features with subtler effects would be missed. Approaches that account for these differences may offer improved predictive performance.
A natural question of interest is whether allowing for differential shrinkage across features from different omic types by using a separate tuning parameter for each type, can yield better predictive models. Taskesen et al. addressed this to some extent in a stagewise analysis, performing separate analysis of omic platforms prior to creating a single classifier from the posterior probabilities of each model fit [14]. In this approach, individual features from the separate platforms were not combined in their final classifier but only used to pick the more predictive data type. Even if the selected features from multiple platforms were combined, such a twostage approach would likely result in a different set of selected features having different prediction potential. It is this joint analysis using features from multiple platforms that is our goal. Note that allowing differential shrinkage across different platforms is a distinct issue from how to deal with subgroups of features, such as coregulated genes within a cluster structure, that one may have reason to believe apriori are either all predictive or not as a group, and which has been addressed by methods such as the group LASSO [15,16,17] that encourage the selection of either all or none of the features in each subgroup.
In this paper, we investigate the performance of a multituning parameter elastic net regression (MTP EN) with separate tuning parameters for each omic type. Through simulations with a range of scenarios differing in number of predictive features, effect sizes, and correlation structures between omic types, we show that MTP EN can yield models with better prediction performance. We apply the MTP EN to publically available prostate cancer and acute myeloid leukemia data sets from the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO).
Methods
Setup and notation
We assume we have training data consisting of highdimensional features from multiple omic types (e.g. gene expression, DNA methylation, somatic mutations, etc.), measured on n samples, and an outcome of interest (e.g. cancer vs. normal tissue). The goal is to use the training data to build a model that can be used to predict the outcome for new samples based on their corresponding multiomic profiles. This is a standard supervised learning setting with the particularity that the features are from multiple omic types.
For simplicity, we focus our presentation on a binary outcome and omic features of two types, but the multituning parameter regression framework can be applied to other outcomes (e.g. multinomial, continuous, time to event, etc.) and more than two data types. For sample i, i = 1, …, n, we denote the binary outcome by y_{i} ∈ {0, 1} and its corresponding omic profile as the partitioned vector \( {\boldsymbol{x}}_{\boldsymbol{i}}=\left({\boldsymbol{x}}_i^{(1)},{\boldsymbol{x}}_i^{(2)}\right) \), where \( {\boldsymbol{x}}_i^{(1)} \) and \( {\boldsymbol{x}}_i^{(2)} \) represent the features from the first and second omic type, respectively. The number of features of each type is usually much larger than the sample size n, and is in the hundreds or thousands in typical studies. For example, gene expression is measured for thousands of genes/features, DNA methylation for hundreds of thousands of CpGs, and genotypes for millions of SNPs. We let p_{1} and p_{2} be the number of features of the first and second omic type respectively, and p = p_{1} + p_{2} be the total number of features. In typical applications, there is also a lowdimensional vector of personal and/or clinical features (e.g. age, gender, etc.), that may also be predictive and included in the feature set, but for simplicity we omit here any additional nonomic features.
We denote by y = (y_{1}, …, y_{n}) the vector of outcomes and by X^{(1)} and X^{(2)} the design matrices whose rows are the i^{th} sample vector \( {\boldsymbol{x}}_i^{(1)} \) and \( {\boldsymbol{x}}_i^{(2)} \) defined above. We denote by X= [X^{(1)} X^{(2)}] the matrix containing the full set of features obtained by rowwise concatenation of X^{(1)} and X^{(2)}.
In addition to constructing a model that generalizes well, i.e. accurately predicts y based on X in new subjects, a parsimonious prediction model based on a small subset of the full feature set is generally preferred. A model with fewer features is more interpretable and more easily translated into a custom assay deployable in a clinical setting.
Regularized regression with a sparsity inducing penalty is an effective way to simultaneously perform feature selection and parameter estimation to build a predictive model with a small subset of features. The simplest and most commonly used sparsity inducing penalty is the L_{1} norm, which gives rise to LASSO regression [11]. Regularization with a combination of the L_{1} and L_{2} norms, i.e. elastic net regression, typically outperforms feature selection with the LASSO in settings with correlated features [18]. Many additional extensions and variations of the LASSO have been proposed to, for example, reduce overshrinkage of larger coefficients (adaptive LASSO [19]) or to handle features with additional structure (e.g. group LASSO [15], fused LASSO [20], group LASSO with overlap [21], graph LASSO [22]). For definiteness, in this paper we focus on the elastic net, which includes the LASSO as a special case. However, customized tuning parameters for each omic type can be used with any type of regularized regression.
The elastic net regularization penalty is a weighted mixture of the LASSO (L_{1} norm) and ridge (square of L_{2} norm) penalties given by: \( N\left(\boldsymbol{\beta} \right)=\left(1\alpha \right)\frac{\mathbf{1}}{\mathbf{2}}{\left\Vert \boldsymbol{\beta} \right\Vert}_{{\boldsymbol{L}}_{\mathbf{2}}}^{\mathbf{2}}+\alpha\ {\left\Vert \boldsymbol{\beta} \right\Vert}_{{\boldsymbol{L}}_{\mathbf{1}}}=\left(1\alpha \right)\frac{\mathbf{1}}{\mathbf{2}}\sum \limits_{j=1}^p{\beta}_j^2+\alpha \sum \limits_{j=1}^p\left{\beta}_j\right \), where α, 0 ≤ α ≤ 1, is the weight given to the LASSO penalty and 1 − α the weight given to the ridge penalty. Both the LASSO (α = 1) and the ridge (α = 0) penalties are particular cases of the elastic net penalty.
Standard elasticnet logistic regression solves the penalized regression problem given by:
where \( l\left(\boldsymbol{y},\boldsymbol{X};{\beta}_0,\boldsymbol{\beta} \right)={\sum}_{i=1}^n\log \left(1+\exp \left({\beta}_0+{\boldsymbol{x}}_i^T\boldsymbol{\beta} \right)\right){\sum}_{i=1}^n{y}_i\left({\beta}_0+{\boldsymbol{x}}_i^T\boldsymbol{\beta} \right) \) is the standard logistic loglikelihood function and the regularization parameter λ ≥ 0 controls the degree of penalization applied to the vector of regression coefficients β (except the intercept β_{0} which is typically not penalized). The single regularization parameter λ is common to all features and is usually tuned by crossvalidation.
In this paper, we propose using separate tuning parameters for each omic type by solving the penalized regression problem given by,
where the vector of regression coefficients β = (β^{(1)}, β^{(2)}) is partitioned according to the omic type conformably to X = [X^{(1)} X^{(2)}].The regularization parameters λ_{1} ≥ 0 and λ_{2} ≥ 0 are now specific to each omic type. Our hypothesis is that a ‘custom’ degree of regularization for each type can account for intrinsic differences between the data types and lead to a selected model with better prediction performance.
Model fitting and parameter tuning
Although the twotuning parameter model (2) is nonstandard, it can be fitted using elasticnet regression software provided it has an option to use a weighted elasticnet penalty of the form
\( {N}_{\boldsymbol{w}}\left(\boldsymbol{\beta} \right)=\alpha {\sum}_{j=1}^p{w}_j\left{\beta}_j\right+\left(1\alpha \right){\sum}_{j=1}^p{w}_j{\beta}_j^2 \). Using a weighted penalty, the multituning parameter elasticnet penalty in (2) can be equivalently written as
with a pdimensional weight vector \( \boldsymbol{w}=\left(1,1,\dots, 1,\frac{\uplambda_2}{\uplambda_1},\frac{\uplambda_2}{\uplambda_1},\frac{\uplambda_2}{\uplambda_1}\right) \) with 1 in its first p_{1} entries (corresponding to the p_{1} features of the first omic type and the dimension of β^{(1)}) and a \( \kappa =\frac{\uplambda_2}{\uplambda_1} \)in the next p_{2} entries (corresponding to the p_{2} features of the second omic type and the dimension of β^{(2)}). Thus, the twopenalty model can be alternatively specified using the tuning parameters λ = λ_{1}, \( \kappa =\frac{\uplambda_2}{\uplambda_1} \), where λ controls the overall shrinkage on both types of omic features, and κ controls the shrinkage ratio between the two types. For κ = 1, the model reduces to the standard elastic net with a single tuning parameter.
To investigate the performance of the multituning parameter elastic net regression (MTP EN) for building predictive models based on multiomic features we conduct a series of simulations under a range of scenarios. To fit the MTP EN we use the efficient and widely used elasticnet implementation in the glmnet R package [23], which allows for a userspecified weighted penalty via the “penalty.factor” argument. Additional file 1 contains an R script implementing a complete and selfcontained analysis example using MTP EN.
We set the elastic net penalty α to ½ and tune the overall shrinkage, λ, and shrinkage ratio parameter, κ,by kfold crossvalidation (CV), with the area under the ROC curve, AUC, as the performance metric. The training data is randomly split into k equallysized folds, each with (approximately) the same proportion of positive (y = 1) and negative (y = 0) samples as the full training data. The MTP EN is trained on k − 1 folds for all values of (λ, κ) in a grid of possible tuning parameter values and the AUC is computed based on the validation heldoff fold. This is repeated, using in turn each of the folds as validation heldoff fold. The ‘optimal’ tuning parameters are the values κ = κ_{max} and λ = λ_{max} that maximize the average AUC across all folds. The optimal tuning parameter values are then applied to a fully independent test set to unbiasedly assess the model AUC. For model tuning we utilized both 5fold and 10fold CV, 5fold for the analysis of real data because of a small sample size in one class and 10fold for the simulation study.
Simulation study
We evaluate the MTP elasticnet through simulation, exploring varying proportions and effect sizes of the relevant features in each omic type, varying dimensionalities of the feature sets, and a range of correlation structures. Specifically, the binary outcome was generated based on a logistic regression model of the form:
where we set q_{j} features among the p_{j} in omic type j = 1, 2 (q_{j} ≪ p_{j}) to be predictive by arranging the vector of regression coefficients β_{(j)} to be sparse, with q_{j} nonzero and p_{j} − q_{j} zero entries. We set the effect size of the predictive features, i.e. the nonzero entries in omic type j = 1, 2 to a common value β_{j}.
We generated feature data X = [X^{(1)} X^{(2)}] by sampling from a multivariate normal distribution X~N(0, Σ), where Σ is a population covariance matrix with the following structure: i) dia(Σ) = 1, i.e. the X’s are already standardized and have marginal variances equal to one; ii) r_{j} features among the q_{j} informative ones in omic type j = 1, 2 have a common pairwise correlation ρ_{j} and correlation ρ_{12} with the counterpart set of features in the other platform iii) all remaining correlations are zero. The simulation scenarios are summarized in Table 1. For each scenario, we simulated 400 replicate data sets, 200 training and 200 test sets. Every data set included 500 features on 200 samples, with an expected 100 samples in each class (β_{0} = 0).
Model tuning parameters (λ, κ) were selected using the training data, and applied to the test data set for estimating prediction performance. To further reduce the dimensionality of the selected features beyond what is achieved by maximizing the training data prediction performance, we set λ to λ_{1se}, the largest value of λ such that the AUC is within one standard error of the maximum (achieved at λ = λ_{max}). This strategy by Friedman et al. [12] yields predictive performance similar to that achieved by setting λ = λ_{max} but with a more parsimonious model. Thus, the parameters κ = κ_{max} and λ = λ_{1se}, are used to estimate AUC from the independent test data set.
In addition to the AUC, we also report accuracy (1misclassification error) and sensitivity and specificity of feature selection. The accuracy was calculated as the number of samples correctly classified in testing dataset divided by the total number of samples. The sensitivity (specificity) was calculated as the number of informative (uninformative) features correctly selected in the final model divided by the total number of informative (uninformative) features.
Real data applications
Acute myeloid leukemia data
The Acute myeloid leukemia data for 344 samples were obtained from NCBI Gene Expression Omnibus, with accession numbers for gene expression and methylation data as GSE14468 (HOVONSAKK cohort) and GSE18700, respectively. The data was also used and described by Taskensen et al. [14]. Briefly, for each patient sample, Affymetrix HGU133 plus2.0 (Santa Clara, CA, USA) and HELPassay [24] was used to measure the gene expression and DNA methylation data, respectively. We filtered the gene features with fewer than 10 unique expression values, resulting in 46,083 gene features. All of the 25,626 DNA methylation features were considered in this study. Groups of AML patients that are characterized by common cytogenetic or molecular abnormality are denoted as subtypes. From the 15 subtypes studied by Taskesen et al. [14], we focus on the 7q AML subtype, characterized by partial or complete deletion of the genome fragments on the long arm of chromosome 7. In the 344 patients, 35 have 7q AML, and the rest can be characterized as non7q. The multituning parameter Elastic net was used to build classifiers by combining gene expression and DNA methylation data to differentiate 7q AML from the rest of the subtypes.
Prostate cancer
The prostate adenocarcinoma data was obtained from the Genomic Data Commons (GDC) Data Portal (Project ID TCGAPRAD) and assembled by our collaborators at USC. In total 444 samples with both RNASeq gene expression and HumanMethylation450 (HM450) array data available were considered. The goal is to classify tumor aggressiveness, defined by both grade and stage of prostate cancer. We defined the tumor samples with Gleason score 7 and below, and T category of T1 or T2 (T2a, T2b and T2c) as nonaggressive, and those with Gleason score 7 or higher and T category of T3 (T3a, T3b) and T4 as aggressive. This resulted in 143 samples defined as nonaggressive and 265 defined as aggressive. Thirtysix samples (8%) were omitted for being high grade/low stage (25) or low grade/high stage (11) and unknown aggressiveness. After filtering gene features with zero variance across all samples 20,216 gene features remained. For the HM450 DNA methylation array, we excluded probes targeting SNPs, mapped to the X and Y chromosomes, in crossreactive regions, and having missing values in at least one sample. This resulted in 371,513 remaining features. As customary, the gene expression data were log2 transformed to reduce the skewness of the gene expression distribution in its original scale.
Data analysis
We split the data into training and test sets to build a prediction model and assess its performance. Training was performed using a random 80% of the samples, stratified by outcome to preserve the original proportion of positive and negative cases in the full data. The remaining 20% of samples were used to compute the test AUC. In order to obtain stable estimates, the data splitting, model training and testing was repeated 50 times with the average performance from the independent test sets reported. In the training step, MTP EN penalties were tuned by 5fold crossvalidation over a grid of values ranging from 0.1 to 1.5 for the penalty ratio parameter κ, and a large grid of values for the overall penalty parameter λ. The 5fold CV was repeated 10 times and the optimal λ for each ratio value κ was selected as the one yielding the largest mean AUC across the 10 CV repeats.
Results
Simulation studies
The main finding (see Figure 1) is that when the number of informative features differs between the two omic types (q_{1} < q_{2}), differential penalization can yield a model with better prediction performance in terms of AUC. Specifically, the optimal parameter tuning favors a larger penalty on the omic type with fewer informative features (κ < 1) (Table 1, scenarios 1–3). Differential penalization increased AUC the most when the effect sizes are smaller in the omic type with fewer informative features (Figure 1). By contrast, when the two omic types have the same number of informative features (q_{1} = q_{2}), the optimal penalty ratio parameter κ is close to 1, indicating that the standard EN yields the best predictive performance. For Scenarios 1–3 we also evaluated the performance of MTPEN in terms of accuracy and sensitivity and specificity of feature selection (Additional file 2). Consistent with Figure 1, MTPEN achieves higher rates of correct classification and better sensitivity in Scenarios 1–3. In terms of specificity, MTPEN outperforms standard EN in Scenario 1. In Scenario 2 and 3, however, standard EN shows slightly higher specificity.
We explored the change of the optimal penalty ratio parameter as we varied characteristics of the true regression model one at a time. For the scenario with different numbers of informative features, q_{1} < q_{2}, the optimal penalty ratio parameter κ is smallest (i.e. there is maximal differential penalization) when the effect sizes are smaller in the omic type with fewer informative features (β_{1} < β_{2}), and it increases monotonically to 1 (i.e. less differential penalization) as the effect size, β_{1}, in the first omic type increases (Figure 2a). For fixed effect sizes β_{1} and β_{2}, the optimal penalty ratio parameter κ becomes smaller (more differential penalization) as the number of informative features in the second omic type, q_{2}, increases (Figure 2b). In this case, it is advantageous to penalize the omic with more noise features more highly. As the overall proportion of informative features of both types, \( \frac{q_1+{q}_2}{p} \), decreases relative to the total number of features, the optimal penalty ratio parameter κ approaches 1, i.e. less differential penalization is required to maximize the AUC (Figure 2c). With the decrease of \( \frac{q_1+{q}_2}{p} \), the AUC for both the standard and MTP EN decreases and also the difference in AUC between the two approaches decreases (results not shown).
Because correlation among features both within and between omic types is common, we also investigated the effects of correlations on the performance of MTP EN. Across all settings we fixed the effect sizes and number of informative features as described in the third scenario in Figure 1 (β_{1} > β_{2}, q_{1} < q_{2}). We compare the situations where the correlations are only between informative features within the individual platforms (Table 1, scenario 6), and correlations both within and between platforms (scenario 7). We find that the higher the correlations between informative features, the higher the AUC for a given weight parameter, and the closer the optimal weight is to 1. This can be explained by the fact that the higher correlations increase the chance the set of correlated informative features are selected by standard EN, improving prediction performance without requiring the help of weight parameters. As a matter of fact, the advantage of MTP EN comparing with the standard EN is more obvious when there are fewer correlated features (Additional file 3A), or the correlation coefficients are smaller (Additional file 3B and C). Further, the correlations between the informative features from different platforms have an even larger impact on the performance of MTP EN than the correlations between the features from a single platform (Additional file 3B and C).
Real data analysis
We also applied the MTP EN to two real cancer data sets, combining gene expression and DNA methylation data for outcome prediction. For the AML data, the goal is to discriminate the 7q subtype from the rest. Classifying patients as 7q is clinical significance as the subtype is characterized susceptibility to infection, quick aggravation, treatment resistance and poor prognosis [25, 26]. We found that the MTP EN achieved the best performance at the weight of 0.7 (solid line in Figure 3), indicating that by penalizing less the methylation features, we can obtain a better classifier than using standard elasticnet. This is consistent with previous finding that the molecular subtypes involving chromosomal abnormalities such as − 7/7q could not be correctly predicted using gene expression profiling alone [27], while DNA methylation signatures have been shown predictive in classifying 7q subtype [28, 29]. Using MTPEN we have a better chance to keep informative methylation features in the final model, which in turn yields improved prediction performance over using a standard single EN penalty regression.
For the prostate cancer data, the goal is to classify tumor aggressiveness, defined by both grade and stage of prostate cancer. In contrast to the AML data, in prostate cancer data the MTP EN achieved the best performance at the weight of 1.1 (dashed line in Figure 3), indicating MTPEN yields almost equivalent prediction performance over standard EN. In fact, either DNA methylation or gene expression data alone achieve good prediction, which is consistent with previous findings [30, 31]. Given that the two platforms have similar prediction performance, the observation of little difference between MTPEN and standard EN is within expectation.
We also investigated the difference in feature selection between MTP EN and standard EN for the two cancer data sets. Differential feature selection was defined by the numbers of times a variable was selected across the 50 repetitions of crossvalidation. Since the optimal weight parameter in AML data is less than 1, we expect the MTP EN model to select more methylation features than the standard EN. Indeed, six methylation loci, seldom selected by standard EN, were often selected by MTP EN. Furthermore, the majority of those six features are associated with blood cancer (Additional file 4). Moreover, the genes that were selected by standard EN were also selected by MTP EN. In the PRAD data the opposite occurred. The optimal weight parameter was slightly larger than 1 and MTP EN selected more gene expression features than standard EN (Additional file 4). Four of the five genes, which were not selected by standard EN, have previously been identified as associated with cancer.
Discussion
We propose a multituning parameter elastic net (MTP EN) for the classification of samples with data from multiple –omic platforms. The MTP EN yields more predictive models in several scenarios, including when the proportion of predictive features is larger in one omic type. In all other scenarios, the predictive performance of MTP EN matches that of the standard single penalty EN, so there is no performance downside for using MTP EN. Importantly, MTP EN can be fitted using standard EN software like the ‘glmnet’ R package.
However, the MTP EN requires the tuning of one penalty parameter per omic type, and the computational effort for penaltyparametertuning using crossvalidation grows exponentially in the number of tuning parameters when using a grid search. This should not prove a limitation for the majority of current studies that collect data on a few omic types only. Alternatively, much less costly parameter tuning using a random search of the tuning parameter space has been shown to be effective [32].
As is the case with penalized regression approaches in general, the benefit of MTP EN decreases with the increase of the number of noise features, so it is important to consider feature preselection before applying MTP EN. Excluding noise features and appropriately reducing dimensionality can maximize the performance of MTP EN.
The acute myeloid leukemia (AML) data set we used to illustrate the MTP EN has been previously analyzed by Taskesen et al. [14] using logistic regression with Lasso regularization to predict AML subtypes in 344 samples. They evaluated three different classification strategies, including early, late and no integration. In early integration, they combined gene expression profile (GEP) and DNA methylation profile (DMP) features first and then applied Lasso regression on all features to predict AML subtypes (‘concatenationbased’ integration). In late integration, they first used Lasso regression on GEP and DMP individually, and then trained a nearest mean classifier with the posterior probabilities of the GEP and DMP logistic regression as predictors (a twolayer classifier). They showed that early integration improved the predictive power compared to classifiers trained on GEP or DMP alone, and that in turn late integration outperformed early integration. Our results are consistent with Taskesen et al. regarding the performance of combined data versus the individual data; we observed that the maximum AUC was obtained at the weight of 0.7 (combined data) rather using methylation data only (equivalent to using a small weight in the MTP EN; left tail of the solid line in Figure 3) or using gene expression data only (equivalent to using a large weight; right tail of the solid line in Figure 3). More importantly, however, we noticed that the mean AUC for MTP EN is 0.82 for the optimal weight parameter, which is higher than the AUC of 0.80 obtained from the best method in Taskesen et al. These results further support the view that when we consider the intercorrelations between different platforms when setting up the prediction model, we can better utilize the complementary information across different platforms and obtain a model with better prediction performance.
We also applied our method on prostate adenocarcinoma (PRAD) TCGA data, which utilized the HM450 DNA methylation array and contained many more features than the AML data set. The change of the mean testing AUCs with the change of weight parameter was very slight in the PRAD data. Still, we identified several genes that had a much larger chance to be selected in MTP EN (under the weight parameter of 1.1) than in standard EN (weight = 1). Specifically, ABCC5 has been reported to support osteoclast formation and promote breast cancer metastasis to bone [33], ITGA11 has been identified to regulate cancer stromal stiffness and promote metastasis in nonsmall cell lung cancer [34], and ZNF706 has been associated with tumor progression in head and neck cancer [35].
Although we only investigated the simple convex EN penalty, using multiple penalties is likely to be beneficial for more sophisticated convex penalties such as group and fusedLASSO penalties but it remains an open question whether this would extend to nonconvex penalties such as SCAD, or the minimax concave penalty [36].
Conclusions
We proposed a multituning parameter elastic net (MTP EN) model for the classification of samples with data from multiple –omic platforms, with separate tuning parameters for each omic type that can be fitted using existing software. We found that MTP EN yields a more predictive model than ordinary EN where a single penalty parameter is used for all features in different platforms, particularly when the proportion of informative features differs between platforms and when there is no notable correlation between the informative features.
Abbreviations
 AML:

Acute myeloid leukemia
 CV:

Crossvalidation
 DMP:

DNA methylation profile
 EN:

Elastic net regression
 GDC:

Genomic Data Commons
 GEO:

Gene Expression Omnibus
 GEP:

Gene expression profile
 HM450:

HumanMethylation450
 MTP EN:

Multituning parameter elastic net regression
 PRAD:

Prostate adenocarcinoma
 TCGA:

The Cancer Genome Atlas
References
 1.
Mariani M, He S, McHugh M, Andreoli M, Pandya D, Sieber S, et al. Integrated multidimensional analysis is required for accurate prognostic biomarkers in colorectal cancer. PLoS One. 2014;9:e101065.
 2.
Chari R, Coe BP, Vucic EA, Lockwood WW, Lam WL. An integrative multidimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. BMC Syst Biol [Internet] . 2010;4:67. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2880289&tool=pmcentrez&rendertype=abstract
 3.
Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics [Internet]. 2011;98:288–95 Available from: https://doi.org/10.1016/j.ygeno.2011.07.007.
 4.
Serra A, Fratello M, Fortino V, Raiconi G, Tagliaferri R, Greco D. MVDA: a multiview genomic data integration methodology. BMC Bioinformatics [Internet]. 2015;16:261. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4539887&tool=pmcentrez&rendertype=abstract
 5.
Zhao Q, Shi X, Xie Y, Huang J, Shia B, Ma S. Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform [Internet]. 2014;16:bbu003 Available from: https://www.bib.oxfordjournals.org/content/early/2014/03/09/bib.bbu003.short?rss=1%5Cn https://www.bib.oxfordjournals.org/cgi/doi/10.1093/bib/bbu003%5Cn https://www.ncbi.nlm.nih.gov/pubmed/24632304.
 6.
Costello JC, Heiser LM, Georgii E, Gönen M, Menden MP, Wang NJ, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol [internet]. 2014;32:1–103 Available from: http://www.ncbi.nlm.nih.gov/pubmed/24880487.
 7.
Kim D, Li R, Dudek SM, Ritchie MD. Predicting censored survival data based on the interactions between metadimensional omics data in breast cancer. J Biomed Inform. 2015;56:220–8.
 8.
Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B Stat Methodol. 2008;70:849–911.
 9.
Jolliffe IT. Principal component analysis. 2nd ed. New York: Springer; 2002. p. 150–66.
 10.
Abdi, H. Partial least squares regression (PLSregression). In: LewisBeck M, Bryman A, Futing T, editors. Encyclopedia for research methods for the social sciences. Thousand Oaks (CA): Sage; 2003. p. 792–5.
 11.
Tibshirani R, Society RS. Regression and shrinkage via the lasso. J R Stat Soc Ser B. 1996;58:267–88.
 12.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York:Springer; 2009. p. 61–73.
 13.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its Oracle properties. J Am Stat Assoc. 2001;96:1348–60.
 14.
Taskesen E, Babaei S, Reinders MM, De Ridder J. Integration of gene expression and DNA methylation profiles improves molecular subtype classification in acute myeloid leukemia. BMC Bioinformatics. 2015;16:S5.
 15.
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol. 2006;68:49–67.
 16.
Ma S, Song X, Huang J. Supervised group lasso with applications to microarray data analysis. BMC Bioinformatics. 2007;8:60.
 17.
Sun H, Wang S. Penalized logistic regression for highdimensional DNA methylation data with casecontrol studies. Bioinformatics. 2012;28:1368–75.
 18.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67:301–20.
 19.
Zou H. The adaptive lasso and its Oracle properties. J Am Stat Assoc. 2006;101:1418–29.
 20.
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K. Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B Stat Methodol. 2005;67:91–108.
 21.
Jacob L, Obozinski G, Vert JP. Group lasso with overlaps and graph lasso. Icml. 2009;433–40.
 22.
Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–41.
 23.
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw [Internet]. 2010;33:1–22 Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2929880&tool=pmcentrez&rendertype=abstract.
 24.
Shaknovich R, Figueroa ME, Melnick A. HELP (HpaII tiny fragment enrichment by ligationmediated pcr) assay for dna methylation profiling of primary normal and malignant b lymphocytes. Methods Mol Biol. 2010;632:191–201.
 25.
Nevill TJ, Fung HC, Shepherd JD, Horsman DE, Nantel SH, Klingemann HG, et al. Cytogenetic abnormalities in primary myelodysplastic syndrome are highly predictive of outcome after allogeneic bone marrow transplantation. Blood. 1998;92:1910–7.
 26.
Kumar CC. Genetic abnormalities and challenges in the treatment of acute myeloid leukemia. Genes Cancer. 2011;2:95–107.
 27.
Verhaak RGW, Wouters BJ, Erpelinck CAJ, Abbas S, Beverloo HB, Lugthart S, et al. Prediction of molecular subtypes in acute myeloid leukemia based on gene expression profiling. Haematologica. 2009;94:131–4.
 28.
Figueroa ME, Lugthart S, Li Y, ErpelinckVerschueren C, Deng X, Christos PJ, et al. DNA methylation signatures identify biologically distinct subtypes in acute myeloid leukemia. Cancer Cell. 2010;17:13–27.
 29.
Christiansen DH, Anderson MK, PedersenBjergaard J. Methylation of p15INK4Bis common, is associated with deletion of genes on chromosome arm 7q and predicts a poor prognosis in therapyrelated myelodysplasia and acute myeloid leukemia. Leukemia. 2003;17:1813–9.
 30.
Glinsky GV, Glinskii AB, Stephenson AJ, Hoffman RM, Gerald WL. Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest. 2004;113:913–23.
 31.
Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol. 2017;18:3.
 32.
Bergstra JAMESBERGSTRAJ, Yoshua Bengio YOSHUABENGIOU. Random search for hyperparameter optimization. J Mach Learn Res. 2012;13:282–305.
 33.
Mourskaia A a, Amir E, Dong Z, Tiedemann K, Cory S, Omeroglu A, et al. ABCC5 supports osteoclast formation and promotes breast cancer metastasis to bone. Breast Cancer Res. [Internet]. 2012;14:R149 Available from: http://www.ncbi.nlm.nih.gov/pubmed/23174366.
 34.
Navab R, Strumpf D, To C, Pasko E, Kim KS, Park CJ, et al. Integrin α11β1 regulates cancer stromal stiffness and promotes tumorigenicity and metastasis in nonsmall cell lung cancer. Oncogene [Internet]. 2015;35:1899–908. Available from: http://www.nature.com/doifinder/10.1038/onc.2015.254.
 35.
Belbin TJ, Singh B, Smith R V, Socci ND, Wreesmann VB, SanchezCarbayo M, et al. Molecular profiling of tumor progression in head and neck cancer. Arch Otolaryngol Head Neck Surg [Internet]. 2005;131:10–18. Available from: http://www.ncbi.nlm.nih.gov/pubmed/15655179.
 36.
Zhang CH. Nearly unbiased variable selection under minimax concave penalty. Ann Stat. 2010;38:894–942.
Funding
This work was supported by NHGRI grant number R01 HG006705, NCI grant numbers 5P30 CA014089 and 5P01 CA196569 and NIEHS grant number P30 ES07048. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funding body, NIH, did not play any roles in the design of the study, the collection, analysis, and interpretation of data, or in writing the manuscript.
Availability of data and materials
The AML datasets analyzed during the current study are available in NCBI Gene Expression Omnibus, with accession numbers for gene expression and methylation data as GSE14468 and GSE18700, respectively. The data was also used and described by Taskensen et al. [14]. The prostate adenocarcinoma data was obtained from GDC Data Portal (Project ID TCGAPRAD).
Author information
Affiliations
Contributions
The study was conceived by JPL. JL conducted the simulation study and performed the data analysis. The manuscript was written by JL, JPL, and KDS GNL proposed the use of and provided the assembled PRAD data. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interest.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Selfcontained R script for MTPEN with full example. (R 5 kb)
Additional file 2:
The accuracy in testing dataset, sensitivity and specificity of feature selection from Standard EN and MTPEN for different simulation settings. MTPEN achieves better classification and sensitivity in Scenarios 1–3. (PNG 214 kb)
Additional file 3:
The optimal penalty ratio parameters versus the change of (A) number of correlated features in the second data type; (B) correlations among features in the second data type; and (C) correlations among features between different platforms. Dots represent the mean of optimal weights and caps represent the standard error of the mean; N = 200 simulation replicates. (PNG 100 kb)
Additional file 4:
Listing of features that have more chance to be selected by MTP EN in AML and PRAD datasets. (DOCX 29 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Liu, J., Liang, G., Siegmund, K.D. et al. Data integration by multituning parameter elastic net regression. BMC Bioinformatics 19, 369 (2018). https://doi.org/10.1186/s1285901824011
Received:
Accepted:
Published:
Keywords
 Data integration
 Classification
 Elastic net