 Software
 Open Access
 Published:
Estimating the optimal linear combination of predictors using spherically constrained optimization
BMC Bioinformatics volume 23, Article number: 436 (2022)
Abstract
Background
In the context of a binary classification problem, the optimal linear combination of continuous predictors can be estimated by maximizing the area under the receiver operating characteristic curve. For ordinal responses, the optimal predictor combination can similarly be obtained by maximization of the hypervolume under the manifold (HUM). Since the empirical HUM is discontinuous, nondifferentiable, and possibly multimodal, solving this maximization problem requires a global optimization technique. Estimation of the optimal coefficient vector using existing global optimization techniques is computationally expensive, becoming prohibitive as the number of predictors and the number of outcome categories increases.
Results
We propose an efficient derivativefree blackbox optimization technique based on pattern search to solve this problem, which we refer to as Spherically Constrained Optimization Routine (SCOR). Through extensive simulation studies, we demonstrate that the proposed method achieves better performance than existing methods including the stepdown algorithm. Finally, we illustrate the proposed method to predict the severity of swallowing difficulty after radiation therapy for oropharyngeal cancer based on radiation dose to various structures in the head and neck.
Conclusions
Our proposed method addresses an important challenge in combining multiple biomarkers to predict an ordinal outcome. This problem is particularly relevant to medical research, where it may be of interest to diagnose a disease with various stages of progression or a toxicity with multiple grades of severity. We provide the implementation of our proposed SCOR method as an R package, available online at https://CRAN.Rproject.org/package=SCOR.
Background
In precision medicine, there is great interest in utilizing multiple biomarkers or continuous variables to categorize patients based on disease stage, predict treatment response, or anticipate the potential severity of toxicity. To address this challenge, various classification methods have been proposed. Existing modelbased approaches to the classification problem include logistic regression and linear discriminant analysis. As an alternative to modelbased methods, [1, 2] focused on classifying different subgroups based on maximization of the area under the receiver operating characteristic (ROC) curve (AUC). In its simplest form, the ROC curve can be interpreted as the probability of detection as a function of the false alarm rate, so higher values of the AUC signify higher average hit rates over different possible values of the false alarm rate. In the field of medical science, the AUC is widely used as a criterion to combine multiple diagnostic test results or other continuous predictor values [3]. However, the ROC curve does not have an exact analytical expression, so estimating the AUC and finding optimal predictor combinations to maximize its value remains challenging.
In the case of a binary outcome, various nonparametric estimates of the AUC have been proposed. For example, [2] estimated the AUC using MannWhitney U statistics, while [4] proposed a smooth estimate obtained using a sigmoid function. However, with increasing sample size, these objective functions become challenging to maximize. In order to handle this problem for large datasets, [5] proposed the minmax method, which only considers the linear combination of two extreme biomarkers in estimating the AUC, and therefore maintains the same computation time for any given sample size. For multicategorical outcomes, the hypervolume under the manifold (HUM), also known as the volume under the ROC surface, has been proposed as an analogue to the AUC [6]. For the threecategory outcome scenario, various estimates of the HUM have been proposed [7,8,9]. In the multicategory setting, [10] proposed a method to estimate the optimal combination vector when the outcomes of the biomarkers follow a normal distribution. However their proposed method was noted to underperform when the biomarker values are nonnormal [11, 12]. In the presence of multiple biomarkers, to avoid the computational burden of simultaneous estimation of the combination vector, [2] proposed the stepdown algorithm. However, this strategy of maximizing the HUM estimate by estimating one coordinate at a time can work poorly in higher dimensional settings. More recently, [13] proposed an empirical estimate of HUM (EHUM) for the threecategory outcome scenario, but because of the lack of concavity of most of the proposed estimates, maximization of the objective functions remained challenging. To overcome this computational burden, under the assumption of normality, [14] proposed a penalized and scaled stochastic distance based method. Finally, [11] proposed an efficient approach which relies on upper and lower bounds (ULBA).
As discussed above, most of the proposed estimates of the HUM are discontinuous and nondifferentiable functions of the combination coefficients, and potentially have multiple maximums. Therefore, one of the most vital aspects regarding their performance remains the maximization step. This problem becomes more challenging with increasing sample size, number of biomarkers, and the number of outcome categories. For simple cases, even in the presence of multiple local maximums, most of the optimization algorithms used in practice can still find the best solution. But, as the maximization problem gets harder, it becomes more difficult to find the global maximum out of all local maximums. Thus, performance becomes highly dependent on the optimization algorithm used for the maximization. When maximizing discontinuous, nondifferentiable, multimodal objective functions, derivativefree nonconvex optimization techniques are critical to ensure a good solution [15]. However, to the best of our knowledge, none of the previous articles proposing HUM estimates considered using global optimization techniques for maximizing the corresponding objective function. Our goal in the current work is both to point out the limitation of existing methods for this problem, and to propose a global optimization method that offers improved performance in the prediction of ordinal outcomes.
Global optimization using pattern search
We now give a more precise mathematical description of the problem of interest. Suppose \({\beta }\) denotes the ddimensional linear combination vector to be estimated, and \(f({\beta })\) denotes the empirical value of the HUM. Unfortunately, \(f({\beta })\) is not identifiable since \(f({\beta }) = f(a{\beta })\) for any scalar \(a>0\) [5]. This means that we cannot simply optimize \(f({\beta })\) over the unconstrained space \({\beta } \in \mathrm {R}^d\). In order to address this nonidentifiability, instead of maximizing \(f({\beta })\) over \(\mathrm {R}^d\), we add an extra constraint \({\beta } = 1\) where \(\cdot \) denotes the Euclidean norm. So the problem can be redefined as
where \(f({\beta })\) can be discontinuous (as is the case for EHUM), nondifferentiable, and multimodal. Note that here the coordinates \({\beta }\) must lie on the surface of a unit sphere. Due to the possibly multimodal nature of \(f({\beta })\), global optimization tools are preferred over convex optimization methods.
There are a number of existing optimization methods, which we summarize briefly here. For maximizing any multimodal function, global optimization techniques such as the genetic algorithm [16] and simulated annealing [17] have been shown to yield better results as compared to convex optimization methods such as the interiorpoint algorithm [18] or sequential quadratic programming [19]. The NelderMead algorithm [20], also known as the simplex method, is a heuristic derivativefree optimization technique that has been widely applied to statistical problems, and serves as the default option for the optim function in R and the basis of fminsearch in MATLAB. To improve on the convergence of these algorithms, [21] proposed pattern search (PS), where possible solution points around the current solution are found using an adaptive stepsize vector. Importantly, none of these algorithms were specifically designed to handle global optimization over a spherical parameter space.
To minimize a nonconvex function (or, equivalently, maximize a nonconcave function) globally over a hyperrectangular parameter space, [22] proposed a modified version of global pattern search called Recursive Modified Pattern Search (RMPS). This approach, which is discussed in detail in the Methods section, has desirable properties in terms of computational scalability and the fact that many of the operations can be performed in parallel. Subsequently, [23, 24] extended RMPS to problems where the parameter space is given by a collection of simplexes.
In this article we develop an algorithm called ‘Spherically Constrained Optimization Routine’ (SCOR) which utilizes the basic principle of the RMPS algorithm but accommodates the constraint of a spherical parameter space. Based on an extensive set of simulation studies, the computational efficiency of the proposed algorithm is established and its performance is compared with existing optimization techniques. We show that using SCOR to maximize EHUM or ULBA results in noticeable improvement over the NelderMead, stepdown and minmax algorithms. Finally, we apply our proposed algorithm to derive a combination score which best predicts the severity of dysphagia, or difficulty swallowing, based on the mean radiation dose delivered to various structures in the head and neck during cancer treatment.
Prior work
In this section, we describe existing methods of estimating the optimal coefficient vector for combining biomarkers. Consider a study with M possible outcome categories. Let \(\mathbf {X}_1, \ldots , \mathbf {X}_M\) denote the ddimensional observed biomarker values for the M outcome categories. Each coordinate of \(\mathbf {X}_j\) denotes the value of a biomarker, \(j=1,\ldots ,M\), where d is the number of biomarkers. Suppose \(\mathbf {X}_j \sim F_j\) for \(j=1,\ldots , M\) where \(F_j\) denotes a multivariate continuous distribution. Now the linear combination of the biomarkers corresponding to the jth outcome category is given by \({\beta }^T\mathbf {X}_j = \sum _{k=1}^d\beta _kX_{jk},\) where \({\beta } = (\beta _1,\ldots ,\beta _d)\) denotes the combining coefficient vector. Without loss of generality, assuming that the higher value of the combination value corresponds to the higher outcome category, the HUM [6], which measures the diagnostic accuracy, is given by \(D({\beta }) = P({\beta }^T\mathbf {X}_M> \cdots {\beta }^T\mathbf {X}_2 > {\beta }^T\mathbf {X}_1)\). In order to distinguish the outcome categories, our goal is to find the optimal combination coefficient vector \({\beta }_0\) for which \(D({\beta })\) is maximized. So, \({\beta }_0 = \mathop {\mathrm {arg\,max}}\limits _{{\beta } = 1} D({\beta })\). Note that \(D({\beta })\) should be maximized over all possible coefficient vectors of norm 1 to avoid the issue of nonidentifiability. When \(\mathbf {X}_1,\ldots , \mathbf {X}_M\) follow multivariate normal distributions, under a few regularity conditions, the value of \({\beta }_0\) can be derived [25]. However, without any distributional assumption on \(\mathbf {X}_1,\ldots , \mathbf {X}_M\), the value of \({\beta }_0\) cannot be analytically obtained.
Estimates of the HUM
We now review estimates of the HUM proposed in the literature. Essentially, these approaches provide different options for how to formulate the objective function.
Empirical hypervolume under the manifold (EHUM)
In order to estimate the optimal coefficient vector, [13] proposed maximizing the empirical estimate of the HUM from the given sample. Suppose the sample is denoted by \(\{\mathbf {X}_{ji_j}: j = 1,\ldots , M, i_j = 1,\ldots ,n_j\}\), so the total sample size is \(n = \sum _{j=1}^Mn_j\). Then the empirical estimate of HUM is given by
Here \(I(\cdot )\) denotes the indicator function. The optimal combination coefficient vector obtained by maximizing \({D}_{E}({\beta })\) is given by \(\hat{{\beta }}_E = \mathop {\mathrm {arg\,max}}\limits _{{\beta } = 1} D_E({\beta })\). Note that \(D_E({\beta })\) can be multimodal. The discontinuity and nondifferentiablity of \(D_E({\beta })\) pose additional challenges in maximizing it using existing optimization methods.
Upper and lower bound approach (ULBA)
In order to alleviate the computational burden of maximizing \(D_E({\beta })\), [11] proposed alternative objective functions which can be maximized more easily. [11] showed that
where \(P_{A}({\beta })\) and \(P_{M}({\beta })\) are defined by
[11] proposed that instead of maximizing \(D_E({\beta })\), we can either maximize \(P_{A}({\beta })\) or \(P_{M}({\beta })\), as they are much easier to solve. Since they showed that the solution obtained by maximizing \(P_A({\beta })\) yields better results than that obtained using \(P_M({\beta })\), in this paper we consider the optimal coefficient combination vector using ULBA as given by \(\hat{{\beta }}_{ULBA} = \mathop {\mathrm {arg\,max}}\limits _{{\beta } = 1} P_A({\beta })\). Despite the simpler form of \(P_A({\beta })\) compared to \(D_E({\beta })\), this function is still discontinuous with possibly multiple modes.
Existing techniques for estimating the optimal value of \({\beta }\) beta
Due to the multimodal nature of the objective functions discussed above, it is challenging to optimize the combination coefficient vector \({\beta }\) simultaneously. In Additional file 1 Section A, we provide background on generalpurpose global optimization techniques including the genetic algorithm (GA), simulated annealing (SA), and pattern search (PS) which can be applied here. In the current section, we focus on approaches designed specifically for estimation of the combination vector developed in the past few decades.
[2] proposed the stepdown algorithm for combining multiple biomarkers. Although the method was first proposed for the binary categorical outcome case, this principle can also be used in scenarios with more than two categorical outcomes [14]. In the stepdown approach, the biomarkers are first ordered based on their individual EHUM value. Then the coefficient of the biomarker with highest EHUM value is taken to be 1. Then at each step one additional biomarker is included and its coefficient is estimated. Thus, the coefficients of the biomarkers are estimated one at a time. A detailed description of the stepdown algorithm is provided in Additional file 1 Section B.1. [11] used this algorithm to maximize the upper or lower bound of HUM, namely \(P_{M}\) or \(P_{A}\). [5] proposed the minmax (MM) principle in the context of a binary outcome, where, corresponding to each vector of biomarkers of length d, only the maximum and the minimum values of those d values are used to estimate the combination coefficient vector. The estimation procedure using the MM principle is provided in detail in Additional file 1 Section B.2.
Results
We propose a new optimization approach, Spherically Constrained Optimization Routine (SCOR), to address the challenge of identifying the optimal linear combination of input variables to predict an ordinal outcome. For a detailed description of our efficient derivativefree blackbox optimization technique, please see the Methods section below.
Simulation study
In this section, we compare the performance of SCOR with that of the NelderMead (NM), stepdown, and minmax methods in the context of simulated data. Under different simulation scenarios, we estimate the optimal combination vector by maximizing the EHUM and ULBA objective functions using each of these four approaches. We measure performance in terms of the EHUM objective function value \(D_E(\cdot )\) at the estimated optimal solution, evaluated on a separate test data set. The SCOR algorithm is implemented in MATLAB 2016b. NM is an unconstrained optimization technique, so in applying this algorithm, we set the first parameter value to be either 1 or 1, whichever maximizes the individual EHUM if we set other parameter values to be 0. Then, the rest of the parameters are estimated by maximizing EHUM and ULBA using NM, performed using the fminsearch function in MATLAB 2016b. For the stepdown and minmax algorithms also, we use the fminsearch function with default options.
We consider three simulation scenarios, two based on the normal distribution, and one based on the Weibull distribution (to illustrate performance in the context of nonnormal data). Among the two normal distribution scenarios considered, in the first one, we simulate covariates that are not correlated, while in the second, the covariates have nonzero correlation. For each simulation scenario, we consider settings with \(M=2\) and \(M=3\) ordinal outcomes. The number of biomarkers is taken to be \(d = 5,10,\) or 20. In each case, the d biomarkers are generated from the corresponding normal or Weibull distribution. For the case of \(M=2\) categories, we set the sample sizes for each category to be 15, 30, and 60. For the \(M=3\) category case, we take the sample sizes of each category to be 15 and 30.
Scenario 1: For the ith disease category, the values of the biomarkers are simulated from the dvariate normal distribution for \(d=5,10,\) or 15 with mean \({\mu }_{i} = \big (\mu _{i1},\ldots ,\mu _{id}\big )\), and covariance matrix \(\mathbf {\Sigma }_i=\mathbf {I}_d\), where \(\mu _{ij}=(1)^j*i[1+0.1(j1)]\), for \(j = 1,\ldots ,d\), \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).
Scenario 2: In order to explore the relative performance of all methods in cases with correlated predictors, here we consider the same model as Scenario 1 except we take \(\mathbf {\Sigma }_i=(a_{st})_{d\times d}\) where \(a_{st} = (0.5)^{st}\) for \(s,t = 1,\ldots ,d\) and \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).
Scenario 3: Here the values of the biomarkers are generated from the multivariate Weibull distribution. For the ith disease category, the jth biomarker follows a univariate Weibull distribution with scale parameter \(\lambda _i\), shape parameter \(k_j\) and location parameter \(\gamma _j\). We take \(\lambda _i = i+1,k_j=0.5*j,\gamma _j = (5)^j\) for \(j = 1,\ldots ,d\), \(i=0,1\) (for the two category outcome case) and \(i=0,1,2\) (for the three category outcome case).
For any given scenario using any of the methods considered, we first estimate the biomarker coefficient vector maximizing EHUM and ULBA. We then generate a new sample of the same size under the same simulation design. The EHUM values reported (i.e., the objective function value \(D_E(\cdot )\)) are obtained using the biomarker coefficient vectors estimated on the training data on the new test set. We adopt this strategy to ensure that the performance estimates do not reflect overfitting to the training data. We report the mean EHUM at the estimated optimal solutions using the SCOR, NM, stepdown and minmax algorithms for the different simulation scenarios over 100 replications.
In Table 1, we provide the results for the two category outcome setting for sample sizes 15 and 30 per group and for the three category outcome setting with sample size 15 per group. The results for the two category outcome setting with sample size 60 and for the three category outcome setting with sample size 30 are provided in the Additional file 1: Table S3. SCOR outperforms both NM and stepdown for all the scenarios considered. SCOR also outperforms the minmax method for all scenarios except for a few cases for Scenario 3 with higher sample sizes; however, in those cases, the SCOR performance is quite close to that obtained using minmax. As the dimension d of the biomarker vector increases, estimates by SCOR tend to improve, while the stepdown algorithm tends to underperform. As the dimension d increases, SCOR tends to outperform stepdown by a higher margin. In general, the SCOR estimates have lower standard errors compared to other methods, indicating more stable estimates, which is expected when using a global optimization approach.
In Additional file 1 : Table S2, we compare the performance of SCOR with generalpurpose global optimization algorithms based on optimization of five benchmark functions on the unitspherical parameter space with dimensions \(d=5,20,50,100,\) and 500, and show that in general SCOR outperforms existing optimization algorithms, with greatly reduced computation time. Specifically, using SCOR, we obtain an improvement in computation time up to 67 fold over pattern search, up to 43 fold over simulated annealing, and up to 38 fold over the genetic algorithm.
Theoretical properties of the estimator obtained from SCOR depend on the choice of objective function. When using EHUM as the objective, \(\hat{{\beta }}_E = \mathop {\mathrm {arg\,max}}\limits _{{\beta } = 1} D_E({\beta })\), so obtaining the optimal \({\beta }\) requires finding the set of values that maximize \(D_E({\beta })\). Since this objective may be multimodal, it is difficult to maximize in practice, creating the need for the proposed SCOR procedure. However, from a theoretical point of view, the properties of the estimator are independent of the choice of algorithm, meaning that existing work on the theoretical properties EHUM can be applied to our results as well. We can therefore rely on previous theoretical results establishing that the empirical estimate of HUM is consistent in probability for the true HUM and asymptotically normal [6], while \(\hat{{\beta }}_E\) is consistent and asymptotically normal under certain regularity conditions [13]. Similar theoretical results for ULBA do not exist. Although we found that in practice maximizing \(P_A({\beta })\) resulted in good performance, likely because its simpler form is easier to maximize than \(D_E({\beta })\), the resulting estimate is a lower bound on the true HUM, and may be strictly lower under certain conditions. Additionally, to show how the mean squared error (MSE) of the estimated optimal coefficient vector \({\beta }\) obtained using SCOR changes with sample size, we provide the results of a simulation study in Table S4 of Additional file 1 Section F. We note that as the sample size increases, the MSE decreases for both the ULBA and EHUM objective functions.
The variance of the estimates for the coefficients and the HUM itself are challenging to estimate in practice [6, 13]. We therefore recommend the use of bootstrap resampling to obtain empirical confidence intervals for both the coefficients and the HUM. For the empirical HUM estimate, the bootstrap standard error \(\widehat{\text {se}}_B\) can be obtained from the bootstrap samples, and the resulting \(100(1\alpha )\)% confidence interval can be estimated as \(\widehat{\text {HUM}} \pm z_{\alpha /2}\widehat{\text {se}}_B\), where \(z_{\alpha /2}\) represents the upper \(\alpha /2\) quantile of the standard normal distribution [6]. A similar procedure can be used to obtain confidence intervals for \(\hat{{\beta }}\).
Application to the prediction of swallowing difficulty following radiation therapy
In this section, we illustrate the SCOR method with an application to data from a study of swallowing difficulty, or dysphagia, in patients treated with radiation therapy for oropharyngeal cancer. Outcomes of observational studies and clinical trials in cancer are commonly defined as categorical or ordinal variables, and standardized systems have been adopted for both response and toxicity assessment in this framework. Specifically, the standard definition of response to cancer therapy in solid tumors includes four categories reflecting the treatment efficacy: complete response, partial response, stable disease, and progressive disease [26]. Toxicities experienced as a consequence of treatment also follow a standard definition for classification and grading, the Common Terminology for Criteria for Adverse Events (CTCAE) [27]. The categories on this scale are 0 for no symptoms, 1 for mild, 2 for moderate, 3 for severe, 4 for lifethreatening, and 5 for death. Similar scales have been developed for outcomes of interest in specific cancer types: in particular, the DIGEST scale measures swallowing difficulty using a similar grading scale to that of CTCAE [28, 29].
In this case study, we analyze data derived from a study of swallowing difficulties, quantified using the DIGEST scale, as a result of radiation therapy for oropharyngeal cancer [30]. Radiation therapy is commonly used to treat oropharyngeal cancer [31]. Although this treatment is critical to ensure control of the tumor, radiation to structures in the head and neck related to swallowing can lead to loss of function, and ultimately reduced loss of quality of life or serious complications, such as aspirationrelated pneumonia [32, 33]. Understanding the impact of radiation dose to specific structures on swallowing outcomes can inform the development of radiation treatment plans designed to minimize dose to critical structures [34].
In our dataset, radiation dose is characterized as the mean dose delivered to 13 critical structures. Here we focus on the 87 subjects with no missing values for dose. The response variable describing swallowing difficulty was quantified using the DIGEST scale; to ensure sufficient sample size in each class, we collapsed grades 24 into a single group [28]. The sample sizes per class are then 29 subjects with no dysphagia (0 on the DIGEST scale), 27 with mild dysphagia (1 on the DIGEST scale), and 30 with moderate to severe dysphagia (24 on the DIGEST scale). Since some of the predictors were highly correlated, we performed a screening step to filter out the most highly correlated using a greedy approach, with variables that had the highest average correlation removed from the set until all pairwise correlations were less than 0.8. After screening out 4 variables, we had 9 remaining predictors: mean radiotherapy dose to cricopharyngeal muscle, esophagus, glottic area, inferior pharyngeal constrictor (PC), ipsilateral anterior digastric muscle (ADM), middle PC, mylohoid muscle, superior PC, and intrinsic tongue muscles.
To illustrate the relative performance of the SCOR, stepdown and minmax techniques, we compute the estimated combination coefficients maximizing the ULBA and EHUM criteria using each method. The EHUM value at those solutions are then evaluated. We also compute the optimal cutpoints to categorize the combination scores (obtained by multiplying the optimal coefficient vector with the biomarker values) using Youden’s index [35]. Youden’s index is defined as the maximum possible value of \((\text {sensitivity}+ \text {specificity}  1)\) over all possible decision thresholds, and provides a summary measure combining sensitivity and specificity [36]. Note that before obtaining the combination scores, all the solution combination vectors obtained in the stepdown and minmax techniques are divided by their corresponding norms so that each solution vector has norm 1.
Figure 1 shows the optimal combination vector scores for each class. SCOR results in improved separation of these scores across the classes, which is reflected in the higher values of EHUM and Youden’s index achieved for both versions of the objective function (ULBA and EHUM). In addition, the cutpoints (marked by horizontal dotted lines) obtained by SCOR distinguish the three outcome categories more clearly than those of stepdown or minmax.
In Table 2 we provide the values of the optimal combination coefficients using SCOR, NelderMead, and stepdown. In the familiar setting of AUC estimation, where there are two outcome categories, a value of 0.5 corresponds to the expected value for random guessing, with higher values demonstrating improved classification accuracy. Here, since we have three ordered categories, the expected value of the HUM under random guessing is \(\frac{1}{3\; !} \approx 0.167\). Hence, estimated values of EHUM or ULBA above 0.167 can be considered as an improvement over a random guess. Since we obtain the highest value of \(D_E(\hat{\beta })\) using the estimates obtained by maximizing ULBA with the SCOR algorithm, the coefficients obtained using this approach (i.e., those given in the first column) represent the preferred solution. The esophagus and middle pharyngeal constrictor, which are known to be essential to swallowing function [37], have the largest estimated coefficients, representing a relatively large contribution to the predicted combination scores. Since increasing radiation dose to structures in the head and neck is generally expected to worsen function, the slight negative coefficient for glottic area likely reflects correlation with other predictors, rather than a protective effect. The results from this case study could be used to inform radiation treatment planning, since intensitymodulated radiation therapy plans could be adjusted to minimize predicted dysphagia risk by sparing radiation to key muscles and organs in the headandneck region.
Discussion
Here the proposed SCOR algorithm is used mainly in the context of a classification problem with a hypervolume under manifolds criteria. However, this algorithm can be used in various other statistical problems such as directional statistics or singleindex models where fixing the norm of the coefficient vector is needed to avoid the issue of nonidentifiability. In the future, the SCOR algorithms can be extended to the variable selection problem over the coefficients belonging to the surface of a unit sphere.
Conclusions
In this paper, we propose a novel derivativefree blackbox optimization technique to minimize any nonconvex function where the parameters are constrained to the surface of a unit sphere. The proposed algorithm is highly efficient, as it allows parallelization using up to 2n parallel threads when maximizing a function whose parameters belong to the surface of an ndimensional unit sphere. Our simulations demonstrate that SCOR outperforms existing methods for biomarker combination, as well as other blackbox optimization techniques, in terms of both performance and computation time. The R package SCOR, which implements the method described here, is available at https://CRAN.Rproject.org/package=SCOR.
We show that using SCOR, we obtain better estimates of the empirical hypervolume under the manifold (EHUM) compared to the estimates obtained using the NelderMead, stepdown and minmax algorithms. Irrespective of the objective function considered, the EHUM value at the solution obtained by SCOR is always better than that obtained using the alternative algorithms. In our case study, we applied SCOR to find the optimal combination coefficients of radiation dose to various structures in the headandneck to distinguish subjects with increasing severity of swallowing difficulty experienced as a result of radiation treatment for cancer. We found that the values of EHUM and Youden’s index obtained were higher for SCOR than for existing alternatives, suggesting that this approach can be applied to more accurately stratify patients based on their risk of developing dysphagia.
Methods
In this section, we provide a detailed description of the algorithm for our proposed SCOR method.
Basic Principle
The basic principle of the search for the optimal value is as follows. Within an iteration, at first a step size \(s>0\) is fixed. Then, two new points in the neighborhood are obtained by adding and subtracting s from each coordinate keeping all other coordinates fixed. Thus, at any given iteration, 2d possible new solution points are generated. For example, if \((\beta _1,\beta _2,\beta _3)\) denotes the current solution (taking \(d=3\)), then the new set of possible solutions are \((\beta _1+s,\beta _2,\beta _3), (\beta _1s,\beta _2,\beta _3), (\beta _1,\beta _2+s,\beta _3), (\beta _1,\beta _2s,\beta _3), (\beta _1,\beta _2,\beta _3+s)\) and \((\beta _1,\beta _2,\beta _3s)\). This strategy is related to the approach used in [38] to solve an unconstrained optimization problem. This specific principle has two desirable properties: firstly, at each iteration the size of the search space (i.e., the set of newly generated points) is of O(d) (i.e., 2d), unlike GA, where the search space increases exponentially with the dimension [39]. Secondly, after the stepsize s is fixed, the operations for finding the 2d new points and evaluating the objective function at those points are independent, so they can be performed in parallel. However, due to the spherical constraint on \({\beta }\), this simple search strategy cannot be directly applied here.
We now summarize the proposed optimization scheme designed to address this issue. The algorithm can be broken into a sequence of runs where within each run, a number of iterations are performed. At the beginning of each iteration, the stepsize \(s>0\) is fixed. The location of the new set of points to be evaluated in the iteration depends on this step size s. The exact relationship between the step size and the new set of possible solutions is described in the following subsections. The first iteration in the first run starts from a starting point provided by the user. At each iteration, a new set of 2d points around the current solution is generated. As mentioned above, the location of these new points is directly related to the value of stepsize s. In general, larger values of s result in more distant points from the current solution. Then, the objective function is evaluated at all \(2d+1\) points. At the end of an iteration, the point with the maximum objective function value is taken as the updated solution. Thus, at each iteration the objective function value either increases or stays the same. At the beginning of a run, the step size is taken to be large. Depending on the improvement of the objective function across iterations, the step size is either reduced or kept same after each iteration. Once the step size within a run becomes sufficiently small (determined by a threshold \(\phi\) which is a tuning parameter of the algorithm), the run ends, and the current solution is passed to the next run, which uses this solution as its starting point. Once two consecutive runs yield the same solution, the algorithm stops execution and returns the final solution. A brief overview of the SCOR algorithm is given in Fig. 2.
Derivation of adjustment stepsize
The RMPS method [22] incorporates adjustments to the step size to ensure that the proposed points remain within a restricting hyperrectangle. In that approach, once the step size s is added to (or subtracted from) any of the coordinates, the other coordinates are kept unchanged. However, over a spherically constrained parameter space, adding the step size s to a coordinate of the current solution while keeping the other coordinates fixed would yield a point outside the unit sphere surface, assuming the current solution is on the unit sphere. To handle a spherically constrained parameter space, a critical challenge is devising an adjustment strategy such that when s is added to (or subtracted from) the ith coordinate, the adjustment of other coordinates ensures that the proposed solution point is still on the unit sphere. We derive an appropriate adjustment step size \(t_i\) which depends on s and is chosen so that when \(t_i\) is added to the remaining coordinates, the point obtained is still on the unitsphere. In Figure 3(a) an exemplary plot is provided giving an idea why adjustment stepsize should be a function of s.
For example, suppose within any given run, at the jth iteration, the current solution is given by \({\beta }^{(j)} = (\beta _1^{(j)},\ldots ,\beta _d^{(j)})\). Note that \(\sum _{k=1}^d(\beta _k^{(j)})^2 = 1\). Suppose we update the ith coordinate as \(\beta _i^{(j+1)} = \beta _i^{(j)}+s.\) Then the corresponding adjustment step size \(t_i\) is added to the rest of the coordinates, \(\beta _k^{(j+1)} = \beta _k^{(j)}+t_i, \; k \in \{1,\ldots ,d\} \setminus \{i\}.\) By the definition of the adjustment step size, \({\beta }^{(j+1)}\) is on the unit sphere as long as such a \(t_i\) exists. Since \({\beta }^{(j)}\) is also on the unitsphere, we have,
which implies,
The value of \(t_i\) can be obtained by solving (3) to obtain \(t_i = T_1(s)\) or \(T_2(s)\) where
In order for the proposed algorithm to converge to the true solution, the value of \(t_i\) needs to be chosen such that the distance of the new possible solution \({\beta }^{(j+1)}\) from the current solution \({\beta }^{(j)}\) should go to 0 as s goes to 0, which only holds true for \(t_i = T_1(s)\). So for a given step size s, the adjustment step size is taken to be \(T_1(s)\). It should be noted that \(T_1(s)\) may come out to be complex implying that for the given step size s and coordinates of the current solution, no appropriate adjustment step size \(t_i\) exists. In those cases, we adopt alternative strategies, as discussed in the section on local step sizes. In Fig. 3b, starting with a current solution on the surface of unit sphere, the new set of explored points are plotted on the unitsphere corresponding to two different exemplary step sizes. In order to make the algorithm more efficient, a minor modification is incorporated which is described in Section C of the Additional file 1
Tuning parameters
In SCOR, the tuning parameters and their roles are similar to those considered in RMPS and RMPSS [22, 24]. Since the solution update strategy is consistent across runs, explaining the roles of the tuning parameters within a single run is sufficient to illustrate their effect. Within each run, iterations are initialized with a starting point which is either the initial guess provided by the user (for the first run) or the solution returned by the previous run (for all runs after the first run). At the beginning, the step size is set to \(s_{initial}\), which is preferably taken to be large in order to promote selection of distant candidate solutions w.r.t. the current solution. Suppose \({\beta }^{(j)} = (\beta _1^{(j)},\; \ldots , \beta _{d}^{(j)})\) denotes the solution at the end of jth iteration. At the beginning of \((j+1)\)th iteration, suppose the current stepsize is \(s = s^{(j+1)} (>0)\). Then the set of d new candidate solutions is given by \(\{{\beta }_{(i,+)}^{(j+1)}\}_{i=1}^d\) where
for \(s>0, i=1,\ldots ,d\). Similarly, taking \(s = s^{(j+1)}\), another d candidate solutions are generated given by \(\{{\beta }_{(i,)}^{(j+1)}\}_{i=1}^d\) where
Thus, within any given iteration, based on the step size s, 2d new candidate solutions are generated. The objective function is evaluated at these \(2d+1\) points (2d candidate solutions and the current solution \({\beta }^{(j)}\)), and the point with the lowest value of the objective function value is taken as the updated solution \({\beta }^{(j+1)}\). As discussed in the previous section on the derivation of the adjustment stepsize, in some scenarios, the adjustment step sizes \(t_i\) or \(t_i^\prime\) might be complex. In those settings, we propose alternative update strategies which are discussed in the following section on local step sizes.
Within a run, at the end of any given iteration, if the improvement of the objective function value is less than a userspecified tolerance tol_fun, the step size is divided by a factor \(\rho >1\) denoted as the step decay rate. So, at each iteration, the step size is either kept the same or reduced by dividing it by the step decay rate. For example, \(s^{(j+1)}\) will be either \(s^{(j)}\) or \(\frac{s^{(j)}}{\rho }\) depending on the improvement to the solution obtained in the \((j+1)\)th iteration. The step size is reduced to enable finer search close to the current solution if no better solution is found in the set of candidate solutions using a larger value of s. The idea of reducing the step size is a wellknown strategy used in existing derivativefree optimization algorithms on unconstrained parameter spaces [38, 40].
We consider the step size as sufficiently close to 0 once its value gets smaller than the step size threshold \(\phi\). Once the step size becomes less than \(\phi\), no further iterations are performed within that particular run. If two consecutive runs yield the same solution, the algorithm terminates after returning the solution obtained in the last run.
To handle cases where the solution is known to be sparse a priori, we consider the sparsity threshold \(\lambda\) as another tuning parameter. Once the step size for the jth iteration \(s^{(j)}\) is determined, before calculating the adjustment step size \(t_i\), all the coefficients (except the ith coordinate) with absolute values less than \(\lambda\) are set equal to zero. Suppose at the jth iteration, while updating ith coordinate, out of the remaining \(d1\) coordinates, h of them have absolute values less than \(\lambda\). Then those h coordinates are set equal to 0. The adjusted step size \(t_i\) is calculated based on rest of the remaining \(dh\) coordinates. In case a sparse solution is not expected, the user can set the value of \(\lambda\) to 0.
Local step sizes
At the beginning of the jth iteration, we initialize 2d local step sizes such that \(s_i^+=s^{(j)}\) and \(s_i^=s^{(j)}\) for \(i = 1,\ldots ,d\). When adding \(s_i^+\) (or \(s_i^\)) to the ith coordinate, if the corresponding adjustment step size \(t_i\) (or \(t_i^\prime\)) exists, update movements are performed without modifying \(s_i^+\) (or \(s_i^\)). However, in case no such real \(t_i\) (or \(t_i^\prime\)) exists for the given local step size \(s_i^+\) (or \(s_i^\)), it is subsequently divided by the step decay rate \(\rho\) until the solution \(t_i\) (or \(t_i^\prime\)) becomes real. Note that in equation (4), \(D_i(s) \xrightarrow {} \big (2\sum _{k=1,k\ne i}^d \beta _k^{(j)}\big )^2>0\) as \(s \xrightarrow {} 0\). Therefore, a real \(t_i\) (or \(t_i^\prime\)) will exist given a local step size \(s_i^+\) (or \(s_i^\)) sufficiently close to 0 (see Theorem 1). Since at the beginning of the iteration, the values of these local stepsizes are set equal to the step size for that iteration, the values of the local step sizes in the current iteration do not depend on the values from the previous iteration.
Overview of algorithm
Pseudocode for the proposed procedure is given in Algorithm 1. There we let \(\hat{{\beta }}^{(R)}\) denote the solution obtained at the end of Rth run, and \({\beta }^{(j)}\) denotes the solution obtained after the jth iteration within a particular run. We set an upper limit \(max\_runs\) on the maximum number of runs to be executed. Within each run, there is an upper limit \(max\_iter\) on the number of iterations. The roles of the tuning parameters along with their default values are provided in Additional file 1 Table S1.
Theorem 1
Suppose \(\mathbf {S} = \{(x_1,\ldots ,x_n) \in \mathrm {R}^n \; : \; \sum _{i=1}^n x_i^2 = 1, i=1,\cdots ,n\}\). Consider a sequence of stepsizes \(\delta _k = \frac{s}{\rho ^k}\) for \(k\in \mathrm {N}\) and \(s\ne 0, \rho >1\). Then there exists a K such that for \(k \ge K\), all adjustment step sizes \(\{t_i\}_{i=1}^d\) are real.
Theorem 2
Suppose f is convex, continuous and differentiable on \(\mathbf {S}\). Consider a sequence \(\delta _k = \frac{s}{\rho ^k}\) for \(k\in \mathrm {N}\) and \(s>0, \rho >1\). Suppose \(\mathbf {u}\) is a point in \(\mathbf {S}\). Define \(\mathbf {u}_k^{(i+)} = (u_1+t_i(\delta _k),\ldots , u_{i1}+t_i(\delta _k), u_i+\delta _k,u_{i+1}+t_i(\delta _k), \ldots , u_n+t_i(\delta _k))\) and \(\mathbf {u}_k^{(i)} = (u_1+t_i(\delta _k),\ldots , u_{i1}+t_i(\delta _k),u_i\delta _k,u_{i+1}+t_i(\delta _k), \ldots , u_n+t_i(\delta _k))\) for \(i=1,\cdots ,n\), where \(t_i(s)\) denotes the adjustment step size corresponding to step size s.
Given conditions detailed in Additional file 1 Section D, the global minimum of f over \(\mathbf {S}\) occurs at \(\mathbf {u}\).
The proofs of Theorems 1 and 2 are provided in Additional file 1 Section D. As is the case for any existing blackbox optimization method, SCOR is not theoretically guaranteed to find the global minimum of any arbitrary blackbox function. However, it incorporates strategies to avoid getting stuck at local minima. As shown in Section E of the Additional file 1, SCOR outperform competing blackbox optimization methods including GA, SA, and PS in terms of identifying better solutions for benchmark functions. Moreover, as discussed in the next section, in the context of maximizing HUM estimates, SCOR outperforms existing methods proposed for biomarker combination as well.
Availability of data and materials
The R package SCOR is available online at https://CRAN.Rproject.org/package=SCOR, and the MATLAB code is available at https://github.com/priyamdas2/SCOR. The case study data were originally described in [30]. An anonymized version of this data set is available at https://doi.org/10.6084/m9.figshare.14753052. This optimization method can also be used to maximize or minimize any blackbox function on a spherically constrained parameter space. MATLAB code: The MATLAB code is available at https://github.com/priyamdas2/SCOR.
Abbreviations
 AUC::

Area under the ROC curve
 CTCAE::

Common terminology for criteria for adverse events
 EHUM::

Empirical estimate of HUM
 GA::

Genetic algorithm
 HUM::

Hypervolume under the Manifold
 MSE::

Mean squared error
 NM::

Neldermead algorithm
 PS::

Pattern search
 RMPS::

Recursive modified pattern search
 ROC::

Receiver operating characteristic
 SA::

Simulated annealing
 SCOR::

Spherically constrained optimization routine
 ULBA::

Upper and lower bounds approach
References
Pepe MS, Thompson ML. Combining diagnostic test results to increase accuracy. Biostatistics. 2000;1(2):123–40.
Pepe MS, Cai T, Longton G. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics. 2006;62(1):221–9.
Zhou XH, McClish DK, Obuchowski NA. Statistical methods in diagnostic medicine, vol. 464. New York: Wiley; 2002.
Ma S, Huang J. Combining multiple markers for classification using ROC. Biometrics. 2007;63(3):751–7.
Liu C, Liu A, Halabi S. A minmax combination of biomarkers to improve diagnostic accuracy. Stati Med. 2011;30(16):2005–14.
Li J, Fine JP. ROC analysis with multiple classes and multiple tests: methodology and its application in microarray studies. Biostatistics. 2008;9(3):566–76.
Scurfield BK. Multipleevent forcedchoice tasks in the theory of signal detectability. J Math Psychol. 1996;40(3):253–69.
Mossman D. Threeway ROCs. Med Decis Making. 1999;19(1):78–89.
Nakas CT, Yiannoutsos CT. Ordered multipleclass ROC analysis with continuous measurements. Stat Med. 2004;23(22):3437–49.
Yanyu Z. ROC analysis in diagnostic medicine (phd thesis). Department of Statistics and Applied Probability, National University of Singapore 2010.
Hsu MJ, Chen YH. Optimal linear combination of biomarkers for multicategory diagnosis. Stat Med. 2016;35(2):202–13.
Maiti R, Li J, Das P, Feng L, Hausenloy D, Chakraborty B. A distributionfree smoothed combination method of biomarkers to improve diagnostic accuracy in multicategory classification. 2019 arxiv.org/abs/1904.10046
Zhang Y, Li J. Combining multiple markers for multicategory classification: an ROC surface approach. Aust N Z J Stat. 2011;53(1):63–78.
Kang L, Xiong C, Crane P, Tian L. Linear combinations of biomarkers to improve diagnostic accuracy with three ordinal diagnostic categories. Stat Med. 2013;32(4):631–43.
Horst R. Introduction to Global Optimization (Nonconvex Optimization and Its Applications). Berlin: Springer; 2002.
Fraser AS. Simulation of genetic systems by automatic digital computers i. introduction. Austr J Biol Sci. 1957;10:484–91.
Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Austr J Biol Sci. 1983;220(4598):671–80.
Karmakar N. New polynomialtime algorithm for linear programming. Combinatorica. 1984;4:373–95.
Boggs PT, Tolle JW. Sequential quadratic programmings. Acta Numerica. 1996;1–52.
Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7:308–13.
Torczon VJ. On the convergence of pattern search algorithms. SIAM J Optimiz. 1997;7:1–25.
Das P. Blackbox optimization on hyperrectangle using recursive modified pattern search and application to matrix completion problem with nonconvex regularization. arXiv, 2016. arxiv.org/pdf/1604.08616pdf
Das P. Blackbox optimization on multiple simplex constrained blocks. arXiv, 2016. arxiv.org/abs/1609.02249
Das P. Recursive modified pattern search on highdimensional simplex : A blackbox optimization technique. The Indian Journal of Statistics: Series B 2020.
Su JQ, Liu JS. Linear combinations of multiple diagnostic markers. J Am Stat Assoc. 1993;88(424):1350–5.
Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst. 2000;92(3):205–16.
Trotti A, Colevas AD, Setser A, Rusch V, Jaques D, Budach V, et al. Ctcae v3.0: development of a comprehensive grading system for the adverse effects of cancer treatment. Semin Radiat Oncol. 2003;13:176–81.
Hutcheson, et al. Dynamic imaging grade of swallowing toxicity (digest): scale development and validation. Dysphagia. 2017;23(1):62–70.
Goepfert RP, Lewin JS, Barrow MP, Warneke CL, Fuller CD, Lai SY, et al. Grading dysphagia as a toxicity of head and neck cancer: differences in severity classification based on MBS DIGEST and clinical CTCAE grades. Dysphagia. 2018;33(2):185–91.
Kamal M, Mohamed AS, Volpe S, Zaveri J, Barrow MP, Gunn GB, et al. Radiotherapy dosevolume parameters predict videofluoroscopydetected dysphagia per DIGEST after IMRT for oropharyngeal cancer: results of a prospective registry. Radiother Oncol. 2018;128(3):442–51.
Daly ME, Le QT, Maxim PG, Loo BW Jr, Kaplan MJ, et al. Intensitymodulated radiotherapy in the treatment of oropharyngeal cancer: clinical outcomes and patterns of failure. Int J Radiat Oncol Biol Phys. 2010;76(5):1339–46.
Eisbruch, et al. Objective assessment of swallowing dysfunction and aspiration after radiation concurrent with chemotherapy for headandneck cancer. Int J Radiat Oncol Biol Phys. 2002;53(1):23–8.
Hutcheson, et al. Longterm functional and survival outcomes after induction chemotherapy and riskbased definitive therapy for locally advanced squamous cell carcinoma of the head and neck. Head Neck. 2014;36(4):474–80.
Duprez, et al. Systematic review of dosevolume correlates for structures related to late swallowing disturbances after radiotherapy for head and neck cancer. Dysphagia. 2013;28:337–49.
Youden W. Index for rating diagnostic tests. Cancer. 1950;3:32–5.
Luo J, Xiong C. Diagtest3grp: an r package for analyzing diagnostic tests with three ordinal groups. J Stat Softw. 2012;51(3):1–24.
Levendag PC, Teguh DN, Voet P, van der Est H, Noever I, de Kruijf WJ, et al. Dysphagia disorders in patients with cancer of the oropharynx are significantly affected by the radiation therapy dose to the superior and middle constrictor muscle: a doseeffect relationship. Radiother Oncol. 2007;85(1):64–73.
Fermi E, Metropolis N. Numerical solution of a minimum problem. los alamos unclassified report la–1492. Los Alamos National Laboratory, Los Alamos, USA 1952.
Geris L. Computational modeling in tissue engineering. Berlin: Springer; 2012.
Kerr C, DuraBernal S, Smolinski T, Chadderdon G, Wilson D. Optimization by adaptive stochastic descent. PLOS One. 2018;13(3):0192944.
Acknowledgements
The authors would like to acknowledge the funding sources listed above which supported this work.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 23 Supplement 3, 2022: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2021): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume23supplement3.
Funding
We would like to acknowledge the following funding sources: MK is partially funded by NIH (R01CA205146) and the Charles and Daneen Stiefel MD Anderson Oropharynx Program (00059189). KAH receives funding support from the PCORI, NIH/NCI (R21CA22620002), NIH/NIDCR (R01DE02524805), THRIVE/HESI foundation, and Atos Medical not directly related to this work; the data were collected with direct funding support from the Charles and Daneen Stiefel MD Anderson Oropharynx Program. CDF received/receives relevant funding and salary support during the period of data collection and study execution from: the NCI Early Phase Clinical Trials in Imaging and ImageGuided Interventions Program (1R01CA218148); the NSF/NIH Joint Smart Connected Health Program (R01CA257814); and the NCI Parent RPG mechanism (R01CA258827). CDF received/receives unrelated funding and salary support during the period of study execution from: NIH NIBIB Research Education Programs for Residents and Clinical Fellows Grant (R25EB02578701); NIDCR Academic Industrial Partnership Grant (R01DE028290); NCI Parent Research Project Grant (R01CA258827) an NIH/NCI Cancer Center Support Grant (CCSG) Pilot Research Program Award from the UT MD Anderson CCSG Radiation Oncology and Cancer Imaging Program (P30CA016672); and an NSF Division of Civil, Mechanical, and Manufacturing Innovation (CMMI) grant (NSF 1933369). CDF has received direct industry grant support, honoraria, and travel funding from Elekta AB unrelated to this project. CDF receives direct infrastructure support from the multidisciplinary the Radiation Oncology/Cancer Imaging Program (P30CA01667244) of the MD Anderson Cancer Center Support Grant (P30CA016672) and the MD Anderson Program in Imageguided Cancer Therapy. BC is partially supported by Ministry of Education  Singapore (MOE2015T22056) and DukeNUS Medical School startup grant. CBP is partially funded by NIH/NCI CCSG (P30CA016672, Biostatistics Shared Resource) and by Cancer Prevention and Research Institute of Texas (RP160668/RP150521). Publication costs are funded by CBP’s MD Anderson Physicians Referral Service (PRS) funding.
Author information
Authors and Affiliations
Contributions
PD, RM, BC and CBP conceived the study. PD and CBP drafted the manuscript. PD and DD developed the code. PD performed the data analysis. MK, CDF, and KAH contributed to the acquisition of the case study data and interpretation of the results. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The case study is a secondary analysis of data collected as a part of the MD Anderson Oropharynx Cancer Registry PatientReported Outcomes and Functional (PROF) Core. The study protocol under which the data were collected was approved by The University of Texas MD Anderson Cancer Center Institutional Review Board (PA140947).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
. This file contains a brief description about the literature on existing global optimization techniques, existing techniques alternative to SCOR, theoretical properties and optimization performance evaluation of SCOR, additional simulation results, and an application of SCOR to Alzheimer's disease data. Table S1. Tuning parameters, their roles and default values in the SCOR algorithm. Table S2. Comparison of minimum value achieved and average computation time (in seconds) for solving the transformed ddimensional negative logproduct function, modified Griewank’s function, negative sum of squares function, modified Exponential function, and modified Easom function for d = 5, 20, 50, 100 and 500 using SCOR, PS, GA, SA starting from 10 randomly generated points in each case. Table S3. Performance comparison for two and three ordinal outcomes, where each class has sample size 60 for the two category case, and 30 for the three category case. The empirical hypervolume under manifolds (EHUM) and upper and lower bound approach (ULBA) objective functions are maximized by the proposed Spherically Constrained Optimization Routine (SCOR) algorithm and the existing NelderMead (NM), stepdown, and minmax algorithms. The estimated biomarker coefficient vectors are then used to calculate the EHUM value on a new dataset of the same size generated from the corresponding model. The entire procedure is repeated 100 times, resulting in 100 simulated training and test data sets, and the mean EHUM objective function values on the test data are reported, with the standard error in the parentheses. The result for the method with the best performance is marked in bold.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Das, P., De, D., Maiti, R. et al. Estimating the optimal linear combination of predictors using spherically constrained optimization. BMC Bioinformatics 23 (Suppl 3), 436 (2022). https://doi.org/10.1186/s1285902204953y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285902204953y
Keywords
 Area under the curve
 Classification
 Global optimization
 Hypervolume under the manifold
 Pattern search
 ROC curve