 Methodology Article
 Open Access
 Published:
Identification of gene pairs through penalized regression subject to constraints
BMC Bioinformatics volume 18, Article number: 466 (2017)
Abstract
Background
This article concerns the identification of gene pairs or combinations of gene pairs associated with biological phenotype or clinical outcome, allowing for building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. However, given a small number of biological samples yet a large number of genes, this problem suffers from the difficulty of high computational complexity and imposes challenges to the accuracy of identification statistically.
Results
In this paper, we propose a parsimonious model representation and develop efficient algorithms for identification. Particularly, we derive an equivalent model subject to a sumtozero constraint in penalized linear regression, where the correspondence between nonzero coefficients in these models is established. Most importantly, it reduces the model complexity of the traditional approach from the quadratic order to the linear order in the number of candidate genes, while overcoming the difficulty of model nonidentifiablity. Computationally, we develop an algorithm using the alternating direction method of multipliers (ADMM) to deal with the constraint. Numerically, we demonstrate that the proposed method outperforms the traditional method in terms of the statistical accuracy. Moreover, we demonstrate that our ADMM algorithm is more computationally efficient than a coordinate descent algorithm with a local search. Finally, we illustrate the proposed method on a prostate cancer dataset to identify gene pairs that are associated with preoperative prostatespecific antigen.
Conclusion
Our findings demonstrate the feasibility and utility of using gene pairs as biomarkers.
Background
In biomedical research, gene identification has been critical towards understanding and predictive modeling, whose activities are associated with biological phenotype, disease status, or clinical outcome. These genes, referred to as biomarkers, are further utilized for predictive modeling to facilitate scientific investigation, clinical diagnosis, prognosis, and treatment developments. In this discovery process, the expression levels of candidate genes are measured through genomic techniques enabling thousands of genes simultaneously, permitting monitoring molecular variation on a genomewide scale [1] and providing more precise and reliable diagnosis [2]. As widely used techniques, DNA microarray, parallel qPCR, and RNASeq measure gene expression at the mRNA level. Yet, two major issues emerge with regard to the utilization of gene expression. First, the number of genes greatly exceeds that of biological samples typically, with tens of thousands of genes in the presence of up to only a few hundred biological samples or observations. As a result, inference tends to be unstable, misleading, or even invalid due to high statistical uncertainty, in addition to extremely high cost of computation. This, in turn, demands reliable and accurate methods of identification. Second, prior to any analysis, raw gene expressions must be normalized to compensate for differences in labeling, sample preparation, and detection methods. A common practice focuses on normalization of each sample’s raw expression based on remaining ones in the same dataset, known as betweensample normalization, often in the forms of samplewise scaling in RNASeq data [3]. However, such a normalization requires recomputation when a new sample is removed or added from the dataset, imposing computational challenges for large studies. Moreover, any analysis using selected genes based on one dataset may be sensitive to normalization, leading to nongeneralizable and/or nonreproducible scientific findings [4].
To address the foregoing challenges, a modeling method based on gene pairs was first presented in the topscoring pair (TSP) classifier by [5] and later implemented by [6]. Compared to predictors based on individual genes, gene pairbased predictors are more robust to normalization and have better predicting or classifying accuracy. Another advantage of genepair based predictive modeling is its ease of evaluation and validation by qPCR methods. Ideally, to use qPCR to measure a single gene’s expression level, one applies the deltaCt method [7], in which the differenced Ct values between the gene of interest and another housekeeping gene such as GAPDH measures gene expression level. However, between sample variation of a housekeeping gene may be large, imposing a great challenge [8]. In this sense, genepair based modeling removes the requirement of housekeeping genes since the differenced Ct values between the two genes of interest can be directly treated as a measurement. Consequently, the two genes in a gene pair serve as internal controls for each other. Due to all these advantages, gene pairbased predictors have been adopted in several cancer studies [9–11].
Despite the advantage of the gene pair approach, due to the combinatorial complexity, identifying the best gene pair, or best combinations of several gene pairs, is statistically and computationally challenging, from all the possible pairs from a pool of tens of thousands of genes. For instance, the TSP algorithm employs a direct search, whose running time grows quadratically in terms of the number of candidate genes. Although in practice one can first identify differentially expressed genes and then perform a restrictive search to these individual genes, such a twostep approach is no longer invariant to normalization and may miss informative pairs in which at most one gene is differentially expressed [5]. The computational problem is even more severe when more than one gene pair is sought, such as in kTSP which involves exactly k top disjoint pairs in prediction [12].
Moreover, even though rankbased gene pair predictors like the TSP are robust to normalization, their utility in modeling complex data remains limited. One possible extension is to use ratios of gene expression levels as predictors and use regression models to select gene pairs. In recent years, regression models with penalties enforcing sparsity (such as the Lasso [13], SCAD [14], and TLP [15] penalties) have been widely used for variable selection, and many efficient algorithms have been proposed for fitting such models. One may employ such an approach by treating ratios of gene expression levels from all possible gene pairs as candidate predictors. However, this amounts to a quadratic complexity in the number of candidate genes.
In this paper, we develop a new regression approach and an efficient algorithm for identifying gene pairs associated with biological phenotype or clinical outcome. we propose an equivalent model subject to a sumtozero constraint on regression coefficients, where the correspondence between nonzero coefficients in these models is established. The model of this type has been proposed for compositional data [16] and recently for reference point insensitive data [17]. One salient aspect is that this model is more parsimonious, involving only predictors linearly in the number of candidate genes. To deal with the constraint, we develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) [18, 19], for identification and model parameter estimation. The new approach shares not only the benefit of simplicity in interpretation but also a linear complexity. Most importantly, the proposed method substantially improves the statistical accuracy and computation efficiency. Finally, in simulations, the method compares favorably against the traditional method in terms of the accuracy of identification, and our ADMM algorithm is more computationally efficient than a coordinate descent with local search (CD+LS) algorithm of [17].
Methods
Highdimensional linear regression
This section proposes predictive models based on combinations of ratios of gene expression levels on the ground that ratios of gene expression levels not only are robust to normalization but also can be easily validated and measured by qPCR techniques.
Given p predictors (x _{1},…,x _{ p }) measuring the expression levels of p genes (g _{1},…,g _{ p }), we consider informative secondorder interactions defined by pairwise ratios {x _{ j }/x _{ k },1≤j<k≤p} of (x _{1},…,x _{ p }) with respect to a continuous response such as the preoperative prostatespecific antigen level measured from prostate cancer patients, as demonstrated in the “Results” section. It is assumed that there are only a small number (i.e., much smaller than p) of informative genes. Now consider a regression model in which response Y depends on a predictor vector x in a linear fashion:
where α=(α _{12},α _{13},…,α _{ p−1p })^{T} and z=(log(x _{1}/x _{2}), log(x _{1}/x _{3}), …, log(x _{ p−1}/x _{ p }))^{T} are \({q=\!\frac {p(p1)}{2}}\)dimensional vectors of regression coefficients and predictors, and ε is random error that is independent of z. For convenience, for i<j, we let α _{ ji }=−α _{ ij }. In Eq. (1), primary reasons for the logarithm of the pairwise ratios {x _{ j }/x _{ k },1≤j<k≤p} are twofold. First, it stabilizes the variance of gene expression levels so that Eq. (1) is suitable. In fact, the logarithm transformation is widely used in the literature on gene expression modeling [20]. Second, it facilitates an efficient model fitting algorithm to be introduced subsequently. Our objective is to identify nonzero coefficients of α corresponding to informative gene pairs based on gene expression.
There are several challenges for identification of informative ratios within the framework of Eq. (1), in which p may greatly exceed the sample size n, known as highdimensional regression. Normally, one may apply a feature selection method such as the Lasso [13] for this task. Unfortunately, however, highdimensionality of Eq. (1) impedes the accuracy of feature selection in the presence of noise in addition to computational cost, which are roughly proportional to p ^{2}. To overcome these difficulties, we propose an alternative yet equivalent model of Eq. (1) through a more parsimonious representation involving one linear constraint.
The next proposition says that f(z) in Eq. (1) has an equivalent representation with only pvariables. In a sense, it achieves the objective of dimensionality reduction.
Proposition 1
The following equivalent form of f(z) in Eq. (1) is as follows:
Importantly, \(\sum _{j=1}^{p} \beta _{j}=0\).
Based on Proposition 1, we derive an equivalent model of Eq. (1):
where \(\tilde {\boldsymbol {x}}=(\log x_{1},\ldots,\log x_{p})\) and β=(β _{1},…,β _{ p })^{T}. Most critically, the correspondence between coefficients of α and β is established by Eq. (2), where Eq. (1) and Eq. (3) can have different number of nonzero coefficients, which is because of the reparametrization and the sumtozero constraint. For instance, suppose that α _{12}=3, α _{23}=−2, and α _{ ij }=0 otherwise in Eq. (1), then β _{1}=3, β _{2}=−5, β _{3}=2, and β _{ j }=0 otherwise in Eq. (3). Model Eq. (3) has been proposed for compositional data [16] and recently also for reference point insensitive data [17]. Here [16] established model selection consistency and bounds for the resulting estimator.
In contrast to Eq. (1), Eq. (3) contains only p instead of \(\frac {p(p1)}{2}\) predictors, subject to the sumtozero constraint for the regression coefficients. In other words, model Eq. (3) is more parsimonious than model Eq. (1) in terms of the number of active parameters in a model. As a result, there can not be a onetoone correspondence between α and β. It is shown in Eq. (2) that the value of β in Eq. (3) is uniquely determined by that of α in Eq. (1). The inverse does not hold – many values of α in Eq. (1) correspond to the same value of β in Eq. (3). The nonexistence of onetoone correspondence between α and β is due to the fact that model Eq. (1) is largely nonidentifiable. In fact, for any cycle formed by sequence i _{1},i _{2},…,i _{ k },i _{ k+1}=i _{1}, we can add any constant c to the \(\alpha _{i_{j}i_{j+1}}\)’s formed by adjacent indices without changing the model. That is, we can construct α ^{′} where \(\alpha ^{\prime }_{i_{j}i_{j+1}}=\alpha _{i_{j}i_{j+1}}+c\) for j=1,…,k and \(\alpha ^{\prime }_{ij}=\alpha _{ij}\) otherwise and model Eq. (1) with α is equivalent to that with α ^{′}. Therefore, when we obtain a solution β by solving Eq. (3), due to the argument above, there are an infinite number of α that are related to β through Eq. (2). Among them, we would like to choose the “simplest” one. In this paper, we define the “simplest” α to be the one(s) with the minimum L _{1} norm, where the L _{1}norm of a vector y=(y _{1},…,y _{ p }) is \(\boldsymbol {y}_{1}=\sum _{i=1}^{p}y_{i}\). In practice, given an estimate of β from (3), an estimate of α can be obtained using Algorithm 1 below.
Algorithm 1
(Peeling) Given an estimate of \(\boldsymbol {\beta }, \hat {\boldsymbol {\beta }}=\left (\hat {\beta }_{1},\cdots, {\hat {\beta }}_{p}\right)^{T}\) satisfying the sumtozero constraint \(\sum _{j=1}^{p}\hat \beta _{j}=0\), initialize \(\tilde {\boldsymbol {\beta }}\) as \(\hat {\boldsymbol {\beta }}\) and \(\hat {\boldsymbol {\alpha }}\) as \({\hat {\alpha }}_{j,k}=0\) for all 1≤j<k≤p.
Step 1: Identify one positive and one negative \(\tilde {\beta }_{j}\)’s, say \(\tilde {\beta }_{k_{1}}>0\) and \(\tilde {\beta }_{k_{2}}<0\), where k _{1} and k _{2} are two distinct indices from {1,⋯,p}. For instance, \(\tilde {\beta }_{k_{1}}\) and \(\tilde {\beta }_{k_{2}}\) can be taken as the most positive and most negative (ties can be broken arbitrarily) \(\tilde {\beta }_{j}\)’s. This can always proceed as long as not all \(\tilde {\beta }_{j}\)’s are zero.
Step 2: Set \(\hat \alpha _{k_{1} k_{2}}=\min \left (\tilde {\beta }_{k_{1}}, \tilde {\beta }_{k_{2}}\right)\). For instance, \(\tilde {\beta }_{1} = 1.5\) and \(\tilde {\beta }_{2}=0.5\), then set \(\hat {\alpha }_{12}=0.5\).
Step 3: Subtract \(\hat {\alpha }_{k_{1} k_{2}}\) from \(\tilde {\beta }_{k_{1}}\) and \(\hat {\alpha }_{k_{1} k_{2}}\) from \(\tilde {\beta }_{k_{2}}\) to make one of them zero, that is, \(\tilde {\beta }_{k_{1}}\leftarrow \tilde {\beta }_{k_{1}}\hat {\alpha }_{k_{1} k_{2}}\) and \(\tilde {\beta }_{k_{2}} \leftarrow \tilde {\beta }_{k_{2}}+\hat {\alpha }_{k_{1} k_{2}}\). In the previous example, \(\tilde {\beta }_{1} \leftarrow 1\) and \(\tilde {\beta }_{2} \leftarrow 0\).
Step 4: Repeat Steps 1–3 until all components of \(\tilde {\boldsymbol {\beta }}\) become zero.
Algorithm 1 terminates in at most p steps because the number of nonzeros in \(\tilde {\boldsymbol {\beta }}\) decreases by either 1 or 2 after each iteration. Note that \(\tilde {\beta }_{k_{1}}\) and \(\tilde {\beta }_{k_{2}}\) identified in Step 1 may not be unique. Therefore it may lead to different \(\hat {\boldsymbol {\alpha }}\)’s. Importantly, this algorithm always yields a minimum L _{1}norm α estimate (see Proposition 5 later in this section).
The following two propositions characterize properties of such α with respect to its representations.
Proposition 2
(Minimum L _{1}norm of α)
Given α and β satisfying Eq. (2), the following conditions are equivalent:

(A)
For all α ^{′} satisfying (2), α_{1}≤α ^{′}_{1}.

(B)
For all 1≤i,j,k≤p, i≠j, j≠k, α _{ ij } α _{ jk }≤0.

(C)
\(\boldsymbol {\alpha }_{1}=\frac 12\boldsymbol {\beta }_{1}\).
Proposition 3
Given α and β satisfying Eq. (2), the following conditions are equivalent:

(D)
For all α ^{′}≠α satisfying Eq. (2), α_{1}<α ^{′}_{1}.

(E)
The conditions in Proposition 2 are met by α. Furthermore, there does not exist distinct (j _{1},k _{1},j _{2},k _{2}) such that \(\alpha _{j_{1}k_{1}}\neq 0\) and \(\alpha _{j_{2}k_{2}}\neq 0\) simultaneously.

(F)
There exists j such that \(\beta _{j}=\sum _{i\neq j}\beta _{i}\). Correspondingly, α _{ ij }=β _{ i } for all i≠j and α _{ ik }=0 for all i≠j, k≠j.
The following proposition establishes the relations between the numbers of nonzero elements of α and β under different settings.
Proposition 4
Assume that α and β satisfy Eq. (2). Let \(A=\boldsymbol {\alpha }_{0}=\sum _{1 \leq j < k \leq p} I(\alpha _{jk} \neq 0)\) and \(B=\boldsymbol {\beta }_{0}=\sum _{j=1}^{p} I(\beta _{j} \neq 0)\) denote the numbers of nonzero elements of α and β, respectively. Then,

(G)
B≤2A.

(H)
If α and β satisfy the conditions in Proposition 2, then \(2\sqrt {A}\leq B\leq 2A\).

(I)
If α and β satisfy the conditions in Proposition 3 and α≠0 and β≠0, then B=A+1.
In view of condition (H) in Proposition 4, if those conditions of Proposition 2 are met with B=2 or 3, then those of Proposition 3 must be satisfied.
Proposition 5
For any \(\hat {\boldsymbol {\beta }}\) satisfying the sumtozero constraint, the corresponding \(\hat {\boldsymbol {\alpha }}\) produced by Algorithm 1 satisfies Proposition 2.
The proofs of the propositions are supplied in Appendix.
Constrained penalized likelihood
Given model Eq. (3), a random sample of n observations \((\tilde {\boldsymbol {x}}_{i}, Y_{i})_{i=1}^{n}\) is obtained, based on which the loglikelihood function l(β) can be written as \(l(\boldsymbol {\beta })=\frac {1}{2 \sigma ^{2}} \sum _{i=1}^{n} \left (Y_{i}{\boldsymbol {\beta }}^{T} \tilde {\boldsymbol {x}}_{i}\right)^{2}\). In a highdimensional situation, model Eq. (3) is overparameterized when p>n, and hence that l(β) has multiple maximizers. Towards this end, we introduce a constrained penalized likelihood as a generalization of the Lasso regression using L _{1}regularization.
Minimization of Eq. (4) in β yields its minimizer \(\hat {\boldsymbol {\beta }}\). Since the term σ ^{2} can be absorbed into the regularization coefficient λ in the penalized likelihood, we set σ=1 in the objective function for simplicity.
In contrast to the Lasso problem, Eq. (4) has one additional linear constraint. The coordinate descent algorithm has been shown to be very efficient for solving L _{1}penalized problems [21] since the nondifferentiable L _{1} penalty is separable with respect to β _{ j }’s. However, the sumtozero constraint destroys the separability so that the coordinate descent algorithm can no longer guarantee convergence. In [17], the authors proposed adding additional diagonal moves and random local search to the coordinate descent algorithm, which improves the chance for convergence.
To deal with this convex optimization subject to linear constraints, we develop an algorithm using the alternating direction method of multipliers (ADMM) [18, 19] to solve iteratively, see Appendix for details. In each iteration, we derive an analytic updating formula to expedite convergence of ADMM, and convergence is guaranteed by a result in Section 3.1 of [18]. We compare our ADMM algorithm with the algorithm proposed in [17] in the “Results” section.
Results
Comparison of ADMM and CD+LS algorithms
This section compares our ADMM algorithm with a coordinate descent with local search (CD+LS) algorithm of [17] for Eq. (4) with respect to computation efficiency through one simulated example. The CD+LS algorithm is implemented in R package zeroSum (version 1.0.4, https://github.com/rehbergT/zeroSum). In this example, we consider correlated predictors, that is, \(\tilde {\boldsymbol {x}}_{i}\)’s are drawn iid from N(0,V) and are independent of ε _{ i }’s that are sampled from N(0,1), and V is a p×p matrix whose ijth element is 0.5^{i−j}. Moreover, the true β _{ j }’s are drawn iid from N(0,1) and then centered to have a sum zero, and λ is fixed as 1. Then, their rates for successful convergence and running times are recorded for the ADMM and the CD+LS algorithms over 20 simulations. Particularly, a precision or tolerance error of 10^{−10} is used for both algorithms. Successful convergence is reported if a solution from an algorithm satisfies the sumtozero constraint with a tolerance error of 10^{−8}, and both the solution and its objective value are no further than 10^{−8} to the optimal solution and its corresponding objective value in terms of the L _{2}distance. Here, the optimal solution is defined as the one, among the two solutions from the two algorithms, having the minimal objective value and satisfying the sumtozero constraint.
Four different settings are compared, ranging from low to highdimensional situations. As showed in Table 1, the proposed ADMM algorithm outperforms the CD+LS algorithm with respect to both convergence guarantee and running time.
Comparison of the proposed method and the Lasso
This section examines effectiveness of the proposed method through simulated examples. Specifically, the proposed method is compared with the Lasso in terms of predictive accuracy and identification of the true model, where the Lasso is implemented in R package glmnet [21].
Simulated examples are generated with correlation structures as to be analyzed. These simulations are designed to examine various operating characteristics of the proposed method with respect to (p,n), noise level σ ^{2}, and correlation structures among predictors in Eqs. (1) and (3). For tuning, λ is searched over 100 grid points that are uniformly spaced (in the logscale) between 10^{4} and 10^{−2}. An independent testing dataset with 1000 randomly generated data points are used to find the optimal λ which minimizes the mean squared error (MSE).
For performance metrics, an independent validation dataset with 1000 randomly generated data points are used to evaluate the performance of the fitted model in terms of mean squared error (MSE) and R ^{2}. To assess robustness of the approaches under data normalization, we randomly add samplewise shifts from N(0,1) to the validation dataset. Furthermore, we consider other two metrics for parameter estimation and the quality of identification of zero elements of true α in Eq. (1) and β in Eq. (3). For parameter estimation, we use the relative error (RE) for estimating the true regression coefficients \(\boldsymbol {\gamma }^{0}=\left (\gamma ^{0}_{1},\ldots,\gamma ^{0}_{p}\right)^{T}\), defined as
where \(\tilde {\boldsymbol {\gamma }}=(\tilde {\gamma }_{1},\ldots,\tilde {\gamma }_{p})^{T}\) are estimated regression coefficients. This metric allows for accounting for different scales between α and β. For accuracy of identification, we use the false identification rate (FR), defined as
where \(\Gamma ^{0}=\{j  \gamma ^{0}_{j}\neq 0\}\) and \(\tilde \Gamma =\{j  \tilde \gamma ^{\prime }_{j}\neq 0\}\), with \(\tilde \gamma ^{\prime }\) being a truncated version of \(\tilde \gamma \) such that only the coefficients with the Γ ^{0} largest absolute values are retained, and all others are zeroed out. Our simulated example concerns correlation structures among predictors. In Eqs. (1) and (3), logx _{ i }’s are iid from N(0,V) and are independent of ε _{ i }’s that are iid from N(0,σ ^{2}), and V is a p×p matrix whose ijth element is 0.5^{i−j}, α=(α _{12},α _{13},…,α _{ p−1p })^{T}. Three settings for α are considered:

1)
α _{12}=1, α _{13}=0.5, α _{24}=0.5, and α _{ jk }=0 otherwise, which does not satisfy the conditions defined in Proposition 2 with β _{1}=1.5, β _{2}=β _{3}=β _{4}=−0.5, and β _{ j }=0 for j≥5.

2)
α _{12}=1, α _{13}=0.5, α _{24}=−0.5, and α _{ jk }=0 otherwise, which satisfies the conditions defined in Proposition 2 but does not satisfy the conditions defined in Proposition 3 with β _{1}=1.5, β _{2}=−1.5, β _{3}=−0.5, β _{4}=0.5, and β _{ j }=0 for j≥5.

3)
α _{12}=1, α _{13}=0.5, α _{14}=0.5, and α _{ jk }=0 otherwise, which satisfies the conditions defined in Proposition 3 with β _{1}=2, β _{2}=−1, β _{3}=β _{4}=−0.5, and β _{ j }=0 for j≥5.
The proposed method is compared with the Lasso in two models, corresponding to the genepair level design matrix z in Eq. (1) and \(\tilde {\boldsymbol {x}}\) in Eq. (3) (without the sumtozero constraint), referred to as Lasso 1 and Lasso 2, respectively. These three methods are examined on simple and difficult situations correspondingly with (p,n,σ)=(25,50,0.5),(100,25,0.2). Then the values of MSE, R ^{2}, RE, and FR are reported. As suggested in Table 2, the proposed method performs better or the same compared with Lasso 1 and Lasso 2 in terms of accuracy and robustness across all the six situations. The improved performance is attributed to a reduced number of candidate parameters in Eq. (1) than Eq. (3), as well as to the sumtozero constraint introduced in Eq. (3). Interestingly, the false identification rates of the proposed method are almost zero in three lownoise setting of (p,n,σ)=(20,50,0.5) regardless if the conditions in Propositions 2 and 3 are met, and are small in the other three settings. In contrast, Lasso 1 has a higher relative error and false identification rate. While Lasso 2 has similar relative error and false identification rate as the proposed method, it has higher MSE and lower R ^{2} in all settings, due to its nonrobustness to samplewise scaling without the sumtozero constraint. Overall, all three methods perform better in the lownoise situation of (p,n,σ)=(20,50,0.5) than the highnoise situation of (p,n,σ)=(100,25,0.2). Across the three settings of α, the performance of the proposed method is rather stable. However, Lasso 1 performs much worse for settings in which α fails to satisfy the conditions in Proposition 3, corresponding to nonuniqueness of the representation of α. Most critically, when α satisfies the conditions in Proposition 3, the proposed method continues to outperform its counterpart in terms of the performance metrics. Overall, the proposed method achieves our objective.
An application to a real RNASeq dataset
This section applies the proposed method to a prostate adenocarcinoma (PRAD) RNASeq dataset published as part of The Cancer Genome Atlas (TCGA) project [22]. Particularly, we identify gene pairs that are associated with preoperative prostatespecific antigen (PSA), an important risk factor for prostate cancer. Towards this end, we download normalized gene expression data from the TCGA data portal (https://tcgadata.nci.nih.gov/docs/publications/tcga/). As described by TCGA, tissue samples from 333 PRAD patients were sequenced using the Illumina sequencing instruments. While raw sequencing reads were processed and analyzed using the SeqWare Pipeline 0.7.0 and MapspliceRSEM workflow 0.7 developed by the University of North Carolina, read alignment was performed using MapSplice [23] to the human reference genome, and gene expression levels were estimated using RSEM [24] with gene annotation GAF 2.1, and further normalized so that the upper quartile count is 1,000 in each sample. All these steps were performed by the TCGA consortium.
In our experiment, the normalized RSEM gene expression estimates are used, excluding samples with missing preoperative PSA values and genes for which the average normalized expression level is lower than 10. This prepossessing step yields p=15,382 genes and n=187 samples. Furthermore, we run Pearson correlation tests for each gene between logtransformed expression levels and logtransformed preoperative PSA levels, and exclude genes with false discovery rate (FDR) values (calculated using the BenjaminiHochberg [25] method based on the pvalues from the Pearson correlation tests) larger than 0.01. Consequently, only 520 genes are retained in the analysis, on which we fit model Eq. (3) using the proposed ADMM algorithm.
To visualize the selection result, we display the solution paths of the model fitting. As shown in Fig. 1, the first pair of genes entering the model are PTPRR and KRT15. While PTPRR is a member of the protein tyrosine phosphatase (PTP) family, which is known to be related with prostate cancer [26, 27], KRT15 is a member of the keratin gene family, which is known to be associated with breast cancer [28] and lung cancer [29]. Interestingly, we find no publication record on PubMed with keywords such as “KRT15 AND PSA” or “KRT15 AND prostate”. By correlating log expression levels and log PSA levels in the 187 patients, we find that both PTPRR and KRT15 are significantly correlated with PSA levels (r=0.28 and p<10^{−4} for PTPRR, r=−0.33 and p<10^{−5} for KRT15). Not surprisingly, their logratio is even more strongly correlated with log PSA levels (r=0.41 and p<10^{−8}), demonstrating the potential of using gene pairs as biomarkers.
The other selected genes are HIST1H1E, LRAT, LCN2, KCNN4, RHOU, and EPHA5 in the order of them entering the model, among which LRAT [30], LCN2 [31], RHOU [32], and EPHA5 [33] are known to link to prostate cancer, and HIST1H1E and KCNN4 are connected to myeloma [34] and pancreatic cancer [35], respectively.
To demonstrate the scalability of our proposed method, we employ the proposed ADMM algorithm to fit Eq. (3) with all the p=15,382 genes without prescreening. In this situation, the first pair of genes entering the model are BCL8 and KRT15, where BCL8 is known to be associated with lymphoma [36]. The other selected genes are PTPRR, LRAT and LCN2, which are very similar to the foregoing results based on prescreening. By comparison, fitting the corresponding model Eq. (1) using a standard Lasso algorithm, such as glmnet [21], would be practically prohibitive, which requires storing a design matrix of \(\frac {p(p1)}{2}\times n \approx 22\times 10^{9}\) elements. To further demonstrate robustness of the proposed method with respect to data normalization, we randomly scale the gene expression levels, both along the gene dimension and the sample dimension, mimicking the gene length normalization and the library size or sequencing depth normalization, respectively. Numerically, we find that the solution path fitted using the randomly scaled data is always identical to that fitted using the original data.
Discussion
Model Eq. (3) has been proposed for compositional data [16] and recently for reference point insensitive data [17]. In this article, we explore Eq. (3) for the identification of gene pairs as biomarkers, enjoying robustness to samplewise scaling normalization (which is a common practice for RNASeq data) and simplicity of validation and measurement by qPCR techniques. Through Propositions 1–4, we establish the relationship between models Eq. (1) and Eq. (3). Additionally, we develop an efficient ADMM algorithm for solving model Eq. (3), which is guaranteed to converge and is shown to be highly competitive in terms of computational efficiency.
One interesting yet important issue of model Eq. (3) is determination of the value of α. One proposal is choosing the α to minimize the L _{0}norm instead of L _{1}norm. However, in such a situation, it remains unclear what kind of conditions as those in Propositions 2, 3 and 4 may be. Furthermore, minimization of the L _{0}norm in α continues to be challenging by itself due to nonconvexity and discontinuity of the L _{0}function. Therefore, our approach based on the L _{1}norm gives rise to convex minimization, which is easier to manage.
One important aspect of model Eq. (3) is that it enables to identify gene pairs in an unbiased manner without any prior knowledge of the known biology of the disease. However, in some situations, information regarding the disease of interest is available from previous studies. Then the prior knowledge needs to be integrated for gene pair identification so that more probable subset of genes or pathways should have a higher chance of being selected. This can be accomplished through weighted regularization with weights \(\{\lambda _{k}\}_{k=1}^{p}\), with a large weight corresponding to a small chance of being selected. Moreover, in some other situations, gene pairs are constrained in that gene pairs are formed only between relevant genes from the same pathway. This can be achieved by replacing the Lasso penalty by either a (sparse) group Lasso penalty [37] and/or the simple equality constraint by a set of constraints, each corresponding to a given pathway of interest. Finally, some nonconvex penalties such as the SCAD penalty [14] and the TLP penalty [15] can be used as opposed to the Lasso penalty to achieve a higher accuracy of selection at an expense of computation.
For a largescale problem, an ADMM algorithm may have a linear convergence rate. To expedite convergence, an inexact ADMM algorithm may be useful in our setting, which has been shown recently to lead to a substantial improvement over the standard ADMM algorithm [19]. Furthermore, parallelization of our ADMM algorithm may achieve further scalability, which is one advantage of ADMM algorithms over many other optimization techniques [18].
One extension of Eq. (1) is generalized linear models such as logistic regression or other predictive models like support vector machine. In such a situation, the proposed method for Eq. (1) is directly applicable to gene pair identification with some modifications. Further investigation is necessary.
Conclusion
In conclusion, the experimental results demonstrate that gene pairs can be used as robust biomarkers which can tolerate samplewise scaling normalization. Furthermore, using L _{1} penalized regression with equality constraints, the model fitting can be formulated as a convex optimization problem which can be solved efficiently using the proposed ADMM algorithm. This approach has the potential to discover novel and reliable biomarkers for biological or clinical studies.
Appendix
Proofs of propositions
Proof of Proposition 1: From Eq. (1), we have
Furthermore
This completes the proof.
Proof of Proposition 2: We show (A) ⇒(B), (B) ⇒(C) and (C) ⇒(A), respectively.
(A) ⇒(B): We prove by contradiction. Without loss of generality, assume that α _{12}≥α _{23}>0. Then, we construct α ^{′} such that \(\alpha ^{\prime }_{12}=\alpha _{12}\alpha _{23}\), \(\alpha ^{\prime }_{23}=0\), \(\alpha ^{\prime }_{13}=\alpha _{13}+\alpha _{23}>0\) and \(\alpha ^{\prime }_{ij}=\alpha _{ij}\) otherwise. Easily, α ^{′} satisfies Eq. (2) and α ^{′}_{1}−α_{1}=−2α _{13}<0, which contradicts (A).
(B) ⇒(C): By (B), α _{ ij } and α _{ jk } always have opposite signs. This, together with the definition that α _{ ji }=−α _{ ij }, α _{ ji } and α _{ jk } always have the same sign, where 0 can be regarded as an arbitrary sign. Therefore, from Eq. (2), we have
(C) ⇒(A): For any α ^{′} satisfying Eq. (2), we have
This completes the proof.
Proof of Proposition 3: We show (D) ⇒(E), (E) ⇒(F) and (F) ⇒(D), respectively.
(D) ⇒(E): As in the proof of Proposition 2, we assume, without loss of generality, that α _{24}≥α _{13}>0. Then, we construct α ^{′} such that \(\alpha ^{\prime }_{13}=0\), \(\alpha ^{\prime }_{14}=\alpha _{14}+\alpha _{13}\), \(\alpha ^{\prime }_{23}=\alpha _{23}+\alpha _{13}\), \(\alpha ^{\prime }_{24}=\alpha _{24}\alpha _{13}\) and \(\alpha ^{\prime }_{ij}=\alpha _{ij}\) otherwise. Easily, α ^{′} also satisfies Eq. (2), and α ^{′}_{1}≤α_{1}, which contradicts (D).
(E) ⇒(F): From (E), there exists a j such that α _{ ik }=0 for all i≠j and k≠j. From Eq. (2) and (B) in Proposition 2, \(\beta _{i}=\sum _{k\neq i}\alpha _{ik}=\alpha _{ij}, \text {for all}i\neq j\), and \( \beta _{j}=\left \sum _{k\neq j}\alpha _{jk}\right =\sum _{k\neq j}\alpha _{jk}=\sum _{k\neq j}\beta _{k}. \)
(F) ⇒(D): Suppose that α ^{′} satisfies Eq. (2) and the conditions in Proposition 2. From (B), we know for all k≠j, \(\alpha ^{\prime }_{jk}\) have the same sign. From Eq. (2), (C) and (F), we have
Therefore, we must have \(\sum _{k\neq j}\alpha ^{\prime }_{jk}=\boldsymbol {\alpha }^{\prime }_{1}\). That is, \(\alpha ^{\prime }_{ik}=0\) for all i≠j, k≠j. Furthermore, for any i≠j, we have \(\beta _{i}=\sum _{k\neq i}\alpha ^{\prime }_{ik}=\alpha ^{\prime }_{ij}.\) Therefore, α ^{′}=α, implying the uniqueness of α. This completes the proof.
Proof of Proposition 4:
(G): By Eq. (2), β _{ j }≠0 only if at least for some k≠j, α _{ jk }≠0. Based on α and β, we can construct an undirected graph G=(V,E) with p vertices such that there is an edge between vertices i and j if and only if α _{ ij }≠0. We know that β _{ j } can not be nonzero unless vertex V _{ j } has a degree of at least 1. Since the total number of edges is A, we know that \(B\leq \sum _{j=1}^{p}I(degree(V_{j})>0)\leq \sum _{j=1}^{p} degree(V_{j})=2A\).
(H): Suppose α satisfies the conditions defined in Proposition 2. If α _{ ij }≠0, let α _{ ij } be the weight associated with edge connecting V _{ i } and V _{ j }. By condition (B), for any cycle in the graph formed by sequence i _{1},i _{2},…,i _{ k },i _{ k+1}=i _{1}, we know that weights associated with adjacent edges (i.e., \(\alpha _{i_{j1}i_{j}}\) and \(\alpha _{i_{j}i_{j+1}}\)) always have opposite signs. Therefore, the number of edges in the cycle has to be an even number, which means that the graph has to be a bipartite graph. It is then easy to see that \(A\leq \left (\frac {B}{2}\right)^{2}\) for such graphs. That is, \(2\sqrt {A}\leq B\).
(I): Suppose α≠0 satisfies the conditions defined in Proposition 3. By condition (F), B=A+1. This completes the proof.
Proof of Proposition 5: It suffices to show that \(\hat {\alpha }\) satisfies (C) of Proposition 2. Note that \(\tilde {\boldsymbol {\beta }}\) in Algorithm 1 satisfies the sumtozero constraint at each step of iteration before termination. In the beginning, \(\\hat {\boldsymbol {\alpha }}\_{1}=0\) and \(\\tilde {\boldsymbol {\beta }}\_{1}=\\hat {\boldsymbol {\beta }}\_{1}\). After each iteration, \(\\hat {\boldsymbol {\alpha }}\_{1}\) is increased by \(\hat \alpha _{k_{1} k_{2}}=\min (\tilde {\beta }_{k_{1}}, \tilde {\beta }_{k_{2}})\), and \(\\tilde {\boldsymbol {\beta }}\_{1}\) is decreased by \(2\hat \alpha _{k_{1} k_{2}}\). In the end, \(\\tilde {\boldsymbol {\beta }}\_{1}=0\), and therefore \(\\hat {\boldsymbol {\alpha }}\_{1} =\frac {1}{2} \\hat {\boldsymbol {\beta }}\_{1}\). This completes the proof.
ADMM algorithm for solving Eq. (4)
We adopt the notation in [18] and reformulate Eq. (4) as
where x are the parameters of interest. If C=1^{T} and d=0, we will have all coefficients sum up to 0. When there is an intercept in the model, we can add a scalar 0 as the first element in C, meaning that we do not have constraint on the intercept. Similarly, as a convention we also do not penalize the intercept. Denoting \(\boldsymbol {B}=\left [\begin {array}{ll}\boldsymbol {C}\\\boldsymbol {I} \end {array}\right ]\), \(\boldsymbol {D}=\left [\begin {array}{cc}\boldsymbol {0}\\\boldsymbol {I}\end {array}\right ]\), and \(\boldsymbol {d}=\left [\begin {array}{ll}d\\\boldsymbol {0}\end {array}\right ]\), the two equality constraints can be simplified as B x+D z=d. To use the ADMM algorithm [18], we form the augmented Lagrangian
with \(\boldsymbol {u}^{k}=(1/\rho)\boldsymbol {y}^{k}=\left [\begin {array}{ll}u_{1}\\ \boldsymbol {u_{2}}^{p}\end {array}\right ]\) where u _{1} is a scalar and \(\boldsymbol {u_{2}} \in \mathbb {R}^{p\times 1}\).
Let \(\boldsymbol {E}=\left [\begin {array}{cc}\boldsymbol {A}\\ \sqrt {\rho } \boldsymbol {C}\end {array}\right ]\), the ADMM algorithm consists of the following iterations
The x update can be accelerated by caching an initial factorization. Suppose the dimension of E is m×p. If m<p, we cache the factorization of I+ρ E E ^{T} (with dimension m×m) and use the matrix inversion lemma
to update x. Otherwise, we cache the factorization of ρ I+E ^{T} E (with dimension p×p) and use back and forward solve to update x directly. The iteration stops when the primal and dual residuals are smaller than their corresponding tolerances,
where
Usually, the relative stopping criteria is chosen to be ε ^{rel}=10^{−4}, and the choice of absolute stopping criteria ε ^{abs} depends on the scale of the variable values. See Boyd et al. [18] for details. To compute a solution path for a decreasing sequence of λ values, we adopt the approach in Friedman et al. [21] and use warm starts for each λ value. The sequence of λ values are either provided by the user, or we begin with λ _{ max }=∥A ^{T} b∥_{ ∞ } for which all the coefficients are equal to 0. We set λ _{ min }=ε _{ λ } λ _{ max }, where ε _{ λ } is a small value, such as 0.01, and generate a decreasing sequence of 100 λ values from λ _{ max } to λ _{ min } on the logscale.
Abbreviations
 ADMM:

Alternating direction method of multipliers
 CD+LS:

Coordinate decent with local search
 DNA:

Deoxyribonucleic acid
 FDR:

False discovery rate
 FR:

False identification rate
 iid:

Independent identically distributed
 Lasso:

Least absolute shrinkage and selection operator
 mRNA:

Messenger RNA
 MSE:

Mean squared error
 PRAD:

prostate adenocarcinoma
 PSA:

Prostatespecific antigen
 PTP:

Protein tyrosine phosphatase
 qPCR:

Quantitative polymerase chain reaction
 RE:

Relative error
 RNA:

Ribonucleic acid
 RNASeq:

RNA sequencing
 RSEM:

RNASeq by expectationmaximization
 SCAD:

Smoothly clipped absolute deviation
 TCGA:

The cancer genome atlas
 TLP:

Truncated L _{1} penalty
 TSP:

Topscoring pair
References
 1
Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary dna microarray. Science. 1995; 270(5235):467.
 2
Quackenbush J. Microarray analysis and tumor classification. N Engl J Med. 2006; 354(23):2463–72.
 3
Dillies MA, Rau A, Aubert J, HennequetAntier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al. A comprehensive evaluation of normalization methods for illumina highthroughput rna sequencing data analysis. Brief Bioinform. 2013; 14(6):671–83.
 4
Patil P, BachantWinner PO, HaibeKains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics. 2015; 31(14):2318–23.
 5
Geman D, d’Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mrna comparisons. Stat Appl Genet Mol Biol. 2004; 3(1):1–19.
 6
Leek JT. The tspair package for finding top scoring pair classifiers in r. Bioinformatics. 2009; 25(9):1203–04.
 7
Livak KJ, Schmittgen TD. Analysis of relative gene expression data using realtime quantitative pcr and the 2 δ δct method. methods. 2001; 25(4):402–8.
 8
Arukwe A. Toxicological housekeeping genes: do they really keep the house?. Environ Sci Technol. 2006; 40(24):7944–9.
 9
Ma XJ, Wang Z, Ryan PD, Isakoff SJ, Barmettler A, Fuller A, Muir B, Mohapatra G, Salunga R, Tuggle JT, et al. A twogene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer cell. 2004; 5(6):607–16.
 10
Raponi M, Lancet JE, Fan H, Dossey L, Lee G, Gojo I, Feldman EJ, Gotlib J, Morris LE, Greenberg PL, et al. A 2gene classifier for predicting response to the farnesyltransferase inhibitor tipifarnib in acute myeloid leukemia. Blood. 2008; 111(5):2589–96.
 11
Price ND, Trent J, ElNaggar AK, Cogdell D, Taylor E, Hunt KK, Pollock RE, Hood L, Shmulevich I, Zhang W. Highly accurate twogene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc Natl Acad Sci. 2007; 104(9):3414–19.
 12
Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005; 21(20):3896–904.
 13
Tibshirani R. Regression shrinkage and selection via the lasso. R Stat Soc Ser B Methodol J. 1996; 58(1):267–88.
 14
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001; 96(456):1348–60.
 15
Shen X, Pan W, Zhu Y. Likelihoodbased selection and sharp parameter estimation. J Am Stat Assoc. 2012; 107(497):223–32.
 16
Lin W, Shi P, Feng R, Li H, et al. Variable selection in regression with compositional covariates. Biometrika. 2014; 101(4):785–97.
 17
Altenbuchinger M, Rehberg T, Zacharias H, Stämmler F, Dettmer K, Weber D, Hiergeist A, Gessner A, Holler E, Oefner PJ, et al. Reference point insensitive molecular data analysis. Bioinformatics. 2016; 598:1–122.
 18
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends®; Mach Learn. 2011; 3(1):1–122.
 19
Wang H, Banerjee A. Bregman alternating direction method of multipliers. In: Advances in Neural Information Processing Systems. Curan Associates, Inc.: 2014. p. 2816–24.
 20
Smyth G. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004; 3(1):1–25.
 21
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33(1):1.
 22
The Cancer Genome Atlas Research Network: The molecular taxonomy of primary prostate cancer. Cell. 2015; 163(4):1011–25.
 23
Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, et al. Mapsplice: accurate mapping of rnaseq reads for splice junction discovery. Nucleic Acids Res. 2010; 38(18):178–8.
 24
Li B, Dewey CN. Rsem: accurate transcript quantification from rnaseq data with or without a reference genome. BMC bioinformatics. 2011; 12(1):1.
 25
Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995; 57(1):289–300.
 26
Rand KA, Rohland N, Tandon A, Stram A, Sheng X, Do R, Pasaniuc B, Allen A, Quinque D, Mallick S, et al. Wholeexome sequencing of over 4100 men of african ancestry and prostate cancer risk. Hum Mol Genet. 2016; 25(2):371–81.
 27
Munkley J, Lafferty NP, Kalna G, Robson CN, Leung HY, Rajan P, Elliott DJ. Androgenregulation of the protein tyrosine phosphatase ptprr activates erk1/2 signalling in prostate cancer cells. BMC cancer. 2015; 15(1):9.
 28
Gelfand R, Vernet D, Bruhn KW, Sarkissyan S, Heber D, Vadgama JV, GonzalezCadavid NF. Longterm exposure of mcf7 breast cancer cells to ethanol stimulates oncogenic features. Int J Oncol. 2017; 50(1):49–65.
 29
Boyero L, SánchezPalencia A, MirandaLeón MT, HernándezEscobar F, GómezCapilla JA, FárezVidal ME. Survival, classifications, and desmosomal plaque genes in nonsmall cell lung cancer. Int J Med Sci. 2013; 10(9):1166.
 30
Guo X, Knudsen BS, Peehl DM, Ruiz A, Bok D, Rando RR, Rhim JS, Nanus DM, Gudas LJ. Retinol metabolism and lecithin: retinol acyltransferase levels are reduced in cultured human prostate cancer cells and tissue specimens. Cancer Res. 2002; 62(6):1654–61.
 31
Sunil VR, Patel KJ, NilsenHamilton M, Heck DE, Laskin JD, Laskin DL. Acute endotoxemia is associated with upregulation of lipocalin 24p3/lcn2 in lung and liver. Exp Mol Pathol. 2007; 83(2):177–87.
 32
Alinezhad S, Väänänen RM, Tallgrén T, Perez IM, Jambor I, Aronen H, Kähkönen E, Ettala O, Syvänen K, Nees M, et al. Stratification of aggressive prostate cancer from indolent diseaseprospective controlled trial utilizing expression of 11 genes in apparently benign tissue. Urologic Oncol. 2016; 34(6):255–15. Elsevier.
 33
Li S, Zhu Y, Ma C, Qiu Z, Zhang X, Kang Z, Wu Z, Wang H, Xu X, Zhang H, et al. Downregulation of epha5 by promoter methylation in human prostate cancer. BMC cancer; 15(1):18.
 34
Walker BA, Boyle EM, Wardell CP, Murison A, Begum DB, Dahir NM, Proszek PZ, Johnson DC, Kaiser MF, Melchor L, et al. Mutational spectrum, copy number changes, and outcome: results of a sequencing study of patients with newly diagnosed myeloma. J Clin Oncol. 2015; 33(33):3911–20.
 35
Bonito B, Sauter DP, Schwab A, Djamgoz MA, Novak I. Kca3. 1 (ik) modulates pancreatic cancer cell migration, invasion and proliferation: anomalous effects on tram34. Pflügers ArchivEuropean J Physiol. 2016; 468(1112):1865–75.
 36
Dyomin VG, Rao PH, DallaFavera R, Chaganti R. Bcl8, a novel gene involved in translocations affecting band 15q11–13 in diffuse largecell lymphoma. Proc. Natl Acad Sci. 1997; 94(11):5728–32.
 37
Simon N, Friedman J, Hastie T, Tibshirani R. A sparsegroup lasso. J Comput Graphical Stat. 2013; 22(2):231–45.
Acknowledgements
The authors thank the editors, the associate editor and anonymous referees for helpful comments and suggestions.
Funding
HJ’s research was supported in part by a startup grant from the University of Michigan and the National Cancer Institute grants 4P30CA046592 and 5P50CA186786. LL’s research was supported in part by the Summer Internship Funds of Certificate in Public Health Genetics (CPHG) Program at the University of Michigan. The funding body(s) played no role in the design or conclusions of the study.
Availability of data and materials
The datasets and R programs for reproducing the results in this paper are available at http://wwwpersonal.umich.edu/~jianghui/genepair/.
Author information
Affiliations
Contributions
RS conducted the experiments and involved in the methodology development. LL developed and implemented the algorithm. HJ conceived the study, developed the algorithm and conducted the experiments. All authors drafted, read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Shen, R., Luo, L. & Jiang, H. Identification of gene pairs through penalized regression subject to constraints. BMC Bioinformatics 18, 466 (2017). https://doi.org/10.1186/s1285901718729
Received:
Accepted:
Published:
Keywords
 Gene pair
 Biomarker
 Penalized regression
 ADMM