Kernel-based methods for accelerated failure time model with ultrahigh dimensional data
 Zhenqiu Liu^{1},
 Dechang Chen^{2},
 Ming Tan^{1},
 Feng Jiang^{3} and
 Ronald B Gartenhaus^{1}
https://doi.org/10.1186/1471-2105-11-606
© Liu et al; licensee BioMed Central Ltd. 2010
Received: 17 August 2010
Accepted: 21 December 2010
Published: 21 December 2010
Abstract
Background
Most genomic data have ultrahigh dimensions with more than 10,000 genes (probes). Regularization methods with L_{1} and L_{ p } penalties have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n ≪ m (the number of genes), directly identifying a small subset of genes from ultrahigh (m > 10,000) dimensional data is time-consuming and not computationally efficient. In current microarray analysis, the common practice is to select a couple of thousand (or hundred) genes using univariate analysis or statistical tests, and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may introduce bias and inaccuracy and lead us to miss biologically important genes.
Results
The accelerated failure time (AFT) model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel-based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method is based on the kernel matrix and the dual problem, which involves a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes) is much larger than the number of samples. Moreover, the primal variables are explicitly updated and the sparsity in the solution is exploited.
Conclusions
Our proposed methods can simultaneously identify survival-associated prognostic factors and predict survival outcomes with ultrahigh dimensional genomic data. We have demonstrated the performance of our methods with both simulation and real data. In our limited computational studies, the proposed method performed very well.
Background
Survival prediction and prognostic factor identification play a very important role in medical research. Survival data normally include a censoring variable that indicates whether the outcome under observation (such as death or recurrence of a disease) has occurred within some specific follow-up time. Modeling procedures must take such censoring into account, which makes it more difficult to develop proper statistical learning methods for survival prediction.
Several models for survival prediction have been proposed in the statistical literature. The most popular one is the Cox proportional hazards model [1–3], in which model parameters are estimated by maximizing the partial log likelihood. Another one is the accelerated failure time (AFT) model [4–6]. The AFT model is a linear regression model in which the response variable is the logarithm or a known monotone transformation of the failure (death) time. There are mainly two approaches in the literature for fitting an AFT model. One is the Buckley-James estimator, which adjusts censored observations using the Kaplan-Meier estimator [7, 8]; the other is semiparametric estimation of the AFT model with an unspecified error distribution [9–11]. However, the semiparametric Bayesian approach based on complex MCMC procedures is computationally intensive and tends to produce inaccurate results, and Stute's weighted least squares (LS) estimator only implicitly accounts for the censored time. The model has not been widely used in practice due to the difficulties in computing the model parameters [12], and there is no nonlinear AFT model in the literature.
Kernel-based methods such as support vector machines (SVM) have been extensively studied in the framework of classification and regression [13] in pattern recognition and statistical learning. The concept of a kernel, formulated as an inner product in the feature space, allows us to build nonlinear extensions of many linear models [14]. They would be a potential alternative were it not for the complexity of censoring. Moreover, the LASSO-type penalty and its generalized versions have been proposed for gene (variable) selection with high-dimensional genomic profiles and censored survival outcomes [15–18]. However, since the sample size n ≪ m (the number of variables), methods based on the primal formulation with a huge m (m > 40,000) are not efficient. Consequently, in current microarray analysis, the common practice is to select a couple of thousand (or hundred) genes using filter-based methods (such as the t-test) and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may miss biologically important genes and introduce bias. The dual solution with kernels proposed in this article attempts to resolve these inadequacies by solving a much smaller n × n problem.
In this paper, we propose a nonlinear kernel ridge regression for censored survival outcome prediction under the framework of the AFT model. We also develop an efficient dual solution with adaptive kernel ridge regression for ultrahigh dimensional genomic data analysis.
Unlike the weighted least squares method, our model explicitly accounts for censoring. The proposed models are evaluated with both simulation and real data, using the prediction error on test data.
Results and Discussion
Simulation Data
 1.
Sample 12-dimensional input data x, with 100 training and 100 test samples respectively, from a multivariate normal distribution with mean zero and variance-covariance matrix Σ. The pairwise correlation between the i-th and the j-th input variables in Σ is r^{|i-j|}, and different correlations (r = 0.2, 0.4, 0.6, and 0.8) are chosen to assess the performance of the proposed method.
 2.
Choose the model parameters w = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]^{ T }, and generate the event time from log T = w^{ t } x^{ k } + ε, where ε ~ N(0, σ^{2}) and σ is determined by the signal-to-noise ratio (SNR = μ_{ surv }/σ). For instance, with a mean log survival time of 3 and SNR = 3 : 1, we have σ = 1. SNR = 3 : 1 is used in all of our simulations. Finally, k indicates the k-th power of the input variables, so the log survival time is associated with the input variables nonlinearly.
 3.
The censoring variables are generated as uniformly distributed and independent of the events. Letting d_{ i } = (rand + C)T_{ i }, the censoring status is δ_{ i } = I(T_{ i } < d_{ i }). Different values of C give different proportions of censored data. Roughly 40%-60% of the data are censored in our simulations.
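As an illustration, the three steps above can be sketched in code. This is a minimal sketch: function and variable names are ours, and the AR(1)-style correlation structure r^{|i-j|} for Σ is an assumption.

```python
import numpy as np

def simulate_aft_data(n=100, m=12, r=0.6, k=2, sigma=1.0, C=0.5, seed=0):
    """Generate censored AFT data following the simulation design above."""
    rng = np.random.default_rng(seed)
    # Step 1: multivariate normal inputs with pairwise correlation r^|i-j|
    idx = np.arange(m)
    Sigma = r ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    # Step 2: log event time driven by the k-th power of the inputs
    w = np.ones(m)
    logT = X**k @ w + rng.normal(0.0, sigma, size=n)
    T = np.exp(logT)
    # Step 3: uniform censoring times independent of the events
    d = (rng.uniform(size=n) + C) * T
    delta = (T < d).astype(int)         # 1 = event observed, 0 = censored
    logY = np.minimum(logT, np.log(d))  # observed log time
    return X, logY, delta
```

With C = 0.5 roughly half of the observations are censored; decreasing C increases the censoring proportion.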
Frequencies of Correctly Identified Variables with Different Parameters Out of 100 Simulations
Parameters  w_{ i } = 1  w_{ i } = 0.2  w_{ i } = 0.1 

w _{1}  100  100  99 
w _{11}  100  98  99 
w _{21}  97  99  98 
w _{31}  98  99  99 
w _{41}  98  98  94 
w _{51}  88  77  81 
w _{61}  90  86  77 
w _{71}  98  95  99 
w _{81}  98  98  97 
w _{91}  96  96  95 
w _{101}  99  99  98 
w _{111}  99  100  100 
Model performance with Simulation Data and Different Parameter Values
w_{ i } (λ*, p*)  Av. # of Vars  Exact match  Overfitting  Underfitting 

1 (0.01, 0.6)  12.61  75%  21%  4% 
0.2 (0.002, 0.6)  11.52  54%  3%  43% 
0.1 (0.001, 0.6)  11.43  52%  2%  46% 
Computational Time (in Seconds): AKRR vs LASSO
Input Dimensions  AKRR  LASSO 

100  0.4801  0.6378 
1000  0.5844  6.4577 
10000  1.7500  978.23 
50000  7.5255  >7200 
100000  17.4545  — 
Diffuse Large B-cell Lymphoma Data
Genes Associated with Survival Time for DLBCL Data
Count  GenBank  Symbol  Description 

200  X59618  RRM2  ribonucleotide reductase M2 polypeptide 
200  X15187  HSP90B1  tumor rejection antigen (gp96) 1 
200  M60315  BMP6  bone morphogenetic protein 6 
176  U04343  CD86  CD86 antigen (CD28 antigen ligand 2, B7-2 antigen) 
181  X07203  MS4A1  membrane-spanning 4-domains, subfamily A, member 1 
198  S75217  CD79A  CD79A antigen (immunoglobulinassociated alpha) 
200  M28170  CD19  CD19 antigen 
138  U45878  BIRC3  baculoviral IAP repeat-containing 3 
146  U10485  LRMP  lymphoid-restricted membrane protein 
176  U07620  MAPK10  mitogen-activated protein kinase 10 
179  LC_30727  
153  M63438  HLAC  immunoglobulin kappa constant 
164  U46767  CCL13  small inducible cytokine subfamily A (Cys-Cys), member 13 
142  X14723  CLU  clusterin 
200  M27492  IL1R1  interleukin 1 receptor, type I 
183  J05070  MMP9  matrix metalloproteinase 9 
200  X61118  LMO2  LIM domain only 2 (rhombotin-like 1) 
200  M81750  MNDA  myeloid cell nuclear differentiation antigen 
115  X57809  IGL@  heat shock 70 kD protein 1A 
162  J03746  MGST1  microsomal glutathione Stransferase 1 
200  D38535  ITIH4  inter-alpha (globulin) inhibitor H4 
200  M21574  PDGFRA  platelet-derived growth factor receptor, alpha polypeptide 
187  ESTs  ESTs 
Follicular Lymphoma (FL) Data
Genes Associated with Survival Time for FL Data
Count  Probe ID  Symbol  Description 

200  231760_at  C20orf51  chromosome 20 open reading frame 51 
200  232932_at  
200  235856_at  C4A  complement component 4A (Rodgers blood group) 
187  224280_s_a  LOC56181  family with sequence similarity 54, member B 
200  201425_at  ALDH2  aldehyde dehydrogenase 2 family (mitochondrial) 
180  214694_at  MRIP  Myosin phosphatase Rho-interacting protein 
200  214713_at  YLPM1  YLP motif containing 1 
200  218477_at  TMEM14A  transmembrane protein 14A 
200  220669_at  HSHIN1  HIV-1 induced protein HIN1 
195  203970_s_a  PEX3  peroxisomal biogenesis factor 3 
200  208470_s_a  HPR  haptoglobin-related protein; haptoglobin 
175  210920_x_a  
200  215444_s_a  TRIM31  tripartite motif-containing 31 
Thirteen probes with over 100 relevance counts are identified. These 13 probes correspond to 11 known genes associated with lymphoma and related diseases. For instance, gene C4A localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. It plays a central role in the activation of the classical pathway of the complement system. C4A anaphylatoxin is a mediator of local inflammatory processes: it induces the contraction of smooth muscle, increases vascular permeability, and causes histamine release from mast cells and basophilic leukocytes. C4A is on the pathway of Systemic Lupus Erythematosus (SLE), and patients with SLE have an increased risk of certain cancers, including non-Hodgkin's lymphoma. We find that C4A is negatively associated with survival time according to its estimated coefficient. ALDH2 is another well-studied gene; it is significantly associated with acetaldehyde-induced micronuclei and alcohol-induced facial flushing. Defects in ALDH2 are a cause of acute alcohol sensitivity and alcohol-induced cancers. There is accumulating evidence that ALDH2-deficient individuals are at much higher risk of esophageal cancer and malignant lymphoma. Our study indicates that upregulated ALDH2 is positively associated with patient survival outcomes. Six other genes are also associated with different cancers, including follicular lymphoma.
Conclusions
We proposed kernel-based methods for the nonlinear AFT model and variable selection with ultrahigh dimensional data. Our evaluations with simulation and real data illustrate that the proposed methods can effectively reduce the dimension of the covariates with sound prediction accuracy. In many studies, both clinical and genomic data are available. Due to the ultrahigh dimension of genomic data, directly applying LASSO-based methods to genomic data is usually not feasible. Our proposed method provides an efficient solution to this problem. Kernel-based nonparametric methods have been well studied in statistical learning, but there are few such studies in survival analysis. In this paper, we provide a basis for further explorations in this field.
Methods
where M(x_{ i }) > log T_{ i } if δ_{ i } = 0 and M(x_{ i }) = log T_{ i } if δ_{ i } = 1. Because there are both equality and inequality constraints in the model, new methods need to be developed.
Kernel Ridge Regression (KRR)
where K = (K(x_{ i }, x_{ j }))_{ n × n } = (ϕ(x_{ i })^{ t }ϕ(x_{ j }))_{ n × n } is the kernel matrix, which can be defined explicitly, and K(x_{ i }, .) = ϕ(x_{ i })^{ t }Φ^{ t } is the i-th row of the kernel matrix. Popular kernels include:

Linear kernel:$K\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)={\mathbf{x}}_{i}^{t}{\mathbf{x}}_{j},$

Radial basis function (Gaussian) kernel:$K\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)=\mathrm{exp}\left(-\frac{{\Vert {\mathbf{x}}_{i}-{\mathbf{x}}_{j}\Vert}^{2}}{{\sigma}^{2}}\right),$

Polynomial kernel:$K\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)={\left({\mathbf{x}}_{i}^{t}{\mathbf{x}}_{j}+{p}_{2}\right)}^{{p}_{1}},$

Sigmoid kernel:$K\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)=\mathrm{tanh}\left(\beta {\mathbf{x}}_{i}^{t}{\mathbf{x}}_{j}\right).$
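As a sketch, these four kernels can be computed directly from data matrices. The vectorized expansion of the squared distance in the RBF kernel, and the parameter names, are our own choices.

```python
import numpy as np

def linear_kernel(X, Y):
    return X @ Y.T

def rbf_kernel(X, Y, sigma=1.0):
    # ||x_i - y_j||^2 expanded through dot products for vectorization
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / sigma**2)

def polynomial_kernel(X, Y, p1=2, p2=1.0):
    return (X @ Y.T + p2) ** p1

def sigmoid_kernel(X, Y, beta=1.0):
    return np.tanh(beta * (X @ Y.T))
```

With X of shape n × m, `rbf_kernel(X, X)` returns the n × n kernel matrix K, symmetric with unit diagonal.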
Our kernel ridge regression algorithm, based on the dual equation (6), is as follows:
The Dual Kernel Ridge Regression (DKRR) Algorithm
Given λ, training data ${\left\{{\mathbf{x}}_{i},\mathrm{log}{T}_{i},{\delta}_{i}\right\}}_{i=1}^{n}$, test data ${\left\{{\mathbf{x}}_{k},\mathrm{log}{T}_{k},{\delta}_{k}\right\}}_{k=1}^{{n}_{k}}$, and a small ε.
Calculate the kernel matrices $K={\left[K\left({\mathbf{x}}_{i},{\mathbf{x}}_{j}\right)\right]}_{n\times n}$ and ${K}_{te}={\left[K\left({\mathbf{x}}_{k},{\mathbf{x}}_{i}\right)\right]}_{{n}_{k}\times n}$.
Center the kernels and the survival times: $K=\left({I}_{n}-\frac{1}{n}{1}_{n}{1}_{n}^{t}\right)K\left({I}_{n}-\frac{1}{n}{1}_{n}{1}_{n}^{t}\right)$ and ${K}_{te}=\left({K}_{te}-\frac{1}{n}{1}_{{n}_{k}}{1}_{n}^{t}K\right)\left({I}_{n}-\frac{1}{n}{1}_{n}{1}_{n}^{t}\right)$, where 1_{ n } is a vector of n ones, I_{ n } is the n × n identity matrix, and $\mathrm{log}{T}_{k}=\mathrm{log}{T}_{k}-\frac{{\sum}_{k=1}^{{n}_{k}}\mathrm{log}{T}_{k}}{{n}_{k}}$. Let a^{0} = [0, ..., 0, 0]^{ t } and j = 0.
WHILE 1,

FOR i = 1 to n,
$I\left({\delta}_{i}\right)=\{\begin{array}{ll}1\hfill & \text{if }{\delta}_{i}\ne 0,\hfill \\ 1\hfill & \text{if }{\delta}_{i}=0\text{ and }K\left({\mathbf{x}}_{i},.\right){\mathbf{a}}^{j}\le \mathrm{log}{T}_{i},\hfill \\ 0\hfill & \text{otherwise}.\hfill \end{array}$
${a}_{i}^{j+1}=-\frac{I\left({\delta}_{i}\right)}{n\lambda}\left\{K\left({\mathbf{x}}_{i},.\right){\mathbf{a}}^{j}-\mathrm{log}{T}_{i}\right\}$
${\mathbf{a}}^{j+1}={\left[{a}_{1}^{j+1},...,{a}_{i}^{j+1},{a}_{i+1}^{j},...,{a}_{n}^{j}\right]}^{t}$
END

j = j + 1

IF ‖a^{ j+1 } − a^{ j }‖ < ε, BREAK.

a^{ j }= a^{j+1}
END
This dual kernel ridge regression (DKRR) algorithm minimizes a quadratic (convex) error function under linear (convex) constraints. Theoretically, the algorithm will always converge, and the global optimal solution is guaranteed regardless of the initial value a^{0}. In our computational experiments, the differences among the estimated parameters with different initial values are very small (less than 0.01 in the infinity norm).
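A direct translation of the DKRR pseudocode might look like the following. This is a sketch under our reading of the update rule: the sign of the update and the form of the constraint check are our interpretation, and convergence of this plain iteration requires nλ to dominate the kernel norm.

```python
import numpy as np

def dkrr_fit(K, logT, delta, lam=1.0, eps=1e-10, max_iter=10000):
    """Dual kernel ridge regression for the censored AFT model (sketch).

    K     : n x n (centered) kernel matrix
    logT  : centered log survival times
    delta : censoring status (1 = event observed, 0 = censored)
    """
    n = len(logT)
    a = np.zeros(n)
    for _ in range(max_iter):
        a_old = a.copy()
        for i in range(n):
            pred_i = K[i] @ a
            # Uncensored points always update; censored points update only
            # when the constraint M(x_i) > log T_i is violated.
            if delta[i] != 0 or pred_i <= logT[i]:
                a[i] = -(pred_i - logT[i]) / (n * lam)
        if np.linalg.norm(a - a_old) < eps:
            break
    return a
```

With no censoring, the fixed point satisfies (K + nλI)a = log T, the familiar dual solution of kernel ridge regression.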
Adaptive Kernel Ridge Regression
The adaptive kernel ridge regression algorithm based on the dual variables a with equations (10) and (12) is as follows:
Adaptive Kernel Ridge Regression (AKRR) Algorithm
Given a λ, p ∈ (0, 1], training data ${\left\{{\mathbf{x}}_{i},\mathrm{log}{T}_{i},{\delta}_{i}\right\}}_{i=1}^{n}$, and small ε and η.
Initializing w = u = rand(m, 1), and a = [0, ..., 0]^{ t }
Setting u(u_{ i } == 0) = 10^{−5} and j = 1.
While ‖w − u‖ > ε

u = w,

${K}_{\mathbf{u}}=X{X}_{\mathbf{u}}^{t}$

FOR i = 1 to n,
$I({\delta}_{i})=\{\begin{array}{ll}1\hfill & \text{if }{\delta}_{i}\ne 0,\hfill \\ 1\hfill & \text{if }{\delta}_{i}=0\text{ and }{K}_{\mathbf{u}}\left({\mathbf{x}}_{i},.\right){\mathbf{a}}^{j}\le \mathrm{log}{T}_{i},\hfill \\ 0\hfill & \text{otherwise}.\hfill \end{array}$
${a}_{i}^{j+1}=-\frac{I\left({\delta}_{i}\right)}{n\lambda}\left\{{K}_{\mathbf{u}}\left({\mathbf{x}}_{i},.\right){\mathbf{a}}^{j}-\mathrm{log}{T}_{i}\right\}$
${\mathbf{a}}^{j+1}={[{a}_{1}^{j+1},...,{a}_{i}^{j+1},{a}_{i+1}^{j},...,{a}_{n}^{j}]}^{t}$
END

j = j + 1

$\mathbf{w}={X}_{\mathbf{u}}^{t}\mathbf{a}$
END
w(|w| < η) = 0
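The AKRR iteration can be sketched as follows. Since equations (10) and (12) are not reproduced here, the reweighting X_u = X diag(|u|^{2−p}) is our assumption; it is the form that makes K_u = X X_u^t and w = X_u^t a consistent with a weighted ridge approximation of the L_p penalty.

```python
import numpy as np

def akrr_fit(X, logT, delta, lam=1.0, p=1.0, eps=1e-6, eta=1e-4,
             outer_iter=50, inner_iter=500):
    """Adaptive kernel ridge regression (sketch).

    Updates the m-dimensional primal w through the n-dimensional dual a.
    The diag(|u|^(2-p)) reweighting is an assumed form of X_u.
    """
    n, m = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(size=m)
    u = np.full(m, np.inf)             # force at least one outer pass
    a = np.zeros(n)
    it = 0
    while np.linalg.norm(w - u) > eps and it < outer_iter:
        it += 1
        u = w.copy()
        u[u == 0] = 1e-5               # avoid a degenerate reweighting
        Xu = X * np.abs(u) ** (2 - p)  # assumed: X_u = X diag(|u|^(2-p))
        Ku = X @ Xu.T
        for _ in range(inner_iter):    # inner dual update, as in DKRR
            a_old = a.copy()
            for i in range(n):
                pred_i = Ku[i] @ a
                if delta[i] != 0 or pred_i <= logT[i]:
                    a[i] = -(pred_i - logT[i]) / (n * lam)
            if np.linalg.norm(a - a_old) < 1e-10:
                break
        w = Xu.T @ a                   # explicit primal update
    w[np.abs(w) < eta] = 0.0           # threshold tiny coefficients
    return w
```

Note that only the n-dimensional dual vector a is iterated in the inner loop; the m-dimensional w is recovered by a single matrix-vector product per outer pass.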
Unlike other LASSO-based methods, which seek to find the optimal w directly, the AKRR algorithm updates the m-dimensional w through a much smaller n-dimensional dual variable a. This method is computationally highly efficient when n ≪ m, which is common in genomic data. Although the proposed method is based on the dual problem, the primal variable w is explicitly updated in the computation. Theoretically, the AKRR algorithm will always converge to the global optimal solution when p = 1, regardless of the initial values of w, u, and a, since the error function is convex under the L_{1} penalty; only a local optimal solution is guaranteed when p < 1. However, in our computational experiments with simulation and real data, different initializations produced different optimal solutions only when p ≤ 0.5, and even then most selected features were the same across runs. AKRR always reached the same optimal solution in all of our experiments when p ≥ 0.6. One possible explanation is that the error function may still be nearly convex, or convex almost everywhere, when p is large. Therefore it may be possible to enjoy both the oracle property with less bias and the global optimal solution with larger p (0.6 ≤ p < 1). A theoretical study of the nearly convex error function, however, is out of the scope of this paper. To prevent the results from being trapped in a local optimal solution when p ≤ 0.5, we run AKRR 30 times and choose the best solution from the run with the smallest test error. Even though AKRR does choose different variables with different values of p, a small subset (≥ 5) of the most important genes is always selected in our experiments. The model performance can be evaluated with cross-validation and the relative root mean squared error $(\text{RRMSE}=\sqrt{\frac{{\displaystyle {\sum}_{i}{\left(({y}_{i}-{\widehat{y}}_{i})/{y}_{i}\right)}^{2}}}{n}})$ of the test data. There are two parameters, p and λ, for the adaptive kernel ridge regression (AKRR) algorithm.
One efficient way is to set p = 0.1, 0.2, ..., 1 in turn, and then search for the best λ through cross-validation. The range of λ can be determined by the path of the optimal solution: λ_{min} = 0, and λ_{max} is set to the smallest value for which all estimated parameters are zero, found by multiple trials. We search for the optimal λ in λ ∈ (0, 1] in this paper. Usually λ_{max} is larger for p = 1 and smaller when p is smaller.
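The search strategy above can be sketched as a grid search over (p, λ) with cross-validated RRMSE. Here `fit_predict` is a hypothetical stand-in for fitting AKRR on a training fold and predicting a test fold; its interface and the grids are our assumptions.

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Relative root mean squared error, as defined in the text."""
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)))

def select_parameters(fit_predict, folds, p_grid=None, lam_grid=None):
    """Grid search over (p, lambda).

    fit_predict(train, test_x, p, lam) -> predictions (assumed interface)
    folds: list of (train, (test_x, test_y)) cross-validation splits
    """
    if p_grid is None:
        p_grid = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1, ..., 1.0
    if lam_grid is None:
        lam_grid = np.linspace(0.001, 1.0, 20)              # lambda in (0, 1]
    best_p, best_lam, best_err = None, None, np.inf
    for p in p_grid:
        for lam in lam_grid:
            # Average test-fold RRMSE for this (p, lambda) pair
            err = np.mean([rrmse(test_y, fit_predict(train, test_x, p, lam))
                           for train, (test_x, test_y) in folds])
            if err < best_err:
                best_p, best_lam, best_err = p, lam, err
    return best_p, best_lam, best_err
```

For each p the inner loop traces the λ path, matching the alternating search described above.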
Declarations
Acknowledgements
We thank the Associate Editor and the two anonymous referees for their constructive comments, which improved this manuscript significantly. This work was partially supported by the 1R03CA133899 grant from the National Cancer Institute and by the National Science Foundation CCF-0729080 grant.
References
 Cox D: Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 1972, 34: 187–220.
 Gui J, Li H: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, Theory and Methods 2001, 96: 456.
 Van Houwelingen H, Bruinsma T, Hart A, Van't Veer L, Wessels L: Cross-validated Cox regression on microarray gene expression data. Stat Med 2006, 25: 3201–3216. 10.1002/sim.2353
 Kalbfleisch J, Prentice R: The Statistical Analysis of Failure Time Data. New York: John Wiley; 1980.
 Wei L: The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in Medicine 1992, 11: 1871–1879. 10.1002/sim.4780111409
 Ying Z: A large sample study of rank estimation for censored regression data. Annals of Statistics 1993, 21: 76–99. 10.1214/aos/1176349016
 Stute W, Wang J: The strong law under random censorship. Annals of Statistics 1993, 14: 1351–1365.
 Stute W: Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics 1996, 23: 461–471.
 Christensen R, Johnson W: Modelling accelerated failure time with a Dirichlet process. Biometrika 1988, 75: 693–704. 10.1093/biomet/75.4.693
 Kuo L, Mallick B: Bayesian semiparametric inference for the accelerated failure time model. Canadian J Stat 1997, 25: 457–472. 10.2307/3315341
 Bedrick E, Christensen R, Johnson W: Bayesian accelerated failure time analysis with application to veterinary epidemiology. Stat Med 2000, 19: 221–237. 10.1002/(SICI)1097-0258(20000130)19:2<221::AID-SIM328>3.0.CO;2-C
 Jin Z, Lin D, Wei L, Ying Z: Rank-based inference for the accelerated failure time model. Biometrika 2003, 90: 341–353. 10.1093/biomet/90.2.341
 Vapnik V: Statistical Learning Theory. New York: Wiley and Sons; 1998.
 Shawe-Taylor J, Cristianini N: Kernel Methods for Pattern Analysis. London: Cambridge University Press; 2004.
 Ma S, Huang J: Additive risk survival model with microarray data. BMC Bioinformatics 2007, 8: 192. 10.1186/1471-2105-8-192
 Sha N, Tadesse M, Vannucci M: Bayesian variable selection for the analysis of microarray data with censored outcomes. Bioinformatics 2006, 22(18): 2262–2268. 10.1093/bioinformatics/btl362
 Liu Z, Gartenhaus R, Chen X, Howell C, Tan M: Survival Prediction and Gene Identification with Penalized Global AUC Maximization. Journal of Computational Biology 2009, 16(12): 1661–1670. 10.1089/cmb.2008.0188
 Liu Z, Jiang F: Gene identification and survival prediction with Lp Cox regression and novel similarity measure. Int J Data Min Bioinform 2009, 3(4): 398–408. 10.1504/IJDMB.2009.029203
 Rosenwald A, Wright G, Chan W, Connors J, Campo E, Fisher R, Gascoyne R, Muller-Hermelink H, Smeland E, Giltnane J, Hurt E, Zhao H, Averett L, Yang L, Wilson W, Jaffe E, Simon R, Klausner R, Powell J, Duffey P, Longo D, Greiner T, Weisenburger DD, Sanger WG, Ott G, Kvaloy S, Delabie J, Holte H, Krajci P, Stokke T, Staudt L: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England Journal of Medicine 2002, 346: 1937–1947. 10.1056/NEJMoa012914
 Dave S, Wright G, Tan B, Rosenwald A, Gascoyne R, Chan W, Fisher R, Braziel R, Rimsza L, Grogan T, Miller T, LeBlanc M, Greiner T, Weisenburger D, Lynch J, Vose J, Armitage J, Smeland E, Kvaloy S, Holte H, Delabie J, Connors J, Lansdorp P, Ouyang Q, Lister T, Davies A, Norton A, Muller-Hermelink H, Ott G, Campo E, Montserrat E, Wilson W, Jaffe E, Simon R, Yang L, Powell J, Zhao H, Goldschmidt N, Chiorazzi M, Staudt L: Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. N Engl J Med 2004, 351(21): 2159–2169. 10.1056/NEJMoa041869
 Knight K, Fu W: Asymptotics for Lasso-type estimators. Annals of Statistics 2000, 28: 1356–1378. 10.1214/aos/1015957397
 Fan J, Peng H: On Nonconcave Penalized Likelihood With Diverging Number of Parameters. The Annals of Statistics 2004, 32: 928–961. 10.1214/009053604000000256
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.