### Additive risk model

Consider a set of *n* independent observations (*T*_{i}, *C*_{i}, *Z*_{i}), *i* = 1, ..., *n*. Suppose that the *i*^{th} subject's event time *T*_{i} is conditionally independent of the censoring time *C*_{i}, given the *d*-dimensional covariate vector *Z*_{i}. For simplicity of notation, we consider only time-independent *Z* throughout this paper unless otherwise specified. Let *X*_{i} = *min*(*T*_{i}, *C*_{i}) and *δ*_{i} = *I*(*T*_{i} ≤ *C*_{i}) for right censored data. We assume the additive risk model (1). Other forms of additive risk models have been studied in [14]. For the *i*^{th} subject, denote {*N*_{i}(*t*) = *I*(*X*_{i} ≤ *t*, *δ*_{i} = 1); *t* ≥ 0} and {*Y*_{i}(*t*) = *I*(*X*_{i} ≥ *t*); *t* ≥ 0} as the observed event process and the at-risk process, respectively. The regression coefficient *β*_{0} can be estimated by solving the following estimating equation:

$$\sum_{i=1}^{n} \int_{0}^{\infty} Z_{i} \left\{ dN_{i}(t) - Y_{i}(t)\,\beta' Z_{i}\,dt - Y_{i}(t)\,d\hat{\Lambda}_{0}(\beta, t) \right\} = 0, \qquad (2)$$

where $\hat{\Lambda}_{0}(\beta, t)$ is the estimator of the baseline cumulative hazard $\Lambda_{0}$ satisfying

$$\sum_{i=1}^{n} \left\{ dN_{i}(t) - Y_{i}(t)\,\beta' Z_{i}\,dt - Y_{i}(t)\,d\hat{\Lambda}_{0}(\beta, t) \right\} = 0 \quad \text{for all } t \ge 0. \qquad (3)$$

The resulting estimator of *β*_{0} satisfies the simple estimating equation

$$\sum_{i=1}^{n} L^{i}\,\beta = \sum_{i=1}^{n} R^{i}, \qquad (4)$$

where $L^{i} = \int_{0}^{\infty} Y_{i}(t)\{Z_{i} - \bar{Z}(t)\}^{\otimes 2}\,dt$, $R^{i} = \int_{0}^{\infty} \{Z_{i} - \bar{Z}(t)\}\,dN_{i}(t)$, $\bar{Z}(t) = \sum_{j} Y_{j}(t) Z_{j} / \sum_{j} Y_{j}(t)$, and $a^{\otimes 2} = a a'$. Each *L*^{i} is a symmetric semi-positive-definite matrix, being an integral of rank-one outer products.

In a typical microarray study such as the MCL, we usually have *d* ~ 10^{3} - 10^{4}, while the number of subjects *n* is at most ~10^{2}. In this case, the left hand side of (4) does not have full rank, so a unique solution to equation (4) does not exist. Certain regularization or model reduction is needed along with estimation. In particular, we propose using Lasso regularization.
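To make the rank deficiency concrete: the left-hand side of (4) is a sum of *n* low-rank matrices, so its rank is at most *n*. A small numpy sketch with hypothetical dimensions (*d* = 500, *n* = 50; the paper's own implementation is in R, this is illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 50  # hypothetical: far more covariates than subjects

# Build a d x d matrix as a sum of n rank-one outer products,
# mimicking the structure of the left-hand side of equation (4).
lhs = np.zeros((d, d))
for _ in range(n):
    z = rng.standard_normal(d)
    lhs += np.outer(z, z)

rank = np.linalg.matrix_rank(lhs)
print(rank, d)  # rank is at most n = 50, far below d = 500
```

Since the rank never exceeds *n* < *d*, the linear system (4) is underdetermined and has infinitely many solutions.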

### The Lasso method

Denote the (*s*, *l*) element of *L*^{i} as $L^{i}_{sl}$, and the *s*^{th} components of *R*^{i} and *β* as $R^{i}_{s}$ and *β*_{s}, respectively. We can see that equation (4) is equivalent to the following *d* equations:

$$\sum_{i=1}^{n} \sum_{l=1}^{d} L^{i}_{sl}\,\beta_{l} = \sum_{i=1}^{n} R^{i}_{s}, \qquad (5)$$

for *s* = 1, ..., *d*.

We note that the validity of the estimating equation (4) does not depend on any assumption about *d* and *n*. The similarity between the estimating equations (5) and the normal equations for simple linear models motivates variable selection for the additive risk model with right censored data using the following Lasso type estimator:

$$\hat{\beta} = \arg\min_{\beta} M(\beta), \quad M(\beta) = \sum_{s=1}^{d} \left( \sum_{i=1}^{n} R^{i}_{s} - \sum_{i=1}^{n} \sum_{l=1}^{d} L^{i}_{sl}\,\beta_{l} \right)^{2}, \qquad (6)$$

subject to the *L*_{1} constraint that |*β*|_{1} = |*β*_{1}| + ... + |*β*_{d}| ≤ *u*, for a data-dependent tuning parameter *u*, which indirectly determines how many components of $\hat{\beta}$ are zero. With *u* → ∞, the Lasso estimate is the same as the standard M-estimate, which is not unique in our microarray study. As *u* → 0, certain components of $\hat{\beta}$ become exactly zero. Variable selection is achieved since only genes with nonzero estimated coefficients are included in the predictive model. Previous theoretical and practical studies, for example [9], show that with small *u*, the Lasso estimate is shrunk towards zero and biased. However, with a large number of covariates present, the biased Lasso estimate may have better prediction performance due to the bias-variance tradeoff. This property is especially valuable for microarray data, where a large number of genes are present and many of them are pure noise. In addition, for microarray data analysis, prediction of survival risks and selection of genes are much more important than accurate estimation of the coefficients in predictive models.

One unique characteristic of the Lasso estimate in the additive risk model is that the summation in (6) is over *d*, the dimension of covariates, not over the sample size *n* as in the linear regression model. However, considering the equivalence of (6) and (4), the Lasso estimate defined in (6) can provide model reduction in the *β* space. The simplicity of the estimating equation in (6) for the additive risk model is not shared by other survival models. For the Cox model in [9], a weighted least squares approximation to the partial likelihood function and an iterative computational algorithm are needed.
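For readers more familiar with the penalized rather than constrained form, the fit has the structure of a Lasso regression of *b* = Σ*R*^{i} on *A* = Σ*L*^{i}. The following Python sketch solves the penalized analogue min ||*b* - *Aβ*||² + *λ*|*β*|₁ by cyclic coordinate descent on hypothetical data; it illustrates the structure only, and is not the algorithm used in the paper (which is the *L*_{1} boosting described below):

```python
import numpy as np

def lasso_cd(A, b, lam, n_sweeps=500):
    """Minimize ||b - A @ beta||^2 + lam * |beta|_1 by cyclic
    coordinate descent with soft-thresholding."""
    d = A.shape[1]
    beta = np.zeros(d)
    for _ in range(n_sweeps):
        for k in range(d):
            # partial residual excluding coordinate k
            r = b - A @ beta + A[:, k] * beta[k]
            rho = A[:, k] @ r
            denom = A[:, k] @ A[:, k]
            # soft-threshold update of the k-th coordinate
            beta[k] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / denom
    return beta

# Hypothetical low-rank system mimicking A = sum(L^i), b = sum(R^i)
rng = np.random.default_rng(1)
n, d = 10, 30
X = rng.standard_normal((n, d))
A = X.T @ X                        # rank <= n < d
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
b = A @ beta_true
beta = lasso_cd(A, b, lam=5.0)
print(np.count_nonzero(np.abs(beta) > 1e-8))  # number of selected covariates
```

Even though *A* is rank deficient, the *L*_{1} penalty makes the fitted coefficient vector well defined and typically sparse.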

Occasionally, there may exist certain covariate effects that are known to be effective *a priori*. For example, some genes may have been confirmed to be associated with disease occurrence in previous independent studies. In this case, interest lies in more accurate adjustment for the other gene effects, and shrinkage of the coefficients of the known effective covariates is not preferred. In such an instance, one may simply omit the corresponding *β*_{s}'s from the *L*_{1} constraint. The *L*_{1} boosting algorithm discussed below can be applied to such situations with minor modifications.

### Tuning parameter selection

We propose choosing the tuning parameter *u* with the following *V*-fold cross validation [21] for a pre-defined integer *V*. Partition the data randomly into *V* non-overlapping subsets of equal sizes. Choose *u* to minimize the cross-validated objective function

$$CV(u) = \sum_{v=1}^{V} \left\{ M\left(\hat{\beta}^{(-v)}\right) - M^{(-v)}\left(\hat{\beta}^{(-v)}\right) \right\},$$

where $\hat{\beta}^{(-v)}$ is the Lasso estimate of *β* based on the data without the *v*^{th} subset for a fixed *u*, and $M^{(-v)}$ is the function *M* defined in (6) evaluated without the *v*^{th} subset. When *V* = *n*, we have the widely used Leave-One-Out cross validation. Another possible tuning parameter selection technique is the generalized cross validation used in [9]. Under mild regularity conditions, all the above cross validation methods are valid and have been shown to have satisfactory numerical performance. It is not clear which cross validation method is optimal under the current setting. We use *V*-fold cross validation with a small *V* for its computational simplicity, and postpone study of the relative efficacy of different validation techniques to later work.
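The cross validation bookkeeping can be sketched as follows. To keep the example self-contained, the Lasso fit is replaced by a hypothetical ridge-type stand-in and *M* by an ordinary least-squares objective; only the fold handling and the scoring rule (full-data *M* minus held-in *M*, i.e. the objective over the held-out subset) mirror the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, V = 40, 5, 5
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -1.0, 0.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Fixed random partition into V non-overlapping folds
folds = np.array_split(rng.permutation(n), V)
all_idx = np.arange(n)

def M(beta, idx):
    """Least-squares objective over the subjects in idx (stand-in for M in (6))."""
    r = y[idx] - X[idx] @ beta
    return float(r @ r)

def fit(idx, lam):
    """Hypothetical stand-in estimator: ridge fit on the subjects in idx."""
    Xs, ys = X[idx], y[idx]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

def cv_score(lam):
    score = 0.0
    for fold in folds:
        keep = np.setdiff1d(all_idx, fold)
        b_hat = fit(keep, lam)                        # estimate without fold v
        score += M(b_hat, all_idx) - M(b_hat, keep)   # = M over the held-out fold
    return score

grid = [0.01, 0.1, 1.0, 10.0]
scores = [cv_score(l) for l in grid]
print(grid[int(np.argmin(scores))])  # tuning value minimizing CV
```

Because *M* is additive over subjects, the difference M(all) - M(keep) equals the objective evaluated on the held-out fold, so each score is nonnegative.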

### Evaluation

In standard survival analysis with *n* >> *d*, common interest lies in assessing the significance of a covariate via the p-value of its z-score. However, when the sample size is smaller than the number of covariates, as in microarray studies, this standard approach of assessing significance may not be appropriate, since its validity typically relies on justifications assuming *n* >> *d*. In addition, in microarray studies, one major goal is to predict survival risks based on selected genes. We propose the following Leave-One-Out (LOO) cross validation based evaluation method for assessing the relative stability of individual genes and the overall predictive power.

For *i* = 1, ..., *n*:

1. Generate the reduced dataset by removing the *i*^{th} subject.

2. Compute the proposed Lasso estimate (including cross validation and estimation) with the reduced set. Denote the estimate of *β* using the reduced set as $\hat{\beta}^{(-i)}$. Compute the predicted risk score for the *i*^{th} subject as $\hat{\beta}^{(-i)\prime} Z_{i}$.

A total of *n* predictive models are generated with the above procedure. For the *s*^{th} component of *β*, compute the number of times *c*_{s} it is included in the *n* predictive models (i.e., the number of times the estimated coefficient is nonzero). Then the proportion *o*_{s} = *c*_{s}/*n* gives a measure of the relative importance and stability of the *s*^{th} gene. We call *o*_{s} the occurrence index (OI) of the *s*^{th} gene. It lies between 0 and 1. Loosely speaking, if a gene is strongly associated with the survival outcome, it should be identified in the reduced datasets, so the corresponding OI should be equal to or close to 1. Generally speaking, a higher OI indicates a more stable and relatively more important gene.

For evaluation of overall predictive power, we dichotomize the *n* predicted risk scores $\hat{\beta}^{(-i)\prime} Z_{i}$ at their median. Two hypothetical risk groups can then be created. Since different subjects share the same baseline hazard, a higher *β*'*Z* leads to a higher hazard function and hence a higher survival risk. So evaluation of the prediction of survival risk can be based on the survival functions of subjects with different predicted risk scores. We compare the difference between the survival functions of the two risk groups, which can be measured with a chi-square statistic with one degree of freedom. A significant difference indicates satisfactory prediction performance.
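Given the *n* leave-one-out fits, the occurrence index and the median dichotomization are simple to compute; a numpy sketch with simulated (hypothetical) fit results:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 30, 8

# Hypothetical leave-one-out output: betas[i] is the estimate computed
# without subject i; risk[i] is subject i's predicted risk score.
betas = rng.standard_normal((n, d)) * (rng.random((n, d)) < 0.3)
risk = rng.standard_normal(n)

# Occurrence index: fraction of the n models in which each gene appears
oi = (betas != 0).mean(axis=0)

# Dichotomize the predicted risk scores at their median -> two risk groups
high = risk > np.median(risk)
print(oi.round(2))
print(int(high.sum()), int((~high).sum()))  # two groups of size n/2
```

The two groups defined by `high` would then be compared through their survival functions.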

We note that an alternative approach is to create the reduced sets using an approach similar to the *V*-fold cross validation, i.e., removing a subset of size *n*/*V* at a time. The rationale for using the proposed Leave-One-Out approach is that the sample size is usually much smaller than the number of genes in microarray studies. Removing *n*/*V* subjects would leave an even smaller sample size for the reduced data, and estimates from the reduced dataset could be less reliable. The theoretical basis of the proposed evaluation approach is worth further investigation.

### Computational algorithm

The *L*_{1} constraint is equivalent to adding an *L*_{1} penalty to the objective function and ignoring the constraint [9]. Since the *L*_{1} penalty is not differentiable, usual derivative-based minimization techniques (for example, Newton-Raphson) cannot be used to obtain the estimate in (6). In most previous studies, the minimization relies on quadratic programming (QP) or general non-linear programming, which are known to be computationally intensive. Moreover, the quadratic programming procedure cannot be applied directly to settings where the sample size is much smaller than the number of predictors.

A recent study [22], which relates the minimization step for the Lasso estimate to the *L*_{1} boosting algorithm, a regularized boosting algorithm proposed by [23], provides a computationally more feasible solution. The *L*_{1} boosting algorithm can be applied to general objective functions with *L*_{1} constraints. For the current *L*_{1} constrained estimator defined in (6) with a fixed *u*, the algorithm can be implemented in the following steps:

1. Initialization: *β*_{s} = 0 for *s* = 1, ..., *d* and *m* = 0.

2. With the current estimate of *β* = (*β*_{1}, ..., *β*_{d}), compute

$$\varphi_{k}(\beta) = \frac{\partial M(\beta)}{\partial \beta_{k}}$$

for *k* = 1, ..., *d*.

3. Find the *k** that minimizes *min*(*φ*_{k}(*β*), -*φ*_{k}(*β*)) over *k*, i.e. the *k* with the largest |*φ*_{k}(*β*)|. If *φ*_{k*}(*β*) = 0, then stop the iteration.

4. Otherwise denote *γ* = -*sign*(*φ*_{k*}(*β*)). Find $\hat{\alpha}$ such that

$$\hat{\alpha} = \arg\min_{\alpha \in [0,1]} M\left[ (1 - \alpha)(\beta_{1}, \ldots, \beta_{d}) + \alpha \times u \times \gamma \eta_{k*} \right],$$

where *η*_{k*} has the *k**^{th} element equal to 1 and the rest equal to 0.

5. Let $\beta_{k} = (1 - \hat{\alpha})\beta_{k}$ for *k* ≠ *k** and $\beta_{k*} = (1 - \hat{\alpha})\beta_{k*} + \gamma u \hat{\alpha}$. Let *m* = *m* + 1.

6. Repeat steps 2–5 until convergence or a fixed number of iterations *N* has been reached.

The *β* at convergence is the Lasso estimate in (6). We conclude convergence if the absolute value of *φ*_{k*}(*β*) computed in step 3 is less than a pre-defined criterion, and/or if *M*(*β*) falls below a pre-defined threshold.
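The steps above can be sketched in Python. Writing the quadratic objective *M* as ||*b* - *Aβ*||² for a small hypothetical system lets the line search in step 4 be done in closed form (the unconstrained minimizer clipped to [0, 1]) instead of a generic argmin; the paper's own implementation is in R:

```python
import numpy as np

def l1_boost(A, b, u, n_iter=2000, tol=1e-8):
    """L1 boosting for min ||b - A @ beta||^2 subject to |beta|_1 <= u,
    following steps 1-6 of the text (phi taken as the gradient of M)."""
    d = A.shape[1]
    beta = np.zeros(d)                        # step 1
    for _ in range(n_iter):                   # step 6
        resid = b - A @ beta
        phi = -2.0 * A.T @ resid              # step 2: dM/dbeta_k
        k = int(np.argmax(np.abs(phi)))       # step 3: largest |phi_k|
        if abs(phi[k]) < tol:
            break
        gamma = -np.sign(phi[k])              # step 4
        target = np.zeros(d)
        target[k] = gamma * u                 # the point u * gamma * eta_{k*}
        p = target - beta                     # direction of the convex step
        Ap = A @ p
        denom = Ap @ Ap
        if denom == 0.0:
            break
        # exact line search for quadratic M, clipped to [0, 1]
        alpha = float(np.clip((resid @ Ap) / denom, 0.0, 1.0))
        beta = (1.0 - alpha) * beta           # step 5
        beta[k] += alpha * gamma * u
    return beta

# Hypothetical low-rank system mimicking equation (4) with d > n
rng = np.random.default_rng(4)
n, d = 15, 40
X = rng.standard_normal((n, d))
A = X.T @ X
beta_true = np.zeros(d)
beta_true[:2] = [1.0, -0.5]
b = A @ beta_true
beta = l1_boost(A, b, u=1.5)
print(np.abs(beta).sum() <= 1.5 + 1e-9)  # L1 constraint holds -> True
```

Each update is a convex combination of the current *β* and a point with *L*_{1} norm *u*, so the iterates satisfy the constraint |*β*|_{1} ≤ *u* automatically at every step.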

Compared with traditional algorithms, the *L*_{1} boosting only involves evaluations of simple functions. Our data analysis experience shows that the computational burden of the *L*_{1} boosting is minimal. As pointed out in [22], one attractive feature of the *L*_{1} boosting algorithm is that its convergence rate is independent of the dimension of the input. This property is essential for the proposed approach with data like the MCL, where the number of genes is very large. On the other hand, it is known that over-fitting usually does not pose a serious problem for general boosting methods [24]. So the total number of iterations *N* can be taken to be large to ensure convergence.

In our numerical study, the above boosting algorithm is implemented in R. R code for cross validation, estimation and evaluation is available upon request from the authors. Our empirical study shows that the boosting algorithm is computationally very affordable. For the MCL data, cross validation and estimation combined take less than four minutes.