Prediction of a time-to-event trait using genome-wide SNP data
 Jinseog Kim†^{1},
 Insuk Sohn†^{2},
 Dae-Soon Son^{3},
 Dong Hwan Kim^{4},
 Taejin Ahn^{3} and
 Sin-Ho Jung^{5}
DOI: 10.1186/1471-2105-14-58
© Kim et al.; licensee BioMed Central Ltd. 2013
Received: 30 October 2012
Accepted: 12 February 2013
Published: 19 February 2013
Abstract
Background
A common objective of many high-throughput genome projects is to discover genomic markers associated with traits and to develop statistical models that predict the traits of future patients from their marker values.
Results
In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose a MaxTest for the association between a time-to-event trait and a SNP that accounts for the SNP's possible genetic models. The proposed MaxTest can help screen out non-prognostic SNPs and identify the genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations.
Conclusions
In conjunction with the MaxTest, the proposed method yields more parsimonious prediction models that nevertheless include more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data.
Background
A genome-wide association study (GWAS) examines the entire genome of different individuals, typically through single-nucleotide polymorphisms (SNPs), to determine whether any variant is associated with a particular clinical outcome. Many researchers have considered the design and analysis of GWASs for binary clinical outcomes such as case/control or response/nonresponse [1-5].
In clinical cancer research, the primary endpoint of interest is usually a time-to-event trait subject to censoring. In CALGB 80803, for example, germline DNA is collected, together with time-to-progression and overall survival data, from 352 patients with advanced pancreatic cancer. One objective of an SNP correlative study is to discover SNP markers that are correlated with such time-to-event endpoints.
One of the first objectives of a statistical analysis in a GWAS is the discovery of SNP markers that are associated with a particular trait. The major statistical issue in marker discovery is multiple testing: the large number of univariate tests inflates the type I error probability unless it is controlled [6, 7]. Each prognostic SNP has two or three possible outcomes depending on its genetic model, and the efficiency of a statistical method in associating it with a trait is maximized when the true genetic model is known. For most SNPs, however, the true genetic model is unknown. To identify the true genetic model of each SNP and optimize the association analysis, many researchers have considered a set of candidate genetic models for a given trait and derived the null distribution of the maximum of the test statistics specific to the individual genetic models [8, 9]. This test is referred to as the MaxTest. These methods have been developed for binary traits such as case/control or response/nonresponse. We develop a MaxTest to identify the genetic model of each SNP when the trait is a survival endpoint, e.g., time to tumor progression or death.
Another major objective of a GWAS is to predict a trait of interest using SNPs. Prediction methods using microarray data have been widely investigated [10-12] but cannot be directly applied to SNP-based prediction. The number of SNP markers in genome-wide SNP data far exceeds the number of gene markers (or probes) in microarray data, e.g., 1M vs. 20K. In addition, whereas gene expression data in microarray studies are continuous, genome-wide SNP data are discrete, taking at most three distinct values, and their coded values depend on the assumed genetic model.
This paper presents a method for predicting a survival outcome from genome-wide SNP data that can be easily modified for other types of traits, including binary or continuous outcomes. The proposed method uses the gradient lasso method [13], which was developed for microarray data. Some investigators fit a prediction model while ignoring the genetic model of each SNP [14]. We instead propose a MaxTest for the association between a time-to-event trait and a SNP that accounts for its possible genetic models, and we use it to identify the genetic model of each candidate prognostic SNP before fitting a prediction model. The simulation results show that this procedure improves prediction efficiency and prognostic power. For computational efficiency, nonsignificant SNPs are excluded using the MaxTest before the gradient lasso is applied. To facilitate the use of the proposed MaxTest and prediction method, the glcoxphSNP R package (http://datamining.dongguk.ac.kr/Rlib/glcoxphSNP) is provided.
Methods
Genetic Models of SNPs
Suppose that the genotype for an SNP is encoded as AA, AB, or BB. Let g denote the number of copies of the B allele. That is, g=0, 1 or 2 if the genotype is AA, AB, or BB, respectively. Let λ_{ g }(t) denote the hazard function of genotype g. Without loss of generality, assume that B is the risk allele in the sense that having B increases the risk of an event. More specifically, assume that λ_{0}(t)≤λ_{1}(t)≤λ_{2}(t) for all t≥0. (Note that for some specific diseases, this may not be an appropriate genetic model.) We now consider the following three popular genetic models:

Recessive model: λ_{0}(t)=λ_{1}(t)<λ_{2}(t).

Dominant model: λ_{0}(t)<λ_{1}(t)=λ_{2}(t).

Multiplicative model: λ_{2}(t)/λ_{1}(t)=λ_{1}(t)/λ_{0}(t), or equivalently λ_{1}(t)=γ λ_{0}(t) and λ_{2}(t)=γ^{2}λ_{0}(t) for γ>0.
For a chosen score c_{ g }, we consider a proportional hazards model (PHM), λ_{ g }(t)=λ_{0}(t) exp(β c_{ g }). Then Cox's partial likelihood test has optimal power with (c_{0},c_{1},c_{2})=(0,0,1) for a recessive model, (0,1,1) for a dominant model, and (0,1,2) for a multiplicative model [15]. Note that the PHM is invariant to nondegenerate linear transformations of the scores (c_{0},c_{1},c_{2}): a shift is absorbed into the baseline hazard and a rescaling into β.
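The score coding can be made concrete with a short Python sketch. The names `SCORES`, `encode`, and `standardize` are illustrative and are not taken from the glcoxphSNP package. The sketch maps the B-allele count g to the model-specific score c_g and checks that an affine change of the scores, e.g. (1, 3, 5) = 1 + 2 × (0, 1, 2), leaves the standardized covariate unchanged, mirroring the invariance noted above.

```python
import numpy as np

# Scores (c0, c1, c2) for genotypes g = 0 (AA), 1 (AB), 2 (BB),
# where g is the number of copies of the risk allele B.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "dominant": (0.0, 1.0, 1.0),
    "multiplicative": (0.0, 1.0, 2.0),
}


def encode(g, model):
    """Map B-allele counts g to the covariate c_g for the given genetic model."""
    scores = np.asarray(SCORES[model])
    return scores[np.asarray(g, dtype=int)]


def standardize(z):
    """Center and scale a covariate using the population standard deviation."""
    z = np.asarray(z, dtype=float)
    return (z - z.mean()) / z.std()


g = np.array([0, 1, 2, 2, 1, 0, 1])
print(encode(g, "dominant"))                               # [0. 1. 1. 1. 1. 0. 1.]
z = encode(g, "multiplicative")
print(np.allclose(standardize(1 + 2 * z), standardize(z)))  # True
```

The multiplicative scores are exactly the sum of the recessive and dominant scores, so the three codings of one SNP are linearly dependent.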
MaxTest
Suppose that we want to test whether an SNP is associated with a given clinical outcome. The test statistic depends on the true genetic model of the SNP. At the time of testing, however, we usually have no knowledge of the true genetic model. In this case, a naive approach is to conduct the test under each candidate genetic model and take the lowest p-value as the measure of association. This approach inflates the Type I error because of multiple testing. To adjust for multiple testing, investigators have proposed a test based on the maximum of the test statistics over all candidate genetic models under consideration, namely the MaxTest.
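The inflation of the naive minimum-p rule can be seen in a small Monte Carlo sketch. It draws three standardized null statistics from a trivariate normal distribution and records how often the smallest of the three two-sided p-values falls below 0.05. The correlation matrix is an assumption chosen for illustration: the recessive and dominant statistics have correlation 0.5, and the multiplicative statistic is proportional to their sum, which gives correlation sqrt(0.75) with each.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2013)

# Assumed null correlation of (T1, T2, T3): corr(T1, T2) = 0.5 and
# T3 proportional to T1 + T2, so corr(T1, T3) = corr(T2, T3) = sqrt(0.75).
r = np.sqrt(0.75)
R = np.array([[1.0, 0.5, r],
              [0.5, 1.0, r],
              [r, r, 1.0]])

T = rng.multivariate_normal(np.zeros(3), R, size=200_000)  # draws under H0
p_marginal = 2 * norm.sf(np.abs(T))                         # two-sided p-values
naive_rate = np.mean(p_marginal.min(axis=1) < 0.05)         # reject if any p < 0.05

# The MaxTest instead calibrates max_l |T_l| against its own null distribution.
q = np.abs(T).max(axis=1)
crit = np.quantile(q, 0.95)                                  # 5% critical value of Q
print(round(naive_rate, 3), round(crit, 2))
```

The naive rule rejects noticeably more often than 5% of the time under H0, while the MaxTest critical value for Q exceeds the single-test value of 1.96.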
Many studies have considered the MaxTest for binary clinical outcomes. Zheng et al. [8] propose a robust ranking method for when the underlying genetic model is unknown, namely the MAX-rank test. Conneely and Boehnke [16] propose a method for computing p-values that adjusts for correlated tests and show that it approximates permutation-based p-values accurately with much greater computational efficiency. Li et al. [17] propose a method for approximating the p-value of the MaxTest with or without covariate adjustment, namely the P-rank test. Li et al. [9] compare the results of the MAX-rank and P-rank tests. Hoggart et al. [18] formulate the problem as variable selection in a logistic regression that includes a covariate for each SNP, and they find the posterior mode under shrinkage priors using a stochastic search over a penalized likelihood function.
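For genetic model l (l = 1, 2, 3 for the recessive, dominant, and multiplicative models, respectively), let T_{l} denote the standardized Cox partial likelihood score (log-rank type) statistic computed with covariate z_{li}, and let Q = max(|T_{1}|, |T_{2}|, |T_{3}|) be the MaxTest statistic. The vector (T_{1}, T_{2}, T_{3})^{T} is asymptotically normal with mean zero under H_{0}:λ_{0}(t)=λ_{1}(t)=λ_{2}(t) [see, e.g., [20]].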
Here, z_{li} and s_{lk}(t) denote z_{i} and s_{k}(t), respectively, for genetic model l, and $Y(t)=\sum_{i=1}^{n} Y_i(t)$ and $N(t)=\sum_{i=1}^{n} N_i(t)$. Let $\widehat{\Sigma}=(\hat{\rho}_{ll'})_{1\le l,l'\le 3}$, where $\hat{\rho}_{ll'}=\hat{\sigma}_{ll'}/(\hat{\sigma}_{l}\hat{\sigma}_{l'})$ and $\hat{\sigma}_{l}^{2}=\hat{\sigma}_{ll}$. Then the critical value of Q can be obtained numerically or by simulation from the $N(0,\widehat{\Sigma})$ distribution. This is the survival-trait counterpart of the MaxTest for a binary trait discussed in several studies [9, 21].
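The exact formulas for T_l and the covariance estimates are not reproduced in this excerpt. The sketch below assumes they are the usual Cox partial likelihood score test at β = 0 (a log-rank-type statistic) and its information-based covariance, with each tied event using the full risk set. The function name `cox_score_stats` is hypothetical. It returns the standardized statistics and the estimated correlation matrix playing the role of Σ̂.

```python
import numpy as np


def cox_score_stats(time, event, Z):
    """Standardized Cox score statistics at beta = 0 and their correlation.

    time, event: length-n arrays (event = 1 if the event was observed).
    Z: n x L matrix with one column per genetic-model coding of a SNP.
    Returns T (length L) and the estimated L x L correlation matrix.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    Z = np.asarray(Z, dtype=float).reshape(len(time), -1)
    U = np.zeros(Z.shape[1])                    # score vector
    V = np.zeros((Z.shape[1], Z.shape[1]))      # its estimated covariance
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]               # risk set at the i-th event time
        zr = Z[at_risk]
        zbar = zr.mean(axis=0)                  # risk-set mean (equal weights at beta = 0)
        U += Z[i] - zbar
        centered = zr - zbar
        V += centered.T @ centered / at_risk.sum()
    sd = np.sqrt(np.diag(V))                    # zero for a constant coding: filter such SNPs first
    return U / sd, V / np.outer(sd, sd)


# Hand-checkable case: two subjects, both with events, covariate 1 then 0.
T, R = cox_score_stats([1.0, 2.0], [1, 1], [[1.0], [0.0]])
print(T)  # [1.]
```

In the hand-checked case the first event contributes 1 - 0.5 to the score and 0.25 to its variance, and the second event contributes nothing, so T = 0.5 / 0.5 = 1.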
We can construct an alternative test based on the quadratic form $W^{2}=S^{T}\widehat{\Sigma}^{-1}S$, where $S=(T_{1},T_{2},T_{3})^{T}$. In addition to the recessive, dominant, and multiplicative genetic models, we can consider other models to construct a test statistic for the relationship between an SNP and a survival trait. For example, we may consider the log-rank test based on the one-way ANOVA in [22] or the test based on the Wilcoxon rank-sum test in [23], neither of which requires a specific genetic model assumption. In particular, the ANOVA-type test is a reasonable option if a monotone trend across genotypes g = 0, 1, and 2 is doubtful.
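Because the multiplicative score is the sum of the recessive and dominant scores, the Cox score statistics satisfy U_3 = U_1 + U_2 exactly, so for these three codings the 3 × 3 matrix Σ̂ is singular and its ordinary inverse does not exist. The sketch below therefore substitutes a Moore-Penrose pseudo-inverse, computed by eigendecomposition, and refers W² to a chi-square distribution with rank(Σ̂) degrees of freedom, the standard result for a normal vector and a generalized inverse of its covariance. This substitution and the name `quadratic_form_test` are assumptions of the sketch, not the authors' implementation.

```python
import numpy as np
from scipy.stats import chi2


def quadratic_form_test(S, R, rel_tol=1e-10):
    """W^2 = S^T R^+ S with a pseudo-inverse of the correlation matrix R.

    Eigenvalues below rel_tol * (largest eigenvalue) are treated as zero, and
    W^2 is referred to a chi-square distribution with rank(R) degrees of freedom.
    """
    S = np.asarray(S, dtype=float)
    w, Q = np.linalg.eigh(np.asarray(R, dtype=float))
    keep = w > rel_tol * w.max()
    proj = Q[:, keep].T @ S                     # coordinates of S in the eigenbasis
    W2 = float(np.sum(proj ** 2 / w[keep]))
    df = int(keep.sum())
    return W2, df, chi2.sf(W2, df)


# Unit-variance recessive and dominant scores with covariance 0.5, the
# multiplicative score equal to their sum, and observed scores U = (1, 0.5, 1.5).
V = np.array([[1.0, 0.5, 1.5], [0.5, 1.0, 1.5], [1.5, 1.5, 3.0]])
sd = np.sqrt(np.diag(V))
W2, df, p = quadratic_form_test(np.array([1.0, 0.5, 1.5]) / sd, V / np.outer(sd, sd))
print(round(W2, 6), df)  # 1.0 2
```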
Cox model with a lasso penalty
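The lasso estimate for the Cox model minimizes l(β) subject to $\sum_{j=1}^{p}|\beta_j|\le s$, where l(·) is the negative log partial likelihood function [19] and s > 0 is a tuning parameter. The gradient lasso algorithm [13] solves this problem as follows: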
 1. Initialize: β = 0 and k = 0.
 2. Do until convergence:
  (a) Addition:
   (i) Compute the gradient $\nabla l(\beta)$.
   (ii) Find $\hat{j}$ maximizing $|\partial l(\beta)/\partial\beta_j|$ over j = 1, …, p, and set $\hat{\gamma}=-s\times\mathrm{sign}(\partial l(\beta)/\partial\beta_{\hat{j}})$.
   (iii) Let v be the p-dimensional vector whose $\hat{j}$th element is $\hat{\gamma}$ and whose other elements are zero.
   (iv) Find $\hat{\alpha}=\arg\min_{\alpha\in[0,1]} l((1-\alpha)\beta+\alpha v)$.
   (v) Update $\beta=(1-\hat{\alpha})\beta+\hat{\alpha}v$.
  (b) Deletion:
   (i) Calculate $h_{\sigma}=-\nabla l(\beta_{\sigma})+\theta_{\sigma}\nabla l(\beta_{\sigma})^{T}\theta_{\sigma}/|\sigma|$, where $\sigma=\{j:\beta_j\ne 0\}$ is the active set and $\theta_{\sigma}=\mathrm{sign}(\beta_{\sigma})$.
   (ii) Find $\hat{\delta}=\arg\min_{\delta\in[0,U]} l(\beta+\delta h)$, where $h=(h_{\sigma}^{T},0^{T})^{T}$ and $U=\min_{j\in\sigma}\{-\beta_j/h_j:\beta_j h_j<0\}$.
   (iii) Update $\beta=\beta+\hat{\delta}h$.
  (c) Set k = k + 1.
 3. Return β.
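The algorithm above can be sketched in Python. This is a compact illustration, not the glcoxphSNP implementation: it assumes l is the negative log partial likelihood with Breslow handling of ties, uses SciPy's bounded scalar minimizer for both line searches, and stops when the objective changes by less than a tolerance. The names `cox_loss_grad` and `gradient_lasso_cox` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def cox_loss_grad(beta, X, time, event):
    """Negative log partial likelihood l(beta) and its gradient (Breslow ties)."""
    eta = X @ beta
    m = eta.max()
    w = np.exp(eta - m)                                   # shifted for numerical stability
    at_risk = time[None, :] >= time[:, None]              # row i: risk set at time_i
    denom = at_risk @ w
    xbar = (at_risk @ (w[:, None] * X)) / denom[:, None]  # weighted risk-set means
    ev = event.astype(bool)
    loss = -np.sum(eta[ev] - m - np.log(denom[ev]))
    grad = -np.sum(X[ev] - xbar[ev], axis=0)
    return loss, grad


def gradient_lasso_cox(X, time, event, s, max_iter=200, tol=1e-9):
    """Minimize l(beta) subject to sum_j |beta_j| <= s by addition/deletion steps."""
    X, time = np.asarray(X, dtype=float), np.asarray(time, dtype=float)
    event = np.asarray(event)
    p = X.shape[1]
    beta = np.zeros(p)

    def loss(b):
        return cox_loss_grad(b, X, time, event)[0]

    for _ in range(max_iter):
        start = loss(beta)
        # (a) Addition: line search toward the vertex -s * sign(dl/dbeta_j) e_j
        g = cox_loss_grad(beta, X, time, event)[1]
        j = int(np.argmax(np.abs(g)))
        v = np.zeros(p)
        v[j] = -s * np.sign(g[j])
        a = minimize_scalar(lambda t: loss((1 - t) * beta + t * v),
                            bounds=(0.0, 1.0), method="bounded").x
        beta = (1 - a) * beta + a * v
        # (b) Deletion: projected gradient step within the current face of the L1 ball
        act = np.flatnonzero(beta != 0)
        if act.size > 0:
            g = cox_loss_grad(beta, X, time, event)[1]
            theta = np.sign(beta[act])
            h_act = -g[act] + theta * (g[act] @ theta) / act.size
            shrinking = beta[act] * h_act < 0
            if np.any(shrinking):
                ratios = -beta[act][shrinking] / h_act[shrinking]
                U = ratios.min()
                h = np.zeros(p)
                h[act] = h_act
                d = minimize_scalar(lambda t: loss(beta + t * h),
                                    bounds=(0.0, U), method="bounded").x
                if loss(beta + U * h) <= loss(beta + d * h):
                    d = U                               # full step: a coefficient reaches 0
                beta = beta + d * h
                if d == U:
                    beta[act[shrinking][np.argmin(ratios)]] = 0.0
        # (c) Stop once the objective no longer decreases appreciably
        if abs(start - loss(beta)) < tol:
            break
    return beta
```

The addition step is a Frank-Wolfe move toward a vertex of the L1 ball, so the constraint is never violated; the deletion step keeps the signs and the L1 norm fixed while allowing a coefficient to leave the model exactly.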
Proposed algorithm for predicting a survival trait
 1. Read in the clinical data $\{(X_i,\delta_i), i=1,\ldots,n\}$ and SNP data $\{(s_{i1},\ldots,s_{im}), i=1,\ldots,n\}$, where $s_{ij}$ denotes the number of B alleles for SNP j (= 1, ..., m).
 2. For SNP j (= 1, ..., m), calculate the variance-covariance matrix $\widehat{\Sigma}_j$ of the standardized statistics and generate the null distribution of the MaxTest as follows.
  (a) For b = 1, ..., B (= 100,000, say), generate $(t_{1j}^{(b)},t_{2j}^{(b)},t_{3j}^{(b)})$ from $N(0,\widehat{\Sigma}_j)$.
  (b) Let $q_j^{(b)}=\max(|t_{1j}^{(b)}|,|t_{2j}^{(b)}|,|t_{3j}^{(b)}|)$ for b = 1, ..., B.
 3. For SNP j (= 1, ..., m):
  (a) Using the original data, calculate the test statistics $(T_{1j},T_{2j},T_{3j})$, the MaxTest statistic $q_j=\max(|T_{1j}|,|T_{2j}|,|T_{3j}|)$, and the two-sided p-values $p_{1j},p_{2j},p_{3j}$ from the marginal tests for the respective genetic models.
  (b) Approximate the p-value of the MaxTest by $p_j=B^{-1}\sum_{b=1}^{B} I(q_j^{(b)}\ge q_j)$.
 4. SNP screening: select J (≪ m) SNPs with $p_j<\alpha$ for a specified α (= 0.01, say).
 5. For the selected SNPs j (= 1, ..., J), identify the genetic model (1, 2, or 3) by the lowest marginal p-value among $p_{1j},p_{2j},p_{3j}$ or, equivalently, the largest absolute test statistic among $T_{1j},T_{2j},T_{3j}$.
 6. For patient i (= 1, ..., n), define the covariates $(z_{i1},\ldots,z_{iJ})$ using the identified genetic models and the corresponding scores.
 7. Standardize the covariates: $z'_{ij}=(z_{ij}-\bar{z}_j)/s_j$, where $\bar{z}_j=n^{-1}\sum_{i=1}^{n}z_{ij}$ and $s_j^{2}=n^{-1}\sum_{i=1}^{n}(z_{ij}-\bar{z}_j)^{2}$.
 8. Apply the gradient lasso to the Cox regression model with response data $\{(X_i,\delta_i), i=1,\ldots,n\}$ and standardized covariates $\{(z'_{i1},\ldots,z'_{iJ}), i=1,\ldots,n\}$.
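Steps 2 to 5 for a single SNP can be sketched as follows, given its observed statistics (T_1j, T_2j, T_3j) and estimated correlation matrix Σ̂_j. The sketch assumes that the MaxTest statistic is the maximum absolute statistic, consistent with the two-sided marginal p-values in step 3, and that models 1, 2, 3 are the recessive, dominant, and multiplicative models in the order listed earlier. The function names are hypothetical. NumPy's `Generator.multivariate_normal` samples through an SVD by default, so it accepts the singular Σ̂_j that arises because the multiplicative score is the sum of the other two.

```python
import numpy as np
from scipy.stats import norm

MODELS = ("recessive", "dominant", "multiplicative")


def maxtest_pvalue(T, R, B=100_000, rng=None):
    """Monte Carlo p-value of q = max_l |T_l| under N(0, R) (steps 2 and 3)."""
    rng = np.random.default_rng() if rng is None else rng
    T = np.asarray(T, dtype=float)
    null = rng.multivariate_normal(np.zeros(len(T)), R, size=B)  # step 2(a)
    q_null = np.abs(null).max(axis=1)                            # step 2(b)
    q_obs = np.abs(T).max()                                      # step 3(a)
    return np.mean(q_null >= q_obs)                              # step 3(b)


def screen_snp(T, R, alpha=0.01, B=100_000, rng=None):
    """Steps 3 to 5 for one SNP: MaxTest p-value, screening, and model choice.

    Returns (p_value, model) for a selected SNP and (p_value, None) otherwise.
    """
    p = maxtest_pvalue(T, R, B, rng)
    if p >= alpha:
        return p, None
    marginal = 2 * norm.sf(np.abs(np.asarray(T, dtype=float)))  # two-sided marginal p-values
    return p, MODELS[int(np.argmin(marginal))]


p, model = screen_snp([1.0, 2.0, 5.0], np.eye(3), rng=np.random.default_rng(1))
print(model)  # multiplicative
```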
Results and discussion
Simulation study
where D denotes the number of prognostic SNPs.
For the experiment, we set m = 1000, n = 200, D = 6, ρ = 0 or 0.3, β_{j} = 0.8 (j = 1, ..., D), and a uniform censoring distribution yielding 15% or 30% censoring. All six prognostic SNPs have genotype frequencies (f_{1}, f_{2}, f_{3}) = (0.25, 0.5, 0.25) for (AA, AB, BB). SNPs 1 and 4 follow a dominant model; SNPs 2 and 5, a recessive model; and SNPs 3 and 6, a multiplicative model. Each of the remaining 994 SNPs has genotype frequencies (f_{1}, f_{2}, f_{3}) = (1/3, 1/3, 1/3).
To evaluate the performance of the proposed method, we generate 200 random samples and divide them into a training set (100 samples) and a test set (100 samples). From the training set, we calculate the MaxTest p-value of each SNP using B = 100,000 simulated draws from its null distribution and identify the genetic model of each SNP. We select SNPs with p-values less than α = 0.01 and convert the selected SNPs into scores according to their genetic models. We then apply the gradient lasso to the selected SNPs to fit the prediction model. Let SNPs j (= 1, ..., K) be included in the fitted prediction model, with regression estimates ${\widehat{\beta}}_{1},\ldots,{\widehat{\beta}}_{K}$. The risk score for sample i is then ${r}_{i}={\widehat{\beta}}_{1}{z}_{i1}+\cdots+{\widehat{\beta}}_{K}{z}_{iK}$. Using the median risk score in the test set as the cutoff, we divide the patients in the test set into high- and low-risk groups and compare the survival distributions of the two groups with a two-sample log-rank test. We repeat this procedure 100 times and count the number of SNPs and the number of prognostic SNPs included in each prediction model fitted by the gradient lasso. We summarize the distribution of the log-rank p-values from the test sets. For comparison, we also consider prediction methods that assume all m SNPs follow the same genetic model.
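The test-set evaluation, splitting patients at the median risk score and comparing the two groups with a two-sample log-rank test, is sketched below in plain NumPy/SciPy using the standard hypergeometric variance; in R the same test is usually run with `survdiff` from the survival package. The function names `risk_groups` and `logrank_two_sample` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def risk_groups(Z, beta_hat):
    """Risk score r_i = sum_k beta_k z_ik; True marks scores above the median."""
    r = np.asarray(Z, dtype=float) @ np.asarray(beta_hat, dtype=float)
    return r > np.median(r)


def logrank_two_sample(time, event, group):
    """Two-sample log-rank chi-square statistic (1 df) and its p-value."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):               # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        dead = event & (time == t)
        d, d1 = dead.sum(), (dead & group).sum()
        o_minus_e += d1 - d * n1 / n               # observed minus expected in group 1
        if n > 1:                                   # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    return stat, chi2.sf(stat, df=1)


stat, p = logrank_two_sample([1, 2, 3, 4], [1, 1, 1, 1], [True, True, False, False])
print(round(stat, 4))  # 2.8824 (= 49/17)
```

With ties at the median, `risk_groups` can produce groups of unequal size; the log-rank test handles this without modification.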
Mean numbers of SNPs and prognostic SNPs included in the fitted prediction models, recovery rates, and means/standard deviations of the log-rank p-values from the test sets for the proposed method and for methods assuming recessive, dominant, or multiplicative models for all SNPs
Censoring | ρ | Genetic model | Mean number of selected SNPs | Mean number of selected prognostic SNPs | Recovery rate | Mean (SD) p-value of the log-rank test
30% | 0 | Proposed | 6.72 | 5.05 | 0.75 | <0.0001 (<0.0001)
30% | 0 | Recessive | 8.03 | 4.01 | 0.50 | 0.0052 (0.0018)
30% | 0 | Dominant | 6.66 | 3.85 | 0.58 | <0.0001 (<0.0001)
30% | 0 | Multiplicative | 7.72 | 4.95 | 0.64 | 0.0004 (0.0003)
30% | 0.3 | Proposed | 6.51 | 4.83 | 0.74 | 0.0001 (0.0001)
30% | 0.3 | Recessive | 7.73 | 3.83 | 0.50 | 0.0045 (0.0016)
30% | 0.3 | Dominant | 6.58 | 3.66 | 0.56 | 0.0011 (0.0007)
30% | 0.3 | Multiplicative | 7.52 | 4.72 | 0.63 | 0.0006 (0.0004)
15% | 0 | Proposed | 6.65 | 5.18 | 0.78 | <0.0001 (<0.0001)
15% | 0 | Recessive | 8.59 | 4.19 | 0.49 | 0.0028 (0.0011)
15% | 0 | Dominant | 6.69 | 3.88 | 0.58 | <0.0001 (<0.0001)
15% | 0 | Multiplicative | 7.96 | 4.98 | 0.63 | 0.0005 (0.0005)
15% | 0.3 | Proposed | 6.37 | 4.98 | 0.78 | <0.0001 (<0.0001)
15% | 0.3 | Recessive | 7.88 | 3.94 | 0.50 | 0.0048 (0.0028)
15% | 0.3 | Dominant | 6.38 | 3.74 | 0.59 | 0.0011 (0.0011)
15% | 0.3 | Multiplicative | 7.55 | 4.89 | 0.65 | 0.0001 (<0.0001)
Mean numbers of SNPs and prognostic SNPs included in the fitted prediction models, recovery rates, and means/standard deviations of the log-rank p-values from the test sets for the proposed method at ρ = 0 and 30% censoring
n | β | Mean number of selected SNPs | Mean number of selected prognostic SNPs | Recovery rate | Mean (SD) p-value of the log-rank test
200 | 0.8 | 6.72 | 5.05 | 0.75 | <0.0001 (<0.0001)
200 | 1 | 6.13 | 5.18 | 0.85 | <0.0001 (<0.0001)
200 | 2 | 5.60 | 5.17 | 0.92 | <0.0001 (<0.0001)
300 | 0.8 | 6.18 | 5.53 | 0.89 | <0.0001 (<0.0001)
400 | 0.8 | 5.89 | 5.72 | 0.97 | <0.0001 (<0.0001)
Example using real data
We apply the proposed method to the GWAS data of Choi et al. [28], who report a GWAS of 119 patients with normal-karyotype acute myeloid leukemia (AML-NK) using the Affymetrix Genome-Wide Human SNP Array 6.0 (San Diego, CA, USA). We exclude SNPs with missing genotype data for any patient, as well as SNPs with only one observed genotype among the 119 patients. The final data set for the analysis includes m = 251,748 autosomal SNPs from n = 119 patients. The primary endpoint is event-free survival (EFS), defined as the interval from registration to the end of induction chemotherapy for patients who do not achieve a complete response (CR), to relapse after a CR to induction chemotherapy, or to death, whichever occurs first.
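The two SNP exclusion rules can be expressed as a simple column filter. The sketch assumes a hypothetical patients × SNPs matrix of B-allele counts with NaN marking a missing genotype; the function name `qc_filter` is illustrative.

```python
import numpy as np


def qc_filter(G):
    """Keep SNP columns with no missing genotype and at least two observed genotypes."""
    G = np.asarray(G, dtype=float)
    keep = ~np.isnan(G).any(axis=0)               # no missing genotype for any patient
    Gc = G[:, keep]
    keep[keep] = Gc.max(axis=0) > Gc.min(axis=0)  # more than one genotype observed
    return G[:, keep], keep


G = [[0, 1, np.nan, 2],
     [0, 2, 1, 2],
     [0, 1, 1, 2]]
G_kept, keep = qc_filter(G)
```

Here only the second SNP survives: the first and fourth are monomorphic, and the third has a missing genotype.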
A standard approach may be to fit a prediction model assuming a multiplicative genetic model for all SNPs, e.g., Tan et al. [29]. We analyzed this data set using the same method as above, except that all SNPs were assumed to follow a multiplicative model. Figure 1(b) displays the LOOCV results. When the possible genetic models are ignored in this way, the fitted prediction models do not significantly partition the test set into high- and low-risk groups.
List of the 24 SNPs selected by the proposed method from the whole data set of 119 samples, with their MaxTest p-values, genetic models, and the number of times each was selected by the prediction models fitted during the LOOCV procedure
RS ID | Chr | Position | Gene name | Genetic model | P-value | Frequency
rs1030254 | 16 | 60696651 | LOC644649, CDH8, LOC729159 | 3 | 0.00009 | 119
rs1030252 | 16 | 60696869 | LOC644649, CDH8, LOC729159 | 2 | 0.00010 | 119
rs10798122 | 1 | 187584699 | PLA2G4A, FAM5C | 1 | 0.00048 | 119
rs10026207 | 4 | 186039201 | HELT, SLC25A4 | 3 | 0.00233 | 119
rs13333329 | 16 | 1695776 | CRAMP1L | 3 | 0.00015 | 117
rs2132183 | 3 | 84966867 | LOC440970, CADM2 | 3 | 0.00149 | 117
rs1950400 | 14 | 27105035 | MIR4307, NOVA1 | 2 | 0.00040 | 115
rs2155777 | 11 | 133290007 | OPCML | 3 | 0.00142 | 113
rs1677914 | 12 | 78274425 | NAV3 | 2 | 0.00283 | 106
rs1476847 | 18 | 9834599 | RAB31 | 1 | 0.00029 | 102
rs7614596 | 3 | 84986027 | LOC440970, CADM2 | 2 | 0.00020 | 100
rs2648117 | 4 | 186787096 | SORBS2 | 3 | 0.00856 | 90
rs1851317 | 15 | 35077786 | GJD2, ACTC1 | 1 | 0.00999 | 88
rs3790217 | 20 | 19441650 | SLC24A3 | 2 | 0.00728 | 85
rs4902990 | 14 | 72618432 | RGS6 | 2 | 0.00004 | 81
rs9482583 | 6 | 125318379 | RNF217 | 3 | 0.00847 | 79
rs3020444 | 14 | 64791013 | ESR2 | 3 | 0.00288 | 77
rs10851869 | 15 | 74331083 | PML | 2 | 0.00036 | 65
rs11986200 | 8 | 22698209 | PEBP4 | 1 | 0.00222 | 63
rs11260756 | 1 | 16759616 | SPATA21 | 1 | 0.00827 | 63
rs4968415 | 17 | 60264240 | MED13, TBC1D3P2 | 1 | 0.00075 | 62
rs12416722 | 11 | 133300460 | OPCML | 1 | 0.00067 | 59
rs626266 | 12 | 72888187 | TRHDE | 2 | 0.00070 | 52
rs16852300 | 2 | 167414424 | SCN7A, XIRP2 | 3 | 0.00513 | 33
The RGS6 gene (rs4902990) is associated with treatment outcomes in AML-NK patients. RGS6 (regulator of G-protein signaling 6) modulates G-protein function in signaling pathways by activating the intrinsic GTPase activity of Gα subunits [30, 31]. An SNP in RGS6 has been found to modulate the risk of bladder cancer [32]. In addition, RGS6 is known to induce apoptosis through a mitochondrial-dependent pathway, suggesting that RGS6 may be involved in cancer progression [31]. Further, membrane drug transporters, including SLC25A4 (rs10026207) and SLC24A3 (rs3790217), are known to be associated with event-free survival. SLC25A4 (solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator; ANT1), member 4) is known to interact with the Bcl-2-associated X protein (Bax), which is involved in the apoptosis pathway [33, 34]. The rs10798122 SNP in FAM5C (family with sequence similarity 5, member C) is also selected by the proposed method. Loss or hypermethylation of FAM5C is known to be associated with the development of tongue squamous cell carcinoma or gastric cancer [35, 36].
Conclusions
We have proposed a prediction method for a survival endpoint using SNPs. The paper also proposes a MaxTest to screen out non-prognostic SNPs and identify the genetic models of prognostic SNPs. The simulation results indicate substantial prognostic power for the proposed prediction method. Notably, in conjunction with the MaxTest, the proposed method provides more parsimonious prediction models that include more prognostic SNPs than prediction methods that ignore the true genetic models of prognostic SNPs. We applied the method to real GWAS data from patients with acute myeloid leukemia and found that it provides a prediction model that efficiently classifies the patients into high- and low-risk groups using a small number of SNPs that are known to be biologically informative. Although the proposed method is presented for time-to-event traits, it can be easily extended to a wide range of traits, including dichotomous or continuous ones.
Declarations
Acknowledgements
This research was supported by a grant from the National Cancer Institute, CA142538.
References
 Chen BE, Sakoda LC, Hsing AW, Rosenberg PS: Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genet Epidemiol. 2006, 30 (6): 495-507. 10.1002/gepi.20162.
 Gordon D, Finch SJ: Factors affecting statistical power in the detection of genetic association. J Clin Invest. 2005, 115 (6): 1408-1418. 10.1172/JCI24756.
 Hao K, Xu X, Laird N, Wang X, Xu X: Power estimation of multiple SNP association test of case-control study and application. Genet Epidemiol. 2004, 26 (1): 22-30. 10.1002/gepi.10293.
 Skol AD, Scott LJ, Abecasis GR, Boehnke M: Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006, 38 (2): 209-213. 10.1038/ng1706.
 Sluis SVD, Dolan CV, Neale MC, Posthuma D: Power calculations using exact data simulation: a useful tool for genetic study designs. Behav Genet. 2008, 38 (2): 202-211. 10.1007/s10519-007-9184-x.
 Westfall PH, Young SS: Resampling-Based Multiple Testing: Examples and Methods for P-value Adjustment. 1993, New York: Wiley.
 Storey JD: A direct approach to false discovery rates. J R Stat Soc, Ser B. 2002, 64: 479-498. 10.1111/1467-9868.00346.
 Zheng G, Freidlin B, Gastwirth JL: Comparison of robust tests for genetic association using case-control studies. IMS Lecture Notes-Monograph Series, 2nd Lehmann Symposium - Optimality. 2006, 49: 253-265.
 Li Q, Zheng G, Li Z, Yu K: Efficient approximation of p-values of the maximum of correlated tests, with applications to genome-wide association studies. Ann Human Genet. 2008, 72: 397-406. 10.1111/j.1469-1809.2008.00437.x.
 Bair E, Tibshirani R: Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2004, 2: 511-522.
 Gui J, Li H: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005, 21: 3001-3008. 10.1093/bioinformatics/bti422.
 Kaderali L, Zander T, Faigle U, Wolf J, Schultze JL, Schrader R: CASPAR: a hierarchical Bayesian approach to predict survival times in cancer from gene expression data. Bioinformatics. 2006, 22: 1495-1502. 10.1093/bioinformatics/btl103.
 Sohn I, Kim J, Jung SH, Park C: Gradient lasso for Cox proportional hazards model. Bioinformatics. 2009, 25: 1775-1781. 10.1093/bioinformatics/btp322.
 Kooperberg C, LeBlanc M, Obenchain V: Risk Prediction Using Genome-Wide Association Studies. Genet Epidemiol. 2010, 34: 643-652. 10.1002/gepi.20509.
 Owzar K, Li Z, Cox N, Jung SH: Power and sample size calculations for SNP association studies with censored time-to-event outcomes. Genet Epidemiol. 2012, 36: 538-548. 10.1002/gepi.21645.
 Conneely KN, Boehnke M: So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am J Hum Genet. 2007, 81: 1158-1168. 10.1086/522036.
 Li Q, Yu K, Li Z, Zheng G: MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum Genet. 2008, 123 (6): 617-623. 10.1007/s00439-008-0514-8.
 Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ: Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet. 2008, 4: e1000130. 10.1371/journal.pgen.1000130.
 Cox DR: Regression Models and Life Tables (with Discussion). J R Stat Soc, Ser B. 1972, 34: 187-220.
 Fleming TR, Harrington DP: Counting Processes and Survival Analysis. 1991, New York: Wiley.
 Freidlin B, Zheng G, Li Z, Gastwirth JL: Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002, 53 (3): 146-152. 10.1159/000064976.
 Jung SH, Hui S: Sample size calculations to compare K different survival distributions. Lifetime Data Anal. 2002, 8: 361-373. 10.1023/A:1020518905233.
 Jung SH, Owzar K, George SL: A multiple testing procedure to associate gene expression levels with survival. Stat Med. 2005, 24: 3077-3088. 10.1002/sim.2179.
 Tibshirani R: The lasso method for variable selection in the Cox model. Stat Med. 1997, 16: 385-395. 10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.
 Gui J, Li H: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005, 21: 3001-3008. 10.1093/bioinformatics/bti422.
 Park MY, Hastie T: L1-regularization path algorithm for generalized linear models. J R Stat Soc B. 2007, 69: 659-677. 10.1111/j.1467-9868.2007.00607.x.
 Kim J, Kim Y, Kim Y: A gradient-based optimization algorithm for lasso. J Comput Graph Stat. 2008, 17: 994-1009. 10.1198/106186008X386210.
 Choi H, Jung C, Kim S, Kim HJ, Kim T, Zhang Z, Shin ES, Lee JE, Sohn SK, Moon JH, Kim SH, Kim KH, Mun YC, Kim H, Park J, Kim J, Kim D, K: Genome-wide genotype-based risk model for survival in acute myeloid leukemia patients with normal karyotype. 2012, in submission.
 Tan XL, Moyer AM, Fridley BL, Schaid DJ, Niu N, Batzler AJ, Jenkins GD, Abo RP, Li L, Cunningham JM, Sun Z, Yang P, Wang L: Genetic variation predicting cisplatin cytotoxicity associated with overall survival in lung cancer patients receiving platinum-based chemotherapy. Clin Cancer Res. 2011, 17: 5801-5811. 10.1158/1078-0432.CCR-11-1133.
 Berman DM, Gilman AG: Mammalian RGS proteins: barbarians at the gate. J Biol Chem. 1998, 273 (3): 1269-1272. 10.1074/jbc.273.3.1269.
 Maity B, Yang J, Huang J, Askeland RW, Bera S, Fisher RA: Regulator of G protein signaling 6 (RGS6) induces apoptosis via a mitochondrial-dependent pathway not involving its GTPase-activating protein activity. J Biol Chem. 2011, 286 (2): 1409-1419. 10.1074/jbc.M110.186700.
 Berman DM, Wang Y, Liu Z, Dong Q, Burke LA, Liotta LA, Fisher R, Wu X: A functional polymorphism in RGS6 modulates the risk of bladder cancer. Cancer Res. 2004, 64 (18): 6820-6826. 10.1158/0008-5472.CAN-04-1916.
 Baines CP, Molkentin JD: Adenine nucleotide translocase-1 induces cardiomyocyte death through upregulation of the pro-apoptotic protein Bax. J Mol Cell Cardiol. 2009, 46 (6): 969-977. 10.1016/j.yjmcc.2009.01.016.
 Malorni W, Farrace MG, Matarrese P, Tinari A, Ciarlo L, Mousavi-Shafaei P, D’Eletto M, Di Giacomo G, Melino G, Palmieri L, Rodolfo C, Piacentini M: The adenine nucleotide translocator 1 acts as a type 2 transglutaminase substrate: implications for mitochondrial-dependent apoptosis. Cell Death Differ. 2009, 16 (11): 1480-1492. 10.1038/cdd.2009.100.
 Chen L, Su L, Li J, Zheng Y, Yu B, Yu Y, Yan M, Gu Q, Zhu Z, Liu B: Hypermethylated FAM5C and MYLK in serum as diagnosis and pre-warning markers for gastric cancer. Dis Markers. 2012, 32 (3): 195-202.
 Kuroiwa T, Yamamoto N, Onda T, Shibahara T: Expression of the FAM5C in tongue squamous cell carcinoma. Oncol Rep. 2009, 22 (5): 1005-1011.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.