Description

Background

Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for true integrative data analysis. Existing integrative classifiers have two main disadvantages. First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and cost-inefficient.

Results

We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We turn to the relatively expensive molecular data only when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that need to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples.

Conclusions

Our method renders a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals. Moreover, for these individuals a test result would be available quickly, which may reduce waiting times for diagnosis and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available at the Bioconductor website.

exactLRT RLRsim-package (Restricted) likelihood ratio tests in linear mixed models

Details
The model under the alternative must be a linear mixed model y = Xβ + Zb + ε with a single random effect b with known correlation structure and error terms that are i.i.d. The hypothesis to be tested must be of the form We use the exact finite sample distribution of the likelihood ratio test statistic as derived by Crainiceanu & Ruppert (2004).
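A minimal usage sketch for exactLRT follows. It assumes that the lme4 and RLRsim packages are installed and uses the sleepstudy data shipped with lme4; the choice of data set, the seed, and the reduced nsim are illustrative assumptions, not recommendations. The alternative model is fitted by maximum likelihood (REML = FALSE) and the null model is an ordinary linear model without the random effect.

```r
# Sketch only: assumes the lme4 and RLRsim packages are installed.
library(lme4)
library(RLRsim)

data(sleepstudy, package = "lme4")

# Model under the alternative: one random intercept per subject,
# fitted by maximum likelihood (not REML) for the likelihood ratio test.
mA <- lmer(Reaction ~ I(Days - 4.5) + (1 | Subject),
           data = sleepstudy, REML = FALSE)

# Model under the null: same fixed effects, no random effect.
m0 <- lm(Reaction ~ I(Days - 4.5), data = sleepstudy)

# seed and nsim are illustrative choices; a larger nsim gives a
# smoother simulated null distribution at higher computational cost.
res <- exactLRT(m = mA, m0 = m0, seed = 1, nsim = 2000)
print(res)
```

The returned object is of class htest, so the usual accessors such as res$statistic and res$p.value should apply.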

Value
A list of class htest containing the following components:

Description
This function provides an (exact) restricted likelihood ratio test based on simulated values from the finite sample distribution for testing whether the variance of a random effect is 0 in a linear mixed model with known correlation structure of the tested random effect and i.i.d. errors.

Details
Testing in models with only a single variance component requires only the first argument m. For testing in models with multiple variance components, the fitted model m must contain only the random effect set to zero under the null hypothesis, while mA and m0 are the models under the alternative and the null, respectively.

For models with a single variance component, the simulated distribution is exact if the number of parameters (fixed and random) is smaller than the number of observations. Extensive simulation studies (see second reference below) confirm that the application of the test to models with multiple variance components is safe and the simulated distribution is correct as long as the number of parameters (fixed and random) is smaller than the number of observations and the nuisance variance components are not superfluous or very small. We use the finite sample distribution of the restricted likelihood ratio test statistic as derived by Crainiceanu & Ruppert (2004).

The model under the alternative must be a linear mixed model y = Xβ + Zb + ε with a single random effect b with known correlation structure Sigma and i.i.d. errors. The simulated distribution of the likelihood ratio statistic was derived by Crainiceanu & Ruppert (2004). The simulation algorithm uses a grid search over a log-regular grid of values of λ = Var(b)/Var(ε) to maximize the likelihood under the alternative for nsim realizations of y drawn under the null hypothesis. log.grid.lo and log.grid.hi are the lower and upper limits of this grid on the log scale. gridlength is the number of points on the grid.

These are just wrapper functions for the underlying C code.
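The two cases described above (a single variance component, and one tested component alongside a nuisance component) can be sketched as follows. This is an illustration under stated assumptions: the lme4 and RLRsim packages are installed, the sleepstudy data from lme4 stand in for real data, and the seed, nsim, and grid values are written out only to show where the arguments go.

```r
# Sketch only: assumes the lme4 and RLRsim packages are installed.
library(lme4)
library(RLRsim)

data(sleepstudy, package = "lme4")

# Case 1: a single variance component. Only `m` is needed.
m <- lmer(Reaction ~ I(Days - 4.5) + (1 | Subject), data = sleepstudy)
res1 <- exactRLRT(m, seed = 1, nsim = 2000)

# Case 2: test the random slope in the presence of a random intercept.
# mA: model under the alternative (both random effects)
mA <- lmer(Reaction ~ I(Days - 4.5) + (1 | Subject) +
             (0 + I(Days - 4.5) | Subject), data = sleepstudy)
# m0: model under the null (random slope removed)
m0 <- update(mA, . ~ . - (0 + I(Days - 4.5) | Subject))
# m.slope: contains ONLY the random effect being tested
m.slope <- update(mA, . ~ . - (1 | Subject))

# Grid arguments are spelled out for illustration: the grid for lambda
# runs from exp(log.grid.lo) to exp(log.grid.hi) with gridlength points.
res2 <- exactRLRT(m.slope, mA, m0, seed = 1, nsim = 2000,
                  log.grid.lo = -10, log.grid.hi = 8, gridlength = 200)

print(res1)
print(res2)
```

A wider or denser grid trades computation time for a finer search over λ; whether that matters in practice depends on the data at hand.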