Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models

Background Growing interest on biological pathways has called for new statistical methods for modeling and testing a genetic pathway effect on a health outcome. The fact that genes within a pathway tend to interact with each other and relate to the outcome in a complicated way makes nonparametric methods more desirable. The kernel machine method provides a convenient, powerful and unified method for multi-dimensional parametric and nonparametric modeling of the pathway effect. Results In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates the disease risk to covariates parametrically, and to genes within a genetic pathway parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among the genes within the same pathway and a complicated relationship of the genetic pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model. Estimation hence can proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models and its connection with generalized linear mixed models is discussed. Conclusion Logistic kernel machine regression and its extension generalized kernel machine regression provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and attractive performance make them have promising wide applications in bioinformatics and other biomedical areas.


Background
The rapid progress in gene expression array technology in the past decade has greatly facilitated our understanding of the genetic aspect of various diseases. Knowledge-based approaches, such as gene set or pathway analysis, have become increasingly popular. In such gene sets/pathways, groups of genes act in concert to accomplish tasks related to a cellular process and the resulting genetic pathway effects may manifest themselves through phenotypic changes, such as occurrence of disease. Thus it is potentially more meaningful to study the overall effect of a group of genes rather than a single gene, as single-gene analysis may miss important effects on pathways and difficult to reproduce from studies to studies [1]. Researchers have made significant progress in identifying metabolic or signaling pathways based on expression array data [2,3]. Meanwhile, new tools for identification of pathways, such as GenMAPP [4], Pathway Processor [5], MAPPFinder [6], have made pathway data more widely available. However, It is a challenging task to model the pathway data and test for a potentially complex pathway effect on a disease outcome.
One way to model pathway data is through the linear model approach, where the pathway effect is represented by a linear combination of individual gene effects. This approach has several limitations. Activities of genes within a pathway are often complicated, thus a linear model is often insufficient to capture the relationship between these genes. Furthermore, genes within a pathway tend to interact with each other. Such interactions are not taken into account by the linear model approach.
In this paper we propose a nonparametric approach, the kernel machine regression, to model a pathway effect. The kernel machine method, with the support vector machine (SVM) as a most popular example, has emerged in the last decade as a powerful machine learning technique in highdimensional settings [7,8]. This method provides a flexible way to model linear and nonlinear effects of variables and gene-gene interactions, unifies the model building procedure in both one-and multi-dimensional settings, and shows attractive performance compared to other nonparametric methods such as splines.
Liu et al. [9] proposed a kernel machine-based regression model for continuous outcomes. In this paper, we propose a logistic kernel machine regression model for binary outcomes, where covariate effects are modeled parametrically and the genetic pathway effect is modeled parametrically or nonparametrically using the kernel machine method. A main contribution of this paper is to establish a connection between logistic kernel machine regression and the logistic mixed model. We show that the kernel machine estimator of the genetic pathway effect can be obtained from the estimator of the random effects in the corresponding logistic mixed model. This connection provides a convenient vehicle to connect the powerful kernel machine method with the popular mixed model method in the statistical literature. This mixed model connection also provides an unified framework for statistical infer-ence for model parameters, including the regression coefficients, the nonparametric genetic pathway function, and the regularization and kernel parameters. Based on the proposed logistic kernel machine regression model, we develop a new test for the nonlinear pathway effect on disease risk. An appealing feature of the proposed test is that it performs well without the need to correctly specify the functional form of the effects of each gene or their interactions. This feature has a significant practical implication when analyzing genetic pathway data, as the true relationship between the pathway and the disease outcome is often unknown. We extend the results to generalized kernel machine regression for a class of continuous and discrete outcomes and discuss its connection with generalized linear mixed models [10].
Recently, Wei and Li [11] proposed a nonparametric pathway-based regression (NPR) to model pathway data. NPR is a pathway-based gradient boosting procedure, where the base learner is usually a regression or classification tree. It provides a flexible approach in modeling pathways and interactions among genes within a pathway. Michalowski et al. [12] proposed a Bayesian Belief Network approach for pathway data. Neither method is likelihood-based. Thus parameter estimation and inference cannot be casted within a unified likelihood framework. It is hence difficult to estimate and quantify the overall pathway effect on disease risk and assess its statistical uncertainty. Secondly, a primary interest in this paper is to test for the statistical significance of the overall pathway effect on the risk of a disease. Both NPR and Bayesian belief network do not provide such a statistical test for the pathway effect. For example, NPR uses an importance score to rank the relative importance of each pathway. It lacks formal inferential procedure for assessing the statistical significance of a pathway. Further, when considering a single pathway, the importance score loses its meaning in assessing the importance of a pathway. Our method, on the other hand, is based on penalized likelihood, and estimation and inference can be conducted in a systematic manner within the likelihood framework. We also propose a formal statistical test for the significance of a pathway effect on the risk of a disease.
Goeman et al. [13] proposed a linear mixed model to relate the pathway effect with a continuous outcome. They modeled the pathway effect using a linear function with each gene entering into the model as a regressor. They assumed the regression coefficients of the gene as random from a common distribution with mean 0 and an unknown variance. The pathway effect can then be tested through a variance component test for random effects. Our approach is different from theirs in the following aspects. First, we model the pathway effect by allowing for a nonparametric model rather than a parametric one. As we commented earlier, the highly complicated nature of activities of genes within a pathway makes the linear model assumption untenable. Secondly, the kernel function used in kernel machine regression usually contains unknown tuning parameters. The parameter is present under the alternative hypothesis but disappears under null hypothesis. This makes tests as proposed in [13,14] not applicable. Our proposed test, on the other hand, works quite well under this scenario. Third, Goeman et al. [14] extended their linear model results to discrete outcomes using basis functions. A key advantage of the kernel machine approach over this basis approach for modeling multi-gene effects is that one does not need to specify bases explicitly, which is often difficult for high-dimensional data especially when interactions are modeled.

Analysis of prostate cancer data
In this section, we apply the proposed logistic kernel machine regression model (3) as described in the Methods section to the analysis of a prostate cancer data set. The data came from the Michigan prostate cancer study [15]. This study involved 81 patients with 22 diagnosed as non-cancerous and 59 diagnosed with local or advanced prostate cancer. Besides the clinical and demographic covariates such as age, cDNA microarray gene expressions were also available for each patient. The early results of Dhanasekaran et al. [15] indicate that certain functional genetic pathways seemed dys-regulated in prostate cancer relative to non-cancerous tissues. We are interested in studying how a genetic pathway is related to the prostate cancer risk, controlling for the covariates. We focus in this analysis on the cell growth pathway, which contains 5 genes. The pathway we describe was annotated by the investigator (A. Chinnaiyan) and is simply used to illustrate the methodology. Of course, one could take the pathways stored in commercial databases such as Ingenuity Pathway Analysis (IPA) and use the proposed methodology based on those gene sets.
The outcome was the binary prostate cancer status and the covariate includes age. Since the functional relationship between the cell growth pathway and the prostate cancer risk is unknown, the kernel machine method provides a convenient and flexible framework for the evaluation of the pathway effect on the prostate cancer risk. Specifically, we consider the following semiparametric logistic model logit(P(y = 1)) = β 0 + β 1 age + h(gene 1 , ..., gene 5 where h(·) is a nonparametric function of 5 genes within the cell growth pathway. The detail of the estimation procedure is provided in the Methods section. In summary, we fit this model using the kernel machine method via the logistic mixed model representation and using the Gaussian kernel function in estimating h (·). Under the mixed model representation, we estimated (β 0 , β 1 ) and h(·) using penalized quasi-likelihood (PQL), and estimated the smoothing parameter τ and the Gaussian kernel scale parameter ρ simultaneously by treating them as variance components. The results are presented in Table 1.
The test for the cell growth pathway effect on the prostate cancer status H 0 : h(z) = 0 vs H 1 :h(z) ≠ 0 was conducted using the proposed score test as described in the Methods section. For the purpose of comparison, we also conducted the global test proposed by Goeman et al. [13] that assumed a linear pathway effect. Note that our test allows a nonlinear pathway effect and gene-gene interactions. Table 1 gives the p-values for both tests. The p-value of our test suggests that cell growth pathway has a highly significant effect on the disease status, while the test from Goeman et al. [13] indicates only marginal significance of the growth pathway effect.

Simulation Study for the Parameter Estimates
We conducted a simulation study to evaluate the performance of the parameter estimates of the proposed logistic kernel machine regression by using the logistic mixed model formulation. We considered the following model where the true regression coefficient β = 1. We consider p = 5 and set h(z 1 , ..., z 5 ) = 2{sin(z 1 ) -+ z 1 exp(-z 3 )sin(z 2 ) cos(z 3 ) + + sin(z 4 ) cos(z 1 ) + + z 3 z 5 }. To allow x i and (z i1 , ʜ, z ip ) to be correlated, x i was generated as x i = sin(z i1 ) + 2u i , where u i and z ij (j = 1, ʜ, p) follow independent Uniform(-0.5, 0.5). The Gaussian kernel was used throughout the simulation. All simulations ran 300 times. Settings 1, 2, and 3 correspond to sample size n = 100, 200, and 300, respectively.
The simulation results are shown in Table 2. Due to the multi-dimensional nature of the variables z, it is difficult to visualize the fitted curve (z). We hence summarized the goodness-of-fit of (·) in the following way. For each simulated data set, we regressed the true h on the fitted value , both evaluated at the design points. We then empirically summarized the goodness-of-fit of (·) by calculating the average intercepts, slopes and R 2 's obtained from these regressions over the 300 simulations. If the kernel machine method fits the nonparametric func- tion well, then we would expect the intercept to be close to 0, the slope close to 1, and R 2 also close to 1.
Our results show that even when the sample size is as low as 100, estimation of the regression coefficient and nonparametric function only has small bias. When the kernel parameter ρ is estimated, these biases tend to be small compared with those when ρ is held fixed. With the increase of sample size the estimates of β and h become closer to the true values, especially when ρ is estimated, while there are still some bias when ρ is fixed at values farther away from the estimated one. Table 3 compares the estimated standard errors of with the empirical standard errors. Our results show that they agree to each other well when ρ is estimated.

Simulation Study of the Score Test for the Pathway Effect
We next conducted a simulation study to evaluate the performance of the proposed variance component score test for the pathway effect H 0 : h(·) = 0 vs H 1 : h(·) ≠ 0. In order to compare the performance of our test with the linearitybased global test proposed by Goeman et al. [13], both tests were applied to each simulated data set. Nonlinear and linear functions of h(z) were both considered. For the nonlinear pathway effect, the true model is logit For the linear pathway effect, the true model where h(z) = 2z 1 + 3z 2 +z 3 + 2z 4 +z 5 .
All z's were generated from the standard normal distribution, and a = 0, 0.2, 0.4, 0.6, 0.8. To allow x and (z i1 , ʜ, z ip ) to be correlated, x was generated as x = z 1 +e/2 with e being independent of z 1 and following N (0, 1). We studied the size of the test by generating data under a = 0, and studied the power by increasing a. The sample size was 100. For the size calculations, the number of simulations was 2000; whereas for the power calculations, the number of runs was 1000. Based on the discussions in Section "Test for the genetic Pathway Effect", the bound of ρ is set up by interval , and the interval is divided by 500 equally spaced grid points. All simulations were conducted using R 2.5.0, and the package "globaltest" v4.6.0 was used for the test proposed by Goeman et al. [13] as a comparison. Table 4 reports the empirical size (a = 0) and power (a > 0) of the variance component score test for the path-

Conclusions and Discussion
In this paper, we developed a logistic kernel machine regression model for binary outcomes, where the covariate effects are modeled parametrically and the genetic pathway effect is modeled nonparametrically using the kernel machine method. This method provides an attractive way to model the pathway effect, without the need to make strong parametric assumptions on individual gene effects or their interactions. Our model also allows for parametric pathway effects if a parametric kernel, such as the first-degree polynomial kernel, is used.
A key result of this paper is that we have established a close connection between the generalized kernel machine regression and generalized linear mixed models, and show that the kernel machine estimators of regression coefficients and the nonparametric multi-dimensional pathway effect can be easily obtained from the corresponding generalized linear mixed models using PQL.
The mixed model connection provides a unified framework for estimation and inference and can be easily implemented in existing software, such as SAS PROC GLIMMIX or R GLMMPQL. The mixed model connection also makes it possible to test for the overall pathway effect through the proposed variance component test. A key advantage of the proposed score test for the pathway effect is that it does not require an explicit functional specification of individual gene effects and gene-gene interactions. This feature is of practical significance as the pathway effect is often complex. Our simulation study shows the proposed test performs well for moderate sample size. It has similar power to the linearity-based pathway test of Goeman et al. [13] when the true effect is linear, but much higher power when the true effect is nonlinear.
We have considered in this paper a single pathway. One could generalize the proposed semiparametric model to incorporate multiple pathways by fitting an additive model: where z j (j = 1, ʜ, m) denotes a p j × 1 vector of genes in the jth pathway and hj(·) denotes the nonparametric function associated with the jth genetic pathway.
Machine learning is a powerful tool in advancing bioinformatics research. Our effort helps to build a bridge between kernel machine methods and traditional statistical models. This connection will undoubtedly provide a new and convenient tool for the bioinformatics community and opens a door for future research.

The Logistic Kernel Machine Model
Throughout the paper we assume that gene expression data have been properly normalized. Suppose the data consist of n samples. For subject i (i = 1, ʜ, n), y i is a binary disease outcome taking values either 0 (non-disease) or 1 (disease), x i is a q × 1 vector of covariates, z i is a p × 1 vector of gene expression measurements in a pathway/gene set. We assume that an intercept is included in x i . The binary outcome y i depends on x i and z i through the following semiparametric logistic regression model: where μ i = P (y i = 1| x i , z i ), β is a q × 1 vector of regression coefficients, and h(z i ) is an unknown centered smooth function.
In model (3), covariate effects are modeled parametrically, while the multi-dimensional genetic pathway effect is modeled parametrically or nonparametrically. A non- β parametric specification for h(·) reflects our limited knowledge of genetic functional forms. Note that h(·) = 0 means that genes in the pathway have no association with the disease risk. If h(z) = γ 1 z 1 + ʜ + γ p z p , model (3) becomes the linear model considered by Goeman et al. [13].
In nonparametric modeling, such as smoothing splines, the unknown function is usually assumed to lie in a certain function space. For the kernel machine method, this function space, denoted by , is generated by a given positive definite kernel function K(·, ·). The mathematical properties of imply that any unknown function h(z) in can be written as a linear combination of the given kernel function K(·, ·) evaluated at each sample point. Two popular kernel functions are the dth polynomial kernel and the Gaussian Kernel K(z 1 , z 2 ) = exp{-|| z 1 -z 2 || 2 /ρ 2 }, where and ρ is an unknown parameter. The first and second degree polynomial kernels (d = 1, 2) correspond to assuming h(·) to be linear and quadratic in z's, respectively. The choice of a kernel function determines which function space one would like to use to approximate h(z). The unknown parameter of a kernel function plays a critical role in function approximation. It is a challenging problem to optimally estimate it from data. In the machine learning literature, this parameter is usually pre-fixed at some values based on some ad-hoc methods. In this paper, we show that we can optimally estimate it from data based on a mixed model framework.

The Estimation Procedure
Assuming h(·) ∈ , the function space generated by a kernel function K(·, ·), we can estimate β and h(·) by maximizing the penalized log-likelihood function where λ is a regularization parameter that controls the tradeoff between goodness of fit and complexity of the model. When λ = 0, it fits a saturated model, and when λ = ∞, the model reduces to a simple logistic model logit Note that there are two tuning parameters in the above likelihood function, the regularization parameter λ and kernel parameter ρ Intuitively, λ controls the magnitude of the unknown function while ρ mainly governs the smoothness property of the function.

Substituting (5) into (4) we have
where K = K(ρ) is an n × n matrix whose (i, i')th element is K(z i , z i ') and often depends on an unknown parameter ρ.

The Connection of Logistic Kernel Machine Regression to Logistic Mixed Models
Generalized linear mixed models (GLMMs) have been used to analyze correlated categorical data and have gained much popularity in the statistical literature [10]. Logistic mixed models are a special case of GLMMs. We show in this section that the kernel machine estimator in the semiparametric logistic regression model (3) corresponds to the Penalized Quasi-Likelihood (PQL) [10] estimator from a logistic mixed model, and the regularization parameter τ = 1/λ and kernel parameter ρ can be treated as variance components and estimated simultaneously from the corresponding logistic mixed model. Specifically, consider the following logistic mixed model: where β is a q × 1 vector of fixed effects, and h = (h 1 , ..., h n ) is a n × 1 vector of subject-specific random effects following h ~N{0, τ K(ρ)}, and the covariance matrix K(ρ) is the n × n kernel matrix as defined in previous section.
As K is not diagonal or block-diagonal, the random effects h i 's across all subjects are correlated. The i th mean response μ i depends on other random effects h i ' (i' ≠ i) through the correlations of h i with other random effects. To estimate the unknown parameters in the logistic mixed model (8), we estimate β and h by maximizing the PQL [10], which can be viewed as a joint log likelihood of (β , h), Setting τ = 1/λ and h = Kα , one can easily see that equations (6) and (9) are identical. It follows that the logistic kernel machine estimators and can be obtained by fitting the logistic mixed model (8) using PQL. In fact, examination of the kernel machine normal equations (7) shows that they are identical to the normal equations obtained from the PQL (9) [10], where in (7) is in fact the PQL working vector and Dis the PQL working weight matrix.
Note that the estimators of β and h depend on the unknown regularization parameter τ and the kernel parameter ρ. Within the PQL framework, we can estimate these parameters δ = (τ, ρ) by maximizing the approximate REML likelihood where V = D -1 + τ K, and is the working vector as defined above. The estimator of δ can be obtained by setting equal to zero the first derivative of (10) with respect to δ. The estimating procedure for β, h, and δ = (τ, ρ) can be summarized as follows: we fit the logistic kernel machine model by iteratively fitting the following working linear mixed model to estimate (β, h) using BLUPs and to estimate δ using REML, until convergence where is the working vector defined below equation (7), h is a random effect vector following N{0, τ K(ρ)}, and ε ~ N(0, D). The covariance of is estimated by (X T V -1 X) -1 , and the covariance of is estimated by The covariance of can be obtained as the inverse ŷ δ δ of the expected information matrix calculated using the second derivative of (10) with respect to δ. The square roots of the diagonal elements of the estimated covariance matrices give the standard errors of , , and δ. The above discussion shows that we can easily fit the logistic kernel machine regression using the existing PQL-based mixed model software, such as SAS GLIMMIX and R GLMMPQL.

Test for the Genetic Pathway Effect
It is of significant practical interest to test the overall genetic pathway effect Note that the null hypothesis places τ on the boundary of the parameter space. Since the kernel matrix K is not block diagonal, unlike the standard case considered by Self and Liang [17], the likelihood ratio for H 0 : τ = 0 does not follow a mixture of and distribution. We consider instead a score test in this paper.
When conducting statistical tests for pathways, two types of tests could be formulated. The first is called the competitive test and the second the self-contained test [18]. The competitive test compares an interested gene set to all the other genes on a gene chip. An example of the competitive test is the gene set enrichment analysis (GSEA) [1], where an enrichment score of a gene set is defined and a permutation test is used to test for the significance of the gene set based on the enrichment score. The self-contained test compares the gene set to an internal standard which does not involve any genes outside the gene set considered. In other words, the self-contained test examines the null hypothesis that a pathway has no effect on the outcome versus the alternative hypothesis that the pathway has an effect. The variance component test of [13] for the linear pathway effect is a self-contained test. Goeman and Bühlmann [18] pointed out that the self-contained test has a higher power than a competitive test and that its statistical formulation is also consistent for both single gene tests and gene set tests, and the statistical sampling properties of the competitive test can be difficult to interpret.
Our pathway effect hypothesis H 0 : h(z) = 0 vs H 1 : h(z) ≠ 0 is a self-contained hypothesis. We propose in this paper a self-contained test for the pathway effect by developing a kernel machine variance component score test for H 0 : τ = 0 vs H 0 :τ > 0. The proposed test allows for both linear and nonlinear pathway effects and includes the tests by Goeman et al. [13,14] as a special case. A key advantage of our kernel-based test is that we do not need to explicitly specify the basis functions for h(·), which is often difficult for modeling the joint effects of multiple genes, and we all let the data to estimate the best curvature of h(·).
Zhang and Lin [19] proposed a score test for H 0 : τ = 0 to compare a polynomial model with a smoothing spline. Goeman et al. [14] also proposed a global test against a high dimensional alternative under the empirical Bayesian framework. The variance-covariance matrix used in these tests do not involve any unknown parameters. However, the kernel function K(·, ·) in a kernel machine model usually depends on some unknown parameter ρ.
One can easily see from the mixed model representation (8) that under H 0 : τ = 0, the kernel matrix K disappears. This makes the parameter ρ inestimable under the null hypothesis and therefore renders the above tests inapplicable.
Davies [20,21] studied the problem of a parameter disappearing under H 0 and proposed a score test by treating the score statistic as a Gaussian process indexed by the nuisance parameter and then obtaining an upper bound to approximate the p-value of the score test. We adopt this line of approaches for our proposed score test.
Using the derivative of (10) with respect to τ, we propose the following score test statistic for If the Gaussian kernel is used, then an arbitrary nonlinear pathway effect is implicitly assumed. Our proposed test, which is derived to test for any nonlinear effect, is therefore more powerful than tests based on a parametric assumption. We show in Appendix A.1 that when ρ becomes large in the Gaussian kernel, our test statistic reduces asymptotically to the one based on linearity assumption of genetic effects. Hence our test includes lin- ear model based test as a special case. From (11) it is also clear that our test is invariant to the relative scaling of the kernel function K(·, ·).
Under appropriate regularity conditions similar to those specified in [22], S(ρ) under the null hypothesis can be considered as an approximate Gaussian process indexed by ρ. Using this formulation, we can then apply Davies' results [20,21]

Extension to generalized kernel machine model
For simplicity, we focus in this paper on logistic regression for binary outcomes. The proposed semiparametric model (3) can be easily extended to other types of continuous and discrete outcomes, such as normal, count, skewed data, whose distributions are in the exponential family [23]. In this section, we briefly discuss how to generalize our estimation and testing procedures for binary outcomes to other data types within the generalized kernel machine framework and discuss its fitting using generalized linear mixed models.
Suppose the data consist of n independent subjects. For subject i (i = 1, ..., n), y i is a response variable, x i is a q × 1 vector of covariates, z i is a p × 1 vector of gene expressions within a pathway. Suppose y i follows a distribution in the exponential family with density [23] where θ i is the canonical parameter, a(·) and c(·) are known functions, φ is a dispersion parameter, and m i is a known weight. The mean of y i satisfies μ i = E(y i ) = a'(θ i ) and Var(y i ) = φ m i a"(θ i ). The generalized kernel machine model is an extension of the generalized linear model [23] by allowing the pathway effect to be modeled nonparametrically using kernel machine as where g(·) is a known monotone link function, and h(·) is an unknown centered smooth function lying in the function space generated by a positive definite kernel function K(·, ·). For binary data, setting g(μ) = logit(μ) = gives the logistic kernel machine model (3); for count data, g(μ) = log(μ) gives the Poisson kernel machine model; for Gaussian data, g(μ) = μ gives linear kernel machine model [9]. The regression coefficients β and the nonparametric function h(·) in (14) can be obtained by maximizing the penalized log-likelihood function where ᐍ(·) = ln(p) is the log-likelihood, p is the density function given in (13), and λ is a tuning parameter. Using the kernel expression of h(·) in (5), the generalized kernel machine model (14) can be written as and the penalized likelihood can be written where K is an n × n matrix whose (i, j)th element is K(z i , z j ).
where τ = 1/λ, and h = (h 1 ..., h n ) is an n × n random vector with distribution N {0, τ K(ρ)}. The same PQL statistical software, such as SAS PROC GLIMMIX and R GLMMPQL, can be used to fit this model and obtain the kernel machine estimators of β and h(·).
The score test (11) also has a straightforward extension. The only change is that the elements in matrix D in (11) be replaced by appropriate variance function var(y i ) under the assumed parametric distribution of y i . . In practice we suggest take C 2 = 0.1, which yields a good approximation.