Bluues: a program for the analysis of the electrostatic properties of proteins based on generalized Born radii

Background The Poisson-Boltzmann (PB) equation and its linear approximation have been widely used to describe biomolecular electrostatics. Generalized Born (GB) models offer a convenient computational approximation for the more fundamental approach based on the Poisson-Boltzmann equation, and allows estimation of pairwise contributions to electrostatic effects in the molecular context. Results We have implemented in a single program most common analyses of the electrostatic properties of proteins. The program first computes generalized Born radii, via a surface integral and then it uses generalized Born radii (using a finite radius test particle) to perform electrostic analyses. In particular the ouput of the program entails, depending on user's requirement: 1) the generalized Born radius of each atom; 2) the electrostatic solvation free energy; 3) the electrostatic forces on each atom (currently in a dvelopmental stage); 4) the pH-dependent properties (total charge and pH-dependent free energy of folding in the pH range -2 to 18; 5) the pKa of all ionizable groups; 6) the electrostatic potential at the surface of the molecule; 7) the electrostatic potential in a volume surrounding the molecule; Conclusions Although at the expense of limited flexibility the program provides most common analyses with requirement of a single input file in PQR format. The results obtained are comparable to those obtained using state-of-the-art Poisson-Boltzmann solvers. A Linux executable with example input and output files is provided as supplementary material.


Background
Generalized Born models Electrostatic effects arising due to the interaction of solute charges among themselevs and with solvent and ion charges, are of utmost importance for biomolecular structure and function. Continuum methods based on the Poisson-Boltzmann equation have been widely used for calculating electrostatic effects [1][2][3][4][5]. In the last decades much interest has been devoted to developing fast approximations to the solution of the Poisson-Boltzmann (PB) equation.
Onufriev and coworkers have developed an analytical approximation to the exact potential inside and outside the low dielectric region of a sphere [6], that performs surprisingly well also for the complex shape of proteins [7][8][9][10] and is therefore more general than the generalized Born (GB) models.
When only self-and interaction energies and forces are sought, the most widely used approach is based on generalized Born (GB) models. Recent reviews summarize the approach and highlight most interesting recent results [3,[11][12][13][14].
Central to these models is the estimation of polarization charge contributions to: i) the self-energy of each charge (embedded in the solute); ii) the interaction energy of each pair of charges.
By equating the estimated reaction fieldÛ react i at the i th charge q i to the reaction field at a spherical ion in solution with the same charge, each charge is assigned an effective radius which is called the Born radius (a i ).
Once Born radii have been estimated the interaction energy between any two charges is computed as the sum of a direct Coulomb term (computed using the solute dielectric constant) and a solvation term which is, according to Still et al. [15], G solv = i,j G solv,ij with: This expression was found to be more accurate than the exact expression for two charges embedded in a conducting sphere, although the coefficient of 8.0 instead of 4.0 in the exponential has been suggested [16] and other similar forms have been proposed [17,18].

Computation of Born radii
Born radii could be in principle computed by solving a system of linear equations for the polarization charges at the boundaries of the solute volume [19][20][21][22][23][24][25]. Under the approximation that the ionic solution provides complete screening, amounting to the assumption that the surface behaves like a grounded conductor, polarization charges at the surface are such that the integral of their electric field at any outer surface point is exactly the opposite of the source charge field. This approximation amounts to setting ε out = ∞. In a different context the same approximation has been proposed many years ago as the conducting surface model (COSMO) [26,27] and successfully used since then. Under this condition it is possible to obtain simple formulae for the generalized Born radius for a low dielectric region delimited by a sphere or a plane. The reaction field satisfies Laplace equation inside the surface and the solution may be obtained solving the equation using suitable basis functions with Dirichlet boundary conditions at discretized points on the surface [28].
These methods are however slower than approximate methods. The computation of Born radii is performed by volume or surface integrals. An approximation that has been widely used is the Coulomb field approximation, where the formula for the electrostatic displacement for a uniform medium is applied in the inner and outer regions using the inner and outer dielectric constant, respectively. Born radius in this approximation does not depend on the inner and outer dielectric constant, but rather becomes a purely geometric quantity. The volume integral formulation of the Coulomb field approximation is turned into a surface integral formulation using the divergence theorem [29]. The Coulomb field approximation corresponds to neglecting the contribution to polarization at the surface due to polarization charges and to considering therefore the electric field inside the surface as due only to the source charge: Under the assumption of grounded conducting surface the outer field is zero and the polarization charge at the surface point r S is given by Gauss' theorem: where n( r S ) is the unit vector normal to the surface at point r S and pointing outwards.
The reaction field at the source charge will be equal to the integral of the fields due to surface charges at the source charge, i. e.: The reaction field may be used to compute the Born radius according to equation (1) with ε out = ∞. The Coulomb field approximation for the grounded conducting surface has the advantage that the integral of the surface polarization charges will be always equal to the opposite of the charge inside the surface as for the exact solution. On the other hand it gives a very bad approximation both of polarization charges, which depend only on the distance from the source charge and on the normal to the surface, and reaction field energies even for simple systems. Although the Coulomb field approximation is reasonable, Grycuk has pointed out that it is fundamentally inaccurate for charges embedded in a sphere [30] by comparing both the self energy computed under the Coulomb field approximation and the pair interaction energy computed using the formula by Still and coworkers [15], with the exact solution obtained by using the Kirkwood model [6]. Based on the analysis for charges embedded in a sphere an exact formula is provided which may be recast as a function of a surface integral: from which the integral expression for the generalized Born radius follows: The above expression which will be referred hereafter as GBR6 following Grycuk [30] and Tjong and Zhou [31,32], has been analysed in detail by Mongan et al. [33]. Equation 7 was found to perform extremely well also for "very" non spherical shapes and in the context of biomolecular models. Occasional large differences with respect to Poisson-Boltzmann calculations were found for inner cavities and local concavities at the surface, i.e. in conditions where a continuum model is anyway questionable. That study concluded that with a correction for a small systematic error, the GBR6 model is a sufficiently accurate continuum electrostatic model [33].
Tjong and Zhou [31,32] used an analytical implementation for the estimation of Born radii based on volume integrals (corresponding to the above formula (7) that uses a surface integral instead) and showed its superior accuracy compared to other existing methods for a set of 55 very different proteins. In their approach it is made clear that standard estimations of Born radii compute in fact only geometric properties, and they provide empirical formulae for the correct Born energy depending on the inner and outer dielectric constants, ionic strength, total charge and number of atoms.
Although successful and theoretically correct for a charge inside a grounded conducting sphere, there is no reason why equation (7) should provide good estimation of Born radii for complex shapes like those of proteins and biomolecules in general. In general Born radii could be estimated using integrals of the form (see the subsection Exact generalized Born radii for a conducting sphere): In the absence of a theory which could be cast in a fast computational framework, fitting approaches have been successfully followed and tested on large sets of proteins showing excellent agreement with Poisson-Boltzmann calculation results. Romanov et al. [34] proposed that a linear combination of integrals I 3 to I 6 , in the present notation, and a constant term could fit the self-polarization energy and thus be used to compute generalized Born radii (see also the discussion by Mongan et al. [33]).

Applications of the generalized Born model
Based on the correct estimation of Born radii, the solvation contribution to the interaction between any two charges may be computed by using equation (2). Derivation of equation (2) with respect to atomic positions gives the electrostatic solvation forces acting on atoms. The implicit dependence of Born radii on atomic positions makes computation of forces far from trivial [35][36][37][38][39][40] for the generalized Born model as well as for the parent Poisson-Boltzmann model where the boundaries depend on atomic positions [41]. The possibility of computing energies and forces faster with respect to the reference Poisson-Boltzmann equation has made the Generalized Born model the choice of election for implicit solvent molecular dynamics simulations. Also, the computation of pairwise solvation energies allows for fast computation of pKa of multiple titrating groups as we discuss in the Methods section.
Applications of generalized Born model (and other implicit solvent methods) have been reviewed elsewhere [3,[11][12][13][14]. At variance with the reference Poisson-Boltzmann model, the computation of electrostatic potential in space is outside the scope of the Generalized Born model where only interactions are considered thorugh equation 2. Here we use a finite radius test charge in order to define a potential within the frame of the generalized Born model.

Aim of this work
In the present work we provide a software that, based on generalized Born radii, provides most common electrostatic analyses of proteins. The program first computes generalized Born radii, via surface integrals, based on different definitions and then it uses generalized Born radii (using a finite radius test particle, where needed) to perform electrostic analyses. In particular the ouput of the program entails, depending on user's requirement: 1) the generalized Born radius of each atom; 2) the solvation electrostatic free energy; 3) the electrostatic forces on each atom (the theory of electrostatic forces based on surface integrals is developed here and implementation is still at a developmental stage); 4) the pH-dependent properties (total charge and pHdependent free energy of folding in the pH range -2 to 18); 5) the pKa of all ionizable groups; 6) the electrostatic potential at the surface of the molecule; 7) the electrostatic potential in a volume surrounding the molecule.

Generalized Born radii
The Born radii were computed from numerical surface integrals and the resulting self energies were compared with the reference ones obtained from the Poisson-Boltzmann equation for 1000 atoms randomly chosen in the test set of 55 proteins. Born radii and Poisson-Boltzmann self-energies have been computed using the solvent accessible surface as dielectric boundary or the molecular surface computed using the program MSMS [42]. The surface points density used for MSMS computations was 10 pts Å -2 . Table 1 reports the results concerning the self-energies obtained using different ways to estimate the generalized Born radii. It is seen that the GBR6 model performs very well and that the gain in accuracy when linear combinations of a constant term and self-energies corresponding to radii a n are used, with n ranging from 3 to 6 (LC5) and 3 to 10 (LC9), is limited.
For best performance it is necessary to use a number of probe points per atom larger than 100. For less dense surface points occasional negative sign integrals are observed which need an adhoc treatment, e.g. resetting the generalized Born radius to a predetermined value. Although negative radii are not found for higher densities of surface points, there are still rarely occurring large deviations between GB and PB self-energies, which could be due to differences in the computation of the boundary surface and its further treatment in the two approaches. As an illustration of the accuracy obtained, Figure 1 reports the plot of generalized Born self-energies versus those computed using APBS.

Electrostatic solvation free energy
The electrostatic solvation energy computed using the solvent accessible surface integral (model GBR6) at each atom, computed as half of the product charge times reaction field, was compared with the corresponding solvation energy computed using the Poisson-Boltzmann equation. Compared to the solvation self-energy, the solvation energy depends also on other charges in the molecule and provides a test for the solvation effects computed according to equation 2. The average root mean square deviation from reference Poisson-Boltzmann calculations is just 2.3 kJ/mol and the correlation coefficient is 0.9996. The accuracy of the GB model versus the Poisson-Boltzmann model may be judged from Figure 2 which reports the solvation energy (computed as half the product of charge times reaction field) at each atom for the 93240 atoms of the 55 protein set.

Electrostatic solvation forces
Since solvation energies depend, according to equation 2, on atomic positions explicitly and implicitly through Born radii, the computation of solvation electrostatic forces is quite complex, and strongly dependent on the interface model chosen [38][39][40] (see also for a general discussion [41]).
In order to estimate how important are effects due to the dependence of Born radii on atomic positions, solvation electrostatic forces have been computed for each atom of the 55 proteins as the derivative of the solvation energy under the approximation that Born radii are constant. Albeit approximate this way of computing   electrostatic forces preserve by definition zero total electrostatic force as expected for isotropic media and as found by the correct expression for the force [41]. Due to the different way of computing forces we did not attempt comparing ionic boundary, dielectric boundary and charge times electrostatic field components of the electrostatic force, but we rather compared the total solvation force. The results are not very accurate with the average root mean square deviation, with respect to the forces computed using APBS using the same parameters, equal to 2.7 kJ/(Åmol) compared to an average square root value of the force of 5.0 kJ/(Åmol). The correlation coefficient is 0.84. The full computation of electrostatic solvation forces using surface integrals is described in the Methods section. As a demonstration of the accuracy of the calculation 20 random point charges ranging between -1.0 and 1.0 q have been embedded in a sphere of radius 5.0 Å at a distance of at least 0.5 Å from each other. Besides small differences due to treatment of the boundary, the agreement between force components computed using surface integrals and those computed solving the Poisson-Boltzmann equation is very good, with a correlation coefficient of 0.9998 and a fitting slope of 0.983 ( Figure  3). Work is under way to implement force calculation in an efficient way for large systems like proteins.
pH-dependent properties (total charge, pH-dependent free energy of folding, and pKa of ionizable groups The test set provided by Gunner and coworkers has been used to test the prediction of pKa's in proteins [43]. We compared the results obtained with the results obtained by running the program Propka2.0 [44][45][46][47]. The latter program is very fast and provides predictions whose accuracy is comparable to that of more computationally intensive programs. We chose this program as a reference because it is widely used and it is seemingly the fastest available with a very good accuracy. A wide range of programs and methods have been compared recently [48] and the reader is referred to that work for more extensive comparisons of existing softwares. The results are summarized in Table 2. It should be noted that the method used here was not extensively optimized to maximize correlation with experimental results or to minimize the root mean square deviation (RMSD) with respect to the experimental data.
On the other hand some scaling of contributions is done and therefore large deviations from experimental results are not found. In this respect, Propka that uses ten adjustable parameters and other parameters chosen for best performance [46] apparently does not prevent very large shifts to be predicted. As a consequence the global RMSD from experimental data is large and could thus easily be reduced. The execution time of Propka is in the range of seconds, while our program runs in minutes. No optimization or approximation has been implemented as yet.
The results are somewhat better than those obtained by Propka (v. 2.0), although the figures reported here for the performance of Propka (v. 2.0) are worse than those reported in the original papers, because of a different test set. We remark that the comparison reported    Table 2 is dominated by outliers, that could be easily filtered or treated in an ad hoc manner, for Propka. Similar figures are found in independent tests [48][49][50]. An additional problem is that the data set includes multimeric (up to decameric) proteins and therefore the latter contribute more than others to the figures reported in Table 2. It should be noted that this comparison is far from exhaustive as many other features could be chosen to judge a method's value. Extensive comparisons have been performed by Stanton and Houk [48]. The purpose of the comparison performed here is to demonstrate that our program outputs results comparable (or better on the dataset used here) than the results obtained by one of the best state-of-the-art method. In order to better judge the performance of the method, computed versus experimental shifts in pKa values, with respect to the reference model ones, are reported in Figure 4.

Surface electrostatic potential
The evaluation of the potential in regions outside the molecule is beyond the scope of the generalized Born approach which focuses instead on self and interaction energies. This is at variance with approaches that approximate the potential computed using the Poisson-Boltzmann equation. In particular Onufriev and coworkers [7,8] have found an approximation (not simply amounting to a truncation) based on Kirkwood's series expansion of the solution for a sphere [6]. Their analytical model performs surprisingly well for a large set of proteins and enables fast calculation of the potential at the surface and in the volume surrounding the molecule.
The surface potential has been computed here, within the framework of generalized Born approach, as the energy of interaction of all atoms of the molecule with a unit test charge with a generalized Born radius equal to half the solvent probe radius, i.e. 0.7 Å. This value was found to reproduce well the potentials computed using APBS. Each surface point is assigned to the atom contacted by the solvent and for each solvent exposed atom the average surface potential is listed in the beta factor field in the output pdb file. The result is thus a readable list of atoms with the corresponding average surface potential if the atom is solvent exposed. This information may be used to display the potential using softwares like VMD [51]. As an example the potential computed using the Poisson-Boltzmann equation and using the method described above is shown in Figure 5 for the paired domain of the protein PAX6 and its cognate DNA. Exactly the same parameters are used in the two calculation with the only difference that the surface in the Poisson-Boltzmann calculation is generated using van der Waals radii inflated by 0.7 Å. The boundary surfaces are however different in the two calculations and this value was found to compensate for the differences.
For the same reason a point by point comparison of the potential at the surface is not completely appropriate, because boundaries are slightly different in the two calculations. Instead of taking the average surface atomic potential, we compared the potential computed using the generalized Born model and a 0.7 Å radius test charge at each surface point with the Poisson-Boltzmann potential computed by UHBD at the same point. Such comparison is reported in Figure 6 and, notwithstanding the differences in methodologies, the correlation coefficient is 0.90 and the average error is just +0.08 kcal/(mol q). The root mean square deviation is 0.43 kcal/(mol q) with the largest contributions to this figure due to outliers by several   standard deviations. Outliers are found in inner cavities, but also at concavities at the exposed surface. In some cases the errors are due to the differences in boundary definition. The error distribution is however similar to that reported by Onufriev and coworkers [8].
Electrostatic potential in a volume surrounding the molecule Similar to the surface potential the electrostatic potential for a unit test charge with generalized Born radius equal to half of the solvent probe radius may be computed at nodes of a grid enclosing the molecule and output in the DX format (the DX file format is described at [52]) readable by programs like VMD [51].
The grid is chosen based on as the minimum box entailing the molecule plus three Debye length on each side. If the Debye length is more than 10 Å, 30 Å are added at the minimum box on each side. The space is then divided in 97 × 97 × 97 grid points and the potential is computed at each point. The method is thus slower compared to solving the finite difference equation for the potential. At variance with standard options with other software, here the potential inside the molecule is set to 0, which removes isopotential curves inside the molecules (uninformative because arising from strong varying local fields) and helps with the visualization of the structure of the molecule together with outer isopotential curves. Example input files are given in the supplementary material. The comparison with the analogous analysis using the program APBS shows less smooth isopotential surfaces (Figure 7), a fact which might be due to the finite radius of the unit charge test particle. Note that the output potential is in kJ/(q mol) compared to widely used kT/(q mol) or kcal/(q mol).

Conclusions
A program for the analysis of the electrostatic properties of proteins based on the surface integral computation of generalized Born radii has been presented. Further work will be devoted to improve the efficiency of calculations in particular for what concerns electrostatic solvation forces. The program, together with examples is given as supplementary material (Additional file 1).

Reference APBS and UHBD calculations and parameters
Reference electrostatic calculations were performed with the programs APBS [53] and UHBD [54,55]. Except where noted, standard parameters were used: inner dielectric constant 1.0, outer dielectric constant 78.54, temperature 298.15 K, ionic strength 0.150 M. In APBS the mesh of the grid was 0.25 Å which resulted in very large grid size (257 3 points). In UHBD, used for surface potential calculations, a focusing procedure was used with the final mesh size of 0.8 Å.
The boundary surface was defined in the two programs in order to match as close as possible the definition used in our program, i.e. the molecular surface for atoms with radii inflated by the radius of the solvent. The surface point density was 10 Å -2 .
Except where noted the same temperature, dielectric constant and ionic strength were used in our program.

Solvent accessible surface
Solvent accessible surface points and surface normal vectors have been generated by considering the van der Waals sphere of each atom inflated by the radius of the solvent (e.g., for an atom of radius 1.9Å and a solvent radius of 1.4Å the inflated radius was 3.3Å).
Points on the inflated van der Waals sphere and surface normal vectors are precomputed for a set of inflated radii (r) and centered at atom positions. The coordinates of the i-th point (i ranging from 1 to n pt ) on the sphere are generated according to the following rule based on the golden section (due to Anton Sherwood, see http://www.cgafaq.info/wiki/Evenly_distribu-ted_points_on_sphere and links thereof): n pt was 200 corresponding for an atom with radius 1.9Å to a density of 4.4 pts Å -2 .
Atoms are mapped on a grid and the position of each surface point is compared only with the position of the list of atoms associated with neighboring grid points. This procedure was found to be efficient and robust without any failure.
Each surface point within the inflated van der Waals volume of a different atom is considered buried. All surface integrals have been performed as the corresponding finite sums over exposed (non buried) surface points.

Generalized Born radii
Reference Born radii have been obtained from Poisson-Boltzmann self-polarization energies computed using APBS [53] according to eq. 1.
Different Born radii have been obtained from surface integrals as:  where I n is defined by equation 8, or by fitting the reference solvation self energy by linear combinations of the solvation energies corresponding to different a n 's (n ranging here from 3 to 6) as done by Romanov et al. [34]. Fitting was necessary because the published coefficients did not provide good results, as a result of the different procedures used to compute surfaces and solvation free energies.

An equation for the generalized Born radius
The reaction field obeys Laplace equation in the solute region. When the reaction field is expressed via a generalized Born equation like (2), we may conveniently study what are the requirements imposed by Laplace equation on Born radii. In particular we consider the reaction field due to a source charge at r i and consider the reaction field at a point r j , U ij , in the limit r j → r i . Under this limit the reaction field may be approximated as: where the constant K depends on the coefficient used in the exponential and is 0.75 for the commonly adopted Still formula.
Imposing that ∇ 2 U ij = 0, where all derivatives are with respect to coordinates of r j , and letting r j tend to r i , after some straightforward manipulation, results in the following equation for the Born radius a i : For the conducting grounded surface approximation the boundary condition is that the Born radius approaches 0 as the boundary surface is approached. It is easy to check that this equation is satisfied for the sphere and the plane where K is equal 1.
For two parallel conducting infinite planes (approximated here by two very large circular plates) the solution may be compared with that obtained by expanding the exact solution in Bessel functions. We set the interplates distance along the z-axis equal to 1.0 and the radius R of the plates equal 1000.0. Consider x as the distance of the source charge from one of the plates on the axis of symmetry. The reaction field between the plates may be expressed as a sum of functions which satisfy the Laplace's equation in cylindrical coordinates: where J 0 (x) is the Bessel function of order zero. We may consider only order 0 functions due to cylindrical symmetry. The coefficients are obtained by imposing  that at the boundary the potential (U react + U source ) is equal to zero and using the orthogonality relation: . The parameters k are chosen such that J 0 (kR) = 0. The solution of equation (10) is obtained by bisection, guessing first a value at the midpoint of the axis of symmetry and integrating the equation using an adaptive Runge-Kutta fourth order method [56] and requiring that the value of the Born radius at the boundary be 0.
The two solutions reasonably agree, within numerical accuracy, and are compared in Figure 8. Solving Equation (10) for proteins could represent an alternative to other well established methods to compute generalized Born radii. Obviously all the above derivation rests on the assumption that Still's formula (or alike) provides a good approximation of the exact solution to the Poisson (or Poisson-Boltzmann) equation for non-homogeneous media.

Electrostatic solvation energies in ionic solutions
A merit of generalized Born models is to separate Coulombic interactions from reaction field effects. The generalized Born radii have been used to compute interactions in ionic solutions assuming that charges whose distance is much larger than their Born radii interact through a Debye-Huckel potential and that the self-polarization energy can be computed according to the Debye-Hückel law [57]. The expression used in the present work for the reaction potential due to a unit source charge i at the site j for an ionic solution of dielectric constant ε out and Debye constant k D is: which is slightly different from that reported by Srinivasan et al. [57] in order to include dielectric constant different from 1. When we fit the self-polarization energy obtained by the above equation versus that obtained by equation (22) in the work Tjong and Zhou [31], using a zero intercept line and assuming a physiological ionic strength of 0.150 M we find a correlation coefficient of 0.999 and a slope of 0.999. The total solvation energy is obtained by summing contributions from all pairs of atoms and solvation self-energies: The total energy of the molecule, which does not include self-energies in the inner dielectric medium is: where the functions j i,j are the Green's functions at points i and j but do not include the self potential of each charge in the inner dielectric medium and δ i,j is Kronecker's delta.

Electrostatic solvation forces
The electrostatic solvation forces are obtained by differentiation of G solv with respect to atomic coordinates. The implicit dependence of Born radii on atomic coordinates through a surface integral makes the derivation rather tedious. One approximation is to consider that effects from the variation of Born radii with atomic coordinates are smaller than the explicit variation of energies due to the change in interatomic distances. This is in general not the case. Just consider the force arising from variation of the self-energy with atomic coordinates. For a point charge q in a conducting sphere of radius R at a distance a from its center the solvation force is pointing away from the center of the sphere and its magnitude is: For a unit charge placed at 7.5 Å from the center of a globular protein of radius 10 Å, assuming an inner dielectric constant 4.0, the resulting force is 14 kJ/(mol Å). An opposite force of the same magnitude is applied Computed generalized Born radius for a charge between two infinite planes depending on the distance x from one of the planes. The generalized Born radius and the distance are normalized by the plane to plane distance. Solid line refers to the solution obtained from equation (10) and dashed line refers to the expansion (11) of the solution for a finite system. on the center of the sphere defining the boundaries. This simple example shows that forces arising from variations of Born radii are not negligible.
The derivative of electrostatic potential implies derivatives of Born radii with respect to atomic coordinates that may be computed as derivatives of a function of a surface integral: The derivative of I i 5 is not straightforward as the domain of integration depends on atomic coordinates.
The subtleties of derivation of integrals have been described by Flanders [58]. We use here a form valid for the derivation of a flux through a surface that does not entail the derivative of the boundary. The total derivative with respect to a variable t (typically time, here atomic coordinates) of the flux of a vector field F through a surface S is expressed as a surface integral [59]: (17) where v( r S ) = d r S dt The surface point (for the solvent accessible surface) is dependent on the contacting atom (say the j th atom): where r vdW,j is the van der Waals radius of atom j and r p is the radius of the solvent. The derivative of r S with respect to the k th coordinate of atom l takes a very simple form: whereê k is the versor of the k th axis. Further derivations of r S will be zero. Taking into account the latter statement and expanding the third term in the righthand side of equation 17 a simpler form of the same equation is found: The left-hand side of equation 20 is directly related to equation 16 with: We define a function is ( r S ) that assigns to each surface element the index of the atom defining the boundary and consider the derivative of I i 5 with respect to the k th coordinate of atom l: + implicit surface-dependent terms (23) Note that in the above equations the operator ∇ operates only on r S coordinates, i.e. at the point where the integrand function is evaluated.
Let us consider now a simplified form (exact for a conducting sphere) of the Green's function for the electrostatic solvation energy and calculate explicitly the solvation force along the k th coordinate of atom j: where the apices i and j in I 5 means that the integral involves the vector from the surface points to the coordinates of atoms i and j respectively. The computation of forces via surface integral according to the above equations is not practical, unless a cutoff on the distance between pairs of atoms and surface points is used.
Indeed for each of the N(N + 1) 2 pairs of atoms j and i, where N is the number of atoms, surface integrals must be computed (see equation (21)). This is the result of the brute force application of our surface integral definition of Born radius. Work is under way in our laboratory to make this computation efficient.
pH-dependent properties (total charge and pH-dependent free energy of folding in the pH range -2 to 18) The method described by Antosiewicz et al. [60] is implemented here with some modifications. PDB entries corresponding to proteins for which data are available in the pKa database made available by Gunner and coworkers [43] were prepared with pdb2pqr [61]. The method implemented here is a single conformer method, at variance with other methods like the QM/ MM/TI method by Simonson et al. [62], MCCE [43] and the constant pH molecular dynamics method by McCammon and coworkers [63]. These methods are expected to be more accurate, at the expense of large computational time.
Approaches similar to that implemented here have been used by Onufriev and coworkers [64,65] where a significant speedup for larger proteins, with respect to our implementation, is due to clustering of interacting sites.
Other approaches which use a continuum method have been proposed recently which achieve very good agreement with experimental values, by using a pentapetide reference model for the titratable group, and by optimizing the hydrogen positions [70].
Improvements similar to those mentioned in the above paragraphs will be implemented in the future in our program.
We summarize here the theory that is discussed at length by Antosiewicz et al. [60].

Energy of ionization
The free energy of ionization of an ionizable group in an unfolded polypeptide at a given pH is: where z is the charge of the group upon ionization, R is the gas constant, T is the temperature (298.15 K in the present calculations).
It is assumed that pKa of protein ionizable groups in an unfolded protein are the same as those of model compound. We assume here that the the pKa are the following: 4.5 for Glu, 3.8 for Asp, 6.5 for His, 12.5 for Arg, 10.5 for Lys, 9.0 for Tyr, 8.0 for Cys, 8.0 for the Nterminal ammine and 3.2 for the C-terminal carboxyl.
The protein environment may change the free energy of ionization. If we assume that the main effects are electrostatic in nature the free energy of ionization for groups in a folded proteins is: where, for simplicity of notation, the sum runs on all atoms of the protein. q i are the atomic charges in the neutral amino acids and z i is 0 for most atoms and 1 or -1 for those representing basic and acid ionized groups, repectively. Superscript M stands for the model compound representing the unfolded state and N stands for the natively folded protein. The model compound of ionizable residues is obtained by excising that residue from the protein. The Green's functions j i,j are given by equation 14. The equation can be rearranged as follows: where each term describes a contribution to the free energy of ionization: the first term is the free energy of ionization in the model compound, the second term is the difference in ionization self-energy, the third term describes the difference in interaction of ionization charges with partial charges in the unionized protein and model compound and the fourth term describes the interaction among ionization charges. Following Antosiewicz et al. the inner dielectric constant is set to 20.0. A wide range of inner dielectric constants has been used for pKa calculations. When a single dielectric constant has been used it has been chosen mostly larger than 1, e.g. 4 in MCCE [43], 8 in EGAD [68] and 11 in GB/ IMC [70]. A large value of the dielectric constant accounts in an empirical way for the missing degrees of freedom implied by a single conformer method (see the discussion by Schutz and Warshel [71]).

Monte Carlo simulation of the ionization state
The protonation state of the ionizable sidechains is changed according to a Monte Carlo procedure. Protonation or deprotonation are simulated by adding or subtracting a unit charge to an atom representing the ionizable group. The following atoms have been considered as representative for each ionizable group: CG for Asp, CD for Glu, NZ for Lys, CE for Arg, SG for Cys, OH for Tyr, N for the N-terminal ammine and C for the C-terminal carboxyl. For histidine protonation may occur alternatively at the atom NE2 and ND1.
Monte Carlo simulations are performed at 0.5 pH intervals between -2.0 and 18.0 and at each pH value average ionization values and average components of ionization free energy are computed. At each pH value the number of equilibration steps is 100 times the number of ionizable sites and the number of Monte Carlo steps is 1000 times the number of ionizable sites.
Based on Monte Carlo simulation the output of the program includes: i) the titration curve for each ionizable site; ii) the list of pKa values for each ionizable site; iii) the charge state of the protein; iv) the pH-dependent component of the free energy of folding. Compared to the scheme of Antosiewicz et al. we are using the solvent accessible surface (as detailed in the subsection "Solvent accessible surface") instead of the molecular surface because results are more stable.
Heuristic corrections (detailed hereafter) to the scheme of Antosewicz et al [60] have been done mainly to remove large contributions which are typically overestimated due to neglection of molecular flexibility. For instance for surface residues large unfavorable interactions may be accomodated by conformational relaxation. Also large charge-charge interactions are likely to lead to partial unfolding that relieves the strong unfavorable energy. These considerations are consistent with the idea of using different dielectric constants for buried and solvent exposed residues, as suggested many years ago by Demchuck and Wade [72].

Scaling self energies
Here we consider as solvent exposed all those sites that have generalized Born radius smaller than 4.0 Å and buried those that have generalized Born larger than 7.0 Å. Self-energy interactions (second right-hand term in Equation 32) are multiplied by a factor 2.0 for buried sites and by 0.25 for exposed residues. For all intermediate situations the multiplying factor is a linear combination of the two extreme values.

Scaling background interactions
For exposed residues unfavorable background interactions (third right-hand term in Equation 32) are weighted by the empirical function exp(− E 2RT ). The factor 2 in the Boltzmann-like function is purely empirical to downweigh a contribution that is likely to be relaxed by protein flexibility.
For each ionizable site the pKa is obtained as the midpoint of the titration curve. The value is further corrected by separating the shift in pKa due to the desolvation self-energy (dpK  , which sets 5 pKa units as the maximum contribution due to background interactions, in order to avoid few very large shifts.

Test sets and conditions
The 55 proteins selected by Tjong and Zhou [31] for testing the GBR6 model have been used here. Atoms of the PDB structures have been assigned charges and radii taken from the CHARMM forcefield, using the program PDB2PQR [61,73]. The PQR file format is described at [52]. The radius of hydrogen atoms was reset to 1.0 Å in order to avoid numerical inaccuracies in the analysis linked to the small radius of polar hydrogens in the CHARMM forcefield. The surface of the molecule was defined alternatively as the surface accessible to the center of a solvent probe sphere with radius 1.5 Å or as the surface accessible by the surface of a solvent probe sphere with radius 1.5 Å, except where noted. The first type of surface, here referred to as solvent accessible surface, is computed by default by the program while the second type of surface, here referred to as molecular surface, was computed using the program MSMS [42] and given as input to the program in the form of a list of vertices and normal vectors.
For the computation of self energies 1000 sites were randomly chosen in the 55 proteins providing an unbiased sample of different environments.

Exact generalized Born radii for a conducting sphere
We analyze the simple planar and spherical systems which are treatable analytically with closed formulae in the hypothesis of grounded conducting surface. We consider a hollow conducting grounded sphere of radius R filled with a medium with dielectric constant ε in . The reaction field for a point charge in the sphere may be easily obtained by the image charge method. The reaction field due to a source charge displaced by a with respect to the center of the sphere is computed as the field due to an image charge q = −q R a at a distance from the center R 2 a along the direction from the center to the source charge: which corresponds to a Born radius: In the limit of R ∞ and a R the above expression reduces to: where d is distance of the charge from the delimiting plane. The polarization charge contribution to the potential due to a source charge j at a point i inside the sphere is given by: where a i is the generalized Born radius at point i.
As noted by Grycuk [30] for a charge placed at a distance a from the center of a sphere with radius R, the Born radius computed according to equation (5) results in: This provides the exact Born radius in the limit a 0, i.e. when the charge is placed at the center of the sphere, but underestimates Born radius in the limit R ∞ and a R (i. e. the plane) by a factor 2. It would be desirable to find a surface integral that recovers the correct expression at least for charges embedded in a sphere. The same surface integral should have a form similar to that of equation (5) in order to ensure that no problems arise from complex shape surfaces, e. g. like those of biomolecules. A convenient correction may be sought considering other integrals like in equation (5) with increasing powers of the distance || r i − r S ||. In particular we may define the following integrals: ( r S − r i ) · n( r S ) || r S − r i || n+1 dS (40) For the sphere these integrals have closed forms that can be obtained using the formal integrator of Mathematica [75]. If the position of the source charge is displaced from the center of the sphere by a quantity a and R is the radius of the sphere the integrals have the following values: The integral I 5 is interesting because, as pointed out by Grycuk [30] it can be easily inverted to get the correct Born radius for both the plane and the sphere: The work of Mongan et al. [33] reports a compact form for the Born radii corresponding to the integrals above expressed in term of GBR6 multiplied by a correction factor.