qPIPSA: Relating enzymatic kinetic parameters and interaction fields
© Gabdoulline et al.. 2007
Received: 24 May 2007
Accepted: 05 October 2007
Published: 05 October 2007
Skip to main content
© Gabdoulline et al.. 2007
Received: 24 May 2007
Accepted: 05 October 2007
Published: 05 October 2007
The simulation of metabolic networks in quantitative systems biology requires the assignment of enzymatic kinetic parameters. Experimentally determined values are often not available and therefore computational methods to estimate these parameters are needed. It is possible to use the three-dimensional structure of an enzyme to perform simulations of a reaction and derive kinetic parameters. However, this is computationally demanding and requires detailed knowledge of the enzyme mechanism. We have therefore sought to develop a general, simple and computationally efficient procedure to relate protein structural information to enzymatic kinetic parameters that allows consistency between the kinetic and structural information to be checked and estimation of kinetic constants for structurally and mechanistically similar enzymes.
We describe qPIPSA: quantitative Protein Interaction Property Similarity Analysis. In this analysis, molecular interaction fields, for example, electrostatic potentials, are computed from the enzyme structures. Differences in molecular interaction fields between enzymes are then related to the ratios of their kinetic parameters. This procedure can be used to estimate unknown kinetic parameters when enzyme structural information is available and kinetic parameters have been measured for related enzymes or were obtained under different conditions. The detailed interaction of the enzyme with substrate or cofactors is not modeled and is assumed to be similar for all the proteins compared. The protein structure modeling protocol employed ensures that differences between models reflect genuine differences between the protein sequences, rather than random fluctuations in protein structure.
Provided that the experimental conditions and the protein structural models refer to the same protein state or conformation, correlations between interaction fields and kinetic parameters can be established for sets of related enzymes. Outliers may arise due to variation in the importance of different contributions to the kinetic parameters, such as protein stability and conformational changes. The qPIPSA approach can assist in the validation as well as estimation of kinetic parameters, and provide insights into enzyme mechanism.
The ability to estimate enzymatic kinetic parameters is very important for metabolic simulations [1–3]. This is because experimental values of the kinetic parameters measured under exactly the conditions of the model for exactly the proteins in the model are usually missing. Often kinetic parameters have been measured only for a related enzyme or for the correct enzyme but under different conditions, e.g. different temperature, pH or ionic strength. Therefore we developed a method that relates variations in kinetic parameters to differences in protein structures. This method can be used to check the consistency of kinetic measurements described in the literature with available protein structural data as well as to make estimates of kinetic parameters based on enzyme structures and kinetic parameters for related enzymes.
Protein structural information provides a basis for predicting and rationalizing protein function. Given a protein structure, molecular simulations can be performed and these can be used to compute kinetic parameters . For example, Brownian dynamics simulations can be used to simulate substrate-enzyme diffusional association and compute bimolecular association rate constants [5, 6]. We previously demonstrated how rate constants computed by Brownian dynamics simulation of the diffusional association of superoxide and myeloperoxidase could be used in the mathematical modeling and simulation of the oscillatory behaviour of metabolite levels in activated white blood cells . The type of molecular simulation to use must be chosen according to the mechanism determining the kinetic parameter. Whereas Brownian dynamics is appropriate for diffusional processes, molecular dynamics techniques may be required to simulate conformational changes and quantum mechanics for chemical reaction steps. These simulations can be computationally demanding and the accurate computation of kinetic parameters by simulations is a challenging and on-going research topic . Therefore, a simpler, less computationally demanding and more robust approach to exploit protein structural information is required in the context of biochemical network simulation. qPIPSA is designed to fulfill this requirement.
In qPIPSA, molecular descriptors are related to kinetic parameters. The molecular descriptors are the molecular interaction fields (MIFs) of the proteins. Molecular interaction fields map the interaction energy between a chemical probe and the target protein as the chemical probe is moved over a grid of points . Diverse chemical probes may be used, e.g. a water molecule, carbonyl oxygen or hydroxyl group. When the probe is a point charge, the molecular interaction field corresponds to the molecular electrostatic potential. Intermolecular interactions are fundamental to enzymatic reactions and are dependent on the enzyme MIFs. MIFs are often used in 3D-QSAR studies to derive quantitative structure-activity relationships (QSARs) . This approach may be taken for proteins  as well as for small molecules [9, 10]. The QSARs are generally derived by partial least squares (PLS) chemometric procedures for a training set with experimentally determined parameters. While such training procedures can be applied to predict enzyme parameters , typically, insufficient experimental data on kinetic parameters are available for training by PLS. In qPIPSA, we therefore employ a simpler linear regression procedure for which only two experimental measurements of a kinetic parameter for enzymes and at least one known three-dimensional structure are required. Further experimental measurements can be used and will help to improve the accuracy of predictions and assess the confidence of the parameter estimates. qPIPSA is based on the PIPSA method  which has been used to classify the interaction properties of protein families using MIFs [13, 14].
In this paper, we will focus on the use of molecular electrostatic potential as the descriptor MIF in qPIPSA. The molecular electrostatic potential is usually the most informative MIF for this purpose. The long-range nature of electrostatic interactions means that similarities or differences in electrostatic potentials are often not detected in a sequence analysis. Even proteins with low sequence similarity can have rather similar electrostatic potentials [12, 15]. For estimating enzyme kinetic parameters, the molecular electrostatic potential is appropriate because electrostatics are considered to be the most important contributor to enzymes' catalytic abilities , e.g. to stabilization of the transition state. Electrostatic steering has also been shown to enhance the association rates in fast, diffusion-influenced enzymes, such as superoxide dismutase .
In qPIPSA, the MIFs of different proteins and/or the same protein under different conditions, e.g. in a different titration state at a different pH, are compared. The enzyme kinetic parameters kcat and Km are associated with specific substrate-enzyme interactions. For the binding process relevant to the kinetic parameter, only one binding partner (here the enzyme) is modeled. The ligand is thus assumed to be the same or similar for all the protein structures compared. The differences in kinetic parameters are assumed to be determined by differences in the protein MIFs, and these differences are calculated in a region around the active site of the enzyme. Thus, qPIPSA requires no prior knowledge of the reaction mechanism. However, information about the location of the active site or substrate or ligand binding residues is used to define the region over which to compare MIFs. qPIPSA can be used to estimate missing kinetic parameters, to check the consistency of different measurements, and to investigate the mechanistic determinants of kinetic parameters. As such, it is a useful tool for biochemistry studies in general and for biochemical network simulation and comparative systems biology in particular.
In this paper, we first outline the qPIPSA approach. Details are given in the Methods section. Results are described for four illustrative and experimentally well-characterized enzymes. The criteria for choosing these enzymes were the availability of enough experimental kinetic and structural data for validation of the methodology, and a reasonable diversity in enzyme type. First, we present an analysis of acetylcholinesterase (AChE) mutants. This shows that not only inhibitor association rates but also substrate Km and kcat/Km parameters correlate remarkably well with the electrostatic potential differences near the active site of the enzyme. We then analyze superoxide dismutase (SOD) and show that the ionic strength and pH dependence of the rate constants for superoxide can be explained by the changes in the electrostatic potential. Next, we examine triose phosphate isomerase (TPI) enzymes from 12 different species. For TPIs, the electrostatic potential differences are found to be good descriptors for the cross-species variation in kinetic parameters kcat/Km and Km. In the fourth case, for 10 class I fructose-1,6-bisphosphate aldolases (FBA) the correlations are less obvious, but detectable. It appears that the conformation of the C-terminal region of FBA is critical for a description of the kinetics of this enzyme. Finally, we examine different protein structural modeling procedures and how to choose the comparison region for interaction fields, showing that this relates to enzyme mechanism.
where R is the region selected for comparison and considered important for the kinetic parameter k. The differences in MIF, Φ, are summed over the grid points in the "skins" (see Methods section for a definition) around the proteins within a distance R from a specified point, defining the region R and divided by the number of points in the overlapping skins to obtain a size-independent measure.
with <> being an average over all possible values of x or y.
Predictions of kinetic parameters are carried out by first deriving the parameter α for all pairs with known kinetic parameters and then averaging predictions for any unknown case from pairwise comparisons to all proteins with known kinetic parameters. The minimum required number of known parameters equals two in this approach.
When Φ is the molecular electrostatic potential, a physically meaningful estimate for the parameter α is Φ is -q/kBT, where q is the net charge of the substrate, kB is the Boltzmann constant and T is the temperature. This corresponds to the kinetic parameter being determined by the interaction energy of the substrate charge q with the electrostatic potential Φ of the enzyme: k ~ exp(-q·Φ/kBT).
For a single point in the region R, formulas (1) and (6) give the same results. When there are many points in the comparison region R, formula (6) will enhance the contributions from large and positive (+ sign) or large and negative (- sign) potentials. In some cases, formula (6) gives a better description of interaction field – kinetic parameter correlations.
Similarity indices, e.g. the Hodgkin index 
, may also be used to compare electrostatic potentials . Similarity indices can assist in assigning a kinetic parameter to a protein when experimental values are available for several related proteins by enabling the most similar protein to be found and hence the kinetic parameter to be estimated . For a more quantitative analysis, a pairwise distance matrix with as a distance measure can be constructed. We have previously shown how such a distance matrix can be used to compute relative electron transfer rates between plastocyanin and cytochrome f for a set of plastocyanin mutants based on the known electron transfer rate for wild-type plastocyanin . The formulations based on similarity indices however suffer from loss of the sign information present in equation (1) and therefore do not allow a correlation to be made in a fully automated fashion. Equation (1) provides the most direct model in terms of the physical determinants of the quantities computed and has therefore been used in this manuscript. The correlations obtained with equation (1) are mostly of similar or better quality than those using similarity indices or distance matrices based on similarity indices.
The average Boltzmann factor of the ligand-protein interaction energy when the ligand is near or in the active site  is another possible descriptor of kinetic parameters. However, as the average Boltzmann factor is calculated from the interaction energy rather than the interaction field, it requires knowledge of the position and charge distribution of the ligand interacting with the enzyme. The use of MIFs alone in qPIPSA is less demanding in terms of modeling but can nevertheless be applied to comparing different enzymes.
Kinetic data for mouse acetylcholinesterase (AChE) 
kon TFK+ (1011 M-1min-1)
kcat/Km ATCh (108 M-1min-1)
Km ATCh (μM)
9.8 ± 0.6
46 ± 3
0.39 ± 0.01
1300 ± 140
1.2 ± 0.1
140 ± 10
7.9 ± 0.4
200 ± 40
18000 ± 2500
8.2 ± 2.0
73 ± 4
7.6 ± 0.3
60 ± 9
2.3 ± 0.1
162 ± 6
1.8 ± 0.1
230 ± 32
4.3 ± 0.8
240 ± 52
0.14 ± 0.01
700 ± 29
0.31 ± 0.02 #
1600 ± 320
As seen in Fig. 1a, a decrease in the average electrostatic potential by 1.39 kcal/mol/e results in an increase of kon for TFK+ by 1 natural log (ln) unit (equivalent to a factor of 2.72). This relation is in agreement with the expectation that ln(kon) is is a linear function of -Φ/kBT, where Φ is an average electrostatic potential in the comparison region. Standard LTO (leave-two-out) cross-validation to assess the predictive ability shows that accurate predictions of kon values can be obtained on the basis of this correlation, see Fig. 1b. Thus an excellent linear correlation between calculated differences in electrostatic potentials and measured ln(kon) values for TFK+ could be achieved and this can be used to predict ln(kon) values for AChE mutants.
The Km values for ATCh are also correlated with the AChE electrostatic potentials as shown in Fig. 2b. A decrease of the electrostatic potential by 1.68 kcal/mol/e results in a decrease of Km for ATCh by 1 ln unit.
Correlations for koff for TFK+ and kcat for ATCh (not shown) are significantly weaker. ln(kcat) is the sum of ln(kcat/Km) and ln(Km). Both of these quantities correlate with the electrostatic potential differences, but with opposite signs. Therefore, their sum appears to have a weak correlation with electrostatic potential differences.
Kinetic constants measured under different environmental conditions can be correlated with MIFs computed for the respective conditions. Here, we demonstrate this for the rate constant for the first step of the SOD reaction which is limited by superoxide association to SOD. Bovine SOD was chosen because its reaction mechanism has been thoroughly studied  and the consistency of kinetic constants measured under different conditions has been analyzed . The association rate of superoxide to SOD shows a pronounced pH- and ionic strength dependence. The pH was varied between 7.7 and 11.0 (at a constant ionic strength of 20 mM) and the ionic strength was varied between 20 and 250 mM at a constant pH of 7.7.
For the bovine SOD above pH 11 and for the Photobacterium leiognathi SOD, larger electrostatic potential differences correspond to a 1 ln unit change in kinetic constant. This deviation in pH dependence can be attributed to the difficulty in calculating the pK a s of amino acid residues of SOD. The pH-dependence of the rate constants of bovine SOD is shown in Fig. 4b. For example, assigning a low dielectric constant (ε = 4) to the protein interior resulted in lower pK a values of Lys and Arg residues and a too steep drop in the rate constant around pH 9, see Fig. 4b. Significant pH-dependent changes in kcat/Km are observed experimentally to occur at pH 11. This behavior can be reproduced when electrostatic potentials are calculated using the charges, assigned from pK a calculations with a protein dielectric constant of 78. Use of this high value of the protein dielectric in computations of the pK a values of these residues is expected to give the most accurate pK a values according to the methodology applied .
Kinetic constants for triose phosphate isomerases (TPI) from 12 different organisms with glyceraldehyde-3-phosphate as substrate
kcat (105 min-1)
kcat/Km (107 M-1s-1)
SwissProt ID Sequence identity to 1r2rA
20°C, 100 mM, pH 7.4 
20°C, 100 mM, pH 7.4 
Oryctolagus cuniculus muscle
30°C, 50 mM, pH 7.6 
Gallus gallus muscle
30°C, 100 mM, pH 7.4 
25°C, 100 mM, pH 7.6 
25°C, 100 mM, pH 7.6 
30°C, 50 mM, pH 7.9 
10°C, 100 mM, pH 7.6 
25°C, 100 mM, pH 7.6 
25°C, 100 mM, pH 7.6 
25°C, 100 mM, pH 7.4 
25°C, 100 mM, pH 7.5 
This correlation can be used to predict enzyme kinetic parameters of triose phosphate isomerases. The kinetic parameters from species with the highest and lowest kcat/Km values were used to calculate and predict parameters for the remaining 10 species, see Fig. 5c and 5d. In general, a satisfying agreement between calculated and experimental values was obtained. The relative ordering of all species could be reproduced. Computation of the kcat/Km parameters for the remaining 10 TPIs shows that the TPIs (from rabbit and Giardia lamblia) appear to be outliers, see Fig. 5c and 5d. The predicted kcat/Km values for enzymes from these organisms were significantly larger than those measured experimentally. The correlations were better for the SwissModel models (Fig. 5b and 5d) than for the Modeller "Turbo" models (Fig. 5a and 5c).
In TPIs, the substrate has a net charge of opposite sign and twice the magnitude of the substrate of AchE. Therefore the correlation coefficient should be expected to have approximately half the magnitude and an inverted sign compared to the AchE case, in accord with its physical meaning, described in the Theory section. This is indeed the case for the correlation between the electrostatic potential differences and Km values.
Fructose-1,6-(bis)phosphate aldolase is a ubiquitous enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-(bis)phosphate to glyceraldehyde-3-phosphate and glycerone phosphate. Class I aldolases of animals and higher plants use covalent catalysis through a Schiff-base intermediate. Class II aldolases of most bacteria and fungi require a divalent metal cation as a cofactor .
Kinetic data for 10 different Class I Fructose-1,6-bisphosphate aldolases (FBA) with fructose-1,6-bisphosphate as substrate
kcat/Km (106 M-1s-1)
SwissProt ID Sequence identity to 1zaiA
Aldolases present a rather complicated case for the analysis based on the electrostatic potential in the active site region because it is known that the kinetic parameters of aldolases are modulated by a flexible C-terminal part as well as isozyme specific residues that are not located in the active site of the enzyme [35, 36]. Nevertheless, kinetic parameters correlate with the interaction properties near the active site of the enzyme. The electrostatic potential changes here have a much larger impact on Km than in the other cases investigated in this manuscript: changes of electrostatic potential by 0.3 kcal/mol/e change Km by 1 ln unit. As was already mentioned, this can be attributed to a larger charge of the substrate in this case. At the same time, there is a possibility that putative conformational rearrangements, that are not included in our modeling, are accounted for by electrostatic potential changes implicitly – conformational changes could cause further changes in electrostatic potential and these changes could account for changes in Km directly by contributing to the interaction with the substrate. We also find that, in this case, the selection of a reference structure with an appropriate conformation is very important. The correlation was lost when we used the structure of the human aldolase A (muscle-type) with PDB code 1ALD instead of the structure of the rabbit muscle aldolase A (PDB code 1ZAI) as a template, even though these proteins have 98% sequence identity. The main difference between the two crystal structures is in the conformation of C-terminal residues: in 1ALD, the C-terminal forms part of the active site of the enzyme  whereas in 1ZAI, it does not. The critical involvement of the C-terminal loop in catalysis has been noted before. C-terminal truncated aldolases are practically catalytically inactive. See also reference .
When a probe radius, δ, of 1 Å was used to define the skin, the correlation error was roughly constant up to R = 18 Å and then abruptly increased for larger values of R. The use of a probe radius of 3 Å gave errors systematically increasing with R for "Turbo" models, and decreasing and reaching a minimum for "Automodel" models. The smallest correlation error, 17%, was obtained at a probe radius of 2 Å and "skin" thickness of 3 Å with "Turbo" models at R = 10 Å. "Automodel" models only gave a correlation error minimum of ca. 50% and this was obtained at R = 20 Å. This shows that the orientation of side chains is important for obtaining correlations. Inconsistent side-chain positioning, as in the "Automodel" case, reduces the possibilities for deriving a correlation between the structure and the kinetic parameters. Nevertheless, a weak correlation can still be identified by using a larger radius R for the comparison region so that errors due to side chain conformational differences are partly cancelled out.
In comparing MIFs and correlating their differences with kinetic parameter differences, a comparison region must be chosen. An appropriate size, shape and location are not obvious, although one can expect that the comparison region should be near the active site of the enzyme. We already noticed that the comparison region should neither be too small nor too large (see Materials and Methods section for a detailed description), a region with a radius of approximately 10 Å is optimal (see Fig. 9).
In the case of AChE, comparisons over a single region in the gorge near the buried active site result in correlations for the inhibitor association rate constant as well as the substrate kcat/Km and Km parameters. Moving the comparison region to the active site results in slightly poorer correlations. Moving the comparison region to the entrance of the gorge results in significantly poorer correlations for the majority of cases. This means that the electrostatic potential closer to the active site is of more importance for the kinetic parameter values, i.e. rate-determining. At the same time, the contribution from the surface residues to the potential in this region does not fully account for the contribution of these residues to the inhibitor association rate constant (Fig. 2a). This means that the interaction field in this region does not fully describe the kinetics. However, in this case a single region still describes kinetic parameters quite accurately.
It can be expected that the same region is responsible for parameters correlated with each other. The appearance of two different regions responsible for 2 different kinetic parameters is expected when the 2 different kinetic parameters are not correlated with each other, as in case of TPI, because the interaction field changes in a single region can describe only one set of parameters. The region responsible for the TPI parameter kcat/Km can be interpreted as contributing to the interaction between the substrate and the enzyme, because it is located near the substrate binding position. The region responsible for Km is close to the active site, but one cannot expect that the substrate interacts with the enzyme there, because the region is on the other side of the flexible loop from the active site. One can assume that the interaction field variations define Km variations indirectly. It is the variation of physico-chemical properties of the residues at the beginning of the flexible loop that changes Km values, but we see these variations as variations of the electrostatic potential around W168.
We have shown that protein interaction field differences can correlate with changes in the kinetic parameters of enzymes. The application possibilities of this observation are two-fold. First, the correlation method can be used to check the consistency of kinetic measurements with available structural data. In the case that the predictions are significantly different from measured parameters, significant structural changes can be expected or the measurement conditions are not consistent with the structure of the enzyme. Second, prediction of kinetic parameters is possible when data consistency is present – given two enzymes with known kinetic parameters and structure, one can derive a correlation and make predictions for the rate constants of a similar enzyme with known structure. Two is the minimum number of known parameters required for application of the method. For error estimates, more known kinetic parameters are needed and the more measured kinetic parameters are used, the more reliable the predictions will be.
An important area of application is the use of protein three-dimensional models to bring experimental measurements into consistency with each other. Measurements of kinetic constants are frequently done under different conditions and often result in different values for the same parameter. Here, the structural properties can be used as a reference point for rationalising two different measurements.
The generality of the qPIPSA method is that it uses a concept of similarity that does not require a priori knowledge of the specific mechanism of the enzymatic reaction. Although only consideration of the detailed mechanism can permit quantification of all the contributions, general structural differences can explain the differences between the enzymes resulting in their different kinetic properties. This is analogous to the case of protein association, in which relative association rates of protein mutants are described well by differences in interaction energies calculated in a simple way , although the association process itself may be changed significantly by mutations .
The consistency of kinetic measurement conditions and structural data is dependent on the accuracy of structural modeling. The requirement for protein structural model accuracy is in general high. It varies according to the type of parameter. kon or kcat/Km values, for which long-range interactions are important may be less sensitive to structural details than kcat which is defined by short-range interactions. In our study, we avoided the problem of rotating side-chains by forcing equivalent side-chains to have the same conformation. This approach may overlook parameter changes due to side-chain conformational changes, but as shown here, in many cases it allows significant correlations between structural and kinetic parameters to be detected. Improvements in homology modeling methods  may result in improvements in this type of prediction.
The minimum requirements for performing qPIPSA are (i) that a structural model of the enzyme of adequate quality for computation of the electrostatic potential in the region of the active site must be available or modellable, and (ii) that enzymatic kinetic parameters for related enzymes having the same reaction mechanism must be available. According to our estimates, a significant fraction of characterized enzymes satisfy these requirements and as such can be investigated by qPIPSA. Factors influencing the validity of qPIPSA include the following. (i) The relevance of the protein structures analysed for the rate-determining catalytic step. Proteins may adopt multiple conformations. Sometimes it may be necessary to do calculations with more than one conformation, e.g. with open and closed forms of an enzyme active site. If one of these gives MIFs that correlate better with known kinetic parameters, this provides some mechanistic information about the determinants of the kinetic parameter. (ii) Similarly, the region over which to compare the MIFs should be chosen. Calculations can be done for a number of regions. For TPIs, the best correlation with kinetic parameters was obtained for different regions for kcat/Km values and Km values. This again provides some mechanistic information on the parameter determinants. (iii) The method is most suitable when the rate-determining step is mechanistically the same across the set of protein structures compared. An outlier might be mechanistically different, but if there is wide mechanistic variation in the dataset, the comparative approach cannot be expected to work. (iv) Molecular dynamics are not currently considered in the approach and if they alter the protein structures in different ways across the dataset they will adversely affect the value of the MIF comparison.
In summary, we have described qPIPSA and demonstrated its validity for establishing correlations between kinetic parameters and MIFs, as well as estimating kinetic parameters using MIFs computed from enzyme structures. qPIPSA can be a useful tool for parameter estimation in biochemical network modelling in systems biology applications as well as in investigating enzymatic structure-function relationships and enzyme mechanisms.
All protein structures were modeled by comparative modelling techniques with SwissModel . In addition, Modeller  was used to model TPI structures. Comparative modeling techniques are suitable as the level of sequence identity between the template protein with experimentally determined structure and the modeled protein was rather high, being greater than 40% for all cases studied.
One X-ray crystal structure was chosen as the template for each enzyme family (with PDB and subunit identifiers as follows: 1mahA for AChE, 1cbjA for SOD, 1r2rA for TPI, 1zaiA for FBA). TPI and SOD are homodimers, but monomeric models were used here, as there is little interweaving of the monomer structures. Only one monomer of FBA was used for the same reason, although the majority of aldolases are homotetramers.
For a given enzyme family, all comparative models were built from one single template crystal structure even when other crystal structures were available with higher sequence similarity to the modeled protein. This was done to ensure that the differences between the models of different proteins of one enzyme family were due to differences in sequence only and not influenced by differences in crystallization conditions of different template crystal structures. Such differences can, for example, result in different rotamers of flexible side chains being seen in different structures corresponding to the same protein sequence.
For Modeller, as well as using the default mode "Automodel", we used a modeling procedure called "Turbo", to avoid excessive optimization of models. Excessive optimization and MD refinement can result in differences between models of different proteins in one enzyme family that do not reflect the sequence differences.
The "Automodel" class in Modeller8v.1  provides a convenient set up of default parameters for the generation of homology models. By default, protein structural models are energy minimized and then subject to a molecular dynamics and simulated annealing procedure to overcome bad contacts or local minima. A detailed description of the procedure and settings can be found in the Modeller documentation http://salilab.org/modeller/manual/. The CHARMM22  force field for proteins is used with modified dihedral parameters. Energy minimization is done using a conjugate gradients optimisation method for a given maximum number of iterations. The minimal atomic shift for the optimisation convergence test is 0.010 Å. The molecular dynamics integrator uses the Verlet algorithm; the time step in MD simulations is set to 4 fs.
The default (referred to here as "Automodel" procedure) settings correspond to a minimization for a maximum of 200 steps, followed by a "very_fast" MD and simulated annealing simulation. Initial models are randomised in their xyz coordinates by adding a random number between -4.0 and +4.0 Å to the initial coordinates. MD equilibration is performed at 150, 400 and 1000 K for 50 iterations each. Simulated annealing is done at 1000, 800, 500 and 300 K for 300 iterations at each temperature.
The simplified protocol (referred to here as the "Turbo" procedure) uses the "automodel.very_fast" pre-defined settings. They differ from the default in that initial coordinates are not randomised; energy minimization is done for a maximum of 50 steps with no subsequent MD and simulated annealing runs. In both cases, the quality of the generated models was checked with WHATCHECK  to remove bad models.
The PIPSA (Protein Interaction Property Similarity Analysis)  software (version 2 http://projects.villa-bosch.de/mcm/software/pipsa with minor modifications) was used to quantify MIF differences. The MIF used was the molecular electrostatic potential. Other MIFs for the interaction of different chemical fragments or functional groups with macromolecules (these can, for example, be computed with the GRID program  for a range of probes such as carboxylate, amino and hydrophobic probes), were not used in this work.
PIPSA permits the comparison of the interaction fields for a superimposed protein pair in a region defined by the intersection of the "skins" of these two proteins. The "skin" is defined as a region outside the protein surface, accessible to a probe of radius δ, and inside the surface accessible to a probe of radius δ + σ; the "skin" thus has a thickness σ. The comparison region can be restricted to be within a specified distance from a given point, for example, a chosen atomic center. Using PIPSA, we have previously carried out large scale classification of Plekstrin Homology (PH) domains , WW domains  and E2 ubiquitin conjugating domains  by electrostatic similarity to infer binding characteristics. We have also correlated electron transfer rates with the similarity of electrostatic potentials for a set of plastocyanin mutants . In these studies, a probe of radius ψ = 3 Å and a "skin" of thickness σ = 4 Å were used. In the present study, a probe of radius δ = 2 Å and a "skin" of thickness σ = 3 Å were used. These values are smaller than those used previously because the kinetic parameters of enzymes can be influenced by the interaction fields in the active site close to the enzyme surface. The use of a skin closer to the surface meant that it was important to derive models with consistent side chain orientations (see above).
For each protein structure, the molecular electrostatic potential was computed as follows. Polar hydrogen atoms were added using Whatif  and electrostatic potentials were calculated by numerically solving the linearized Poisson-Boltzmann equation using the UHBD program  with atomic radii and partial atomic charges assigned from the OPLS parameter set . The grid spacing was set to 1 Å and grid was dimensioned to 1103 points. The dielectric surface of the protein was defined as the molecular surface accessible to a spherical water probe of radius 1.4 Å. The relative dielectric constant of the protein interior was set to 2, the dielectric constant of the solvent was 80. The ionic strength of the solvent was 0 mM in the case of AChE, 100 mM for TPI, 150 mM for FBA and variable for SOD in accord with the corresponding experimental conditions.
In all cases, we used the following protocol to select the MIF comparison region:
1. The skin is assigned a thickness σ of 3Å extending from the probe accessible surface around the protein computed with a probe of radius (δ) 2 Å.
2. The region is assigned a radius of 10Å. If there are fewer than 50 points for comparison in this region, the radius is incremented in 2 Å intervals until enough points are included for reliable calculations.
3. The center of the region is assigned in the active or binding site. How this is chosen depends on the information available on the enzyme under study. If a structure is known of an enzyme-ligand complex, the region can be centered on the ligand. If the catalytic residues are known, the center can be one of these or their geometric center. The comparison region should cover the region in which the ligand interacts with the enzyme, but may also be chosen to include a region relevant for ligand access such as the gorge in AChE.
For AchE, the MIF comparison region was selected to be within 10Å of a point near His447 in the substrate active site gorge  (namely, the mid-point between the atomic coordinates of H447ND1 and Y124OH in the X-ray crystal structure with PDB identifier 1mah, subunit A). The spherical region thereby included the active site, the gorge and the peripheral substrate binding site at the gorge entrance. There were about 50 grid points in the 3 Å thick skin extending from the surface accessible to a 2 Å radius probe. Residues Glu202 was protonated and His447 was doubly protonated, as described previously .
SOD structures were modeled based on the structure of Bovine superoxide dismutase . The pH dependence of protein charges was modeled by calculating the pK a values of ionisable sites using a Poisson-Boltzmann electrostatics method . Calculations were done for assignments of high (78) and low (4) values of the dielectric constant of the protein. The charges of the Cu and Zn ions and their ligands were assigned as described previously  and these charges were not changed with pH. The electrostatic potentials were compared in the "skin" region within 10 Å of the catalytic copper ion. The comparison region in TPI and FBA cases has the same radius with the centers assigned as described in the text.
Class I fructose-1,6-bisphosphate aldolase
Molecular interaction field
Protein Interaction Property Similarity Analysis
Partial Least Squares
Root mean square deviation
Quantitative structure activity relationship
Triose phosphate isomerase
This work was supported by the BMBF Systems Biology Initiative HepatoSys Grant nos. 0313076 and 0313078C, the Klaus Tschira Foundation and the Heidelberg Center for Modelling and Simulation in the Biosciences (BIOMS).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.