The ability to estimate enzymatic kinetic parameters is very important for metabolic simulations [1–3]. This is because experimental values of the kinetic parameters measured under exactly the conditions of the model for exactly the proteins in the model are usually missing. Often kinetic parameters have been measured only for a related enzyme or for the correct enzyme but under different conditions, e.g. different temperature, pH or ionic strength. Therefore we developed a method that relates variations in kinetic parameters to differences in protein structures. This method can be used to check the consistency of kinetic measurements described in the literature with available protein structural data as well as to make estimates of kinetic parameters based on enzyme structures and kinetic parameters for related enzymes.

Protein structural information provides a basis for predicting and rationalizing protein function. Given a protein structure, molecular simulations can be performed and these can be used to compute kinetic parameters [4]. For example, Brownian dynamics simulations can be used to simulate substrate-enzyme diffusional association and compute bimolecular association rate constants [5, 6]. We previously demonstrated how rate constants computed by Brownian dynamics simulation of the diffusional association of superoxide and myeloperoxidase could be used in the mathematical modeling and simulation of the oscillatory behaviour of metabolite levels in activated white blood cells [2]. The type of molecular simulation to use must be chosen according to the mechanism determining the kinetic parameter. Whereas Brownian dynamics is appropriate for diffusional processes, molecular dynamics techniques may be required to simulate conformational changes and quantum mechanics for chemical reaction steps. These simulations can be computationally demanding and the accurate computation of kinetic parameters by simulations is a challenging and on-going research topic [4]. Therefore, a simpler, less computationally demanding and more robust approach to exploit protein structural information is required in the context of biochemical network simulation. qPIPSA is designed to fulfill this requirement.

In qPIPSA, molecular descriptors are related to kinetic parameters. The molecular descriptors are the molecular interaction fields (MIFs) of the proteins. Molecular interaction fields map the interaction energy between a chemical probe and the target protein as the chemical probe is moved over a grid of points [7]. Diverse chemical probes may be used, e.g. a water molecule, carbonyl oxygen or hydroxyl group. When the probe is a point charge, the molecular interaction field corresponds to the molecular electrostatic potential. Intermolecular interactions are fundamental to enzymatic reactions and are dependent on the enzyme MIFs. MIFs are often used in 3D-QSAR studies to derive quantitative structure-activity relationships (QSARs) [7]. This approach may be taken for proteins [8] as well as for small molecules [9, 10]. The QSARs are generally derived by partial least squares (PLS) chemometric procedures for a training set with experimentally determined parameters. While such training procedures can be applied to predict enzyme parameters [11], typically, insufficient experimental data on kinetic parameters are available for training by PLS. In qPIPSA, we therefore employ a simpler linear regression procedure for which only two experimental measurements of a kinetic parameter for enzymes and at least one known three-dimensional structure are required. Further experimental measurements can be used and will help to improve the accuracy of predictions and assess the confidence of the parameter estimates. qPIPSA is based on the PIPSA method [12] which has been used to classify the interaction properties of protein families using MIFs [13, 14].

In this paper, we will focus on the use of molecular electrostatic potential as the descriptor MIF in qPIPSA. The molecular electrostatic potential is usually the most informative MIF for this purpose. The long-range nature of electrostatic interactions means that similarities or differences in electrostatic potentials are often not detected in a sequence analysis. Even proteins with low sequence similarity can have rather similar electrostatic potentials [12, 15]. For estimating enzyme kinetic parameters, the molecular electrostatic potential is appropriate because electrostatics are considered to be the most important contributor to enzymes' catalytic abilities [16], e.g. to stabilization of the transition state. Electrostatic steering has also been shown to enhance the association rates in fast, diffusion-influenced enzymes, such as superoxide dismutase [17].

In qPIPSA, the MIFs of different proteins and/or the same protein under different conditions, e.g. in a different titration state at a different pH, are compared. The enzyme kinetic parameters k_{cat} and K_{m} are associated with specific substrate-enzyme interactions. For the binding process relevant to the kinetic parameter, only one binding partner (here the enzyme) is modeled. The ligand is thus assumed to be the same or similar for all the protein structures compared. The differences in kinetic parameters are assumed to be determined by differences in the protein MIFs, and these differences are calculated in a region around the active site of the enzyme. Thus, qPIPSA requires no prior knowledge of the reaction mechanism. However, information about the location of the active site or substrate or ligand binding residues is used to define the region over which to compare MIFs. qPIPSA can be used to estimate missing kinetic parameters, to check the consistency of different measurements, and to investigate the mechanistic determinants of kinetic parameters. As such, it is a useful tool for biochemistry studies in general and for biochemical network simulation and comparative systems biology in particular.

In this paper, we first outline the qPIPSA approach. Details are given in the Methods section. Results are described for four illustrative and experimentally well-characterized enzymes. The criteria for choosing these enzymes were the availability of enough experimental kinetic and structural data for validation of the methodology, and a reasonable diversity in enzyme type. First, we present an analysis of acetylcholinesterase (AChE) mutants. This shows that not only inhibitor association rates but also substrate K_{m} and k_{cat}/K_{m} parameters correlate remarkably well with the electrostatic potential differences near the active site of the enzyme. We then analyze superoxide dismutase (SOD) and show that the ionic strength and pH dependence of the rate constants for superoxide can be explained by the changes in the electrostatic potential. Next, we examine triose phosphate isomerase (TPI) enzymes from 12 different species. For TPIs, the electrostatic potential differences are found to be good descriptors for the cross-species variation in kinetic parameters k_{cat}/K_{m} and K_{m}. In the fourth case, for 10 class I fructose-1,6-bisphosphate aldolases (FBA) the correlations are less obvious, but detectable. It appears that the conformation of the C-terminal region of FBA is critical for a description of the kinetics of this enzyme. Finally, we examine different protein structural modeling procedures and how to choose the comparison region for interaction fields, showing that this relates to enzyme mechanism.

### Theory

We postulate that the ratio of the kinetic parameters

*k*
_{
a
}and

*k*
_{
b
}of a pair of enzymes,

*a* and

*b*, correlates with the average differences in their molecular interaction fields (MIFs), Φ

_{
a
}and Φ

_{
b
}:

where **R** is the region selected for comparison and considered important for the kinetic parameter *k*. The differences in MIF, Φ, are summed over the grid points in the "skins" (see Methods section for a definition) around the proteins within a distance R from a specified point, defining the region **R** and divided by the number of points in the overlapping skins to obtain a size-independent measure.

When applied to a set of proteins, the correlation (1) is required for all pairs with known kinetic parameters, and the correlation factor

*α* is derived by minimizing the function

The relative percentage correlation error is defined using the RMSD of the left-hand side of (1) from the right-hand side and it is given by formula (3), with

*α* minimizing (2):

The absolute error may be defined as:

It gives an average deviation of the fit in ln units, but is less useful than expression (3) when comparing different cases with different degrees of kinetic parameter deviation. It is, for example, small in the absence of correlation when kinetic constants differ insignificantly in a set of proteins compared. We also used the Pearson correlation coefficient (R-coefficient) to estimate the degree of overall correlation between two sets of parameters x and y:

with <> being an average over all possible values of x or y.

Predictions of kinetic parameters are carried out by first deriving the parameter *α* for all pairs with known kinetic parameters and then averaging predictions for any unknown case from pairwise comparisons to all proteins with known kinetic parameters. The minimum required number of known parameters equals two in this approach.

When Φ is the molecular electrostatic potential, a physically meaningful estimate for the parameter *α* is Φ is -*q*/k_{B}T, where *q* is the net charge of the substrate, k_{B} is the Boltzmann constant and T is the temperature. This corresponds to the kinetic parameter being determined by the interaction energy of the substrate charge *q* with the electrostatic potential Φ of the enzyme: *k* ~ *exp*(-*q*·Φ/k_{B}T).

We also tested other possibilities for defining correlations, including

For a single point in the region **R**, formulas (1) and (6) give the same results. When there are many points in the comparison region **R**, formula (6) will enhance the contributions from large and positive (+ sign) or large and negative (- sign) potentials. In some cases, formula (6) gives a better description of interaction field – kinetic parameter correlations.

Similarity indices, e.g. the Hodgkin index [18]

, may also be used to compare electrostatic potentials [13]. Similarity indices can assist in assigning a kinetic parameter to a protein when experimental values are available for several related proteins by enabling the most similar protein to be found and hence the kinetic parameter to be estimated [19]. For a more quantitative analysis, a pairwise distance matrix with
as a distance measure can be constructed. We have previously shown how such a distance matrix can be used to compute relative electron transfer rates between plastocyanin and cytochrome f for a set of plastocyanin mutants based on the known electron transfer rate for wild-type plastocyanin [20]. The formulations based on similarity indices however suffer from loss of the sign information present in equation (1) and therefore do not allow a correlation to be made in a fully automated fashion. Equation (1) provides the most direct model in terms of the physical determinants of the quantities computed and has therefore been used in this manuscript. The correlations obtained with equation (1) are mostly of similar or better quality than those using similarity indices or distance matrices based on similarity indices.

The average Boltzmann factor of the ligand-protein interaction energy when the ligand is near or in the active site [21] is another possible descriptor of kinetic parameters. However, as the average Boltzmann factor is calculated from the interaction energy rather than the interaction field, it requires knowledge of the position and charge distribution of the ligand interacting with the enzyme. The use of MIFs alone in qPIPSA is less demanding in terms of modeling but can nevertheless be applied to comparing different enzymes.