- Research article
- Open Access
Automated smoother for the numerical decoupling of dynamics models
© Vilela et al; licensee BioMed Central Ltd. 2007
- Received: 09 April 2007
- Accepted: 21 August 2007
- Published: 21 August 2007
Structure identification of dynamic models for complex biological systems is the cornerstone of their reverse engineering. Biochemical Systems Theory (BST) offers a particularly convenient solution because its parameters are kinetic-order coefficients which directly identify the topology of the underlying network of processes. We have previously proposed a numerical decoupling procedure that allows the identification of multivariate dynamic models of complex biological processes. While described here within the context of BST, this procedure has a general applicability to signal extraction. Our original implementation relied on artificial neural networks (ANN), which caused slight, undesirable bias during the smoothing of the time courses. As an alternative, we propose here an adaptation of the Whittaker's smoother and demonstrate its role within a robust, fully automated structure identification procedure.
In this report we propose a robust, fully automated solution for signal extraction from time series, which is the prerequisite for the efficient reverse engineering of biological systems models. The Whittaker's smoother is reformulated within the context of information theory and extended by the development of adaptive signal segmentation to account for heterogeneous noise structures. The resulting procedure can be used on arbitrary time series with a nonstationary noise process; it is illustrated here with metabolic profiles obtained from in-vivo NMR experiments. The smoothed solution that is free of parametric bias permits differentiation, which is crucial for the numerical decoupling of systems of differential equations.
The method is applicable in signal extraction from time series with nonstationary noise structure and can be applied in the numerical decoupling of system of differential equations into algebraic equations, and thus constitutes a rather general tool for the reverse engineering of mechanistic model descriptions from multivariate experimental time series.
- Artificial Neural Network Model
- Reverse Engineering
- Kernel Size
- Signal Extraction
- Noise Structure
The reverse engineering of biological systems from experimental data often cannot be achieved on first principles. This is as much a reflection of the lack of plausible hypotheses as it is an indication of excessive parametric sensitivity when alleged mechanistic formulations are at hand. Consequently, there is a critical need for a description of dynamic behaviors that can be used as a machine learning tool (e.g., as a generic "black box"), but with parameters capable of shedding light on the topology of the underlying mechanisms. Biochemical Systems Theory [1–3], offers such formalism, especially in the form of S-systems, where kinetic-order coefficients characterize the topology of a biological network as well as the magnitude of each interaction. A drawback of this approach is that the parameterization of S-systems is a difficult problem, even for five metabolic species . In a previous report , we proposed to overcome this challenge by tracing each species independently with a universal function of time, x(t) = f(t), such that dx/dt = g(t), and where g(t) = df(t)/dt is deduced symbolically from the neural network transfer function; see also . The independency of each metabolic profile allows solving the S-system linearization problem by decoupling it, which reduces the computational effort in the system parameters identification by preventing numerical integration. In the earlier report we suggested using artificial neural networks (ANN) with optimized topology and early stopping procedures . The distinctive advantage of using an ANN is that it is a universal function  with a closed form for which we were able to determine the first derivative symbolically . The ANN solution, however, is not without problems. The most significant issue is that its discriminant function often leads to artifacts in its derivatives, which distort the solution, even when they are not visually apparent in the smoothed signal. The artifacts reflect the sigmoidal transfer function of the ANN model. That conclusion drove the pursuit of an entirely implicit solution that is driven solely by the experimental data and is free of all parametric bias.
The task of inferring signal from noisy time series falls into the general category of developing denoising filters. In an effort extracting signal from noise in chromatograms, Paul Eilers  recently proposed a "perfect smoother", which is basically a matrix form of a much older implicit method known as Whittaker's smoother . Those works [9, 10] are the starting point for the procedure introduced here. Consideration of the denoising problem as the task of "learning" an arbitrary signal suggests the possibility of applying principles from Information Theoretic Learning [11–14], which allows signal scaling without causing bias in signal extraction. Specifically, the use of quadratic Renyi's entropy for assessing the learning process offers a foundation for the re-identification of smoothers based on Error Entropy Minimization (EEM). This procedure has been successfully applied to chaotic time series prediction and in nonlinear system identification, where the mean square error was replaced by error entropy as cost function, for instance, in the training of ANN models . In this report, we explore the use of error entropy as optimization criterion for reconfiguring the Whittaker's smoother. In complementary research, we, as well as many others, have been working on parameterization procedures for S-systems [e.g.,  and references therein], implicitly assuming that noisy time series and their slopes could efficiently be smoothed by some unidentified procedure. The algorithm reported here describes such a procedure.
The proposed signal extraction method was tested with an application to metabolic profiles. These had been measured with in vivo NMR methods at a sequence of time points and quantify glycolysis in the lactic acid producing bacterium Lactococcus lactis . The experimental data are very interesting because the underlying molecular mechanisms are relatively well understood, because of the high frequency of sampling, and because the data have a complicated time course and noise structure. They were therefore recently proposed as a reference case study for testing reverse engineering methods for biological networks . The data were included in our open source MATLAB toolbox and stand-alone GUI application, described in the Section Availability and requirements.
In this report, Rényi's second-order entropy of the cross-validation error entropy (cvEE) was used as optimization criterion for the parameters estimation in Whittaker's smoother. The optimization process is based on gradient ascent of the information potential of cvE in λ direction, where Parzen Windows with Gaussian kernel are used for the pdf estimation. In general, this type of pdf estimation faces one particular problem: the kernel size σ. This variable is crucial for convergence of the gradient method because it causes the algorithm to reach an optimal local minimal if its value is misestimated. The estimation of kernel size from the data covariance has yielded good results in some applications . In our software, the user can choose an alternative automatic method that uses a machine learning kernel  or set the value of σ manually. The effect of this parameter for specific data sets can therefore be studied by systematically screening a range of values for the particular application. Although we found the automatic setting to be generally satisfactory, this is only an empirical result, which suggests that pdf estimation warrants further investigation.
It should be noted that the reliance on cross-validation implies that time points at the edges of each window cannot be used for signal extraction. This loss at break points might appear to be a significant drawback of the filtering procedure. However, as established in prior work , the identification of decoupled systems is discontinuous in nature as it consist on the generation of pairs of (dx/dt, x) values. Therefore, it suffers only mildly from a few missing or omitted data points. More important is that multiple independent time series are available to narrow the boundary estimates for the system parameters to the point where the topology of the biological network can be inferred with reliability.
The goal of developing a "perfect smoother" that can be used as an automated tool for signal extraction has been an elusive goal in the field of signal processing. Based on historical work that started with the Whittaker's smoother and was advanced by cross-validation in Eilers' smoother, here we take that approach one step further by removing the parametric bias of using squared deviations as an optimization criterion. In its place we proposed an informational measure of variation in the form of cross-validated error entropy. The crucial step of the proposed methodology is the identification of the matrix format of the cvE (Equation 12) that permits a closed-form solution for its derivative with respect to the smoothing parameter λ (Equation 15). That solution is also used to segment the signal in windows where the consideration of the neighboring values would decrease the optimality of within-window signal extraction. The resulting algorithm is fully automated and was successfully applied to reference, notoriously difficult biological time series. Applicability to signal extraction in other areas may be anticipated.
Starting point: Whittaker's smoother
Equation 1 describes the balance of two additive components: one quantifying the smoothness Δz i of the output data and the other quantifying the fidelity of the smoothed model output to the raw data (y i - z i ). Thus, the parameterization of this smoother consists of finding an optimal weighing of the two components of Q. The solution is signal specific and involves the identification of optimal values for the order of the filter, d, and the weighing of its residuals, λ. The order d is an implicit parameter that determines the specific definition of Δz i . For example, for a first order filter (d = 1), the smoothness measure Δz i of the output data is defined as the difference between two consecutive points Δz i = (zi+1- z i ). For a 2nd order filter, d = 2, the smoothness measure is formulated as the difference between pair-wise differences among three consecutive points Δz i = (zi+2- zi+1) - (zi+1- z i ). The accumulation of differences proceeds for higher orders accordingly. The order d is therefore an integer number that sets the flexibility of the signal extracted. By contrast, the positive real scalar λ weighs the smoothness term Δz i such that high values dampen the extracted signal. Accordingly, high values of d and low values of λ tend to cause over-fitting, while the inverse combination leads to more ragged output. In cases of very high values of λ, the filter behaves like polynomial regression of degree d - 1 . For implementation purposes, the original proposition of a "perfect smoother" rewrites the Equation 1 in its matrix form , where D is a (N - d) × N matrix such that Dz = ∑Δz and |A|2 = ∑A2:Q = |y - z|2 + λ|Dz|2(2)
In the Equation 3, I represents the identity matrix of order N. The smoother equation can be written in a more general form, where the noisy time series y can have missing points. Let w be a vector of weights with the same dimension of y where for each missing point y i , w i = 0 and w i = 1 otherwise. Thus, Equation 3 can be rewritten asz = (W + λ D T D)-1Wy(4)
where H is the "hat matrix", given byH = (W + λ D T D)-1W(6)
The use of the cvE in an automatic Whittaker's smoother optimization was advised as a limited procedure . In an early implementation using cvE as cost function, we found out that the problem of automation using cvE is also associated with the scale and skewness of the residual variation. An adaptive supervised method specifically developed for the optimization of the smoother is described in this report by identifying a novel formulation that is not sensitive to signal scaling (it is therefore non-parametric), but only to its distribution. This is an important characteristic when a comparison of an error measure between windows with different signal structures is made. Symptomatically, we have observed that the exhaustion of key metabolites like glucose will cause dramatic shifts in noise structure. Accordingly, noise restructuring was targeted as the defining feature to find window boundaries by the segmentation algorithm, as if distinguishing what could be thought as distinct signals. Despite the advantages of the new formulation, the issue of the under-smoothing with correlated data remains. This problem might be addressable by weighting the cost function with the eigenvalues of the covariance matrix as described in , but is beyond the scope of this report.
. In our application, the parameters λ and d of the filter are optimized by minimizing Renyi's second-order entropy of the cross-validation error.
Minimal cross-validation error entropy
The identification of values for λ and d in the Whittaker's smoother is challenging to the point that the original report advised against automation altogether . The computational challenge is exacerbated when automation is combined with a nested estimation of the information potential, IP (Equations 8 and 9). To overcome these complications, the method proposed here minimizes error entropy instead of the typical mean square error (MSE), as it has been recommended for the extraction of information in adaptive systems . Specifically, we propose a new method for determining optimal values for λ and d in Whittaker's smoother, where the cross-validation error entropy (cvEE) is used instead of the cross-validation error (cvE). As the logarithm is a monotonic function, the minimization of Renyi's second-order entropy is equivalent to the maximization of IP . However, the optimization procedure has to be tailored to the integer nature of d, which requires a different treatment than the gradient method employed for the identification of λ. For this reason, the algorithm searches for d within a reasonable set of integer orders (between 1 and 6), and for each value of d the λ parameter is found by the gradient of IP as described in Equation 10:λi+1= λ i + η∇ λ V(e)(10)
Here, η is the learning rate and e represents the cvE vector. The adaptation of λ terminates when one of the stop criteria is satisfied; that is, if either the cvE entropy increases, or if the algorithm reaches the minimal gradient or the maximum number of epochs. The order value d and the λ value with the minimal cvEE are chosen as the optimal parameters values (see next Subsection).
Gradient of cvEE
The leave-one-out cross-validation error vector can be rewritten using the "entry-wise" Hadamard product represented by the symbol ∘ in the Equation 12.e = [y - Hy] ο [dg(I - H)]ο-1(12)
where . Substituting Equation 16 into 15 and then 15 into 11 results in the gradient of IP of cvE, and allows recursive determination of λ in Equation 10. To prevent problems caused by an amplitude shift in the signal, we found it advantageous to normalize the signal to a linear scale with the range [0, 1].
The Whittaker's smoother assumes an invariant noise structure throughout the signal . This assumption is often not valid for biological time series such as those measured in metabolic profile analyses. To overcome this problem, the proposed smoothing procedure includes a process for segmenting the time series into windows with similar noise structure.
is evaluated, where ||·|| signifies the absolute value. is the minimal cvEE found for window w, using kernel size σ w , and is the entropy of the null vector with the same kernel size σ w estimated from the points in the respective window. For each iteration, window 1 is increased by one time point and window 2 is correspondingly decreased by the same point, and the process of parameter optimization and evaluation of the cost function cf is performed again. After N-7 iterations the signal is completely scanned and the entropy information of the windows is evaluated (Figure 1). The window with minimal cost function cf and its complementary window are chosen, and the cvEE is updated for this new smoothed configuration. The signal is definitely broken into two windows if the new cvEE is lower than the current cvEE, which had been obtained before of the scanning process. The same search process for a break point (minimal cf point) repeats inside each of the new two windows. The recursion proceeds until the scanning of all windows produces cvEE values that are above the one obtained when the window was segmented. To remove the effect of the kernel size, the entropy values are considered in the cost function cf as the signal's information and referenced by the minimal possible information measured with the same space metric, the kernel size σ w . This normalization removes the bias towards small windows, which would result in extraction of the noise through over-fitting. In summary, after each scanning run, the algorithm creates two new windows for each current window if the cvEE of the entire signal is minimized, and otherwise terminates the recursive segmentation. The windows with minimal entropy match with the assumption of "constant measure of precision" made by Whittaker in the smoother equation . In the unlikely event that more than one window reaches the same minimum cvEE, which would make the cost function zero, the break point will be selected as the one where the complementary window has the minimum cvEE (Figure 1c). As the scanning process goes progressively through the signal where the two windows have one fix point (the first point for the left window and the last point for the right window), the algorithm works well only with signal that presents a gradual changes on the noise structure, no matters in which direction. One general solution could be built by moving the fix points of the windows, scanning all the possible segments of the signal. It would require a great computational power, but fortunately this solution is out of our main purpose. Most biological time series have a noise behavior addressed here that makes the tool described sufficiently useful for application on metabolomic profiles. Tests with a different synthetics signal were performed [see Additional file 1]. The segmentation algorithm together with the kernel density estimation (in the parameters optimization procedure) comes with a significant computational cost. However, this cost allows independent model identification for each metabolic in the time series. The resulting parallel parameterization allowed by an efficient numerical decoupling translates into immense net computational gains even for S-systems models (or other systems of coupled differential equations) with as few as 3 variables.
The library implementing the filter described in this report is provided both with open source (MathWorks Matlab) and as a stand alone application. The library is provided at http://autosmooth.sourceforge.net/ with free access and use, under a GNU GPL license. It can also be conveniently obtained as a module of the Bioinformatics Station resource http://bioinformaticstation.org.
This work was partially supported by the NHLBI Proteomics Initiative through contract N01-HV-28181. S.Vinga also recognizes award PTDC/EEA-ACR/69530/2006 "DynaMo – Dynamical modeling, control and optimization of metabolic networks" by Fundação para a Ciência e Tecnologia (FCT, Portugal). The first author, M. Vilela, also thanks Coordenadoria de Aperfeicoamento de Pessoal do Ensino Superior (CAPES) and Laboratório Nacional de Computação Científica (LNCC).
- Savageau MA: Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions. J Theor Biol. 1969, 25 (3): 365-369. 10.1016/S0022-5193(69)80026-3.View ArticlePubMedGoogle Scholar
- Savageau MA: Biochemical systems analysis. 3. Dynamic solutions using a power-law approximation. J Theor Biol. 1970, 26 (2): 215-226. 10.1016/S0022-5193(70)80013-3.View ArticlePubMedGoogle Scholar
- Voit EO: Computational analysis of biochemical systems : a practical guide for biochemists and molecular biologists. 2000, Cambridge ; New York , Cambridge University Press, xii, 531 p.,  p. of plates-Google Scholar
- Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics. 2003, 19 (5): 643-650. 10.1093/bioinformatics/btg027.View ArticlePubMedGoogle Scholar
- Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004, 20 (11): 1670-1681. 10.1093/bioinformatics/bth140.View ArticlePubMedGoogle Scholar
- Voit EO, Savageau MA: Power-law approach to modeling biological systems; III. Methods of analysis. J Ferment Technol. 1982, 60 (3): 233-241.Google Scholar
- Almeida JS, Voit EO: Neural-network-based parameter estimation in S-system models of biological networks. Genome Inform. 2003, 14: 114-123.PubMedGoogle Scholar
- Hornik K, Stinchcombe M, White H: Multilayer feedforward networks are universal approximators. Neural Networks. 1989, 2: 359–366-10.1016/0893-6080(89)90020-8.View ArticleGoogle Scholar
- Eilers PH: A perfect smoother. Anal Chem. 2003, 75 (14): 3631-3636. 10.1021/ac034173t.View ArticlePubMedGoogle Scholar
- Whittaker ET: On a new method of graduation. Proceedings of the Edinburgh Mathematical Society. 1923, 41: 63-75.Google Scholar
- Erdogmus D, Principe JC: An Error-Entropy Minimization Algorithm for Supervised Training of Nonlinear Adaptive Systems. IEEE Transactions on Signal Processing. 2002, 50 (7): 1780-1786. 10.1109/TSP.2002.1011217.View ArticleGoogle Scholar
- Erdogmus D, Principe JC: Comparison of Entropy and Mean Square Error Criteria in Adaptive System Training Using Higher Order Statistics: Helsinki, Finland.2000, , 75-80.Google Scholar
- Santamaria I, Pokharel PP, Principe JC: Generalized correlation function: definition, properties, and application to blind equalization. IEEE Transactions on Signal Processing. 2006, 54: 2187-2197. 10.1109/TSP.2006.872524.View ArticleGoogle Scholar
- Principe J, Xu D, Fisher J: Information-Theoretic Learning. Advances in unsupervised adaptive filtering. 1999, WileyGoogle Scholar
- Chou IC, Martens H, Voit EO: Parameter estimation in biochemical systems models with alternating regression. Theor Biol Med Model. 2006, 3 (1): 25-10.1186/1742-4682-3-25.PubMed CentralView ArticlePubMedGoogle Scholar
- Ramos A, Neves AR, Santos H: Metabolism of lactic acid bacteria studied by nuclear magnetic resonance. Antonie Van Leeuwenhoek. 2002, 82 (1-4): 249-261. 10.1023/A:1020664422633.View ArticlePubMedGoogle Scholar
- Voit EO, Almeida J, Marino S, Lall R, Goel G, Neves AR, Santos H: Regulation of glycolysis in Lactococcus lactis: an unfinished systems biological case study. Syst Biol (Stevenage). 2006, 153 (4): 286-298.View ArticleGoogle Scholar
- Silverman BW: Density Estimation for Statistics and Data Analisys. 1986, Chapman & Hall, 26:View ArticleGoogle Scholar
- Santos JM, Sa JM, Alexandre LA: Neural Networks Trained with the EEM Algorithm: Tuning the Smoothing Parameter. 2005Google Scholar
- Sardo L, Kittler J: Minimum complexity PDF estimation for correlated data. 1996, 2: 750-754 vol.2.Google Scholar
- Benveniste A, Goursat M, Ruget G: Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications. Automatic Control, IEEE Transactions on. 1980, 25 (3): 385-399. 10.1109/TAC.1980.1102343.View ArticleGoogle Scholar
- Bell AJ, Sejnowski TJ: An information-maximization approach to blind separation and blind deconvolution. Neural Comp. 1995, 7 (6): 1129-1159.View ArticleGoogle Scholar
- Kullback S, Leibler RA: On information and sufficiency. Annals of Mathematical Statistics. 1951, 22: 79-86.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.