- Methodology article
- Open Access
Piecewise multivariate modelling of sequential metabolic profiling data
© Rantalainen et al; licensee BioMed Central Ltd. 2008
- Received: 04 August 2007
- Accepted: 19 February 2008
- Published: 19 February 2008
Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints.
A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted.
The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.
- Partial Little Square
- Partial Little Square Model
- Principal Component Analysis Component
- Partial Little Square Component
- Time Related Change
Metabolic profiling, (also referred to as metabonomics  or metabolomics ) is a rapidly developing field in which the levels of hundreds to thousands of low molecular weight metabolites are simultaneously profiled in biofluids, cells and tissues. The methodology is well-established for characterizing disease states, toxicity and differences in physiological condition [1, 3–7, 7–10] and for extracting metabolite patterns associated with these conditions. In many experiments the biological system is followed over time, generating a multivariate metabolic time course. For example, staging of a disease process may be more important than merely determining its presence or absence. The ability to accurately define and predict disease stage also has obvious application in assessment of response to therapeutic intervention. Ideally such time-series data should be well sampled in the time domain and have an adequate statistical experimental design, which is an essential component for the outcome of the study and quality of data . However, for practical reasons collection of an optimal dataset is not always possible.
In the case of multivariate time-series data with low sample rate, many classical statistical methods are not appropriate for analysis and characterization of the time related variance due to the low number of time-points and in some cases inability to handle multivariate data. Principal Component Analysis (PCA)  has previously been applied for the analysis of time-series data in metabonomics [13–17], allowing visualization of the main time-related patterns of variation. PCA has the objective of describing the main variance in the data in a low-dimensional subspace spanned by a few linear components. Since PCA does not explicitly model time-related variation, it does not provide an optimal representation of time-related data. In addition, the PCA model usually has more than one PCA component, making subtle changes described by multiple components hard to interpret. Partial Least Squares (PLS) regression  with the time as regressand has also been used for analysis of time-series metabonomic data [19, 14, 15, 20], but the assumption of a linear relation between descriptor variables and time is valid only under some specific circumstances, but not in the general case. The PARAFAC model [21, 22] provides a generalisation of PCA to data matrices of higher dimension. In this case, the three-way structure of the data consists of [Animal × Time × Variables]. The reason why 3-way methods are not used in this study is because the NMR spectral profile (over all animals) do not preserve neither the rank, nor the spectral profile over all time points, which violates the 3-way method assumption of tri-linearity. Smilde et al.  described a generalization of the ANOVA approach to the multivariate case for data generated from an experimental design, labelled as ANOVA-Simultaneous Component Analysis (ASCA), with application to time-series data. However, in ASCA the time related effects are assumed to be linear in relation to time, which is rarely a valid assumption, neither does the ASCA method providing a predictive model. For short and univariate time-series, piecewise linear modelling methods can be used to describe progression over a time-series, which is similar in some aspects to the method proposed here for the multivariate case. Other statistical methods applied for the analysis and modelling of time-series data in omics biology include Clustering , Dynamic Bayesian Networks  and Batch Statistical Process Control [14, 26, 27]. Applications of time-series analysis have been described by Trygg and Lundstedt  in a review of chemometric techniques applied in metabonomics, and some of the current issues with regard to analysis of time-series gene expression data were reviewed by Bar-Joseph .
Here a new method for piecewise multivariate modelling of time-series spectroscopically generated metabolic data is proposed, which can be used for characterization and modelling of short (less than 20 time points) and sparsely sampled (sampling frequency is low relative the time-scale of the events studied) time-series data of high dimension. The method is well suited for analysis of spectral metabolite profiles where variables are intrinsically multicolinear, but is also generally applicable to other types of omics data. The suggested method also provides descriptive information, enables visualization and establishes a predictive model based on time-related variance, putting focus on effects seen between local time-points. The proposed method is based on multivariate piecewise models, where each sub-model describes changes occurring between neighbouring time points in a series of time frames over the time course, here the piecewise model is an Orthogonal Projections to Latent Structures  (OPLS) model. Overall, the set of sub-models describe the time-related changes over the full time also encompassing the modelling of non-linear changes in relation to time. Visualization of the piecewise multivariate model can be accomplished by investigation of sub-models separately, cumulatively over all time frames and as a time-trajectory. One can interpret the local changes as the rate of metabolic change in the time course. This aspect of explicitly investigating the multivariate characteristics of change, together with the magnitude of change over time in a biological system, has not been explored previously to the knowledge of the authors. In addition we show how this approach can be used for prediction of the time-point along a time-series, based upon measured metabolic profiles. Prediction of time-point by the model could be used for monitoring disease stage over time as well as for evaluation of the efficacy of an intervention, e.g. by assessing change in predicted disease stage after an intervention.
The paper is organized as follows. A brief introduction to the OPLS method is given, followed by a detailed description of then proposed piecewise multivariate modelling of sequential data. Finally the method is demonstrated on both simulated data and metabolic profiling data and results are compared to results from PCA analysis as well as linear OPLS regression modelling.
With the objective of describing the time-related variance in the data, a set of multivariate piecewise models is estimated, describing the transitions between metabolic states in neighbouring time points, using the OPLS algorithm. Each model establishes a function for the transition between two time points will be called a sub-model in the following sections. A distinction is made between the piecewise approach, consisting of a set of OPLS sub-models, and the OPLS regression approach where the descriptor matrix is regressed against the time using all time points in a common model, thus, assuming a linear relationship between the data and time. Let the matrix X [N × K] (for N observations and K descriptor variables) represent the matrix of descriptor variables, where each observation, e.g. a metabolic profile, is a row-vector of X. The data vector for time-point i for individual n is denoted by xn,i. Y is the response matrix [N × M] (for N observations and M response variables). T represents the total number of time-points, resulting in T-1 sub-models. Throughout the paper matrices are represented by bold uppercase letter, vectors bold lower-case, scalars are represented as italics, p(·) represents a probability density and tr(·) is the trace function.
PLS and OPLS methods
Partial Least Squares regression (PLS)  has been used successfully for estimation of multivariate regression and discriminant models in many applications, especially in cases when descriptor variables are multicollinear and noisy, and when the number of variables exceeds the number of observations, which is common for e.g. spectroscopic and other omics data. For data with systematic variation, which is orthogonal to the regressand, the number of PLS components required for an optimal predictive model normally exceeds the rank of the Y-matrix. In such cases, the Orthogonal Projections to Latent Structures (OPLS) method , which has an integrated Orthogonal Signal Correction filter [30–33] specifically designed for PLS, will benefit the analysis. This allows the estimation of an optimal model (in the predictive sense) with a single predictive component for the single Y-variable case, contrary to the PLS model which may have several components if structured Y-orthogonal noise is present in data. This property of the OPLS algorithm, guaranteeing a single predictive component for the single Y-variable case, is utilized in the method described here. It confers an advantage compared to other similar multivariate projection methods, in terms of clearer interpretation of the model and enabling a straightforward extension to the piecewise model described here. The simplicity of interpretation is due to the separate modelling of correlated components and Y-orthogonal components in the OPLS model.
Estimation of piecewise sub-models
Estimation of a multivariate sub-model between time point i and i+1 can be treated as a discriminant analysis problem between two time points, describing the time (Y) as a function of the descriptor matrix (X). Let the Xi [Ni × K] matrix consist of training data from time t = i and t = i+1 with N i observations, and let the Yi [Ni × 1] matrix to be a dummy matrix of zeros and ones, indicating which observation belongs to time point t = i and t = i+1 respectively.
The OPLS algorithm decomposes Xi into a predictive weight vector, wp,i [K × 1], describing the direction in the K-dimensional space between the two time points (i and i+1), and a predictive score vector, tp,i [Ni × 1], representing the orthogonal projection of X onto wp,i (Equation 1). If Y-orthogonal variance is present in the data, the optimal predictive PLS model would include more than one PLS component, which in the OPLS model is equivalent to the estimation of additional Y-orthogonal components in addition to the predictive component in the model (Equation 1). This results in the guarantee of a single predictive component w p,i , describing the discriminative (locally time related) direction in Xi, and A o Y-orthogonal components with loading matrix Po,i [K × Ao] and score matrix To,i [Ni × Ao], describing the systematic Y-orthogonal variation present in the data, if any.
Xi = tp,i w T p,i + To,i P T o,i + Ei (1)
The Y-orthogonal variance may be analyzed further, either separately or together with the X residuals (Ei) to understand the variance patterns present in the data that are not time-related, which may provide information of systematic instrumentation errors or biological variation not directly related to time but which may still be of value.
In Equation 1, wp,i may be interpreted as the direction of change in the 'metabolic space' in the local time frame, describing the transition between two neighbouring time points, i and i+1. wp,i may also be interpreted as an approximation to the derivative of the time dependent function of the metabolic state. wp,i only describes a direction but does not contain any information quantifying the magnitude of the change in each time frame. An intuitive measurement of the magnitude of change would be the Euclidean norm of the predictive score vector ||tp,i||. However the Euclidean norm may be affected by even moderate outliers and is therefore not an optimal choice. Instead we use the median distance in the score space as the metric for the magnitude of change (Equation 2).
di = |median(tp,i) - median(tp,i+1)| (2)
wdist,i = wp,i di (3)
wdist,i is then defined as wp,i weighted by a scalar (d i ) defining the magnitude of the change in the local time frame i (Equation 3), thus incorporating the information about the direction as well as magnitude of change. wdist will be referred to as the magnitude weights and may be used as a way of describing and visualizing the profile of time-related change in any given sub-model. wdist is also comparable in magnitude between the different sub-models, contrary to wp, which is scaled to unit norm.
Interpretation of time related changes in model
By applying elementary vector algebra we also define the cumulative wdist, wdist,cum, which represents the total time related changes as described by the sub-models between t = 1 and t = T (Equation 4). This provides useful information for interpretation and visualization of the time related changes described by the sub-models over the whole time-series.
wdist,cum,i= wdist,t=1+ wdist,t=2+ ... + wdist,t=i, t = 1...i (4)
wdist,cum provides information on the overall change from a given reference point (e.g. t = 1). This enables us to not only track the changes in the local time frames, but also to depict the accumulated change over the time course. wdist,cum may prove to be useful for investigations of systems where there is a change occurring from a homeostatic state or when studying the recovery over time after a perturbation to establish whether the system returns either to the biological state prior to the perturbation, or alternatively, to a new state. A return back to the original biological state would in result in a wdist.cum vector close to a vector of zeros. For visualization purposes, and to summarize the changes described by wdist.cum vectors, PCA may be applied on the Wdist,cum matrix to visualize the major patterns of time-related variation in a low dimensional subspace, describing the main changes in the time-series. The low dimensional representation of the time points provides an overview of the relationship and similarity between the temporal states, or stages, rather than maximizing the amount of modelled variation in the original data, hence providing a less noisy visualization of the time-related variation in the data compared to a conventional PCA trajectory.
Prediction of time point
Time predictions for new observations are carried out in two steps. First the sub-model that best fits the new observation is established and within this sub-model a more detailed time prediction is then made. The decision of which sub-mode fits best the test-set observation is determined by two likelihoods. The first is pT2(tp,test|mi), the likelihood for a test-set observation (represented by the predicted score, tp,test) to fit to the sub-model (with set of parameters mi), based on the score (tp), where the likelihood is based upon Hotellings T2 statistic estimated from the training data. The second is based upon analysis of the model residual vector (e). Using the distribution of the residuals from the training set we can calculate the Q-statistics , or alternatively DmodX , which shows similar characteristics. The Q-statistic (Equation 5), is based upon the sums of squares of the residuals, which is used to estimate the likelihood pQ(e|mi) for the test-set observation based on the Q-statistic from the training data. Q-statistics for residual analysis were described by Jackson [36, 37]. Equations 5–8 describe how the parameter c, which follows an approximate N(0,1) distribution, can be calculated [36, 37], leading us to the calculation of PQ(e|mi) (Equation 9). In Equation 5 etest represents the residual vector for a test-set observation. In equation 6, Σ E i is the covariance matrix of E i , which is the residual matrix for the training data for sub-model i. In equation 9 c' represents an instance of c as calculated in Equation 8 for a specific test-set observation.
Q test = e test T e test (5)
The simulated data set
PRINCIPAL COMPONENT ANALYSIS
PIECEWISE MULTIVARIATE MODELLING
The mercury II chloride data set
PRINCIPAL COMPONENT ANALYSIS
PIECEWISE MULTIVARIATE MODELLING
To provide an overview of the time-related changes, PCA was used to visualize Wdist.cum as a trajectory (Figure 9C). The main information provided by this plot is for assessment of whether the perturbed system has returned to its starting point, and to provide a visualization of the overall time-events. In Figure 9C we can see a trend towards the metabolite profile returning to a state close to that prior to the administration of HgCl2. However, over the study duration the recovery was not complete, which may either be due to presence of irreversible injury inflicted by the toxin or that the time span over which the study was carried out was too short to allow observation of a full recovery.
A R package  implementing the method described is available upon request from the corresponding author.
We have presented a framework for analysis of short time-series multivariate data, typical of post-genomic (omic) biology, using piecewise multivariate modelling and we demonstrated the method on metabolic profiling data. However, this approach is applicable to other types of omics data as well. The proposed method facilitates a transparent model allowing straightforward interpretation of time related variation in the data over the time course. Prediction of the time-point for a new sample is possible. The method has applications in areas such as monitoring and prediction of disease progression or the effects of a perturbation over time, allowing for evaluation of different types of interventions. The piecewise approach makes no assumption of linear relationship between the data and time and is therefore ideal for the analysis of non-linear time-related events in a biological system, as exemplified in the analysis of the simulated and the exemplar HgCl2 nephrotoxic data sets.
In comparison to PCA, the proposed method provides a more detailed picture of the time-related events including small and local changes in the time domain. In addition, the predictive properties of the proposed method can be utilised for prediction of different stages of time-dependent biological events, such as disease or a toxic perturbation studied over time. In comparison to linear multivariate regression methods (e.g. PLS and OPLS), using the time as a response variable, the piecewise multivariate approach also models non-linear time related variation. This renders a model framework describing additional time-related variation with the potential to improve prediction and interpretation if non-linear variation is dominant, in this way providing a complement to both PCA and OPLS regression against time. PCA provides an overview of the main variation in the data, while OPLS regression against time models monotonic increasing or decreasing signals over the time course. The piecewise OPLS approach provides detailed information of time related effects seen locally over the time-course as well as non-linear time-related variation.
The proposed method does not exploit autocorrelation structures in the time-series and does not providing a tool for forecasting, as do methods like Auto-Regressive Moving Average (ARMA) . One reason why ARMA cannot be applied successfully to the type of data described here is the restricted number of time-points available. Non-linear modelling approaches, e.g. Artificial Neural Networks or kernel based regression methods such as Kernel PLS , have properties that in some cases provide models with better prediction results, however, these models are often hard or impossible to interpret in relation to the descriptor variables. For example, in the case of the mercury II chloride data set, a Kernel PLS model provided only a marginally better prediction result, with a RMSECV value of 20.0% (using a Gaussian kernel function with sigma = 0.016 and three Kernel PLS components), compared to the piecewise multivariate model. However, in many biological applications of predictive modelling it is essential to have access to transparent models that allow interpretation, rendering the proposed method beneficial compared to less transparent alternatives. Another possible limitation of the proposed method is to handle time-series samples that are severely unsynchronized, e.g. high variability in response time between animals after a perturbation.
A new perspective on the time dynamic data was achieved by placing the focus on the analysis and comparison of data based on the changes over the time-series, i.e. the derivative of the time dependent function. This approach provides new insights into the dynamics of the biological system, which may otherwise have been overlooked. The method could also provide the basis for fast and large-scale comparison of biological responses studied over time due to different types of perturbations. This can be accomplished by means of comparison of the piecewise weights between sub-models, which can be seen as a multivariate representation of the time-dependent events taking place in the biological system.
The common case of data sampled in a synchronized fashion has mainly been discussed in this paper. The method could easily be modified to handle cases where individuals in the training set are not sampled at the same time points, but where the sampling time is known. In this case the sub-model will be a regression model instead of a discriminant model. In such cases, the boundaries of the local time frames will be chosen so that they are sufficiently local and overlapping in the time domain. If changes between subsequent time points are very small and noisy, while the number of time-points is not limiting, the same approach can be applied by treating some neighbouring time points together in a common time-frame when estimating sub-models.
Given short time-series data of high dimensionality, the proposed multivariate piecewise approach provides more detailed information compared to other commonly applied multivariate methods for analysis of post-genomic data. The temporal resolution for interpretation of the model is enhanced in the sense that it is easier to conclude which changes occur over time and when they occur, improving the interpretation of the data and providing a tool for the understanding of the biological system. The method also allows time predictions, which is an important feature in many biological and clinical applications, where time may represent e.g. disease stage and interventions are evaluated in relation to the disease stage.
This work was supported by grants from the METAGRAD Project (supported by AstraZeneca and Unilever) (MR), Wellcome Trust Functional Genomics Initiative BAIR (Biological Atlas of Insulin Resistance) (066786) (OC), the Swedish Foundation for Strategic Research (JT), The Swedish Research Council (JT), and The Knut and Alice Wallenberg Foundation(JT). We thank Vincent Rouilly and Martin Hemberg for valuable discussions. Disclosure Statement: OPLS is patented technology of MKS Umetrics, [US6754543, US6853923]. OPLS and OPLS-DA are registered trademarks of MKS Umetrics.
- Nicholson JK, Connelly J, Lindon JC, Holmes E: Metabonomics: a platform for studying drug toxicity and gene function. Nat Rev Drug Discov 2002, 1(2):153–161. 10.1038/nrd728View ArticlePubMedGoogle Scholar
- Fiehn O: Metabolomics--the link between genotypes and phenotypes. Plant Mol Biol 2002, 48(1–2):155–171. 10.1023/A:1013713905833View ArticlePubMedGoogle Scholar
- Holmes E, Tsang TM, Huang JT, Leweke FM, Koethe D, Gerth CW, Nolden BM, Gross S, Schreiber D, Nicholson JK, Bahn S: Metabolic Profiling of CSF: Evidence That Early Intervention May Impact on Disease Progression and Outcome in Schizophrenia. PLoS Med 2006., 3(8):Google Scholar
- Dunne VG, Bhattachayya S, Besser M, Rae C, Griffin JL: Metabolites from cerebrospinal fluid in aneurysmal subarachnoid haemorrhage correlate with vasospasm and clinical outcome: a pattern-recognition 1H NMR study. NMR Biomed 2005, 18(1):24–33. 10.1002/nbm.918View ArticlePubMedGoogle Scholar
- Odunsi K, Wollman RM, Ambrosone CB, Hutson A, McCann SE, Tammela J, Geisler JP, Miller G, Sellers T, Cliby W, Qian F, Keitz B, Intengan M, Lele S, Alderfer JL: Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics. Int J Cancer 2005, 113(5):782–788. 10.1002/ijc.20651View ArticlePubMedGoogle Scholar
- Anthony ML, Beddell CR, Lindon JC, Nicholson JK: Studies on the comparative toxicity of S-(1,2-dichlorovinyl)-L-cysteine, S-(1,2-dichlorovinyl)-L-homocysteine and 1,1,2-trichloro-3,3,3-trifluoro-1-propene in the Fischer 344 rat. Arch Toxicol 1994, 69(2):99–110. 10.1007/s002040050144View ArticlePubMedGoogle Scholar
- Nicholls AW, Holmes E, Lindon JC, Shockcor JP, Farrant RD, Haselden JN, Damment SJ, Waterfield CJ, Nicholson JK: Metabonomic investigations into hydrazine toxicity in the rat. Chem Res Toxicol 2001, 14(8):975–987. 10.1021/tx000231jView ArticlePubMedGoogle Scholar
- Bollard ME, Stanley EG, Lindon JC, Nicholson JK, Holmes E: NMR-based metabonomic approaches for evaluating physiological influences on biofluid composition. NMR Biomed 2005, 18(3):143–162. 10.1002/nbm.935View ArticlePubMedGoogle Scholar
- Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey RN, Willmitzer L: Metabolite profiling for plant functional genomics. Nat Biotechnol 2000, 18(11):1157–1161. 10.1038/81137View ArticlePubMedGoogle Scholar
- Ebbels T, Keun H, Beckonert O, Antti H, Bollard M, Holmes E, Lindon J, Nicholson J: Toxicity classification from metabonomic data using a density superposition approach: `CLOUDS'. Analytica Chimica Acta 2003, 490(1–2):109–122. 10.1016/S0003-2670(03)00121-1View ArticleGoogle Scholar
- Trygg J, Holmes E, Lundstedt T: Chemometrics in metabonomics. J Proteome Res 2007, 6(2):469–479. 10.1021/pr060594qView ArticlePubMedGoogle Scholar
- Pearson K: On lines and planes of closest fit to systems of points in space. Phil Mag 1901, 559–572.Google Scholar
- Azmi J, Griffin JL, Shore RF, Holmes E, Nicholson JK: Chemometric analysis of biofluids following toxicant induced hepatotoxicity: a metabonomic approach to distinguish the effects of 1-naphthylisothiocyanate from its products. Xenobiotica 2005, 35(8):839–852. 10.1080/00498250500297940View ArticlePubMedGoogle Scholar
- Holmes E, Antti H: Chemometric contributions to the evolution of metabonomics: mathematical solutions to characterising and interpreting complex biological NMR spectra. Analyst 2002, 127(12):1549–1557. 10.1039/b208254nView ArticlePubMedGoogle Scholar
- Williams RE, Lenz EM, Lowden JS, Rantalainen M, Wilson ID: The metabonomics of aging and development in the rat: an investigation into the effect of age on the profile of endogenous metabolites in the urine of male rats using 1H NMR and HPLC-TOF MS. Mol Biosyst 2005, 1(2):166–175. 10.1039/b500852bView ArticlePubMedGoogle Scholar
- Holmes E, Bonner FW, Sweatman BC, Lindon JC, Beddell CR, Rahr E, Nicholson JK: Nuclear magnetic resonance spectroscopy and pattern recognition analysis of the biochemical processes associated with the progression of and recovery from nephrotoxic lesions in the rat induced by mercury(II) chloride and 2-bromoethanamine. Mol Pharmacol 1992, 42(5):922–930.PubMedGoogle Scholar
- Beckwith-Hall BM, Nicholson JK, Nicholls AW, Foxall PJ, Lindon JC, Connor SC, Abdi M, Connelly J, Holmes E: Nuclear magnetic resonance spectroscopic and principal components analysis investigations into biochemical effects of three model hepatotoxins. Chem Res Toxicol 1998, 11(4):260–272. 10.1021/tx9700679View ArticlePubMedGoogle Scholar
- Wold S, Ruhe A, Wold H, Dunn WJIII: The collinearity problem in linear regression. The partial least squares approach to generalized inverses. SIAM J Sci Stat Comput 1984, 5: 735–743. 10.1137/0905052View ArticleGoogle Scholar
- Azmi J, Griffin JL, Antti H, Shore RF, Johansson E, Nicholson JK, Holmes E: Metabolic trajectory characterisation of xenobiotic-induced hepatotoxic lesions using statistical batch processing of NMR data. Analyst 2002, 127(2):271–276. 10.1039/b109430kView ArticlePubMedGoogle Scholar
- Williams RE, Lenz EM, Rantalainen M, Wilson ID: The comparative metabonomics of age-related changes in the urinary composition of male Wistar-derived and Zucker (fa/fa) obese rats. Mol Biosyst 2006, 2(3–4):193–202. 10.1039/b517195dView ArticlePubMedGoogle Scholar
- Harshman RA: Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics 1970, 16: 1--84.Google Scholar
- Bro R: Parafac. Tutorial and Applications. Chemometr Intell Lab Chemometr Intell Lab 1997, 38(2):149–171. 10.1016/S0169-7439(97)00032-4View ArticleGoogle Scholar
- Smilde AK, Jansen JJ, Hoefsloot HC, Lamers RJ, van der Greef J, Timmerman ME: ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. Bioinformatics 2005, 21(13):3043–3048. 10.1093/bioinformatics/bti476View ArticlePubMedGoogle Scholar
- Luan Y, Li H: Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 2003, 19(4):474–482. 10.1093/bioinformatics/btg014View ArticlePubMedGoogle Scholar
- Kim SY, Imoto S, Miyano S: Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief Bioinform 2003, 4(3):228–235. 10.1093/bib/4.3.228View ArticlePubMedGoogle Scholar
- Trygg J, Lundstedt T: Chemometrics Techniques for Metabonomics. In The Handbook of Metabonomics and Metabolomics. Edited by: Lindon JCNJKHE. Elsevier; 2006.Google Scholar
- Wold S, Kettaneh N, Friden H, Holmberg A: Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemometrics and Intelligent Laboratory Systems 1998, 44(1–2):331–340. 10.1016/S0169-7439(98)00162-2View ArticleGoogle Scholar
- Bar-Joseph Z: Analyzing time series gene expression data. Bioinformatics 2004, 20(16):2493–2503. 10.1093/bioinformatics/bth283View ArticlePubMedGoogle Scholar
- Trygg J, Wold S: Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics 2002, 16(3):119–128. 10.1002/cem.695View ArticleGoogle Scholar
- Wold S, Antti H, Lindgren F, Öhman J: Orthogonal signal correction of near-infrared spectra. Chemometr Intell Lab Chemometr Intell Lab 1998, 44(1–2):175–185. 10.1016/S0169-7439(98)00109-9View ArticleGoogle Scholar
- Fearn T: On orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 2000, 50(1):47–52. 10.1016/S0169-7439(99)00045-3View ArticleGoogle Scholar
- Höskuldsson A: Variable and subset selection in PLS regression. Chemometrics and Intelligent Laboratory Systems 2001, 55(1–2):23–38. 10.1016/S0169-7439(00)00113-1View ArticleGoogle Scholar
- Sjöblom J, Svensson O, Josefson M, Kullberg H, Wold S: An evaluation of orthogonal signal correction applied to calibration transfer of near infrared spectra. Chemometrics and Intelligent Laboratory Systems 1998, 44(1–2):229–244. 10.1016/S0169-7439(98)00112-9View ArticleGoogle Scholar
- Jackson JE: An Application of Multivariate Quality Control to Photographic Processing. Journal of the American Statistical Association 1957, 52(278):186. 10.2307/2280844View ArticleGoogle Scholar
- Wold S: Pattern-Recognition by Means of Disjoint Principal Components Models. Pattern Recognition 1976, 8(3):127–139. 10.1016/0031-3203(76)90014-5View ArticleGoogle Scholar
- Jackson JE: A user's guide to principal components. In Wiley series in probability and statistics. Hoboken, N.J., Wiley; 2003:xvii, 569.Google Scholar
- Jackson JE, Mudholkar GS: Control Procedures for Residuals Associated with Principal Component Analysis. Technometrics 1979, 21(3):341–349. 10.2307/1267757View ArticleGoogle Scholar
- Holmes E, Cloarec O, Nicholson JK: Probing latent biomarker signatures and in vivo pathway activity in experimental disease states via statistical total correlation spectroscopy (STOCSY) of biofluids: application to HgCl2 toxicity. J Proteome Res 2006, 5(6):1313–1320. 10.1021/pr050399wView ArticlePubMedGoogle Scholar
- The R project for statistical computing[http://www.r-project.org/]
- Box J: Time Series Analysis: Forecasting and Control. Holden-Day; 1976.Google Scholar
- Rosipal R, Trejo LJ: Kernel partial least squares regression in Reproducing Kernel Hilbert Space. J Mach Learn Res 2002, 2(2):97–123. 10.1162/15324430260185556Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.